...making Linux just a little more fun!
Jimmy O'Regan [joregan at gmail.com]
I have a couple of scripts that almost work, and I was wondering if anyone (Ben?) could tell me why...
First, I want to convert a list of tags in the IPA PAN's corpus format (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
#!/usr/bin/perl

use warnings;
use strict;

# tags to replace
my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);

while (<>)
{
    my @in = split/:/;
    my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
    if ($#out > 3) {
        my $type = $out[3];
        $out[3] = $out[2];
        $out[2] = $out[1];
        $out[1] = $type;
    }
    print join '.', @out;
}
That's broken, because it only works for tag sets which have more than 4 entries, but changing the if to "($#out >= 3)" gives me this: ".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
Use of uninitialized value in string ne at foo.pl line 11, <> line 1085.
Use of uninitialized value in string ne at foo.pl line 11, <> line 1086.
Next, I have a list of names extracted from a Polish morphology dictionary[1] that I'm trying to convert to a list of word stems and endings. I have this, which works (aside from a couple of errors):
#!/usr/bin/perl

use warnings;
use strict;

use String::Diff qw/diff_fully/;
use Data::Dumper;

#test();
while(<>)
{
    s/,\W+$//;
    my $endings = $_;
    my @a = split/, /;
    my $stem = find_stem(@a);
    $endings =~ s/$stem//g;
    print $stem;
    if ($endings =~ /?/) {print ":n.f:";}
    elsif ($endings =~ /owie/) {print ":n.m1:";}
    else {print ":n.??:";}
    print $endings . "\n";
}

sub test()
{
    my $test = "Adam, Adama, Adaemie, Adamowi, Adamem, Adamach, Adamami, Adamom";
    my @t = split/, /, $test;
    print find_stem(@t);
    print "\n";
}

sub find_stem()
{
    my @in = @_;
    my ($r, $l, $cur, $last);
    my $i=0;

    while ($i<($#in))
    {
        ($r, $l) = diff_fully($in[$i], $in[$i+1]);

        $cur = $r->[0]->[1];
        $last = $cur if (!$last);
        if ($cur ne $last) {
            ($r, $l) = diff_fully($last, $cur);
            $last = $r->[0]->[1];
        }
        $i++;
    }
    return $last;
}
but if I change the end of the while() to this:
    else {print ":n.??:";}
    my @ends = split/, /, $endings;
    sort(@ends);
    $endings = join(',', @ends);
    print $endings . "\n";
to sort the endings, it... doesn't. What am I missing?
[1] "Słownik alternatywny", under the GPL: http://www.kurnik.pl/slownik/odmiany/
Ben Okopnik [ben at linuxgazette.net]
On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> I have a couple of scripts that almost work, and I was wondering if
> anyone (Ben?) could tell me why...
I'll give it a shot. The only problem is, your script is doing more than what you describe here - so I'm going to have to guess about a few things. Worse yet, since your code isn't doing what it's supposed to do, I'm guessing based on wrong data. But hey, for a friend...
> First, I want to convert a list of tags in the IPA PAN's corpus format
> (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
Something like this, maybe? Again, I'm just guessing.
perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ``
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> # tags to replace
> my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);

Just a personal reaction here - BLECH. I hate having to count terms to figure out what's a key and what's a value.

my %terms = ( n     => 'nt',
              pri   => 'p1',
              sec   => 'p2',
              ter   => 'p3',
              subst => 'n'
            );

> while (<>)
> {
chomp; # If you don't handle that "\n", you'll be sorry...
>     my @in = split/:/;
>     my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
What happens when $terms{$_} is undefined? Bad news, that's what. I suspect that this is where your errors are coming from - perhaps with help from the absence of that 'chomp'.
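The warnings quoted earlier most likely come from comparing a missing hash value with `ne`: tags such as `pl` or `dat` have no entry in %terms, so `$terms{$_}` is undef. Here is a minimal sketch of one way around that, assuming tags without an entry should pass through unchanged. The helper name `translate()` is hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replacement table from the thread; tags not listed here pass through as-is.
my %terms = (
    n     => 'nt',
    pri   => 'p1',
    sec   => 'p2',
    ter   => 'p3',
    subst => 'n',
);

# translate() is a hypothetical helper name used only for this sketch.
sub translate {
    my @tags = @_;
    # exists() asks whether the key is present without comparing an undef value,
    # so no "uninitialized value" warning is triggered for unknown tags.
    return map { exists $terms{$_} ? $terms{$_} : $_ } @tags;
}

my @out = translate(split /:/, 'subst:pl:dat:f');
print join('.', @out), "\n";    # prints "n.pl.dat.f" (not yet reordered)
```

Each element is looked up exactly once, so `subst` becoming `n` is not translated a second time into `nt`.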
>     if ($#out > 3) {
>         my $type = $out[3];
>         $out[3] = $out[2];
>         $out[2] = $out[1];
>         $out[1] = $type;
What happens to your $out[0]? Is it just supposed to be ignored? In any case, you could just use a list slice instead of all the manual swapping:
@out[1 .. 3] = @out[3, 1, 2];
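To see the slice assignment in isolation, here is a small sketch on a sample tag. The helper name `reorder()` and the guard for short lists are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# reorder() is a hypothetical helper name used only for this sketch.
sub reorder {
    my @tags = @_;
    return @tags if @tags < 4;    # assumption: leave short lists alone

    # The right-hand slice is evaluated before anything is assigned,
    # so no temporary variable is needed for the swap.
    @tags[1 .. 3] = @tags[3, 1, 2];
    return @tags;
}

print join('.', reorder(qw(n pl dat f))), "\n";    # prints "n.f.pl.dat"
```

Elements outside the slice (index 0, and anything past index 3) stay where they are.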
However, I strongly suspect that the 'map' statement is the source of your problems.
>     }
>     print join '.', @out;
> }
> ''
>
> That's broken, because it only works for tag sets which have more than
> 4 entries, but changing the if to "($#out >= 3)" gives me this:
> ".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
The best thing you could do to help me help you is by providing a bunch of example inputs and expected outputs. It sounds like it should be trivially simple to mung this stuff; this is the kind of thing that Perl is really good at.
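The odd ".sg.nomxxs.m3" output is consistent with the missing chomp: the newline stays attached to the last field ("m3\n"), and the reordering then moves it into the middle of the record. A minimal sketch that reproduces the effect, assuming two identical input lines; `join_tags()` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# join_tags() is a hypothetical helper: it reorders a 4+ field tag and joins
# the fields with dots, chomping first only when asked to.
sub join_tags {
    my ($line, $do_chomp) = @_;
    chomp $line if $do_chomp;
    my @out = split /:/, $line;
    @out[1 .. 3] = @out[3, 1, 2] if @out >= 4;
    return join '.', @out;
}

my @lines = ("xxs:sg:nom:m3\n", "xxs:sg:nom:m3\n");

# Without chomp, the newline travels inside "m3\n" and splits each record,
# so the tail of one record runs into the head of the next.
print join_tags($_, 0) for @lines;
print "\n---\n";

# With chomp, each record stays on one line and we add the newline ourselves.
print join_tags($_, 1), "\n" for @lines;
```

The first loop prints "xxs.m3", then ".sg.nomxxs.m3", then ".sg.nom"; the second prints "xxs.m3.sg.nom" twice.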
> Next, I have a list of names extracted from a Polish morphology
> dictionary[1] that I'm trying to convert to a list of word stems and
> endings. I have this, which works (aside from a couple of errors):
>
> ``
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> use String::Diff qw/diff_fully/;
> use Data::Dumper;
>
> #test();
> while(<>)
> {
>     s/,\W+$//;
>     my $endings = $_;
>     my @a = split/, /;
>     my $stem = find_stem(@a);
>     $endings =~ s/$stem//g;
>     print $stem;
>     if ($endings =~ /??/) {print ":n.f:";}
>     elsif ($endings =~ /owie/) {print ":n.m1:";}
>     else {print ":n.??:";}
>     print $endings . "\n";
> }
>
> sub test()
> {
>     my $test = "Adam, Adama, Adaemie, Adamowi, Adamem, Adamach, Adamami, Adamom";
>     my @t = split/, /, $test;
>     print find_stem(@t);
>     print "\n";
> }
>
> sub find_stem()
> {
>     my @in = @_;
>     my ($r, $l, $cur, $last);
>     my $i=0;
>
>     while ($i<($#in))
>     {
>         ($r, $l) = diff_fully($in[$i], $in[$i+1]);
>
>         $cur = $r->[0]->[1];
>         $last = $cur if (!$last);
>         if ($cur ne $last) {
>             ($r, $l) = diff_fully($last, $cur);
>             $last = $r->[0]->[1];
>         }
>         $i++;
>     }
>     return $last;
> }
> ''
>
> but if I change the end of the while() to this:
>
> ``
>     else {print ":n.??:";}
>     my @ends = split/, /, $endings;
>     sort(@ends);
>     $endings = join(',', @ends);
>     print $endings . "\n";
> ''
>
> to sort the endings, it... doesn't. What am I missing?
Like most Perl functions, "sort" doesn't modify the specified variable - it just returns a modified value. Your 'use warnings;' line should definitely have generated a warning about that.
ben@Tyr:/tmp$ perl -wle'@a = qw/3 2 1/; sort @a; print "@a"'
Useless use of sort in void context at -e line 1.
3 2 1
You can try this instead:
else {
    print ":n.??:";
}

# If you're only going to use a variable once, don't bother.
print join(',', sort split/, /, $endings), "\n";
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Tue, Oct 09, 2007 at 04:55:55PM -0400, Benjamin Okopnik wrote:
> Something like this, maybe? Again, I'm just guessing.
>
> ``
> perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ''
Arrgh. Must be the welding fumes getting to me. That should be
perl -F: -walne'shift @F; print "n.@F[2,0,1]"' input_file
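For readers unfamiliar with the switches: -n wraps the code in a read loop, -l strips the line ending on input and adds one on output, and -a with -F: splits each line on colons into @F. The sketch below spells out roughly what happens to one line. One thing to watch is that an interpolated array slice is joined with `$"`, which defaults to a space, so the one-liner as written would likely print "n.f pl dat"; setting `$"` to a dot is an assumption added here to get dotted output. `autosplit_demo()` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# autosplit_demo() is a hypothetical stand-in for what -n, -l, -a and -F:
# do to a single input line before the one-liner's code runs.
sub autosplit_demo {
    my ($line) = @_;
    chomp $line;                 # -l removes the line ending on input
    my @F = split /:/, $line;    # -a with -F: autosplits the line into @F
    shift @F;                    # drop the leading part-of-speech tag
    local $" = '.';              # separator used when a list is interpolated
    return "n.@F[2,0,1]";
}

print autosplit_demo("subst:pl:dat:f\n"), "\n";    # prints "n.f.pl.dat"
```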
Jimmy O'Regan [joregan at gmail.com]
On 09/10/2007, Ben Okopnik <ben at linuxgazette.net> wrote:
> On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> > I have a couple of scripts that almost work, and I was wondering if
> > anyone (Ben?) could tell me why...
>
> I'll give it a shot. The only problem is, your script is doing more than
> what you describe here - so I'm going to have to guess about a few
> things. Worse yet, since your code isn't doing what it's supposed to do,
> I'm guessing based on wrong data. But hey, for a friend...
>
Oh, OK. Input:
subst:pl:acc:f
subst:pl:acc:m1
subst:pl:acc:m2
subst:pl:acc:m3
subst:pl:acc:n
subst:pl:dat:f
subst:pl:dat:m1
subst:pl:dat:m2
adj:sg:nom:n:comp
adj:sg:nom:n:pos
adj:sg:nom:n:sup
adj:sg:voc:f:pos
adj:sg:voc:m1:comp
expected output:
n.f.pl.acc
n.m1.pl.acc
n.m2.pl.acc
n.m3.pl.acc
n.nt.pl.acc
n.f.pl.dat
n.m1.pl.dat
n.m2.pl.dat
adj.nt.sg.nom.comp
adj.nt.sg.nom.pos
adj.nt.sg.nom.sup
adj.f.sg.voc.pos
adj.m1.sg.voc.comp

> > First, I want to convert a list of tags in the IPA PAN's corpus format
> > (subst:pl:dat:f) to Apertium's tag format (n.f.pl.dat). I have this:
>
> Something like this, maybe? Again, I'm just guessing.
>
> ``
> perl _F: -wlne'shift @F; print "n.@F[2,0,1]"' input_file
> ''
>

Kind of. Everything outside fields 1..3 passes through in place, though it may still be translated via that hash.
> > ``
> > #!/usr/bin/perl
> >
> > use warnings;
> > use strict;
> >
> > # tags to replace
> > my %terms = qw(n nt pri p1 sec p2 ter p3 subst n);
>
> Just a personal reaction here - BLECH. I hate having to count
> terms to figure out what's a key and what's a value.
>
Oh... yeah. I can see a few more terms that'll have to be swapped, and your way is definitely less confusing.
> ``
> my %terms = ( n     => 'nt',
>               pri   => 'p1',
>               sec   => 'p2',
>               ter   => 'p3',
>               subst => 'n'
>             );
> ''
>
> > while (<>)
> > {
>
> chomp; # If you don't handle that "\n", you'll be sorry...
>
I had one in there at one stage; I don't remember why I took it out.
> >     my @in = split/:/;
> >     my @out = map { ($terms{$_} ne "") ? $terms{$_} : $_ } @in;
>
> What happens when $terms{$_} is undefined? Bad news, that's what. I
> suspect that this is where your errors are coming from - perhaps with
> help from the absence of that 'chomp'.
>
> >     if ($#out > 3) {
> >         my $type = $out[3];
> >         $out[3] = $out[2];
> >         $out[2] = $out[1];
> >         $out[1] = $type;
>
> What happens to your $out[0]? Is it just supposed to be ignored?
> In any case, you could just use a list slice instead of all the manual
> swapping:
>
It's ignored because it's in the right place.
> ``
> @out[1 .. 3] = @out[3, 1, 2];
> ''
>
> However, I strongly suspect that the 'map' statement is the source of
> your problems.
>
Yeah... using that was kind of wishful thinking, because I don't really get it. (Yet!)
> >     }
> >     print join '.', @out;
> > }
> > ''
> >
> > That's broken, because it only works for tag sets which have more than
> > 4 entries, but changing the if to "($#out >= 3)" gives me this:
> > ".sg.nomxxs.m3" from "xxs:sg:nom:m3". I also get a lot of warnings:
>
> The best thing you could do to help me help you is by providing a bunch
> of example inputs and expected outputs. It sounds like it should be
> trivially simple to mung this stuff; this is the kind of thing that Perl
> is really good at.
>
> > ``
> >     else {print ":n.??:";}
> >     my @ends = split/, /, $endings;
> >     sort(@ends);
> >     $endings = join(',', @ends);
> >     print $endings . "\n";
> > ''
> >
> > to sort the endings, it... doesn't. What am I missing?
>
> Like most Perl functions, "sort" doesn't modify the specified variable -
> it just returns a modified value. Your 'use warnings;' line should
> definitely have generated a warning about that.
>
Ah. So it does.
> ``
> ben@Tyr:/tmp$ perl -wle'@a = qw/3 2 1/; sort @a; print "@a"'
> Useless use of sort in void context at -e line 1.
> 3 2 1
> ''
>
> You can try this instead:
>
> ``
> else {
>     print ":n.??:";
> }
>
> # If you're only going to use a variable once, don't bother.
> print join(',', sort split/, /, $endings), "\n";
> ''
Oh, cool. Thanks.
Ben Okopnik [ben at linuxgazette.net]
On Wed, Oct 10, 2007 at 12:16:10AM +0100, Jimmy O'Regan wrote:
> On 09/10/2007, Ben Okopnik <ben at linuxgazette.net> wrote:
> > On Tue, Oct 09, 2007 at 07:20:41PM +0100, Jimmy O'Regan wrote:
> > > I have a couple of scripts that almost work, and I was wondering if
> > > anyone (Ben?) could tell me why...
> >
> > I'll give it a shot. The only problem is, your script is doing more than
> > what you describe here - so I'm going to have to guess about a few
> > things. Worse yet, since your code isn't doing what it's supposed to do,
> > I'm guessing based on wrong data. But hey, for a friend...
> >
>
> Oh, OK. Input:
>
> ``
> subst:pl:acc:f
> subst:pl:acc:m1
> subst:pl:acc:m2
> subst:pl:acc:m3
> subst:pl:acc:n
> subst:pl:dat:f
> subst:pl:dat:m1
> subst:pl:dat:m2
> adj:sg:nom:n:comp
> adj:sg:nom:n:pos
> adj:sg:nom:n:sup
> adj:sg:voc:f:pos
> adj:sg:voc:m1:comp
> ''
>
> expected output:
>
> ``
> n.f.pl.acc
> n.m1.pl.acc
> n.m2.pl.acc
> n.m3.pl.acc
> n.nt.pl.acc
> n.f.pl.dat
> n.m1.pl.dat
> n.m2.pl.dat
> adj.nt.sg.nom.comp
> adj.nt.sg.nom.pos
> adj.nt.sg.nom.sup
> adj.f.sg.voc.pos
> adj.m1.sg.voc.comp
> ''
Ah. OK, got it.
#!/usr/bin/perl -w
# Created by Ben Okopnik on Wed Oct 10 09:21:10 EDT 2007

%repl = (
    subst => 'n',
    n     => 'nt',
);

while (<>){
    next unless /:/;
    chomp;
    my @in = split /:/;

    if (@in > 4){
        @in = @in[0, 3, 1, 2, 4];
        $in[1] =~ s/(.*)/$repl{$1}||$1/e;
    }
    else {
        @in = @in[0, 3, 1, 2];
        $in[0] =~ s/(.*)/$repl{$1}||$1/e;
        $in[1] =~ s/(.*)/$repl{$1}||$1/e;
    }
    print join(".", @in), "\n";
}
It's a little clunky, but... I've got a lot on my mind this morning.
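One possible way to fold the two branches of the script above into a single path is to translate the part of speech and the gender before reordering, then let a range pick up any trailing fields. This is only a sketch under stated assumptions: `convert_tag()` is a hypothetical name, and passing tags with fewer than four fields through untouched is a guess about the desired behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %repl = ( subst => 'n', n => 'nt' );

# convert_tag() is a hypothetical helper name used only for this sketch.
sub convert_tag {
    my ($tag) = @_;
    my @in = split /:/, $tag;
    return $tag if @in < 4;    # assumption: short tags pass through untouched

    # Translate the part of speech (field 0) and the gender (field 3).
    # $_ aliases the slice elements, so the assignment updates @in itself.
    for (@in[0, 3]) {
        $_ = $repl{$_} if exists $repl{$_};
    }

    # Move the gender up to second place; 4 .. $#in is empty for 4 fields.
    @in = @in[0, 3, 1, 2, 4 .. $#in];
    return join '.', @in;
}

print convert_tag($_), "\n" for qw(subst:pl:acc:n adj:sg:voc:m1:comp);
# prints "n.nt.pl.acc" and then "adj.m1.sg.voc.comp"
```

To use it as a filter, a loop such as `while (<>) { chomp; print convert_tag($_), "\n" }` would do.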
> > However, I strongly suspect that the 'map' statement is the source of
> > your problems.
>
> Yeah... using that was kind of wishful thinking, because I don't
> really get it. (Yet!)
[grin] You're not the only one. 'map' can get a little complex - especially since, based on its semantics, it can modify a list "in place" - or return a modified list, leaving the original alone.
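The two behaviours can be seen side by side in a small sketch. Inside the block, `$_` is an alias for the current element: returning a value leaves the original list alone, while assigning to `$_` writes through to it. Both helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both helper names below are hypothetical, for illustration only.

# Returns a translated copy; the caller's list is left alone.
sub renamed_copy {
    return map { $_ eq 'subst' ? 'n' : $_ } @_;
}

# Assigning to $_ inside map changes the original elements, because $_ is
# an alias for each element rather than a copy of it.
sub rename_in_place {
    my ($tags) = @_;    # array reference, so the caller's array is reached
    my @ignored = map { $_ = 'n' if $_ eq 'subst'; $_ } @$tags;
    return;
}

my @tags = qw(subst pl dat);
my @copy = renamed_copy(@tags);
print "@tags | @copy\n";    # prints "subst pl dat | n pl dat"

rename_in_place(\@tags);
print "@tags\n";            # prints "n pl dat"
```

When the goal is only the side effect, a plain `for` loop is usually the clearer choice than `map`.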