...making Linux just a little more fun!
[ In reference to "/okopnik.html" in LG#issue84 ]
clarjon1 [clarjon1 at gmail.com]
Hey, all. I've recently downloaded a bunch of HTML files, and wanted to name them by their title. I remembered the scripts in the Perl One-Liner of the Month: The Adventure of the Misnamed Files (LG 84), and thought that they would be useful, as they seemed to be what I needed.
I first tried the one-liner, and instead of the zero, I got 258 (which, btw, is the number of files in the directory I was in). So I copied the "expanded" version of the script, and saved it as ../script1.pl. Ran it, and it came up with 0 as output (which, according to the story, is a Good Thing), so I then tried the second one-liner. The laptop thought for a second, then gave me a command line again. So I ran ls, and lo and behold! No changes. Tried it with the expanded version, saved as ../script2.pl. Same result.
I was wondering if you might know of an updated version that I could try to use? I'm not well enough versed in Perl to figure it out all on my own, and I'm not sure what (or where) I should be looking...
Also, for reference, the files are all html, and about half named with an html extension, and the other half have no extension.
Thanks in advance. I'd hate to have to do the task manually.
-- clarjon1
Ben Okopnik [ben at linuxgazette.net]
On Thu, Jan 24, 2008 at 12:50:14PM -0500, clarjon1 wrote:
> Hey, all.
> I've recently downloaded a bunch of HTML files, and wanted to name
> them by their title. I remembered the scripts in the Perl One-Liner
> of the Month: The Adventure of the Misnamed Files (LG 84), and thought
> that they would be useful, as they seemed to be what I needed.
Whoops. There's a reason for the warning that I've put at the top of a lot of those articles: one-liners should not be used for production code; they're just a way to learn to read code, and maybe have a little fun along the way.
However, in looking at the code in that particular article, I'd say that it should indeed work; there's nothing particularly edgy in it, and no deprecated features.
> I first tried the one-liner, and instead of the zero, I got 258
> (which, btw, is the number of files in the directory I was in)
Since that one-liner simply counts the number of files and subtracts those that have titles, this would imply that none of your HTML files had titles - at least not in the layout that the script presumes, which is
``
<title>Title Goes Here</title>
''

You may have something like this, instead:

``
<title>
Title Goes Here
</title>
''

which is fine for HTML purposes but won't work with that one-liner unless it's modified a bit. Something like this should work:
``
perl -0 -wne'END{print"$n\n"}eof&&$n++;/<title>\s*[^<]+/i&&$n--' *
''
> So I
> copied the "expanded" version of the script, and saved it as
> ../script1.pl. Ran it, it came up with 0 as output (which, according
> to the story, is a Good Thing), so I then tried the second one liner.
> Laptop thought for a second, then gave me a command line again. So, I
> run ls, and lo and behold! No changes. Tried it with the expanded
> version, saved as ../script2.pl. Same result.
Same problem, I would think.
``
perl -0 -wne'/<title>\s*([^<\n]+)\s*/i&&rename$ARGV,"$1.html"' *
''

The script you'd want to do this kind of job on a regular basis would do both, and look something like this:
```
#!/usr/bin/perl -w
# Created by Ben Okopnik on Thu Jan 24 13:39:27 EST 2008

die "Usage: ", $0 =~ /([^\/]+)$/, " <html_file[s]_to_rename>\n" unless @ARGV;

for (@ARGV){
    open F, $_ or die "$_: $!\n";
    {
        # Localize $/ so the whole file is slurped in one read
        local $/;
        ($title) = <F> =~ /<title>\s*([^<\n]+)/i;
    }
    close F;

    die "$_ does not have a title; aborting\n"
        unless $title;
    # Check against the actual rename target, ".html" extension included
    die "'$title.html' already exists in this directory; aborting\n"
        if -f "$title.html";

    rename $_, "$title.html";
}
'''
> Also, for reference, the files are all html, and about half named with
> an html extension, and the other half have no extension.
The above should take care of all that, since it renames the files to whatever is in their titles plus an HTML extension.
> Thanks in advance. I'd hate to have to do the task manually.
I figure that it's a useful gadget, especially for people who like to keep archives of interesting pages they've found on the Net. It can be a little tough to look things up in those archives if you don't know which of several hundred files is relevant. Enjoy!
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
clarjon1 [clarjon1 at gmail.com]
On Jan 24, 2008 1:51 PM, Ben Okopnik <[email protected]> wrote:
> On Thu, Jan 24, 2008 at 12:50:14PM -0500, clarjon1 wrote:
> > Hey, all.
> > I've recently downloaded a bunch of HTML files, and wanted to name
> > them by their title. I remembered the scripts in the Perl One-Liner
> > of the Month: The Adventure of the Misnamed Files (LG 84), and thought
> > that they would be useful, as they seemed to be what I needed.
>
> Whoops. There's a reason for the warning that I've put at the top of a
> lot of those articles: one-liners should not be used for production
> code; they're just a way to learn to read code, and maybe have a
> little fun along the way.
>
/me nods; he knows all about that.
> However, in looking at the code in that particular article, I'd say that
> it should indeed work; there's nothing particularly edgy in it, and no
> deprecated features.
>
> > I first tried the one-liner, and instead of the zero, I got 258
> > (which, btw, is the number of files in the directory I was in)
>
> Since that one liner simply counts the number of files and subtracts
> those that have titles, this would imply that none of your HTML files
> had titles - at least not in the layout that the script presumes, which
> is
>
> ``
> <title>Title Goes Here</title>
> ''
>
> You may have something like this, instead:
>
> ``
> <title>
> Title Goes Here
> </title>
> ''
>
You're close. I looked; it's actually:
---
<title>
Title Goes Here</title>
---

But I digress...
> which is fine for HTML purposes but won't work with that one-liner
> unless it's modified a bit. Something like this should work:
>
> ``
> perl -0 -wne'END{print"$n\n"}eof&&$n++;/<title>\s*[^<]+/i&&$n--' *
> ''
>
Well, then, I guess it's a good thing Woomert was lucky enough to run into a bunch of mean files which follow styling standards.
> > So I
> > copied the "expanded" version of the script, and saved it as
> > ../script1.pl. Ran it, it came up with 0 as output (which, according
> > to the story, is a Good Thing), so I then tried the second one liner.
> > Laptop thought for a second, then gave me a command line again. So, I
> > run ls, and lo and behold! No changes. Tried it with the expanded
> > version, saved as ../script2.pl. Same result.
>
> Same problem, I would think.
>
> ``
> perl -0 -wne'/<title>\s*([^<\n]+)\s*/i&&rename$ARGV,"$1.html"' *
> ''
>
> The script you'd want to do this kind of job on a regular basis would do
> both, and look something like this:
>
> ```
> #!/usr/bin/perl -w
> # Created by Ben Okopnik on Thu Jan 24 13:39:27 EST 2008
>
> die "Usage: ", $0 =~ /([^\/]+)$/, " <html_file[s]_to_rename>\n" unless @ARGV;
>
> for (@ARGV){
>     open F, $_ or die "$_: $!\n";
>     {
>         local $/;
>         ($title) = <F> =~ /<title>\s*([^<\n]+)/i;
>     }
>     close F;
>
>     die "$_ does not have a title; aborting\n"
>         unless $title;
>     die "'$title' already exists in this directory; aborting\n"
>         if -f $title;
>
>     rename $_, "$title.html";
> }
> '''
>
> > Also, for reference, the files are all html, and about half named with
> > an html extension, and the other half have no extension.
>
Oooh, nice. Thanks. *clicky click* It works :D Thanks!
> The above should take care of all that, since it renames the files to
> whatever is in their titles plus an HTML extension.
>
> > Thanks in advance. I'd hate to have to do the task manually.
>
> I figure that it's a useful gadget, especially for people who like to
> keep archives of interesting pages they've found on the Net. It can be a
> little tough to look things up in those archives if you don't know which
> of several hundred files is relevant. Enjoy!
>
Desktop searching apps like Beagle come in handy for that, but the files hadn't been indexed; besides, it's much easier now to see what the pages are before I open them. Oh, and don't worry, I will enjoy.