...making Linux just a little more fun!
[ In reference to "Preventing Domain Expiration" in LG#142 ]
s. keeling [keeling at nucleus.com]
Entertaining and informative article, Rick, as always. I hope I can consider some of my recent floundering with whois and linuxmafia.{com,net} as partial inspiration.
However, getting down to the picture at the .sig block, shouldn't you be doing more long distance bicycling? I always pictured you as one of those ca. 150 lb., wiry joggers gorging yourself on tofu. Instead, the picture shows you're either a sumo wrestler, a linebacker (a geek linebacker?!?), or you need to avoid burger joints more often. Oh, and that perl book behind you is pink. Doesn't that need an upgrade?
My perl books are pink too, and they suit me just fine (damnit). Sorry for the crack about the picture. I'd just prefer that folks like you outlive me (I'm selfish that way).
P.S. Ben, looks like we need to fiddle with pinehelper.pl again. It's ignoring the bit in parens in the subject line, making "Subject: Talkback". Crap.
My version of your pinehelper.pl is attached. It's the pinehelper.pl called by FF 2.0.0.4 (Debian stable/Etch Iceweasel), FWIW.
I've also just installed flashplayer-mozilla, which appears to handle swf better than swf-player, so maybe I can actually read the cartoons this time.
http://linuxgazette.net/143/misc/lg/pinehelper
-- Any technology distinguishable from magic is insufficiently advanced. (*) - -
Rick Moen [rick at linuxmafia.com]
Quoting s. keeling ([email protected]):
> Entertaining and informative article, Rick, as always. I hope I can
> consider some of my recent floundering with whois and
> linuxmafia.{com,net} as partial inspiration.
You're more than welcome. The immediate inspiration, though, was a fan-run science fiction convention of my acquaintance losing (and then having to ransom back) its Internet domain.
> However, getting down to the picture at the .sig block, shouldn't you
> be doing more long distance bicycling?
Oddly enough, I am indeed intending to climb up over the Santa Cruz Mountains tomorrow, for Labor Day -- maybe over to Big Basin State Park. In any event:
> I always pictured you as one of those ca. 150 lb., wiry joggers
> gorging yourself on tofu. Instead, the picture shows you're either a
> sumo wrestler, a linebacker (a geek linebacker?!?), or you need to
> avoid burger joints more often.
It's not a really flattering photo, and doesn't reveal that I'm just shy of 6' with a 34"-36" waist (fluctuates), weighing probably -- oh -- 200 lb. (I haven't actually weighed myself since the days when jokes about Generalissimo Francisco Franco were still considered funny.) I'm no longer wiry, and have never been guilty of jogging (cross-country running was always more the ticket), but am at least still considered an ectomorph and blessed with absurdly excellent health. But thanks for caring.
Perhaps I should also GIMP up the logo on that t-shirt a bit, because some people report reading it as "BayUSA", whereas it's actually "BayLISA", the local sysadmin group.
> Oh, and that perl book behind you is pink. Doesn't that need an upgrade?
It does -- but I'm also notoriously cheap, and so borrow Deirdre's copy. ;->
Rick Moen [rick at linuxmafia.com]
I wrote:
> Quoting s. keeling ([email protected]):
> > Entertaining and informative article, Rick, as always. I hope I can
> > consider some of my recent floundering with whois and
> > linuxmafia.{com,net} as partial inspiration.
>
> You're more than welcome. The immediate inspiration, though, was a
> fan-run science fiction convention of my acquaintance losing (and then
> having to ransom back) its Internet domain.
Actually, I just remembered your e-mail query from last June, when you were having some odd problems getting useful WHOIS data regarding my linuxmafia.com domain -- which turned out to be that you were querying the WHOIS server at ARIN, which is actually the official Internet registry in/around North America for numbers (such as IP addresses) rather than names.
I was a little groggy when you'd asked that earlier query, because I'd just gotten off a flight back from Turkey via a short stop in London. (What was really bizarre was taking the London Underground train from Heathrow Airport into central London, so I could walk over to my old stomping grounds in Southwark, walk up the stairs into Piccadilly Circus -- and immediately have my mobile ring: "Hello, this is Mike. Is the Linux user group meeting today?" "Um, possibly, but I'm a third of the way around the planet right now. Might I call back?")
Anyhow, yes, as it turned out, your WHOIS question about linuxmafia.com was indeed a motivator for that article, because it made me curious about how /usr/bin/whois knows which WHOIS server to query for particular data, and why it sometimes goes very badly wrong.
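For the record, you can take the guesswork out of it by naming the server yourself; a couple of sketch invocations (the registry hostnames are the usual public ones, and the IP address is just a documentation-range placeholder):

# Ask the .com registry's WHOIS server directly, instead of letting
# /usr/bin/whois guess which one to use:
whois -h whois.verisign-grs.com linuxmafia.com

# ARIN is authoritative for number resources (IP addresses, AS numbers),
# not for domain names:
whois -h whois.arin.net 192.0.2.1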
Ben Okopnik [ben at linuxgazette.net]
On Sun, Sep 02, 2007 at 05:17:42PM -0600, s. keeling wrote:
>
> P.S. Ben, looks like we need to fiddle with pinehelper.pl again. It's
> ignoring the bit in parens in the subject line, making "Subject:
> Talkback". Crap.
Ah, my sins come back to haunt me. I knew I shouldn't have been splitting on that colon; the damn things can occur in the text, too. Also - if you modify my scripts without having a sufficient level of Perl-Fu, things will break. Anyway... for anyone tuning in late: this relates back to the discussion that we had on using Mutt as a mail client for Firefox. The process goes something like this:
1) Save the following script somewhere reasonable, e.g. '/usr/local/bin/mutthelper'.
2) Type 'about:config' in the Firefox URL bar.
3) Either set or create the entry for 'network.protocol-handler.app.mailto' with a value of '/usr/local/bin/mutthelper' (or wherever you save it.)
#!/usr/bin/perl -w
# Created by Ben Okopnik on Sun Nov 13 17:24:14 EST 2005
($str = shift) =~ s/^(mailto):/$1=/;

for (split /[?&]/, $str){
    ( $k, $v ) = split /=/;
    # Low-rent entity conversion
    ( $header{ $k } = $v ) =~ s/%(..)/pack("H2",$1)/eg;
}

# Define appropriate switches for a given mail client (e.g., 'mutt')
$opts = qq[ -s "$header{subject}" ] if exists $header{subject};
$opts .= $header{mailto};
exec "/usr/bin/xterm -T Mutt -e /usr/bin/mutt $opts";

May it serve you well all the days of your life.
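Firefox hands the script the entire 'mailto:' URL as its first argument, so you can also exercise it by hand from a shell. A made-up test invocation (the address and subject here are invented, purely for illustration):

/usr/local/bin/mutthelper 'mailto:[email protected]?subject=Talkback%3A%20Domain%20Expiration'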
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
s. keeling [keeling at nucleus.com]
Incoming from Ben Okopnik:
> On Sun, Sep 02, 2007 at 05:17:42PM -0600, s. keeling wrote:
> >
> > P.S. Ben, looks like we need to fiddle with pinehelper.pl again. It's
> > ignoring the bit in parens in the subject line, making "Subject:
> > Talkback". Crap.
>
> Ah, my sins come back to haunt me. I knew I shouldn't have been
> splitting on that colon; the damn things can occur in the text, too.
> Also - if you modify my scripts without having a sufficient level of
> Perl-Fu, things will break. Anyway... for anyone tuning in late: this
Well, one person's "sufficient level of Perl-fu" is another person's "why the heck did they do it that way?!?" For instance, at the moment, I'm rebuilding a client's grotty old Korn shell script, and the other day I found this in it:
grep "^.*$foo.*" ...

Anchored at the beginning of the line, followed by any number of any characters, test for the existence of $foo, followed by any number of any characters. Well, why not just grep for $foo? Is this actually doing something magical that for some reason I can't see, or is it just silly?
Sometimes, it's hard to tell. That got past at least three very capable big iron *nix guys before I tripped over it.
Thanks for your help bringing pinehelper/mutthelper back.
-- Any technology distinguishable from magic is insufficiently advanced. (*) - -
Ben Okopnik [ben at linuxgazette.net]
On Mon, Sep 03, 2007 at 07:52:09AM -0600, s. keeling wrote:
> Incoming from Ben Okopnik:
> > On Sun, Sep 02, 2007 at 05:17:42PM -0600, s. keeling wrote:
> > >
> > > P.S. Ben, looks like we need to fiddle with pinehelper.pl again. It's
> > > ignoring the bit in parens in the subject line, making "Subject:
> > > Talkback". Crap.
> >
> > Ah, my sins come back to haunt me. I knew I shouldn't have been
> > splitting on that colon; the damn things can occur in the text, too.
> > Also - if you modify my scripts without having a sufficient level of
> > Perl-Fu, things will break. Anyway... for anyone tuning in late: this
>
> Well, one person's "sufficient level of Perl-fu" is another person's
> "why the heck did they do it that way?!?"
Only when you're critiquing someone else's script - not when you're actually changing it. In the latter case, you really do need to understand what they did - and all the peripheral results of it - or rewrite that entire script, since you don't know what else may be affected.
perl -le'sub x{local $"="+"; print eval "@_"}; split //, 1234; &x'
10

"Oh, look at that idiot: he's using '&' in front of the subroutine call. Doesn't he know that's long deprecated?"
perl -le'sub x{local $"="+"; print eval "@_"}; split //, 123; x()'

"Hey, what happened to the output???"
> For instance, at the
> moment, I'm rebuilding a client's grotty old Korn shell script, and
> the other day I found this in it:
>
> grep "^.*$foo.*" ...
>
> Anchored at the beginning of the line, followed by any number of any
> characters, test for the existence of $foo, followed by any number of
> any characters. Well, why not just grep for $foo? Is this actually doing
> something magical that for some reason I can't see, or is it just
> silly?
Neither - it's like dead yeast in beer. No benefit, but no particular negative effect, either. If it were "^.+$foo.+", that would be a completely different deal - but '.*'s are essentially meaningless in this case.
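A quick way to see the difference, with a made-up sample string:

$ echo 'foo at the start' | grep -cE '^.*foo.*'    # '.*' may match nothing - still a hit
1
$ echo 'foo at the start' | grep -cE '^.+foo.+'    # '.+' demands a character on each side - no hit
0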
> Sometimes, it's hard to tell. That got past at least three very
> capable big iron *nix guys before I tripped over it.
I could see just leaving it alone, myself. I can't see any scenario in which it would do harm.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Neil Youngman [ny at youngman.org.uk]
On or around Monday 03 September 2007 18:20, Ben Okopnik reorganised a bunch of electrons to form the message:
> On Mon, Sep 03, 2007 at 07:52:09AM -0600, s. keeling wrote:
> > For instance, at the
> > moment, I'm rebuilding a client's grotty old Korn shell script, and
> > the other day I found this in it:
> >
> > grep "^.*$foo.*" ...
> >
> > Anchored at the beginning of the line, followed by any number of any
> > characters, test for the existence of $foo, followed by any number of
> > any characters. Well, why not just grep for $foo? Is this actually
> > doing something magical that for some reason I can't see, or is it
> > just silly?
>
> Neither - it's like dead yeast in beer. No benefit, but no particular
> negative effect, either. If it were "^.+$foo.+", that would be a
> completely different deal - but '.*'s are essentially meaningless in
> this case.
Semantically that's true, however ...
> > Sometimes, it's hard to tell. That got past at least three very
> > capable big iron *nix guys before I tripped over it.
>
> I could see just leaving it alone, myself. I can't see any scenario in
> which it would do harm.
... that's the sort of thing which could have an impact on performance if run often enough on large enough strings. I went to a course on Exim, which included a very enlightening seminar on regex performance. This isn't particularly bad, but it is inefficient.
I would expect the handling of the above to work along the following lines:
1. Match with ^ at the start of the line; no problem.
2. The first .* matches all the way to the end of the line.
3. $foo doesn't match anything, so roll back one character and try again, until either $foo matches or you can't roll back any further.
4. If you have a match, the final .* matches to the end of the line; otherwise the match failed.
All these rollbacks are unnecessary and can soak up a serious amount of CPU with a badly written regex. A little testing (below) shows that it doesn't seem to be too bad in this case, but it could be expensive if repeated too often.
Pcretest shows that the simpler form takes 0.046 milliseconds on a longish, non-matching string, vs. 0.133 milliseconds for the version with the unnecessary complications. That's a factor of 3 difference in cost for zero benefit. More complex regexes could show an even higher cost.
$ pcretest -t
PCRE version 6.7 04-Jul-2006

  re> /^.*test string.*/
Compile time 0.003 milliseconds
data> A little over a month ago, Microsoft announced that it would submit one or more of its Shared Source licences to OSI for certification. That has generated a lot of controversy within the Open Source community, with some Open Source supporters welcoming the prospect of Microsoft approaching OSI but many others regarding any move by Microsoft with deep suspicion. Eric Raymond acknowledges that ongoing debate and allows as how it's been going on within OSI, too. However, as he writes, "OSI's official position, from the beginning, which I helped formulate and have expressed to any number of reporters and analysts, is that OSI will treat any licenses submitted [by] Microsoft strictly on their merits, without fear or favor. That remains OSI's position. But..." He goes on, "...Microsoft's behavior in the last few months with respect to OOXML has been egregious. They haven't stopped at pushing a "standard" that is divisive, technically bogus, and an obvious tool of monopoly lock-in; they have resorted to lying, ballot-stuffing, committee-packing, and outright bribery to ram it through the ISO standardization process in ways that violate ISO's own guidelines wholesale."
Execute time 0.133 milliseconds
No match
data>
  re> /test string/
Compile time 0.002 milliseconds
data> A little over a month ago, Microsoft announced that it would submit one or more of its Shared Source licences to OSI for certification. That has generated a lot of controversy within the Open Source community, with some Open Source supporters welcoming the prospect of Microsoft approaching OSI but many others regarding any move by Microsoft with deep suspicion. Eric Raymond acknowledges that ongoing debate and allows as how it's been going on within OSI, too. However, as he writes, "OSI's official position, from the beginning, which I helped formulate and have expressed to any number of reporters and analysts, is that OSI will treat any licenses submitted [by] Microsoft strictly on their merits, without fear or favor. That remains OSI's position. But..." He goes on, "...Microsoft's behavior in the last few months with respect to OOXML has been egregious. They haven't stopped at pushing a "standard" that is divisive, technically bogus, and an obvious tool of monopoly lock-in; they have resorted to lying, ballot-stuffing, committee-packing, and outright bribery to ram it through the ISO standardization process in ways that violate ISO's own guidelines wholesale."
Execute time 0.046 milliseconds
No match
data>

Neil Youngman
Ben Okopnik [ben at linuxgazette.net]
On Mon, Sep 03, 2007 at 09:34:21PM +0100, Neil Youngman wrote:
> On or around Monday 03 September 2007 18:20, Ben Okopnik reorganised a bunch
> of electrons to form the message:
> > On Mon, Sep 03, 2007 at 07:52:09AM -0600, s. keeling wrote:
[...]
> > > grep "^.*$foo.*" ...
[...]
> > > Well, why not just grep for $foo? Is this actually
> > > doing something magical that for some reason I can't see, or is it
> > > just silly?
> >
> > Neither - it's like dead yeast in beer. No benefit, but no particular
> > negative effect, either. If it were "^.+$foo.+", that would be a
> > completely different deal - but '.*'s are essentially meaningless in
> > this case.
>
> Semantically that's true, however ...
[...]
> ... that's the sort of thing which could have an impact on performance if run
> often enough on large enough strings.
Neil, I can't say that I've ever considered performance as a major goal when writing a shell script. If I did, I wouldn't be writing a shell script.
> I went to a course on Exim, which
> included a very enlightening seminar on regex performance. This isn't
> particularly bad, but it is inefficient.
I actually add a section on regex performance when I teach my Perl class, since Perl scripts are often used by, e.g., Web servers - where performance matters quite a lot. I also throw in, gratis, a rant about using '.*' to match things (it's way too broad and tends to grab far more than necessary; useful for consuming space, but not for accurate matching.) Part of the former is a section on backtracking - which is nicely covered in "perldoc perlre". My particular favorite is this bit:
It's important to realize that a regular expression is merely a set of assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve.
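A two-line illustration of that greediness, using invented sample text:

$ echo '<a> and <b>' | grep -oE '<.*>'     # greedy: grabs from the first '<' to the last '>'
<a> and <b>
$ echo '<a> and <b>' | grep -oE '<[^>]*>'  # bounded: each bracketed chunk matches separately
<a>
<b>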
> I would expect the handling of the above to work along the following lines:
>
> 1. Match with ^ at the start of the line; no problem.
> 2. The first .* matches all the way to the end of the line.
> 3. $foo doesn't match anything, so roll back one character and try again,
>    until either $foo matches or you can't roll back any further.
The Perl regex engine is a bit smarter than that, actually; several optimizations would kick in at this last step.
> 4. If you have a match, the final .* matches to the end of the line;
>    otherwise the match failed.
>
> All these rollbacks are unnecessary and can soak up a serious amount of CPU
> with a badly written regex. A little testing (below) shows that it doesn't
> seem to be too bad in this case, but it could be expensive if repeated too
> often.
>
> Pcretest shows that the simpler form takes 0.046 milliseconds on a longish,
> non-matching string, vs. 0.133 milliseconds for the version with the
> unnecessary complications. That's a factor of 3 difference in cost for zero
> benefit.
All of which get swamped in disk access, etc.:
ben@Tyr:/tmp$ time grep '^.*test string.*' foo

real    0m0.010s
user    0m0.008s
sys     0m0.000s
ben@Tyr:/tmp$ time grep 'test string' foo

real    0m0.010s
user    0m0.008s
sys     0m0.000s
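To see any difference at all, you'd have to swamp that constant overhead with a much bigger input; a rough sketch (the file name and line count are picked arbitrarily, and the timings are left as an exercise):

yes 'a long line that never contains the pattern' | head -n 500000 > /tmp/big
time grep -c '^.*test string.*' /tmp/big
time grep -c 'test string' /tmp/big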
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
s. keeling [keeling at nucleus.com]
Incoming from Ben Okopnik:
> On Mon, Sep 03, 2007 at 07:52:09AM -0600, s. keeling wrote:
> > Incoming from Ben Okopnik:
> > > Also - if you modify my scripts without having a sufficient level of
> > > Perl-Fu, things will break. Anyway... for anyone tuning in late: this
> >
> > Well, one person's "sufficient level of Perl-fu" is another person's
> > "why the heck did they do it that way?!?"
>
> [schnip]
>
> > grep "^.*$foo.*" ...
> >
> > any characters. Well, why not just grep for $foo? Is this actually doing
> > something magical that for some reason I can't see, or is it just
> > silly?
>
> Neither - it's like dead yeast in beer. No benefit, but no particular
<pontification> Well, the first thing I thought when I saw it was, "How much can I trust the last person to have known what they were doing?"
It goes to the question of reputation. How much of the rest of this thing was written by someone who apparently had only a vague understanding of regexps? How much can I trust the rest of this thing to not blow up somewhere down the line and have that failure point to me for not having caught and fixed such obvious bloopers? I don't want my name embedded in something which contains silliness like that. </pontification>
Thanks to Neil for the interesting pcre analysis (good to know), but no, this isn't a situation where that would apply. This app is a highly interactive menu program which Ops use (I'm "Midrange") to manage accounts and group memberships. It's not timing sensitive that way.
It's also since been stripped of all its "Useless Use of Cat":
cat /etc/passwd | grep $foo

(grep is quite capable of opening up files all by itself, thank you very much) and it's now using ksh test correctly. This (sh/bash):
if [ "${foo}x" = "${bar}x" ]; then ...

is (roughly) equivalent to (ksh):
if [[ $foo = $bar ]]; then ...

Here's an oddity I discovered. In ksh, this actually works as a valid test (not assignment of $bar to foo):
if [ $foo=$bar ]; then ...

Try that in C/C++. You can't do the same with double brackets. This:
if [[ $foo=$bar ]]; then ...

fails on a syntax error. ksh has some interesting features. For instance, "typeset var" creates a local variable named var.
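A side-by-side sketch of those three forms (the values are invented); it also shows why the single-bracket oddity "works" - the whole word 'abc=xyz' is one non-empty string, and test treats any non-empty string as true:

#!/bin/ksh
foo=abc; bar=xyz

[ "${foo}x" = "${bar}x" ] && echo "sh-style test says equal"          # no output: they differ
[[ $foo = $bar ]] && echo "ksh [[ ]] says equal"                      # no output: they differ
[ $foo=$bar ] && echo "always true: 'abc=xyz' is one non-empty word"  # prints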
I still much prefer perl, bash, and zsh, but ksh too is interesting as shell scripting languages go. Though I have looked into ksh before, and have owned *O'Really*'s "Learning the Korn shell" for a long time, this is the first time I've needed to use it professionally. You wizards, I imagine, think all of this is pretty elementary stuff, but the last time I worked on commercial Unix "shell scripting" (not perl) was using HP-UX's sh-posix (ca. 1996). Does that still exist?
There's a lot of mediocre code extant in the corporate world. Corps. are seldom interested in "doing it right." They're far more often interested in "getting it done" so you can move on and get to the next fire that needs fighting. I believe that's why @#$% like that regexp above happens. Not enough time allotted to learn to get it right. "Just get it done, damnit! Time's money." :-P
-- Any technology distinguishable from magic is insufficiently advanced. (*) - -
Ben Okopnik [ben at linuxgazette.net]
On Mon, Sep 03, 2007 at 04:17:02PM -0600, s. keeling wrote:
>
> <pontification>
> Well, the first thing I thought when I saw it was, "How much can I
> trust the last person to have known what they were doing?"
>
> It goes to the question of reputation. How much of the rest of this
> thing was written by someone who apparently had only a vague
> understanding of regexps? How much can I trust the rest of this thing
> to not blow up somewhere down the line and have that failure point to
> me for not having caught and fixed such obvious bloopers? I don't
> want my name embedded in something which contains silliness like that.
> </pontification>
That's an excellent perspective as well - and it's one I have whenever I undertake to fix someone else's code.
> cat /etc/passwd | grep $foo
Oh-oh. Classic bad idea, that - and it's not just "Useless Use of 'cat'", either.
ben@Tyr:/tmp$ foo="a b"
ben@Tyr:/tmp$ < /etc/passwd grep $foo
grep: b: No such file or directory

Always - yes, always - quote the argument. Particularly when it's a variable.
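In other words, the safe spelling of that line is simply:

grep "$foo" /etc/passwd    # quoted, the two-word value stays a single pattern argument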
> (grep is quite capable of opening up files all by itself, thank you
> very much) and it's now using ksh test correctly. This (sh/bash):
>
> if [ "${foo}x" = "${bar}x" ]; then ...
This is right...
> is (roughly) equivalent to (ksh):
>
> if [[ $foo = $bar ]]; then ...
...while this could cause a problem - randomly (whenever the var is undefined, or contains whitespace.)
ben@Tyr:/tmp$ [ $a = foo ] && echo 'Ooops!'
-bash: [: =: unary operator expected
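Quote the variable (or use keeling's '${foo}x' padding) and the same test just fails quietly instead of blowing up:

ben@Tyr:/tmp$ [ "$a" = foo ] && echo 'Ooops!'
ben@Tyr:/tmp$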
> Here's an oddity I discovered. In ksh, this actually works as a valid
> test (not assignment of $bar to foo):
>
> if [ $foo=$bar ]; then ...
Yep. Ksh being an sh-derived shell, it still follows the POSIX 1003.2 spec (i.e., does all the 'sh'-standard stuff). I generally avoid using the '[[ ]]' string-comparison syntax, since it's not portable; '[ ]' works everywhere.
> Try that in C/C++. You can't do the same with double brackets.
> This:
>
> if [[ $foo=$bar ]]; then ...
>
> fails on a syntax error.
ben@Tyr:/tmp$ ksh
[ben@Tyr:/tmp]$ foo=abc
[ben@Tyr:/tmp]$ bar=abc
[ben@Tyr:/tmp]$ [[ "$foo"="$bar" ]] && print 'It worked!'
It worked!
[ben@Tyr:/tmp]$

You may be running some version different from mine, though (I've got ksh93).
> There's a lot of mediocre code extant in the corporate world.
> Corps. are seldom interested in "doing it right." They're far more
> often interested in "getting it done" so you can move on and get to
> the next fire that needs fighting. I believe that's why @#$% like
> that regexp above happens. Not enough time allotted to learn to get
> it right. "Just get it done, damnit! Time's money." :-P
Heh. Too true.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
One reason to renew annually (but well in time!) is that if you take a 3-4 year lease you might not have this job on your annual TODO list --- then you forget! That, of course, is where the domain-check script will be useful.
Great article!
Kapil. --
Rick Moen [rick at linuxmafia.com]
Quoting Kapil Hari Paranjape ([email protected]):
> One reason to renew annually (but well in time!) is that if you
> take a 3-4 year lease you might not have this job on your annual
> TODO list --- then you forget! That, of course, is where the
> domain-check script will be useful.
Well, speaking of that, there's actually no reason why you can't renew (i.e., add an extra year to) a domain annually and always keep a fixed distance of about 3-4 years away from expiration. In fact, that sounds like a capital idea, to me!
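For the forgetful, a crude cron-able reminder along those lines - this is only a sketch, not the domain-check script from the article: WHOIS output formats vary wildly by registrar, so the 'expir' pattern and the addresses below are placeholders you'd have to adapt.

#!/bin/sh
# Nag me if the expiry year reported by WHOIS is less than ~3 years away.
DOMAIN=linuxmafia.com
EXPIRY_YEAR=$(whois "$DOMAIN" | grep -i 'expir' | grep -o '[0-9]\{4\}' | head -n 1)
THIS_YEAR=$(date +%Y)
if [ -n "$EXPIRY_YEAR" ] && [ $((EXPIRY_YEAR - THIS_YEAR)) -lt 3 ]; then
    echo "$DOMAIN expires in under 3 years -- time to add another year" \
        | mail -s "domain expiry check" you@example.com
fi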
--
Cheers,                      Errors have been made. Others will be blamed.
Rick Moen
[email protected]