...making Linux just a little more fun!
Kat Tanaka Okopnik [kat at linuxgazette.net]
[[[ The originating thread for this discussion is http://linuxgazette.net/147/misc/lg/transliterating_arabic.html -- Kat ]]]
On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote:
> Latin character set (ISO-8859-1 and such) to Russian, yes.
>
> Eh... I'll send this example, and hope the 8-bit stuff makes it through
> the mail.
> ```
> ben@Tyr:~$ tsl2utf8 -h
> Mappings:
>
> A|<90> B|<91> V|<92> G|<93> D|<94> E|<95> J|<96> Z|<97>
> I|<98> Y|<99> K|<9a> L|<9b> M|<9c> N|<9d> O|<9e> P|<9f>
> R|<a0> S|<a1> T|<a2> U|<a3> F|<a4> H|<a5> C|<a6> X|<a7>
> 1|<a8> 2|<a9> 3|<aa> 4|<ab> 5|<ac> 6|<ad> 7|<ae> 8|<af>
> a|<b0> b|<b1> v|<b2> g|<b3> d|<b4> e|<b5> j|<b6> z|<b7>
> i|<b8> y|<b9> k|<ba> l|<bb> m|<bc> n|<bd> o|<be> p|<bf>
> r|<80> s|<81> t|<82> u|<83> f|<84> h|<85> c|<86> x|<87>
> !|<88> @|<89> #|<8a> $|<8b> %|<8c> ^|<8d> &|<8e> *|<8f>
> +|<91>
>
> ben@Tyr:~$ tsl2utf8
> samovar
> <81><b0><bc><be><b2><b0><80>
> babu!ka
> <b1><b0><b1><83><88><ba><b0>
> 7jno-^fiopskiy grax uv+l m$!% za hobot na s#ezd *@eric.
> <ae><b6><bd><be>-<8d><84><b8><be><bf><81><ba><b8><b9> [...]
> '''
Alas, as you may note from the above, it came through as utter mojibake, even though my system is capable of reading (some) Russian.
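For readers who can't see the Cyrillic either: the mapping in Ben's `tsl2utf8 -h` output is regular. The 32 uppercase keys map, in order, onto U+0410..U+042F (А..Я), the 32 lowercase keys onto U+0430..U+044F (а..я), and '+' onto ё (U+0451); the hex values shown are the second bytes of each letter's UTF-8 encoding. Below is a minimal Python sketch of that table, reconstructed from the help output alone. It is not Ben's tool, and the name `translit` is hypothetical.

```python
# Sketch of the Latin-key to Cyrillic table listed by `tsl2utf8 -h`.
# Reconstructed from the help output only; `translit` is a hypothetical name.

UPPER_KEYS = "ABVGDEJZIYKLMNOPRSTUFHCX12345678"  # -> U+0410..U+042F (А..Я)
LOWER_KEYS = "abvgdejziyklmnoprstufhcx!@#$%^&*"  # -> U+0430..U+044F (а..я)

TABLE = {key: chr(0x0410 + i) for i, key in enumerate(UPPER_KEYS)}
TABLE.update({key: chr(0x0430 + i) for i, key in enumerate(LOWER_KEYS)})
TABLE["+"] = "\u0451"  # ё, the one letter outside the two contiguous runs


def translit(text: str) -> str:
    """Map each key to its Cyrillic letter; pass everything else through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for word in ("samovar", "babu!ka", "7jno-^fiopskiy grax uv+l"):
        print(word, "->", translit(word))
```

Run as a script, it prints each sample next to its Cyrillic form; "samovar", for instance, becomes "самовар".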
http://people.debian.org/~kubota/mojibake/
http://en.wikipedia.org/wiki/Mojibake
Hmm. Wikipedia suggests that I call it krakozyabry (крокозя́бры). ;)
This looked like a useful gizmo: http://2cyr.com/decode/?lang=en but it failed to produce anything ungarbled this time.
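Part of why a generic decoder struggles here is that only the second (continuation) byte of each two-byte UTF-8 letter survives in the rendering above. If you assume the text is modern Russian, the missing lead byte can be inferred: А through п encode as 0xD0 followed by 0x90-0xBF, and р through я as 0xD1 followed by 0x80-0x8F. Here is a rough Python sketch under that assumption. The function name is ours and not from any tool mentioned here. Byte 0x91 is ambiguous between Б (D0 91) and ё (D1 91), and the sketch guesses Б.

```python
def restore_russian(continuation: bytes) -> str:
    """Rebuild UTF-8 Cyrillic from bytes whose 0xD0/0xD1 lead bytes were lost.

    Assumes modern Russian text: bytes 0x90-0xBF came from D0 xx (А..п),
    bytes 0x80-0x8F from D1 xx (р..я). ASCII bytes pass through unchanged;
    anything else decodes to U+FFFD.
    """
    out = bytearray()
    for b in continuation:
        if 0x80 <= b <= 0x8F:
            out += bytes([0xD1, b])
        elif 0x90 <= b <= 0xBF:
            # Ambiguous for 0x91 (Б is D0 91, ё is D1 91); we guess Б.
            out += bytes([0xD0, b])
        else:
            out.append(b)
    return out.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The "samovar" output line from the quoted tsl2utf8 session.
    print(restore_russian(bytes.fromhex("81b0bcbeb2b080")))
```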
-- Kat Tanaka Okopnik Linux Gazette Mailbag Editor [email protected]
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 22, 2008 at 12:13:11PM -0800, Kat wrote:
> On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote: > > > > Eh... I'll send this example, and hope the 8-bit stuff makes it through > > the mail. > > ``` > > ben@Tyr:~$ tsl2utf8 -h > > Mappings: > > > > A|<90> B|<91> V|<92> G|<93> D|<94> E|<95> J|<96> Z|<97> > > I|<98> Y|<99> K|<9a> L|<9b> M|<9c> N|<9d> O|<9e> P|<9f> > > R|<a0> S|<a1> T|<a2> U|<a3> F|<a4> H|<a5> C|<a6> X|<a7> > > 1|<a8> 2|<a9> 3|<aa> 4|<ab> 5|<ac> 6|<ad> 7|<ae> 8|<af> > > a|<b0> b|<b1> v|<b2> g|<b3> d|<b4> e|<b5> j|<b6> z|<b7> > > i|<b8> y|<b9> k|<ba> l|<bb> m|<bc> n|<bd> o|<be> p|<bf> > > r|<80> s|<81> t|<82> u|<83> f|<84> h|<85> c|<86> x|<87> > > !|<88> @|<89> #|<8a> $|<8b> %|<8c> ^|<8d> &|<8e> *|<8f> > > +|<91> > > > > ben@Tyr:~$ tsl2utf8 > > samovar > > <81><b0><bc><be><b2><b0><80> > > babu!ka > > <b1><b0><b1><83><88><ba><b0> > > 7jno-^fiopskiy grax uv+l m$!% za hobot na s#ezd *@eric. > > <ae><b6><bd><be>-<8d><84><b8><be><bf><81><ba><b8><b9> [...] > > ''' > > Alas, as you may note from the above, it came through as utter > mojibake, even though my system is capable of reading (some) Russian.
Bleh. As I'm responding to this, using 'vi', I can see exactly where the UTF-8 characters got turned back into the... other... stuff. I.e., the first letter pair looks like 'A|<83><90>' (the '<90>' part being the value of the second byte in hex, i.e. dec144/oct220) - which is actually what the UTF-8 two-byte pair for the character is supposed to be.
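To make the byte arithmetic concrete: Cyrillic capital А (U+0410) is the two-byte UTF-8 sequence 0xD0 0x90. The lead byte has the bit pattern 110xxxxx and the continuation byte has 10xxxxxx. The continuation byte 0x90 is decimal 144, octal 220. A quick Python check:

```python
# Inspect the UTF-8 encoding of CYRILLIC CAPITAL LETTER A (U+0410).
encoded = "\u0410".encode("utf-8")
lead, cont = encoded

print(f"{lead:#04x} {cont:#04x}")   # 0xd0 0x90
print(cont, oct(cont))              # 144 0o220

# UTF-8 structure: a two-byte lead is 110xxxxx, a continuation is 10xxxxxx.
assert lead >> 5 == 0b110
assert cont >> 6 == 0b10
```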
To the best of my troubleshooting ability so far, everything breaks somewhere between the time that it leaves my mail client and the time that it arrives at the LG mail server - but I've checked everything on my end, and I'm sending it out with 'utf-8' as the charset and 8 bits set for the SMTP transaction. I'm pretty much stuck at that point, and have been for a while.
> Hmm. Wikipedia sugests that I call it krakozyabry (крокозя́бры). ;)^ ^
The translit version is fine; the so-called Russian isn't (it's 'kra', not 'kro'.) I also get somewhat annoyed when people put accent marks into plain Russian text anywhere outside a primer without denoting it: given that Russian uses a mark of that sort as part of a letter... well, there's no such letter as a "ya-tilde" in Russian, although anyone looking at the above cite would think so.
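Ben's objection has a practical side too. Stress marks in Russian text are usually written with U+0301 COMBINING ACUTE ACCENT. The tempting way to strip them is to decompose to NFD and drop every combining mark, but that also destroys й and ё, which decompose to и + U+0306 and е + U+0308. Here is a minimal Python sketch of the safer approach, which removes only U+0301. The function names are illustrative.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Remove only the stress mark, leaving й, ё and similar letters intact."""
    return unicodedata.normalize("NFC", text.replace(COMBINING_ACUTE, ""))


def strip_all_marks(text: str) -> str:
    """The naive approach: decompose, then drop every combining character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    word = "кракозя\u0301бры"    # the corrected spelling, with a stress mark
    print(strip_stress(word))
    print(strip_all_marks("й"))  # the breve is gone: prints и
```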
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
René Pfeiffer [lynx at luchs.at]
On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said:
> On Tue, Jan 22, 2008 at 12:13:11PM -0800, Kat wrote: > > On Tue, Jan 22, 2008 at 02:39:21PM -0500, Benjamin A. Okopnik wrote: > > > > > > Eh... I'll send this example, and hope the 8-bit stuff makes it through > > > the mail. > > [...] > > Alas, as you may note from the above, it came through as utter > > mojibake, even though my system is capable of reading (some) Russian. > > Bleh. As I'm responding to this, using 'vi', I can see exactly where the > UTF-8 characters got turned back into the... other... stuff. I.e., the > first letter pair looks like 'A|=C3=90<90>' (the '<90>' part being the value > of the second byte in hex, i.e. dec144/oct221) - which is actually what > the UTF-8 two-byte pair for the character is supposed to be.
It looks a bit unreadable to me (not that I could understand Arabic or Russian though).
> To the best of my troubleshooting ability so far, everything breaks > somewhere between the time that it leaves my mail client and the time > that it arrives at the LG mail server - but I've checked everything on > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits > set for the SMTP transaction. I'm pretty much stuck at that point, and > have been for a while.
It took me some time to convert all my workstations and my mutt mail environment to UTF-8. Basically, I have the following configuration.
- I use LANG=en_GB.UTF-8 as locale setting (don't like the German translations ;-).
- I use UTF-8-capable xterms. "ps ax" says they were started with the following options:

      xterm -class UXTerm -title uxterm -u8 -bg black -fg green

- My .muttrc offers mutt the following encodings when writing emails:

      set send_charset="us-ascii:iso-8859-15:utf-8"

- Additionally I have the following two lines in my .muttrc:

      set charset="utf-8"
      set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'"

With this combination the encoding is fairly sure to survive (even PGP/MIME and other manglings on the way out).
We now return to your regular scheduled programme.
Best, René.
P.S.: I wonder what happens to the é in Ben's mutt/xterm/window/thing.
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<a9> Pfeiffer wrote:
> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said: > > > To the best of my troubleshooting ability so far, everything breaks > > somewhere between the time that it leaves my mail client and the time > > that it arrives at the LG mail server - but I've checked everything on > > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits > > set for the SMTP transaction. I'm pretty much stuck at that point, and > > have been for a while. > > It took me some time to convert all my workstations and my mutt mail > enviroment to UTF-8. Basically I have the following configuration. > > - I use LANG=en_GB.UTF-8 as locale setting (don't like the German > translations ;-).
I've got LANG=en_US.UTF-8; did that pretty early on, since I often have a need for mixing different languages.
> - I use UTF-8-capable xterms. "ps ax" says they were started with the > following options: > xterm -class UXTerm -title uxterm -u8 -bg black -fg green
I've got most of that, except I set it in my .Xresources:
    xterm*utf8:1
    xterm*background: black
    xterm*foreground: gold

Not sure what the UXTerm class does beyond turning on '-u8' and providing a different class for font settings, etc., but I can display/read Unicode stuff just fine (':dig' in Vim is a pretty good test; so is Markus Kuhn's "UTF-8-demo.txt.gz".) I'll try adding it, just to see.
> - My .muttrc offers mutt the following encodings when writing emails: > set send_charset="us-ascii:iso-8859-15:utf-8"
Hmm, I didn't have that one - I'll try adding it. Frankly, I doubt that it'll change anything, since the messages queued in my SMTP spool look fine.
> - Additionally I have the following two lines in my .muttrc: > set charset="utf-8"
Got that one.
> set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'"
Set in my ~/.vimrc, except for "si" and "encoding". I'll add that too - although, again, the UTF-8 stuff that I write in my files saves and displays just fine.
I suspect that I'm just missing something in my understanding of how SMTP works - although I've studied everything I thought was relevant. Mutt is pretty smart, so my messages (both the one I sent to the list earlier and the test ones I've just sent on a round trip) went out with the following relevant headers:
    MIME-Version: 1.0
    Content-Type: text/plain; charset=utf-8
    Content-Disposition: inline
    Content-Transfer-Encoding: 8bit

The method that I invoke on the Net::SMTP object that does the server interaction is:
    $smtp{connection}->mail( $message{from}, Bits => 8 )

This is the belt and the suspenders and the no-wrinkle fabric with the double-stitched pockets with special coin holders on the sides, y'know what I mean?
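For reference, as far as we can tell from the Net::SMTP documentation, `Bits => 8` asks for the RFC 1652 `BODY=8BITMIME` parameter on the MAIL FROM command. `Bits => "binary"` asks for RFC 3030's `BODY=BINARYMIME`, which also requires the server to advertise CHUNKING. The Python sketch below only builds the command line. The helper name and the keyword-set argument are our own, and the logic is loosely modelled on what a client library does, not copied from Net::SMTP.

```python
def mail_from(sender: str, bits: str, ehlo_keywords: set[str]) -> str:
    """Build an SMTP MAIL FROM line with an optional BODY parameter.

    bits: "7", "8" or "binary" (mirroring the values passed to Net::SMTP).
    ehlo_keywords: upper-case keywords the server advertised in its EHLO reply.
    The parameter is only added when the server has advertised support for it.
    """
    command = f"MAIL FROM:<{sender}>"
    if bits == "8" and "8BITMIME" in ehlo_keywords:
        command += " BODY=8BITMIME"          # RFC 1652
    elif bits == "binary" and {"BINARYMIME", "CHUNKING"} <= ehlo_keywords:
        command += " BODY=BINARYMIME"        # RFC 3030; needs BDAT, not DATA
    elif bits == "7" and "8BITMIME" in ehlo_keywords:
        command += " BODY=7BIT"              # RFC 1652
    return command
```

With the keywords from the EHLO reply Ben posts further down the thread, which include 8BITMIME but neither BINARYMIME nor CHUNKING, the "8" case gets `BODY=8BITMIME` and the "binary" case gets no BODY parameter at all. That would be consistent with 'Bits => "binary"' making no difference.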
> With this combination the encoding is fairly sure to survive (even > PGP/MIME and other manglings on the way out).
All I can say is ((((( ...
> Ren<a9>. > > P.S.: I wonder what happens to the é in Ben's > mutt/xterm/window/thing.
I can see it just fine right now - but it's going to break once I send it back to the list.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
René Pfeiffer [lynx at luchs.at]
On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said:
> On Wed, Jan 23, 2008 at 12:45:37AM +0100, René Pfeiffer wrote: > > [...] > > - I use LANG=en_GB.UTF-8 as locale setting (don't like the German > > translations ;-). > > I've got LANG=en_US.UTF-8; did that pretty early on, since I often have > a need for mixing different languages.
Ok.
> > - I use UTF-8-capable xterms. "ps ax" says they were started with the > > following options: > > xterm -class UXTerm -title uxterm -u8 -bg black -fg green > > I've got most of that, except I set it in my .Xresources: > > `` > xterm*utf8:1 > xterm*background: black > xterm*foreground: gold > ''
This looks good, too.
> Not sure what the UXTerm class does beyond turning on '-u8' and > providing a different class for font settings, etc., but I can > display/read Unicode stuff just fine (':dig' in Vim is a pretty good > test; so is Markus Kuhn's "UTF-8-demo.txt.gz".) I'll try adding it, just > to see.
I use xterm with "-u8" out of habit since it's in my xfce menu configuration. xterms with this option handle the UTF-8-demo.txt file just fine.
> > - My .muttrc offers mutt the following encodings when writing emails: > > set send_charset="us-ascii:iso-8859-15:utf-8" > > Hmm, I didn't have that one - I'll try adding it. Frankly, I doubt that > it'll change anything, since the messages queued in my SMTP spool look > fine.
Yes, and your headers also look good. I use the above line mainly because then mutt can use an appropriate encoding. UTF-8 isn't always necessary.
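Mutt's manual describes $send_charset as a colon-separated list: Mutt uses the first character set into which the text can be converted exactly. That is why UTF-8 only gets used when it's actually needed. Here is a small Python approximation of that selection rule. The function is ours, not Mutt's code.

```python
DEFAULT_SEND_CHARSET = "us-ascii:iso-8859-15:utf-8"   # René's setting


def pick_send_charset(text: str, send_charset: str = DEFAULT_SEND_CHARSET) -> str:
    """Return the first charset in the list that can encode `text` exactly."""
    for charset in send_charset.split(":"):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"   # hypothetical fallback if nothing in the list fits


if __name__ == "__main__":
    for sample in ("plain text", "René", "тест"):
        print(sample, "->", pick_send_charset(sample))
```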
> > - Additionally I have the following two lines in my .muttrc: > > set charset="utf-8" > > Got that one.
Hm.
> > set editor="vim +':set textwidth=72' +':set wrap' +':set encoding=utf-8' +'set si'" > > Set in my ~/.vimrc, except for "si" and "encoding". I'll add that too - > although, again, the UTF-8 stuff that I write in my files saves and > displays just fine.
Well, what can I say? Looks good to me.
> I suspect that I'm just missing something in my understanding of how > SMTP works - although I've studied everything I thought was relevant.
Most modern MTAs are 8-bit clean. From personal experience I know that Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies just fine.
> Mutt is pretty smart, so my messages (both the one I sent to the list > earlier and the test ones I've just sent on a round trip) went out with > the following relevant headers: > > `` > MIME-Version: 1.0 > Content-Type: text/plain; charset=utf-8 > Content-Disposition: inline > Content-Transfer-Encoding: 8bit > ''
Yes, I saw that, and that's correct - apart from the content in your email. ;-)
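For comparison, Python's standard `email` package produces those same headers if you ask it for an inline UTF-8 text part with an 8bit transfer encoding. This only illustrates a correctly labelled message; it is not part of Ben's Mutt/bssmtp setup.

```python
from email.message import EmailMessage


def utf8_text_message(body: str) -> EmailMessage:
    """Build a text/plain message sent as raw 8-bit UTF-8 (no QP/base64)."""
    msg = EmailMessage()   # uses email.policy.default
    msg.set_content(body, charset="utf-8", cte="8bit", disposition="inline")
    return msg


if __name__ == "__main__":
    # as_bytes() keeps the 8-bit body as-is; decode only for display.
    print(utf8_text_message("test тест\n").as_bytes().decode("utf-8"))
```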
> The method that I invoke on the Net::SMTP object that does the server > interaction is: > > `` > $smtp{connection}->mail( $message{from}, Bits => 8 ) > '' > > This is the belt and the suspenders and the no-wrinkle fabric with > the double-stitched pockets with special coin holders on the sides, > y'know what I mean?
Yes, basically the take away variant with everything and extra cheese. The only difference is that I carry a Postfix around all the time which handles the submitted emails.
> > With this combination the encoding is fairly sure to survive (even > > PGP/MIME and other manglings on the way out). > > All I can say is ((((( ...
Which leaves me with ?????????...
> > René. > > > > P.S.: I wonder what happens to the é in Ben's > > mutt/xterm/window/thing. > > I can see it just fine right now - but it's going to break once I send > it back to the list.
Why not use screenshots for the quoted text then? On my end of your email I see a "Ren =C3=A9" which looks like "René" encoded in UTF-8 and displayed as ISO-8859-1(5). Maybe wireshark can help us out.
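Here is that diagnosis in miniature. é (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8. Quoted-printable writes those bytes as =C3=A9, and a reader that treats the same bytes as ISO-8859-1 shows them as "Ã©". The lone 0xA9 byte is also what shows up as <a9> in the mangled attribution lines. A short Python demonstration using only the standard library:

```python
import quopri

name = "René"
utf8 = name.encode("utf-8")            # b'Ren\xc3\xa9'
qp = quopri.encodestring(utf8)         # b'Ren=C3=A9', the form René sees
misread = utf8.decode("iso-8859-1")    # UTF-8 bytes shown as Latin-1

print(utf8, qp, misread)
```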
Best, René.
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote:
> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said: > > > I suspect that I'm just missing something in my understanding of how > > SMTP works - although I've studied everything I thought was relevant. > > Most modern MTAs are 8-bit clean. From personal experience I know that > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies > just fine.
I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.
> > Mutt is pretty smart, so my messages (both the one I sent to the list > > earlier and the test ones I've just sent on a round trip) went out with > > the following relevant headers: > > > > `` > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=utf-8 > > Content-Disposition: inline > > Content-Transfer-Encoding: 8bit > > '' > > Yes, I saw that, and that's correct - apart from the content in your > email. ;-)
Heh.
> > > RenA<a9>.
    250 2.1.0 [email protected]... Sender ok
    250 2.1.5 [email protected]... Recipient ok
    354 Enter mail, end with "." on a line by itself
    test тест
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote:
> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said: > > > I suspect that I'm just missing something in my understanding of how > > SMTP works - although I've studied everything I thought was relevant. > > Most modern MTAs are 8-bit clean. From personal experience I know that > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies > just fine.
I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.
> > Mutt is pretty smart, so my messages (both the one I sent to the list > > earlier and the test ones I've just sent on a round trip) went out with > > the following relevant headers: > > > > `` > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=utf-8 > > Content-Disposition: inline > > Content-Transfer-Encoding: 8bit > > '' > > Yes, I saw that, and that's correct - apart from the content in your > email. ;-)
Heh.
> > > Ren?????^F<a9>.
Yep - that's what I see after I've sent it on a round trip.
> Maybe wireshark can help us > out.
Oh, I'm pretty sure it's Net::SMTP at this point. Since the queued message is OK, and manually sending the text is OK as well, that's pretty much the only set of gears left in between.
(I'm going to try sending this email manually - launch 'bssmtp' with the '-odq' ("only queue") option and copy the queued result to the telnet session. We'll see what that looks like.)
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 09:44:27AM -0500, Benjamin Okopnik wrote:
> On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren? Pfeiffer wrote: > > On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said: > > > > > I suspect that I'm just missing something in my understanding of how > > > SMTP works - although I've studied everything I thought was relevant. > > > > Most modern MTAs are 8-bit clean. From personal experience I know that > > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies > > just fine. > > I'm pretty sure that 8bit is the default method, but I wanted to nail it > down just in case. I've even tried 'Bits => "binary"', with no better > result. > > > > Mutt is pretty smart, so my messages (both the one I sent to the list > > > earlier and the test ones I've just sent on a round trip) went out with > > > the following relevant headers: > > > > > > `` > > > MIME-Version: 1.0 > > > Content-Type: text/plain; charset=utf-8 > > > Content-Disposition: inline > > > Content-Transfer-Encoding: 8bit > > > '' > > > > Yes, I saw that, and that's correct - apart from the content in your > > email. ;-) > > Heh. > > > > > Ren?????^F?. > 250 2.1.0 [email protected]... Sender ok > 250 2.1.5 [email protected]... Recipient ok > 354 Enter mail, end with "." on a line by itself
[laugh] Whoops. I tried sending this manually, so the UTF-8 content would make it through; knowing that a '.' on a line by itself means "end of message", I added spaces after the period at this point, but I guess the upstream mail server I was using didn't take me seriously. I'll try resending the whole thing, this time "renaming" that period.
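For the record, SMTP already has a mechanism for this. In the DATA phase the message ends at a line containing only ".". RFC 5321 (section 4.5.2, "transparency") has the client prepend an extra "." to any line that begins with one, and the receiver strips it again. Python's smtplib, for example, does this inside `SMTP.data()`. A minimal Python sketch of preparing a body that way (the function name is ours):

```python
def dot_stuff(body: str) -> bytes:
    """Prepare a message body for the SMTP DATA phase.

    Lines are re-joined with CRLF, any line starting with "." gets an extra
    "." (RFC 5321 transparency), and the terminating "<CRLF>.<CRLF>" is
    appended. The body is encoded as UTF-8, which assumes the server
    advertised 8BITMIME.
    """
    lines = body.splitlines()
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return ("\r\n".join(stuffed) + "\r\n.\r\n").encode("utf-8")
```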
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 09:44:27AM -0500, Benjamin Okopnik wrote:
> On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<a9> Pfeiffer wrote: > > On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said: > > > > > I suspect that I'm just missing something in my understanding of how > > > SMTP works - although I've studied everything I thought was relevant. > > > > Most modern MTAs are 8-bit clean. From personal experience I know that > > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies > > just fine.
[snip]
Bleh. Never mind the manual method; the interaction times out before I can glue it all in, and the console hoses some of the text. I'll just send it as is, and let the UTF-8 characters do their thing for now.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 02:52:49PM +0100, Ren<83><a9> Pfeiffer wrote:
> On Jan 23, 2008 at 0009 -0500, Ben Okopnik appeared and said: > > > I suspect that I'm just missing something in my understanding of how > > SMTP works - although I've studied everything I thought was relevant. > > Most modern MTAs are 8-bit clean. From personal experience I know that > Postfix, Sendmail, Exim and CommuniGate Pro deal with 8-bit mail bodies > just fine.
I'm pretty sure that 8bit is the default method, but I wanted to nail it down just in case. I've even tried 'Bits => "binary"', with no better result.
> > Mutt is pretty smart, so my messages (both the one I sent to the list > > earlier and the test ones I've just sent on a round trip) went out with > > the following relevant headers: > > > > `` > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=utf-8 > > Content-Disposition: inline > > Content-Transfer-Encoding: 8bit > > '' > > Yes, I saw that, and that's correct - apart from the content in your > email. ;-)
Heh.
> > > Ren<83><a9>. > > > > > > P.S.: I wonder what happens to the é in Ben's > > > mutt/xterm/window/thing. > > > > I can see it just fine right now - but it's going to break once I send > > it back to the list. > > Why not use screenshots for the quoted text then?
Well, yes, that would fix this specific instance of the problem - and it would still suck to have an SMTP server that doesn't do the right thing. I just tested a part of my SMTP chain with the following:
    ben@Tyr:~$ telnet linuxgazette.net 25
    Trying 64.246.26.120...
    Connected to linuxgazette.net.
    Escape character is '^]'.
    220 genetikayos.com ESMTP Sendmail 8.12.11.20060308/8.12.11; Wed, 23 Jan 2008 06:34:44 -0800
    HELO Tyr.Thor
    MAIL FROM: [email protected]
    RCPT TO: [email protected]
    DATA
    250 genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
    250 2.1.0 [email protected]... Sender ok
    250 2.1.5 [email protected]... Recipient ok
    354 Enter mail, end with "." on a line by itself
    test тест
    .
    250 2.0.0 m0NEYiOR015906 Message accepted for delivery
    QUIT
    221 2.0.0 genetikayos.com closing connection
    Connection closed by foreign host.

and that came through just fine; the UTF-8 content makes it across without any special headers (the only header I actually put in was 'Subject: ...'.) This means that Net::SMTP is hosing my content while doing the transaction - which is not great news. On the one hand, I wrote/rewrote 'bssmtp' to be modular and easy to service; on the other hand, I really don't feel like reloading my brain with all the SMTP-relevant guck and sitting down to rewrite that part of it. [sigh] I'm going to have to do that, it seems. Maybe I'll just do the SMTP stuff manually instead of using a module; it's not that tough, and I won't have someone else's code handing me surprises like this anymore.
> On my end of your email I see a "Ren<83><a9>" which looks like "Ren<83><a9>" encoded > in UTF-8 and displayed as ISO-8859-1(5).
Yep - that's what I see after I've sent it on a round trip.
> Maybe wireshark can help us > out.
Oh, I'm pretty sure it's Net::SMTP at this point. Since the queued message is OK, and manually sending the text is OK as well, that's pretty much the only set of gears left in between.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Breen Mullins [breen.mullins at gmail.com]
* Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]:
>On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<a9> Pfeiffer wrote: >> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said: >> >> > To the best of my troubleshooting ability so far, everything breaks >> > somewhere between the time that it leaves my mail client and the time >> > that it arrives at the LG mail server - but I've checked everything on >> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits >> > set for the SMTP transaction. I'm pretty much stuck at that point, and >> > have been for a while.
Hmm. Your headers don't look good to me.
=========
X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id m0N59WPL030073
Subject: Re: [TAG] Problems with UTF-8 over SMTP
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-SA-Exim-Version: 4.2.1 (built Mon, 27 Mar 2006 13:42:28 +0200)
======

It definitely says that it's iso-8859-1 here. The Content-Type line is at the end of the headers, just before the SpamAssassin stuff. The autoconverted line is several lines up from there. I suspect that the conversion is misflagging the message.
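Here is what a correct 8bit-to-quoted-printable downgrade is supposed to look like. Only the transfer encoding changes, and the charset label stays utf-8, because QP merely re-expresses the same bytes in 7-bit form. If the label is changed to iso-8859-1, as in the headers above, a reader decodes the QP back to the right bytes and then displays them as Latin-1, which produces exactly the mojibake in this thread. The sketch below works under those assumptions and uses a plain dict for the headers rather than any real MTA's data structures.

```python
import quopri


def downgrade_to_qp(headers: dict[str, str], body: bytes) -> tuple[dict[str, str], bytes]:
    """Re-encode an 8bit body as quoted-printable, keeping the charset label."""
    converted = dict(headers)   # leave the caller's headers untouched
    converted["Content-Transfer-Encoding"] = "quoted-printable"
    return converted, quopri.encodestring(body)


if __name__ == "__main__":
    headers = {"Content-Type": "text/plain; charset=utf-8",
               "Content-Transfer-Encoding": "8bit"}
    body = "test тест\n".encode("utf-8")
    new_headers, qp_body = downgrade_to_qp(headers, body)
    print(new_headers)
    print(qp_body.decode("ascii"))
    # Decoding the same bytes under the wrong label reproduces the garbage:
    print(quopri.decodestring(qp_body).decode("iso-8859-1"))
```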
Breen
-- Breen Mullins Menlo Park, California
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote:
> * Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]: > > >On Wed, Jan 23, 2008 at 12:45:37AM +0100, Ren<83><a9> Pfeiffer wrote: > >> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said: > >> > >> > To the best of my troubleshooting ability so far, everything breaks > >> > somewhere between the time that it leaves my mail client and the time > >> > that it arrives at the LG mail server - but I've checked everything on > >> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits > >> > set for the SMTP transaction. I'm pretty much stuck at that point, and > >> > have been for a while. > > Hmm. Your headers don't look good to me. > > ========= > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id > m0N59WPL030073 > ======
[blink] How the hell did I miss *that*?
Wow. Thanks, Breen - that sounds like the place where it's getting hosed, all right (I'll test that in a moment by sending it through another server.) The question is, how do I stop it? Anybody familiar with that aspect of SMTP?
I'm definitely sending out a character set - again, this is even before it gets to 'bssmtp', Mutt sets it based on the content and it definitely does the right thing when I have UTF-8 in there. Why it would get converted is a mystery to me.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Neil Youngman [Neil.Youngman at youngman.org.uk]
On Wednesday 23 January 2008 15:28, Ben Okopnik wrote:
> On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote: > > * Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]: > > Hmm. Your headers don't look good to me. > > > > ========= > > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id > > m0N59WPL030073 > > ====== > > [blink] How the hell did I miss *that*? > > Wow. Thanks, Breen - that sounds like the place where it's getting > hosed, all right (I'll test that in a moment by sending it through > another server.) The question is, how do I stop it? Anybody familiar > with that aspect of SMTP? > > I'm definitely sending out a character set - again, this is even before > it gets to 'bssmtp', Mutt sets it based on the content and it definitely > does the right thing when I have UTF-8 in there. Why it would get > converted is a mystery to me.
That sounds to me very much like a server receiving it as 8 bit and deciding that the host it is sending to doesn't accept 8 bit ESMTP. The correct thing to do in that circumstance is to encode it, so it can be accepted by a 7 bit only server.
RFC 1652 says

    "If a server SMTP does not support the 8-bit MIME transport extension (either by not responding with code 250 to the EHLO command, or by not including the EHLO keyword value 8BITMIME in its response), then the client SMTP must not, under any circumstances, attempt to transfer a content which contains characters outside the US-ASCII octet range (hex 00-7F).

    A client SMTP has two options in this case: first, it may implement a gateway transformation to convert the message into valid 7bit MIME, or second, or may treat this as a permanent error and handle it in the usual manner for delivery failures. The specifics of the transformation from 8bit MIME to 7bit MIME are not described by this RFC; the conversion is nevertheless constrained in the following ways:

        (1) it must cause no loss of information; MIME transport encodings must be employed as needed to insure this is the case, and

        (2) the resulting message must be valid 7bit MIME."

I assume that the headers should be altered to correctly reflect the encoding, as otherwise it wouldn't be "valid 7bit MIME".
Neil
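The rule Neil quotes boils down to a single test that a relaying client makes before sending. Does the body contain any octet above 0x7F, and did the next hop leave 8BITMIME out of its EHLO reply? Only when both are true is a 7-bit conversion (or a bounce) called for. Here is a Python sketch of that decision. The function name and the set-of-keywords input are our own framing.

```python
def needs_7bit_conversion(body: bytes, ehlo_keywords: set[str]) -> bool:
    """RFC 1652: without 8BITMIME, only octets 0x00-0x7F may be sent."""
    has_8bit_octets = any(octet > 0x7F for octet in body)
    advertised = {keyword.upper() for keyword in ehlo_keywords}  # keywords are case-insensitive
    return has_8bit_octets and "8BITMIME" not in advertised
```

By this test, a hop that does advertise 8BITMIME has no reason to convert, so the conversion most likely happened at some other hop.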
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 03:45:52PM +0000, Neil Youngman wrote:
> On Wednesday 23 January 2008 15:28, Ben Okopnik wrote: > > On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote: > > > * Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]: > > > Hmm. Your headers don't look good to me. > > > > > > ========= > > > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id > > > m0N59WPL030073 > > > ====== > > > > [blink] How the hell did I miss *that*? > > > > Wow. Thanks, Breen - that sounds like the place where it's getting > > hosed, all right (I'll test that in a moment by sending it through > > another server.) The question is, how do I stop it? Anybody familiar > > with that aspect of SMTP? > > > > I'm definitely sending out a character set - again, this is even before > > it gets to 'bssmtp', Mutt sets it based on the content and it definitely > > does the right thing when I have UTF-8 in there. Why it would get > > converted is a mystery to me. > > That sounds to me very much like a server receiving it as 8 bit and deciding > that the host it is sending to doesn't accept 8 bit ESMTP. The correct thing > to do in that circumstance is to encode it, so it can be accepted by a 7 bit > only server. > > RFC 1652 says > > "If a server SMTP does not support the 8-bit MIME transport extension > (either by not responding with code 250 to the EHLO command, or by > not including the EHLO keyword value 8BITMIME in its response)
Hmm. If I'm sending a message to myself, then the receiving host is 'linuxgazette.net'.
    ben@Tyr:~$ telnet linuxgazette.net 25
    Trying 64.246.26.120...
    Connected to linuxgazette.net.
    Escape character is '^]'.
    EHLO Tyr.Thor
    250-genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
    250-ENHANCEDSTATUSCODES
    250-PIPELINING
    250-8BITMIME
    250-SIZE
    250-DSN
    250-ETRN
    250-AUTH GSSAPI
    250-STARTTLS
    250-DELIVERBY
    250 HELP

Doesn't seem like that would trigger it off. In fact, I don't know that anybody out there is still doing 7bit-only stuff.
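For anyone who wants to script this check rather than eyeball it: an EHLO reply is a multiline 250 response in which every line except the last has a '-' after the code, and each line after the greeting starts with an extension keyword. Below is a minimal Python parser written under that assumption. Python's own smtplib does the equivalent in `SMTP.ehlo()` and exposes the result through `SMTP.has_extn()`.

```python
def ehlo_keywords(reply: str) -> set[str]:
    """Extract extension keywords from a multiline EHLO reply.

    Assumes every line looks like "250-TEXT" or "250 TEXT" and that the
    first line is the server's greeting rather than an extension.
    """
    keywords = set()
    for line in reply.strip().splitlines()[1:]:
        text = line[4:].strip()          # drop "250-" or "250 "
        if text:
            keywords.add(text.split()[0].upper())
    return keywords
```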
> , then > the client SMTP must not, under any circumstances, attempt to > transfer a content which contains characters outside the US-ASCII > octet range (hex 00-7F). > > A client SMTP has two options in this case: first, it may implement a > gateway transformation to convert the message into valid 7bit MIME, > or second, or may treat this as a permanent error and handle it in > > the usual manner for delivery failures. The specifics of the > transformation from 8bit MIME to 7bit MIME are not described by this > RFC; the conversion is nevertheless constrained in the following > ways: > > (1) it must cause no loss of information; MIME transport > encodings must be employed as needed to insure this is > the case, and > > (2) the resulting message must be valid 7bit MIME." > > I assume that the headers should be altered to correctly reflect the encoding, > as otherwise it wouldn't be "valid 7bit MIME".
Right... I don't think that "quoted-printable" is exactly equivalent to "7bit MIME" (although it would indeed pass through that filter.) Even more to the point, the headers on this email still say "Content-Transfer-Encoding: 8bit".
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 03:45:52PM +0000, Neil Youngman wrote:
> On Wednesday 23 January 2008 15:28, Ben Okopnik wrote: > > On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote: > > > * Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]: > > > Hmm. Your headers don't look good to me. > > > > > > ========= > > > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id > > > m0N59WPL030073 > > > ====== > > > > [blink] How the hell did I miss *that*? > > > > Wow. Thanks, Breen - that sounds like the place where it's getting > > hosed, all right (I'll test that in a moment by sending it through > > another server.) The question is, how do I stop it? Anybody familiar > > with that aspect of SMTP? > > > > I'm definitely sending out a character set - again, this is even before > > it gets to 'bssmtp', Mutt sets it based on the content and it definitely > > does the right thing when I have UTF-8 in there. Why it would get > > converted is a mystery to me. > > That sounds to me very much like a server receiving it as 8 bit and deciding > that the host it is sending to doesn't accept 8 bit ESMTP. The correct thing > to do in that circumstance is to encode it, so it can be accepted by a 7 bit > only server. > > RFC 1652 says > > "If a server SMTP does not support the 8-bit MIME transport extension > (either by not responding with code 250 to the EHLO command, or by > not including the EHLO keyword value 8BITMIME in its response)
Hmm. If I'm sending a message to myself, then the receiving host is 'linuxgazette.net'.
ben@Tyr:~$ telnet linuxgazette.net 25
Trying 64.246.26.120...
Connected to linuxgazette.net.
Escape character is '^]'.
EHLO Tyr.Thor
250-genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-AUTH GSSAPI
250-STARTTLS
250-DELIVERBY
250 HELP

Doesn't seem like that would trigger it off. In fact, I don't know that anybody out there is still doing 7bit-only stuff.
> , then > the client SMTP must not, under any circumstances, attempt to > transfer a content which contains characters outside the US-ASCII > octet range (hex 00-7F). > > A client SMTP has two options in this case: first, it may implement a > gateway transformation to convert the message into valid 7bit MIME, > or second, it may treat this as a permanent error and handle it in > the usual manner for delivery failures. The specifics of the > transformation from 8bit MIME to 7bit MIME are not described by this > RFC; the conversion is nevertheless constrained in the following > ways: > > (1) it must cause no loss of information; MIME transport > encodings must be employed as needed to insure this is > the case, and > > (2) the resulting message must be valid 7bit MIME." > > I assume that the headers should be altered to correctly reflect the encoding, > as otherwise it wouldn't be "valid 7bit MIME".
Right... I don't think that "quoted-printable" is exactly equivalent to "7bit MIME" (although it would indeed pass through that filter.) Even more to the point, the headers on this email still say "Content-Transfer-Encoding: 8bit".
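The check Ben ran by hand over telnet, combined with the RFC 1652 rule Neil quoted, can be sketched in a few lines of Python. This is an illustration only, not how Sendmail or Net::SMTP implement it; `parse_ehlo`, `is_7bit_clean` and `rfc1652_action` are hypothetical helper names, and the sample reply is the one genetikayos.com returned in the telnet session above.

```python
"""Sketch: does a server advertise 8BITMIME, and what does RFC 1652 allow?"""


def parse_ehlo(reply: str) -> set[str]:
    """Collect the ESMTP keywords from a multi-line 250 EHLO reply.

    The first line is the greeting, not a keyword, so it is skipped.
    """
    keywords = set()
    for line in reply.strip().splitlines()[1:]:
        # Lines look like "250-KEYWORD params" or, for the last one,
        # "250 KEYWORD params"; drop the 4-character prefix.
        text = line[4:].strip()
        if text:
            keywords.add(text.split()[0].upper())
    return keywords


def is_7bit_clean(body: bytes) -> bool:
    """True if every octet is in the US-ASCII range (hex 00-7F)."""
    return all(octet < 0x80 for octet in body)


def rfc1652_action(body: bytes, server_keywords: set[str]) -> str:
    """Say what a client SMTP may do with this body, per RFC 1652."""
    if is_7bit_clean(body):
        return "send as-is"
    if "8BITMIME" in server_keywords:
        return "send with BODY=8BITMIME"
    # No 8BITMIME: convert losslessly to valid 7bit MIME, or fail permanently.
    return "convert to 7bit MIME or bounce"


EHLO_REPLY = """\
250-genetikayos.com Hello 72.sub-75-203-218.myvzw.com [75.203.218.72], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-AUTH GSSAPI
250-STARTTLS
250-DELIVERBY
250 HELP
"""

if __name__ == "__main__":
    keywords = parse_ehlo(EHLO_REPLY)
    body = "самовар\n".encode("utf-8")
    print(rfc1652_action(body, keywords))                  # send with BODY=8BITMIME
    print(rfc1652_action(body, keywords - {"8BITMIME"}))   # convert to 7bit MIME or bounce
```

Against the real reply, the rule lets 8-bit content through untouched, which fits Ben's reading that the receiving server had no reason to convert anything.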
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Neil Youngman [Neil.Youngman at youngman.org.uk]
On Wednesday 23 January 2008 16:15, Ben Okopnik wrote:
> Doesn't seem like that would trigger it off. In fact, I don't know that > anybody out there is still doing 7bit-only stuff.
I don't have the headers to look at, so I can only guess. Is the last received header "genetikayos.com"?
> Right... I don't think that "quoted-printable" is exactly equivalent to > "7bit MIME" (although it would indeed pass through that filter.) Even > more to the point, the headers on this email still say > "Content-Transfer-Encoding: 8bit".
I would say that "quoted-printable" is a subset of 7bit mime and in my (very limited) experience, it seems to be the default choice. It does sound as though the headers haven't been correctly updated.
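Neil's point that quoted-printable output is valid 7-bit data is easy to check with Python's standard `quopri` module (chosen here only for illustration; nothing in this thread uses Python). Each octet above 0x7F becomes an ASCII `=XX` escape, and decoding restores the original bytes, which is RFC 1652's "no loss of information" requirement.

```python
import quopri

# "samovar" in Cyrillic: seven letters, fourteen octets in UTF-8.
original = "самовар".encode("utf-8")

encoded = quopri.encodestring(original)
decoded = quopri.decodestring(encoded)

if __name__ == "__main__":
    print(encoded)                                 # b'=D1=81=D0=B0=D0=BC=D0=BE=D0=B2=D0=B0=D1=80'
    print(all(octet < 0x80 for octet in encoded))  # True: valid 7-bit data
    print(decoded == original)                     # True: nothing was lost
```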
Neil
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 04:27:22PM +0000, Neil Youngman wrote:
> On Wednesday 23 January 2008 16:15, Ben Okopnik wrote: > > Doesn't seem like that would trigger it off. In fact, I don't know that > > anybody out there is still doing 7bit-only stuff. > > I don't have the headers to look at, so I can only guess. Is the last received > header "genetikayos.com"?
Here's a header from a successful one (i.e., the one I sent via a manual SMTP session):
From ben Wed Jan 23 09:37:32 2008
Return-Path: [email protected]
Received: from genetikayos.com [64.246.26.120]
    by Tyr with POP3 (fetchmail-6.3.2)
    for <ben@localhost> (single-drop); Wed, 23 Jan 2008 09:37:32 -0500 (EST)
Received: from Tyr.Thor (72.sub-75-203-218.myvzw.com [75.203.218.72])
    by genetikayos.com (8.12.11.20060308/8.12.11) with SMTP id m0NEYiOR015906
    for [email protected]; Wed, 23 Jan 2008 06:34:51 -0800
Date: Wed, 23 Jan 2008 06:34:44 -0800
From: Ben Okopnik <[email protected]>
Message-Id: <[email protected]>
X-Spam-Status: No, score=-0.7 required=5.0 tests=BAYES_00,MISSING_SUBJECT,
    RCVD_IN_PBL,TO_CC_NONE autolearn=no version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com
Status: RO
Content-Length: 14
Lines: 2

And here's one from an email with the same content (which came through broken), sent via Mutt and bssmtp:

From ben Wed Jan 23 14:08:26 2008
Return-Path: [email protected]
Received: from localhost [127.0.0.1]
    by Tyr with POP3 (fetchmail-6.3.2)
    for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 (EST)
Received: from localhost.localdomain (genetikayos.com [64.246.26.120])
    by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP id m0NJ7G4Q026459
    for <[email protected]>; Wed, 23 Jan 2008 11:07:28 -0800
Received: from localhost ([127.0.0.1])
    by Fenrir (bssmtp 0.3) with SMTP id 70419937;
    Wed, 23 Jan 2008 14:07:40 -0500
From: Ben Okopnik <[email protected]>
Message-ID: <[email protected]>
Date: Wed, 23 Jan 2008 14:07:40 -0500
To: Ben Okopnik <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.11
X-Spam-Status: No, score=-1.6 required=5.0
    tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com

I note that there's no conversion warning in it - and yet, it's broken.
> > Right... I don't think that "quoted-printable" is exactly equivalent to > > "7bit MIME" (although it would indeed pass through that filter.) Even > > more to the point, the headers on this email still say > > "Content-Transfer-Encoding: 8bit". > > I would say that "quoted-printable" is a subset of 7bit mime and in my (very > limited) experience, it seems to be the default choice.
That sounds reasonable. Perhaps converting anything that's not 100% clear to 7bit is some servers' default policy.
> It does sound as > though the headers haven't been correctly updated.
Again, possible - but the UTF-8 does come through in a manual session, where I've used no headers from the sender side beyond the 'From:' (the 'RCPT TO:' just gets used to determine where to send it, and doesn't get added to the headers.)
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Neil Youngman [Neil.Youngman at youngman.org.uk]
On Wednesday 23 January 2008 19:17, Ben Okopnik wrote:
> On Wed, Jan 23, 2008 at 04:27:22PM +0000, Neil Youngman wrote: > > On Wednesday 23 January 2008 16:15, Ben Okopnik wrote: > > > Doesn't seem like that would trigger it off. In fact, I don't know that > > > anybody out there is still doing 7bit-only stuff. > > > > I don't have the headers to look at, so I can only guess. Is the last > > received header "genetikayos.com"? > > Here's a header from a successful one (i.e., the one I sent via a manual > SMTP session): > > `` > From ben Wed Jan 23 09:37:32 2008 > Return-Path: [email protected] > Received: from genetikayos.com [64.246.26.120] > by Tyr with POP3 (fetchmail-6.3.2) > for <ben@localhost> (single-drop); Wed, 23 Jan 2008 09:37:32 -0500 > (EST) Received: from Tyr.Thor (72.sub-75-203-218.myvzw.com [75.203.218.72]) > by genetikayos.com (8.12.11.20060308/8.12.11) with SMTP id m0NEYiOR015906 > for [email protected]; Wed, 23 Jan 2008 06:34:51 -0800 > Date: Wed, 23 Jan 2008 06:34:44 -0800 > From: Ben Okopnik <[email protected]> > Message-Id: <[email protected]> > X-Spam-Status: No, score=-0.7 required=5.0 tests=BAYES_00,MISSING_SUBJECT, > RCVD_IN_PBL,TO_CC_NONE autolearn=no version=3.1.8 > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com > Status: RO > Content-Length: 14 > Lines: 2
There are no MIME headers, so it should be treated as plain US-ASCII for most purposes, IIRC. Most 8-bit-clean MTAs probably wouldn't check that it's 7-bit clean, especially if they're handling ESMTP.
> And here's one from an email with the same content (which came through > broken), sent via Mutt and bssmtp: > > `` > From ben Wed Jan 23 14:08:26 2008 > Return-Path: [email protected] > Received: from localhost [127.0.0.1] > by Tyr with POP3 (fetchmail-6.3.2) > for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 -0500 > (EST) Received: from localhost.localdomain (genetikayos.com > [64.246.26.120]) by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP > id m0NJ7G4Q026459 for <[email protected]>; Wed, 23 Jan 2008 11:07:28 > -0800 Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with > SMTP id 70419937; Wed, 23 Jan 2008 14:07:40 -0500 > From: Ben Okopnik <[email protected]> > Message-ID: <[email protected]> > Date: Wed, 23 Jan 2008 14:07:40 -0500 > To: Ben Okopnik <[email protected]> > MIME-Version: 1.0 > Content-Type: text/plain; charset=utf-8 > Content-Disposition: inline > Content-Transfer-Encoding: 8bit > User-Agent: Mutt/1.5.11 > X-Spam-Status: No, score=-1.6 required=5.0 > tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8 > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on genetikayos.com > '' > > I note that there's no conversion warning in it - and yet, it's broken.
That's got all the proper MIME headers. While I wouldn't normally expect MTAs to rely on the MIME headers, if they've got to update the MIME headers when they convert, maybe they do require MIME headers to be present before doing the conversion?
I'm afraid the headers don't suggest much to me.
The two big differences are the MIME headers and the use of Mutt/bssmtp. I don't have much info on bssmtp. I wonder if it offers 8BITMIME? If not, maybe Mutt does the 8-bit to 7-bit conversion when handing off to bssmtp? I think that's clutching at straws, but you never know.
Neil
Neil Youngman [Neil.Youngman at youngman.org.uk]
On Wednesday 23 January 2008 20:06, Neil Youngman wrote:
> On Wednesday 23 January 2008 19:17, Ben Okopnik wrote: > > From ben Wed Jan 23 14:08:26 2008 > > Return-Path: [email protected] > > Received: from localhost [127.0.0.1] > > by Tyr with POP3 (fetchmail-6.3.2) > > for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 > > -0500 (EST) Received: from localhost.localdomain (genetikayos.com > > [64.246.26.120]) by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP > > id m0NJ7G4Q026459 for <[email protected]>; Wed, 23 Jan 2008 11:07:28 > > -0800 Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with > > SMTP id 70419937; Wed, 23 Jan 2008 14:07:40 -0500 > > From: Ben Okopnik <[email protected]> > > Message-ID: <[email protected]> > > Date: Wed, 23 Jan 2008 14:07:40 -0500 > > To: Ben Okopnik <[email protected]> > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=utf-8 > > Content-Disposition: inline > > Content-Transfer-Encoding: 8bit > > User-Agent: Mutt/1.5.11 > > X-Spam-Status: No, score=-1.6 required=5.0 > > tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8 > > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on > > genetikayos.com '' > > > > I note that there's no conversion warning in it - and yet, it's broken.
I missed that there's no mention of "quoted-printable" at all, which suggests that conversion to quoted-printable is a red herring, absent any other indication of quoted-printable encoding.
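Neil's check (no X-MIME-Autoconverted header and no quoted-printable or base64 transfer encoding, so no visible relay conversion) can be mechanized with Python's `email.parser.HeaderParser`. This is a sketch; `conversion_evidence` is a hypothetical helper, and it only reads headers, so a component that mangles the body without recording it (which is what turned out to happen in this thread) leaves no trace it can find.

```python
from email.parser import HeaderParser


def conversion_evidence(raw_headers: str) -> list[str]:
    """List header lines suggesting a relay changed the transfer encoding."""
    headers = HeaderParser().parsestr(raw_headers)
    evidence = []
    autoconverted = headers["X-MIME-Autoconverted"]
    if autoconverted:
        # Unfold any continuation lines into one line of text.
        evidence.append("X-MIME-Autoconverted: " + " ".join(autoconverted.split()))
    cte = (headers["Content-Transfer-Encoding"] or "").strip().lower()
    if cte in ("quoted-printable", "base64"):
        evidence.append("Content-Transfer-Encoding: " + cte)
    return evidence


# MIME headers from Ben's broken Mutt/bssmtp message (other headers omitted).
BROKEN = """\
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.11

"""

# Headers Karl-Heinz Herrmann quotes from Ben's 22 January message.
AUTOCONVERTED = """\
X-MIME-Autoconverted: from 8bit to base64 by genetikayos.com id m0MJcqEJ007364
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

"""

if __name__ == "__main__":
    print(conversion_evidence(BROKEN))  # []
    print(conversion_evidence(AUTOCONVERTED))
```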
Neil
René Pfeiffer [lynx at luchs.at]
On Jan 23, 2008 at 2019 +0000, Neil Youngman appeared and said:
> On Wednesday 23 January 2008 20:06, Neil Youngman wrote: > > On Wednesday 23 January 2008 19:17, Ben Okopnik wrote: > > > From ben Wed Jan 23 14:08:26 2008 > > > Return-Path: [email protected] > > > Received: from localhost [127.0.0.1] > > > by Tyr with POP3 (fetchmail-6.3.2) > > > for <ben@localhost> (single-drop); Wed, 23 Jan 2008 14:08:26 > > > -0500 (EST) Received: from localhost.localdomain (genetikayos.com > > > [64.246.26.120]) by genetikayos.com (8.12.11.20060308/8.12.11) with ESMTP > > > id m0NJ7G4Q026459 for <[email protected]>; Wed, 23 Jan 2008 11:07:28 > > > -0800 Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with > > > SMTP id 70419937; Wed, 23 Jan 2008 14:07:40 -0500 > > > From: Ben Okopnik <[email protected]> > > > Message-ID: <[email protected]> > > > Date: Wed, 23 Jan 2008 14:07:40 -0500 > > > To: Ben Okopnik <[email protected]> > > > MIME-Version: 1.0 > > > Content-Type: text/plain; charset=utf-8 > > > Content-Disposition: inline > > > Content-Transfer-Encoding: 8bit > > > User-Agent: Mutt/1.5.11 > > > X-Spam-Status: No, score=-1.6 required=5.0 > > > tests=AWL,BAYES_00,MISSING_SUBJECT autolearn=no version=3.1.8 > > > X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on > > > genetikayos.com '' > > > > > > I note that there's no conversion warning in it - and yet, it's broken. > > I missed that there's no mention of "quoted-printable" at all, which suggests > that conversion to quoted-printable is a red herring, absent any other > indication of quoted-printable encoding.
In this case the only thing I can think of is a filter plugin, be it anti-virus, anti-spam or even anti-UTF-8.
Best, René.
René Pfeiffer [lynx at luchs.at]
On Jan 23, 2008 at 1028 -0500, Ben Okopnik appeared and said:
> On Wed, Jan 23, 2008 at 06:35:11AM -0800, Breen Mullins wrote: > > * Ben Okopnik <[email protected]> [2008-01-23 00:09 -0500]: > > > > >On Wed, Jan 23, 2008 at 12:45:37AM +0100, René Pfeiffer wrote: > > >> On Jan 22, 2008 at 1712 -0500, Ben Okopnik appeared and said: > > >> > > >> > To the best of my troubleshooting ability so far, everything breaks > > >> > somewhere between the time that it leaves my mail client and the time > > >> > that it arrives at the LG mail server - but I've checked everything on > > >> > my end, and I'm sending it out with 'utf-8' as the charset and 8 bits > > >> > set for the SMTP transaction. I'm pretty much stuck at that point, and > > >> > have been for a while. > > > > Hmm. Your headers don't look good to me. > > > > ========= > > X-MIME-Autoconverted: from 8bit to quoted-printable by genetikayos.com id > > m0N59WPL030073 > > ====== > > [blink] How the hell did I miss *that*?
I missed it as well.
> Wow. Thanks, Breen - that sounds like the place where it's getting > hosed, all right (I'll test that in a moment by sending it through > another server.) The question is, how do I stop it? Anybody familiar > with that aspect of SMTP?
Yes, Sendmail does a conversion when it's not configured to pass 8-bit data. This can be changed in the mailer flags; AFAIK one has to use the smtp8 mailer for SMTP.
Now I know why I don't see this problem. Postfix passes 8-bit data and PGP/MIME uses quoted-printable encoding to be on the safe side.
> I'm definitely sending out a character set - again, this is even before > it gets to 'bssmtp', Mutt sets it based on the content and it definitely > does the right thing when I have UTF-8 in there. Why it would get > converted is a mystery to me.
I think it's due to the MTA doing the conversion mentioned above.
Best, René.
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 04:56:20PM +0100, René Pfeiffer wrote:
> On Jan 23, 2008 at 1028 -0500, Ben Okopnik appeared and said: > > > Wow. Thanks, Breen - that sounds like the place where it's getting > > hosed, all right (I'll test that in a moment by sending it through > > another server.) The question is, how do I stop it? Anybody familiar > > with that aspect of SMTP? > > Yes, Sendmail does a conversion when it's not configured to pass 8-bit > data. This can be changed in the mailer flags; AFAIK one has to use the > smtp8 mailer for SMTP.
In theory, that's what the 'Bits => 8' was supposed to do - but somehow, it's not having any effect.
> Now I know why I don't see this problem. Postfix passes 8-bit data and > PGP/MIME uses quoted-printable encoding to be on the safe side.
I've been coming around to thinking that I should wrap up my messages as MIME attachments rather than inlining them. Again, a bit of a pain, but better than this.
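One way to do what Ben describes, sketched with Python's `email` package rather than Mutt: put the UTF-8 text in a base64-encoded attachment, so every octet on the wire is US-ASCII and a relay that insists on 7bit MIME has nothing left to convert. `wrap_as_attachment`, the subject line, and the filename are made up for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def wrap_as_attachment(text: str, filename: str) -> bytes:
    """Build a message whose UTF-8 text travels as a base64 attachment."""
    msg = EmailMessage()
    msg["Subject"] = "UTF-8 sample"
    msg.set_content("The UTF-8 text is attached.\n")  # plain ASCII, sent as 7bit
    # base64 keeps every octet of the attachment inside US-ASCII.
    msg.add_attachment(text, subtype="plain", charset="utf-8",
                       cte="base64", filename=filename)
    return msg.as_bytes()


if __name__ == "__main__":
    raw = wrap_as_attachment("самовар\n", "samovar.txt")
    print(all(octet < 0x80 for octet in raw))  # True
    # Round trip: the receiving side gets the original text back.
    parsed = message_from_bytes(raw, policy=policy.default)
    attachment = list(parsed.iter_parts())[1]
    print(attachment.get_content() == "самовар\n")  # True
```

The cost is the one Ben mentions: readers see an attachment instead of inline text, although most mail clients will display a `text/plain` part inline anyway.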
> > I'm definitely sending out a character set - again, this is even before > > it gets to 'bssmtp', Mutt sets it based on the content and it definitely > > does the right thing when I have UTF-8 in there. Why it would get > > converted is a mystery to me. > > I think it's due to the MTA doing the conversion mentioned above.
I'll poke at it in the next day or two and see how it goes.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Breen Mullins [breen.mullins at gmail.com]
* Ben Okopnik <[email protected]> [2008-01-23 10:28 -0500]:
> >[blink] How the hell did I miss *that*? > >Wow. Thanks, Breen - that sounds like the place where it's getting >hosed, all right (I'll test that in a moment by sending it through >another server.) The question is, how do I stop it? Anybody familiar >with that aspect of SMTP?
It's not SMTP. I think René's right - it's a filter of some sort.
Note that after it's changed your Content-Type declaration, it puts the new one right after your subject line and just before the spamassassin line - which makes me think that the filter on that server is getting called just before SA.
Looks like it's just 'smart' enough to get triggered only part of the time.
(FWIW, your message was in Quoted-Printable by the time I got it from the list.)
Breen
-- Breen Mullins Menlo Park, California
Ben Okopnik [ben at linuxgazette.net]
On Wed, Jan 23, 2008 at 09:19:13PM -0800, Breen Mullins wrote:
> * Ben Okopnik <[email protected]> [2008-01-23 10:28 -0500]: > > > > >[blink] How the hell did I miss *that*? > > > >Wow. Thanks, Breen - that sounds like the place where it's getting > >hosed, all right (I'll test that in a moment by sending it through > >another server.) The question is, how do I stop it? Anybody familiar > >with that aspect of SMTP? > > It's not SMTP. I think René's right - it's a filter of some sort.
Actually, it turns out that the Net::SMTP module on my end is screwing it up. I was pretty sure that was it by this point - since doing a manual SMTP transaction with the LG mail server made the UTF-8 content come through without any problems - and then I found a smoking gun.
René's idea of using 'wireshark' got me started on troubleshooting the actual transaction nitty-gritty; I used 'tcpdump'... which, of course, showed nothing since I'm using SSH to port-forward LG:25 to my localhost:2025 (duh!) So then, I set 'Debug => 1' in Net::SMTP, and saw the following:
Net::SMTP>>> Net::SMTP(2.30)
Net::SMTP>>> Net::Cmd(2.27)
Net::SMTP>>> Exporter(5.58)
Net::SMTP>>> IO::Socket::INET(1.31)
Net::SMTP>>> IO::Socket(1.30)
Net::SMTP>>> IO::Handle(1.27)
Net::SMTP=GLOB(0x1150240)<<< 220 genetikayos.com ESMTP Sendmail 8.12.11.20060308/8.12.11; Thu, 24 Jan 2008 07:38:36 -0800
Net::SMTP=GLOB(0x1150240)>>> EHLO localhost.localdomain
Net::SMTP=GLOB(0x1150240)<<< 250-genetikayos.com Hello genetikayos.com [64.246.26.120], pleased to meet you
Net::SMTP=GLOB(0x1150240)<<< 250-ENHANCEDSTATUSCODES
Net::SMTP=GLOB(0x1150240)<<< 250-PIPELINING
Net::SMTP=GLOB(0x1150240)<<< 250-8BITMIME
Net::SMTP=GLOB(0x1150240)<<< 250-SIZE
Net::SMTP=GLOB(0x1150240)<<< 250-DSN
Net::SMTP=GLOB(0x1150240)<<< 250-ETRN
Net::SMTP=GLOB(0x1150240)<<< 250-AUTH GSSAPI
Net::SMTP=GLOB(0x1150240)<<< 250-STARTTLS
Net::SMTP=GLOB(0x1150240)<<< 250-DELIVERBY
Net::SMTP=GLOB(0x1150240)<<< 250 HELP
Net::SMTP=GLOB(0x1150240)>>> MAIL FROM:<[email protected]> BODY=8BITMIME
Net::SMTP=GLOB(0x1150240)<<< 250 2.1.0 <[email protected]>... Sender ok
Net::SMTP=GLOB(0x1150240)>>> RCPT TO:<[email protected]>
Net::SMTP=GLOB(0x1150240)<<< 250 2.1.5 <[email protected]>... Recipient ok
Net::SMTP=GLOB(0x1150240)>>> DATA
Net::SMTP=GLOB(0x1150240)<<< 354 Enter mail, end with "." on a line by itself
Net::SMTP=GLOB(0x1150240)>>> Received: from localhost ([127.0.0.1]) by Fenrir (bssmtp 0.3) with SMTP id 4796104;
Net::SMTP=GLOB(0x1150240)>>>     Thu, 24 Jan 2008 10:38:10 -0500
Net::SMTP=GLOB(0x1150240)>>> From: Ben Okopnik <[email protected]>
Net::SMTP=GLOB(0x1150240)>>> Message-ID: <[email protected]>
Net::SMTP=GLOB(0x1150240)>>> Date: Thu, 24 Jan 2008 10:38:09 -0500
Net::SMTP=GLOB(0x1150240)>>> To: Ben Okopnik <[email protected]>
Net::SMTP=GLOB(0x1150240)>>> Subject: Ttt-ttt
Net::SMTP=GLOB(0x1150240)>>> MIME-Version: 1.0
Net::SMTP=GLOB(0x1150240)>>> Content-Type: text/plain; charset=utf-8
Net::SMTP=GLOB(0x1150240)>>> Content-Disposition: inline
Net::SMTP=GLOB(0x1150240)>>> Content-Transfer-Encoding: 8bit
Net::SMTP=GLOB(0x1150240)>>> User-Agent: Mutt/1.5.11
Net::SMTP=GLOB(0x1150240)>>>
Net::SMTP=GLOB(0x1150240)>>> test
Net::SMTP=GLOB(0x1150240)>>> <b5><91>
Net::SMTP=GLOB(0x1150240)>>> .
Net::SMTP=GLOB(0x1150240)<<< 250 2.0.0 m0OFcaKj019698 Message accepted for delivery
Jan 24 10:39:25 bssmtp: Removing c-4796104 and m-4796104
Jan 24 10:39:25 bssmtp: Unlocking message 4796104
Net::SMTP=GLOB(0x1150240)>>> QUIT
Net::SMTP=GLOB(0x1150240)<<< 221 2.0.0 genetikayos.com closing connection

I don't know if the above characters are going to come through, but the content (lines 7 and 8 from the bottom) is already munged. Game, set, and match - it's on my end.
Net::SMTP does not appear to have any user-controllable handles for twiddling this kind of thing, so I'm going to a) see if I can fix the internal bits in it, b) if I can't do that in a reasonably short amount of time, I'm going to give up and replicate the above manually, /sans/ conversion, and c) file a bug report.
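Ben's option (b), speaking SMTP directly so that nothing re-encodes the body, looks roughly like this with Python's `smtplib`. It is a sketch, not Ben's actual fix (his setup is Mutt, bssmtp and Perl's Net::SMTP), and `send_8bit` is a hypothetical helper. The relevant `smtplib` detail is that a message passed to `sendmail()` as bytes goes out without re-encoding, while a str is encoded as ASCII and fails on UTF-8 text.

```python
import smtplib


def send_8bit(smtp, sender: str, recipients: list[str], raw: bytes) -> dict:
    """Send a pre-formatted message without letting smtplib re-encode it.

    `smtp` is a connected smtplib.SMTP instance.  Bytes are sent unchanged
    apart from SMTP dot-stuffing and the terminating "." line; a str would
    be encoded as ASCII, raising UnicodeEncodeError for UTF-8 text.
    """
    smtp.ehlo_or_helo_if_needed()
    if not smtp.has_extn("8bitmime"):
        # RFC 1652: no 8BITMIME, no 8-bit content.  Refuse rather than
        # silently converting.
        raise smtplib.SMTPNotSupportedError("server does not advertise 8BITMIME")
    # BODY=8BITMIME is the same MAIL FROM parameter seen in Ben's debug log.
    return smtp.sendmail(sender, recipients, raw, mail_options=["BODY=8BITMIME"])


# Typical use, e.g. through an SSH tunnel to localhost:2025 as in the thread:
#
#     with smtplib.SMTP("localhost", 2025) as smtp:
#         smtp.set_debuglevel(1)  # print the dialogue, like Net::SMTP's Debug => 1
#         send_8bit(smtp, sender, [recipient], raw_message_bytes)
```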
Thanks very much for the help, everybody!
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
Woo-HOO. Nailed it.
Markus Kuhn's UTF-8 test file:
[[[ Elided for publication. You can see it at: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ]]]
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Karl-Heinz Herrmann [kh1 at khherrmann.de]
Hi,
I see the same garbage in my mailer (sylpheed-claws), which usually does support UTF-8.
A look in the header says....
On Tue, 22 Jan 2008 14:39:21 -0500 Ben Okopnik <[email protected]> wrote:
> X-MIME-Autoconverted: from 8bit to base64 by genetikayos.com id m0MJcqEJ007364 > translation Content-Type: text/plain; charset="utf-8" > Content-Transfer-Encoding: base64
But if I force UTF-8, the characters look a bit different (fewer chars per group). If I force the base64 decode as well, I get:
> [Error decoding BASE64] > [Error decoding BASE64] > [Error decoding BASE64] > [Error decoding BASE64] > [Error decoding BASE64]
So something must have decoded the base64 already (and it seems not to tell in the header) and messed it up.
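Karl-Heinz's two observations can be reproduced with a short Python sketch (an illustration, not what sylpheed-claws does internally; `decodes_as_base64` is a hypothetical helper). UTF-8 Cyrillic uses two octets per letter, so a viewer that treats the octets as a single-byte charset shows twice as many characters as one that decodes UTF-8, which may account for the "fewer chars per group" effect. And bytes that have already been decoded are no longer in the base64 alphabet, so a forced base64 decode can only fail.

```python
import base64
import binascii


def decodes_as_base64(data: bytes) -> bool:
    """True only if data is strictly valid base64 (alphabet and padding)."""
    try:
        base64.b64decode(data, validate=True)
    except binascii.Error:
        return False
    return True


word = "самовар"             # "samovar": seven Cyrillic letters
utf8 = word.encode("utf-8")  # fourteen octets, two per letter

# Decoded with the right charset, each letter is one character...
as_utf8 = utf8.decode("utf-8")
# ...but read as a single-byte charset such as Latin-1, each letter
# turns into two characters of junk.
as_latin1 = utf8.decode("latin-1")

if __name__ == "__main__":
    print(len(as_utf8), len(as_latin1))               # 7 14
    print(decodes_as_base64(utf8))                    # False
    print(decodes_as_base64(base64.b64encode(utf8)))  # True
```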
K.-H.