* smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-08-20 10:26 UTC
To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

I switched today to using ~/.authinfo with smtpmail in Emacs 24 for the first time, and immediately hit a snag: sending mail failed with an error message from the SMTP server claiming that my login credentials were incorrect.

It turned out that ~/.authinfo _must_ have Unix EOLs, or else sending mail with smtpmail does not work. This happens because auth-source-search is called from smtpmail inside a form that let-binds coding-system-for-read to `binary'. That binding is there for reasons that have nothing to do with auth-source-search, and a cursory search finds no similar bindings in other users of auth-source-search.

It should be easy to fix this, but I need to know what can be in Netrc files to do this correctly. Can these files include non-ASCII characters, or do all fields in these files have to be strict 7-bit ASCII? If non-ASCII characters are allowed, then are there any limitations on the charsets that can be used in Netrc files, or can they be anything at all in any valid encoding?

Also, is there any need to do something special with non-ASCII characters (if they are allowed) when communicating with the SMTP server, like encoding them in some particular way?

Given the answers to these questions, fixing the above problem should be as simple as adding a few lines to smtpmail-via-smtp.

TIA
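To make the failure concrete, here is a small Python sketch. It is not Emacs code, and the file contents and the `tokens` helper are hypothetical. It assumes a reader that splits fields on spaces only, so a carriage return left over from a DOS line ending stays attached to the last field on the line, which is typically the password:

```python
def tokens(text):
    """Return the fields of the first line of TEXT, split on spaces only.

    Splitting on " " rather than on any whitespace is a simplifying
    assumption: it models a reader that does not treat CR as a separator.
    """
    first_line = text.split("\n")[0]
    return first_line.split(" ")


# A hypothetical one-entry authinfo file saved with DOS (CRLF) line endings.
raw = b"machine smtp.example.com login eli password secret\r\n"

# Read "as binary": each byte becomes one character and the CR is kept.
as_binary = raw.decode("latin-1")
# Read with EOL conversion: CRLF becomes LF before any parsing happens.
as_converted = as_binary.replace("\r\n", "\n")

print(repr(tokens(as_binary)[-1]))     # 'secret\r' (the server sees a wrong password)
print(repr(tokens(as_converted)[-1]))  # 'secret'
```

Converting CRLF to LF before tokenizing, which is what an EOL-aware coding system does, removes the stray CR.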
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-08-21 4:39 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> It turned out that ~/.authinfo _must_ have Unix EOLs, or else sending
> mail with smtpmail does not work. This happens because auth-source-search
> is called from smtpmail inside a form that let-binds
> coding-system-for-read to `binary'. That binding is there for reasons
> that have nothing to do with auth-source-search, and a cursory search
> finds no similar bindings in other users of auth-source-search.

Yes, that sounds like an accident. Perhaps that let binding should be narrowed dramatically? It's bad practice to bind variables like that over non-relevant function calls.

> It should be easy to fix this, but I need to know what can be in Netrc
> files to do this correctly. Can these files include non-ASCII
> characters, or do all fields in these files have to be strict 7-bit
> ASCII?

There can basically be anything in the files, I think, and the encoding is local. But it's unusual to put non-ASCII into the file for most protocols, since so many protocols developed their auth schemes before anybody had considered the problem of coding systems.

> Also, is there any need to do something special with non-ASCII
> characters (if they are allowed) when communicating with the SMTP
> server, like encoding them in some particular way?

It... varies. :-) SMTP allows using several AUTH methods, and I'm not sure whether any of them actually specify what charset to use. DIGEST-MD5 does, I think? But smtpmail.el doesn't support it, anyway.

I think AUTH PLAIN, for instance, is essentially a binary thing, where you're allowed to use any blob of bytes as user name and password. Except NULs.
This is just from memory, so if somebody knows better, please correct me...

--
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/
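For reference, RFC 4616 defines the SASL PLAIN message as an optional authorization identity, a NUL, the user name, a NUL, and the password. SMTP AUTH PLAIN sends that message base64-encoded. The Python sketch below uses a hypothetical helper name. It treats the credentials as opaque bytes, in line with the description above, and rejects only NUL, because a NUL would break the field separation. RFC 4616 itself describes these fields as UTF-8 text, and the sketch does not check for that.

```python
import base64


def auth_plain_response(user, password, authzid=b""):
    """Build the base64 argument for SMTP 'AUTH PLAIN' from byte strings.

    Layout (RFC 4616): authzid NUL user NUL password, then base64.
    The bytes are passed through unchanged; only NUL is refused.
    """
    for field in (authzid, user, password):
        if b"\x00" in field:
            raise ValueError("NUL bytes cannot appear in AUTH PLAIN fields")
    message = authzid + b"\x00" + user + b"\x00" + password
    return base64.b64encode(message).decode("ascii")


# A Latin-1 password goes through byte for byte; only base64 is applied.
password = "h\u00e9llo".encode("iso-8859-15")  # "héllo" as 5 bytes
print("AUTH PLAIN " + auth_plain_response(b"eli", password))
```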
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-08-21 6:12 UTC
To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 21 Aug 2011 06:39:12 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > It turned out that ~/.authinfo _must_ have Unix EOLs, or else sending
> > mail with smtpmail does not work. This happens because auth-source-search
> > is called from smtpmail inside a form that let-binds
> > coding-system-for-read to `binary'. That binding is there for reasons
> > that have nothing to do with auth-source-search, and a cursory search
> > finds no similar bindings in other users of auth-source-search.
>
> Yes, that sounds like an accident. Perhaps that let binding should be
> narrowed dramatically?

You should know: you put it there ;-)

The log message for revision 104742, where these bindings were introduced, doesn't say much. Can you tell me why you needed them (for Windows, no less)?

> > It should be easy to fix this, but I need to know what can be in Netrc
> > files to do this correctly. Can these files include non-ASCII
> > characters, or do all fields in these files have to be strict 7-bit
> > ASCII?
>
> There can basically be anything in the files, I think, and the encoding
> is local. But it's unusual to put non-ASCII into the file for most
> protocols, since so many protocols developed their auth schemes before
> anybody had considered the problem of coding systems.
>
> > Also, is there any need to do something special with non-ASCII
> > characters (if they are allowed) when communicating with the SMTP
> > server, like encoding them in some particular way?
>
> It... varies. :-) SMTP allows using several AUTH methods, and I'm
> not sure whether any of them actually specify what charset to
> use. DIGEST-MD5 does, I think? But smtpmail.el doesn't support it,
> anyway.
>
> I think AUTH PLAIN, for instance, is essentially a binary
> thing, where you're allowed to use any blob of bytes as user name and
> password. Except NULs.

This tells me that TRT is to bind coding-system-for-read to raw-text for auth-source-search to do its thing. But I'm still uncertain what the binding should be in the rest of smtpmail-via-smtp.
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-08-21 19:25 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> You should know: you put it there ;-)
>
> The log message for revision 104742, where these bindings were
> introduced, doesn't say much. Can you tell me why you needed them (for
> Windows, no less)?

I don't see any Windows special-casing there?

Anyway, they're for the `open-network-stream' call. I've now wrapped them closer around that call, which should probably fix the problem.
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-08-21 19:59 UTC
To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 21 Aug 2011 21:25:55 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > You should know: you put it there ;-)
> >
> > The log message for revision 104742, where these bindings were
> > introduced, doesn't say much. Can you tell me why you needed them (for
> > Windows, no less)?
>
> I don't see any Windows special-casing there?

I was quoting your ChangeLog entry:

  2011-06-27  Lars Magne Ingebrigtsen  <larsi@gnus.org>

        * mail/smtpmail.el (smtpmail-via-smtp): Bind coding-system-for-*
        to binary to possibly avoid line encoding issues on Windows (among
        other things).

Btw, to _avoid_ line encoding issues on Windows, one should NOT bind coding-system-for-read to `binary', because that binding brings the CR-LF EOLs right into Emacs buffers.

> Anyway, they're for the `open-network-stream' call. I've now wrapped
> them closer around that call, which should probably fix the problem.

Only partially. Since you say netrc files can have non-ASCII characters, we should bind coding-system-for-read to raw-text when calling auth-source-search. I will take care of that.

Thanks.
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-08-21 20:17 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I was quoting your ChangeLog entry:
>
>   2011-06-27  Lars Magne Ingebrigtsen  <larsi@gnus.org>
>
>         * mail/smtpmail.el (smtpmail-via-smtp): Bind coding-system-for-*
>         to binary to possibly avoid line encoding issues on Windows (among
>         other things).
>
> Btw, to _avoid_ line encoding issues on Windows, one should NOT bind
> coding-system-for-read to `binary', because that binding brings the
> CR-LF EOLs right into Emacs buffers.

Yes, and that's what we want, since SMTP uses CRLF as the line ending. And now I remember what the problem was -- under Windows it would infloop looking for CRLF, and never getting it, since Emacs helpfully auto-translated CRLF to newline under Windows.
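SMTP ends every reply line with CRLF. The Python sketch below (hypothetical names, not smtpmail's code) shows why a reader that waits for CRLF never finishes if EOL conversion has already turned the CRLFs into plain newlines before the reader sees the data:

```python
def complete_lines(buffer):
    """Return the CRLF-terminated lines found in BUFFER (bytes).

    Trailing data without a CRLF counts as incomplete, so a caller that
    keeps waiting for "one more complete line" would wait forever.
    """
    lines = []
    start = 0
    while True:
        end = buffer.find(b"\r\n", start)
        if end == -1:
            return lines
        lines.append(buffer[start:end])
        start = end + 2


# A hypothetical EHLO reply as it arrives on the wire.
wire = b"250-smtp.example.com\r\n250 AUTH PLAIN LOGIN\r\n"
# The same data after EOL conversion has replaced CRLF with LF.
translated = wire.replace(b"\r\n", b"\n")

print(complete_lines(wire))        # two complete reply lines
print(complete_lines(translated))  # [] because the terminator never shows up
```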
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-08-22 5:35 UTC
To: Lars Magne Ingebrigtsen; +Cc: emacs-devel

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 21 Aug 2011 22:17:44 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I was quoting your ChangeLog entry:
> >
> >   2011-06-27  Lars Magne Ingebrigtsen  <larsi@gnus.org>
> >
> >         * mail/smtpmail.el (smtpmail-via-smtp): Bind coding-system-for-*
> >         to binary to possibly avoid line encoding issues on Windows (among
> >         other things).
> >
> > Btw, to _avoid_ line encoding issues on Windows, one should NOT bind
> > coding-system-for-read to `binary', because that binding brings the
> > CR-LF EOLs right into Emacs buffers.
>
> Yes, and that's what we want, since SMTP uses CRLF as the line ending.
> And now I remember what the problem was -- under Windows it would
> infloop looking for CRLF, and never getting it, since Emacs helpfully
> auto-translated CRLF to newline under Windows.

??? The same auto-translation happens on Unix as well. Are you saying this problem happened only on Windows?
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-09-10 19:01 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Yes, and that's what we want, since SMTP uses CRLF as the line ending.
>> And now I remember what the problem was -- under Windows it would
>> infloop looking for CRLF, and never getting it, since Emacs helpfully
>> auto-translated CRLF to newline under Windows.
>
> ??? The same auto-translation happens on Unix as well. Are you saying
> this problem happened only on Windows?

Yup.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-25 12:33 UTC
To: emacs-devel

On Sun, 21 Aug 2011 22:59:03 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

EZ> Since [Lars says] netrc files can have non-ASCII characters, we
EZ> should bind coding-system-for-read to raw-text when calling
EZ> auth-source-search. I will take care of that.

Thank you. This was set to binary by historical accident, as you guessed.

The Emacs authinfo/netrc format, incidentally, has evolved from the original because we use the Lisp reader to consume tokens. So, for instance, we can handle quoted strings, which do not work in other consumers of netrc-style files, notably libcurl and thus curl and Git.

Thus the Emacs format is backwards compatible but older netrc consumers can't necessarily read our tokens, so I think it's OK that we go further and explicitly allow Unicode characters through UTF-8. Would it make sense, then, to explicitly use utf-8 or auto-guess for the encoding instead of raw-text? There is no standard that says it should be UTF-8, but that would be the cleanest compatibility path to allow older consumers still using ASCII to read our netrc files.

I don't know the Emacs reading/writing coding systems well, so any suggestions or ideas you have are most welcome.

Thanks
Ted
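The point about quoted strings can be illustrated with a small tokenizer. This is a simplified Python sketch, not auth-source.el's actual parser. It assumes only that double quotes group a token that contains spaces, which a plain whitespace-splitting netrc reader would break apart:

```python
def netrc_tokens(line):
    """Split LINE into tokens, keeping double-quoted values whole.

    Simplified and hypothetical: no backslash escapes, and an unterminated
    quote runs to the end of the line.
    """
    tokens = []
    i = 0
    while i < len(line):
        if line[i].isspace():
            i += 1
        elif line[i] == '"':
            end = line.find('"', i + 1)
            if end == -1:
                end = len(line)
            tokens.append(line[i + 1:end])
            i = end + 1
        else:
            end = i
            while end < len(line) and not line[end].isspace():
                end += 1
            tokens.append(line[i:end])
            i = end
    return tokens


line = 'machine smtp.example.com login "john doe" password "two words"'
print(netrc_tokens(line))  # quoted values stay whole
print(line.split())        # a plain whitespace split breaks them apart
```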
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-25 12:48 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sun, 25 Sep 2011 07:33:20 -0500
> Reply-To: emacs-devel@gnu.org
>
> Thus the Emacs format is backwards compatible but older netrc consumers
> can't necessarily read our tokens, so I think it's OK that we go further
> and explicitly allow Unicode characters through UTF-8. Would it make
> sense, then, to explicitly use utf-8 or auto-guess for the encoding
> instead of raw-text?

Only if either (a) we encode the responses we send to the SMTP server during handshake, or (b) SMTP servers support UTF-8 encoding in the strings they expect to receive.

Lars said "encoding is local", which suggests that neither of the above is true. raw-text leaves the byte stream unchanged and only converts the EOL, so a netrc file encoded in some locale-specific way has a better chance with SMTP servers from the same locale.

IOW, to answer your question, someone who knows more than I do about communications with SMTP servers should tell us how, if at all, non-ASCII characters are supposed to be handled when communicating with the server.
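The raw-text behaviour described above, with EOL conversion only and all other bytes left as they are, can be approximated in a few lines of Python. This is an illustration under that description, not Emacs's decoder, and the helper name is hypothetical:

```python
def raw_text_decode(data):
    """Approximate raw-text decoding: convert CRLF line endings to LF
    and leave every other byte exactly as it was."""
    return data.replace(b"\r\n", b"\n")


# A hypothetical authinfo line whose password was typed in ISO-8859-15.
stored = "machine smtp.example.com password h\u00e9llo\r\n".encode("iso-8859-15")

decoded = raw_text_decode(stored)
print(decoded)  # b'machine smtp.example.com password h\xe9llo\n'
```

The locale-specific byte 0xE9 reaches the caller untouched, which is why this choice gives a same-locale SMTP server the best chance of accepting the password.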
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-25 13:21 UTC
To: emacs-devel

On Sun, 25 Sep 2011 08:48:07 -0400 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Sun, 25 Sep 2011 07:33:20 -0500
>> Reply-To: emacs-devel@gnu.org
>>
>> Thus the Emacs format is backwards compatible but older netrc consumers
>> can't necessarily read our tokens, so I think it's OK that we go further
>> and explicitly allow Unicode characters through UTF-8. Would it make
>> sense, then, to explicitly use utf-8 or auto-guess for the encoding
>> instead of raw-text?

EZ> Only if either (a) we encode the responses we send to the SMTP server
EZ> during handshake, or (b) SMTP servers support UTF-8 encoding in the
EZ> strings they expect to receive.

EZ> Lars said "encoding is local", which suggests that neither of the above
EZ> is true. raw-text leaves the byte stream unchanged, and only converts
EZ> the EOL, so a netrc file encoded in some locale-specific way has a
EZ> better chance with SMTP servers from the same locale.

EZ> IOW, to answer your question, someone who knows more than I do about
EZ> communications with SMTP servers should tell us how, if at all,
EZ> non-ASCII characters are supposed to be handled when communicating
EZ> with the server.

I don't think the SMTP interaction should be the critical factor here. The SMTP library should deal with invalid (for SMTP) characters on its side; many other libraries and protocols that use `auth-source-search' can handle non-ASCII characters. In other words, let's not limit the capabilities of `auth-source-search' just because one of the users can't handle non-ASCII.

I think authinfo/netrc files should be portable and support Unicode in a way that enables other (older or new!) software to use them too. IMHO enforcing UTF-8 encoding is the best way to achieve that.

Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-25 17:08 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sun, 25 Sep 2011 08:21:36 -0500
>
> I don't think the SMTP interaction should be the critical factor
> here. The SMTP library should deal with invalid (for SMTP) characters
> on its side; many other libraries and protocols that use `auth-source-search'
> can handle non-ASCII characters. In other words, let's not limit
> the capabilities of `auth-source-search' just because one of the users
> can't handle non-ASCII.
>
> I think authinfo/netrc files should be portable and support Unicode in a
> way that enables other (older or new!) software to use them too. IMHO
> enforcing UTF-8 encoding is the best way to achieve that.

Fine with me, but then Someone™ should simultaneously modify smtpmail (and perhaps also other users of authinfo) to DTRT when communicating with the SMTP server, whatever "TRT" may mean in this case. Do one, but not the other, and we will have a bug waiting to happen on our hands.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 14:41 UTC
To: emacs-devel

On Sun, 25 Sep 2011 20:08:30 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Sun, 25 Sep 2011 08:21:36 -0500
>>
>> I don't think the SMTP interaction should be the critical factor
>> here. The SMTP library should deal with invalid (for SMTP) characters
>> on its side; many other libraries and protocols that use `auth-source-search'
>> can handle non-ASCII characters. In other words, let's not limit
>> the capabilities of `auth-source-search' just because one of the users
>> can't handle non-ASCII.
>>
>> I think authinfo/netrc files should be portable and support Unicode in a
>> way that enables other (older or new!) software to use them too. IMHO
>> enforcing UTF-8 encoding is the best way to achieve that.

EZ> Fine with me, but then Someone™ should simultaneously modify smtpmail
EZ> (and perhaps also other users of authinfo) to DTRT when communicating
EZ> with the SMTP server, whatever "TRT" may mean in this case. Do one,
EZ> but not the other, and we will have a bug waiting to happen on our hands.

I have a pretty good handle on the `auth-source-search' users in the Emacs space. More importantly, this makes no difference on the API user's side. With raw-text they also get potentially unsafe characters, right? We're just going to enforce UTF-8 as the non-ASCII encoding, and in the case of ASCII data UTF-8 is the same as unencoded.

Could you help me (or point me to the right examples) to:

- always create/write a file in UTF-8 on every platform
- opportunistically open the file in binary, raw-text, UTF-8, etc. on every platform

I'll use your suggestions in auth-source.el's authinfo/netrc backend.

Thanks
Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-26 16:18 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 26 Sep 2011 09:41:08 -0500
>
> With raw-text they also get potentially unsafe characters, right?

They get what they put in the file. If we assume that what's there is acceptable to their SMTP server, it's "safe".

> Could you help me (or point me to the right examples) to:
>
> - always create/write a file in UTF-8 on every platform

You mean, force Emacs to encode .authinfo in UTF-8 when creating it? I guess that's the job for file-coding-system-alist.

> - opportunistically open the file in binary, raw-text, UTF-8, etc. on
>   every platform

Sorry, I don't understand what you'd like to do. Please elaborate, and I will gladly try to help.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 16:53 UTC
To: emacs-devel

On Mon, 26 Sep 2011 19:18:39 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 26 Sep 2011 09:41:08 -0500
>>
>> With raw-text they also get potentially unsafe characters, right?

EZ> They get what they put in the file. If we assume that what's there is
EZ> acceptable to their SMTP server, it's "safe".

Exactly. So the UTF-8 encoding won't change anything; it will only make it easier for the netrc/authinfo file to be shared :)

>> Could you help me (or point me to the right examples) to:
>>
>> - always create/write a file in UTF-8 on every platform

EZ> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
EZ> I guess that's the job for file-coding-system-alist.

So I would just override that when writing the netrc/authinfo file. I can't imagine any value in letting the user override the UTF-8 encoding, can you?

>> - opportunistically open the file in binary, raw-text, UTF-8, etc. on
>>   every platform

EZ> Sorry, I don't understand what you'd like to do. Please elaborate,
EZ> and I will gladly try to help.

There must be netrc/authinfo files written in binary encoding, because that was the default. I'd like to open them, but also open UTF-8 encoded netrc/authinfo files, and also accept raw-text or any other reasonably guessed encoding. For UTF-8 there are heuristics, but Emacs has them built in, right? So I don't have to write special code to guess?

The alternative is to try as utf-8, then try binary, then give up. But that's less friendly to the user, I think.

Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-26 17:15 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 26 Sep 2011 11:53:18 -0500
>
> >> Could you help me (or point me to the right examples) to:
> >>
> >> - always create/write a file in UTF-8 on every platform
>
> EZ> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
> EZ> I guess that's the job for file-coding-system-alist.
>
> So I would just override that when writing the netrc/authinfo file. I
> can't imagine any value in letting the user override the UTF-8 encoding,
> can you?

No, I cannot.

> >> - opportunistically open the file in binary, raw-text, UTF-8, etc. on
> >>   every platform
>
> EZ> Sorry, I don't understand what you'd like to do. Please elaborate,
> EZ> and I will gladly try to help.
>
> There must be netrc/authinfo files written in binary encoding because
> that was the default. I'd like to open them, but also open UTF-8
> encoded netrc/authinfo files, and also accept raw-text or any other
> reasonably guessed encoding. For UTF-8 there are heuristics but Emacs
> has them built-in, right? So I don't have to write special code to
> guess?
>
> The alternative is to try as utf-8, then try binary, then give up. But
> that's less friendly to the user I think.

Just let Emacs do its usual guesswork.
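For comparison, the fallback described above ("try as utf-8, then try binary") could look like the Python sketch below. It only illustrates that alternative and is not how Emacs's own coding-system detection works. The name `guess_decode` is hypothetical, and Latin-1 decoding stands in for `binary` because it maps each byte to exactly one character, so nothing is lost.

```python
def guess_decode(data):
    """Return (text, coding) for DATA: strict UTF-8 if that succeeds,
    otherwise a byte-preserving Latin-1 decoding labelled 'binary'."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps bytes 0-255 to the code points with the same values,
        # so text.encode("latin-1") gives back the original bytes.
        return data.decode("latin-1"), "binary"


print(guess_decode("h\u00e9llo".encode("utf-8")))        # decoded as UTF-8
print(guess_decode("h\u00e9llo".encode("iso-8859-15")))  # falls back to bytes
```

One caveat is that a file in a legacy encoding can occasionally also be valid UTF-8 by accident. In that case a guess like this silently picks the wrong coding.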
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-26 17:23 UTC
To: emacs-devel

> Date: Mon, 26 Sep 2011 20:15:52 +0300
> From: Eli Zaretskii <eliz@gnu.org>
>
> > From: Ted Zlatanov <tzz@lifelogs.com>
> > Date: Mon, 26 Sep 2011 11:53:18 -0500
> >
> > >> Could you help me (or point me to the right examples) to:
> > >>
> > >> - always create/write a file in UTF-8 on every platform
> >
> > EZ> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
> > EZ> I guess that's the job for file-coding-system-alist.
> >
> > So I would just override that when writing the netrc/authinfo file.

It may be worthwhile to add this permanently to the alist we maintain in Emacs.

> > I can't imagine any value in letting the user override the UTF-8
> > encoding, can you?
>
> No, I cannot.

That said, users can always override it if they want with "C-x RET c".
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 17:31 UTC
To: emacs-devel

On Mon, 26 Sep 2011 20:15:52 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 26 Sep 2011 11:53:18 -0500
>>
>> >> Could you help me (or point me to the right examples) to:
>> >>
>> >> - always create/write a file in UTF-8 on every platform
>>
EZ> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
EZ> I guess that's the job for file-coding-system-alist.
>>
>> So I would just override that when writing the netrc/authinfo file. I
>> can't imagine any value in letting the user override the UTF-8 encoding,
>> can you?

EZ> No, I cannot.

OK, I can add that. Stefan, would you consider this a bug fix (since the previous writes were broken)? If not, I'll hold off until the pretest is done.

>> >> - opportunistically open the file in binary, raw-text, UTF-8, etc. on
>> >>   every platform
...
EZ> Just let Emacs do its usual guesswork.

Great, thanks for the advice.

Ted
* Re: smtpmail and ~/.authinfo
From: Stefan Monnier @ 2011-09-26 17:00 UTC
To: Eli Zaretskii; +Cc: emacs-devel

> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
> I guess that's the job for file-coding-system-alist.

Or adding a -*- coding: utf-8 -*- cookie to the file.


        Stefan
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 17:28 UTC
To: emacs-devel

On Mon, 26 Sep 2011 13:00:47 -0400 Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:

>> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
>> I guess that's the job for file-coding-system-alist.

SM> Or adding a -*- coding: utf-8 -*- cookie to the file.

That's not standard, so a netrc/authinfo file created by someone else would not have it and we're back to guessing. Better to guess on read, enforce UTF-8 on write IMO.

Ted
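The cookie approach and the objection to it can both be seen in a rough Python sketch of cookie detection. It assumes, for simplicity, that the cookie must be on the first line; Emacs's real rules accept more forms. A file written by another tool has no cookie, so the function returns None and the reader is back to guessing. The names here are hypothetical.

```python
import re

# Matches e.g. "-*- coding: utf-8 -*-" or "-*- mode: foo; coding: latin-1 -*-".
# Simplified: real Emacs cookie parsing is more permissive than this.
COOKIE = re.compile(rb"-\*-.*?\bcoding:\s*([-\w.]+).*?-\*-")


def coding_cookie(data):
    """Return the coding named by a -*- ... -*- cookie on the first line of
    DATA (bytes), or None if the first line carries no such cookie."""
    first_line = data.split(b"\n", 1)[0]
    match = COOKIE.search(first_line)
    return match.group(1).decode("ascii") if match else None


print(coding_cookie(b"-*- coding: utf-8 -*-\nmachine a login b password c\n"))
print(coding_cookie(b"machine a login b password c\n"))  # no cookie: None
```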
* Re: smtpmail and ~/.authinfo
From: Stefan Monnier @ 2011-09-26 21:27 UTC
To: emacs-devel

>>> You mean, force Emacs to encode .authinfo in UTF-8 when creating it?
>>> I guess that's the job for file-coding-system-alist.
SM> Or adding a -*- coding: utf-8 -*- cookie to the file.
> That's not standard, so a netrc/authinfo file created by someone else
> would not have it and we're back to guessing. Better to guess on read,
> enforce UTF-8 on write IMO.

For the "read, modify, write" case, if the guess is wrong, the write will not magically be fixed by using utf-8.

What do the files that we generated until now contain? utf-8? Something else?


        Stefan
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-09-26 18:04 UTC
To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> I think authinfo/netrc files should be portable and support Unicode in a
> way that enables other (older or new!) software to use them too. IMHO
> enforcing UTF-8 encoding is the best way to achieve that.

That's not realistic, I think.

Look, these protocols (SMTP, NNTP, pop3, etc) are really old. Most of them were created in a "just send ASCII" world, which then morphed into a "just send 8bit, just make sure you don't send any null bytes" world, which then again sort of morphed into a world that's somewhat cognisant of charsets server-side.

But for NNTP basic auth, for instance, it's perfectly valid to use the five-byte sequence representing "héllo" in iso-8859-15 as the password, if that's what the user has set up, and it's what the NNTP server has stored. (And the same goes for pop3 and SMTP. (For IMAP the situation is different -- there they've actually defined the charset to use, and it's a tweak on utf7.))

So there isn't any wiggle room here. The user has to be able to store a random sequence of bytes into the .authinfo file to be able to contact their servers -- if they have been careless enough to create a non-ASCII user name or password.

Because using non-ASCII credentials is so fraught with problems, almost nobody does it, which is why we don't get many (or any, really) bug reports about this.
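The key fact here is that the server compares bytes, not characters. A small Python illustration, assuming the password "héllo" was set up on the server in ISO-8859-15 and that the client sends whatever bytes are in ~/.authinfo unchanged:

```python
password = "h\u00e9llo"  # "héllo"

# Assumed setup: the server stored the password as ISO-8859-15 bytes.
stored_on_server = password.encode("iso-8859-15")
# The same characters saved to ~/.authinfo as UTF-8 and sent unchanged.
sent_if_utf8 = password.encode("utf-8")

print(len(stored_on_server), len(sent_if_utf8))  # 5 6
print(stored_on_server == sent_if_utf8)          # False: the login fails
```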
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 19:22 UTC
To: emacs-devel

On Mon, 26 Sep 2011 20:04:42 +0200 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:

LMI> Ted Zlatanov <tzz@lifelogs.com> writes:
>> I think authinfo/netrc files should be portable and support Unicode in a
>> way that enables other (older or new!) software to use them too. IMHO
>> enforcing UTF-8 encoding is the best way to achieve that.

LMI> That's not realistic, I think.
...
LMI> So there isn't any wiggle room here. The user has to be able to store a
LMI> random sequence of bytes into the .authinfo file to be able to contact
LMI> their servers -- if they have been careless enough to create a non-ASCII
LMI> user name or password.

I agree 100%. I'm saying we should save the netrc/authinfo file in the UTF-8 coding system instead of raw-text so Unicode characters in there are usable by other programs too.

Forget the `auth-source-search' callers; they won't know or care. There will be no difference to their usage or the data they get.

I believe random bytes can be encoded just fine by UTF-8. If they are read by a program that doesn't know UTF-8 that's a problem, but IMO we can live with it and it's entirely theoretical.

Ted
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-09-26 19:30 UTC
To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> I agree 100%. I'm saying we should save the netrc/authinfo file in the
> UTF-8 coding system instead of raw-text so Unicode characters in there
> are usable by other programs too.

No, if the sequence "héllo" is a five-byte sequence, it should be saved
as such. Otherwise it's not usable to other programs.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 19:48 UTC
To: emacs-devel

On Mon, 26 Sep 2011 21:30:16 +0200 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:

LMI> Ted Zlatanov <tzz@lifelogs.com> writes:
>> I agree 100%. I'm saying we should save the netrc/authinfo file in the
>> UTF-8 coding system instead of raw-text so Unicode characters in there
>> are usable by other programs too.

LMI> No, if the sequence "héllo" is a five-byte sequence, it should be saved
LMI> as such. Otherwise it's not usable to other programs.

That's exactly my point. Right now we save as raw-text, which is not
usable to other programs in the long term. In UTF-8 it would be saved
"as such" because IIRC all codepoints under 255 don't need to be encoded
(and your string goes up to 233).

Ted
* Re: smtpmail and ~/.authinfo
From: Stefan Monnier @ 2011-09-26 21:31 UTC
To: emacs-devel

> That's exactly my point. Right now we save as raw-text, which is not
> usable to other programs in the long term. In UTF-8 it would be saved
> "as such" because IIRC all codepoints under 255 don't need to be encoded
> (and your string goes up to 233).

No, chars from the latin-1 set have identical *Unicode* code points
(i.e. between 128 and 255), but their encoding into utf-8 occupies
2 bytes.

As for saving random bytes, you can't either, at least not in a way that
is supported by all utf-8 implementations.

I think raw-text is more likely to work, based on what Lars says.


        Stefan
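Stefan's correction can be checked directly. This small Python snippet (an illustration only, not part of auth-source) shows that "é" has the same code point, 0xE9, in Latin-1 and Unicode, yet UTF-8 spends two bytes on it, so "héllo" is five bytes in Latin-1 but six in UTF-8.

```python
word = "héllo"

# Latin-1 code points coincide with the first 256 Unicode code points...
assert ord("é") == 0xE9

latin1 = word.encode("iso-8859-1")  # one byte per character
utf8 = word.encode("utf-8")         # ...but "é" needs a two-byte sequence

print(latin1, len(latin1))  # b'h\xe9llo' 5
print(utf8, len(utf8))      # b'h\xc3\xa9llo' 6
```

So re-saving the file as UTF-8 would change the very bytes a server compares against, which is what Lars objected to.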
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-09-26 21:43 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> I think raw-text is more likely to work, based on what Lars says.

On the other hand, if auth-source prompts for a password, and you type
in something non-ASCII, the result will probably be something utf8-ey, I
think? Which may or may not work on the server, but I don't really see
what to do about it. Except asking the user "You've typed in something
non-ASCII. What bit pattern are you imagining Emacs will actually send
to the server?" :-)

Or what charset to use. Probably slightly less confusing, but probably
not a whole lot more.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 21:54 UTC
To: emacs-devel

On Mon, 26 Sep 2011 23:43:08 +0200 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:

LMI> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> I think raw-text is more likely to work, based on what Lars says.

LMI> On the other hand, if auth-source prompts for a password, and you type
LMI> in something non-ASCII, the result will probably be something utf8-ey, I
LMI> think? Which may or may not work on the server, but I don't really see
LMI> what to do about it. Except asking the user "You've typed in something
LMI> non-ASCII. What bit pattern are you imagining Emacs will actually send
LMI> to the server?" :-)

LMI> Or what charset to use. Probably slightly less confusing, but probably
LMI> not a whole lot more.

OK, let's start from the beginning. We should support Unicode characters
for secrets, yes? I think each API user should limit that further, but
there's no reason for auth-source to block some data arbitrarily.

So if I can't encode the secrets with UTF-8, what should I use that
gives me good compatibility with other GNU and other libraries and
programs like libcurl for instance?

Ted
* Re: smtpmail and ~/.authinfo
From: Stephen J. Turnbull @ 2011-09-27 4:07 UTC
To: Lars Magne Ingebrigtsen; +Cc: Stefan Monnier, emacs-devel

Lars Magne Ingebrigtsen writes:

> On the other hand, if auth-source prompts for a password, and you type
> in something non-ASCII, the result will probably be something utf8-ey, I
> think?

No. 1.3 billion Chinese are very likely to use GB2312, not to mention
130 million Japanese who use Shift JIS. These are not UTF-8-ey in
several ways, and Shift JIS even abuses octets in the ASCII range for
use in multibyte characters.

If you have *no* password and the user asks to store one, yes, use
UTF-8, and warn the user that Emacs has chosen to use the standard
Unicode encoding "UTF-8", but other applications (especially on
Windows) may choose something else. In which case the user will be
unable to log in from those applications.

If you already have a password, it should be read verbatim (binary, or
raw-text should do given the line-oriented nature of these
configuration files) and treated as a binary blob.

> Or what charset to use. Probably slightly less confusing, but probably
> not a whole lot more.

Mule should have a language-to-list-of-charset alist around
somewhere. Use that to generate a menu of suggestions. Ask Ken'ichi
about how to access it.
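The remark that Shift JIS "abuses octets in the ASCII range" can be seen with a single katakana character. In this Python illustration, the second byte of ソ (U+30BD) in Shift JIS is 0x5C, the ASCII backslash, whereas every byte of a non-ASCII character in UTF-8 is 0x80 or above.

```python
so = "ソ"  # KATAKANA LETTER SO, U+30BD

sjis = so.encode("shift_jis")
utf8 = so.encode("utf-8")

print(sjis.hex())  # 835c  (0x5c is ASCII backslash)
print(utf8.hex())  # e382bd

# A byte-oriented scanner looking for ASCII backslash finds one in the
# Shift JIS bytes, but never inside a UTF-8 multibyte sequence.
print(0x5C in sjis, any(b < 0x80 for b in utf8))  # True False
```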
* Re: smtpmail and ~/.authinfo
From: Lars Magne Ingebrigtsen @ 2011-09-27 6:11 UTC
To: Stephen J. Turnbull; +Cc: Stefan Monnier, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> > On the other hand, if auth-source prompts for a password, and you type
> > in something non-ASCII, the result will probably be something utf8-ey, I
> > think?
>
> No. 1.3 billion Chinese are very likely to use GB2312, not to mention
> 130 million Japanese who use Shift JIS. These are not UTF-8-ey in
> several ways, and Shift JIS even abuses octets in the ASCII range for
> use in multibyte characters.

I meant: If you type something into auth-source today that is non-ASCII,
what you'll get in the .authinfo file is probably utf-8. Which, as you
point out, may not be what the user wants.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-27 10:29 UTC
To: emacs-devel

On Tue, 27 Sep 2011 13:07:42 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:

SJT> Lars Magne Ingebrigtsen writes:
>> On the other hand, if auth-source prompts for a password, and you type
>> in something non-ASCII, the result will probably be something utf8-ey, I
>> think?

SJT> No. 1.3 billion Chinese are very likely to use GB2312, not to mention
SJT> 130 million Japanese who use Shift JIS. These are not UTF-8-ey in
SJT> several ways, and Shift JIS even abuses octets in the ASCII range for
SJT> use in multibyte characters.

UTF-8 is an encoding; you're talking about charsets. Can you explain
more precisely what you mean by "not UTF-8-ey in several ways"?

SJT> If you have *no* password and the user asks to store one, yes, use
SJT> UTF-8, and warn the user that Emacs has chosen to use the standard
SJT> Unicode encoding "UTF-8", but other applications (especially on
SJT> Windows) may choose something else. In which case the user will be
SJT> unable to log in from those applications.

Would it be enough to let the user override that coding system choice
through a defcustom? For all the use cases I have seen, UTF-8 is
enough, so I'd rather use it by default.

SJT> If you already have a password, it should be read verbatim (binary, or
SJT> raw-text should do given the line-oriented nature of these
SJT> configuration files) and treated as a binary blob.

That's not helpful when you need to encode it for IMAP, for instance.
You have to know the actual characters that make up the binary blob.

Ted
* Re: smtpmail and ~/.authinfo
From: Stephen J. Turnbull @ 2011-09-27 12:33 UTC
To: emacs-devel

Ted Zlatanov writes:

> UTF-8 is an encoding; you're talking about charsets.

No, I'm talking about encodings. I'm not entirely sure about GB 2312,
but I believe it has a defined preferred encoding (the one registered
as the MIME charset GB2312 -- MIME charsets are all encodings, they
specify what *bytes* will appear in the stream, not just an abstract
character to abstract integer mapping). Shift JIS is most definitely
an encoding for the JIS character set (although which JIS character
set is poorly defined).

> Can you explain more precisely what you mean by "not UTF-8-ey in
> several ways"?

In the case of Shift JIS, I already did: octets in the ASCII range are
used in multibyte characters. That *never* happens in valid UTF-8.
The distinctions for GB2312 are more nebulous. But Lars meant
something different, so it's not relevant.

> Would it be enough to let the user override that coding system choice
> through a defcustom?

No. That requires a huge amount of user sophistication, and is too
global; different applications might very well use different coding
systems for non-ASCII characters.

> For all the use cases I have seen, UTF-8 is enough, so I'd rather
> use it by default.

Isn't that what I said?

> SJT> If you already have a password, it should be read verbatim (binary, or
> SJT> raw-text should do given the line-oriented nature of these
> SJT> configuration files) and treated as a binary blob.
>
> That's not helpful when you need to encode it for IMAP, for instance.
> You have to know the actual characters that make up the binary blob.

Since when? I haven't paid much attention to IMAP since RFC 3501 was
an internet-draft, but in that document there are a few commands that
accept a CHARSET parameter. LOGIN and AUTHENTICATE aren't among them.
So you're just passing along binary blobs, which in the case of LOGIN
will often look like somebody's birthday or a child's name, but that's
just an unfortunate accident.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-27 20:15 UTC
To: emacs-devel

On Tue, 27 Sep 2011 21:33:37 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:

SJT> Ted Zlatanov writes:
>> UTF-8 is an encoding; you're talking about charsets.

SJT> No, I'm talking about encodings. I'm not entirely sure about GB 2312,
SJT> but I believe it has a defined preferred encoding (the one registered
SJT> as the MIME charset GB2312 -- MIME charsets are all encodings, they
SJT> specify what *bytes* will appear in the stream, not just an abstract
SJT> character to abstract integer mapping). Shift JIS is most definitely
SJT> an encoding for the JIS character set (although which JIS character
SJT> set is poorly defined).

Thanks for correcting my misunderstanding.

SJT> If you already have a password, it should be read verbatim (binary, or
SJT> raw-text should do given the line-oriented nature of these
SJT> configuration files) and treated as a binary blob.
>>
>> That's not helpful when you need to encode it for IMAP, for instance.
>> You have to know the actual characters that make up the binary blob.

SJT> Since when? I haven't paid much attention to IMAP since RFC 3501 was
SJT> an internet-draft, but in that document there are a few commands that
SJT> accept a CHARSET parameter. LOGIN and AUTHENTICATE aren't among them.
SJT> So you're just passing along binary blobs, which in the case of LOGIN
SJT> will often look like somebody's birthday or a child's name, but that's
SJT> just an unfortunate accident.

Ditto. I thought the CHARSET was used for passwords.

On Tue, 27 Sep 2011 07:31:23 -0400 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Tue, 27 Sep 2011 05:38:28 -0500
>> Reply-To: emacs-devel@gnu.org
>>
>> On Tue, 27 Sep 2011 05:57:28 +0300 Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> >> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
>> >> Date: Mon, 26 Sep 2011 17:31:52 -0400
>> >>
>> >> I think raw-text is more likely to work, based on what Lars says.
>> EZ> That was also my conclusion.
>>
>> I think we should make an effort to make the netrc/authinfo file
>> shareable with other programs

EZ> I agree. But to do that, it sounds like we are lacking some knowledge
EZ> about the intended use of these files, especially when they are used
EZ> in conjunction with external services. If someone can prepare an
EZ> exhaustive list of such uses, or at least those we want to support,
EZ> and tell what encodings can be used with each of them, we can take it
EZ> from there the way you want it. But if such details are not known at
EZ> the moment, we may actually break some legitimate uses, which would be
EZ> a pity.

I know for sure only ASCII (up to 0xff) is supported by libcurl and
older FTP clients. I thought UTF-8 would be a good compatibility path
but apparently I'm wrong.

EZ> So I think you are being overly optimistic in asserting that UTF-8 is
EZ> "the safest choice".

OK.

EZ> You read "binary" incorrectly. For the purposes of this discussion,
EZ> "binary" == "arbitrary byte values". Not every 8-bit byte is valid as
EZ> part of a UTF-8 sequence. If the authinfo file includes such bytes,
EZ> it cannot be encoded in UTF-8, except if we use the Emacs extensions,
EZ> which will be only useful for Emacs. Such bytes can easily come from
EZ> some single-byte encoding, for example. To DTRT with such bytes, we
EZ> _must_ know its precise encoding; then we could _recode_ it in UTF-8,
EZ> and encode back when we send the string to external services.

Got it.

On Tue, 27 Sep 2011 08:55:45 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

SM> Here's my take on it:
SM> .authinfo contains various things and is used in different ways, and
SM> there isn't a single answer that covers all cases:
SM> - each kind of field (hostname, username, password) may require
SM>   a different encoding/decoding.
SM> - when reading a password from the file, it should be read using
SM>   raw-text (i.e. as a "unibyte string").
SM>   In other words, the password should not be decoded into chars but left
SM>   as a sequence of bytes that will be sent as-is to whoever needs it.
SM> - when a password is typed by the user it'll be a sequence of chars, so
SM>   we'll have to convert it into a sequence of bytes. The best coding
SM>   system to use for that purpose is probably going to be
SM>   locale-coding-system. That sequence of bytes is then sent to whoever
SM>   needs it and saved as-is (using raw-text) into the .authinfo file.
SM> - i.e. authinfo should be read as a unibyte file.
SM> - i.e. when reading other fields than passwords, we'll have to
SM>   explicitly decode them using the coding system we want to use for
SM>   those fields.
SM> - similarly, we'll have to encode those other fields manually when
SM>   writing them into .authinfo.
SM> Of course, another option is to just read&write authinfo without
SM> thinking about it, so Emacs will usually pick locale-coding-system for
SM> it and it'll work just fine in 99.9% of the cases.

It sounds like the latter option is the least work and most reliable.
Users should be able to override the coding system as with any other
file, and we'll just keep the status quo.

I appreciate all the details and corrections; I thought UTF-8 was
better and more widely useful than it really is.

Thanks
Ted
* Re: smtpmail and ~/.authinfo
From: Stephen J. Turnbull @ 2011-09-28 1:41 UTC
To: emacs-devel

Ted Zlatanov writes:

> I appreciate all the details and corrections; I thought UTF-8 was
> better and more widely useful than it really is.

Please hang on to that impression. UTF-8 really is the best thing
since sliced bread (but also like sliced bread you still need to drink
milk and eat fruit to get all essential vitamins).

Although in many localizations, Windows defaults to something other
than UTF-8 (AFAIK) for most text operations (including file system
access etc), most Windows text applications do fine with UTF-8. What
UTF-8 is not (yet), is backward compatible with legacy systems -- a lot
of people have not yet converted from 60s- and 70s-era encodings to
Unicode, even where that is almost trivial even for non-techies.

IOW, your general impression is correct: UTF-8 is now an appropriate
(ie, "usable") *system* default even on Windows (not that text files
are in great vogue on Windows, except for program sources). Please
don't hesitate to advocate it in that role.

However, by default in a portable *application* that needs to deal
with both variation *among* platforms and local customization within
any given platform, Emacs needs to ask the system what its default is
(or in some cases we can be a little more fine-grained, but POSIX
localization isn't very useful in that direction).
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-28 8:38 UTC
To: Stephen J. Turnbull; +Cc: emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Date: Wed, 28 Sep 2011 10:41:09 +0900
>
> Ted Zlatanov writes:
>
> > I appreciate all the details and corrections; I thought UTF-8 was
> > better and more widely useful than it really is.
>
> Please hang on to that impression. UTF-8 really is the best thing
> since sliced bread

FWIW, I agree.

> Although in many localizations, Windows defaults to something other
> than UTF-8 (AFAIK) for most text operations (including file system
> access etc)

To set the record straight: AFAIK there's not a single locale where
Windows uses UTF-8 as the default encoding. Internal operations all
use UTF-16, and file names are encoded by the NTFS filesystem in
UTF-16 (FAT32 uses the locale-specific encoding, and thus can support
only the characters in that encoding). Clipboard works in UTF-16.
Etc. etc.

> most Windows text applications do fine with UTF-8.

True. Even Notepad can. There's a single exception, though: the shell
(a.k.a. console) window. I cannot get "emacs -nw" on Windows to use
UTF-8 as its terminal encoding, nor have other Windows programs
display UTF-8 in the console window. There's a UTF-8 codepage
allegedly supported by Windows, but if I set the console window to use
that codepage, I get gibberish or a crashed application. Maybe I'm
doing something wrong.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 21:55 UTC
To: emacs-devel

On Mon, 26 Sep 2011 17:31:52 -0400 Stefan Monnier <monnier@IRO.UMontreal.CA> wrote:

>> That's exactly my point. Right now we save as raw-text, which is not
>> usable to other programs in the long term. In UTF-8 it would be saved
>> "as such" because IIRC all codepoints under 255 don't need to be encoded
>> (and your string goes up to 233).

SM> No, chars from the latin-1 set have identical *Unicode* code points
SM> (i.e. between 128 and 255), but their encoding into utf-8 occupies
SM> 2 bytes.

SM> As for saving random bytes, you can't either, at least not in a way that
SM> is supported by all utf-8 implementations.

SM> I think raw-text is more likely to work, based on what Lars says.

Thanks for explaining, my recollection of the high-bit extended ASCII
encoding was wrong. I hope we find a way that works for everyone.

Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-27 2:57 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Date: Mon, 26 Sep 2011 17:31:52 -0400
>
> I think raw-text is more likely to work, based on what Lars says.

That was also my conclusion.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-27 10:38 UTC
To: emacs-devel

On Tue, 27 Sep 2011 05:57:28 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
>> Date: Mon, 26 Sep 2011 17:31:52 -0400
>>
>> I think raw-text is more likely to work, based on what Lars says.

EZ> That was also my conclusion.

I think we should make an effort to make the netrc/authinfo file
shareable with other programs, or else what's the point of using such a
file? We may as well `print' straight to a file. raw-text encoding is,
to me, saying "we give up."

I thought today, on most popular platforms, UTF-8 was the safest choice
if you want to share data that covers UCS. I think the non-UCS data can
be covered by letting the user override the encoding in a defcustom.

The other objection to UTF-8 was that some binary sequences can't be
encoded by it. Remember, we're talking about passwords and other
legible tokens, not binary files. The likelihood of such a sequence in
a token is too small to matter IMO. So I still think raw-text is the
worse choice even though it's easier to make it.

Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-27 11:31 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Tue, 27 Sep 2011 05:38:28 -0500
> Reply-To: emacs-devel@gnu.org
>
> On Tue, 27 Sep 2011 05:57:28 +0300 Eli Zaretskii <eliz@gnu.org> wrote:
>
> >> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> >> Date: Mon, 26 Sep 2011 17:31:52 -0400
> >>
> >> I think raw-text is more likely to work, based on what Lars says.
>
> EZ> That was also my conclusion.
>
> I think we should make an effort to make the netrc/authinfo file
> shareable with other programs

I agree. But to do that, it sounds like we are lacking some knowledge
about the intended use of these files, especially when they are used
in conjunction with external services. If someone can prepare an
exhaustive list of such uses, or at least those we want to support,
and tell what encodings can be used with each of them, we can take it
from there the way you want it. But if such details are not known at
the moment, we may actually break some legitimate uses, which would be
a pity.

> raw-text encoding is, to me, saying "we give up."

Give up knowing exactly how the stuff is encoded, yes. There's nothing
wrong with that; after all, we do that when we edit binary files,
don't we?

> I thought today, on most popular platforms, UTF-8 was the safest choice
> if you want to share data that covers UCS.

UCS and UTF-8 are not the same thing. Windows uses UCS (well, actually
UTF-16) internally, but UTF-8 is seldom seen there, e.g. you will never
see a file name encoded in UTF-8 on a Windows filesystem, except as an
accident. Stephen gave you examples with CJK locales, where UTF-8
might not be as popular as you'd like it, even on Posix systems. And
even in Europe there are a few locales which prefer single-byte
encoding of some kind, AFAIK. So I think you are being overly
optimistic in asserting that UTF-8 is "the safest choice".

> The other objection to UTF-8 was that some binary sequences can't be
> encoded by it. Remember, we're talking about passwords and other
> legible tokens, not binary files. The likelihood of such a sequence in
> a token is too small to matter IMO. So I still think raw-text is the
> worse choice even though it's easier to make it.

You read "binary" incorrectly. For the purposes of this discussion,
"binary" == "arbitrary byte values". Not every 8-bit byte is valid as
part of a UTF-8 sequence. If the authinfo file includes such bytes,
it cannot be encoded in UTF-8, except if we use the Emacs extensions,
which will be only useful for Emacs. Such bytes can easily come from
some single-byte encoding, for example. To DTRT with such bytes, we
_must_ know its precise encoding; then we could _recode_ it in UTF-8,
and encode back when we send the string to external services.

Once again, blindly assuming that UTF-8 is "safe" is not good enough,
IMO. We need more details, if someone can provide them.
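The "recode" step Eli describes only works when the source encoding is known. A minimal Python sketch, assuming for illustration that the authinfo bytes were written in iso-8859-15 (the helper `recode` is hypothetical, not an Emacs function): the raw bytes fail strict UTF-8 decoding, but decoding with the right codec and re-encoding round-trips losslessly.

```python
def recode(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from a *known* source encoding and re-encode them."""
    return raw.decode(source).encode(target)


raw = b"h\xe9llo"  # "héllo" as iso-8859-15 bytes

# Treating the bytes as UTF-8 fails: 0xE9 starts a 3-byte sequence,
# but "l" (0x6C) is not a continuation byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err)

as_utf8 = recode(raw, "iso-8859-15")              # share as UTF-8 ...
back = recode(as_utf8, "utf-8", "iso-8859-15")    # ... encode back for the server
print(as_utf8, back == raw)  # b'h\xc3\xa9llo' True
```

With a wrong source codec, decoding either raises or yields different characters, so the UTF-8 copy other programs see would be wrong; that is why the precise encoding matters.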
* Re: smtpmail and ~/.authinfo
From: Stefan Monnier @ 2011-09-27 12:55 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Here's my take on it:
.authinfo contains various things and is used in different ways, and
there isn't a single answer that covers all cases:
- each kind of field (hostname, username, password) may require
  a different encoding/decoding.
- when reading a password from the file, it should be read using
  raw-text (i.e. as a "unibyte string").
  In other words, the password should not be decoded into chars but left
  as a sequence of bytes that will be sent as-is to whoever needs it.
- when a password is typed by the user it'll be a sequence of chars, so
  we'll have to convert it into a sequence of bytes. The best coding
  system to use for that purpose is probably going to be
  locale-coding-system. That sequence of bytes is then sent to whoever
  needs it and saved as-is (using raw-text) into the .authinfo file.
- i.e. authinfo should be read as a unibyte file.
- i.e. when reading other fields than passwords, we'll have to
  explicitly decode them using the coding system we want to use for
  those fields.
- similarly, we'll have to encode those other fields manually when
  writing them into .authinfo.

Of course, another option is to just read&write authinfo without
thinking about it, so Emacs will usually pick locale-coding-system for
it and it'll work just fine in 99.9% of the cases.


        Stefan
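Stefan's scheme can be sketched outside Emacs. The Python below is a minimal sketch under stated assumptions: a simplified netrc-style line of whitespace-separated keyword/value pairs with no quoting and no `default` keyword, and hypothetical helpers `parse_authinfo_line` and `encode_typed_password`. The password stays an opaque byte string, other fields are decoded explicitly with a chosen coding, and a freshly typed password is encoded with the locale's preferred encoding, roughly Python's counterpart of `locale-coding-system`.

```python
import locale


def parse_authinfo_line(line: bytes, field_coding: str = "utf-8") -> dict:
    """Split one netrc/authinfo line that was read as raw bytes.

    Hypothetical helper: assumes keyword/value pairs separated by ASCII
    whitespace (machine, login, password, port, ...); quoting is ignored.
    """
    tokens = line.split()  # bytes.split() splits on ASCII whitespace only
    entry = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        name = key.decode("ascii")
        if name == "password":
            entry[name] = value  # keep the password as opaque bytes
        else:
            entry[name] = value.decode(field_coding)  # decode explicitly
    return entry


def encode_typed_password(typed: str) -> bytes:
    """Turn a password typed by the user into bytes via the locale's coding."""
    return typed.encode(locale.getpreferredencoding(False))


entry = parse_authinfo_line(b"machine smtp.example.com login lars password h\xe9llo port 587\n")
print(entry["password"])  # b'h\xe9llo', byte-for-byte as stored
```

Keeping the password as bytes means it is written back and sent exactly as found; only the fields a program actually interprets need a coding decision.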
* Re: smtpmail and ~/.authinfo
From: Jason Rumney @ 2011-09-27 14:02 UTC
To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> The other objection to UTF-8 was that some binary sequences can't be
> encoded by it. Remember, we're talking about passwords and other
> legible tokens, not binary files. The likelihood of such a sequence in
> a token is too small to matter IMO.

Where the binary sequence is non-ASCII characters in an encoding other
than UTF-8, the likelihood that the sequence is not valid UTF-8 is very
high.
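Jason's point is easy to check: legacy-encoded non-ASCII text is rarely valid UTF-8. This Python check (the helper name `is_valid_utf8` is just for illustration) tries a strict UTF-8 decode on a Latin-9 and a Shift JIS password; both fail, while pure ASCII always passes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


samples = [
    ("ascii", "hello"),
    ("iso-8859-15", "héllo"),
    ("shift_jis", "パスワード"),  # "password" in katakana
]
for coding, text in samples:
    print(coding, is_valid_utf8(text.encode(coding)))
# ascii True
# iso-8859-15 False
# shift_jis False
```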
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-26 19:34 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 26 Sep 2011 14:22:36 -0500
>
> I believe random bytes can be encoded just fine by UTF-8.

No, they cannot. A given sequence of "random bytes" can be a valid
UTF-8 encoding of some character with a sufficiently large code point.
* Re: smtpmail and ~/.authinfo
From: Ted Zlatanov @ 2011-09-26 19:40 UTC
To: emacs-devel

On Mon, 26 Sep 2011 22:34:21 +0300 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Mon, 26 Sep 2011 14:22:36 -0500
>>
>> I believe random bytes can be encoded just fine by UTF-8.

EZ> No, they cannot. A given sequence of "random bytes" can be a valid
EZ> UTF-8 encoding of some character with a sufficiently large code point.

But that's not what I said :) They can be *encoded* to a UTF-8 sequence
that can be eventually decoded back from UTF-8.

Ted
* Re: smtpmail and ~/.authinfo
From: Eli Zaretskii @ 2011-09-27 2:51 UTC
To: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 26 Sep 2011 14:40:07 -0500
>
> On Mon, 26 Sep 2011 22:34:21 +0300 Eli Zaretskii <eliz@gnu.org> wrote:
>
> >> From: Ted Zlatanov <tzz@lifelogs.com>
> >> Date: Mon, 26 Sep 2011 14:22:36 -0500
> >>
> >> I believe random bytes can be encoded just fine by UTF-8.
>
> EZ> No, they cannot. A given sequence of "random bytes" can be a valid
> EZ> UTF-8 encoding of some character with a sufficiently large code point.
>
> But that's not what I said :) They can be *encoded* to a UTF-8 sequence
> that can be eventually decoded back from UTF-8.

No. Some byte sequences are invalid UTF-8, and are decoded into a
single special character. IOW, what you suggest is in general lossy
conversion.
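The "lossy conversion" Eli mentions can be demonstrated with Python's `replace` error handler, which, like many UTF-8 decoders, maps an invalid byte to U+FFFD REPLACEMENT CHARACTER. In this illustration two different Latin-9 passwords collapse into the same string, so the original bytes cannot be recovered.

```python
pw_a = b"h\xe9llo"  # "héllo" in iso-8859-15
pw_b = b"h\xe8llo"  # "hèllo" in iso-8859-15

da = pw_a.decode("utf-8", errors="replace")
db = pw_b.decode("utf-8", errors="replace")

print(da == "h\ufffdllo")  # True: the invalid byte became U+FFFD
print(da == db)            # True: two passwords, one decoded string
print(da.encode("utf-8"))  # b'h\xef\xbf\xbdllo', not the original bytes
```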
* Re: smtpmail and ~/.authinfo
From: Jason Rumney @ 2011-09-27 13:54 UTC
To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> I believe random bytes can be encoded just fine by UTF-8.

Not without Emacs extensions.
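Emacs handles undecodable bytes with its own mechanism: its internal representation extends UTF-8 with special raw-byte ("eight-bit") characters. As a loosely analogous illustration only, Python's "surrogateescape" error handler (PEP 383) is a similar non-standard extension. It maps each undecodable byte 0xNN to the lone surrogate U+DCNN, so arbitrary bytes survive a decode/encode round trip, but only if the same extension is used on the way back. Strict UTF-8 refuses to encode those surrogates. This sketch shows Python's behavior, not how Emacs implements it.

```python
raw = bytes([0x61, 0xFF, 0xC3])

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print([hex(ord(c)) for c in text])  # ['0x61', '0xdcff', '0xdcc3']

# With the same extension on the way back, the bytes round-trip exactly.
print(text.encode("utf-8", errors="surrogateescape") == raw)  # True

# Without the extension, the text cannot be written out as standard UTF-8.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("strict UTF-8 cannot encode the escaped bytes")
```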
Thread overview: 45+ messages

2011-08-20 10:26 smtpmail and ~/.authinfo Eli Zaretskii
2011-08-21  4:39 ` Lars Magne Ingebrigtsen
2011-08-21  6:12 ` Eli Zaretskii
2011-08-21 19:25 ` Lars Magne Ingebrigtsen
2011-08-21 19:59 ` Eli Zaretskii
2011-08-21 20:17 ` Lars Magne Ingebrigtsen
2011-08-22  5:35 ` Eli Zaretskii
2011-09-10 19:01 ` Lars Magne Ingebrigtsen
2011-09-25 12:33 ` Ted Zlatanov
2011-09-25 12:48 ` Eli Zaretskii
2011-09-25 13:21 ` Ted Zlatanov
2011-09-25 17:08 ` Eli Zaretskii
2011-09-26 14:41 ` Ted Zlatanov
2011-09-26 16:18 ` Eli Zaretskii
2011-09-26 16:53 ` Ted Zlatanov
2011-09-26 17:15 ` Eli Zaretskii
2011-09-26 17:23 ` Eli Zaretskii
2011-09-26 17:31 ` Ted Zlatanov
2011-09-26 17:00 ` Stefan Monnier
2011-09-26 17:28 ` Ted Zlatanov
2011-09-26 21:27 ` Stefan Monnier
2011-09-26 18:04 ` Lars Magne Ingebrigtsen
2011-09-26 19:22 ` Ted Zlatanov
2011-09-26 19:30 ` Lars Magne Ingebrigtsen
2011-09-26 19:48 ` Ted Zlatanov
2011-09-26 21:31 ` Stefan Monnier
2011-09-26 21:43 ` Lars Magne Ingebrigtsen
2011-09-26 21:54 ` Ted Zlatanov
2011-09-27  4:07 ` Stephen J. Turnbull
2011-09-27  6:11 ` Lars Magne Ingebrigtsen
2011-09-27 10:29 ` Ted Zlatanov
2011-09-27 12:33 ` Stephen J. Turnbull
2011-09-27 20:15 ` Ted Zlatanov
2011-09-28  1:41 ` Stephen J. Turnbull
2011-09-28  8:38 ` Eli Zaretskii
2011-09-26 21:55 ` Ted Zlatanov
2011-09-27  2:57 ` Eli Zaretskii
2011-09-27 10:38 ` Ted Zlatanov
2011-09-27 11:31 ` Eli Zaretskii
2011-09-27 12:55 ` Stefan Monnier
2011-09-27 14:02 ` Jason Rumney
2011-09-26 19:34 ` Eli Zaretskii
2011-09-26 19:40 ` Ted Zlatanov
2011-09-27  2:51 ` Eli Zaretskii
2011-09-27 13:54 ` Jason Rumney