* getpwent, user-full-name and utf-8
From: David Kastrup @ 2007-03-21  9:58 UTC
To: emacs-devel

Hi,

user-full-name is set using getpwent without decoding the resulting
byte string at all.  The manual page of getpwent does not mention any
encoding of /etc/passwd, neither does that of /etc/passwd.  It is a
safe bet, however, that /etc/passwd is not encoded in emacs-mule.

Since different users may use different language environments, I
propose that we decode the results from getpwent according to utf-8.

There will likely be similar problems with other system functions
(name server lookup?).  emacs-mule certainly is not the right answer
to the encoding problem.  And the problem will persist with
emacs-unicode2 as well since there is a difference between illegal
byte sequences and decoded illegal byte sequences.

I propose that we bite the bullet, assume a fixed external system
encoding of utf-8 for such strings, and recode accordingly.

--
David Kastrup

* Re: getpwent, user-full-name and utf-8
From: Eli Zaretskii @ 2007-03-21 19:37 UTC
To: David Kastrup; +Cc: emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Wed, 21 Mar 2007 10:58:08 +0100
>
> I propose that we bite the bullet, assume a fixed external system
> encoding of utf-8 for such strings, and recode accordingly.

I'd rather assume that usernames are encoded in the locale's
encoding, not necessarily in UTF-8.

* Re: getpwent, user-full-name and utf-8
From: David Kastrup @ 2007-03-21 20:51 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>
>> I propose that we bite the bullet, assume a fixed external system
>> encoding of utf-8 for such strings, and recode accordingly.
>
> I'd rather assume that usernames are encoded in the locale's
> encoding, not necessarily in UTF-8.

That assumes that every user operates under the same locale, and that
this locale agrees with the locale of the system files.  In particular
on multi-user machines, that is not realistic.

It might be reasonable to add a new variable to hold the system locale
which should not depend on the user locale.  However, it is somewhat
late for this.  Clearly, assuming emacs-mule encoding for the system,
as it now appears the case, is always wrong.

For current systems, assuming utf-8 will likely be correct most of the
time, at least.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: getpwent, user-full-name and utf-8
From: Miles Bader @ 2007-03-22  2:30 UTC
To: emacs-devel

David Kastrup <dak@gnu.org> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>> I'd rather assume that usernames are encoded in the locale's
>> encoding, not necessarily in UTF-8.
...
> That assumes that every user operates under the same locale, and that
> this locale agrees with the locale of the system files.  In particular
> on multi-user machines, that is not realistic.
...
> For current systems, assuming utf-8 will likely be correct most of the
> time, at least.

Do you have any data to back that up?

If you think of multi-user systems versus single-user systems, I'd think:

 * On single-user systems, the user's locale would often match
   /etc/passwd.

 * Multi-user systems tend to be much longer-lived (I think much of
   the data on the servers at my work dates back 15 years or more;
   often the hardware gets upgraded, but the user-related data is just
   kept verbatim from the old system), and in many cases probably have
   user databases that predate widespread use of utf-8.  In Europe I
   guess that would mean they use latin-XX.

There's really no way you can always get it right, but my intuition is
that the safest thing to do is use the locale as Eli suggests.

Of course you're right that emacs-mule is basically never correct...
(maybe there are some crazies out there though :-)

-Miles

--
Is it true that nothing can be known?  If so how do we know this?  -Woody Allen

* Re: getpwent, user-full-name and utf-8
From: Jan Djärv @ 2007-03-22  7:01 UTC
To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel

David Kastrup wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: David Kastrup <dak@gnu.org>
>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>
>>> I propose that we bite the bullet, assume a fixed external system
>>> encoding of utf-8 for such strings, and recode accordingly.
>> I'd rather assume that usernames are encoded in the locale's
>> encoding, not necessarily in UTF-8.
>
> That assumes that every user operates under the same locale, and that
> this locale agrees with the locale of the system files.  In particular
> on multi-user machines, that is not realistic.

Since users themselves can set their full name, I'd think the user
locale would be a good choice.

>
> It might be reasonable to add a new variable to hold the system locale
> which should not depend on the user locale.  However, it is somewhat
> late for this.  Clearly, assuming emacs-mule encoding for the system,
> as it now appears the case, is always wrong.
>
> For current systems, assuming utf-8 will likely be correct most of the
> time, at least.

UTF-8 is much better than emacs-mule.  If it is not too much work, I'd
suggest checking if the name is valid UTF-8, and if it isn't, use the
user locale.

	Jan D.
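
The fallback suggested in the message above (accept the name if it is valid UTF-8, otherwise decode it with the user locale's encoding) could be sketched roughly as follows. This is an illustration in Python, not Emacs code: `decode_full_name` and its `locale_encoding` parameter are hypothetical names, and the raw bytes of the name are assumed to be available already.

```python
def decode_full_name(raw: bytes, locale_encoding: str) -> str:
    """Decode a full name taken from the passwd database.

    Hypothetical helper: try strict UTF-8 first and fall back to the
    caller-supplied locale encoding only if the bytes are not valid UTF-8.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the locale's encoding instead.
        # errors="replace" keeps the function total for bytes that are
        # invalid in that encoding as well.
        return raw.decode(locale_encoding, errors="replace")
```

The check is a heuristic: text in a legacy 8-bit encoding is only rarely valid UTF-8 by accident, but pure ASCII names decode identically either way, and nothing here can tell two different 8-bit encodings apart.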

* Re: getpwent, user-full-name and utf-8
From: David Kastrup @ 2007-03-22  7:40 UTC
To: Jan Djärv; +Cc: Eli Zaretskii, emacs-devel

Jan Djärv <jan.h.d@swipnet.se> writes:

> David Kastrup wrote:
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: David Kastrup <dak@gnu.org>
>>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>>
>>>> I propose that we bite the bullet, assume a fixed external system
>>>> encoding of utf-8 for such strings, and recode accordingly.
>>> I'd rather assume that usernames are encoded in the locale's
>>> encoding, not necessarily in UTF-8.
>>
>> That assumes that every user operates under the same locale, and that
>> this locale agrees with the locale of the system files.  In particular
>> on multi-user machines, that is not realistic.
>
> Since users themselves can set their full name,

Since when?

Well, I took a look at the manual page of passwd(1) on my GNU/Linux
system, and the description indeed says:

       passwd also changes account information, such as the full
       name of the user, the user's login shell, or his/her
       password expiry date and interval.

Amusingly, however, there is no option for doing any of that except
the password related stuff.

There is, however,

CHFN(1) -- 06/06/2006 -- User Commands

NAME
       chfn - change real user name and information

SYNOPSIS
       chfn [-f full_name] [-r room_no] [-w work_ph] [-h home_ph]
            [-o other] [user]

DESCRIPTION
       chfn changes user fullname, office number, office extension,
       and home phone number information for a user's account.  This
       information is typically printed by finger(1) and similar
       programs.  A normal user may only change the fields for her
       own account, subject to the restrictions in
       /etc/login.defs.  (The default configuration is to prevent
       users from changing their fullname.)  The super user may
       change any field for any account.  Additionally, only the
       super user may use the -o option to change the undefined
       portions of the GECOS field.

       The only restriction placed on the contents of the fields is
       that no control characters may be present, nor any of comma,
       colon, or equal sign.  The other field does not have this
       restriction, and is used to store accounting information used
       by other applications.

So with the default settings, a user can't change his settings.

> I'd think the user locale would be a good choice.

It is not the worst choice, but I consider it likely that a better way
would be something like a separate "system locale".  For lack of
better information, one could let it default to the user locale, but
it should be at least configurable separately.

Does anybody have access to the X/Open or Posix specs?  Maybe
something is said about this in there.

--
David Kastrup
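
The idea of a separately configurable "system locale" that defaults to the user locale might look like the sketch below. It is written in Python purely for illustration; `passwd_encoding` and `system_override` are hypothetical names and do not correspond to any existing Emacs variable.

```python
import codecs
import locale
from typing import Optional


def passwd_encoding(system_override: Optional[str] = None) -> str:
    """Return the encoding to assume for strings from the passwd database.

    `system_override` stands in for a hypothetical, separately
    configurable setting.  When it is unset, the user's locale
    encoding is used as the default.
    """
    encoding = system_override or locale.getpreferredencoding(False)
    # Normalize the name and fail early (LookupError) on an unknown encoding.
    return codecs.lookup(encoding).name
```

Keeping the override separate means an administrator's choice for the system files is never silently replaced by whatever locale an individual user happens to run under.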

* Re: getpwent, user-full-name and utf-8
From: Jan Djärv @ 2007-03-22  8:17 UTC
To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel

David Kastrup wrote:
> Jan Djärv <jan.h.d@swipnet.se> writes:
>
>> David Kastrup wrote:
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: David Kastrup <dak@gnu.org>
>>>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>>>
>>>>> I propose that we bite the bullet, assume a fixed external system
>>>>> encoding of utf-8 for such strings, and recode accordingly.
>>>> I'd rather assume that usernames are encoded in the locale's
>>>> encoding, not necessarily in UTF-8.
>>> That assumes that every user operates under the same locale, and that
>>> this locale agrees with the locale of the system files.  In particular
>>> on multi-user machines, that is not realistic.
>> Since users themselves can set their full name,
>
> Since when?
>
> Well, I took a look at the manual page of passwd(1) on my GNU/Linux
> system, and the description indeed says:
>
>        passwd also changes account information, such as the full
>        name of the user, the user's login shell, or his/her
>        password expiry date and interval.
>
> Amusingly, however, there is no option for doing any of that except
> the password related stuff.
>
> There is, however,
>
> CHFN(1) -- 06/06/2006 -- User Commands

Yes that is what I meant.

> ...
>        /etc/login.defs.  (The default configuration is to prevent
>        users from changing their fullname.)
>
> So with the default settings, a user can't change his settings.

Ok, I haven't seen that restriction before.

>
>> I'd think the user locale would be a good choice.
>
> It is not the worst choice, but I consider it likely that a better way
> would be something like a separate "system locale".  For lack of
> better information, one could let it default to the user locale, but
> it should be at least configurable separately.
>
> Does anybody have access to the X/Open or Posix specs?  Maybe
> something is said about this in there.
>

They don't say anything, they don't specify any passwd or chfn program.
The description of <pwd.h> just says:

The <pwd.h> header shall provide a definition for struct passwd, which
shall include at least the following members:

char    *pw_name   User's login name.
...

and that is about it.

I checked some of the machines I have accounts on, but no admin access
to, and they all use Latin-1 (which BTW is not my locale, UTF-8 is).
So some system locale may not be a bad idea, but default UTF-8 is OK.

	Jan D.

* Re: getpwent, user-full-name and utf-8
From: Jan Djärv @ 2007-03-22  9:06 UTC
To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel

Jan Djärv wrote:
>
> They don't say anything, they don't specify any passwd or chfn program.
> The description of <pwd.h> just says:
>
> The <pwd.h> header shall provide a definition for struct passwd, which
> shall include at least the following members:
>
> char    *pw_name   User's login name.
> ...
>
> and that is about it.
>

I.e. they don't specify a GECOS field at all.

	Jan D.
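
Since the specification is silent on the GECOS field, its layout is a matter of convention. The sketch below assumes the conventional comma-separated layout implied by the chfn fields quoted earlier in the thread, with the full name as the first subfield. `full_name_from_gecos` is a hypothetical helper, and the sample values are made up for illustration.

```python
def full_name_from_gecos(raw_gecos: bytes) -> bytes:
    """Return the first comma-separated subfield of a raw GECOS field.

    Works on undecoded bytes so that the encoding question can be
    settled afterwards.  Splitting on the byte 0x2C is safe for
    ASCII-compatible encodings such as UTF-8 and Latin-1, where that
    byte never occurs inside a multi-byte character.
    """
    return raw_gecos.split(b",", 1)[0]


# Made-up sample entry: full name, room, work phone, home phone.
print(full_name_from_gecos(b"Jane Doe,101,555-0100,555-0101"))
```

In Python specifically, the `pwd` module returns already decoded strings, so one way to recover the raw bytes is `os.fsencode(entry.pw_gecos)`; treat that step as an assumption about the runtime rather than part of the sketch.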

* Re: getpwent, user-full-name and utf-8
From: Richard Stallman @ 2007-03-22  5:01 UTC
To: David Kastrup; +Cc: emacs-devel

    I propose that we bite the bullet, assume a fixed external system
    encoding of utf-8 for such strings, and recode accordingly.

Since this is a matter of interacting with the rest of the system, the
pertinent question is not how people OUGHT to encode /etc/passwd, but
rather how they DO encode it.

Is there a common practice of encoding parts of /etc/passwd in utf-8?

end of thread, other threads:[~2007-03-22  9:06 UTC | newest]

Thread overview: 9+ messages
2007-03-21  9:58 getpwent, user-full-name and utf-8 David Kastrup
2007-03-21 19:37 ` Eli Zaretskii
2007-03-21 20:51   ` David Kastrup
2007-03-22  2:30     ` Miles Bader
2007-03-22  7:01     ` Jan Djärv
2007-03-22  7:40       ` David Kastrup
2007-03-22  8:17         ` Jan Djärv
2007-03-22  9:06           ` Jan Djärv
2007-03-22  5:01 ` Richard Stallman