unofficial mirror of emacs-devel@gnu.org 
* getpwent, user-full-name and utf-8
@ 2007-03-21  9:58 David Kastrup
  2007-03-21 19:37 ` Eli Zaretskii
  2007-03-22  5:01 ` Richard Stallman
  0 siblings, 2 replies; 9+ messages in thread
From: David Kastrup @ 2007-03-21  9:58 UTC
  To: emacs-devel


Hi,

user-full-name is set using getpwent without decoding the resulting
byte string at all.

The manual page of getpwent does not mention any encoding of
/etc/passwd, and neither does that of /etc/passwd.

It is a safe bet, however, that /etc/passwd is not encoded in
emacs-mule.

Since different users may use different language environments, I
propose that we decode the results from getpwent according to utf-8.

There will likely be similar problems with other system functions
(name server lookup?).  emacs-mule certainly is not the right answer
to the encoding problem.  And the problem will persist with
emacs-unicode2 as well since there is a difference between illegal
byte sequences and decoded illegal byte sequences.

I propose that we bite the bullet, assume a fixed external system
encoding of utf-8 for such strings, and recode accordingly.

-- 
David Kastrup


* Re: getpwent, user-full-name and utf-8
  2007-03-21  9:58 getpwent, user-full-name and utf-8 David Kastrup
@ 2007-03-21 19:37 ` Eli Zaretskii
  2007-03-21 20:51   ` David Kastrup
  2007-03-22  5:01 ` Richard Stallman
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2007-03-21 19:37 UTC
  To: David Kastrup; +Cc: emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Wed, 21 Mar 2007 10:58:08 +0100
> 
> I propose that we bite the bullet, assume a fixed external system
> encoding of utf-8 for such strings, and recode accordingly.

I'd rather assume that usernames are encoded in the locale's encoding,
not necessarily in UTF-8.


* Re: getpwent, user-full-name and utf-8
  2007-03-21 19:37 ` Eli Zaretskii
@ 2007-03-21 20:51   ` David Kastrup
  2007-03-22  2:30     ` Miles Bader
  2007-03-22  7:01     ` Jan Djärv
  0 siblings, 2 replies; 9+ messages in thread
From: David Kastrup @ 2007-03-21 20:51 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>> 
>> I propose that we bite the bullet, assume a fixed external system
>> encoding of utf-8 for such strings, and recode accordingly.
>
> I'd rather assume that usernames are encoded in the locale's
> encoding, not necessarily in UTF-8.

That assumes that every user operates under the same locale, and that
this locale agrees with the locale of the system files.  In particular
on multi-user machines, that is not realistic.

It might be reasonable to add a new variable to hold the system locale
which should not depend on the user locale.  However, it is somewhat
late for this.  Clearly, assuming emacs-mule encoding for the system,
as now appears to be the case, is always wrong.

For current systems, assuming utf-8 will likely be correct most of the
time, at least.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: getpwent, user-full-name and utf-8
  2007-03-21 20:51   ` David Kastrup
@ 2007-03-22  2:30     ` Miles Bader
  2007-03-22  7:01     ` Jan Djärv
  1 sibling, 0 replies; 9+ messages in thread
From: Miles Bader @ 2007-03-22  2:30 UTC
  To: emacs-devel

David Kastrup <dak@gnu.org> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>> I'd rather assume that usernames are encoded in the locale's
>> encoding, not necessarily in UTF-8.
...
> That assumes that every user operates under the same locale, and that
> this locale agrees with the locale of the system files.  In particular
> on multi-user machines, that is not realistic.
...
> For current systems, assuming utf-8 will likely be correct most of the
> time, at least.

Do you have any data to back that up?

If you think of multiuser systems versus single-user systems, I'd think:

  * On single-user systems, the user's locale would often match
    /etc/passwd.

  * Multi-user systems tend to be much longer-lived (I think much of the
    data on the servers at my work dates back 15 years or more -- often
    the hardware gets upgraded, but the user-related data is just kept
    verbatim from the old system), and in many cases probably have user
    databases that predate widespread use of utf-8.  In Europe I guess
    that would mean they use latin-XX.

There's really no way you can always get it right, but my intuition is
that the safest thing to do is use the locale as Eli suggests.

Of course you're right that emacs-mule is basically never
correct... (maybe there are some crazies out there though :-)

-Miles

-- 
Is it true that nothing can be known?  If so how do we know this?  -Woody Allen


* Re: getpwent, user-full-name and utf-8
  2007-03-21  9:58 getpwent, user-full-name and utf-8 David Kastrup
  2007-03-21 19:37 ` Eli Zaretskii
@ 2007-03-22  5:01 ` Richard Stallman
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Stallman @ 2007-03-22  5:01 UTC
  To: David Kastrup; +Cc: emacs-devel

    I propose that we bite the bullet, assume a fixed external system
    encoding of utf-8 for such strings, and recode accordingly.

Since this is a matter of interacting with the rest of the system, the
pertinent question is not how people OUGHT to encode /etc/passwd, but
rather how they DO encode it.

Is there a common practice of encoding parts of /etc/passwd in utf-8?


* Re: getpwent, user-full-name and utf-8
  2007-03-21 20:51   ` David Kastrup
  2007-03-22  2:30     ` Miles Bader
@ 2007-03-22  7:01     ` Jan Djärv
  2007-03-22  7:40       ` David Kastrup
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Djärv @ 2007-03-22  7:01 UTC
  To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel



David Kastrup skrev:
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: David Kastrup <dak@gnu.org>
>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>
>>> I propose that we bite the bullet, assume a fixed external system
>>> encoding of utf-8 for such strings, and recode accordingly.
>> I'd rather assume that usernames are encoded in the locale's
>> encoding, not necessarily in UTF-8.
> 
> That assumes that every user operates under the same locale, and that
> this locale agrees with the locale of the system files.  In particular
> on multi-user machines, that is not realistic.

Since users themselves can set their full name, I'd think the user locale 
would be a good choice.

> 
> It might be reasonable to add a new variable to hold the system locale
> which should not depend on the user locale.  However, it is somewhat
> late for this.  Clearly, assuming emacs-mule encoding for the system,
> as it now appears the case, is always wrong.
> 
> For current systems, assuming utf-8 will likely be correct most of the
> time, at least.

UTF-8 is much better than emacs-mule.  If it is not too much work, I'd suggest 
checking if the name is valid UTF-8, and if it isn't, use the user locale.
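
The fallback suggested above could be sketched roughly as follows. This is
written in Python for brevity, not in the C that Emacs itself would need, and
the function name `decode_gecos_name` and its `locale_encoding` parameter are
hypothetical names introduced only for this illustration. It assumes the raw
full name is available as a byte string.

```python
import locale


def decode_gecos_name(raw, locale_encoding=None):
    """Decode a raw full-name byte string.

    Strict UTF-8 is tried first; if the bytes are not valid UTF-8,
    the user's locale encoding is used instead.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    if locale_encoding is None:
        # Assumption: the user's locale is the best remaining guess.
        locale_encoding = locale.getpreferredencoding(False)
    # errors="replace" keeps the function total even when the bytes
    # are invalid in the locale encoding as well.
    return raw.decode(locale_encoding, errors="replace")
```

A strict UTF-8 check is a reasonable first test because text in a Latin-N
encoding that contains non-ASCII characters only rarely happens to form
valid UTF-8 sequences, so false positives should be uncommon.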

	Jan D.


* Re: getpwent, user-full-name and utf-8
  2007-03-22  7:01     ` Jan Djärv
@ 2007-03-22  7:40       ` David Kastrup
  2007-03-22  8:17         ` Jan Djärv
  0 siblings, 1 reply; 9+ messages in thread
From: David Kastrup @ 2007-03-22  7:40 UTC
  To: Jan Djärv; +Cc: Eli Zaretskii, emacs-devel

Jan Djärv <jan.h.d@swipnet.se> writes:

> David Kastrup skrev:
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: David Kastrup <dak@gnu.org>
>>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>>
>>>> I propose that we bite the bullet, assume a fixed external system
>>>> encoding of utf-8 for such strings, and recode accordingly.
>>> I'd rather assume that usernames are encoded in the locale's
>>> encoding, not necessarily in UTF-8.
>>
>> That assumes that every user operates under the same locale, and that
>> this locale agrees with the locale of the system files.  In particular
>> on multi-user machines, that is not realistic.
>
> Since users themselves can set their full name,

Since when?

Well, I took a look at the manual page of passwd(1) on my GNU/Linux
system, and the description indeed says:

	passwd also changes account information, such as the full
	name of the user, the user's login shell, or his/her
	password expiry date and interval.

Amusingly, however, there is no option for doing any of that except
the password related stuff.

There is, however,

CHFN(1) -- 06/06/2006 -- User Commands

NAME
	chfn - change real user name and information

SYNOPSIS

	chfn [-f full_name] [-r room_no] [-w work_ph] [-h home_ph]
		[-o other] [user]

DESCRIPTION

	chfn changes user fullname, office number, office extension,
	and home phone number information for a user's account. This
	information is typically printed by finger(1) and similar
	programs. A normal user may only change the fields for her
	own account, subject to the restrictions in
	/etc/login.defs. (The default configuration is to prevent
	users from changing their fullname.) The super user may
	change any field for any account. Additionally, only the
	super user may use the -o option to change the undefined
	portions of the GECOS field.

	The only restriction placed on the contents of the fields is
	that no control characters may be present, nor any of comma,
	colon, or equal sign. The other field does not have this
	restriction, and is used to store accounting information
	used by other applications.

So with the default settings, a user can't change his full name.
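
As an illustration only, the field restriction quoted from the chfn manual
page (no control characters, and none of comma, colon, or equal sign) might
be expressed like this in Python. The function name `is_valid_chfn_field` is
hypothetical, and this is a sketch of the rule as the manual page words it,
not the code chfn actually runs.

```python
import unicodedata

# Characters the manual page excludes besides control characters.
FORBIDDEN = set(",:=")


def is_valid_chfn_field(value):
    """Return True if value contains no control characters and
    none of comma, colon, or equal sign."""
    for ch in value:
        if ch in FORBIDDEN or unicodedata.category(ch) == "Cc":
            return False
    return True
```

Note that the rule says nothing about encoding: a name with non-ASCII
characters passes in whatever encoding the caller happened to use, which is
how the ambiguity discussed in this thread arises.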

> I'd think the user locale would be a good choice.

It is not the worst choice, but I consider it likely that a better way
would be something like a separate "system locale".  For lack of
better information, one could let it default to the user locale, but
it should be at least configurable separately.

Does anybody have access to the X/Open or Posix specs?  Maybe
something is said about this in there.

-- 
David Kastrup


* Re: getpwent, user-full-name and utf-8
  2007-03-22  7:40       ` David Kastrup
@ 2007-03-22  8:17         ` Jan Djärv
  2007-03-22  9:06           ` Jan Djärv
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Djärv @ 2007-03-22  8:17 UTC
  To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel



David Kastrup skrev:
> Jan Djärv <jan.h.d@swipnet.se> writes:
> 
>> David Kastrup skrev:
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: David Kastrup <dak@gnu.org>
>>>>> Date: Wed, 21 Mar 2007 10:58:08 +0100
>>>>>
>>>>> I propose that we bite the bullet, assume a fixed external system
>>>>> encoding of utf-8 for such strings, and recode accordingly.
>>>> I'd rather assume that usernames are encoded in the locale's
>>>> encoding, not necessarily in UTF-8.
>>> That assumes that every user operates under the same locale, and that
>>> this locale agrees with the locale of the system files.  In particular
>>> on multi-user machines, that is not realistic.
>> Since users themselves can set their full name,
> 
> Since when?
> 
> Well, I took a look at the manual page of passwd(1) on my GNU/Linux
> system, and the description indeed says:
> 
> 	passwd also changes account information, such as the full
> 	name of the user, the user's login shell, or his/her
> 	password expiry date and interval.
> 
> Amusingly, however, there is no option for doing any of that except
> the password related stuff.
> 
> There is, however,
> 
> CHFN(1) -- 06/06/2006 -- User Commands

Yes that is what I meant.

> 
...
> 	/etc/login.defs. (The default configuration is to prevent
> 	users from changing their fullname.)



> 
> So with the default settings, a user can't change his settings.

Ok, I haven't seen that restriction before.

> 
>> I'd think the user locale would be a good choice.
> 
> It is not the worst choice, but I consider it likely that a better way
> would be something like a separate "system locale".  For lack of
> better information, one could let it default to the user locale, but
> it should be at least configurable separately.
> 
> Does anybody have access to the X/Open or Posix specs?  Maybe
> something is said about this in there.
> 

They don't say anything; they don't specify any passwd or chfn program.  The
description of <pwd.h> just says:

   The <pwd.h> header shall provide a definition for struct passwd, which shall
   include at least the following members:

   char    *pw_name   User's login name.
...

and that is about it.

I checked some of the machines I have accounts on, but no admin access to, and 
they all use Latin-1 (which BTW is not my locale, UTF-8 is).  So some system 
locale may not be a bad idea, but default UTF-8 is OK.
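
To make the mismatch concrete, here is a small Python sketch of what happens
when a name stored in one encoding is read under the other assumption. The
helper `try_decode` is a hypothetical name used only for this example.

```python
def try_decode(raw, encoding):
    """Return the decoded string, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


name = "Jan Djärv"
as_latin1 = name.encode("latin-1")  # what a Latin-1 /etc/passwd would hold
as_utf8 = name.encode("utf-8")      # what a UTF-8 /etc/passwd would hold

# Latin-1 bytes read under a UTF-8 assumption are rejected as invalid.
print(try_decode(as_latin1, "utf-8"))

# UTF-8 bytes read under a Latin-1 assumption are accepted but garbled,
# because every byte value is a valid Latin-1 character.
print(try_decode(as_utf8, "latin-1"))
```

The asymmetry matters for the choice of default: a wrong UTF-8 guess can be
detected and retried, while a wrong Latin-1 guess silently produces a
garbled name.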

	Jan D.


* Re: getpwent, user-full-name and utf-8
  2007-03-22  8:17         ` Jan Djärv
@ 2007-03-22  9:06           ` Jan Djärv
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Djärv @ 2007-03-22  9:06 UTC
  To: David Kastrup; +Cc: Eli Zaretskii, emacs-devel



Jan Djärv skrev:

> 
> They don't say anything, they don't specify any passwd or chfn program.  
> The description of <pwd.h> just says:
> 
>   The <pwd.h> header shall provide a definition for struct passwd, which 
> shall
>   include at least the following members:
> 
>   char    *pw_name   User's login name.
> ...
> 
> and that is about it.
> 

I.e. they don't specify a GECOS field at all.

	Jan D.

