unofficial mirror of help-gnu-emacs@gnu.org
* Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-24 16:38 UTC
  To: help-gnu-emacs

I need to edit some UTF-8 files and it's very annoying emacs doesn't
detect it automatically (I have to reopen them as utf-8 manually) and
sometimes I notice it only after I already edited and saved the file
which messes up the formatting.

I tried prefer-coding-system utf-8, but it didn't help.

I can't put lisp code into the files, because they are data files.

Is there a definitive way to do it? The BOM is at the beginning of the
files, so Emacs could detect it automatically.

It's emacs 22.

* Re: Opening file in UTF-8 mode automatically
From: Peter Dyballa @ 2007-11-24 18:26 UTC
  To: PT; +Cc: help-gnu-emacs


On 24.11.2007 at 17:38, spamfilteraccount wrote:

> I tried prefer-coding-system utf-8, but it didn't help.


It might help to set LANG and LC_CTYPE to some UTF-8 value. Another
step would be to avoid set-language-environment. With both of these in
place, (prefer-coding-system 'utf-8) works fine for me.

Finally, if the files' extension is quite unique, you could use  
something like this to bind a file name extension to some particular  
file encoding:

	(add-to-list 'file-coding-system-alist '("\\.tex\\'" . utf-8))

This can be done temporarily.


If your files really use a particular mark at the beginning (EF BB  
BF), you could augment magic-mode-alist, but this does not directly  
set the file's encoding.
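
As a minimal sketch of the file-name approach above, the following
init-file fragment ties one file-name pattern to UTF-8 and shows how to
remove the entry again within the same session. The ".dat" extension is
only an assumed placeholder for whatever extension the data files
really use; file-coding-system-alist, add-to-list and delete are
standard Emacs names.

```elisp
;; Hypothetical extension: replace "\\.dat\\'" with the real one.
;; Files whose names match are read and saved as UTF-8.
(add-to-list 'file-coding-system-alist '("\\.dat\\'" . utf-8))

;; To undo the association later in the same session:
(setq file-coding-system-alist
      (delete '("\\.dat\\'" . utf-8) file-coding-system-alist))
```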

--
Greetings

   Pete

What has been understood no longer exists.    (Paul Eluard)

* Re: Opening file in UTF-8 mode automatically
From: Reiner Steib @ 2007-11-24 18:38 UTC
  To: help-gnu-emacs

On Sat, Nov 24 2007, spamfilteraccount@gmail.com wrote:

> I need to edit some UTF-8 files and it's very annoying emacs doesn't
> detect it automatically (I have to reopen them as utf-8 manually) and
> sometimes I notice it only after I already edited and saved the file
> which messes up the formatting.
>
> I tried prefer-coding-system utf-8, but it didn't help.
>
> I can't put lisp code into the files, because they are data files.
>
> Is there a definitive way to do it? The BOM is at the beginning of the
> files, so Emacs could detect it automatically.
>
> It's emacs 22.

`auto-coding-regexp-alist' in Emacs 22 already has an entry for the
BOM: ("\\`\xEF\xBB\xBF" . utf-8).

Either you have removed this entry or there is a bug.  If the latter,
please report it as a bug (M-x report-emacs-bug RET) along with a
small gzipped sample document and a recipe starting from "emacs -Q" to
reproduce the problem.
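
One way to check for the entry, sketched here on the assumption that
the BOM entry looks like the one quoted above, is to evaluate the
following with M-: or in the *scratch* buffer:

```elisp
;; Returns the first entry whose coding system is utf-8, or nil if
;; there is none.  In Emacs 22 this is expected to be the BOM entry
;; quoted above; other versions may register the BOM under a
;; different coding system name.
(rassq 'utf-8 auto-coding-regexp-alist)
```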

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: Opening file in UTF-8 mode automatically
From: Eli Zaretskii @ 2007-11-24 20:07 UTC
  To: help-gnu-emacs

> From: "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com>
> Date: Sat, 24 Nov 2007 08:38:20 -0800 (PST)
> 
> I need to edit some UTF-8 files and it's very annoying emacs doesn't
> detect it automatically (I have to reopen them as utf-8 manually) and
> sometimes I notice it only after I already edited and saved the file
> which messes up the formatting.

What is your locale?

* Re: Opening file in UTF-8 mode automatically
From: Eli Zaretskii @ 2007-11-24 20:09 UTC
  To: help-gnu-emacs

> From: Peter Dyballa <Peter_Dyballa@Web.DE>
> Date: Sat, 24 Nov 2007 19:26:15 +0100
> Cc: help-gnu-emacs@gnu.org
> 
> On 24.11.2007 at 17:38, spamfilteraccount wrote:
> 
> > I tried prefer-coding-system utf-8, but it didn't help.
> 
> 
> It might help to set LANG and LC_CTYPE to some UTF-8 value.

On Posix systems, perhaps.  On Windows, the results of setting these
might be different, as the locale settings are not done via
environment variables there.

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-25  5:49 UTC
  To: help-gnu-emacs

On Nov 24, 7:38 pm, Reiner Steib <reinersteib+gm...@imap.cc> wrote:
>
> `auto-coding-regexp-alist' in Emacs 22 already has an entry for the
> BOM: ("\\`\xEF\xBB\xBF" . utf-8).
>

Yep, it's there, I checked it, but it doesn't trigger opening the file
in utf-8 mode.

Is there any other setting which is necessary for
auto-coding-regexp-alist to work, or should this single setting be
enough? I don't have anything else set.

I checked mule.el (seemingly the only .el file where this variable is
used), but it's not clear to me what should trigger it. Apparently
it's not triggered by find-file-hook or anything similar, which is
what I would expect.

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-25  5:51 UTC
  To: help-gnu-emacs

On Nov 24, 9:07 pm, Eli Zaretskii <e...@gnu.org> wrote:

> What is your locale?

None set. It's an English Windows. I'd like to solve this problem from
emacs alone.

* Re: Opening file in UTF-8 mode automatically
From: Reiner Steib @ 2007-11-25 11:33 UTC
  To: help-gnu-emacs

On Sun, Nov 25 2007, spamfilteraccount@gmail.com wrote:

> On Nov 24, 7:38 pm, Reiner Steib <reinersteib+gm...@imap.cc> wrote:
>> `auto-coding-regexp-alist' in Emacs 22 already has an entry for the
>> BOM: ("\\`\xEF\xBB\xBF" . utf-8).
>
> Yep, it's there, I checked it, but it doesn't trigger opening the file
> in utf-8 mode.

Works for me:

$ echo -e '\xEF\xBB\xBF BOM test' > /tmp/BOM.txt

$ cvs-EMACS_22_BASE/i686$ LC_ALL=C ./src/emacs -Q /tmp/BOM.txt

==> u -- mule-utf-8-unix in the mode line.

So we need a (small) sample file[1] and a recipe to reproduce the
problem starting from emacs -Q.  `M-x report-emacs-bug RET'
additionally provides useful information for the developers.

[1] E.g. the first few lines of your text file.
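
On a system without a POSIX shell (such as the OP's Windows machine), a
similar test file could be produced from within Emacs. This is only a
sketch: "BOM.txt" is a hypothetical file name, written to the current
default-directory.

```elisp
;; Write the three BOM bytes followed by some ASCII text, without
;; any encoding conversion.
(let ((coding-system-for-write 'binary))
  (write-region "\xEF\xBB\xBF BOM test\n" nil "BOM.txt"))
```

Visiting the resulting file with C-x C-f and looking at the mode line
then gives the same check as the shell recipe above.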

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: Opening file in UTF-8 mode automatically
From: Eli Zaretskii @ 2007-11-25 20:59 UTC
  To: help-gnu-emacs

> From: "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com>
> Date: Sat, 24 Nov 2007 21:51:18 -0800 (PST)
> 
> On Nov 24, 9:07 pm, Eli Zaretskii <e...@gnu.org> wrote:
> 
> > What is your locale?
> 
> None set.

That's not possible on Windows, AFAIK.

What does Emacs produce under "Important settings" in the *mail*
buffer if you type "M-x report-emacs-bug RET foo RET"?

* Re: Opening file in UTF-8 mode automatically
From: Xah Lee @ 2007-11-25 22:49 UTC
  To: help-gnu-emacs

Not sure about auto-detecting, but i work with utf-8 daily and never
have a problem.

I do, however, set my emacs to use utf-8 by default.

To set your file encoding in emacs, use the menu "Options→Mule
(Multilingual Environment)→Set Language Environment".

After you've pulled the menu, be sure to also pull the menu command
"Options→Save Options" so that emacs remembers your settings.

or

Alt+x set-language-environment UTF-8

See also:
* Emacs and Unicode tips
http://xahlee.org/emacs/emacs_n_unicode.html

  Xah
  xah@xahlee.org
  http://xahlee.org/

On Nov 24, 8:38 am, "spamfilteracco...@gmail.com"
<spamfilteracco...@gmail.com> wrote:
> I need to edit some UTF-8 files and it's very annoying emacs doesn't
> detect it automatically (I have to reopen them as utf-8 manually) and
> sometimes I notice it only after I already edited and saved the file
> which messes up the formatting.
>
> I tried prefer-coding-system utf-8, but it didn't help.
>
> I can't put lisp code into the files, because they are data files.
>
> Is there a definitive way to do it? The BOM is at the beginning of the
> files, so Emacs could detect it automatically.
>
> It's emacs 22.

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-27  7:12 UTC
  To: help-gnu-emacs

On Nov 25, 12:33 pm, Reiner Steib <reinersteib+gm...@imap.cc> wrote:
> On Sun, Nov 25 2007, spamfilteracco...@gmail.com wrote:
> > On Nov 24, 7:38 pm, Reiner Steib <reinersteib+gm...@imap.cc> wrote:
> >> `auto-coding-regexp-alist' in Emacs 22 already has an entry for the
> >> BOM: ("\\`\xEF\xBB\xBF" . utf-8).
>
> > Yep, it's there, I checked it, but it doesn't trigger opening the file
> > in utf-8 mode.
>
> Works for me:
>

If anyone can tell me what code should trigger utf-8 mode via
auto-coding-regexp-alist then I can debug the problem myself.

I'm quite knowledgeable in Emacs Lisp, so if it's in the lisp part
then I can surely debug it.

I took a cursory look into mule.el and I saw auto-coding-regexp-alist
is used there, but I haven't seen any apparent mechanism tying that
stuff to file opening (no find-file-hook or something).

Can anyone give me some pointers on this?

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-27  7:14 UTC
  To: help-gnu-emacs

On Nov 25, 9:59 pm, Eli Zaretskii <e...@gnu.org> wrote:
> > From: "spamfilteracco...@gmail.com" <spamfilteracco...@gmail.com>
> > Date: Sat, 24 Nov 2007 21:51:18 -0800 (PST)
>
> > On Nov 24, 9:07 pm, Eli Zaretskii <e...@gnu.org> wrote:
>
> > > What is your locale?
>
> > None set.
>
> That's not possible on Windows, AFAIK.
>
> What does Emacs produce under "Important settings" in the *mail*
> buffer if you type "M-x report-emacs-bug RET foo RET"?

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: HUN
  locale-coding-system: cp1252
  default-enable-multibyte-characters: nil

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-27  7:15 UTC
  To: help-gnu-emacs

On Nov 25, 11:49 pm, Xah Lee <x...@xahlee.org> wrote:
> Not sure about auto-detecting, but i work with utf-8 daily and never
> have a problem.
>
> I do, however, set my emacs to use utf-8 by default.
>
> To set your file encoding in emacs, use the menu "Options→Mule
> (Multilingual Environment)→Set Language Environment".
>
> After you've pulled the menu, be sure to also pull the menu command
> "Options→Save Options" so that emacs remembers your settings.
>
> or
>
> Alt+x set-language-environment UTF-8
>
>

Didn't work. The file is still not opened in utf-8 mode.

* Re: Opening file in UTF-8 mode automatically
From: Zhang Wei @ 2007-11-27  7:30 UTC
  To: help-gnu-emacs

"spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:

[...]

>   default-enable-multibyte-characters: nil

I think this variable should be set to "t" to open a utf-8 file.

* Re: Opening file in UTF-8 mode automatically
From: Reiner Steib @ 2007-11-27 17:41 UTC
  To: help-gnu-emacs

On Tue, Nov 27 2007, Zhang Wei wrote:

> "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:
>
> [...]
>
>>   default-enable-multibyte-characters: nil
>
> I think this variable should be set to "t" to open a utf-8 file.

It is t by default.  So the OP should check his init files and remove
the code that disables it.  See e.g.

,----[ (info "(emacs)Enabling Multibyte") ]
|    To turn off multibyte character support by default, start Emacs with
| the `--unibyte' option (*note Initial Options::), or set the
| environment variable `EMACS_UNIBYTE'.  You can also customize
| `enable-multibyte-characters' or, equivalently, directly set the
| variable `default-enable-multibyte-characters' to `nil' in your init
| file to have basically the same effect as `--unibyte'.
`----
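
To see which state a running session is in, something like the
following can be evaluated with M-: or in *scratch*. This is a sketch
rather than a complete diagnosis; it only inspects the two settings
named in the manual excerpt above.

```elisp
;; t means new buffers are multibyte, which is the default.
;; nil means Emacs is running unibyte, e.g. because of --unibyte.
(default-value 'enable-multibyte-characters)

;; Non-nil if the EMACS_UNIBYTE environment variable is set.
(getenv "EMACS_UNIBYTE")
```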

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: Opening file in UTF-8 mode automatically
From: Eli Zaretskii @ 2007-11-27 21:47 UTC
  To: help-gnu-emacs

> From: Zhang Wei <id.brep@gmail.com>
> Date: Tue, 27 Nov 2007 15:30:23 +0800
> 
> "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:
> 
> [...]
> 
> >   default-enable-multibyte-characters: nil
> 
> I think this variable should be set to "t" to open a utf-8 file.

Yes, of course!

To the OP: why are you running Emacs in the unibyte mode?  Does the
problem go away if you invoke Emacs with "emacs -Q"?

* Re: Opening file in UTF-8 mode automatically
From: Eli Zaretskii @ 2007-11-27 21:48 UTC
  To: help-gnu-emacs

> From: "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com>
> Date: Mon, 26 Nov 2007 23:14:14 -0800 (PST)
> 
> On Nov 25, 9:59 pm, Eli Zaretskii <e...@gnu.org> wrote:
> > > From: "spamfilteracco...@gmail.com" <spamfilteracco...@gmail.com>
> > > Date: Sat, 24 Nov 2007 21:51:18 -0800 (PST)
> >
> > > On Nov 24, 9:07 pm, Eli Zaretskii <e...@gnu.org> wrote:
> >
> > > > What is your locale?
> >
> > > None set.
> >
> > That's not possible on Windows, AFAIK.
> >
> > What does Emacs produce under "Important settings" in the *mail*
> > buffer if you type "M-x report-emacs-bug RET foo RET"?
> 
> Important settings:
>   value of $LC_ALL: nil
>   value of $LC_COLLATE: nil
>   value of $LC_CTYPE: nil
>   value of $LC_MESSAGES: nil
>   value of $LC_MONETARY: nil
>   value of $LC_NUMERIC: nil
>   value of $LC_TIME: nil
>   value of $LANG: HUN
>   locale-coding-system: cp1252

So, as you see, yours is the Hungarian locale with codepage 1252 as
the locale-native encoding.

* Re: Opening file in UTF-8 mode automatically
From: spamfilteraccount @ 2007-11-28  6:54 UTC
  To: help-gnu-emacs

On Nov 27, 10:47 pm, Eli Zaretskii <e...@gnu.org> wrote:
> > From: Zhang Wei <id.b...@gmail.com>
> > Date: Tue, 27 Nov 2007 15:30:23 +0800
>
> > "spamfilteracco...@gmail.com" <spamfilteracco...@gmail.com> writes:
>
> > [...]
>
> > >   default-enable-multibyte-characters: nil
>
> > I think this variable should be set to "t" to open a utf-8 file.
>
> Yes, of course!
>
> To the OP: why are you running Emacs in the unibyte mode?

I don't know. :)

I checked it and I did start emacs with --unibyte. I set it ages ago
and completely forgot about it. I didn't edit utf encoded files in the
past, so it was no problem.

I removed the --unibyte option and now everything is working fine.
UTF-8 files are opened in UTF-8 mode automatically.

Thanks for the help everyone.
