unofficial mirror of emacs-devel@gnu.org 
* Editing exportet registry files
@ 2005-06-30 20:27 Markus Gritsch
  2005-06-30 21:23 ` Stefan Monnier
  0 siblings, 1 reply; 38+ messages in thread
From: Markus Gritsch @ 2005-06-30 20:27 UTC (permalink / raw)


Hi,

when I export part of the registry on MS Windows to a .reg file, it is a
little-endian Unicode file with a BOM (FF FE) as its first two bytes.  Is it
possible to edit such a file with Emacs?

Kind regards,
Markus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-06-30 20:27 Markus Gritsch
@ 2005-06-30 21:23 ` Stefan Monnier
  2005-07-01  7:12   ` gritsch
  2005-07-01  8:28   ` gritsch
  0 siblings, 2 replies; 38+ messages in thread
From: Stefan Monnier @ 2005-06-30 21:23 UTC (permalink / raw)
  Cc: emacs-devel

> when I export part of the registry on MS Windows to a .reg file, it is
> a Unicode file (little-endian) encoded with BOM (FF FE) as the first bytes.

You mean, it's utf-16-le

> Is it possible to edit such a file with Emacs?

Yes,


        Stefan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-06-30 21:23 ` Stefan Monnier
@ 2005-07-01  7:12   ` gritsch
  2005-07-01  8:17     ` Miles Bader
  2005-07-01  8:24     ` Juanma Barranquero
  2005-07-01  8:28   ` gritsch
  1 sibling, 2 replies; 38+ messages in thread
From: gritsch @ 2005-07-01  7:12 UTC (permalink / raw)
  Cc: emacs-devel

Quoting Stefan Monnier <monnier@iro.umontreal.ca>:

> > when I export part of the registry on MS Windows to a .reg file, it is
> > a Unicode file (little-endian) encoded with BOM (FF FE) as the first
> bytes.
> 
> You mean, it's utf-16-le
> 
> > Is it possible to edit such a file with Emacs?
> 
> Yes,

Can you give me any advice on how to achieve this?  When I load such a file it is
displayed like in the attached screenshot.

Thank you in advance,
Markus


[-- Attachment #2: screen.png --]
[-- Type: image/png, Size: 5478 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01  7:12   ` gritsch
@ 2005-07-01  8:17     ` Miles Bader
  2005-07-01  8:24     ` Juanma Barranquero
  1 sibling, 0 replies; 38+ messages in thread
From: Miles Bader @ 2005-07-01  8:17 UTC (permalink / raw)
  Cc: Stefan Monnier, emacs-devel

>> You mean, it's utf-16-le
>
> Can you give me any advice on how to achieve this?  When I load such a file it is
> displayed like in the attached screenshot.

   C-x RET c utf-16-le RET C-x C-f FILENAME RET

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01  7:12   ` gritsch
  2005-07-01  8:17     ` Miles Bader
@ 2005-07-01  8:24     ` Juanma Barranquero
  2005-07-01 17:45       ` David Kastrup
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-01  8:24 UTC (permalink / raw)
  Cc: emacs-devel

On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:

> Can you give me any advice on how to achieve this?

Try

  C-x RET c utf-16-le RET

(that's `universal-coding-system-argument'), then:

C-x C-f yourfile RET

and you should be able to edit the file.
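
For doing the same thing from Lisp, a minimal sketch is to bind
`coding-system-for-read' around the visit, which is roughly what the
C-x RET c prefix arranges for the next command.  The file name
"export.reg" is a placeholder, not something from this thread:

  ;; Sketch only: "export.reg" is a hypothetical path.
  ;; The binding lasts just for this one visit.
  (let ((coding-system-for-read 'utf-16le-with-signature))
    (find-file "export.reg"))

Because the binding is temporary, other files still go through the
normal coding-system detection.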

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-06-30 21:23 ` Stefan Monnier
  2005-07-01  7:12   ` gritsch
@ 2005-07-01  8:28   ` gritsch
  1 sibling, 0 replies; 38+ messages in thread
From: gritsch @ 2005-07-01  8:28 UTC (permalink / raw)
  Cc: emacs-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Editing exportet registry files
@ 2005-07-01  9:02 Dhruva Krishnamurthy (RBIN/EDI3) *
  2005-07-01  9:49 ` gritsch
  0 siblings, 1 reply; 38+ messages in thread
From: Dhruva Krishnamurthy (RBIN/EDI3) * @ 2005-07-01  9:02 UTC (permalink / raw)
  Cc: emacs-devel

Hello,

> Can you give me any advice on how to achieve this?  When I load 
> such a file it is
> displayed like in the attached screenshot.

1. Open the file in Emacs (normal file open).
2. Use the following key strokes: C-x RET r
3. Type utf-16le at the prompt
4. Confirm reverting the file when asked.

You can then see a more meaningful file which you can edit like any
other file in Emacs.
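
The same revert can be sketched as a single Lisp call, evaluated (for
instance with M-:) while the wrongly decoded buffer is current.  The
choice of `utf-16le-with-signature' is an assumption based on the FF FE
signature described at the start of the thread:

  ;; Re-read the visited file, decoding it as UTF-16LE with a BOM.
  (revert-buffer-with-coding-system 'utf-16le-with-signature)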

With best regards,
dk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Editing exportet registry files
  2005-07-01  9:02 Editing exportet registry files Dhruva Krishnamurthy (RBIN/EDI3) *
@ 2005-07-01  9:49 ` gritsch
  2005-07-01 11:02   ` Mathias Dahl
  2005-07-02 14:50   ` Juanma Barranquero
  0 siblings, 2 replies; 38+ messages in thread
From: gritsch @ 2005-07-01  9:49 UTC (permalink / raw)
  Cc: Miles Bader, emacs-devel, Stefan Monnier, Juanma Barranquero

Thank you all very much for your help.  With the given information and the Emacs
documentation I have now added the following lines to my .emacs file, which
make editing exported registry files less painful:

(setq file-coding-system-alist
      (append '(("\\.reg\\'" . utf-16le-with-signature))
              file-coding-system-alist))

Kind regards,
Markus



Quoting "Dhruva Krishnamurthy (RBIN/EDI3) *"
<Dhruva.Krishnamurthy@in.bosch.com>:

> Hello,
> 
> > Can you give me any advice on how to achieve this?  When I load 
> > such a file it is
> > displayed like in the attached screenshot.
> 
> 1. Open the file in Emacs (normal file open).
> 2. Use the following key strokes: C-x RET r
> 3. Type utf-16le at the prompt
> 4. Confirm reverting the file when asked.
> 
> You can then see a more meaningful file which you can edit like any
> other file in Emacs.
> 
> With best regards,
> dk


Quoting Juanma Barranquero <lekktu@gmail.com>:

> On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:
> 
> > Can you give me any advice on how to achieve this?
> 
> Try
> 
>   C-x RET c utf-16-le RET
> 
> (that's `universal-coding-system-argument'), then:
> 
> C-x C-f yourfile RET
> 
> and you should be able to edit the file.
> 
> -- 
>                     /L/e/k/t/u


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01  9:49 ` gritsch
@ 2005-07-01 11:02   ` Mathias Dahl
  2005-07-01 13:13     ` Juanma Barranquero
  2005-07-02 14:50   ` Juanma Barranquero
  1 sibling, 1 reply; 38+ messages in thread
From: Mathias Dahl @ 2005-07-01 11:02 UTC (permalink / raw)


gritsch@iue.tuwien.ac.at writes:

> Thank you all very much for your help.  With the given information and the Emacs
> documentation I have now added the following lines to my .emacs file, which
> make editing exported registry files less painful:
>
> (setq file-coding-system-alist
>       (append '(("\\.reg\\'" . utf-16le-with-signature))
>               file-coding-system-alist))
>

Great! I have wondered how to do this and have been too lazy to find
out myself... :)

A comment, and I really do not want to troll here:

Why cannot Emacs handle this automatically? I mean, it already handles
LF vs CR + LF issues when I open text files encoded using either
format. Are there any drawbacks trying to "guess" more than what is
already done?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 11:02   ` Mathias Dahl
@ 2005-07-01 13:13     ` Juanma Barranquero
  0 siblings, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-01 13:13 UTC (permalink / raw)
  Cc: emacs-devel

On 7/1/05, Mathias Dahl <brakjoller@gmail.com> wrote:

> Why cannot Emacs handle this automatically?

See `(emacs)Recognize Coding' in the Emacs manual. Basically, there's
no universal ordering of coding systems that would satisfy every
user's needs, so Emacs assumes a default order according to your
language environment, and you can always alter that ordering to match
your specific requirements.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01  8:24     ` Juanma Barranquero
@ 2005-07-01 17:45       ` David Kastrup
  2005-07-01 18:01         ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: David Kastrup @ 2005-07-01 17:45 UTC (permalink / raw)
  Cc: gritsch@iue.tuwien.ac.at, emacs-devel

Juanma Barranquero <lekktu@gmail.com> writes:

> On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:
>
>> Can you give me any advice on how to achieve this?
>
> Try
>
>   C-x RET c utf-16-le RET
>
> (that's `universal-coding-system-argument'), then:
>
> C-x C-f yourfile RET
>
> and you should be able to edit the file.

Shouldn't the utf-x files with signature be quite in front of the list
of detected coding systems?  I mean, that's what the signature is good
for in the first place, right?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 17:45       ` David Kastrup
@ 2005-07-01 18:01         ` Juanma Barranquero
  2005-07-01 18:52           ` Gaëtan LEURENT
  2005-07-01 22:12           ` Jason Rumney
  0 siblings, 2 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-01 18:01 UTC (permalink / raw)
  Cc: emacs-devel

On 7/1/05, David Kastrup <dak@gnu.org> wrote:

> Shouldn't the utf-x files with signature be quite in front of the list
> of detected coding systems?  I mean, that's what the signature is good
> for in the first place, right?

Well, yeah, but FF and FE *are* valid characters in many encodings.
Latin encodings for most european language environments are going to
be higher up the priority list, for example. It makes no sense putting
utf-* encodings before the others unless you know beforehand that
you're going to deal with a lot of these files.

And there's a UTF-8 language environment, after all, for people who
routinely deal with utf-8 data. Are there environments (operating
systems, language environments, whatever) where utf-16 is the norm?

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 18:01         ` Juanma Barranquero
@ 2005-07-01 18:52           ` Gaëtan LEURENT
  2005-07-01 23:25             ` Juanma Barranquero
  2005-07-01 22:12           ` Jason Rumney
  1 sibling, 1 reply; 38+ messages in thread
From: Gaëtan LEURENT @ 2005-07-01 18:52 UTC (permalink / raw)



Juanma Barranquero wrote on 01 Jul 2005 20:01:14 +0200:

>> Shouldn't the utf-x files with signature be quite in front of the list
>> of detected coding systems?  I mean, that's what the signature is good
>> for in the first place, right?
>
> Well, yeah, but FF and FE *are* valid characters in many encodings.
> Latin encodings for most european language environments are going to
> be higher up the priority list, for example.

Well, a Latin-1 file is quite unlikely to begin with « þÿ » or « ÿþ ».
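
To see where those two characters come from, one can decode the
signature bytes as Latin-1.  This is a small illustrative sketch, not
part of the original message:

  ;; FF FE, the UTF-16LE signature, read as Latin-1 text.
  (decode-coding-string "\xff\xfe" 'latin-1)   ; "ÿþ"
  ;; FE FF, the UTF-16BE signature, read as Latin-1 text.
  (decode-coding-string "\xfe\xff" 'latin-1)   ; "þÿ"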

> Are there environments (operating systems, language environments,
> whatever) where utf-16 is the norm?

I think it's the case on MS Windows.

-- 
Gaëtan LEURENT

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 18:01         ` Juanma Barranquero
  2005-07-01 18:52           ` Gaëtan LEURENT
@ 2005-07-01 22:12           ` Jason Rumney
  2005-07-01 23:38             ` Juanma Barranquero
  2005-07-03 19:52             ` Gaëtan LEURENT
  1 sibling, 2 replies; 38+ messages in thread
From: Jason Rumney @ 2005-07-01 22:12 UTC (permalink / raw)
  Cc: emacs-devel

Juanma Barranquero <lekktu@gmail.com> writes:

> Well, yeah, but FF and FE *are* valid characters in many encodings.

How common is it to have FF FE or FE FF as the first two characters in
text in any other encoding? Is it acceptable for Emacs to ignore the
most common case where those two bytes will appear in sequence as the
first two bytes of a file, because of some theoretical worry that it
might break a hypothetical case that I suspect will only exist in real
life if someone deliberately sets out to break auto-detection?

> Latin encodings for most european language environments are going to
> be higher up the priority list, for example. It makes no sense putting
> utf-* encodings before the others unless you know beforehand that
> you're going to deal with a lot of these files.

Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
(or cyrillic or probably Asian) encoding. They should be at the front
of the list.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 18:52           ` Gaëtan LEURENT
@ 2005-07-01 23:25             ` Juanma Barranquero
  0 siblings, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-01 23:25 UTC (permalink / raw)


On 7/1/05, Gaëtan LEURENT <gaetan.leurent@ens.fr> wrote:

> Well, a Latin-1 file is quite unlikely to begin with «þÿ» or «ÿþ».

Sure. But "quite unlikely" is not "impossible".

> I think it's the case on MS Windows.

Well, I use MS Windows and most of my files are ASCII, Latin-1,
Latin-9, a few UTF-8. Other than the .REG ones, I'm not aware of any
files that I use frequently (system files or other) which are encoded
as UTF-16.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 22:12           ` Jason Rumney
@ 2005-07-01 23:38             ` Juanma Barranquero
  2005-07-02 11:50               ` Andreas Schwab
  2005-07-03 19:52             ` Gaëtan LEURENT
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-01 23:38 UTC (permalink / raw)
  Cc: emacs-devel

On 7/2/05, Jason Rumney <jasonr@gnu.org> wrote:
> Juanma Barranquero <lekktu@gmail.com> writes:

> How common is it to have FF FE or FE FF as the first two characters in
> text in any other encoding?

Pretty uncommon. I haven't said otherwise. I'm just pointing out what
I suppose is the original reason for not putting the utf-16 encodings
higher up on the list.

> because of some theoretical worry that it
> might break a hypothetical case that I suspect will only exist in real
> life if someone deliberately sets out to break auto-detection.

I've not checked other encodings. Did you? Are you really sure that
all other frequently used 8-bit encodings put uncommon characters for
0xFF and 0xFE? Because the fact that they aren't ASCII doesn't mean
that they are infrequent in the target language.

> Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
> UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
> (or cyrillic or probably Asian) encoding.

I lack the confidence that you apparently have. I suppose you're
better informed than me (I'm not being facetious). So just change it,
or propose it to be changed. (And perhaps it'd be wise to hear what
Handa-san thinks about it.)

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 23:38             ` Juanma Barranquero
@ 2005-07-02 11:50               ` Andreas Schwab
  2005-07-02 14:33                 ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Schwab @ 2005-07-02 11:50 UTC (permalink / raw)
  Cc: emacs-devel, Jason Rumney

Juanma Barranquero <lekktu@gmail.com> writes:

> I've not checked other encodings. Did you? Are you really sure that
> all other frequently used 8-bit encodings put uncommon characters for
> 0xFF and 0xFE? Because the fact that they aren't ASCII doesn't mean
> that they are infrequent in the target language.

Among the 8-bit encodings supported by glibc there are IMHO no encodings
which have a significant probability of being misdetected as UTF-16.  They
either don't define both code points, or define them to characters that
are very unlikely to occur next to each other, let alone at the start of a
file.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-02 11:50               ` Andreas Schwab
@ 2005-07-02 14:33                 ` Juanma Barranquero
  0 siblings, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-02 14:33 UTC (permalink / raw)
  Cc: emacs-devel

On 7/2/05, Andreas Schwab <schwab@suse.de> wrote:
> Juanma Barranquero <lekktu@gmail.com> writes:

> Among the 8-bit encodings supported by glibc there are IMHO no encodings
> which have a significant probability of being misdetected as UTF-16.

Well, if you're right that would settle the issue, wouldn't it?
Propose it changed and let people opine.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01  9:49 ` gritsch
  2005-07-01 11:02   ` Mathias Dahl
@ 2005-07-02 14:50   ` Juanma Barranquero
  2005-07-02 14:55     ` Juanma Barranquero
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-02 14:50 UTC (permalink / raw)
  Cc: Dhruva Krishnamurthy (RBIN/EDI3) *, emacs-devel, Stefan Monnier,
	Miles Bader

On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:

> With the given information and the Emacs
> documentation I have now added the following lines to my .emacs file, which
> make editing exported registry files less painful:
> 
> (setq file-coding-system-alist
>       (append '(("\\.reg\\'" . utf-16le-with-signature))
>               file-coding-system-alist))

FYI, re-reading the docs it seems the "right" way would be:

  (modify-coding-system-alist 'file "\\.reg\\'" 'utf-16le-with-signature)

which, according to "(emacs)Recognize Coding", is the recommended
method to modify the *-coding-system-alist variables.
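
One way to check that the entry took effect, sketched here with a
made-up file name, is to ask Emacs which coding systems it would pick
for reading a matching file:

  (modify-coding-system-alist 'file "\\.reg\\'" 'utf-16le-with-signature)
  ;; "example.reg" is hypothetical; only its name is matched against
  ;; the regexps in `file-coding-system-alist'.
  (find-operation-coding-system 'insert-file-contents "example.reg")

The result should be a cons cell of the form (DECODING . ENCODING)
naming the coding system registered above.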

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-02 14:50   ` Juanma Barranquero
@ 2005-07-02 14:55     ` Juanma Barranquero
  2005-07-02 16:07       ` Juanma Barranquero
  2005-07-08  1:52       ` Kenichi Handa
  0 siblings, 2 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-02 14:55 UTC (permalink / raw)


On 7/2/05, Juanma Barranquero <lekktu@gmail.com> wrote:

> which, according to "(emacs)Recognize Coding"

BTW, that same node contains this info:

     If Emacs recognizes the encoding of a file incorrectly, you can
  reread the file using the correct coding system by typing `C-x <RET> c
  CODING-SYSTEM <RET> M-x revert-buffer <RET>'.

Shouldn't it recommend instead `C-x <RET> r CODING-SYSTEM <RET>',
i.e., `revert-buffer-with-coding-system'?

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-02 14:55     ` Juanma Barranquero
@ 2005-07-02 16:07       ` Juanma Barranquero
  2005-07-07  0:28         ` Juanma Barranquero
  2005-07-08  1:52       ` Kenichi Handa
  1 sibling, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-02 16:07 UTC (permalink / raw)


Is that to be expected somehow?

emacs -Q

M-x ielm <RET>
ELISP> (prefer-coding-system 'utf-16le-with-signature)
(mule-utf-16le-with-signature-dos . mule-utf-16le-with-signature-unix)

ELISP> (list-coding-systems)
*** Eval error ***  Cannot open load file: mule-diag
ELISP> 

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-01 22:12           ` Jason Rumney
  2005-07-01 23:38             ` Juanma Barranquero
@ 2005-07-03 19:52             ` Gaëtan LEURENT
  2005-07-03 21:34               ` Jason Rumney
  2005-07-04  7:43               ` Kaloian Doganov
  1 sibling, 2 replies; 38+ messages in thread
From: Gaëtan LEURENT @ 2005-07-03 19:52 UTC (permalink / raw)
  Cc: Juanma Barranquero, emacs-devel


Jason Rumney wrote on 02 Jul 2005 00:12:53 +0200:

> Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
> UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
> (or cyrillic or probably Asian) encoding. They should be at the front
> of the list.

For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid as a
UTF-8 string. A friend of mine suggested "Try our new exclusive WAZA®
for just $0.02!" which is even meaningful in both cases.
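
The first example can be checked byte by byte.  In Latin-1, × is D7
and ½ is BD; read as UTF-8, D7 BD is a well-formed two-byte sequence
for the single code point U+05FD.  A sketch, assuming a current Emacs:

  ;; The five bytes of "4×½=2" as encoded in Latin-1.
  (let ((bytes "4\xd7\xbd=2"))
    (list (length (decode-coding-string bytes 'latin-1))   ; 5 characters
          (length (decode-coding-string bytes 'utf-8))))   ; 4 characters

Both decodings succeed, so the bytes alone do not say which encoding
was meant.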

-- 
Gaëtan LEURENT

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-03 19:52             ` Gaëtan LEURENT
@ 2005-07-03 21:34               ` Jason Rumney
  2005-07-03 21:48                 ` David Kastrup
  2005-07-04  7:43               ` Kaloian Doganov
  1 sibling, 1 reply; 38+ messages in thread
From: Jason Rumney @ 2005-07-03 21:34 UTC (permalink / raw)
  Cc: Juanma Barranquero

gaetan.leurent@ens.fr (Gaëtan LEURENT) writes:

> For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
> Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid as a
> UTF-8 string. A friend of mine suggested "Try our new exclusive WAZA®
> for just $0.02!" which is even meaningful in both cases.

Coming up with isolated theoretical problem cases should not stop us
from doing what is correct in the other 99% of cases.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-03 21:34               ` Jason Rumney
@ 2005-07-03 21:48                 ` David Kastrup
  0 siblings, 0 replies; 38+ messages in thread
From: David Kastrup @ 2005-07-03 21:48 UTC (permalink / raw)
  Cc: Juanma Barranquero, emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> gaetan.leurent@ens.fr (Gaëtan LEURENT) writes:
>
>> For UTF-16 with signature, I agree, but UTF-8 could sometimes match
>> a Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid
>> as a UTF-8 string. A friend of mine suggested "Try our new
>> exclusive WAZA® for just $0.02!" which is even meaningful in both
>> cases.
>
> Coming up with isolated theoretical problem cases should not stop us
> from doing what is correct in the other 99% of cases.

I think Gaëtan is arguing that we should not prefer UTF-8 in a Latin-1
locale.  This is pretty much a red herring: we were discussing the
UTF-16-with-signature encodings: there is no necessity whatsoever to
group their priority with UTF-8.

I agree that in a Latin-1 locale, Latin-1 should be preferred over
UTF-8 and vice versa as long as the buffers can be interpreted as
being valid in both encodings.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-03 19:52             ` Gaëtan LEURENT
  2005-07-03 21:34               ` Jason Rumney
@ 2005-07-04  7:43               ` Kaloian Doganov
  1 sibling, 0 replies; 38+ messages in thread
From: Kaloian Doganov @ 2005-07-04  7:43 UTC (permalink / raw)
  Cc: lekktu, emacs-devel, jasonr


> For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
> Latin-1 file.

I would like to stress that. Latin-1 (ISO-8859-1) is a superset of
US-ASCII. The first 128 characters are basically US-ASCII. On the other
hand, although UTF-8 is a variable length encoding, it is designed to
match US-ASCII in its first 128 characters (Unicode range U+0000 to
U+007F). These characters are encoded as single bytes in UTF-8.

So, every single US-ASCII file out there is a valid UTF-8 file. This is
one of the features of UTF-8.

And for historical reasons, every US-ASCII file is a valid Latin-1 file.


-- 
Поздрави,
Калоян Доганов,
Сдружение "Свободен софтуер".
___________________________________________________________
Ако не отговарям на писмата Ви: http://6lyokavitza.org/mail

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-02 16:07       ` Juanma Barranquero
@ 2005-07-07  0:28         ` Juanma Barranquero
  2005-07-07  6:25           ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-07  0:28 UTC (permalink / raw)


Am I the only one who gets this error?

> emacs -Q
> 
> M-x ielm <RET>
> ELISP> (prefer-coding-system 'utf-16le-with-signature)
> (mule-utf-16le-with-signature-dos . mule-utf-16le-with-signature-unix)
> 
> ELISP> (list-coding-systems)
> *** Eval error ***  Cannot open load file: mule-diag
> ELISP>

After I use `prefer-coding-system' and select a utf-*-with-signature
coding system, Emacs becomes unusable. I must kill it forcefully.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07  0:28         ` Juanma Barranquero
@ 2005-07-07  6:25           ` Kenichi Handa
  2005-07-07  9:09             ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2005-07-07  6:25 UTC (permalink / raw)
  Cc: emacs-devel

In article <f7ccd24b050706172858123cbe@mail.gmail.com>, Juanma Barranquero <lekktu@gmail.com> writes:

> Am I the only one who gets this error?
>> emacs -Q
>>
>> M-x ielm <RET>
>> ELISP> (prefer-coding-system 'utf-16le-with-signature)
>> (mule-utf-16le-with-signature-dos . mule-utf-16le-with-signature-unix)
>>
>> ELISP> (list-coding-systems)
>> *** Eval error ***  Cannot open load file: mule-diag
>> ELISP>

This is because prefer-coding-system also sets
default-file-name-coding-system.  It seems that any attempt
to set it (and keyboard-coding-system) to an ASCII-incompatible
coding system should be avoided.  So, I've just installed
these changes.

2005-07-07  Kenichi Handa  <handa@m17n.org>

	* international/mule.el (make-coding-system): Describe
	`ascii-incompatible' property in the docstring.
	(set-file-name-coding-system): Signal an error if coding-system is
	ascii-incompatible.
	(set-keyboard-coding-system): Likewise.

	* international/mule-cmds.el (set-default-coding-systems): Don't
	set default-file-name-coding-system and
	default-keyboard-coding-system if coding-system is
	ASCII-incompatible.

	* international/utf-16.el: Declare that all UTF-16-based coding
	systems ASCII-incompatible.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07  6:25           ` Kenichi Handa
@ 2005-07-07  9:09             ` Juanma Barranquero
  2005-07-07  9:55               ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-07  9:09 UTC (permalink / raw)
  Cc: emacs-devel

On 7/7/05, Kenichi Handa <handa@m17n.org> wrote:

> This is because prefer-coding-system sets also
> default-file-name-coding-system.  It seems that any attempt
> to set it (and keyboard-coding-system) to ascii-incompatible
> coding system should be avoided.  So, I've just installed
> these changes.

It works now. Thanks.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07  9:09             ` Juanma Barranquero
@ 2005-07-07  9:55               ` Juanma Barranquero
  2005-07-07 12:36                 ` Kenichi Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-07  9:55 UTC (permalink / raw)
  Cc: emacs-devel

On 7/7/05, Kenichi Handa <handa@m17n.org> wrote:

> This is because prefer-coding-system sets also
> default-file-name-coding-system.

Unfortunately, setting

  (prefer-coding-system 'utf-16le-with-signature)

into .emacs is still not a viable option. I'm getting a lot of trouble
out of saveplace.el, and vc-diff returns gibberish.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07  9:55               ` Juanma Barranquero
@ 2005-07-07 12:36                 ` Kenichi Handa
  2005-07-07 13:35                   ` Juanma Barranquero
  0 siblings, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2005-07-07 12:36 UTC (permalink / raw)
  Cc: emacs-devel

In article <f7ccd24b05070702551aa68a1d@mail.gmail.com>, Juanma Barranquero <lekktu@gmail.com> writes:

> On 7/7/05, Kenichi Handa <handa@m17n.org> wrote:
>>  This is because prefer-coding-system sets also
>>  default-file-name-coding-system.

> Unfortunately, setting

>   (prefer-coding-system 'utf-16le-with-signature)

> into .emacs is still not a viable option. I'm getting a lot of trouble
> out of saveplace.el, and vc-diff returns gibberish.

When you prefer that coding system, Emacs communicates with
a process using utf-16le-with-signature, so it's not surprising
that vc-diff doesn't work.  I think almost all tools that use an
external process without an explicit coding system will stop
working.

As I've never used saveplace.el, I don't know why it doesn't
work.  Please describe the problem in more detail.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 12:36                 ` Kenichi Handa
@ 2005-07-07 13:35                   ` Juanma Barranquero
  2005-07-07 18:56                     ` Benjamin Riefenstahl
  2005-07-08  1:42                     ` Kenichi Handa
  0 siblings, 2 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-07 13:35 UTC (permalink / raw)
  Cc: emacs-devel

On 7/7/05, Kenichi Handa <handa@m17n.org> wrote:

> When you prefer that coding system, Emacs communicates with
> a process by utf-16le-with-signature, so it's not surprising
> that vc-diff doesn't work.

I never doubted there was a good reason ;-)

So, is there any way to convince Emacs that I want to put
utf-16le-with-signature high in the list of coding systems to try when
reading a file (with find-file), and not when communicating with
processes, etc.? Basically I want to use latin-9 for everything, but I
want to be able to do C-x C-f my-utf-16-file and get it decoded right.

> As I've never used saveplace.el, I don't know why it doesn't
> work.  Please describe the problem in more detail.

I'll try to repeat the problem as soon as I get time.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 13:35                   ` Juanma Barranquero
@ 2005-07-07 18:56                     ` Benjamin Riefenstahl
  2005-07-07 19:34                       ` Gaëtan LEURENT
  2005-07-08  0:15                       ` Juanma Barranquero
  2005-07-08  1:42                     ` Kenichi Handa
  1 sibling, 2 replies; 38+ messages in thread
From: Benjamin Riefenstahl @ 2005-07-07 18:56 UTC (permalink / raw)
  Cc: emacs-devel

Hi Juanma,


Juanma Barranquero writes:
> Basically I want to use latin-9 for everything, but I want to be
> able to do C-x C-f my-utf-16-file and get it decoded right.

Have you tried to do something like this:

   (prefer-coding-system 'utf-16le-with-signature)
   (prefer-coding-system 'latin-9)

I.e. first put UTF-16 on the "priority list for automatic detection"
and then override it again with Latin-9 for the defaults.


benny

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 18:56                     ` Benjamin Riefenstahl
@ 2005-07-07 19:34                       ` Gaëtan LEURENT
  2005-07-07 21:32                         ` Benjamin Riefenstahl
  2005-07-08  0:15                       ` Juanma Barranquero
  1 sibling, 1 reply; 38+ messages in thread
From: Gaëtan LEURENT @ 2005-07-07 19:34 UTC (permalink / raw)
  Cc: Juanma Barranquero, emacs-devel


Benjamin Riefenstahl wrote on 07 Jul 2005 20:56:42 +0200:

> Have you tried to do something like this:
>
>    (prefer-coding-system 'utf-16le-with-signature)
>    (prefer-coding-system 'latin-9)
>
> I.e. first put UTF-16 on the "priority list for automatic detection"
> and then override it again with Latin-9 for the defaults.

That would try latin-9 first, and if the file is valid as latin-9 (which
is quite likely), it will fail. Anyway, utf-16 is already on the list on
my emacs ...

As I said in a previous post, it would be nice to have an easy way to
modify coding-category-list without doing the other stuff
prefer-coding-system does.

-- 
Gaëtan LEURENT

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 19:34                       ` Gaëtan LEURENT
@ 2005-07-07 21:32                         ` Benjamin Riefenstahl
  0 siblings, 0 replies; 38+ messages in thread
From: Benjamin Riefenstahl @ 2005-07-07 21:32 UTC (permalink / raw)
  Cc: Juanma Barranquero, emacs-devel

Hi Gaëtan,

> Benjamin Riefenstahl wrote on 07 Jul 2005 20:56:42 +0200:
>>    (prefer-coding-system 'utf-16le-with-signature)
>>    (prefer-coding-system 'latin-9)

Gaëtan LEURENT writes:
> That would try latin-9 first, and if the file is valid as latin-9
> (which is quite likely), it will fail. [...]

I thought it would not detect a UTF-16 file as Latin-9 because of all
the NUL characters you get when you try to interpret UTF-16 as an
8-bit encoding.

I just did a test and sadly, detection of Latin-9 seems to like those
NUL characters fine, even though nobody would ever write such a file.
Pity.
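
For anyone who wants to repeat the test without a real file, here is a
minimal sketch.  The byte sequence is a made-up example (the BOM followed
by "Hi" in UTF-16LE), and the result of detection depends on the priorities
configured in your Emacs, so no particular output is claimed:

```elisp
;; Sketch: see how Emacs treats a few UTF-16LE bytes.
;; FF FE is the BOM; each ASCII letter is followed by a NUL byte.
(let ((bytes (unibyte-string #xff #xfe ?H 0 ?i 0)))
  (list
   ;; Coding systems that Emacs considers possible, in priority order.
   (detect-coding-string bytes)
   ;; Decoding as Latin-9 signals no error; the NUL bytes are kept
   ;; as NUL characters in the result.
   (decode-coding-string bytes 'latin-9)
   ;; Decoding with the coding system the file was actually written in.
   (decode-coding-string bytes 'utf-16le-with-signature)))
```

Evaluating this with M-: or in *scratch* lets you compare the two decoded
strings side by side.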

benny

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 18:56                     ` Benjamin Riefenstahl
  2005-07-07 19:34                       ` Gaëtan LEURENT
@ 2005-07-08  0:15                       ` Juanma Barranquero
  1 sibling, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-08  0:15 UTC (permalink / raw)
  Cc: emacs-devel

On 7/7/05, Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net> wrote:

>    (prefer-coding-system 'utf-16le-with-signature)
>    (prefer-coding-system 'latin-9)
> 
> I.e. first put UTF-16 on the "priority list for automatic detection"
> and then override it again with Latin-9 for the defaults.

Many UTF-16 files would be detected as Latin-9. Try it with a .REG
file in utf-16le-with-signature mode.
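
One way to sidestep detection for these files, as a sketch only: map the
file name to a coding system through auto-coding-alist.  This assumes that
every .reg file you open is a UTF-16LE export with a BOM; older
REGEDIT4-style exports are not, and would be decoded wrongly:

```elisp
;; Sketch: decode *.reg files as UTF-16LE with BOM, regardless of
;; the detection priorities.  The file-name pattern is an assumption.
(add-to-list 'auto-coding-alist
             '("\\.[rR][eE][gG]\\'" . utf-16le-with-signature))
```

This leaves the defaults for processes and file names untouched.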

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-07 13:35                   ` Juanma Barranquero
  2005-07-07 18:56                     ` Benjamin Riefenstahl
@ 2005-07-08  1:42                     ` Kenichi Handa
  2005-07-11  9:37                       ` Juanma Barranquero
  1 sibling, 1 reply; 38+ messages in thread
From: Kenichi Handa @ 2005-07-08  1:42 UTC (permalink / raw)
  Cc: emacs-devel

In article <f7ccd24b0507070635367ca650@mail.gmail.com>, Juanma Barranquero <lekktu@gmail.com> writes:

> So, is there any way to convince Emacs that I want to put
> utf-16le-with-signature high in the list of coding systems to try when
> reading a file (with find-file), and not when communicating with
> processes, etc.? Basically I want to use latin-9 for everything, but I
> want to be able to do C-x C-f my-utf-16-file and get it decoded right.

You can use set-coding-priority as below:

(set-coding-priority (list (coding-system-category 'utf-16le-with-signature)))
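
In Emacs 23 and later, set-coding-priority is marked obsolete in favor of
set-coding-system-priority.  A minimal sketch that should work with either,
assuming you only want to change the detection order and leave the process
and file-name defaults alone:

```elisp
;; Sketch: raise UTF-16LE-with-BOM in the detection order only.
(if (fboundp 'set-coding-system-priority)
    ;; Newer Emacs: takes the coding systems directly.
    (set-coding-system-priority 'utf-16le-with-signature)
  ;; Older Emacs, as above: takes a list of coding categories.
  (set-coding-priority
   (list (coding-system-category 'utf-16le-with-signature))))
```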

>>  As I've never used saveplace.el, I don't know why it doesn't
>>  work.  Please describe the problem in more detail.

> I'll try to repeat the problem as soon as I get time.

Thank you.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-02 14:55     ` Juanma Barranquero
  2005-07-02 16:07       ` Juanma Barranquero
@ 2005-07-08  1:52       ` Kenichi Handa
  1 sibling, 0 replies; 38+ messages in thread
From: Kenichi Handa @ 2005-07-08  1:52 UTC (permalink / raw)
  Cc: emacs-devel

In article <f7ccd24b050702075517768c14@mail.gmail.com>, Juanma Barranquero <lekktu@gmail.com> writes:
> BTW, that same node contains this info:

>      If Emacs recognizes the encoding of a file incorrectly, you can
>   reread the file using the correct coding system by typing `C-x <RET> c
>   CODING-SYSTEM <RET> M-x revert-buffer <RET>'.

> Shouldn't it recommend instead `C-x <RET> r CODING-SYSTEM <RET>',
> i.e., `revert-buffer-with-coding-system'?

I agree.  I've just installed that change.
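
The same command can be called from Lisp.  A small sketch of a helper for
the .reg case; the name my-reread-as-utf-16le is hypothetical and not part
of Emacs:

```elisp
;; Sketch: re-read the visited file as UTF-16LE with BOM.
;; `my-reread-as-utf-16le' is a made-up name for illustration.
(defun my-reread-as-utf-16le ()
  "Revert the current buffer, decoding it as UTF-16LE with a BOM."
  (interactive)
  (revert-buffer-with-coding-system 'utf-16le-with-signature))
```

It can then be run with M-x my-reread-as-utf-16le in a buffer that was
decoded wrongly.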

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Editing exportet registry files
  2005-07-08  1:42                     ` Kenichi Handa
@ 2005-07-11  9:37                       ` Juanma Barranquero
  0 siblings, 0 replies; 38+ messages in thread
From: Juanma Barranquero @ 2005-07-11  9:37 UTC (permalink / raw)
  Cc: emacs-devel

On 7/8/05, Kenichi Handa <handa@m17n.org> wrote:
> (set-coding-priority (list (coding-system-category 'utf-16le-with-signature)))

Ah, of course you're right. Thanks.

-- 
                    /L/e/k/t/u

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2005-07-11  9:37 UTC | newest]

Thread overview: 38+ messages
2005-07-01  9:02 Editing exportet registry files Dhruva Krishnamurthy (RBIN/EDI3) *
2005-07-01  9:49 ` gritsch
2005-07-01 11:02   ` Mathias Dahl
2005-07-01 13:13     ` Juanma Barranquero
2005-07-02 14:50   ` Juanma Barranquero
2005-07-02 14:55     ` Juanma Barranquero
2005-07-02 16:07       ` Juanma Barranquero
2005-07-07  0:28         ` Juanma Barranquero
2005-07-07  6:25           ` Kenichi Handa
2005-07-07  9:09             ` Juanma Barranquero
2005-07-07  9:55               ` Juanma Barranquero
2005-07-07 12:36                 ` Kenichi Handa
2005-07-07 13:35                   ` Juanma Barranquero
2005-07-07 18:56                     ` Benjamin Riefenstahl
2005-07-07 19:34                       ` Gaëtan LEURENT
2005-07-07 21:32                         ` Benjamin Riefenstahl
2005-07-08  0:15                       ` Juanma Barranquero
2005-07-08  1:42                     ` Kenichi Handa
2005-07-11  9:37                       ` Juanma Barranquero
2005-07-08  1:52       ` Kenichi Handa
  -- strict thread matches above, loose matches on Subject: below --
2005-06-30 20:27 Markus Gritsch
2005-06-30 21:23 ` Stefan Monnier
2005-07-01  7:12   ` gritsch
2005-07-01  8:17     ` Miles Bader
2005-07-01  8:24     ` Juanma Barranquero
2005-07-01 17:45       ` David Kastrup
2005-07-01 18:01         ` Juanma Barranquero
2005-07-01 18:52           ` Gaëtan LEURENT
2005-07-01 23:25             ` Juanma Barranquero
2005-07-01 22:12           ` Jason Rumney
2005-07-01 23:38             ` Juanma Barranquero
2005-07-02 11:50               ` Andreas Schwab
2005-07-02 14:33                 ` Juanma Barranquero
2005-07-03 19:52             ` Gaëtan LEURENT
2005-07-03 21:34               ` Jason Rumney
2005-07-03 21:48                 ` David Kastrup
2005-07-04  7:43               ` Kaloian Doganov
2005-07-01  8:28   ` gritsch
