unofficial mirror of emacs-devel@gnu.org 
* Editing exported registry files
From: Markus Gritsch @ 2005-06-30 20:27 UTC

Hi,

when I export part of the registry on MS Windows to a .reg file, it is a 
Unicode file (little-endian) encoded with BOM (FF FE) as the first 
bytes.  Is it possible to edit such a file with Emacs?

Kind regards,
Markus
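
A minimal sketch of what such a file looks like at the byte level, assuming Python 3. The sample line is hypothetical and only stands in for real registry content; the BOM constant and the codecs are standard library names.

```python
import codecs

# Hypothetical sample text; a real export would contain actual keys and values.
sample_text = "[HKEY_CURRENT_USER\\Software\\Example]\r\n"

# Build the bytes as described above: the BOM FF FE, then UTF-16 little-endian data.
raw = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")

print(raw[:2].hex())   # the two signature bytes
print(raw[2:6])        # '[' and 'H', each followed by a zero byte

# The BOM-aware "utf-16" codec consumes the signature and picks the byte order.
decoded = raw.decode("utf-16")
print(decoded == sample_text)
```

The zero byte after every ASCII character is why such a file looks garbled when it is read as an 8-bit encoding.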

* Re: Editing exported registry files
From: Stefan Monnier @ 2005-06-30 21:23 UTC
  Cc: emacs-devel

> when I export part of the registry on MS Windows to a .reg file, it is
> a Unicode file (little-endian) encoded with BOM (FF FE) as the first bytes.

You mean, it's utf-16-le

> Is it possible to edit such a file with Emacs?

Yes,


        Stefan

* Re: Editing exported registry files
From: gritsch @ 2005-07-01  7:12 UTC
  Cc: emacs-devel

Quoting Stefan Monnier <monnier@iro.umontreal.ca>:

> > when I export part of the registry on MS Windows to a .reg file, it is
> > a Unicode file (little-endian) encoded with BOM (FF FE) as the first
> > bytes.
> 
> You mean, it's utf-16-le
> 
> > Is it possible to edit such a file with Emacs?
> 
> Yes,

Can you give me any advice on how to achieve this?  When I load such a file, it is
displayed as in the attached screenshot.

Thank you in advance,
Markus



[-- Attachment #2: screen.png --]
[-- Type: image/png, Size: 5478 bytes --]

* Re: Editing exported registry files
From: Miles Bader @ 2005-07-01  8:17 UTC
  Cc: Stefan Monnier, emacs-devel

>> You mean, it's utf-16-le
>
> Can you give me any advice on how to achieve this?  When I load such a file, it is
> displayed as in the attached screenshot.

   C-x C-m c utf-16-le RET C-x C-f FILENAME RET

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi

* Re: Editing exported registry files
From: Juanma Barranquero @ 2005-07-01  8:24 UTC
  Cc: emacs-devel

On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:

> Can you give me any advice on how to achieve this?

Try

  C-x RET c utf-16-le RET

(that's `universal-coding-system-argument'), then:

C-x C-f yourfile RET

and you should be able to edit the file.

-- 
                    /L/e/k/t/u

* Re: Editing exported registry files
From: David Kastrup @ 2005-07-01 17:45 UTC
  Cc: gritsch@iue.tuwien.ac.at, emacs-devel

Juanma Barranquero <lekktu@gmail.com> writes:

> On 7/1/05, gritsch@iue.tuwien.ac.at <gritsch@iue.tuwien.ac.at> wrote:
>
>> Can you give me any advice on how to achieve this?
>
> Try
>
>   C-x RET c utf-16-le RET
>
> (that's `universal-coding-system-argument'), then:
>
> C-x C-f yourfile RET
>
> and you should be able to edit the file.

Shouldn't the utf-x files with signature be quite in front of the list
of detected coding systems?  I mean, that's what the signature is good
for in the first place, right?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
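
A minimal sketch of signature-first detection, assuming Python 3. The function name `sniff_utf16_signature` is hypothetical, and this is not how Emacs orders its coding systems; it only illustrates checking the BOM before anything else.

```python
import codecs

def sniff_utf16_signature(data: bytes):
    """Return a codec name if data starts with a UTF-16 BOM, else None."""
    # Caveat: the UTF-32 little-endian signature FF FE 00 00 also starts
    # with FF FE; this sketch deliberately ignores that case.
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None  # no signature: fall back to whatever priority list applies

print(sniff_utf16_signature(b"\xff\xfeR\x00E\x00G\x00"))
print(sniff_utf16_signature(b"plain ASCII text"))
```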

* Re: Editing exported registry files
From: Juanma Barranquero @ 2005-07-01 18:01 UTC
  Cc: emacs-devel

On 7/1/05, David Kastrup <dak@gnu.org> wrote:

> Shouldn't the utf-x files with signature be quite in front of the list
> of detected coding systems?  I mean, that's what the signature is good
> for in the first place, right?

Well, yeah, but FF and FE *are* valid characters in many encodings.
Latin encodings for most European language environments are going to
be higher up the priority list, for example. It makes no sense to put
utf-* encodings before the others unless you know beforehand that
you're going to deal with a lot of these files.

And there's a UTF-8 language environment, after all, for people who
routinely deal with utf-8 data. Are there environments (operating
systems, language environments, whatever) where utf-16 is the norm?

-- 
                    /L/e/k/t/u

* Re: Editing exported registry files
From: Gaëtan LEURENT @ 2005-07-01 18:52 UTC

Juanma Barranquero wrote on 01 Jul 2005 20:01:14 +0200:

>> Shouldn't the utf-x files with signature be quite in front of the list
>> of detected coding systems?  I mean, that's what the signature is good
>> for in the first place, right?
>
> Well, yeah, but FF and FE *are* valid characters in many encodings.
> Latin encodings for most european language environments are going to
> be higher up the priority list, for example.

Well, a Latin-1 file is quite unlikely to begin with « þÿ » or « ÿþ ».

> Are there environments (operating systems, language environments,
> whatever) where utf-16 is the norm?

I think it's the case on MS Windows.

-- 
Gaëtan LEURENT
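
The two character pairs above are simply the UTF-16 signatures read as Latin-1. A minimal sketch, assuming Python 3 and its built-in `latin-1` codec:

```python
# The two UTF-16 signatures, in little-endian and big-endian order.
signatures = (b"\xff\xfe", b"\xfe\xff")

# Read each pair of bytes as Latin-1 text instead.
as_latin1 = [sig.decode("latin-1") for sig in signatures]

for sig, text in zip(signatures, as_latin1):
    print(sig.hex(), "->", text)
```

A Latin-1 text would have to start with exactly one of these two-character sequences to be mistaken for UTF-16 with a signature.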

* Re: Editing exported registry files
From: Jason Rumney @ 2005-07-01 22:12 UTC
  Cc: emacs-devel

Juanma Barranquero <lekktu@gmail.com> writes:

> Well, yeah, but FF and FE *are* valid characters in many encodings.

How common is it to have FF FE or FE FF as the first two characters in
text in any other encoding? Is it acceptable for Emacs to ignore the
most common case where those two bytes appear in sequence as the
first two bytes of a file, because of some theoretical worry that it
might break a hypothetical case that I suspect will only exist in real
life if someone deliberately sets out to break auto-detection?

> Latin encodings for most european language environments are going to
> be higher up the priority list, for example. It makes no sense putting
> utf-* encodings before the others unless you know beforehand that
> you're going to deal with a lot of these files.

Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
(or Cyrillic or probably Asian) encoding. They should be at the front
of the list.

* Re: Editing exported registry files
From: Juanma Barranquero @ 2005-07-01 23:25 UTC

On 7/1/05, Gaëtan LEURENT <gaetan.leurent@ens.fr> wrote:

> Well, a Latin-1 file is quite unlikely to begin with «þÿ» or «ÿþ».

Sure. But "quite unlikely" is not "impossible".

> I think it's the case on MS Windows.

Well, I use MS Windows and most of my files are ASCII, Latin-1,
Latin-9, a few UTF-8. Other than the .REG ones, I'm not aware of any
files that I use frequently (system files or other) which are encoded
as UTF-16.

-- 
                    /L/e/k/t/u

* Re: Editing exported registry files
From: Juanma Barranquero @ 2005-07-01 23:38 UTC
  Cc: emacs-devel

On 7/2/05, Jason Rumney <jasonr@gnu.org> wrote:
> Juanma Barranquero <lekktu@gmail.com> writes:

> How common is it to have FF FE or FE FF as the first two characters in
> text in any other encoding?

Pretty uncommon. I haven't said otherwise. I'm just pointing out what
I suppose is the original reason for not putting the utf-16 encodings
higher up on the list.

> because of some theoretical worry that it
> might break a hypothetical case that I suspect will only exist in real
> life if someone deliberately sets out to break auto-detection.

I've not checked other encodings. Did you? Are you really sure that
all other frequently used 8-bit encodings put uncommon characters for
0xFF and 0xFE? Because the fact that they aren't ASCII doesn't mean
that they are infrequent in the target language.

> Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
> UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
> (or cyrillic or probably Asian) encoding.

I lack the confidence that you apparently have. I suppose you're
better informed than me (I'm not being facetious). So just change it,
or propose it to be changed. (And perhaps it'd be wise to hear what
Handa-san thinks about it.)

-- 
                    /L/e/k/t/u

* Re: Editing exported registry files
From: Andreas Schwab @ 2005-07-02 11:50 UTC
  Cc: emacs-devel, Jason Rumney

Juanma Barranquero <lekktu@gmail.com> writes:

> I've not checked other encodings. Did you? Are you really sure that
> all other frequently used 8-bit encodings put uncommon characters for
> 0xFF and 0xFE? Because the fact that they aren't ASCII doesn't mean
> that they are infrequent in the target language.

Among the 8-bit encodings supported by glibc there are IMHO no encodings
which have a significant probability of being misdetected as UTF-16.  They
either don't define both code points, or map them to characters that
are very unlikely to occur next to each other, let alone at the start of a
file.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
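
A rough way to explore this claim, assuming Python 3. The codec list is an arbitrary sample of Python's built-in 8-bit codecs, used here as a stand-in for the glibc charsets mentioned above, and `leading_pair` is a hypothetical helper name.

```python
# Arbitrary sample of 8-bit codecs shipped with Python (an assumption,
# not the glibc list referred to above).
CODECS = ["latin-1", "iso8859-15", "cp1252", "koi8-r", "iso8859-7", "cp437"]

def leading_pair(codec, data=b"\xff\xfe"):
    """Decode the two signature bytes with codec; None if a byte is undefined."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

for codec in CODECS:
    pair = leading_pair(codec)
    shown = "undefined" if pair is None else repr(pair)
    print(f"{codec:12} FF FE -> {shown}")
```

Whether a given pair is plausible at the start of a real text is a judgment about the language, not something the code can decide.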

* Re: Editing exported registry files
From: Juanma Barranquero @ 2005-07-02 14:33 UTC
  Cc: emacs-devel

On 7/2/05, Andreas Schwab <schwab@suse.de> wrote:
> Juanma Barranquero <lekktu@gmail.com> writes:

> Among the 8-bit encodings supported by glibc there are IMHO no encodings
> which have a significant probability of being misdetected as UTF-16.

Well, if you're right that would settle the issue, wouldn't it?
Propose it changed and let people opine.

-- 
                    /L/e/k/t/u

* Re: Editing exported registry files
From: Gaëtan LEURENT @ 2005-07-03 19:52 UTC
  Cc: Juanma Barranquero, emacs-devel

Jason Rumney wrote on 02 Jul 2005 00:12:53 +0200:

> Nonsense. It is very unlikely that UTF-16-LE-WITH-SIGNATURE,
> UTF-16-BE-WITH-SIGNATURE, or even UTF-8 will falsely match any Latin
> (or cyrillic or probably Asian) encoding. They should be at the front
> of the list.

For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid as a
UTF-8 string. A friend of mine suggested "Try our new exclusive WAZA®
for just $0.02!" which is even meaningful in both cases.

-- 
Gaëtan LEURENT
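
The first example can be checked mechanically. A minimal sketch, assuming Python 3; the variable names are illustrative only.

```python
# "4×½=2", written with escapes so the source file encoding does not matter.
text = "4\u00d7\u00bd=2"

# Encode as Latin-1: one byte per character.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)

# The bytes D7 BD form a well-formed two-byte UTF-8 sequence
# (lead byte 110xxxxx, continuation byte 10xxxxxx), so this does not raise.
as_utf8 = latin1_bytes.decode("utf-8")
print(len(text), len(as_utf8))   # five characters versus four
```

The UTF-8 reading is valid but yields a different, shorter string, which is why validity alone cannot decide between the two encodings.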

* Coding system priority (was: Editing exported registry files)
From: Gaëtan LEURENT @ 2005-07-03 19:58 UTC
  Cc: Juanma Barranquero, gritsch@iue.tuwien.ac.at, emacs-devel

David Kastrup wrote on 01 Jul 2005 19:45:32 +0200:

> Shouldn't the utf-x files with signature be quite in front of the list
> of detected coding systems?  I mean, that's what the signature is good
> for in the first place, right?

By the way, what is the correct method for a user to put
utf-16-le-with-signature on top of the list? prefer-coding-system has
some bad side effects (it changes I/O coding system), but the docstring
of coding-category-list says « Don't modify this variable directly, but
use `set-coding-priority'. ».

-- 
Gaëtan LEURENT

* Re: Editing exported registry files
From: Jason Rumney @ 2005-07-03 21:34 UTC
  Cc: Juanma Barranquero

gaetan.leurent@ens.fr (Gaëtan LEURENT) writes:

> For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
> Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid as a
> UTF-8 string. A friend of mine suggested "Try our new exclusive WAZA®
> for just $0.02!" which is even meaningful in both cases.

Coming up with isolated theoretical problem cases should not stop us
from doing what is correct in the other 99% of cases.

* Re: Editing exported registry files
From: David Kastrup @ 2005-07-03 21:48 UTC
  Cc: Juanma Barranquero, emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> gaetan.leurent@ens.fr (Gaëtan LEURENT) writes:
>
>> For UTF-16 with signature, I agree, but UTF-8 could sometimes match
>> a Latin-1 file. For instance, "4×½=2" encoded in Latin-1 is valid
>> as a UTF-8 string. A friend of mine suggested "Try our new
>> exclusive WAZA® for just $0.02!" which is even meaningful in both
>> cases.
>
> Coming up with isolated theoretical problem cases should not stop us
> from doing what is correct in the other 99% of cases.

I think Gaëtan is arguing that we should not prefer UTF-8 in a Latin-1
locale.  This is pretty much a red herring: we were discussing the
UTF-16-with-signature encodings, and there is no necessity whatsoever to
group their priority with UTF-8.

I agree that in a Latin-1 locale, Latin-1 should be preferred over
UTF-8 and vice versa as long as the buffers can be interpreted as
being valid in both encodings.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Editing exported registry files
From: Kaloian Doganov @ 2005-07-04  7:43 UTC
  Cc: lekktu, emacs-devel, jasonr

> For UTF-16 with signature, I agree, but UTF-8 could sometimes match a
> Latin-1 file.

I would like to stress that. Latin-1 (ISO-8859-1) is a superset of
US-ASCII. The first 128 characters are basically US-ASCII. On the other
hand, although UTF-8 is a variable-length encoding, it is designed to
match US-ASCII in its first 128 characters (Unicode range U+0000 to
U+007F). These characters are encoded as single bytes in UTF-8.

So, every single US-ASCII file out there is a valid UTF-8 file. This is
one of the features of UTF-8.

And for historical reasons, every US-ASCII file is a valid Latin-1 file.
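
A small sketch of the overlap described above, assuming Python 3; the sample strings are arbitrary.

```python
# Pure ASCII bytes decode to the same text under all three encodings.
ascii_bytes = "Hello, registry!".encode("ascii")
decoded = {codec: ascii_bytes.decode(codec)
           for codec in ("ascii", "latin-1", "utf-8")}
print(len(set(decoded.values())) == 1)

# Outside the ASCII range the encodings diverge: U+00E9 is one byte in
# Latin-1 but two bytes in UTF-8.
e_acute = "\u00e9"
print(e_acute.encode("latin-1"), e_acute.encode("utf-8"))
```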


-- 
Regards,
Kaloian Doganov,
Free Software Association.
___________________________________________________________
If I am not replying to your emails: http://6lyokavitza.org/mail
