File Encoding Issue on Windows

unofficial mirror of help-gnu-emacs@gnu.org

* File Encoding Issue on Windows
@ 2013-03-12  3:08 Tech Stuff
  2013-03-12 10:50 ` Peter Dyballa
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-12  3:08 UTC (permalink / raw)
  To: help-gnu-emacs@gnu.org

Hi,

I use Emacs on Windows to write Spanish text.  In order to enter the special characters I installed the US International keyboard layout.  I never changed any variables in Emacs (in fact, I don't think that I've ever even touched the .emacs file on this machine) and things have always just worked.  Last week, however, when I opened one of my recent files in Notepad to print it out, there were some extra characters, one before every extended character (e.g., é) in the file.  I googled around and made some changes (unfortunately, I don't recall exactly what they were, but I think that I explicitly saved the files in UTF-8) and I thought that all was good.  Not so much, though.  Today I opened one of the files in Emacs and again the special characters are incorrect, though this time in new and interesting ways.  Here's an example of what I see in the buffer:

 Â¿En quÃ© fecha llegaron

when I should see:

¿En qué fecha llegaron

(hopefully these things post correctly to the mail lists)

Obviously this has something to do with encoding (though I can't imagine why it started all of a sudden), but I'm afraid that I'm out of my depth.  I only want to be able to write text in Emacs and save it such that it can subsequently be opened in both MS Notepad and Emacs with all of the characters rendering correctly.  Can anyone point me in the right direction?  I'd be happy to post any information you might need to diagnose this.  Here is the version information:

GNU Emacs 22.1.1 (i386-mingw-nt5.1.2600) of 2007-06-02 on RELEASE

Thanks!


* Re: File Encoding Issue on Windows
  2013-03-12  3:08 File Encoding Issue on Windows Tech Stuff
@ 2013-03-12 10:50 ` Peter Dyballa
  2013-03-12 14:57   ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Dyballa @ 2013-03-12 10:50 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org

Am 12.03.2013 um 04:08 schrieb Tech Stuff:

>  Â¿En quÃ© fecha llegaron
> 
> when I should see:
> 
> ¿En qué fecha llegaron

The first line is the UTF-8 encoding of the last line's text, but displayed to you in a different, 8-bit encoding. UTF-8 uses more than one byte, i.e. more than 8 bits, to encode most characters. Only the characters of the US-ASCII range (U+0000 - U+007F), i.e. the digits, unaccented letters, and basic punctuation, are encoded in a single byte.

The character ¿, INVERTED QUESTION MARK, U+00BF, is encoded in UTF-8 as two bytes: C2 BF. Notepad interprets these two bytes in some Latin or MS Windows encoding, i.e. as two different characters, Â and ¿, and displays them as such.

The character é, LATIN SMALL LETTER E WITH ACUTE, U+00E9, is encoded in UTF-8 as two bytes: C3 A9. Notepad interprets these two bytes in some Latin or MS Windows encoding, i.e. as two different characters, and displays them as Ã and ©.

The MS Windows code page CP1252 uses these encodings:

	A9 = ©, COPYRIGHT SIGN
	BF = ¿, INVERTED QUESTION MARK
	C2 = Â, LATIN CAPITAL LETTER A WITH CIRCUMFLEX
	C3 = Ã, LATIN CAPITAL LETTER A WITH TILDE

So Notepad is using this code page, CP1252, to display the UTF-8 encoded file. What you need to do is to tell Notepad to use UTF-8.
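
The mix-up described above can be reproduced outside any editor. What follows is a minimal Python sketch (Python is not part of the original discussion, and the variable names are only illustrative): it encodes the sample line as UTF-8 and then decodes the very same bytes as CP1252.

```python
# Sample line from the original post.
text = "¿En qué fecha llegaron"

# UTF-8 needs two bytes each for "¿" (C2 BF) and "é" (C3 A9).
utf8_bytes = text.encode("utf-8")

# Reading those same bytes as CP1252 turns every byte into one character.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))  # the raw bytes, in hex (needs Python 3.8+)
print(misread)              # the garbled line
```

The second line printed should match the garbled text quoted at the top of this message, which suggests that nothing is wrong with the bytes themselves, only with the code page used to read them.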

--
Greetings

  Pete

Give a man a fish, and you've fed him for a day. Teach him to fish, and you've depleted the lake.

* Re: File Encoding Issue on Windows
  2013-03-12 10:50 ` Peter Dyballa
@ 2013-03-12 14:57   ` Tech Stuff
  2013-03-12 16:32     ` W. Greenhouse
  2013-03-12 17:23     ` Peter Dyballa
  0 siblings, 2 replies; 24+ messages in thread
From: Tech Stuff @ 2013-03-12 14:57 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: help-gnu-emacs@gnu.org

Hi Peter,

Thanks for taking the time to reply.  Though it was useful, I'm still confused about how to resolve this issue.  To be clear, when I posted yesterday, it was in Emacs that I was seeing the extraneous characters, not in Notepad.  However, I just opened the file again in Notepad to check on the encoding and now I'm seeing the extra characters there as well.  So something must have changed when I saved the file in Emacs while trying to figure out what was going on.  Emacs seems to be the culprit.  Is there something that I can put in my .emacs to tell it to save automatically in UTF-8?  Or am I maybe still not understanding things?

Thanks again.

-ts1971  


* Re: File Encoding Issue on Windows
  2013-03-12 14:57   ` Tech Stuff
@ 2013-03-12 16:32     ` W. Greenhouse
  2013-03-13 17:44       ` Tech Stuff
  2013-03-12 17:23     ` Peter Dyballa
  1 sibling, 1 reply; 24+ messages in thread
From: W. Greenhouse @ 2013-03-12 16:32 UTC (permalink / raw)
  To: help-gnu-emacs-mXXj517/zsQ

Hi,

Tech Stuff <techstuff1971-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> writes:

> Hi Peter,
>
> Thanks for taking the time to reply.  Though it was useful, I'm still
> confused about how to resolve this issue.  To be clear, when I posted
> yesterday, it was in emacs that I was seeing the extraneous
> characters, not in notepad.  However I just opened it again in
> notepad to check on the encoding and now I'm seeing the extra
> characters there as well.  So something must have changed when as
> part of trying to figure out what was going on, I saved the file in
> Emacs.  Emacs seems to be the culprit.  Is there something that I can
> put in my .emacs to tell it to save automatically in utf-8?  Or am I
> maybe still not understanding things.
>
> Thanks again.
>
> -ts1971 

The following should unequivocally set utf-8 in all relevant contexts:

(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

The above is tested by me only as far back as Emacs 23.  On an unrelated
note, however, you should consider upgrading if you possibly can;
according to http://www.gnu.org/software/emacs/#Releases the newest
Emacs 22 is nearly 5 years old already.


-- 
Regards,
WGG





* Re: File Encoding Issue on Windows
  2013-03-12 14:57   ` Tech Stuff
  2013-03-12 16:32     ` W. Greenhouse
@ 2013-03-12 17:23     ` Peter Dyballa
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Dyballa @ 2013-03-12 17:23 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org

Am 12.03.2013 um 15:57 schrieb Tech Stuff:

> To be clear, when I posted yesterday, it was in emacs that I was seeing the extraneous characters, not in notepad.

That makes it clearer! GNU Emacs 22.1 is pretty old and has only rudimentary UTF-8 support. The recent GNU Emacs 24.x versions (24.3 was just released) are much better.

How to handle encodings? For GNU Emacs you can use file-local variables as in:

	;; -*- mode: Emacs-Lisp; coding: utf-8-unix; -*- 

or

	%%% Local Variables:
	%%% mode: LaTeX
	%%% TeX-engine: xetex
	%%% fill-column: 99999
	%%% coding: utf-8-unix
	%%% TeX-command-default: "XeLaTeX5E"
	%%% End:
	%

The first example is meant for the file's beginning, the latter for its end. (Notice the different comment characters!) When you see, i.e. know, that GNU Emacs is using the wrong encoding to display (present) the file's contents, you can use C-x RET r <encoding name> RET (revert-buffer-with-coding-system) to try another encoding. Then record this value in a file-local variable and save the file.
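
To make the shape of the first-line form concrete, here is a small Python sketch that pulls the `coding:` value out of a `-*- ... -*-` line. It is only an illustration of the syntax shown above: the function name `declared_coding` and the regular expression are assumptions of this sketch, not how Emacs itself parses file-local variables.

```python
import re

# Matches "coding: NAME" somewhere inside a "-*- ... -*-" first-line cookie.
_COOKIE = re.compile(r"-\*-.*?\bcoding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def declared_coding(first_line):
    """Return the coding system named in a -*- ... -*- line, or None."""
    match = _COOKIE.search(first_line)
    return match.group(1) if match else None


print(declared_coding(";; -*- mode: Emacs-Lisp; coding: utf-8-unix; -*-"))
print(declared_coding("just an ordinary first line"))
```

A script like this can be used to check which files in a directory already declare their encoding before adding the variable by hand to the rest.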

It's all in the documentation.

--
Greetings

  Pete

Rain is saved up in cloud banks.

* Re: File Encoding Issue on Windows
       [not found] <mailman.21917.1363080184.855.help-gnu-emacs@gnu.org>
@ 2013-03-13 12:33 ` Phoenix Gris
  2013-03-13 14:48   ` Peter Dyballa
                     ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Phoenix Gris @ 2013-03-13 12:33 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: help-gnu-emacs@gnu.org, Tech Stuff

Hello,

I am using Emacs 23.3 on three different machines: MS Windows XP, a Red Hat Linux server, and a Xubuntu PC, and I have the same problems on all three.  It does not happen every time, but most of the time, and it is not a conflict between different programs: it is Emacs itself that doesn't display the file in the right encoding.  Most of my writing is in French, with all its accents.  Most of the time I have to do «C-x RET f» to select the encoding to save the file, then «C-x RET r» to revert it.  I find this very tedious!!!

Similarly, setting file-local variables for each file is very tedious when you navigate through lots of files!!!

On one of the machines I have this in my .emacs file, but it doesn't do the trick!!!

(setq buffer-file-coding-system 'utf-8-unix)
(setq file-name-coding-system 'utf-8-unix)
(setq default-keyboard-coding-system 'utf-8-unix)
(setq default-process-coding-system '(utf-8-unix . utf-8-unix))
(setq default-sendmail-coding-system 'utf-8-unix)
(setq default-terminal-coding-system 'utf-8-unix)
(setq buffer-file-coding-system 'utf-8-unix)

The process-coding-system setting is there because I interact with statistical software through Emacs; some of the data is character data and it has accents!!!

Is there something that could be put in .emacs files so the encoding is utf-8 for everything, NO MATTER WHAT???

Thanks,

Gérald


* Re: File Encoding Issue on Windows
  2013-03-13 12:33 ` Phoenix Gris
@ 2013-03-13 14:48   ` Peter Dyballa
  2013-03-13 15:29   ` Filipp Gunbin
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Peter Dyballa @ 2013-03-13 14:48 UTC (permalink / raw)
  To: Phoenix Gris; +Cc: help-gnu-emacs, Tech Stuff

Am 13.03.2013 um 13:33 schrieb Phoenix Gris:

> Is there something that could be put in .emacs files so the encoding is utf-8 for everything, NO MATTER WHAT???

	(prefer-coding-system	'utf-8)

might already work, but GNU Emacs is quite clever and tries to learn from the environment in which it runs. The values of environment variables like LANG or LC_CTYPE or LC_ALL could override the ELisp setting.
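
To see what these variables would suggest on a given machine, a small Python sketch may help. It applies the usual POSIX precedence (LC_ALL over LC_CTYPE over LANG) and extracts the codeset part of the locale name. The function name `locale_codeset` is made up for this example, and how Emacs itself weighs these variables is not something this sketch claims to reproduce.

```python
def locale_codeset(env):
    """Return the codeset named by the first non-empty locale variable, or None.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # Locale names look like "fr_CA.UTF-8" or "de_DE.ISO-8859-15@euro".
            _, dot, rest = value.partition(".")
            return rest.split("@")[0] if dot else None
    return None


print(locale_codeset({"LANG": "fr_CA.UTF-8"}))
print(locale_codeset({"LANG": "fr_CA.UTF-8", "LC_ALL": "C"}))
```

Passing `os.environ` (after `import os`) instead of a hand-written dictionary reports the setting of the current session. A result other than UTF-8, or no codeset at all, would be one possible reason for Emacs to pick a different default.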

--
Greetings

  Pete

If you're not confused, you're not paying attention.

* Re: File Encoding Issue on Windows
  2013-03-13 12:33 ` Phoenix Gris
  2013-03-13 14:48   ` Peter Dyballa
@ 2013-03-13 15:29   ` Filipp Gunbin
  2013-03-13 17:16   ` Eli Zaretskii
  2013-03-13 20:33   ` Stefan Monnier
  3 siblings, 0 replies; 24+ messages in thread
From: Filipp Gunbin @ 2013-03-13 15:29 UTC (permalink / raw)
  To: Phoenix Gris; +Cc: help-gnu-emacs@gnu.org

On 13/03/2013 16:33 +0400, Phoenix Gris wrote:

> Is there something that could be put in .emacs files so the encoding
> is utf-8 for everything, NO MATTER WHAT???

Try `(set-language-environment "UTF-8")'.

Also, see (info "(emacs) Recognize Coding").

Probably you'll also want to set `default-input-method' which you will
switch to with `C-\' to enter French. However, I can't tell what value
is right for you. Check `C-h I TAB' to see available choices.

Filipp




* Re: File Encoding Issue on Windows
  2013-03-13 12:33 ` Phoenix Gris
  2013-03-13 14:48   ` Peter Dyballa
  2013-03-13 15:29   ` Filipp Gunbin
@ 2013-03-13 17:16   ` Eli Zaretskii
  2013-03-13 20:33   ` Stefan Monnier
  3 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2013-03-13 17:16 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Wed, 13 Mar 2013 05:33:01 -0700 (PDT)
> From: Phoenix Gris <phoenixgris@gmail.com>
> Injection-Date: Wed, 13 Mar 2013 12:33:01 +0000
> Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>,
> 	Tech Stuff <techstuff1971@yahoo.com>
> 
> (setq buffer-file-coding-system 'utf-8-unix)

You should use setq-default here, not setq.




* Re: File Encoding Issue on Windows
  2013-03-12 16:32     ` W. Greenhouse
@ 2013-03-13 17:44       ` Tech Stuff
  2013-03-13 20:37         ` Peter Dyballa
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-13 17:44 UTC (permalink / raw)
  To: W. Greenhouse, help-gnu-emacs@gnu.org

Thanks for that.  I've taken your suggestions and added the line suggested by Eli in a separate email.  My .emacs looks like this now:

// begin .emacs
(message "TS1971 Hello World from .emacs")

(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

(setq-default buffer-file-coding-system 'utf-8)
// end .emacs

I also upgraded to GNU Emacs 24.2.1 (i386-mingw-nt5.1.2600) of 2012-08-28 on MARVIN.  Someone had earlier mentioned version 24.3, but 24.2 was the newest I saw on the GNU download site.  In any event, I'm hoping that these .emacs settings will solve my problems going forward.

I'm still a bit stuck though with trying to use global search-replace to fix the files that are already hosed.  Here again is an example of the error I'm trying to fix:

Â¿CuÃ¡ntos dÃas se quedaron?

where I obviously want to replace 'Â¿' with '¿' and 'Ã' with 'í'.  Doug suggested earlier that I learn about input methods, and I've spent some time reading about them, but I still don't know how to accomplish what I'm trying to do.  For instance, describe-char tells me that the first character is codepoint 194.  Given that, how do I know which of the input methods I need to use to enter the character?  Also, a probably related question: I enter the extended characters necessary for Spanish by toggling on the MS International keyboard layout.  So in order to enter 'é', for instance, I type the sequence 'e, that is, a single quote followed by an 'e'.  This just works everywhere, including in the Emacs edit buffer.  It doesn't work, however, in the Emacs minibuffer when I try to do search-replaces.  What's the solution here?  I suppose that it's input methods again, but again, I have no idea which input method I should be choosing.

Can someone point me in the right direction here?

Thanks.


* Re: File Encoding Issue on Windows
  2013-03-13 12:33 ` Phoenix Gris
                     ` (2 preceding siblings ...)
  2013-03-13 17:16   ` Eli Zaretskii
@ 2013-03-13 20:33   ` Stefan Monnier
  3 siblings, 0 replies; 24+ messages in thread
From: Stefan Monnier @ 2013-03-13 20:33 UTC (permalink / raw)
  To: help-gnu-emacs

> On one of the machine I have this in my .emacs file, but it doesn't do
> the trick!!!

> (setq buffer-file-coding-system 'utf-8-unix)
> (setq file-name-coding-system 'utf-8-unix)
> (setq default-keyboard-coding-system 'utf-8-unix)
> (setq default-process-coding-system '(utf-8-unix . utf-8-unix))
> (setq default-sendmail-coding-system 'utf-8-unix)
> (setq default-terminal-coding-system 'utf-8-unix)
> (setq buffer-file-coding-system 'utf-8-unix)

I recommend you throw away most/all of those settings, then when a file
is mis-recognized, tell us about it via M-x report-emacs-bug (which
will give us additional info about your locale settings and things like
that), ideally attaching the file to the bug-report (so it's better if
you can reproduce the problem on a small file with no private
information).

Nowadays, Emacs should recognize a utf-8 file in most circumstances,

        Stefan

* Re: File Encoding Issue on Windows
  2013-03-13 17:44       ` Tech Stuff
@ 2013-03-13 20:37         ` Peter Dyballa
  2013-03-13 21:11           ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Dyballa @ 2013-03-13 20:37 UTC (permalink / raw)
  To: Tech Stuff; +Cc: W. Greenhouse, help-gnu-emacs@gnu.org

Am 13.03.2013 um 18:44 schrieb Tech Stuff:

> Here again is an example of the error I'm trying to fix:
> 
> 
> 
> Â¿CuÃ¡ntos dÃas se quedaron?

UTF-8 encoded Unicode text displayed in some code page… C-x RET r allows you to revert, change the encoding.

Input methods probably play no role for you since you can easily insert all the Spanish characters. When you'll start to write in Arabic, Japanese, or such you might need one…

--
Greetings

  Pete

"A TRUE Klingon warrior does not comment his code."

* Re: File Encoding Issue on Windows
  2013-03-13 20:37         ` Peter Dyballa
@ 2013-03-13 21:11           ` Tech Stuff
  2013-03-13 22:16             ` Peter Dyballa
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-13 21:11 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: W. Greenhouse, help-gnu-emacs@gnu.org

Maybe I'm confused, but I don't think that it's that simple.  I think that had I understood what was happening the first time I opened the file, I could have reverted the encoding and I would have been okay.  However, I didn't, and apparently I saved the files with the wrong encoding.  So now I think that I really have those incorrect characters.  At this point I'm just trying to figure out how to use global search-replace to fix the small number of files that have been corrupted, presumably by saving them in the wrong encoding.  So, for example, I want to be able to replace all instances of codepoint 195 followed by codepoint 173 with a single instance of codepoint 237 (í).  I can't figure out how to enter those codepoints into the search-replace minibuffer.  I can't believe how complicated this has turned out to be.

Thanks again.

-ts1971


* Re: File Encoding Issue on Windows
  2013-03-13 21:11           ` Tech Stuff
@ 2013-03-13 22:16             ` Peter Dyballa
  2013-03-13 23:26               ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Dyballa @ 2013-03-13 22:16 UTC (permalink / raw)
  To: Tech Stuff; +Cc: W. Greenhouse, help-gnu-emacs@gnu.org

Am 13.03.2013 um 22:11 schrieb Tech Stuff:

> apparently I saved the files with the wrong encoding.  So now I think that I really have those incorrect characters.

Why? Before your failure the file had 31 bytes contents. In some code page this represents 31 characters, in UTF-8 this represents 29 characters.

When you save a text in UTF-8 encoding in some 8-bit code page *and* *you* *do* *not* *change* *one* *single* *byte* then the file's contents is not changed (because GNU Emacs does not change a single byte). What's changed, for the application that displays this file's contents, is the perspective. Example: as a child on four extremities you could only see from aside the green of a carrot. As a grown-up you can look down on the same green (and know that something with a different colour is below the surface). And when you're dead you'll see what the other colour is.

Same bytes, different perspectives, different (re)presentations for you.
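
The "same bytes, different perspectives" point can be checked with a few lines of Python. This is only a sketch for the sample line quoted earlier in the thread, taken without a line terminator, so its counts need not match the figures quoted above for the whole file.

```python
line = "¿Cuántos días se quedaron?"
data = line.encode("utf-8")  # the bytes as they would sit on disk

as_utf8 = data.decode("utf-8")      # one perspective on the bytes
as_cp1252 = data.decode("cp1252")   # another perspective on the same bytes

print(len(data))       # number of bytes
print(len(as_utf8))    # characters when read as UTF-8
print(len(as_cp1252))  # characters when read as CP1252: one per byte

# Decoding never touches the bytes: encoding the CP1252 view gives them back.
print(as_cp1252.encode("cp1252") == data)
```

As long as the buffer is written back in the same encoding it was read in, the file comes out byte for byte identical, however odd the text looked on screen.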

Or consider a series of bits and bytes in a computer's memory. Some computers read the sequence and interpret the first eight bits as the Most Significant Byte, others assume it's the Least Significant Byte; one sees that your bank account has a credit, the other sees a debit.

So just try to "switch" through some encodings! And don't forget to watch the mode-line: Does it signal a modified file while switching? And: Does it work to save an unmodified file? (What has this to do with encodings?!)

--
Greetings

  Pete

The day Microsoft makes something that doesn't suck is the day they start selling vacuum cleaners.
				– Ernest Jan Plugge

* Re: File Encoding Issue on Windows
  2013-03-13 22:16             ` Peter Dyballa
@ 2013-03-13 23:26               ` Tech Stuff
  2013-03-13 23:41                 ` Peter Dyballa
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-13 23:26 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: help-gnu-emacs@gnu.org

Hi Again,

I believe that the bytes on disk *have* changed.  There is no other way to explain that the text used to display correctly in notepad and now doesn't.  In notepad I see the same extraneous / incorrect characters that I see in Emacs.  So I think that I have a correctly utf-8 encoded file which contains some characters that I don't want.  Is there really no way to use global search and replace to replace these codepoints?

-jason
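
One sequence of events that would produce exactly such a file can be sketched in Python. It is an assumption, since the thread does not establish what actually happened: the file is read as CP1252 and the resulting buffer is then written back out as UTF-8.

```python
original = "¿En qué fecha llegaron"
on_disk = original.encode("utf-8")  # bytes of the correct file

# Assumed failure, step 1: the editor decodes the file as CP1252 ...
in_buffer = on_disk.decode("cp1252")
# ... step 2: and the buffer is later saved as UTF-8.
resaved = in_buffer.encode("utf-8")

print(len(on_disk), len(resaved))  # the file has grown
print(resaved.decode("utf-8"))     # valid UTF-8 that contains the stray characters
```

In that scenario the new file is well-formed UTF-8, so every UTF-8 aware program shows the same unwanted characters, which would be consistent with seeing them in both Emacs and Notepad.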


* Re: File Encoding Issue on Windows
  2013-03-13 23:26               ` Tech Stuff
@ 2013-03-13 23:41                 ` Peter Dyballa
  2013-03-13 23:48                   ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Dyballa @ 2013-03-13 23:41 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org


Am 14.03.2013 um 00:26 schrieb Tech Stuff:

> Is there really no way to use global search and replace to replace these codepoints?

This is the hard way.

When your original files have been corrupted, then try the backup files GNU Emacs generates.

--
Greetings

  Pete

The future will be much better tomorrow.
				– George W. Bush





* Re: File Encoding Issue on Windows
  2013-03-13 23:41                 ` Peter Dyballa
@ 2013-03-13 23:48                   ` Tech Stuff
  2013-03-13 23:58                     ` Peter Dyballa
  2013-03-14  0:38                     ` Axel E. Retif
  0 siblings, 2 replies; 24+ messages in thread
From: Tech Stuff @ 2013-03-13 23:48 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: help-gnu-emacs@gnu.org

I don't have the backup files and I'm willing to do it the hard way.  I'm willing to do anything short of actually going in and changing every occurrence individually, as there are hundreds of them.

-ts1971
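
If the damage really is UTF-8 text that was read as CP1252 and then saved as UTF-8, it can in principle be undone in one pass rather than pair by pair. The Python sketch below is one way to do that outside Emacs. It rests on that assumption about the cause, its function names are made up for this example, and it should only be run on copies of the files.

```python
def _cp1252_bytes(text):
    """Encode text as CP1252, passing through the few bytes CP1252 leaves undefined."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise  # not something a CP1252 misreading could have produced
            out.append(ord(ch))  # e.g. U+0081, the second byte of "Á" in UTF-8
    return bytes(out)


def repair_line(line):
    """Undo one round of 'UTF-8 read as CP1252'; return the line unchanged if it does not fit."""
    try:
        return _cp1252_bytes(line).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line


def repair_text(text):
    """Repair each line independently so that intact lines are left alone."""
    return "\n".join(repair_line(line) for line in text.split("\n"))


# "\u00ad" is the invisible soft hyphen that byte AD becomes under CP1252.
print(repair_text("Â¿CuÃ¡ntos dÃ\u00adas se quedaron?"))
```

The text of a damaged file would be obtained with `open(path, encoding="utf-8").read()` and the result written back the same way. Lines that are already correct fail the round trip and are returned untouched, so running the repair twice should do no further harm.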


* Re: File Encoding Issue on Windows
  2013-03-13 23:48                   ` Tech Stuff
@ 2013-03-13 23:58                     ` Peter Dyballa
  2013-03-14  0:38                     ` Axel E. Retif
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Dyballa @ 2013-03-13 23:58 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org


Am 14.03.2013 um 00:48 schrieb Tech Stuff:

> I'm willing to do anything short of actually going in and changing every occurrence individually as there are hundreds of them.

Here is a file with a comparison of CP1252 to UTF-8. If you want, you can add a new column that shows not the HEX values but the actual characters that would be used to represent the UTF-8 bytes.

All the missing code points are US-ASCII in both encodings (CP1252 and UTF-8). The bytes and their meanings are equal and the same in both.
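
The extra column suggested above can also be generated rather than typed. What follows is only a minimal sketch in Python, assuming Python's built-in "cp1252" codec matches the code page in question; the function name cp1252_row and the dictionary keys are made up for illustration.

```python
import unicodedata


def cp1252_row(byte_value):
    """Describe one CP1252 byte, or return None if the byte is undefined."""
    try:
        char = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unmapped.
        return None
    utf8 = char.encode("utf-8")
    return {
        "char": char,
        "hex": f"{byte_value:02X}",
        "ucs": f"U+{ord(char):04X}",
        "utf8": " ".join(f"{b:02X}" for b in utf8),
        # The UTF-8 bytes as they appear when misread as CP1252;
        # bytes without a CP1252 meaning show up as U+FFFD.
        "misread": utf8.decode("cp1252", errors="replace"),
        "name": unicodedata.name(char, "(UNNAMED)"),
    }


if __name__ == "__main__":
    for value in range(0x80, 0x100):
        row = cp1252_row(value)
        if row is not None:
            print("{char} = {hex} = {ucs} = {utf8} = {misread} : {name}".format(**row))
```

The "misread" field is the new column: for é it holds the two characters that appear in a damaged file in place of é.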

--
Greetings

  Pete

Behold the warranty … the bold print giveth and the fine print taketh away.


[-- Attachment #2: CP1252.txt --]
[-- Type: text/plain, Size: 8952 bytes --]

;;; -*- mode: Text; coding: windows-1252-unix; -*-
;
;	Time-stamp: <2006-10-28 22:16:09 pete>
;
;   ANSI Microsoft Windows Codepage
;
;   oct   dec   hex    UCS2    UTF-8
;=====================================
€ = 200 = 128 = 80 = U+20AC = E2 82 AC : EURO SIGN
  = 201 = 129 = 81 = 	      	       	 (UNDEFINED)
‚ = 202 = 130 = 82 = U+201A = E2 80 9A : SINGLE LOW-9 QUOTATION MARK
ƒ = 203 = 131 = 83 = U+0192 =    C6 92 : LATIN SMALL LETTER F WITH HOOK
„ = 204 = 132 = 84 = U+201E = E2 80 9E : DOUBLE LOW-9 QUOTATION MARK
… = 205 = 133 = 85 = U+2026 = E2 80 A6 : HORIZONTAL ELLIPSIS
† = 206 = 134 = 86 = U+2020 = E2 80 A0 : DAGGER
‡ = 207 = 135 = 87 = U+2021 = E2 80 A1 : DOUBLE DAGGER
ˆ = 210 = 136 = 88 = U+02C6 =    CB 86 : MODIFIER LETTER CIRCUMFLEX ACCENT
‰ = 211 = 137 = 89 = U+2030 = E2 80 B0 : PER MILLE SIGN
Š = 212 = 138 = 8A = U+0160 =    C5 A0 : LATIN CAPITAL LETTER S WITH CARON
‹ = 213 = 139 = 8B = U+2039 = E2 80 B9 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
Œ = 214 = 140 = 8C = U+0152 =    C5 92 : LATIN CAPITAL LIGATURE OE
  = 215 = 141 = 8D = 	    	       	 (UNDEFINED)
Ž = 216 = 142 = 8E = U+017D =    C5 BD : LATIN CAPITAL LETTER Z WITH CARON
  = 217 = 143 = 8F = 	    	       	 (UNDEFINED)
  = 220 = 144 = 90 = 	    	       	 (UNDEFINED)
‘ = 221 = 145 = 91 = U+2018 = E2 80 98 : LEFT SINGLE QUOTATION MARK
’ = 222 = 146 = 92 = U+2019 = E2 80 99 : RIGHT SINGLE QUOTATION MARK
“ = 223 = 147 = 93 = U+201C = E2 80 9C : LEFT DOUBLE QUOTATION MARK
” = 224 = 148 = 94 = U+201D = E2 80 9D : RIGHT DOUBLE QUOTATION MARK
• = 225 = 149 = 95 = U+2022 = E2 80 A2 : BULLET
– = 226 = 150 = 96 = U+2013 = E2 80 93 : EN DASH
— = 227 = 151 = 97 = U+2014 = E2 80 94 : EM DASH
˜ = 230 = 152 = 98 = U+02DC =    CB 9C : SMALL TILDE
™ = 231 = 153 = 99 = U+2122 = E2 84 A2 : TRADEMARK SIGN
š = 232 = 154 = 9A = U+0161 =    C5 A1 : LATIN SMALL LETTER S WITH CARON
› = 233 = 155 = 9B = U+203A = E2 80 BA : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
œ = 234 = 156 = 9C = U+0153 =    C5 93 : LATIN SMALL LIGATURE OE
  = 235 = 157 = 9D = 	    	       	 (UNDEFINED)
ž = 236 = 158 = 9E = U+017E =    C5 BE : LATIN SMALL LETTER Z WITH CARON
Ÿ = 237 = 159 = 9F = U+0178 =    C5 B8 : LATIN CAPITAL LETTER Y WITH DIAERESIS
  = 240 = 160 = A0 = U+00A0 =    C2 A0 : NO-BREAK SPACE
¡ = 241 = 161 = A1 = U+00A1 =    C2 A1 : INVERTED EXCLAMATION MARK
¢ = 242 = 162 = A2 = U+00A2 =    C2 A2 : CENT SIGN
£ = 243 = 163 = A3 = U+00A3 =    C2 A3 : POUND SIGN
¤ = 244 = 164 = A4 = U+00A4 =    C2 A4 : CURRENCY SIGN
¥ = 245 = 165 = A5 = U+00A5 =    C2 A5 : YEN SIGN
¦ = 246 = 166 = A6 = U+00A6 =    C2 A6 : BROKEN BAR
§ = 247 = 167 = A7 = U+00A7 =    C2 A7 : SECTION SIGN
¨ = 250 = 168 = A8 = U+00A8 =    C2 A8 : DIAERESIS
© = 251 = 169 = A9 = U+00A9 =    C2 A9 : COPYRIGHT SIGN
ª = 252 = 170 = AA = U+00AA =    C2 AA : FEMININE ORDINAL INDICATOR
« = 253 = 171 = AB = U+00AB =    C2 AB : LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
¬ = 254 = 172 = AC = U+00AC =    C2 AC : NOT SIGN
­ = 255 = 173 = AD = U+00AD =    C2 AD : SOFT HYPHEN
® = 256 = 174 = AE = U+00AE =    C2 AE : REGISTERED SIGN
¯ = 257 = 175 = AF = U+00AF =    C2 AF : MACRON
° = 260 = 176 = B0 = U+00B0 =    C2 B0 : DEGREE SIGN
± = 261 = 177 = B1 = U+00B1 =    C2 B1 : PLUS-MINUS SIGN
² = 262 = 178 = B2 = U+00B2 =    C2 B2 : SUPERSCRIPT TWO
³ = 263 = 179 = B3 = U+00B3 =    C2 B3 : SUPERSCRIPT THREE
´ = 264 = 180 = B4 = U+00B4 =    C2 B4 : ACUTE ACCENT
µ = 265 = 181 = B5 = U+00B5 =    C2 B5 : MICRO SIGN
¶ = 266 = 182 = B6 = U+00B6 =    C2 B6 : PILCROW SIGN
· = 267 = 183 = B7 = U+00B7 =    C2 B7 : MIDDLE DOT
¸ = 270 = 184 = B8 = U+00B8 =    C2 B8 : CEDILLA
¹ = 271 = 185 = B9 = U+00B9 =    C2 B9 : SUPERSCRIPT ONE
º = 272 = 186 = BA = U+00BA =    C2 BA : MASCULINE ORDINAL INDICATOR
» = 273 = 187 = BB = U+00BB =    C2 BB : RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
¼ = 274 = 188 = BC = U+00BC =    C2 BC : VULGAR FRACTION ONE QUARTER
½ = 275 = 189 = BD = U+00BD =    C2 BD : VULGAR FRACTION ONE HALF
¾ = 276 = 190 = BE = U+00BE =    C2 BE : VULGAR FRACTION THREE QUARTERS
¿ = 277 = 191 = BF = U+00BF =    C2 BF : INVERTED QUESTION MARK
À = 300 = 192 = C0 = U+00C0 =    C3 80 : LATIN CAPITAL LETTER A WITH GRAVE
Á = 301 = 193 = C1 = U+00C1 =    C3 81 : LATIN CAPITAL LETTER A WITH ACUTE
Â = 302 = 194 = C2 = U+00C2 =    C3 82 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
Ã = 303 = 195 = C3 = U+00C3 =    C3 83 : LATIN CAPITAL LETTER A WITH TILDE
Ä = 304 = 196 = C4 = U+00C4 =    C3 84 : LATIN CAPITAL LETTER A WITH DIAERESIS
Å = 305 = 197 = C5 = U+00C5 =    C3 85 : LATIN CAPITAL LETTER A WITH RING ABOVE
Æ = 306 = 198 = C6 = U+00C6 =    C3 86 : LATIN CAPITAL LETTER AE
Ç = 307 = 199 = C7 = U+00C7 =    C3 87 : LATIN CAPITAL LETTER C WITH CEDILLA
È = 310 = 200 = C8 = U+00C8 =    C3 88 : LATIN CAPITAL LETTER E WITH GRAVE
É = 311 = 201 = C9 = U+00C9 =    C3 89 : LATIN CAPITAL LETTER E WITH ACUTE
Ê = 312 = 202 = CA = U+00CA =    C3 8A : LATIN CAPITAL LETTER E WITH CIRCUMFLEX
Ë = 313 = 203 = CB = U+00CB =    C3 8B : LATIN CAPITAL LETTER E WITH DIAERESIS
Ì = 314 = 204 = CC = U+00CC =    C3 8C : LATIN CAPITAL LETTER I WITH GRAVE
Í = 315 = 205 = CD = U+00CD =    C3 8D : LATIN CAPITAL LETTER I WITH ACUTE
Î = 316 = 206 = CE = U+00CE =    C3 8E : LATIN CAPITAL LETTER I WITH CIRCUMFLEX
Ï = 317 = 207 = CF = U+00CF =    C3 8F : LATIN CAPITAL LETTER I WITH DIAERESIS
Ð = 320 = 208 = D0 = U+00D0 =    C3 90 : LATIN CAPITAL LETTER ETH
Ñ = 321 = 209 = D1 = U+00D1 =    C3 91 : LATIN CAPITAL LETTER N WITH TILDE
Ò = 322 = 210 = D2 = U+00D2 =    C3 92 : LATIN CAPITAL LETTER O WITH GRAVE
Ó = 323 = 211 = D3 = U+00D3 =    C3 93 : LATIN CAPITAL LETTER O WITH ACUTE
Ô = 324 = 212 = D4 = U+00D4 =    C3 94 : LATIN CAPITAL LETTER O WITH CIRCUMFLEX
Õ = 325 = 213 = D5 = U+00D5 =    C3 95 : LATIN CAPITAL LETTER O WITH TILDE
Ö = 326 = 214 = D6 = U+00D6 =    C3 96 : LATIN CAPITAL LETTER O WITH DIAERESIS
× = 327 = 215 = D7 = U+00D7 =    C3 97 : MULTIPLICATION SIGN
Ø = 330 = 216 = D8 = U+00D8 =    C3 98 : LATIN CAPITAL LETTER O WITH STROKE
Ù = 331 = 217 = D9 = U+00D9 =    C3 99 : LATIN CAPITAL LETTER U WITH GRAVE
Ú = 332 = 218 = DA = U+00DA =    C3 9A : LATIN CAPITAL LETTER U WITH ACUTE
Û = 333 = 219 = DB = U+00DB =    C3 9B : LATIN CAPITAL LETTER U WITH CIRCUMFLEX
Ü = 334 = 220 = DC = U+00DC =    C3 9C : LATIN CAPITAL LETTER U WITH DIAERESIS
Ý = 335 = 221 = DD = U+00DD =    C3 9D : LATIN CAPITAL LETTER Y WITH ACUTE
Þ = 336 = 222 = DE = U+00DE =    C3 9E : LATIN CAPITAL LETTER THORN
ß = 337 = 223 = DF = U+00DF =    C3 9F : LATIN SMALL LETTER SHARP S
à = 340 = 224 = E0 = U+00E0 =    C3 A0 : LATIN SMALL LETTER A WITH GRAVE
á = 341 = 225 = E1 = U+00E1 =    C3 A1 : LATIN SMALL LETTER A WITH ACUTE
â = 342 = 226 = E2 = U+00E2 =    C3 A2 : LATIN SMALL LETTER A WITH CIRCUMFLEX
ã = 343 = 227 = E3 = U+00E3 =    C3 A3 : LATIN SMALL LETTER A WITH TILDE
ä = 344 = 228 = E4 = U+00E4 =    C3 A4 : LATIN SMALL LETTER A WITH DIAERESIS
å = 345 = 229 = E5 = U+00E5 =    C3 A5 : LATIN SMALL LETTER A WITH RING ABOVE
æ = 346 = 230 = E6 = U+00E6 =    C3 A6 : LATIN SMALL LETTER AE
ç = 347 = 231 = E7 = U+00E7 =    C3 A7 : LATIN SMALL LETTER C WITH CEDILLA
è = 350 = 232 = E8 = U+00E8 =    C3 A8 : LATIN SMALL LETTER E WITH GRAVE
é = 351 = 233 = E9 = U+00E9 =    C3 A9 : LATIN SMALL LETTER E WITH ACUTE
ê = 352 = 234 = EA = U+00EA =    C3 AA : LATIN SMALL LETTER E WITH CIRCUMFLEX
ë = 353 = 235 = EB = U+00EB =    C3 AB : LATIN SMALL LETTER E WITH DIAERESIS
ì = 354 = 236 = EC = U+00EC =    C3 AC : LATIN SMALL LETTER I WITH GRAVE
í = 355 = 237 = ED = U+00ED =    C3 AD : LATIN SMALL LETTER I WITH ACUTE
î = 356 = 238 = EE = U+00EE =    C3 AE : LATIN SMALL LETTER I WITH CIRCUMFLEX
ï = 357 = 239 = EF = U+00EF =    C3 AF : LATIN SMALL LETTER I WITH DIAERESIS
ð = 360 = 240 = F0 = U+00F0 =    C3 B0 : LATIN SMALL LETTER ETH
ñ = 361 = 241 = F1 = U+00F1 =    C3 B1 : LATIN SMALL LETTER N WITH TILDE
ò = 362 = 242 = F2 = U+00F2 =    C3 B2 : LATIN SMALL LETTER O WITH GRAVE
ó = 363 = 243 = F3 = U+00F3 =    C3 B3 : LATIN SMALL LETTER O WITH ACUTE
ô = 364 = 244 = F4 = U+00F4 =    C3 B4 : LATIN SMALL LETTER O WITH CIRCUMFLEX
õ = 365 = 245 = F5 = U+00F5 =    C3 B5 : LATIN SMALL LETTER O WITH TILDE
ö = 366 = 246 = F6 = U+00F6 =    C3 B6 : LATIN SMALL LETTER O WITH DIAERESIS
÷ = 367 = 247 = F7 = U+00F7 =    C3 B7 : DIVISION SIGN
ø = 370 = 248 = F8 = U+00F8 =    C3 B8 : LATIN SMALL LETTER O WITH STROKE
ù = 371 = 249 = F9 = U+00F9 =    C3 B9 : LATIN SMALL LETTER U WITH GRAVE
ú = 372 = 250 = FA = U+00FA =    C3 BA : LATIN SMALL LETTER U WITH ACUTE
û = 373 = 251 = FB = U+00FB =    C3 BB : LATIN SMALL LETTER U WITH CIRCUMFLEX
ü = 374 = 252 = FC = U+00FC =    C3 BC : LATIN SMALL LETTER U WITH DIAERESIS
ý = 375 = 253 = FD = U+00FD =    C3 BD : LATIN SMALL LETTER Y WITH ACUTE
þ = 376 = 254 = FE = U+00FE =    C3 BE : LATIN SMALL LETTER THORN
ÿ = 377 = 255 = FF = U+00FF =    C3 BF : LATIN SMALL LETTER Y WITH DIAERESIS

* Re: File Encoding Issue on Windows
  2013-03-13 23:48                   ` Tech Stuff
  2013-03-13 23:58                     ` Peter Dyballa
@ 2013-03-14  0:38                     ` Axel E. Retif
  2013-03-14  2:24                       ` Tech Stuff
  1 sibling, 1 reply; 24+ messages in thread
From: Axel E. Retif @ 2013-03-14  0:38 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org

On 03/13/2013 05:48 PM, Tech Stuff wrote:

> I don't have the backup files and I'm willing to do it the hard way.
> I'm willing to do anything short of actually going in and changing every
> occurrence individually as there are hundreds of them.

I also change hundreds of occurrences of characters ``a la TeX'' in 
LaTeX files (say, \'{\i} for í, ?` for ¿, etc.) with query-replace: 
Meta-Shift-%

You can try

    Meta-Shift-%

At the minibuffer prompt

    Query replace:

type, say,

     Â¿

and at the prompt

     Query replace  Â¿ with:

type

     ¿

It will start showing each occurrence of Â¿; you can type y for «yes, 
replace» or n for «no». If after some occurrences you are satisfied with 
the results, you can type ! «to replace all remaining matches with no 
more questions».

While query-replace is active, you can type ? (or Control-h) to see 
all the options.
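
If there are too many distinct pairs to replace one by one, the whole file can in principle be repaired in a single step outside Emacs. The sketch below is Python and rests on an assumption nobody in this thread has confirmed: that the damage is exactly one round of UTF-8 bytes having been read as CP1252. The function name repair_mojibake is hypothetical.

```python
def repair_mojibake(text):
    """Undo one round of 'UTF-8 bytes read as CP1252'.

    Text that does not fit that pattern is returned unchanged.
    """
    try:
        # Turn the wrongly decoded characters back into the original
        # bytes, then decode those bytes as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    print(repair_mojibake("Â¿En quÃ© fecha llegaron"))
```

Work on a copy of the file. Characters whose UTF-8 form contains a byte that CP1252 leaves undefined (Á, for example, is C3 81) may not survive the round trip, and in that case the function leaves the text as it was.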

Best

Axel

* Re: File Encoding Issue on Windows
  2013-03-14  0:38                     ` Axel E. Retif
@ 2013-03-14  2:24                       ` Tech Stuff
  2013-03-14  2:35                         ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-14  2:24 UTC (permalink / raw)
  To: Axel E. Retif; +Cc: help-gnu-emacs@gnu.org

Hi Axel,

This is *exactly* what I want to do.  The problem is that I can't figure out how, using my US keyboard, I can enter, say, the 'Â' character.  I know its code point (194), but I don't know how to enter that value into the minibuffer.  On a related note, I use the US International keyboard layout to type Spanish.  With it I can get the 'é' character, for example, by typing a single quote followed by an 'e'.  This has always just worked in the main Emacs edit buffer but not in the minibuffer.  I've always been able to deal with this limitation, but it would also be nice to get that working.

Thanks.

* Re: File Encoding Issue on Windows
  2013-03-14  2:24                       ` Tech Stuff
@ 2013-03-14  2:35                         ` Tech Stuff
  2013-03-14  2:59                           ` Axel E. Retif
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-14  2:35 UTC (permalink / raw)
  To: Tech Stuff, Axel E. Retif; +Cc: help-gnu-emacs@gnu.org

So, having played around a bit more, I do have to make one correction.  It does appear that I am now able to enter the extended characters supported by the US International keyboard, even in the minibuffer.  This must have been fixed between the old 22.x version I was running and the new 24.2 I just installed.  So that's good.  I also realized that the Â character *is* supported by the US International layout, so I actually can do the query replace for that particular example.  I still have the general problem though.  How do I, for instance, enter the character at code point 173, which is a soft hyphen (whatever that is), into the minibuffer?

Thanks.

-ts1971

* Re: File Encoding Issue on Windows
  2013-03-14  2:35                         ` Tech Stuff
@ 2013-03-14  2:59                           ` Axel E. Retif
  2013-03-14  4:23                             ` Tech Stuff
  0 siblings, 1 reply; 24+ messages in thread
From: Axel E. Retif @ 2013-03-14  2:59 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org

On 03/13/2013 08:35 PM, Tech Stuff wrote:

> So, having played around a bit more I do have to make one correction.
> It does appear that now I am able to enter the extended characters
> supported by the US International keyboard, even in the mini-buffer.

Yes, I also use a US keyboard (an Apple US Keyboard, in fact, both in 
Linux and Mac) for Spanish text.


> I still
> have the general problem though.  How do I for instance enter the
> character at codepoint 173 which is a soft hyphen (whatever that is)
> into the mini buffer?

That I don't know (copy-paste?). But I usually enter Unicode characters 
with C-x 8 RET; for example, to get an em dash,

     C-x 8 RET em dash RET

or, to get an Â, as per Peter's table,

     C-x 8 RET 00C2 RET.

See

http://emacswiki.org/emacs/UnicodeEncoding
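
The same two ways of naming a character, by Unicode name or by hexadecimal code point, can be tried outside Emacs as well. This is a rough Python analogue, not a description of how Emacs implements the command; the function name char_from_spec is invented for the example.

```python
import unicodedata


def char_from_spec(spec):
    """Return the character for a Unicode name or a hex code point."""
    try:
        # First try the input as a character name, e.g. "em dash".
        return unicodedata.lookup(spec.upper())
    except KeyError:
        # Otherwise treat it as a hexadecimal code point, e.g. "00C2".
        return chr(int(spec, 16))


if __name__ == "__main__":
    for spec in ("em dash", "00C2", "AD"):
        char = char_from_spec(spec)
        print(f"{spec!r} gives U+{ord(char):04X}")
```

Code point 173 from the question is AD in hexadecimal, so the last case yields the soft hyphen.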


Best

Axel






* Re: File Encoding Issue on Windows
  2013-03-14  2:59                           ` Axel E. Retif
@ 2013-03-14  4:23                             ` Tech Stuff
  2013-03-14  6:07                               ` Axel E. Retif
  0 siblings, 1 reply; 24+ messages in thread
From: Tech Stuff @ 2013-03-14  4:23 UTC (permalink / raw)
  To: Axel E. Retif; +Cc: help-gnu-emacs@gnu.org

copy-paste worked lol.  I feel like such an idiot.  Thanks Axel! (and everyone else who participated in the thread)

-ts1971

* Re: File Encoding Issue on Windows
  2013-03-14  4:23                             ` Tech Stuff
@ 2013-03-14  6:07                               ` Axel E. Retif
  0 siblings, 0 replies; 24+ messages in thread
From: Axel E. Retif @ 2013-03-14  6:07 UTC (permalink / raw)
  To: Tech Stuff; +Cc: help-gnu-emacs@gnu.org

On  13 Mar, 2013, at 22:23, Tech Stuff wrote:

> copy-paste worked lol.  I feel like such an idiot.  Thanks Axel!  
> (and everyone else who participated in the thread)

Good. Just be careful with «Replace All». Say, if I replace all  
occurrences of `` with opening quotation marks (“), maybe everything  
will be fine; but if I then replace all '' with closing quotation  
marks (”), double primes will get replaced as well.
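
One way to guard against this is to look at every match in context before replacing them all. A small Python sketch of that idea follows; preview_matches and its width parameter are illustrative names, not an existing tool.

```python
def preview_matches(text, needle, width=10):
    """Return each occurrence of needle with some context on both sides."""
    contexts = []
    start = text.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = start + len(needle) + width
        contexts.append(text[lo:hi])
        # Continue the search after the current match.
        start = text.find(needle, start + len(needle))
    return contexts


if __name__ == "__main__":
    sample = "He said ''hi''. The beam is 6'' wide."
    for context in preview_matches(sample, "''", width=5):
        print(repr(context))
```

In the sample, the third match is the double prime after the 6, which a blind «Replace All» would turn into a closing quotation mark.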

Also, Windows applications can really mess up your files. I would  
advise you against using programs such as Notepad (though I *really*  
don't use and don't know Windows, I have suffered a lot with files  
coming from Windows apps).

Set up Emacs to use UTF-8 and that's it. If you have to use any other  
app, try to use a reliable editor (for example, with (La)TeX files,  
TeXworks uses UTF-8 across platforms).

Best

Axel

end of thread, other threads:[~2013-03-14  6:07 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-12  3:08 File Encoding Issue on Windows Tech Stuff
2013-03-12 10:50 ` Peter Dyballa
2013-03-12 14:57   ` Tech Stuff
2013-03-12 16:32     ` W. Greenhouse
2013-03-13 17:44       ` Tech Stuff
2013-03-13 20:37         ` Peter Dyballa
2013-03-13 21:11           ` Tech Stuff
2013-03-13 22:16             ` Peter Dyballa
2013-03-13 23:26               ` Tech Stuff
2013-03-13 23:41                 ` Peter Dyballa
2013-03-13 23:48                   ` Tech Stuff
2013-03-13 23:58                     ` Peter Dyballa
2013-03-14  0:38                     ` Axel E. Retif
2013-03-14  2:24                       ` Tech Stuff
2013-03-14  2:35                         ` Tech Stuff
2013-03-14  2:59                           ` Axel E. Retif
2013-03-14  4:23                             ` Tech Stuff
2013-03-14  6:07                               ` Axel E. Retif
2013-03-12 17:23     ` Peter Dyballa
     [not found] <mailman.21917.1363080184.855.help-gnu-emacs@gnu.org>
2013-03-13 12:33 ` Phoenix Gris
2013-03-13 14:48   ` Peter Dyballa
2013-03-13 15:29   ` Filipp Gunbin
2013-03-13 17:16   ` Eli Zaretskii
2013-03-13 20:33   ` Stefan Monnier
