* Questions on charset encoding detection and keyboard layout
From: Hou, Ruoyu @ 2009-12-10 14:42 UTC (permalink / raw)
To: Emacs Mailing List
Hello,
I am new to Emacs and have run into some problems with
multilingual editing. I hope someone can kindly give me some
hints. My main platform is NetBSD 5.0.1, running emacs23.1-gtk from pkgsrc.
1. My work involves handling documents in the major East
Asian scripts in different encodings (GB2312, GB18030, BIG5, GBK,
UTF-8, Shift-JIS, EUC-JP, ISO-2022-JP). I set my language environment
to UTF-8 because I want all the documents *I create* to be saved as UTF-8,
while files in other encodings are displayed, edited, and saved without
changing their original encodings. I noticed that my Emacs does not properly
recognize documents encoded in EUC-JP, so I have to set the encoding manually
every time I encounter such a document. Is there any configuration I
could tweak to accurately auto-detect and display the file encodings I
mentioned above? In most cases, I have no a priori knowledge
of the exact encoding of a given document.
From the manual I learned that some encodings are not easily
distinguished from one another, so I guess the settings would be delicate.
Some gvim users mentioned that EUC-JP has to be placed before GBK in the
encoding list to get the right result.
2. I'm using a computer with a Japanese keyboard layout (an 84-key notebook
variant of jp-106). When I use Emacs in X GUI mode, the language
conversion keys (<henkan>, <muhenkan>, <hiragana/katakana>, etc.)
show up correctly in the echo area, so I can bind some key macros to toggle
the Japanese input method. However, when I start emacs -nw in, say, uxterm,
those keys are not echoed.
Frankly, I prefer to run Emacs in non-GUI mode for faster response on my
old notebook. I'm wondering whether I could configure those conversion
keys to work the same way as in GUI mode. Do I have to change any settings
in X or Emacs? Has anyone experienced a similar situation?
I have to confess that I do not have much experience with Emacs since
switching my work platform to open source software, and as a somewhat
busy bench biologist I have probably missed some homework I should have
done before asking. I would appreciate any hints or references for reading.
Thanks for your attention.
Regards,
--
Hou, Ruoyu
Laboratory of Reproductive & Stem Cell Biology,
College of Life Science & Biotech.,
Shanghai Jiao Tong University,
Shanghai 200240, P.R.China.
* Re: Questions on charset encoding detection and keyboard layout
From: Eli Zaretskii @ 2009-12-10 19:03 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Thu, 10 Dec 2009 22:42:11 +0800
> From: "Hou, Ruoyu" <phoenixhou@gmail.com>
>
> 1. My work involves handling documents in the major East
> Asian scripts in different encodings (GB2312, GB18030, BIG5, GBK,
> UTF-8, Shift-JIS, EUC-JP, ISO-2022-JP). I set my language environment
> to UTF-8 because I want all the documents *I create* to be saved as UTF-8,
> while files in other encodings are displayed, edited, and saved without
> changing their original encodings. I noticed that my Emacs does not properly
> recognize documents encoded in EUC-JP, so I have to set the encoding manually
> every time I encounter such a document. Is there any configuration I
> could tweak to accurately auto-detect and display the file encodings I
> mentioned above?
Try putting this in your ~/.emacs init file (and restart Emacs after
that):
(prefer-coding-system 'euc-jp)
* Re: Questions on charset encoding detection and keyboard layout
From: Hou, Ruoyu @ 2009-12-11 5:42 UTC (permalink / raw)
To: Emacs Mailing List
Dear Zaretskii,
I tried the tip you gave me, but now my GBK-encoded files are
unreadable. How would you solve this problem?
Moreover, as I mentioned in my previous post, how could I choose a
coding system to prefer without knowing in advance which encoding I
am going to encounter?
Thanks for your help.
Regards,
--
Hou, Ruoyu
* Re: Questions on charset encoding detection and keyboard layout
From: Eli Zaretskii @ 2009-12-11 8:42 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Fri, 11 Dec 2009 13:42:39 +0800
> From: "Hou, Ruoyu" <phoenixhou@gmail.com>
>
> I tried the tip you gave me, but now my GBK-encoded files are
> unreadable. How would you solve this problem?
>
> Moreover, as I mentioned in my previous post, how could I choose a
> coding system to prefer without knowing in advance which encoding I
> am going to encounter?
If you have many documents in different encodings that Emacs cannot
distinguish by itself, then I'm afraid there's no good solution except
"C-x RET c", which requires that you know the encoding in advance. At
least I'm not aware of any better way. What do other applications do?
Of course, if you inadvertently visit a file without knowing the
encoding, and want to re-visit it with the correct encoding, after you
notice that Emacs didn't properly decode it, then typing "C-x RET c
CORRECT-ENCODING RET M-x revert-buffer RET" will fix the problem.
Here CORRECT-ENCODING is the correct encoding of the file.
Also, if you could somehow arrange for documents in different
encodings to reside in different directories, then perhaps you could
set up directory-local variables to cause Emacs to decode the files
in each directory correctly. See the node "Directory Variables" in
the Emacs user manual for details about this feature.
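A related mechanism for the per-directory idea is the variable
file-coding-system-alist, which maps file-name regexps to coding
systems. The following is only a minimal sketch for the init file: both
directory names are hypothetical, and whether directory-local variables
alone can carry a coding system may depend on the Emacs version, so
this is offered as an alternative to try rather than as the method
described above.

```elisp
;; Sketch only: both directory names below are hypothetical.
;; Each entry maps a file-name regexp to the coding system used for
;; files whose names match it.
(add-to-list 'file-coding-system-alist
             '("/docs/japanese/" . euc-jp))
(add-to-list 'file-coding-system-alist
             '("/docs/chinese/" . gbk))
```

Files outside the listed directories are unaffected and still go
through the normal auto-detection.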
* Re: Questions on charset encoding detection and keyboard layout
From: Kevin Rodgers @ 2009-12-11 15:18 UTC (permalink / raw)
To: help-gnu-emacs
Eli Zaretskii wrote:
> Of course, if you inadvertently visit a file without knowing the
> encoding, and want to re-visit it with the correct encoding, after you
> notice that Emacs didn't properly decode it, then typing "C-x RET c
> CORRECT-ENCODING RET M-x revert-buffer RET" will fix the problem.
> Here CORRECT-ENCODING is the correct encoding of the file.
aka C-x RET r CORRECT-ENCODING RET
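C-x RET r runs the command revert-buffer-with-coding-system, so a
frequently needed encoding can be put on its own key. A minimal sketch,
in which the command name my-revert-as-euc-jp and the key C-c e j are
made up for illustration:

```elisp
;; Hypothetical helper: the command name and the key are illustrative.
(defun my-revert-as-euc-jp ()
  "Re-read the current buffer's file, decoding it as EUC-JP."
  (interactive)
  (revert-buffer-with-coding-system 'euc-jp))

(global-set-key (kbd "C-c e j") #'my-revert-as-euc-jp)
```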
--
Kevin Rodgers
Denver, Colorado, USA
* Re: Questions on charset encoding detection and keyboard layout
From: Hou, Ruoyu @ 2009-12-11 19:51 UTC (permalink / raw)
To: help-gnu-emacs
Dear Zaretskii,
Before switching to Emacs I was using EmEditor, a proprietary editor
for Windows. It can auto-detect files in different encodings
and offer a list of encodings, ordered by statistical confidence, from
which I pick the most likely one. So I guess it implements
some statistical algorithm to detect the proper encoding.
I also tried MadEdit, an open source cross-platform editor. So far it
has decoded every file I gave it automatically, without even asking me
to choose a likely encoding. I am not skilled enough to read its source
code, so I can't tell how it is done. I also don't know how MULE handles
encoding detection.
A friend of mine, a Vim user, showed me how he handles these different
encodings with ":set fencs=(a list of possible encodings, the point is to
put euc-jp before gbk)". It seems to be done by calling libiconv and
libintl (or gettext, I'm not sure).
I just thought that my Emacs should perform better than, or at least
as well as, these programs.
Thanks for your help. I am actually using the commands you mentioned to
set encodings for viewing or saving. Sorting documents into directories
by encoding is a good idea and a good habit, if only I had had the
foresight. It's a bit unrealistic when facing a large quantity of
unsorted documents in different encodings that are already on disk and
constantly growing (as I always complain, why can't those guys just use
UTF-8?). Is it possible to, for example, write a script to distinguish
and sort those documents?
Regards,
--
Hou, Ruoyu
* Re: Questions on charset encoding detection and keyboard layout
From: Eli Zaretskii @ 2009-12-11 20:54 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Sat, 12 Dec 2009 03:51:50 +0800
> From: "Hou, Ruoyu" <phoenixhou@gmail.com>
>
> Before switching to Emacs I was using EmEditor, a proprietary editor
> for Windows. It can auto-detect files in different encodings
> and offer a list of encodings, ordered by statistical confidence, from
> which I pick the most likely one. So I guess it implements
> some statistical algorithm to detect the proper encoding.
This feature still awaits a volunteer to be added to Emacs. It
shouldn't be too hard, I think.
> A friend of mine, a Vim user, showed me how he handles these different
> encodings with ":set fencs=(a list of possible encodings, the point is to
> put euc-jp before gbk)".
The customization I suggested, i.e.
(prefer-coding-system 'euc-jp)
was supposed to make euc-jp of higher priority than GBK (and
everything else). However, I understand it did you more harm than
good.
For more fine-grained control, try calling set-coding-system-priority
for every encoding you need to deal with, and in such an order that
the resulting list returned by coding-system-priority-list would show
the encodings in the order you want them. (These two functions are
documented in the ELisp manual.) I'm not sure this will have the same
effect as ":set fencs" in vim, though.
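As a minimal sketch of that suggestion, assuming the order shown is the
one wanted (the order itself is only an illustration):
set-coding-system-priority accepts several coding systems in one call
and moves them to the front of the priority list in the order given.
Its docstring also notes that when several of the arguments belong to
the same coding category, all but the first are ignored, so some
neighbouring entries may not end up where they were listed.

```elisp
;; Sketch only: the order below is an assumption; adjust it as needed.
;; New files default to UTF-8.
(prefer-coding-system 'utf-8)

;; Put these coding systems at the front of the detection priority
;; list, in this order, with euc-jp ahead of gbk.
(set-coding-system-priority 'utf-8 'euc-jp 'gbk 'big5
                            'shift_jis 'iso-2022-jp)

;; Evaluate this (e.g. with M-: or in *scratch*) to inspect the result.
(coding-system-priority-list)
```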
> Sorting documents into directories
> by encoding is a good idea and a good habit, if only I had had the
> foresight. It's a bit unrealistic when facing a large quantity of
> unsorted documents in different encodings that are already on disk and
> constantly growing (as I always complain, why can't those guys just use
> UTF-8?). Is it possible to, for example, write a script to distinguish
> and sort those documents?
I would try to find a program that could print a file's
encoding. `file' does not do that, but maybe there's something else
out there.
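Emacs itself can serve as a rough stand-in for such a program through
detect-coding-region. The sketch below is only an illustration: the
function names and the file name guess.el are made up, and the result
follows the current coding-system priority order rather than any
statistical confidence, so it inherits the same ambiguity between
encodings such as EUC-JP and GBK.

```elisp
;; Hypothetical helpers: names and output format are illustrative only.
(defun my-guess-file-encodings (file)
  "Return the coding systems Emacs considers possible for FILE.
The list is ordered by the current coding-system priority."
  (with-temp-buffer
    ;; Work on raw bytes so nothing is decoded before detection runs.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (detect-coding-region (point-min) (point-max))))

(defun my-guess-all ()
  "Print candidate encodings for each file left on the command line."
  (dolist (file command-line-args-left)
    (princ (format "%s: %S\n" file (my-guess-file-encodings file))))
  ;; Keep Emacs from visiting the files as ordinary arguments.
  (setq command-line-args-left nil))
```

Assuming the code is saved as guess.el, it could be run over a set of
files with:
emacs --batch -l guess.el --eval '(my-guess-all)' FILE...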