* How to make emacs auto-recognize utf-8 encoded files upon visiting @ 2002-09-23 16:39 Gerald Wildgruber 2002-09-23 23:35 ` Jesper Harder ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Gerald Wildgruber @ 2002-09-23 16:39 UTC (permalink / raw) Hello, I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize the right encoding when visiting files with utf-8 encoding. The emacs info help entry says on the topic: "Some coding systems can be recognized or distinguished by which byte sequences appear in the data. However, there are coding systems that cannot be distinguished, not even potentially." Does this also apply to utf-8 encoded files? Is it impossible for emacs to auto-recognize them (as for example the `file' command on the shell does)? I'm aware of how to do this with File Variables (either by using the `-*-...-*-' construct or a local variables list at the end of the file). Both of them work well. Setting `(prefer-coding-system 'utf-8)' in `.emacs' also works, but is kind of intrusive as all new files are then using this encoding by default. Even without file variables, Emacs does correctly recognize the encoding when visiting latin-1 or latin-9 encoded files. Yet it fails when visiting utf-8 encoded files. I get the `-:--' abbrev on the encoding part of the modeline, and letters beyond ascii are messed up. Can anyone give me a hint on how to make emacs find the correct coding system (without setting it explicitly through file variables)? Thanks, Gerald. ^ permalink raw reply [flat|nested] 32+ messages in thread
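Editorial sketch (not part of the original post): the drawback Gerald describes, that `prefer-coding-system' also makes utf-8 the default for new files, can in principle be avoided by raising utf-8 in the detection order only. The snippet below assumes a current GNU Emacs (23 or later), where `set-coding-system-priority' exists. It is not available in the 21.x builds discussed in this thread, so treat it as an illustration of the idea rather than a fix for that version.

```elisp
;; Assumption: Emacs 23 or later.  The `fboundp' guard keeps the form
;; harmless on older builds that lack this function.
(when (fboundp 'set-coding-system-priority)
  ;; Try utf-8 first when guessing the encoding of a visited file.
  ;; Unlike `prefer-coding-system', this should leave the default
  ;; coding system for newly created files untouched.
  (set-coding-system-priority 'utf-8))
```

Placed in `.emacs', this only changes which candidate wins when the bytes of a file are valid in more than one coding system. Files marked with file variables are still decoded as marked.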
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-23 16:39 How to make emacs auto-recognize utf-8 encoded files upon visiting Gerald Wildgruber @ 2002-09-23 23:35 ` Jesper Harder 2002-09-24 3:29 ` Charles Muller [not found] ` <mailman.1032838300.26368.help-gnu-emacs@gnu.org> 2002-09-24 11:45 ` auto-recognize utf-8 encoded files upon visiting: solved (sort of...) Gerald Wildgruber 2002-09-24 18:57 ` How to make emacs auto-recognize utf-8 encoded files upon visiting Dominic Cronin 2 siblings, 2 replies; 32+ messages in thread From: Jesper Harder @ 2002-09-23 23:35 UTC (permalink / raw) Gerald Wildgruber <gwil.remove.this.phrase@lrz.uni-muenchen.de> writes: > I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize > the right encoding when visiting files with utf-8 encoding. > > Can anyone give me a hint on how to make emacs find the correct coding > system (without setting it explicitly through file variables)? Does it work if you say `M-x prefer-coding-system utf-8' ? If it doesn't, you can open a file using a specified coding system with: `C-x RET c utf-8 RET C-x C-f' ^ permalink raw reply [flat|nested] 32+ messages in thread
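Editorial sketch: the key sequence `C-x RET c utf-8 RET C-x C-f' amounts to binding `coding-system-for-read' around a single visit. A minimal Lisp version is shown below. The command name `my-find-file-as-utf-8' is hypothetical and chosen only for this example. As with the key sequence, this forces the decoding and does not make Emacs detect anything.

```elisp
(defun my-find-file-as-utf-8 (file)
  "Visit FILE, decoding it as UTF-8 regardless of auto-detection.
Hypothetical helper, not part of Emacs."
  (interactive "fFind file as UTF-8: ")
  ;; `coding-system-for-read' overrides detection for the duration
  ;; of this binding only; later visits are unaffected.
  (let ((coding-system-for-read 'utf-8))
    (find-file file)))
```

After evaluating the definition, `M-x my-find-file-as-utf-8' prompts for a file name in the usual way.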
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-23 23:35 ` Jesper Harder @ 2002-09-24 3:29 ` Charles Muller [not found] ` <mailman.1032838300.26368.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-24 3:29 UTC (permalink / raw) Cc: help-gnu-emacs Jesper Harder wrote: > > I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize > > the right encoding when visiting files with utf-8 encoding. Sorry to mention TEI-Emacs again, but its superior ability to identify document encodings (especially Unicode) is one of the things that I like best about it. http://www.tei-c.org/Software/tei-emacs.tar.gz > Does it work if you say `M-x prefer-coding-system utf-8' ? > If it doesn't, you can open a file using a specified coding system with: > `C-x RET c utf-8 RET C-x C-f' I had also tried these strategies, and they will allow you to edit a document as UTF-8, but AFAICT they don't solve the problem of having Emacs recognize these encodings automatically. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <mailman.1032838300.26368.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1032838300.26368.help-gnu-emacs@gnu.org> @ 2002-09-24 6:27 ` Miles Bader 2002-09-24 8:59 ` Charles Muller [not found] ` <mailman.1032848900.31556.help-gnu-emacs@gnu.org> 1 sibling, 1 reply; 32+ messages in thread From: Miles Bader @ 2002-09-24 6:27 UTC (permalink / raw) Charles Muller <acmuller@gol.com> writes: > Sorry to mention TEI-Emacs again, but its superior ability to identify > document encodings (especially Unicode) is one of things that I > like best about it. I'm sure, but saying `download an entirely new version of emacs' is a rather heavyweight solution. If they've hacked emacs to do a better job of this, how did they do it? Did they ever try to submit their changes back to emacs? -Miles -- Would you like fries with that? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-24 6:27 ` Miles Bader @ 2002-09-24 8:59 ` Charles Muller 2002-09-24 15:12 ` Eli Zaretskii 2002-09-24 19:05 ` tramp Roger Mason 0 siblings, 2 replies; 32+ messages in thread From: Charles Muller @ 2002-09-24 8:59 UTC (permalink / raw) Miles Bader asked: >I'm sure, but saying `download an entirely new version of emacs' is a >rather heavyweight solution. As I understand it, it is not an entirely new version, in the sense that it does not replace the basic program files for Emacs itself. But it supplies its own version of Mule-UCS, and everything concerned with doing XML/HTML/XSLT in international environments. > If they've hacked emacs to do a better job of this, how did they do it? > Did they ever try to submit their changes back to emacs? I am not a member of the TEI Consortium, so I can't answer this. Certainly, all the source code for what they've done is readily and openly available for anyone interested. All I know is that before I began to use this package, when doing mail in Emacs (21.2), there was never any automatic recognition of any kinds of non-Western encoding. Now it recognizes Japanese, Korean, Chinese, and whatever, whether these be in the local encoding (JIS, Big5, etc.) or Unicode, without me having to do "prefer-coding-system" or anything like that. As a person working in several languages, Emacs would be of very little use to me without this. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-24 8:59 ` Charles Muller @ 2002-09-24 15:12 ` Eli Zaretskii 2002-09-25 6:45 ` Charles Muller [not found] ` <mailman.1032936261.7964.help-gnu-emacs@gnu.org> 2002-09-24 19:05 ` tramp Roger Mason 1 sibling, 2 replies; 32+ messages in thread From: Eli Zaretskii @ 2002-09-24 15:12 UTC (permalink / raw) Cc: help-gnu-emacs On Tue, 24 Sep 2002, Charles Muller wrote: > All I know is that before I began to use this package, when doing mail in > Emacs (21.2), there was never any automatic recognition of any kinds of > non-Western encoding. Now it recognizes Japanese, Korean, Chinese, and > whatever, whether these be in the local encoding (JIS, Big5, etc.) or > Unicode, without me having to do "prefer-coding-system" or anything like > that. FWIW, my stock Emacs 21.2 recognizes non-Western encodings right out of the box. I'm not saying that you were dreaming, just that those problems are not trivial to reproduce, so a full-blown bug report is in order. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-24 15:12 ` Eli Zaretskii @ 2002-09-25 6:45 ` Charles Muller 2002-09-25 6:55 ` Eli Zaretskii [not found] ` <mailman.1032936261.7964.help-gnu-emacs@gnu.org> 1 sibling, 1 reply; 32+ messages in thread From: Charles Muller @ 2002-09-25 6:45 UTC (permalink / raw) Cc: help-gnu-emacs Eli wrote: > FWIW, my stock Emacs 21.2 recognizes non-Western encodings right out of > the box. Are you saying that if you open up a document in your build of 21.2 that contains, for instance, Japanese, Chinese, Korean, and Latin characters, all the fonts display correctly without needing any tweaking at all? If so, this is the first case I have heard among many of my colleagues who are using Emacs. > I'm not saying that you were dreaming, just that those problems are not > trivial to reproduce, so a full-blown bug report is in order. I don't see it as a bug. I see it more as a level of technical capability that Emacs has not yet reached to, but no doubt will some day. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 6:45 ` Charles Muller @ 2002-09-25 6:55 ` Eli Zaretskii 2002-09-25 8:07 ` Charles Muller ` (3 more replies) 0 siblings, 4 replies; 32+ messages in thread From: Eli Zaretskii @ 2002-09-25 6:55 UTC (permalink / raw) Cc: help-gnu-emacs On Wed, 25 Sep 2002, Charles Muller wrote: > Are you saying that if you open up a document in your build of 21.2 that > contains, for instance, Japanese, Chinese, Korean, and Latin characters, all > the fonts display correctly without needing any tweaking at all? Yes. Don't you see that when you visit etc/HELLO? > > I'm not saying that you were dreaming, just that those problems are not > > trivial to reproduce, so a full-blown bug report is in order. > > I don't see it as a bug. I see it more as a level of technical capability > that Emacs has not yet reached to That's another definition of a bug ;-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 6:55 ` Eli Zaretskii @ 2002-09-25 8:07 ` Charles Muller 2002-09-25 8:33 ` Charles Muller ` (2 subsequent siblings) 3 siblings, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-25 8:07 UTC (permalink / raw) Cc: help-gnu-emacs Eli wrote: > Don't you see that when you visit etc/HELLO? This is interesting, because the CJK fonts in the HELLO file *do* display correctly. But when I open up any other utf-8 file on my system without using TEI-Emacs, they don't display. Odd, isn't it? Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 6:55 ` Eli Zaretskii 2002-09-25 8:07 ` Charles Muller @ 2002-09-25 8:33 ` Charles Muller 2002-09-26 4:42 ` Eli Zaretskii 2002-09-25 9:21 ` Charles Muller 2002-09-25 9:26 ` Charles Muller 3 siblings, 1 reply; 32+ messages in thread From: Charles Muller @ 2002-09-25 8:33 UTC (permalink / raw) Cc: help-gnu-emacs [-- Attachment #1: Type: Text/Plain, Size: 534 bytes --] Dear Eli, I am curious as to why I am able to see fonts in the HELLO file, but not in any of my other utf-8 encoded text files (unless I use TEI-Emacs). I want to ask you if you are also able to see the Chinese in the attached text file. Also, are you using Linux? If so, which distribution? Thanks, Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 [-- Attachment #2: Lesson11.xml --] [-- Type: Application/Xml, Size: 1757 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 8:33 ` Charles Muller @ 2002-09-26 4:42 ` Eli Zaretskii 2002-09-26 7:00 ` Charles Muller 0 siblings, 1 reply; 32+ messages in thread From: Eli Zaretskii @ 2002-09-26 4:42 UTC (permalink / raw) Cc: help-gnu-emacs On Wed, 25 Sep 2002, Charles Muller wrote: > I am curious as to why I am able to see fonts in the HELLO file, but not in > any of my other utf-8 encoded text files (unless I use TEI-Emacs). You need to install Unicode fonts as explained in the file INSTALL in the Emacs distribution. > I want > to ask you if you are also able to see the Chinese in the attached text > file. Stock Emacs 21.2 doesn't support utf-8 encoded files if the characters are outside the Latin-1 region, IIRC. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-26 4:42 ` Eli Zaretskii @ 2002-09-26 7:00 ` Charles Muller 2002-09-26 16:05 ` Eli Zaretskii 0 siblings, 1 reply; 32+ messages in thread From: Charles Muller @ 2002-09-26 7:00 UTC (permalink / raw) Eli wrote, > You need to install Unicode fonts as explained in the file INSTALL in the > Emacs distribution. Due to my concerns with CJK and Unicode, this was the first thing I did when I set up my Emacs. I should reiterate: Since I already have a solution in the form of TEI-Emacs, it doesn't matter to me so much personally whether or not this issue is solved here. But I would like to point out that although there has been much generous advice offered on the matter, I haven't heard a subsequent response from anyone who has been *successful* with this--and I mean in a real UTF-8 environment, and not the 7-bit HELLO file. This is also true among my colleagues in East Asian studies who are using Emacs. Every one of them that I know who has handled the problem successfully has done so through the TEI extension. So until we hear otherwise from someone who has succeeded, and can tell us how, I would suggest that it is simply not doable with the present build of Emacs, and that we should just admit it. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-26 7:00 ` Charles Muller @ 2002-09-26 16:05 ` Eli Zaretskii 2002-09-27 0:36 ` Charles Muller [not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 32+ messages in thread From: Eli Zaretskii @ 2002-09-26 16:05 UTC (permalink / raw) Cc: help-gnu-emacs > From: Charles Muller <acmuller@gol.com> > Date: Thu, 26 Sep 2002 16:00:18 +0900 (JST) > > This is also a true among my colleagues in East Asian studies who are using > Emacs. Every one of them that I know who has handled the problem > successfully has done so through the TEI extension. So until we hear > otherwise from someone who has succeeded, and can tell us how, I would > suggest that it is simply not doable with the present build of Emacs, and > that we should just admit it. Stefan already explained that UTF-8 support in Emacs 21.2 does not include CJK characters. (IIRC, there's an entry in etc/PROBLEMS about that.) However, your original complaint seemed to indicate that CJK support doesn't work in general, not only in UTF-8 encoded files, which is a much graver problem if it were true. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-26 16:05 ` Eli Zaretskii @ 2002-09-27 0:36 ` Charles Muller [not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-27 0:36 UTC (permalink / raw) Cc: help-gnu-emacs Eli wrote: >However, your original complaint seemed to indicate that CJK > support doesn't work in general, not only in UTF-8 encoded files, "Seemed to indicate"? What are you talking about? This entire thread has had "utf-8" in the subject line, and I have mentioned it in every single message. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <mailman.1033086929.4506.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org> @ 2002-09-27 1:42 ` Miles Bader 2002-09-27 7:06 ` Charles Muller [not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org> 0 siblings, 2 replies; 32+ messages in thread From: Miles Bader @ 2002-09-27 1:42 UTC (permalink / raw) Charles Muller <acmuller@gol.com> writes: > > However, your original complaint seemed to indicate that CJK > > support doesn't work in general, not only in UTF-8 encoded files, > > "Seemed to indicate"? What are you talking about? This entire thread > has had "utf-8" in the subject line, and I have mentioned it in every > single message. The original thread (and thus the `utf-8' in the subject header) was about emacs `not recognizing utf-8 files,' and the message made it clear that it was non-CJK utf-8 files that were being talked about (and that emacs could read them correctly with a bit of assistance). You then recommended that he use `TEI emacs,' and I asked how they did a better job of auto-recognition. You then said `when doing mail in Emacs (21.2), there was never any automatic recognition of any kinds of non-Western encoding' -- no mention of utf-8, and as far as I could see, the topic was about encoding recognition in general, not specifically about utf-8 (the `any kinds of non-Western encoding' being a prime reason for this -- there are things besides utf-8...). Only later did you make a post which made it clear that you were talking about CJK utf-8 support (which indeed emacs does not currently have). A hint for using netnews: Topics drift, and subject headers are often inaccurate in followup postings (and so people tend to disregard them). -Miles -- Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-27 1:42 ` Miles Bader @ 2002-09-27 7:06 ` Charles Muller [not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-27 7:06 UTC (permalink / raw) Cc: help-gnu-emacs Thanks to Miles for his interpretation of the course of the UTF-8 thread. Most importantly, it has been clearly established that Emacs does not support CJK in utf-8. That's the point I wanted to make. Hopefully this shortcoming will be addressed in future releases. Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <mailman.1033110323.17834.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org> @ 2002-09-27 9:07 ` Miles Bader 2002-09-27 11:56 ` Kai Großjohann 1 sibling, 0 replies; 32+ messages in thread From: Miles Bader @ 2002-09-27 9:07 UTC (permalink / raw) Charles Muller <acmuller@gol.com> writes: > Most importantly, it has been clearly established that Emacs > does not support CJK in utf-8. That's the point I wanted to make. Hopefully > this shortcoming will be addressed in future releases. That's certainly an important feature to be added (and it will be, though perhaps not until the full `internal unicode' version of emacs; I'm not entirely cognizant of the various issues involved). However, it's really tangential to the original poster's question -- how to make emacs recognize (non-CJK for the moment) utf-8 files more intelligently. That, presumably, is a much simpler thing to fix. -Miles -- "1971 pickup truck; will trade for guns" ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org> 2002-09-27 9:07 ` Miles Bader @ 2002-09-27 11:56 ` Kai Großjohann 2002-09-27 14:10 ` Charles Muller [not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org> 1 sibling, 2 replies; 32+ messages in thread From: Kai Großjohann @ 2002-09-27 11:56 UTC (permalink / raw) Charles Muller <acmuller@gol.com> writes: > Most importantly, it has been clearly established that Emacs > does not support CJK in utf-8. That's the point I wanted to make. Hopefully > this shortcoming will be addressed in future releases. Maybe it is sufficient to install Mule-UCS? I guess that TEI Emacs is Emacs with Mule-UCS pre-installed (plus some other packages). kai -- ~/.signature is: umop ap!sdn (Frank Nobis) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-27 11:56 ` Kai Großjohann @ 2002-09-27 14:10 ` Charles Muller [not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org> 1 sibling, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-27 14:10 UTC (permalink / raw) Cc: help-gnu-emacs Kai wrote: > Maybe it is sufficient to install Mule-UCS? I guess that TEI Emacs > is Emacs with Mule-UCS pre-installed (plus some other packages). TEI-Emacs does install Mule-UCS, but the reason for its ability to do what it does must be more than that, because I always install Mule-UCS with my Emacs, and they never render CJK fonts in Unicode until I install the TEI package. Since all of my internet publication and data compilation has to be done in Unicode, that's always the first thing I check. But I don't know enough about Lisp programming to tell you exactly *what* the TEI package does to get this working. No doubt Sebastian Rahtz, Christian Wittern, and some of the others who wrote the package would be happy to tell you what the key routines are. The first priority of the TEI people is to make sure that the SGML/XML/HTML modes are working more precisely and comprehensively than in the standard Emacs package, since the target audience is mainly humanities scholars who are using TEI-XML to mark up literary texts. For example, the way the PSGML is set up in the standard package, it is hard to get it to determine the difference between XML and SGML. They have also added a whole array of DTD's for various purposes, including distinctions in XHTML/strict/transitional. There is also an XSLT mode added that allows for adjustments and debugging. Then, on top of that, because there are so many of us working with mixed international scripts (including CJK), apparently someone decided to figure out how to get all the fonts properly recognized. 
I am guessing that part of the problem facing the standard installation of Emacs is that with any other traditional encoding outside of Unicode, such as Big5, JIS, or KSC, you always have at least one full font set that is traditionally mapped to the encoding. With Unicode, I don't know of a font that is designed to work readily in Linux/Emacs, that covers all codepoints (the way MS Arial Unicode does in Windows, for example). So a function needs to be added which goes through the document and properly plugs in fonts for each given codepoint. I am not well-enough versed at the technical end to be able to explain how they have accomplished this over in Oxford. I noticed a good bit of negative reaction toward TEI-Emacs when I first mentioned it, where people expressed alarm about the TEI people not caring about the GPL and not reporting to the GNU development team. I think that these concerns come as a result of people not really checking into what the package is, and what it does. It is not a new version of Emacs, such as XEmacs. It is simply an add-on, that contains mode enhancements, and some of its own new modes--just the way people are accustomed to adding on calendar modes, e-mail packages, or whatever. I know many of the dedicated people in the Text Encoding Initiative very well, and there is not a bunch around who are more concerned about free software and donating code. But when you really need to get a certain type of application set up for a certain use, I don't think you can just write to the GNU development team and then wait for a future version for it to be implemented. And after all, the Lisp code for TEI-Emacs is just as openly available as any other development based on Emacs. And they have also made a concerted effort to have their add-on support GNU Emacs, rather than XEmacs, so I really see it as a very positive development that should be learned from, rather than disparaged. 
Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <mailman.1033135767.32171.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org> @ 2002-09-27 14:41 ` Miles Bader 2002-09-27 15:54 ` Stefan Monnier <foo@acm.com> 1 sibling, 0 replies; 32+ messages in thread From: Miles Bader @ 2002-09-27 14:41 UTC (permalink / raw) Charles Muller <acmuller@gol.com> writes: > I noticed a good bit of negative reaction toward TEI-Emacs when I > first mentioned it, where people expressed alarm about the TEI people > not caring about the GPL and not reporting to the GNU > development team. I think that these concerns come as a result of > people not really checking into what the package is, and what it does. > It is not a new version of Emacs, such as XEmacs. It is simply an > add-on, that contains mode enhancements, and some of its own new > modes--just the way people are accustomed to adding on calendar modes, > e-mail packages, or whatever. It's perfectly fine for groups such as the TEI to have such add-on packages, and it's certainly not necessary for them to `write to the GNU development team and then wait for a future version for it to be implemented.' However, for people who have developed big packages, I think it would be a good thing to try to maintain some sort of `loose contact' (and maybe they have done so, I don't know), just so that people know what's out there, and perhaps can avoid duplicating the work they've done. If some of their stuff is of general utility, often it can be merged into the emacs distribution, which will lessen the maintenance burden. For instance the problem with unicode font coverage you mentioned is apparently a big annoyance for the people developing the `real' unicode version of emacs. I think that in the case of many big packages (e.g. `semantic'), there's a general awareness in the emacs community that they exist, so formal contact isn't necessary. I hadn't heard of the TEI work though. 
-Miles -- We live, as we dream -- alone.... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting [not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org> 2002-09-27 14:41 ` Miles Bader @ 2002-09-27 15:54 ` Stefan Monnier <foo@acm.com> 1 sibling, 0 replies; 32+ messages in thread From: Stefan Monnier <foo@acm.com> @ 2002-09-27 15:54 UTC (permalink / raw) > I noticed a good bit of negative reaction toward TEI-Emacs when I first > mentioned it, where people expressed alarm about the TEI people not caring > about the about the GPL and not reporting to the GNU development team. I didn't react negatively when I asked if they could get (and keep) in touch with us. I just think it'd be good for both us and them. Almost every time I bump into such a package as TEI-Emacs which I had never heard of I feel "what a pity: if I don't know about it despite all the time I spend around Emacs, who will?". Also the code in such packages often does both strange/wrong things (either because of misunderstandings about how some functionality works or because of a lack of some functionality) and great things (which could benefit many more people than users of that package). In either case, it would have been great if they had complained here (for example) about the problems and shortcomings they encountered. Stefan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 6:55 ` Eli Zaretskii 2002-09-25 8:07 ` Charles Muller 2002-09-25 8:33 ` Charles Muller @ 2002-09-25 9:21 ` Charles Muller 2002-09-25 9:26 ` Charles Muller 3 siblings, 0 replies; 32+ messages in thread From: Charles Muller @ 2002-09-25 9:21 UTC (permalink / raw) Cc: help-gnu-emacs After Eli wrote ... > Yes. Don't you see that when you visit etc/HELLO? I spent quite a bit of time trying to visit utf-8 files on my hard drives in Emacs 21.2 without the TEI package loaded, and could not view Chinese fonts in any of them, which is odd, in view of the fact that I can see them in the HELLO file. Then I tried C-u C-x = on some of the characters in HELLO, and got this interesting piece of information (for example): file code: ESC 24 28 43 31 5B (encoded by coding system iso-2022-7bit-unix) Unless I am completely misunderstanding something (and I may well be, for I am not a programmer) if this file is encoded as iso-2022-7bit, it seems that we should not be using it as a test example of utf-8 functionality. Am I right? I have written up a small test file in utf-8 that contains just one line each of Korean, Japanese, and Chinese, in case anyone is interested in trying it. It displays for me fine with TEI installed, but as gibberish without it. http://www.acmuller.net/test.txt Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting 2002-09-25 6:55 ` Eli Zaretskii ` (2 preceding siblings ...) 2002-09-25 9:21 ` Charles Muller @ 2002-09-25 9:26 ` Charles Muller 2002-09-25 9:41 ` Charles Muller 3 siblings, 1 reply; 32+ messages in thread From: Charles Muller @ 2002-09-25 9:26 UTC (permalink / raw) After Eli wrote ... > Yes. Don't you see that when you visit etc/HELLO? I spent quite a bit of time trying to visit utf-8 files on my hard drives in Emacs 21.2 without the TEI package loaded, and could not view East Asian fonts in any of them, which is odd, in view of the fact that I can see them in the HELLO file. Then I tried C-u C-x = on some of the characters in HELLO, and got this interesting piece of information: (encoded by coding system iso-2022-7bit-unix) Unless I am completely misunderstanding something (and I may well be, for I am not a programmer) if this file is encoded as iso-2022-7bit, it seems that we should not be using it as a test example of utf-8 functionality. Am I right? I have written up a small test file in utf-8 that contains just one line each of Korean, Japanese, and Chinese, in case any one is interested in trying it. It displays for me fine with TEI-Emacs installed, but as gibberish without it (Emacs 21.2 on Red Hat 7.3). I have Mule installed in my regular Emacs package, as well as all the necessary fonts. http://www.acmuller.net/test.txt Chuck --------------------------- Charles Muller <acmuller@gol.com> Faculty of Humanities, Toyo Gakuen University Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net] Mobile Phone: 090-9310-1787 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
  2002-09-25  9:26 ` Charles Muller
@ 2002-09-25  9:41   ` Charles Muller
  0 siblings, 0 replies; 32+ messages in thread
From: Charles Muller @ 2002-09-25  9:41 UTC

I wrote:

> (encoded by coding system iso-2022-7bit-unix)
>
> Unless I am completely misunderstanding something (and I may well be, for I
> am not a programmer) if this file is encoded as iso-2022-7bit, it seems that we should not be using
> it as a test example of utf-8 functionality.

I am pretty sure now that I am right about this, because when I check the codepoints in this file

> http://www.acmuller.net/test.txt

I get this kind of result:

    category: h:Korean
    buffer code: 0x93 0xBF 0xF8
    file code: 0xEC 0x9B 0x90 (encoded by coding system utf-8-unix)

So I think we can forget about HELLO for the purposes of the present discussion.

Chuck

---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
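The file code reported by C-u C-x = in the message above can be checked by hand against the UTF-8 layout for three-byte sequences (1110xxxx 10xxxxxx 10xxxxxx). What follows is only an illustrative sketch in Emacs Lisp, not something taken from the thread; the names b1, b2 and b3 are made up for the example, and the form can be evaluated in the *scratch* buffer.

```elisp
;; Decode the three-byte UTF-8 sequence 0xEC 0x9B 0x90 by hand.
;; b1, b2 and b3 are illustrative names, not part of any Emacs API.
(let ((b1 #xEC) (b2 #x9B) (b3 #x90))
  (format "U+%04X"
          (logior (ash (logand b1 #x0F) 12)   ; low 4 bits of the lead byte
                  (ash (logand b2 #x3F) 6)    ; 6 payload bits of byte 2
                  (logand b3 #x3F))))         ; 6 payload bits of byte 3
;; => "U+C6D0"
```

U+C6D0 lies in the Hangul Syllables block, which agrees with the `h:Korean` category shown above, so the three bytes are consistent with a genuinely UTF-8 encoded file.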
[parent not found: <mailman.1032936261.7964.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
       [not found] ` <mailman.1032936261.7964.help-gnu-emacs@gnu.org>
@ 2002-09-25  8:23   ` Miles Bader
  2002-09-25 14:55   ` Stefan Monnier <foo@acm.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Miles Bader @ 2002-09-25  8:23 UTC

Charles Muller <acmuller@gol.com> writes:
> Are you saying that if you open up a document in your build of 21.2
> that contains, for instance, Japanese, Chinese, Korean, and Latin
> characters, all the fonts display correctly without needing any
> tweaking at all?

That's such a vague question that it's hard to answer -- certainly I've never had any particular problems with the above (though the only time I've ever seen _all_ of them in a single file, is in the `HELLO' file that comes with emacs).

But for my standard daily usage, everything seems to work fine, with no font tweaking or whatever. Sometimes I get spam email that emacs fails to decode correctly, though.

[I do (set-language-environment "Japanese"), which may affect this]

> > I'm not saying that you were dreaming, just that those problems are not
> > trivial to reproduce, so a full-blown bug report is in order.
>
> I don't see it as a bug. I see it more as a level of technical capability
> that Emacs has not yet reached to, but no doubt will some day.

Your original message alluded to problems recognizing certain encodings; that should be easy to report as a bug -- send a document which emacs fails to recognize, which it probably should.

-Miles
-- 
`...the Soviet Union was sliding in to an economic collapse so comprehensive that in the end its factories produced not goods but bads: finished products less valuable than the raw materials they were made from.' [The Economist]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
       [not found] ` <mailman.1032936261.7964.help-gnu-emacs@gnu.org>
  2002-09-25  8:23   ` Miles Bader
@ 2002-09-25 14:55   ` Stefan Monnier <foo@acm.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2002-09-25 14:55 UTC

> Are you saying that if you open up a document in your build of 21.2 that
> contains, for instance, Japanese, Chinese, Korean, and Latin characters,
> all the fonts display correctly without needing any tweaking at all?

Please keep in mind that UTF-8 is not the only encoding in the world. I had to read the rest of the thread to understand that you meant "a UTF-8 file with the above chars".

Indeed the stock Emacs distribution doesn't yet handle UTF-8 encoded Asian chars quite right. IIUC the development code is getting closer, but isn't quite there yet. To get what you want you currently need to use Mule-UCS (which I gather is one of the things that are included in TEI-Emacs).

Stefan
* tramp
  2002-09-24  8:59 ` Charles Muller
  2002-09-24 15:12   ` Eli Zaretskii
@ 2002-09-24 19:05   ` Roger Mason
  1 sibling, 0 replies; 32+ messages in thread
From: Roger Mason @ 2002-09-24 19:05 UTC

Hi,

I downloaded tramp-2.0.22 and installed it. I am trying to get it configured. For now I have to use telnet as the default connection method, so I executed

    (setq tramp-default-method "tm")

in the scratch buffer and it returned

    "tm"

When I try to get a file from my home dir on a remote machine:

    /fred.here.there.everywhere:~/.emacs

I get

    Method 'tm' didn't specify a connection function

in the minibuffer. Obviously I missed something in the configuration.

Can anyone help?

Thanks,
Roger Mason

emacs 21.1
[parent not found: <mailman.1032848900.31556.help-gnu-emacs@gnu.org>]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
       [not found] ` <mailman.1032848900.31556.help-gnu-emacs@gnu.org>
@ 2002-09-24  8:26   ` A. Lucien Meyers
  0 siblings, 0 replies; 32+ messages in thread
From: A. Lucien Meyers @ 2002-09-24  8:26 UTC

* Miles Bader <miles@lsi.nec.co.jp>:
> Charles Muller <acmuller@gol.com> writes:
> > Sorry to mention TEI-Emacs again, but its superior ability to
> > identify document encodings (especially Unicode) is one of things
> > that I like best about it.
>
> I'm sure, but saying `download an entirely new version of emacs' is
> a rather heavyweight solution.
>
> If they've hacked emacs to do a better job of this, how did they do
> it? Did they ever try to submit their changes back to emacs?

TEI-Emacs? Huh? What, where, why? Are TEI developers observing GPL?

Lucien
-- 
If you receive this by error, please delete it and inform the sender.
http://www.consult-meyers.com recommends email encryption using GnuPG.
* auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
  2002-09-23 16:39 How to make emacs auto-recognize utf-8 encoded files upon visiting Gerald Wildgruber
  2002-09-23 23:35 ` Jesper Harder
@ 2002-09-24 11:45 ` Gerald Wildgruber
  2002-09-24 12:39   ` Charles Muller
       [not found]   ` <mailman.1032871109.14505.help-gnu-emacs@gnu.org>
  2002-09-24 18:57 ` How to make emacs auto-recognize utf-8 encoded files upon visiting Dominic Cronin
  2 siblings, 2 replies; 32+ messages in thread
From: Gerald Wildgruber @ 2002-09-24 11:45 UTC

Thanks to everybody who helped answer my question!

What I was trying to do was to make emacs auto-recognize utf-8 encoded files upon visiting. In the beginning, it didn't.

The problem seems to have been that -- latin-9 being my primary language environment -- utf-8 only appeared at the end of my priority list for encodings. Emacs then seems to take my utf-8 file as being encoded in one of the earlier entries of the priority list (probably one of the iso-2022 family). That looks like an erroneous recognition; letters beyond ascii are then messed up.

I didn't want utf-8 to be in first place on the priority list (because then all newly created files have it as their default encoding), but not in last place either.

If you run M-x prefer-coding-system twice, the first time with utf-8 and the second time with latin-9 as the value, utf-8 is promoted to second place on the priority list and latin-9 remains in first place.

Now, without any explicit indication of the encoding (e.g. via file variables), emacs correctly recognizes the encoding when I'm visiting utf-8 files.

To achieve this entry order in the priority list PERMANENTLY I simply put the following two lines, in this order, into my init file:

    (prefer-coding-system 'utf-8)
    (prefer-coding-system 'latin-9)

I'm sure there is a cleaner and more elegant solution, but it kind of works.

Last remark:

Charles, thanks for your hint on TEI; I gave TEI a long try many years ago, when SGML came up; they provided very good introductory material to the whole issue. But I didn't know of their work on emacs. You say that the unicode stuff didn't work right on emacs 21.2. I compiled an emacs version from the CVS sources (http://savannah.gnu.org/projects/emacs/) and there unicode integration seems to be already more evolved than in the official distribution. Almost everything works very well. Perhaps you should give it a try. I'm also working with different languages (cl. Greek) and I am very happy with the evolving unicode capabilities of emacs. I think unicode integration is THE way by which emacs stops being merely a tool for programmers and addresses a wider audience also in the humanities.

Gerald.
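The ordering trick from the message above could be written in an init file roughly as follows. This is a sketch of the same two calls with comments added; the optional check at the end is an assumption that only applies to later Emacs releases that provide `coding-system-priority-list`, so it is guarded with `fboundp`.

```elisp
;; ~/.emacs sketch: each call to `prefer-coding-system' moves its
;; argument to the front of the detection priority list, so the call
;; made last ends up first.
(prefer-coding-system 'utf-8)     ; first for now, second after the next call
(prefer-coding-system 'latin-9)   ; first: stays the default for new files

;; Optional check.  `coding-system-priority-list' is assumed to exist
;; only in newer Emacs releases, hence the guard.
(when (fboundp 'coding-system-priority-list)
  (message "Detection priority: %S" (coding-system-priority-list)))
```

On versions without that function, C-h C RET (describe-coding-system with an empty argument) should show the priority order used when reading files.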
* Re: auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
  2002-09-24 11:45 ` auto-recognize utf-8 encoded files upon visiting: solved (sort of...) Gerald Wildgruber
@ 2002-09-24 12:39   ` Charles Muller
       [not found]   ` <mailman.1032871109.14505.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 32+ messages in thread
From: Charles Muller @ 2002-09-24 12:39 UTC

Gerald wrote:

> Charles, thanks for your hint on TEI; I gave TEI a long try many years ago,
> when SGML came up, they provided very good introductory material to the
> whole issue. But I didn't know of their work on emacs. You say that the
> unicode stuff didn't work right on emacs 21.2.

It worked OK for me for European languages, and I can do CJK input/output in Chinese, Japanese, Korean, either in the national encodings or in utf-8. But regular Emacs 21.2, although it seems to be able to recognize the encodings of files, does not seem to be able to apply the proper fonts very well, especially East Asian fonts when they are mixed in together with Latin.

> I compiled an emacs version
> from the CVS sources (http://savannah.gnu.org/projects/emacs/) and there
> unicode integration seems to be already more evolved than in the official
> distribution. Almost everything works very well. Perhaps you should give it
> a try.

TEI-Emacs applies a "fontifying" process wherein the proper fonts are applied to all of the different mixed-language codepoints that I am using. I wonder if this recent package you mention can do that? In any case, what you have told me here is certainly good news, and I'll keep it in mind.

As you have guessed, I do not use Emacs for programming, but as an infinitely customizable text-editing environment to carry out humanities research projects. The main reason I use TEI-Emacs is not for the Unicode handling, but because I am running a number of research projects that are structured by TEI-XML. I'm using their DTDs, style sheets, and everything, so it's a complete package for me that takes care of almost everything.

Nonetheless, like you, I'm looking forward to the continued development of the Emacsen toward Unicode. I know that XEmacs 21.5 will set UTF-8 as the default encoding. I hope the developers at GNU Emacs have similar intentions in mind.

Regards,

Chuck

---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
[parent not found: <mailman.1032871109.14505.help-gnu-emacs@gnu.org>]
* Re: auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
       [not found] ` <mailman.1032871109.14505.help-gnu-emacs@gnu.org>
@ 2002-09-25 14:28   ` A. L. Meyers
  0 siblings, 0 replies; 32+ messages in thread
From: A. L. Meyers @ 2002-09-25 14:28 UTC

Very nice to hear some knowledgeable people reiterate the significance and importance of unicode in the intellectual history of mankind. And even nicer to see emacsen in its vanguard.

Lucien
-- 
If you receive this by error, please delete it and inform the sender.
PGP key fingerprint=F1C0 D9AE 1B18 1405 4DFA B4CC 6DC7 FF78 C76E FB15
To Big Brother Echelon from "spook": SEAL Team 6 smuggle supercomputer Soviet opus dei Somalia Kabul spy Nazi
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
  2002-09-23 16:39 How to make emacs auto-recognize utf-8 encoded files upon visiting Gerald Wildgruber
  2002-09-23 23:35 ` Jesper Harder
  2002-09-24 11:45 ` auto-recognize utf-8 encoded files upon visiting: solved (sort of...) Gerald Wildgruber
@ 2002-09-24 18:57 ` Dominic Cronin
  2 siblings, 0 replies; 32+ messages in thread
From: Dominic Cronin @ 2002-09-24 18:57 UTC

On 23 Sep 2002 18:39:19 +0200, Gerald Wildgruber <gwil.remove.this.phrase@lrz.uni-muenchen.de> wrote:

>
>Hello,
>
>I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
>the right encoding when visiting files with utf-8 encoding. The emacs info
>help entry says on the topic:
>
>"Some coding systems can be recognized or distinguished by which byte
>sequences appear in the data. However, there are coding systems that cannot
>be distinguished, not even potentially."
>
>Does this also apply to utf-8 encoded files? Is it impossible for emacs to
>auto-recognize them (as for example the `file' command on the shell does)?

The RFC for UTF-8 (see http://www.ietf.org/rfc/rfc2279.txt) states:

    UTF-8 strings can be fairly reliably recognized as such by a
    simple algorithm, i.e. the probability that a string of characters
    in any other encoding appears as valid UTF-8 is low, diminishing
    with increasing string length.

BTW - the RFC is quite an interesting read: an elegant solution to a problem.

-- 
Dominic Cronin
Amsterdam
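The "simple algorithm" the RFC alludes to amounts to checking that every lead byte is followed by the right number of continuation bytes of the form 10xxxxxx. Below is a minimal sketch in Emacs Lisp. It is not how Emacs itself detects coding systems; `my-utf-8-valid-p` is a hypothetical helper written for this illustration, and it assumes the later four-byte limit on sequence length rather than the longer forms RFC 2279 still allowed.

```elisp
(defun my-utf-8-valid-p (bytes)
  "Return t if the unibyte string BYTES looks like well-formed UTF-8.
Hypothetical helper for illustration: it checks lead bytes and
continuation bytes only, and does not reject every overlong form
or surrogate code point."
  (let ((i 0)
        (n (length bytes))
        (ok t))
    (while (and ok (< i n))
      (let* ((b (aref bytes i))
             ;; Number of continuation bytes announced by the lead byte;
             ;; nil when B cannot start a sequence at all.
             (follow (cond ((< b #x80) 0)
                           ((and (>= b #xC2) (<= b #xDF)) 1)
                           ((and (>= b #xE0) (<= b #xEF)) 2)
                           ((and (>= b #xF0) (<= b #xF4)) 3))))
        (setq i (1+ i))
        (if (null follow)
            (setq ok nil)
          ;; Consume the announced continuation bytes (10xxxxxx).
          (while (and ok (> follow 0))
            (if (and (< i n)
                     (= (logand (aref bytes i) #xC0) #x80))
                (setq i (1+ i)
                      follow (1- follow))
              (setq ok nil))))))
    ok))
```

With this definition, `(my-utf-8-valid-p "\xEC\x9B\x90")` returns t for the Korean bytes quoted earlier in the thread, while `(my-utf-8-valid-p "\xE9t\xE9")`, the Latin-1 bytes for "été", returns nil because 0xE9 announces two continuation bytes that never arrive. That asymmetry is the reason accented Latin-1 text is rarely mistaken for UTF-8.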