* bug#33796: 27.0.50; Use utf-8 is all our Elisp files @ 2018-12-18 18:46 Stefan Monnier 2018-12-18 19:22 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Stefan Monnier @ 2018-12-18 18:46 UTC (permalink / raw) To: 33796 [-- Attachment #1: Type: text/plain, Size: 996 bytes --] Package: Emacs Version: 27.0.50 Since Emacs-25, UTF-8 is the standard/default encoding for Elisp files. The attached patch changes the few non-utf-8 Elisp files to use utf-8. AFAICT, this patch is safe in the sense that the resulting .elc files are identical (except for titdic-cnv.elc obviously, since I not only changed the encoding but also the code, but I also checked that the change of encoding itself does not affect the resulting .elc file). In this patch, I made titdic-cnv.el use utf-8-emacs instead of utf-8 since it includes chars that can't be encoded with utf-8. I'm not sure why the same does not apply to the files it generates, but in my tests all the quail files it generates can use utf-8 (rather than utf-8-emacs) without affecting the generated .elc files (although the non-utf-8 chars of titdic-cnv.el seem to be inserted into some of the generated files according to my reading of the code). Any comments on the patch, or objection to installing it? Stefan [-- Attachment #2: 0001-Convert-remaining-non-utf-8-Elisp-files-to-utf-8.patch --] [-- Type: application/octet-stream, Size: 56269 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
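The safety check described in this message compares the compiled .elc files. A weaker but related check can be made on the source bytes alone: decode with the old encoding, re-encode as UTF-8, and confirm that the result converts back to the original bytes. The Python sketch below illustrates that idea only. It is not the procedure used for the patch, and the helper name `reencode_to_utf8` and the EUC-JP sample are assumptions chosen for illustration. A real conversion would also have to update any `coding:` cookie in the file, which this sketch ignores.

```python
def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Convert bytes in a legacy encoding to UTF-8, verifying the round trip.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    # Decoding raises UnicodeDecodeError if the bytes are not valid
    # in the claimed source encoding.
    text = data.decode(source_encoding)
    converted = text.encode("utf-8")

    # Round-trip check: going back to the legacy encoding must give
    # exactly the bytes we started with, otherwise something was lost
    # or normalized along the way.
    if converted.decode("utf-8").encode(source_encoding) != data:
        raise ValueError("conversion does not round-trip")
    return converted


if __name__ == "__main__":
    legacy = "日本語".encode("euc_jp")       # sample legacy-encoded input
    utf8 = reencode_to_utf8(legacy, "euc_jp")
    print(len(legacy), len(utf8))            # byte lengths differ, text is the same
```

Characters that only utf-8-emacs can represent, such as those mentioned for titdic-cnv.el, are outside what Python's codecs handle, so a check like this applies only to files that plain UTF-8 can encode.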
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-18 18:46 bug#33796: 27.0.50; Use utf-8 is all our Elisp files Stefan Monnier @ 2018-12-18 19:22 ` Eli Zaretskii 2018-12-18 19:46 ` Stefan Monnier 2018-12-19 17:54 ` Paul Eggert 2019-01-08 2:20 ` Stefan Monnier 2 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2018-12-18 19:22 UTC (permalink / raw) To: Stefan Monnier; +Cc: 33796 > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Tue, 18 Dec 2018 13:46:45 -0500 > > Since Emacs-25, UTF-8 is the standard/default encoding for Elisp files. > The attached patch changes the few non-utf-8 Elisp files to use utf-8. > > AFAICT, this patch is safe in the sense that the resulting .elc files > are identical (except for titdic-cnv.elc obviously, since I not only > changed the encoding but also the code, but I also checked that the > change of encoding itself does not affect the resulting .elc file). The .elc files are identical, but visiting the .el files will (or might) use different fonts, because the charset information is lost. (You will see that I jumped through some hoops to do something similar with etc/HELLO.) So I don't think we should make this change without considering whether the charset information is as important nowadays as it was back then. And I'm not really sure who to ask about this. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files
From: Stefan Monnier @ 2018-12-18 19:46 UTC
To: Eli Zaretskii; +Cc: 33796

> The .elc files are identical, but visiting the .el files will (or
> might) use different fonts, because the charset information is lost.
> (You will see that I jumped through some hoops to do something similar
> with etc/HELLO.)

That's indeed what I understand of the situation. But I don't think it's a good reason to keep supporting non-utf-8 encodings forever (many/most programming languages only support a single encoding, typically ASCII or utf-8 nowadays).

Part of the purpose of this bug-report is to try and come up with a plan ;-)

Hence, there are some questions:
- Do those people who edit those files really care about the difference? After all, IIUC utf-8 is becoming standard even in the CJK world so maybe the change is not that terrible (or at least, users have gotten used to lowering their expectations in this respect).
- If the change is indeed problematic, can we adjust it by using a file-global language tag?
- If that's not sufficient, can we use a scheme like that of etc/HELLO but keep the files directly usable as Elisp (so as to have our cake and eat it too)?

> So I don't think we should make this change without considering
> whether the charset information is as important nowadays as it was
> back then.

How 'bout installing the titdic-cnv.el part which changes the coding system used for the generated quail files (being auto-generated, their rendering as source files shouldn't matter nearly as much since no one should edit them)?

> And I'm not really sure who to ask about this.

I added Handa in the Cc, since I had forgotten to add him to the X-Debbugs-Cc.

Stefan
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files
From: Paul Eggert @ 2018-12-19 17:54 UTC
To: Stefan Monnier; +Cc: 33796

> I'm not really sure who to ask about this.

You can ask me (:-). Although I can't read east-Asian languages I do have significant experience with CJK text as my previous (15-year) job was in a company whose customers were almost all CJK and where CJK internationalization was essential and where I regularly dealt with weird encodings and displays. And this one is an easy call: for maintaining these particular files, UTF-8 is an improvement and this patch should go in.

To take just one example, titdic-cnv.el: people who are seriously maintaining it and who need to read the Chinese text will almost surely have their environment set up to display UTF-8 Chinese text well already. Furthermore, if you take a look at all the changes made to this file in the last decade, here are the statistics:

  edits  contributor
     15  Author: Paul Eggert <eggert@cs.ucla.edu>
     10  Author: Glenn Morris <rgm@gnu.org>
      2  Author: Stefan Monnier <monnier@iro.umontreal.ca>
      2  Author: Juanma Barranquero <lekktu@gmail.com>
      1  Author: Phillip Lord <phillip.lord@russet.org.uk>
      1  Author: Kenichi Handa <handa@m17n.org>
      1  Author: Andreas Schwab <schwab@linux-m68k.org>

Only one edit was made by a CJK user, and handa's edit involved only ASCII characters. Switching this file to UTF-8 would not have made any of our maintenance any more difficult in the last decade.

Conversely, I commonly use tools like 'git grep' to look for issues in the code, and these tools mishandle non-UTF-8 files and I see mojibake on my screen because of this. So it will be a significant win for me (and I suspect others) when we switch these files to UTF-8. To try to answer Stefan's questions: > - Do those people who edit those files really care about the difference? No, almost always: see above. > utf-8 is becoming standard even in the CJK world so > maybe the change is not that terrible (or at least, users have gotten > used to lowering their expectations in this respect). Yes, that’s happened. I looked for recent reports about this, and it appears that the controversy is mostly over. For example, <https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015) lamented the demise of Japanese Knoppix and said that Plamo Linux had problems with EUC-JP and suggested users switch to UTF-8. More recently <https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated this year) says that the choice of EUC-JP or UTF-8 is user-specific for Plamo Linux, and that applications like Firefox have problems with EUC-JP so discretion is advised if you choose EUC-JP. If even hardcore holdouts like Plamo are folding.... > - If the change is indeed problematic, can we adjust it by using > a file-global language tag? I hope that’s not necessary, but it’d be OK if we have to do it. > - If that's not sufficient, can we use a scheme like that > of etc/HELLO but to keep the files directly usable as Elisp (so as to > have our cake and eat it too)? etc/HELLO is pretty much a disaster for me now, as I can’t use any tool other than Emacs to look at it, and even Emacs screws up if I do something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend this disaster to other files. PS. One minor suggestion for your patch: please also update the list of files in admin/notes/unicode to remove mention of the files in question. PPS. 
How about also converting etc/tutorials/TUTORIAL.ja, lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el, lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el? ^ permalink raw reply [flat|nested] 36+ messages in thread
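The mojibake seen with byte-oriented tools such as 'git grep' comes down to whether a file's bytes form valid UTF-8. As a rough illustration, the Python sketch below flags byte strings that are not valid UTF-8; the function names `is_valid_utf8` and `find_non_utf8` are hypothetical, and the sketch is only one possible way to locate candidate files, not something used in this thread.

```python
from pathlib import Path
from typing import Iterable, List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(paths: Iterable[str]) -> List[str]:
    """Return the paths whose contents are not valid UTF-8.

    Hypothetical helper; it reads each file fully into memory.
    """
    return [p for p in paths if not is_valid_utf8(Path(p).read_bytes())]


if __name__ == "__main__":
    print(is_valid_utf8("ελληνικά".encode("utf-8")))   # UTF-8 text
    print(is_valid_utf8("日本語".encode("euc_jp")))     # legacy 8-bit encoding
```

One caveat: a 7-bit encoding such as ISO-2022-JP consists only of ASCII bytes and escape sequences, so it passes this check even though other tools will not show it as readable text. A validity test like this therefore finds 8-bit legacy encodings such as EUC-JP or Big5, but not ISO-2022 files.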
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-19 17:54 ` Paul Eggert @ 2018-12-19 18:11 ` Eli Zaretskii 2018-12-19 22:13 ` Paul Eggert 2018-12-19 21:16 ` bug#33796: 27.0.50; Use utf-8 is all our Elisp files Stefan Monnier 1 sibling, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2018-12-19 18:11 UTC (permalink / raw) To: Paul Eggert; +Cc: monnier, 33796 > Cc: 33796@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org> > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Wed, 19 Dec 2018 09:54:40 -0800 > > > I'm not really sure who to ask about this. > > You can ask me (:-). Although I can't read east-Asian languages I do > have significant experience with CJK text as my previous (15-year) job > was in a company whose customers were almost all CJK and where CJK > internationalization was essential and where I regularly dealt with > weird encodings and displays. And this one is an easy call: for > maintaining these particular files, UTF-8 is an improvement and this > patch should go in. Thanks. I could predict your answers in advance. I need to hear a second opinion, from someone who does read these languages, because the issue at hand is how the charset information affects the font(s) selected for displaying the text, and how important are the differences in those fonts to CJK users. > etc/HELLO is pretty much a disaster for me now, as I can’t use any tool > other than Emacs to look at it ??? It's a UTF-8 file with markup. Do you have the same problems with HTML and XML files? (I'm not saying that we should use the same technique for Lisp files, of course.) ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-19 18:11 ` Eli Zaretskii @ 2018-12-19 22:13 ` Paul Eggert 2018-12-20 16:06 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: Paul Eggert @ 2018-12-19 22:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, 33796 On 12/19/18 10:11 AM, Eli Zaretskii wrote: > I need to hear a second opinion, That would actually be a third opinion, as Stefan's opinion surely counts too and he has good reasons to prefer UTF-8 here. And to some extent opinions should be weighted for the kind of maintenance that is actually done with these files as opposed to the rare cases where the font's style might annoy a language-expert developer if the wrong language environment were used. >> etc/HELLO is pretty much a disaster for me now, as I can’t use any tool >> other than Emacs to look at it > > ??? It's a UTF-8 file with markup. Do you have the same problems with > HTML and XML files? No, because when I visit those files I see the same thing in my Emacs editing buffer that I see after using common keystrokes like 'C-x v =' or standard tools like "git diff", and it's easy to use Emacs to edit these files in the usual way without becoming expert in html-mode etc. In contrast, with etc/HELLO standard tools and common keystrokes give me gibberish, and one must gain expertise in enriched-mode to make nontrivial changes. A primary goal of Emacs is to have source code that the user can change easily, and using enriched-text mode in etc/HELLO works against this. It might be OK just for that one file (as a demonstration of enriched-text mode perhaps) but as things stand we shouldn't let these issues infect the rest of the Emacs sources. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files
From: Eli Zaretskii @ 2018-12-20 16:06 UTC
To: Paul Eggert, Kenichi Handa; +Cc: monnier, 33796

> Cc: monnier@iro.umontreal.ca, 33796@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 19 Dec 2018 14:13:59 -0800
>
> On 12/19/18 10:11 AM, Eli Zaretskii wrote:
> > I need to hear a second opinion,
>
> That would actually be a third opinion, as Stefan's opinion surely
> counts too and he has good reasons to prefer UTF-8 here.

Technically, it's the fourth, because my opinion should also count, right?

But this is beside the point, because we need the opinion of people who might be actually affected by the proposed change, and none of us qualify. All 3 of us simply don't care, because we don't read these scripts and don't distinguish the various fonts used to display the same Unicode codepoints under different cultural conventions. At some point in the past that distinction was very important. If nowadays it no longer is, then I see no problems making the change. Otherwise, the change will lose information important to some of our users.

We need someone to advise us what the actual state of affairs is. I hope Handa-san will (please don't drop him from the CC list). Or maybe someone here can propose other experts or even just users with relevant experience.

> And to some extent opinions should be weighted for the kind of
> maintenance that is actually done with these files as opposed to the
> rare cases where the font's style might annoy a language-expert
> developer if the wrong language environment were used.

This is also beside the point, because we have nothing to weigh this against for now. When we do, we will.

> >> etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
> >> other than Emacs to look at it
> >
> > ???
It's a UTF-8 file with markup. Do you have the same problems with > > HTML and XML files? > > No, because when I visit those files I see the same thing in my Emacs > editing buffer that I see after using common keystrokes like 'C-x v =' > or standard tools like "git diff", and it's easy to use Emacs to edit > these files in the usual way without becoming expert in html-mode etc. > In contrast, with etc/HELLO standard tools and common keystrokes give me > gibberish, and one must gain expertise in enriched-mode to make > nontrivial changes. This line of reasoning makes little sense to me: . Displaying HELLO doesn't show "gibberish", it shows UTF-8 encoded text with pure-ASCII markup. If your terminal can display these characters, you should see legible marked-up text, whereas the ISO-2022 encoded file of yore would display as illegible escape sequences. But since in your opinion the current situation is a "disaster", you seem to be saying that we should go back to ISO-2022? . By the above reasoning, if Emacs is enhanced to interpret HTML/XML and show typefaces instead of markup, you will see that as a regression and complain that raw HTML files are "gibberish"? . You have find-file-literally to show you HELLO exactly as any text-mode tool will see it, if you really need that. . No experience in Enriched mode is needed to edit HELLO, you just need to apply text properties (via facemenu.el commands or the menu-bar's Edit->Text Properties menu). And these properties are optional. > A primary goal of Emacs is to have source code that the user can change > easily, and using enriched-text mode in etc/HELLO works against this. It > might be OK just for that one file (as a demonstration of enriched-text > mode perhaps) but as things stand we shouldn't let these issues infect > the rest of the Emacs sources. etc/HELLO is not a demonstration of Enriched mode, it is a demonstration of facilities to edit and display many different scripts and character sets in the same buffer. 
We use Enriched mode there because we have no other feature which allows us to save 'charset' text property to a disk file. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-20 16:06 ` Eli Zaretskii @ 2018-12-20 21:49 ` Paul Eggert 2018-12-21 7:29 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: Paul Eggert @ 2018-12-20 21:49 UTC (permalink / raw) To: Eli Zaretskii, Kenichi Handa; +Cc: monnier, 33796 On 12/20/18 8:06 AM, Eli Zaretskii wrote: > my opinion should also count, right? Of course, although my impression was that you weren't expressing an opinion and were soliciting opinions. If your opinion is that we should not make the change, then of course that counts. > we need the opinion of people > who might be actually affected by the proposed change, I assume you mean that we need the opinion of people who would be affected _negatively_. Stefan and I would actually be affected _positively_ by the proposed change, for the reasons we stated. > All 3 of us simply don't care, No, actually I do care. Non-UTF-8 source files are a real annoyance for me, on a fairly regular basis. Stefan seems to care too, though I suspect he doesn't care as much as I do. > . Displaying HELLO doesn't show "gibberish", it shows UTF-8 encoded > text with pure-ASCII markup. You're right. My apologies: when I wrote "gibberish" I was looking at the output of "git diff emacs-26..master etc/HELLO", which does indeed display gibberish but that's not the current encoding's fault. > But since in your opinion the current situation is a > "disaster", you seem to be saying that we should go back to ISO-2022? Not at all, but I do think we should cut down on the unnecessary markup in that file. The markup should be used only when it helps. Text like "<x-charset><param>mule-unicode-0100-24ff</param> </x-charset>" is not helping anybody; the file should just contain " " there. Most of the markup in that file is not necessary for proper display, and just gets in the way when using tools other than Emacs. > . 
By the above reasoning, if Emacs is enhanced to interpret HTML/XML > and show typefaces instead of markup, you will see that as a > regression and complain that raw HTML files are "gibberish"? I hope Emacs doesn't do any such thing by default. I often use Emacs to edit .html and .xml files, and if it attempted to render these files by default I would be inconvenienced. Presumably there would be an option to keep the old behavior, and I'd use that option. > . You have find-file-literally to show you HELLO exactly as any > text-mode tool will see it No, because find-file-literally shows hard-to-read stuff like this: </x-charset><x-charset><param>greek-iso8859-7</param>Greek (\316\265\316\273\316\273\316\267\316\275\316\271\316\272\316\254) \316\223\316\265\316\271\316\254 \317\203\316\261\317\202 which differs from (and is even worse than) what an ordinary tool like git or cat shows: </x-charset><x-charset><param>greek-iso8859-7</param>Greek (ελληνικά) Γειά σας It would be better to remove this particular markup, so that git etc. would show this: Greek (ελληνικά) Γειά σας which is what Emacs ordinarily shows. > . No experience in Enriched mode is needed to edit HELLO, you just > need to apply text properties (via facemenu.el commands or the > menu-bar's Edit->Text Properties menu). And these properties are > optional. Let's leave most of them out then, as they're not working well in etc/HELLO. I don't use that menu, but I took your hint and just now tried it, by selecting the abovementioned word "ελληνικά" and menuing to Edit > Text Properties > Describe Properties, but all it said was 'Text content at position 1530: There are text properties here: unknown ("x-charset")'. This missed the point that the word's character set is greek-iso8859-7 which is a special hack that hints to Emacs (and nobody else, I guess? 
I couldn't find documentation for this stuff even in the Emacs manuals) that the text should be displayed with a Greek font instead of the same Greek font that Emacs would be using anyway. And I didn't see an easy way to see visually that this (unnecessary) <x-charset> hint is misplaced, since it should be placed so that it applies only to the Greek text and not to the surrounding English text in the same line.
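The octal escapes that find-file-literally shows in this message are simply the raw UTF-8 bytes of the Greek text. The Python sketch below, with the hypothetical helper name `decode_octal_escapes`, turns such backslash-octal escapes back into bytes and decodes them as UTF-8. It assumes the input is ASCII apart from three-digit octal escapes, as in the excerpt quoted in the message.

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Decode a string containing \\ooo octal byte escapes as UTF-8.

    Hypothetical helper: assumes every escape is exactly three octal
    digits and that the remaining characters are plain ASCII.
    """
    raw = re.sub(
        rb"\\([0-7]{3})",
        # Replace each escape with the single byte it denotes.
        lambda m: bytes([int(m.group(1), 8)]),
        text.encode("ascii"),
    )
    return raw.decode("utf-8")


if __name__ == "__main__":
    literal = (
        r"Greek (\316\265\316\273\316\273\316\267\316\275\316\271\316\272\316\254)"
        r" \316\223\316\265\316\271\316\254 \317\203\316\261\317\202"
    )
    print(decode_octal_escapes(literal))
```

For instance, `\316\265` is the byte pair 0xCE 0xB5, which is the UTF-8 encoding of ε (U+03B5), so the escaped and unescaped excerpts in the message describe the same bytes on disk.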
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-20 21:49 ` Paul Eggert @ 2018-12-21 7:29 ` Eli Zaretskii 2018-12-21 13:46 ` Stefan Monnier ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-21 7:29 UTC (permalink / raw) To: Paul Eggert; +Cc: monnier, 33796 > Cc: monnier@iro.umontreal.ca, 33796@debbugs.gnu.org > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Thu, 20 Dec 2018 13:49:44 -0800 > > On 12/20/18 8:06 AM, Eli Zaretskii wrote: > > > my opinion should also count, right? > > Of course, although my impression was that you weren't expressing an > opinion and were soliciting opinions. Same as Stefan, actually: he asked whether there were objections. > > we need the opinion of people > > who might be actually affected by the proposed change, > > I assume you mean that we need the opinion of people who would be > affected _negatively_. Not necessarily. I would actually like to hear opinions from people who read CJK scripts who think the distinction no longer matters, not these days. > > All 3 of us simply don't care, > > No, actually I do care. Non-UTF-8 source files are a real annoyance for > me This is a misunderstanding: by "don't care" I meant we don't care which font is used to display a particular Unicode codepoint in the Han area. > I do think we should cut down on the unnecessary markup > in that file. Agreed. > The markup should be used only when it helps. Text like > "<x-charset><param>mule-unicode-0100-24ff</param> </x-charset>" is not > helping anybody; the file should just contain " " there. There are only 2 such occurrences, so this isn't a grave problem. I will take a look when I have time. > Most of the markup in that file is not necessary for proper display, > and just gets in the way when using tools other than Emacs. Which markup is not necessary for display, in your opinion? I'm surprised to hear that "most of it" is unnecessary, but maybe I'm missing something. > > . 
By the above reasoning, if Emacs is enhanced to interpret HTML/XML > > and show typefaces instead of markup, you will see that as a > > regression and complain that raw HTML files are "gibberish"? > > I hope Emacs doesn't do any such thing by default. Really? Quite a few Emacs users think that it should, and that the fact it doesn't is one of the significant deficiencies in Emacs, as compared to other popular editors. > </x-charset><x-charset><param>greek-iso8859-7</param>Greek (ελληνικά) > Γειά σας > > It would be better to remove this particular markup, so that git etc. > would show this: > > Greek (ελληνικά) Γειά σας > > which is what Emacs ordinarily shows. That markup is precisely what keeps the charset properties on the corresponding greetings. Removing it would be losing information that HELLO is trying to preserve. > I don't use that menu, but I took your hint and just now > tried it, by selecting the abovementioned word "ελληνικά" and menuing to > Edit > Text Properties > Describe Properties, but all it said was 'Text > content at position 1530: There are text properties here: unknown > ("x-charset")'. This missed the point that the word's character set is > greek-iso8859-7 I cannot reproduce this. That menu item invokes the command describe-text-properties, which pops up the *Help* buffer, and the text there says: Text content at position 1530: There are text properties here: charset greek-iso8859-7 I wonder why you don't see that. Is it possible that you are looking at a file/buffer that was modified from its original contents? > which is a special hack that hints to Emacs (and nobody else, I > guess? I couldn't find documentation for this stuff even in the > Emacs manuals) that the text should be displayed with a Greek font > instead of the same Greek font that Emacs would be using anyway. The charset property allows us to have a fontset that directs Emacs to use specific fonts for specific character ranges. See set-fontset-font. 
I do agree that these issues are notoriously under-documented. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files
From: Stefan Monnier @ 2018-12-21 13:46 UTC
To: Eli Zaretskii; +Cc: Paul Eggert, 33796

> Not necessarily. I would actually like to hear opinions from people
> who read CJK scripts who think the distinction no longer matters, not
> these days.

BTW, while looking closer, I'm inclined to think that maybe their opinion doesn't matter that much: while the general issue of font choice for CJK text in Elisp files might really affect some users, in the specific case of the files affected by this patch I believe this likely isn't the case, because while there are affected *chars*, there is no affected *text*.

More specifically, AFAICT the affected chars are all part of the code and they represent themselves rather than being used as a carrier for a specific meaning in a text (because all this code is about how to insert specific chars).

[ Snipped the rest about etc/HELLO. ]

Stefan "I asked Chong what he thought about it but he said that he's not using CJK enough to be a good source of opinion"
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-21 13:46 ` Stefan Monnier @ 2018-12-21 15:54 ` Eli Zaretskii 0 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-21 15:54 UTC (permalink / raw) To: Stefan Monnier; +Cc: eggert, 33796 > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: Paul Eggert <eggert@cs.ucla.edu>, handa@gnu.org, 33796@debbugs.gnu.org > Date: Fri, 21 Dec 2018 08:46:11 -0500 > > BTW, while looking closer, I'm inclined to think that maybe their > opinion doesn't matter that much: while the general issue of font choice > for CJK text in Elisp files might really affect some users, in the > specific case of the files affected by this patch I believe this likely > isn't the case, because while there are affected *chars*, there is no > affected *text*. Maybe. But I wouldn't jump to conclusions: it could be that the aversion is (or was) to how the glyphs look, regardless of whether they are part of meaningful text. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-21 7:29 ` Eli Zaretskii 2018-12-21 13:46 ` Stefan Monnier @ 2018-12-21 13:55 ` Eli Zaretskii 2018-12-21 21:07 ` Paul Eggert 2 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-21 13:55 UTC (permalink / raw) To: eggert; +Cc: monnier, 33796 > Date: Fri, 21 Dec 2018 09:29:36 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: monnier@iro.umontreal.ca, 33796@debbugs.gnu.org > > > I don't use that menu, but I took your hint and just now > > tried it, by selecting the abovementioned word "ελληνικά" and menuing to > > Edit > Text Properties > Describe Properties, but all it said was 'Text > > content at position 1530: There are text properties here: unknown > > ("x-charset")'. This missed the point that the word's character set is > > greek-iso8859-7 > > I cannot reproduce this. That menu item invokes the command > describe-text-properties, which pops up the *Help* buffer, and the > text there says: > > Text content at position 1530: > > > There are text properties here: > charset greek-iso8859-7 > > I wonder why you don't see that. I think I know the answer to that: you use Emacs 26 or older to look at the file. Only Emacs 27 supports the x-charset property in Enriched mode. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: 27.0.50; Use utf-8 is all our Elisp files 2018-12-21 7:29 ` Eli Zaretskii 2018-12-21 13:46 ` Stefan Monnier 2018-12-21 13:55 ` Eli Zaretskii @ 2018-12-21 21:07 ` Paul Eggert 2018-12-22 1:19 ` Eric Lindblad 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii 2 siblings, 2 replies; 36+ messages in thread From: Paul Eggert @ 2018-12-21 21:07 UTC (permalink / raw) To: Eli Zaretskii; +Cc: handa, monnier, Emacs Development [removing 33796@debbugs.gnu.org and adding emacs-devel@gnu.org to cc list] Eli Zaretskii wrote: > Which markup is not necessary for display, in your opinion? At most all that's useful is markup that distinguishes Chinese and Japanese variants of Han characters; this might also include hanja (Korean) and Chữ Nôm (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup might be useful because a significant set of east Asian users dislike Unicode's Han unification and prefer specific variants of Han characters. I'm not aware of any other set of users who dislike unification in that way. > That markup is precisely what keeps the charset properties on the > corresponding greetings. Removing it would be losing information that > HELLO is trying to preserve. Although the etc/HELLO markup might be of interest to those who care about annotating languages in the text, it's irrelevant to the ordinary purpose of that file, which is to show textual translations of "Hello", as examples, to an audience that doesn't know all those languages, but who can easily see the language names in the English (or native-language) parts of the text without involving any of the markup. It's a bit like reading a translation of (say) "War and Peace". Most people just want to read the translated text. A small fraction might want to know which part of the original was written in Russian, which in French, which in English, etc. Markup can help that small fraction, but just gets in the way of the primary use. 
> Is it possible that you are looking > at a file/buffer that was modified from its original contents? No, I was using Emacs 26 by mistake. Sorry about the noise. It's still not a good user interface, though, as it is difficult to see the markup's effect when visiting etc/HELLO in the usual way, and this makes it hard to see mistakes in the markup. etc/HELLO is littered with so much useless markup, and the effect of markup errors is so subtle, and it's so much of a pain to edit the markup in its ordinary form of display, that the file is not a good showroom for how to maintain multilingual text. It's not a good sign that there seem to be errors in the possibly-useful (i.e., CJ) markup that nobody has noticed since the markup was introduced in May, and that I noticed these errors now only because I was visiting the file literally. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: 27.0.50; Use utf-8 is all our Elisp files 2018-12-21 21:07 ` Paul Eggert @ 2018-12-22 1:19 ` Eric Lindblad 2018-12-22 7:56 ` etc/HELLO markup etc. (Was: 27.0.50; Use utf-8 is all our Elisp files) Eli Zaretskii 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii 1 sibling, 1 reply; 36+ messages in thread From: Eric Lindblad @ 2018-12-22 1:19 UTC (permalink / raw) To: Emacs-devel [-- Attachment #1: Type: text/html, Size: 450 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. (Was: 27.0.50; Use utf-8 is all our Elisp files) 2018-12-22 1:19 ` Eric Lindblad @ 2018-12-22 7:56 ` Eli Zaretskii 0 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-22 7:56 UTC (permalink / raw) To: Eric Lindblad; +Cc: Emacs-devel > From: "Eric Lindblad" <lindblad@gmx.com> > Date: Sat, 22 Dec 2018 02:19:47 +0100 > Sensitivity: Normal > > Would there be any sympathy to adding a link to this webpage in the etc/HELLO file? > > See also: UTF-8 SAMPLER > http://kermitproject.org/utf8.html Thanks, I looked at that file when I added a few scripts to HELLO. The goals of that file are different from what we try doing in HELLO. Our goal is to show the different scripts, not different languages or fonts. For that reason, many languages are absent from HELLO if they use the same scripts which are already present in the file (for other languages). IOW, the different languages in HELLO are just the means to a certain end: we need a language using a script to say "hello" for that script. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-21 21:07 ` Paul Eggert 2018-12-22 1:19 ` Eric Lindblad @ 2018-12-22 8:12 ` Eli Zaretskii 2018-12-22 19:41 ` Paul Eggert ` (3 more replies) 1 sibling, 4 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-22 8:12 UTC (permalink / raw) To: Paul Eggert; +Cc: handa, monnier, Emacs-devel > Cc: handa@gnu.org, monnier@iro.umontreal.ca, > Emacs Development <Emacs-devel@gnu.org> > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Fri, 21 Dec 2018 13:07:09 -0800 > > [removing 33796@debbugs.gnu.org and adding emacs-devel@gnu.org to cc list] I've changed the Subject, as the original one was too similar to the bug report. > Eli Zaretskii wrote: > > Which markup is not necessary for display, in your opinion? > > At most all that's useful is markup that distinguishes Chinese and Japanese > variants of Han characters; this might also include hanja (Korean) and Chữ Nôm > (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup > might be useful because a significant set of east Asian users dislike Unicode's > Han unification and prefer specific variants of Han characters. I'm not aware of > any other set of users who dislike unification in that way. I'm not yet sure this is only about Han unification. Using charsets for specifying fonts is a general feature in Emacs, which can be used to control which fonts are selected independently of what the OS facilities such as fontconfig do. I hope Handa-san will be able to comment on this stuff. If Han unification is the only important user of the charset property, then yes, we could remove the rest of the charset info from HELLO. But please realize that the current HELLO just keeps the information that was there before recoding it in UTF-8, nothing was added. It is just kept in a different form, which makes the charset info human-readable, where previously it was encoded in the ISO 2022 sequences. 
> > That markup is precisely what keeps the charset properties on the > > corresponding greetings. Removing it would be losing information that > > HELLO is trying to preserve. > > Although the etc/HELLO markup might be of interest to those who care about > annotating languages in the text, it's irrelevant to the ordinary purpose of > that file, which is to show textual translations of "Hello" That's not the original purpose of that file. The purpose is to show scripts, not languages, and to show how we display different scripts in the same buffer. > It's still not a good user interface, though, as it is difficult to see the > markup's effect when visiting etc/HELLO in the usual way If the usual way is via find-file and its ilk, then you should see the same results as with "C-h h", so I'm not sure I understand what you mean here. > etc/HELLO is littered with so much useless markup I disagree that it's useless. Most of it is useful. > the effect of markup errors is so subtle, and it's so much of a pain > to edit the markup in its ordinary form of display If you mean manually editing the markup, then you aren't supposed to do that. In what way most of what you say is not applicable to etc/enriched.txt in general? If you just dislike what Enriched mode produces on disk, then let's stop this argument, as you seem to be arguing against files with markup in general, and that's a non-starter for me. > the file is not a good showroom for how to maintain multilingual > text. What other facilities are you aware of or can suggest for showing multilingual text with such level of detail and precision? > It's not a good sign that there seem to be errors in the > possibly-useful (i.e., CJ) markup that nobody has noticed since the > markup was introduced in May, and that I noticed these errors now > only because I was visiting the file literally. Which errors? I don't think we discovered any errors. 
We may have discovered some markup on whitespace where we perhaps could do without it (I'm not yet sure of that), but that's all, and is not necessarily an error. ^ permalink raw reply [flat|nested] 36+ messages in thread
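[Editorial aside] A minimal Python sketch of why Han unification keeps coming up in this subthread. It assumes only Python's standard `euc_jp` and `gb2312` codecs; the sample character 直 and the helper names are chosen for illustration and do not come from etc/HELLO. Legacy Japanese and Chinese encodings store the character as different bytes, so the byte stream itself identified the charset. Once decoded to Unicode, both yield the same code point, and only out-of-band information (such as the charset property discussed above) can still tell a renderer which regional glyph to prefer.

```python
import unicodedata

SAMPLE = "直"  # illustrative character picked for this sketch, not taken from etc/HELLO


def legacy_bytes(ch):
    """Return the bytes for `ch` in one Japanese and one Chinese legacy encoding."""
    return {"euc_jp": ch.encode("euc_jp"), "gb2312": ch.encode("gb2312")}


def unified_code_points(ch):
    """Decode each legacy byte string again and collect the resulting code points."""
    return {enc: ord(raw.decode(enc)) for enc, raw in legacy_bytes(ch).items()}


if __name__ == "__main__":
    for enc, raw in legacy_bytes(SAMPLE).items():
        print(enc, raw.hex())          # different bytes per legacy charset
    print({enc: hex(cp) for enc, cp in unified_code_points(SAMPLE).items()})
    print(unicodedata.name(SAMPLE))    # a single "CJK UNIFIED IDEOGRAPH" name
```

The sketch says nothing about how Emacs picks fonts; it only shows that the distinction is gone at the code-point level, which is why any remaining distinction has to live in markup or text properties.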
* Re: etc/HELLO markup etc. 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii @ 2018-12-22 19:41 ` Paul Eggert 2018-12-22 20:42 ` Eli Zaretskii 2018-12-23 7:47 ` Yuri Khan ` (2 subsequent siblings) 3 siblings, 1 reply; 36+ messages in thread From: Paul Eggert @ 2018-12-22 19:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: handa, monnier, Emacs-devel Eli Zaretskii wrote: > If Han unification is the only important user of the charset property, > then yes, we could remove the rest of the charset info from HELLO. Yes, that's the case. > the current HELLO just keeps the information > that was there before recoding it in UTF-8, nothing was added. Sure, but the non-Han markup is merely a relic of that file's old method of encoding, which avoided Unicode and instead used ISO 2022 escape sequences to switch among various 8- and 16-bit encodings, as that was the only way to show text in (say) Russian under the constraints of the old method. The non-Han markup is completely unnecessary now that the file uses UTF-8. (The Han markup probably isn't needed either, though I also would like Handa's opinion on that.) >> Although the etc/HELLO markup might be of interest to those who care about >> annotating languages in the text, it's irrelevant to the ordinary purpose of >> that file, which is to show textual translations of "Hello" > > That's not the original purpose of that file. The purpose is to show > scripts, not languages, and to show how we display different scripts > in the same buffer. OK, but either way the non-Han markup is irrelevant to the ordinary purpose of the file. >> It's still not a good user interface, though, as it is difficult to see the >> markup's effect when visiting etc/HELLO in the usual way > > If the usual way is via find-file and its ilk, then you should see the > same results as with "C-h h", so I'm not sure I understand what you > mean here. 
I meant that one cannot see the markup's effect when visiting the file with either C-h h or find-file in the usual way. It's useless markup. > In what way most of what you say is not applicable to etc/enriched.txt > in general? Other forms of enriched-text markup are typically easily visible. If I visit etc/enriched.txt I can easily see which parts are marked white on blue background, which parts are marked italic, etc. Invisible enriched-text markup is much harder to deal with when editing an enriched-text file. >> the file is not a good showroom for how to maintain multilingual >> text. > > What other facilities are you aware of or can suggest for showing > multilingual text with such level of detail and precision? In practice the most common and often the best way to deal with the situation is to do what the non-markup part of etc/HELLO is already doing: indicate within the text itself what language or script is being used, to help the reader who may be unacquainted with them, and with enough punctuation within the text so that the reader can easily see what's going on. This technique has been used for centuries, it's by far the most popular technique in common practice today, and it suffices for this particular application (with the possible exception of its Chinese and Japanese text). >> It's not a good sign that there seem to be errors in the >> possibly-useful (i.e., CJ) markup that nobody has noticed since the >> markup was introduced in May, and that I noticed these errors now >> only because I was visiting the file literally. > > Which errors? I don't think we discovered any errors. Yes, and that's the point! The approach we're taking is not good for dealing with the situation. One example of such an error is that "日本語" has no charset properties even though it's obviously intended to use a Japanese script (since it follows the word "Japanese"). I'm sure there are others. ^ permalink raw reply [flat|nested] 36+ messages in thread
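[Editorial aside] The point about ISO 2022 escape sequences can be illustrated with a short Python sketch. It assumes Python's standard `iso2022_jp` codec, which is not the same thing as the Emacs coding system the old HELLO file used; the helper names are hypothetical. In the ISO 2022 form, an escape sequence announces the charset in the byte stream itself, whereas the UTF-8 form of the same text carries no such marker.

```python
ESC = b"\x1b"  # ISO 2022 escape sequences begin with this byte


def encode_both(text):
    """Encode `text` as ISO-2022-JP and as UTF-8."""
    return text.encode("iso2022_jp"), text.encode("utf-8")


def has_charset_switch(raw):
    """Return True if the byte string contains an ISO 2022 escape sequence."""
    return ESC in raw


if __name__ == "__main__":
    iso, utf8 = encode_both("Japanese 日本語")
    print(iso)    # contains escape sequences that switch charsets
    print(utf8)   # plain UTF-8 bytes, no charset switch
    print(has_charset_switch(iso), has_charset_switch(utf8))
```

Both byte strings decode to the same text, which is the sense in which recoding to UTF-8 keeps the characters but drops the charset announcement unless it is preserved some other way.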
* Re: etc/HELLO markup etc. 2018-12-22 19:41 ` Paul Eggert @ 2018-12-22 20:42 ` Eli Zaretskii 0 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-22 20:42 UTC (permalink / raw) To: Paul Eggert; +Cc: handa, monnier, Emacs-devel > Cc: handa@gnu.org, monnier@iro.umontreal.ca, Emacs-devel@gnu.org > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 22 Dec 2018 11:41:05 -0800 > > Eli Zaretskii wrote: > > > If Han unification is the only important user of the charset property, > > then yes, we could remove the rest of the charset info from HELLO. > > Yes, that's the case. Says you. The issue at hand is precisely whether that is so, or just your opinion and tendency. > the non-Han markup is merely a relic of that file's old method of > encoding It could be both a relic and an important piece of information. > one cannot see the markup's effect when visiting the file with > either C-h h or find-file in the usual way. Of course, one can: via the fonts used to display the various scripts. > > In what way most of what you say is not applicable to etc/enriched.txt > > in general? > > Other forms of enriched-text markup are typically easily visible. Typically, but not exclusively. There's read-only property, there's the 'display' property, and to some extent even the "fixed" face. > > What other facilities are you aware of or can suggest for showing > > multilingual text with such level of detail and precision? > > In practice the most common and often the best way to deal with the situation is > to do what the non-markup part of etc/HELLO is already doing: indicate within > the text itself what language or script is being used, to help the reader who > may be unacquainted with them, and with enough punctuation within the text so > that the reader can easily see what's going on. That's useless for preserving text properties, so won't fit the bill. 
> One example of such an error is that "日本語" has no charset properties even > though it's obviously intended to use a Japanese script (since it follows the > word "Japanese"). Thanks, I fixed that. > I'm sure there are others. Please report them if you find them. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii 2018-12-22 19:41 ` Paul Eggert @ 2018-12-23 7:47 ` Yuri Khan 2018-12-23 15:42 ` Eli Zaretskii 2018-12-28 7:10 ` Eli Zaretskii 2018-12-29 7:23 ` handa 3 siblings, 1 reply; 36+ messages in thread From: Yuri Khan @ 2018-12-23 7:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: handa, Paul Eggert, Stefan Monnier, Emacs developers On Sat, Dec 22, 2018 at 3:13 PM Eli Zaretskii <eliz@gnu.org> wrote: > I'm not yet sure this is only about Han unification. Using charsets > for specifying fonts is a general feature in Emacs, which can be used > to control which fonts are selected independently of what the OS > facilities such as fontconfig do. There is at least one more situation where different glyphs could/should be selected for the same Unicode code points, which charset markup does not solve. I’m talking about italic shapes of Cyrillic letters. For some of them, Russian and Bulgarian use one shape but Serbian and Macedonian use another shape[1]. There are no examples of Bulgarian, Serbian, or Macedonian in HELLO, but Russian, Ukrainian and Mongolian examples are all marked up as “cyrillic-iso8859-5”, which is an encoding that does not carry language information. So: charset markup is not the right solution to the problem of rendering the same Unicode code point with different glyphs. [1]: https://en.wikipedia.org/wiki/Cyrillic_script#/media/File:Cyrillic_cursive.svg ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-23 7:47 ` Yuri Khan @ 2018-12-23 15:42 ` Eli Zaretskii 2018-12-23 15:53 ` Werner LEMBERG 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2018-12-23 15:42 UTC (permalink / raw) To: Yuri Khan; +Cc: handa, eggert, monnier, Emacs-devel > From: Yuri Khan <yurivkhan@gmail.com> > Date: Sun, 23 Dec 2018 14:47:39 +0700 > Cc: Paul Eggert <eggert@cs.ucla.edu>, handa@gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca>, Emacs developers <Emacs-devel@gnu.org> > > There is at least one more situation where different glyphs > could/should be selected for the same Unicode code points, which > charset markup does not solve. > > I’m talking about italic shapes of Cyrillic letters. For some of them, > Russian and Bulgarian use one shape but Serbian and Macedonian use > another shape[1]. There are no examples of Bulgarian, Serbian, or > Macedonian in HELLO, but Russian, Ukrainian and Mongolian examples are > all marked up as “cyrillic-iso8859-5”, which is an encoding that does > not carry language information. > > So: charset markup is not the right solution to the problem of > rendering the same Unicode code point with different glyphs. You mean, it's not a perfect solution, right? Because in the "good" department, it's "good enough" to solve at least part of the problem. No one says we need to reject a solution because it is only partial. I would also like to point out that, as far as the 'charset' property is considered, HELLO is just an example of what _can_ be done, it doesn't pretend to show _everything_ that you could do. E.g., if it's important to be able to display Ukrainian in a font different from that used for Russian, we could use the koi8-u charset for the Ukrainian greeting, and tweak our default fontset to use special fonts for that. We could even invent additional charsets (see define-charset) and then use them for some greetings. 
Of course, this machinery works best when a charset is unequivocally determined by the prevalent encoding used for text that uses that charset, and that isn't always the case. But still, the feature is there, and it can be extended if needed. Finally, regarding the special handling of italics in Serbian: is there _any_ application out there that solves this problem satisfactorily in multilingual environment? I'm not sure how you could go about that, since fonts generally cover scripts, and there's no special Serbian Cyrillic script, there's just Cyrl to cover them all. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-23 15:42 ` Eli Zaretskii @ 2018-12-23 15:53 ` Werner LEMBERG 2018-12-23 16:04 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: Werner LEMBERG @ 2018-12-23 15:53 UTC (permalink / raw) To: eliz; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa >> So: charset markup is not the right solution to the problem of >> rendering the same Unicode code point with different glyphs. > > Finally, regarding the special handling of italics in Serbian: is > there _any_ application out there that solves this problem > satisfactorily in multilingual environment? I'm not sure how you > could go about that, since fonts generally cover scripts, and > there's no special Serbian Cyrillic script, there's just Cyrl to > cover them all. OpenType fonts provide a language tag (in addition to a script tag) to handle this. XeTeX and luatex support language tags – I don't know whether there is an editor with such a capability. Werner ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-23 15:53 ` Werner LEMBERG @ 2018-12-23 16:04 ` Eli Zaretskii 2018-12-23 21:11 ` Werner LEMBERG 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2018-12-23 16:04 UTC (permalink / raw) To: Werner LEMBERG; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa > Date: Sun, 23 Dec 2018 16:53:14 +0100 (CET) > Cc: yurivkhan@gmail.com, handa@gnu.org, eggert@cs.ucla.edu, > monnier@iro.umontreal.ca, Emacs-devel@gnu.org > From: Werner LEMBERG <wl@gnu.org> > > > Finally, regarding the special handling of italics in Serbian: is > > there _any_ application out there that solves this problem > > satisfactorily in multilingual environment? I'm not sure how you > > could go about that, since fonts generally cover scripts, and > > there's no special Serbian Cyrillic script, there's just Cyrl to > > cover them all. > > OpenType fonts provide a language tag (in addition to a script tag) to > handle this. Yes, but aren't these tags used only to select fonts that have features required by the language's shaping requirements? That's what Emacs does with those. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-23 16:04 ` Eli Zaretskii @ 2018-12-23 21:11 ` Werner LEMBERG 0 siblings, 0 replies; 36+ messages in thread From: Werner LEMBERG @ 2018-12-23 21:11 UTC (permalink / raw) To: eliz; +Cc: yurivkhan, eggert, Emacs-devel, monnier, handa >> > Finally, regarding the special handling of italics in Serbian: is >> > there _any_ application out there that solves this problem >> > satisfactorily in multilingual environment? I'm not sure how you >> > could go about that, since fonts generally cover scripts, and >> > there's no special Serbian Cyrillic script, there's just Cyrl to >> > cover them all. >> >> OpenType fonts provide a language tag (in addition to a script tag) >> to handle this. > > Yes, but aren't these tags used only to select fonts that have > features required by the language's shaping requirements? That's > what Emacs does with those. Well, I could imagine the following use case: Within Emacs, you activate a Serbian language environment. This passes the script tag `Cyrl' and the language tag `SRB' to the current font (which must be reloaded). Within a document, the language tag must be explicitly passed to the text snippet in question (using some sort of markup or text properties); while it might be possible to algorithmically deduce a language tag for longer texts, this certainly doesn't work for just a few characters. Werner ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii 2018-12-22 19:41 ` Paul Eggert 2018-12-23 7:47 ` Yuri Khan @ 2018-12-28 7:10 ` Eli Zaretskii 2018-12-29 7:23 ` handa 3 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2018-12-28 7:10 UTC (permalink / raw) To: Kenichi Handa; +Cc: eggert, monnier, Emacs-devel Ping! Kenichi, could you please comment on this issue? TIA. > Date: Sat, 22 Dec 2018 10:12:37 +0200 > From: Eli Zaretskii <eliz@gnu.org> > Cc: handa@gnu.org, monnier@iro.umontreal.ca, Emacs-devel@gnu.org > > > Cc: handa@gnu.org, monnier@iro.umontreal.ca, > > Emacs Development <Emacs-devel@gnu.org> > > From: Paul Eggert <eggert@cs.ucla.edu> > > Date: Fri, 21 Dec 2018 13:07:09 -0800 > > > > [removing 33796@debbugs.gnu.org and adding emacs-devel@gnu.org to cc list] > > I've changed the Subject, as the original one was too similar to the > bug report. > > > Eli Zaretskii wrote: > > > Which markup is not necessary for display, in your opinion? > > > > At most all that's useful is markup that distinguishes Chinese and Japanese > > variants of Han characters; this might also include hanja (Korean) and Chữ Nôm > > (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup > > might be useful because a significant set of east Asian users dislike Unicode's > > Han unification and prefer specific variants of Han characters. I'm not aware of > > any other set of users who dislike unification in that way. > > I'm not yet sure this is only about Han unification. Using charsets > for specifying fonts is a general feature in Emacs, which can be used > to control which fonts are selected independently of what the OS > facilities such as fontconfig do. > > I hope Handa-san will be able to comment on this stuff. > > If Han unification is the only important user of the charset property, > then yes, we could remove the rest of the charset info from HELLO. 
> But please realize that the current HELLO just keeps the information > that was there before recoding it in UTF-8, nothing was added. It is > just kept in a different form, which makes the charset info > human-readable, where previously it was encoded in the ISO 2022 > sequences. > > > > That markup is precisely what keeps the charset properties on the > > > corresponding greetings. Removing it would be losing information that > > > HELLO is trying to preserve. > > > > Although the etc/HELLO markup might be of interest to those who care about > > annotating languages in the text, it's irrelevant to the ordinary purpose of > > that file, which is to show textual translations of "Hello" > > That's not the original purpose of that file. The purpose is to show > scripts, not languages, and to show how we display different scripts > in the same buffer. > > > It's still not a good user interface, though, as it is difficult to see the > > markup's effect when visiting etc/HELLO in the usual way > > If the usual way is via find-file and its ilk, then you should see the > same results as with "C-h h", so I'm not sure I understand what you > mean here. > > > etc/HELLO is littered with so much useless markup > > I disagree that it's useless. Most of it is useful. > > > the effect of markup errors is so subtle, and it's so much of a pain > > to edit the markup in its ordinary form of display > > If you mean manually editing the markup, then you aren't supposed to > do that. > > In what way most of what you say is not applicable to etc/enriched.txt > in general? If you just dislike what Enriched mode produces on disk, > then let's stop this argument, as you seem to be arguing against files > with markup in general, and that's a non-starter for me. > > > the file is not a good showroom for how to maintain multilingual > > text. > > What other facilities are you aware of or can suggest for showing > multilingual text with such level of detail and precision? 
> > > It's not a good sign that there seem to be errors in the > > possibly-useful (i.e., CJ) markup that nobody has noticed since the > > markup was introduced in May, and that I noticed these errors now > > only because I was visiting the file literally. > > Which errors? I don't think we discovered any errors. We may have > discovered some markup on whitespace where we perhaps could do without > it (I'm not yet sure of that), but that's all, and is not necessarily > an error. > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-22 8:12 ` etc/HELLO markup etc Eli Zaretskii ` (2 preceding siblings ...) 2018-12-28 7:10 ` Eli Zaretskii @ 2018-12-29 7:23 ` handa 2018-12-29 7:37 ` Eli Zaretskii 3 siblings, 1 reply; 36+ messages in thread From: handa @ 2018-12-29 7:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, monnier, Emacs-devel In article <838t0iasju.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Eli Zaretskii wrote: > > > Which markup is not necessary for display, in your opinion? > > > > At most all that's useful is markup that distinguishes Chinese and Japanese > > variants of Han characters; this might also include hanja (Korean) and Chữ Nôm > > (Vietnamese) variants if we ever added such characters to etc/HELLO. Such markup > > might be useful because a significant set of east Asian users dislike Unicode's > > Han unification and prefer specific variants of Han characters. I'm not aware of > > any other set of users who dislike unification in that way. > I'm not yet sure this is only about Han unification. Using charsets > for specifying fonts is a general feature in Emacs, which can be used > to control which fonts are selected independently of what the OS > facilities such as fontconfig do. > I hope Handa-san will be able to comment on this stuff. > If Han unification is the only important user of the charset property, > then yes, we could remove the rest of the charset info from HELLO. Long ago, the quality of fonts designed for a specific legacy charset was far better than that of so-called Unicode fonts, even for non-Han characters. So the charset information for non-Han charsets did have some meaning. But I don't know the current situation. Perhaps it would be good to remove that information and wait for complaints from users. --- K. Handa handa@gnu.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-29 7:23 ` handa @ 2018-12-29 7:37 ` Eli Zaretskii 2019-01-06 12:06 ` handa 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2018-12-29 7:37 UTC (permalink / raw) To: handa; +Cc: eggert, monnier, Emacs-devel > From: handa <handa@gnu.org> > Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, Emacs-devel@gnu.org > Date: Sat, 29 Dec 2018 16:23:24 +0900 > > > I'm not yet sure this is only about Han unification. Using charsets > > for specifying fonts is a general feature in Emacs, which can be used > > to control which fonts are selected independently of what the OS > > facilities such as fontconfig do. > > > I hope Handa-san will be able to comment on this stuff. > > > If Han unification is the only important user of the charset property, > > then yes, we could remove the rest of the charset info from HELLO. > > Long ago, the quality of fonts designed for a specific legacy charset > was far better than that of so-called Unicode fonts, even for non-Han characters. > So the charset information for non-Han charsets did have some meaning. > But I don't know the current situation. Perhaps it would be good to remove > that information and wait for complaints from users. Thanks. What about using the charset information in general for font selection? Do you think this is a valuable feature, or was it again designed only due to the issues you mention above with fonts designed for legacy charsets? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2018-12-29 7:37 ` Eli Zaretskii @ 2019-01-06 12:06 ` handa 2019-01-06 15:29 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: handa @ 2019-01-06 12:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, monnier, Emacs-devel In article <83lg486awy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > What about using the charset information in general for font > selection? Do you think this is a valuable feature, or was it again > designed only due to the issues you mention above with fonts designed > for legacy charsets? The latter. Since an OpenType font has shaping rules for script and/or language, script and language information is more useful than charset. --- K. Handa handa@gnu.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2019-01-06 12:06 ` handa @ 2019-01-06 15:29 ` Eli Zaretskii 2019-01-06 17:26 ` Stefan Monnier 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2019-01-06 15:29 UTC (permalink / raw) To: handa; +Cc: eggert, monnier, Emacs-devel > From: handa <handa@gnu.org> > Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, Emacs-devel@gnu.org > Date: Sun, 06 Jan 2019 21:06:22 +0900 > > In article <83lg486awy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > What about using the charset information in general for font > > selection? Do you think this is a valuable feature, or was it again > > designed only due to the issues you mention above with fonts designed > > for legacy charsets? > > The latter. Since an OpenType font has shaping rules for script and/or > language, script and language information is more useful than charset. Thanks. I guess we can remove most of the charset markup from HELLO, leaving only one or two instances as examples of the facility. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2019-01-06 15:29 ` Eli Zaretskii @ 2019-01-06 17:26 ` Stefan Monnier 2019-01-06 17:39 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: Stefan Monnier @ 2019-01-06 17:26 UTC (permalink / raw) To: Eli Zaretskii; +Cc: handa, eggert, Emacs-devel > Thanks. I guess we can remove most of charset markup from HELLO, > leaving only one or two as an example of the facility. And to get back to bug#33796: does that mean I can install a change to convert those Elisp files to utf-8? Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2019-01-06 17:26 ` Stefan Monnier @ 2019-01-06 17:39 ` Eli Zaretskii 2019-01-06 18:08 ` Stefan Monnier 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2019-01-06 17:39 UTC (permalink / raw) To: Stefan Monnier; +Cc: handa, eggert, Emacs-devel > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: handa <handa@gnu.org>, eggert@cs.ucla.edu, Emacs-devel@gnu.org > Date: Sun, 06 Jan 2019 12:26:39 -0500 > > > Thanks. I guess we can remove most of charset markup from HELLO, > > leaving only one or two as an example of the facility. > > And to get back to bug#33796: does that mean I can install a change to > convert those Elisp files to utf-8? Yes, I think so. Except that I'd prefer not to mix code changes and encoding changes. Can you do that in two separate patches? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. 2019-01-06 17:39 ` Eli Zaretskii @ 2019-01-06 18:08 ` Stefan Monnier 0 siblings, 0 replies; 36+ messages in thread From: Stefan Monnier @ 2019-01-06 18:08 UTC (permalink / raw) To: emacs-devel >> > Thanks. I guess we can remove most of charset markup from HELLO, >> > leaving only one or two as an example of the facility. >> >> And to get back to bug#33796: does that mean I can install a change to >> convert those Elisp files to utf-8? > > Yes, I think so. Except that I'd prefer not to mix code changes and > encoding changes. Can you do that in two separate patches? Yes, of course, Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-19 17:54 ` Paul Eggert 2018-12-19 18:11 ` Eli Zaretskii @ 2018-12-19 21:16 ` Stefan Monnier 1 sibling, 0 replies; 36+ messages in thread From: Stefan Monnier @ 2018-12-19 21:16 UTC (permalink / raw) To: Paul Eggert; +Cc: 33796 > PPS. How about also converting etc/tutorials/TUTORIAL.ja, > lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el, > lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el? I don't see how we'll ever get rid of support for iso-2022 encoding, so I'm not terribly concerned about converting files like TUTORIAL.ja. If you think it's a good idea, of course, I'm very much in favor of such a change, but I focused on .el files because I'm interested in standardizing Elisp files to utf-8 and get rid of load-with-code-conversion (a distant target, admittedly, but at least I can see a path that can get us there). I missed the above 4 Elisp files because my regexp fu was too weak. I'll update my patch, thanks, Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
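[Editorial aside] The safety argument for the conversion (same content before and after, only the encoding differs) can be sketched outside Emacs as well. The Python below is a hypothetical check, not the procedure used for the patch, which compared the resulting .elc files. It assumes the source encoding is one Python's standard codecs can decode (`euc_jp` in the demo), which is not true of every Emacs coding system, and it does not update any `coding:` cookie in the file.

```python
def convert_to_utf8(raw, source_encoding):
    """Transcode bytes from `source_encoding` to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def same_text(old_raw, old_encoding, new_raw):
    """Return True if the old bytes and the new UTF-8 bytes decode to identical text."""
    return old_raw.decode(old_encoding) == new_raw.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical file content standing in for a non-UTF-8 Elisp file.
    source = ';; こんにちは\n(message "hello")\n'
    old = source.encode("euc_jp")
    new = convert_to_utf8(old, "euc_jp")
    print(len(old), len(new))             # byte lengths differ
    print(same_text(old, "euc_jp", new))  # decoded text is identical
```

Comparing decoded text plays the role that comparing .elc files plays in the thread: the bytes on disk change, but what the reader (or byte compiler) sees should not.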
* bug#33796: 27.0.50; Use utf-8 is all our Elisp files 2018-12-18 18:46 bug#33796: 27.0.50; Use utf-8 is all our Elisp files Stefan Monnier 2018-12-18 19:22 ` Eli Zaretskii 2018-12-19 17:54 ` Paul Eggert @ 2019-01-08 2:20 ` Stefan Monnier 2 siblings, 0 replies; 36+ messages in thread From: Stefan Monnier @ 2019-01-08 2:20 UTC (permalink / raw) To: 33796-done Installed, Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc. @ 2018-12-29 5:32 Van L 2018-12-29 7:33 ` Eli Zaretskii 0 siblings, 1 reply; 36+ messages in thread From: Van L @ 2018-12-29 5:32 UTC (permalink / raw) To: Emacs developers >> Although the etc/HELLO markup might be of interest to those who care about >> annotating languages in the text, it's irrelevant to the ordinary purpose of >> that file, which is to show textual translations of "Hello" > That's not the original purpose of that file. The purpose is to show scripts, > not languages, and to show how we display different scripts in the same buffer. The descriptive text accompanying (view-hello-file) says the following, which should say "scripts" where it now says "languages". : Display the HELLO file, which lists many languages and characters. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: etc/HELLO markup etc.
  2018-12-29  5:32 etc/HELLO markup etc Van L
@ 2018-12-29  7:33 ` Eli Zaretskii
  2018-12-30  6:51   ` Van L
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2018-12-29 7:33 UTC
To: Van L; +Cc: emacs-devel

> From: Van L <van@scratch.space>
> Date: Sat, 29 Dec 2018 16:32:26 +1100
>
> >> Although the etc/HELLO markup might be of interest to those who care about
> >> annotating languages in the text, it's irrelevant to the ordinary purpose of
> >> that file, which is to show textual translations of "Hello"
>
> > That's not the original purpose of that file. The purpose is to show scripts,
> > not languages, and to show how we display different scripts in the same buffer.
>
> The descriptive text accompanying (view-hello-file) says the following,
> which needs to swap scripts for languages where that is.
>
>   : Display the HELLO file, which lists many languages and characters.

I'm not sure. This discussion has been very technical, and presumably
the participants are well aware of what a script is, in this context.
By contrast, a random reader of the doc string doesn't necessarily
know what a script is. Saying "many languages and characters" is
vaguely similar, while using only terminology most people understand.
* Re: etc/HELLO markup etc.
  2018-12-29  7:33 ` Eli Zaretskii
@ 2018-12-30  6:51   ` Van L
  0 siblings, 0 replies; 36+ messages in thread
From: Van L @ 2018-12-30 6:51 UTC
To: Emacs developers

>>> That's not the original purpose of that file. The purpose is to show scripts,
>>> not languages, and to show how we display different scripts in the same buffer.
>>
>> The descriptive text accompanying (view-hello-file) says the following,
>> which needs to swap scripts for languages where that is.
>>
>>   : Display the HELLO file, which lists many languages and characters.
>
> I'm not sure. This discussion has been very technical, and presumably

Yes. It is well and truly deep in the weed of it.

> the participants are well aware of what a script is, in this context.
> By contrast, a random reader of the doc string doesn't necessarily
> know what a script is. Saying "many languages and characters" is
> vaguely similar, while using only terminology most people understand.

A random reader may have in their consciousness the Rosetta Stone,
ESA’s Rosetta Mission which will be surpassed by NASA’s New Horizon at
Ultima Thule very very soon to bring in the New Year.

You have to think all the grade school students following the Rosetta
Mission were taught what is a language as distinct from a script and
characters at least among the EU nationals when the UK was in there
before Brexit.

How about the following? It is 74 columns wide, anyway.

  : Display HELLO file, a short sample of some languages, scripts, characters.