* bug#3607: 23.0.94; odd character in fringe.el
[not found] <mailman.888.1245349050.2239.bug-gnu-emacs@gnu.org>
@ 2009-06-18 18:41 ` Teemu Likonen
2009-06-18 18:54 ` Drew Adams
0 siblings, 1 reply; 15+ messages in thread
From: Teemu Likonen @ 2009-06-18 18:41 UTC (permalink / raw)
To: Drew Adams; +Cc: 3607
On 2009-06-18 11:00 (-0700), Drew Adams wrote:
> I don't claim this is a bug, but perhaps someone could take a look to
> be sure.
>
> In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
> character shows in my Emacs with face `escape-glyph'. This is what
> `C-u C-x =' shows:
Perhaps you know most of this already but here's some information
anyway. The name is "Pavel Janík" and it displays just fine in my
system. File fringe.el is UTF-8-encoded.
When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
0xC3 and 0xAD. When some system interprets those bytes as separate
ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
(0xAD). This is what your system seems to have done:
> preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
> code point: 0xAD
> name: SOFT HYPHEN
So it sounds like some kind of singlebyte-multibyte encoding problem.
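A rough illustration of the byte-level mix-up described above, written as a small Python sketch (this is only a model of the symptom, not anything Emacs itself runs; the variable names are made up for the example):

```python
import unicodedata

# The name as it appears in fringe.el.
name = "Pavel Janík"

# Encoded as UTF-8, the single character í (U+00ED) becomes two bytes.
i_acute_bytes = "í".encode("utf-8")
utf8_bytes = name.encode("utf-8")

# Misreading the same bytes as ISO-8859-1 turns each byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")

print(i_acute_bytes.hex())  # c3ad
for ch in misread[-3:-1]:
    # Show the code point and Unicode name of the two characters that replace í.
    print(hex(ord(ch)), unicodedata.name(ch))
```

The second of the two misread characters is the one reported as SOFT HYPHEN with code point 0xAD.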
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-18 18:41 ` bug#3607: 23.0.94; odd character in fringe.el Teemu Likonen
@ 2009-06-18 18:54 ` Drew Adams
2009-06-18 19:20 ` Teemu Likonen
0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2009-06-18 18:54 UTC (permalink / raw)
To: 'Teemu Likonen'; +Cc: 3607
> Perhaps you know most of this already but here's some information
> anyway. The name is "Pavel Janík" and it displays just fine in my
> system. File fringe.el is UTF-8-encoded.
>
> When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
> 0xC3 and 0xAD. When some system interprets those bytes as separate
> ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
> (0xAD). This is what your system seems to have done:
>
> > preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
> > code point: 0xAD
> > name: SOFT HYPHEN
>
> So it sounds like some kind of singlebyte-multibyte encoding problem.
I see. Thanks for the info. I'm pretty ignorant about this stuff.
I see this in emacs -Q (on MS Windows), however, so I wonder if it isn't a bug.
And I wonder how you can see it as being UTF-8 encoded - are you using emacs -Q?
I don't see any local-variable thingy that would specify that the file is to be
UTF-8 encoded.
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-18 18:54 ` Drew Adams
@ 2009-06-18 19:20 ` Teemu Likonen
2009-06-18 21:08 ` Drew Adams
0 siblings, 1 reply; 15+ messages in thread
From: Teemu Likonen @ 2009-06-18 19:20 UTC (permalink / raw)
To: Drew Adams; +Cc: 3607
On 2009-06-18 11:54 (-0700), Drew Adams wrote:
> I see this in emacs -Q (on MS Windows), however, so I wonder if it
> isn't a bug.
>
> And I wonder how you can see it as being UTF-8 encoded - are you using
> emacs -Q?
>
> I don't see any local-variable thingy that would specify that the file
> is to be UTF-8 encoded.
Yes, the file shows correctly with "emacs -Q lisp/fringe.el" too. And it
is a UTF-8 encoded file.
$ file lisp/fringe.el
lisp/fringe.el: UTF-8 Unicode English text
My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
do these days). Emacs probably detects my environment and uses correct
encoding settings. But I don't know how Emacs works - except everything just
works. :-)
It really seems that your default environment is something other than
UTF-8, something single-byte.
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-18 19:20 ` Teemu Likonen
@ 2009-06-18 21:08 ` Drew Adams
2009-06-18 21:34 ` Lennart Borgman
0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2009-06-18 21:08 UTC (permalink / raw)
To: 'Teemu Likonen'; +Cc: 3607
> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
> too. And it is a UTF-8 encoded file.
>
> $ file lisp/fringe.el
> lisp/fringe.el: UTF-8 Unicode English text
>
> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
> do these days). Emacs probably detects my environment and uses correct
> encoding settings. But I don't know how Emacs works - except
> everything just
> works. :-)
>
> It really seems that your default environment is something other than
> UTF-8, something single-byte.
OK, thanks for checking.
IMO, if the file should be encoded in UTF-8, then the file itself should control
that - as buff-menu.el does, for instance. The user's locale shouldn't enter
into it at this level. Seems like a bug, to me. (But I'm no expert on this.)
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-18 21:08 ` Drew Adams
@ 2009-06-18 21:34 ` Lennart Borgman
2009-06-19 0:47 ` Kenichi Handa
0 siblings, 1 reply; 15+ messages in thread
From: Lennart Borgman @ 2009-06-18 21:34 UTC (permalink / raw)
To: Drew Adams, 3607; +Cc: Teemu Likonen
On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams@oracle.com> wrote:
>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>> too. And it is a UTF-8 encoded file.
>>
>> $ file lisp/fringe.el
>> lisp/fringe.el: UTF-8 Unicode English text
>>
>> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
>> do these days). Emacs probably detects my environment and uses correct
>> encoding settings. But I don't know how Emacs works - except
>> everything just
>> works. :-)
>>
>> It really seems that your default environment is something other than
>> UTF-8, something single-byte.
>
> OK, thanks for checking.
>
> IMO, if the file should be encoded in UTF-8, then the file itself should control
> that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> into it at this level. Seems like a bug, to me. (But I'm no expert on this.)
Yes, that must be a bug.
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-18 21:34 ` Lennart Borgman
@ 2009-06-19 0:47 ` Kenichi Handa
2009-06-27 1:07 ` Stefan Monnier
0 siblings, 1 reply; 15+ messages in thread
From: Kenichi Handa @ 2009-06-19 0:47 UTC (permalink / raw)
To: Lennart Borgman, 3607; +Cc: tlikonen, 3607
I've just added "coding: utf-8" cookie to fringe.el.
---
Kenichi Handa
handa@m17n.org
In article <e01d8a50906181434t6e6c296ega704dc11fbb77b31@mail.gmail.com>, Lennart Borgman <lennart.borgman@gmail.com> writes:
> On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams@oracle.com> wrote:
>>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>>> too. And it is a UTF-8 encoded file.
>>>
>>> $ file lisp/fringe.el
>>> lisp/fringe.el: UTF-8 Unicode English text
>>>
>>> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
>>> do these days). Emacs probably detects my environment and uses correct
>>> encoding settings. But I don't know how Emacs works - except
>>> everything just
>>> works. :-)
>>>
>>> It really seems that your default environment is something other than
>>> UTF-8, something single-byte.
> >
> > OK, thanks for checking.
> >
> > IMO, if the file should be encoded in UTF-8, then the file itself should control
> > that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> > into it at this level. Seems like a bug, to me. (But I'm no expert on this.)
> Yes, that must be a bug.
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-19 0:47 ` Kenichi Handa
@ 2009-06-27 1:07 ` Stefan Monnier
2009-06-27 1:25 ` Kenichi Handa
0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-27 1:07 UTC (permalink / raw)
To: Kenichi Handa; +Cc: tlikonen, 3607
> I've just added "coding: utf-8" cookie to fringe.el.
Thanks. But there is another bug here: a utf-8 file should be
recognized as such even in a latin-1 locale (i.e. utf-8 should always
have (one of) the highest priority). IIUC this is done right in
GNU/Linux but not under Windows.
Stefan
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-27 1:07 ` Stefan Monnier
@ 2009-06-27 1:25 ` Kenichi Handa
2009-06-27 21:44 ` Stefan Monnier
0 siblings, 1 reply; 15+ messages in thread
From: Kenichi Handa @ 2009-06-27 1:25 UTC (permalink / raw)
To: Stefan Monnier; +Cc: tlikonen, 3607
In article <jwveit6iizt.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > I've just added "coding: utf-8" cookie to fringe.el.
> Thanks. But there is another bug here: a utf-8 file should be
> recognized as such even in a latin-1 locale (i.e. utf-8 should always
> have (one of) the highest priority).
Current Emacs doesn't give utf-8 a higher priority than
iso-8859-1 in Latin-X language environments. Are you
proposing such a change? I can't decide whether that is good or not
because I'm not that familiar with such locales.
> IIUC this is done right in GNU/Linux but not under
> Windows.
?? Even on GNU/Linux (ubuntu), when I start emacs as this:
% LANG=de_DE emacs
iso-8859-1 has higher priority than utf-8.
Or, do you mean the other applications on GNU/Linux?
---
Kenichi Handa
handa@m17n.org
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-27 1:25 ` Kenichi Handa
@ 2009-06-27 21:44 ` Stefan Monnier
2009-06-29 7:49 ` Kenichi Handa
0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-27 21:44 UTC (permalink / raw)
To: Kenichi Handa; +Cc: tlikonen, 3607
> Current Emacs doesn't give utf-8 a higher priority than
> iso-8859-1 in Latin-X language environments. Are you
> proposing such a change? I can't decide whether that is good or not
> because I'm not that familiar with such locales.
It is a good change, because the likelihood of a valid utf-8 file being
a proper latin-1 file is extremely low.
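To make the effect of the priority order concrete, here is a deliberately simplified Python model. It assumes detection just means "the first codec in the priority list that decodes the bytes without error wins"; the `detect` helper and the two priority lists are hypothetical, and Emacs's real detection is more elaborate than this.

```python
def detect(data, priority):
    """Return the first codec in `priority` that decodes `data` without error."""
    for codec in priority:
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return None

utf8_file = "Pavel Janík".encode("utf-8")
latin1_file = "Pavel Janík".encode("iso-8859-1")

# Latin-1 first: every byte string is decodable as Latin-1, so UTF-8 is never chosen.
latin1_first = ["iso-8859-1", "utf-8"]
# UTF-8 first: a Latin-1 file fails strict UTF-8 decoding and falls through.
utf8_first = ["utf-8", "iso-8859-1"]

for label, data in (("utf-8 file", utf8_file), ("latin-1 file", latin1_file)):
    print(label, detect(data, latin1_first), detect(data, utf8_first))
```

In this model, putting utf-8 first classifies both sample files correctly, while putting iso-8859-1 first misreads the UTF-8 file.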
> ?? Even on GNU/Linux (ubuntu), when I start emacs as this:
> % LANG=de_DE emacs
> iso-8859-1 has higher priority than utf-8.
Duh, you're right. Even Emacs-22 still does that. It's wrong.
> Or, do you mean the other applications on GNU/Linux?
No, I meant Emacs, I was convinced we'd fixed it in Emacs-22 for
GNU/Linux, but it appears I was confused.
Stefan
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-27 21:44 ` Stefan Monnier
@ 2009-06-29 7:49 ` Kenichi Handa
2009-06-29 8:52 ` Stefan Monnier
2009-06-29 18:16 ` Eli Zaretskii
0 siblings, 2 replies; 15+ messages in thread
From: Kenichi Handa @ 2009-06-29 7:49 UTC (permalink / raw)
To: Stefan Monnier; +Cc: tlikonen, 3607
In article <jwvbpo9gxsu.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > Current Emacs doesn't give utf-8 a higher priority than
> > iso-8859-1 in Latin-X language environments. Are you
> > proposing such a change? I can't decide whether that is good or not
> > because I'm not that familiar with such locales.
> > It is a good change, because the likelihood of a valid utf-8 file being
> a proper latin-1 file is extremely low.
Ok. For that, we must do:
(set-coding-system-priority 'utf-8)
somewhere. I at first thought it could be done by
`setup-function' of Latin-1 language environment. Actually,
when a user does C-x C-m L Latin-1 RET, it works.
But, when emacs starts up, it calls set-locale-environment,
and it at first calls set-language-environment then
overrides coding-system setups. So, at the moment, I don't
have a good idea other than this very ad-hoc change for 23.1.
--- mule-cmds.el.~1.360.~ 2009-04-09 03:03:17.000000000 +0900
+++ mule-cmds.el 2009-06-29 16:45:08.000000000 +0900
@@ -2643,6 +2643,10 @@
(not (coding-system-equal coding-system
locale-coding-system)))
(prefer-coding-system coding-system)
+ ;; Even if we prefer "iso-latin-1", it is better to detect
+ ;; UTF-8.
+ (if (eq (coding-system-base coding-system) 'iso-latin-1)
+ (set-coding-system-priority 'utf-8))
;; Fixme: perhaps prefer-coding-system should set this too.
;; But it's not the time to do such a fundamental change.
(setq default-sendmail-coding-system coding-system)
For 23.2, I think we should re-design language-info-alist.
---
Kenichi Handa
handa@m17n.org
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-29 7:49 ` Kenichi Handa
@ 2009-06-29 8:52 ` Stefan Monnier
2009-06-29 11:39 ` Kenichi Handa
2009-06-29 18:16 ` Eli Zaretskii
1 sibling, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-29 8:52 UTC (permalink / raw)
To: Kenichi Handa; +Cc: tlikonen, 3607
>> It is a good change, because the likelihood of a valid utf-8 file being
>> a proper latin-1 file is extremely low.
> Ok. For that, we must do:
> (set-coding-system-priority 'utf-8)
> somewhere. I at first thought it could be done by
> `setup-function' of Latin-1 language environment. Actually,
> when a user does C-x C-m L Latin-1 RET, it works.
> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups. So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.
Actually, AFAIK the "unlikely false positives" property of the utf-8
encoding is not only true when applied to latin-1 files but also to most
other encodings. So really utf-8 should probably always be first (not
only for latin-1 environments), except maybe for some envs where there's
a known non-negligible risk of false positives.
Stefan
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-29 8:52 ` Stefan Monnier
@ 2009-06-29 11:39 ` Kenichi Handa
0 siblings, 0 replies; 15+ messages in thread
From: Kenichi Handa @ 2009-06-29 11:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: tlikonen, 3607
In article <jwv63ef4e6w.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> Actually, AFAIK the "unlikely false positives" property of the utf-8
> encoding is not only true when applied to latin-1 files but also to most
> other encodings. So really utf-8 should probably always be first (not
> only for latin-1 environments), except maybe for some envs where there's
> a known non-negligible risk of false positives.
I think it's only Latin-X (and perhaps Vietnamese too) that
are mostly safe to give utf-8 a higher priority in code
detection, because only they use Latin script, in which 8-bit
characters rarely appear consecutively.
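The structural reason behind this can be sketched in Python: every non-ASCII character in UTF-8 occupies at least two consecutive bytes >= 0x80, so text whose 8-bit bytes occur in isolation cannot be valid UTF-8. The helper names below are invented for the illustration, and the sample string is only an assumption about what typical Latin-1 text looks like.

```python
def high_byte_runs(data):
    """Return the length of each run of consecutive bytes >= 0x80."""
    runs, current = [], 0
    for b in data:
        if b >= 0x80:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def is_valid_utf8(data):
    """Report whether `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = "Pavel Janík, café"
latin1_bytes = sample.encode("iso-8859-1")
utf8_bytes = sample.encode("utf-8")

# Latin-1: accented letters are isolated single high bytes, so UTF-8 decoding fails.
print(high_byte_runs(latin1_bytes), is_valid_utf8(latin1_bytes))
# UTF-8: each accented letter is a run of two high bytes.
print(high_byte_runs(utf8_bytes), is_valid_utf8(utf8_bytes))
```

Encodings whose text consists mostly of long runs of 8-bit bytes do not get this protection automatically, which is the risk of false positives raised in the message above.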
---
Kenichi Handa
handa@m17n.org
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-29 7:49 ` Kenichi Handa
2009-06-29 8:52 ` Stefan Monnier
@ 2009-06-29 18:16 ` Eli Zaretskii
2009-06-29 20:48 ` Stefan Monnier
1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2009-06-29 18:16 UTC (permalink / raw)
To: Kenichi Handa, 3607; +Cc: tlikonen
> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 29 Jun 2009 16:49:32 +0900
> Cc: tlikonen@iki.fi, 3607@emacsbugs.donarmstrong.com
>
> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups. So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.
PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
Experience shows that there be dragons, and we _do_ want to release
Emacs 23.1 some time this year...
> --- mule-cmds.el.~1.360.~ 2009-04-09 03:03:17.000000000 +0900
> +++ mule-cmds.el 2009-06-29 16:45:08.000000000 +0900
> @@ -2643,6 +2643,10 @@
> (not (coding-system-equal coding-system
> locale-coding-system)))
> (prefer-coding-system coding-system)
> + ;; Even if we prefer "iso-latin-1", it is better to detect
> + ;; UTF-8.
> + (if (eq (coding-system-base coding-system) 'iso-latin-1)
> + (set-coding-system-priority 'utf-8))
> ;; Fixme: perhaps prefer-coding-system should set this too.
> ;; But it's not the time to do such a fundamental change.
> (setq default-sendmail-coding-system coding-system)
>
> For 23.2, I think we should re-design language-info-alist.
Then let's defer the whole thing to Emacs 23.2. It's not a grave
problem, IMO, certainly not worth taking a risk of unintended
consequences.
* bug#3607: 23.0.94; odd character in fringe.el
2009-06-29 18:16 ` Eli Zaretskii
@ 2009-06-29 20:48 ` Stefan Monnier
0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-06-29 20:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: tlikonen, 3607
>> But, when emacs starts up, it calls set-locale-environment,
>> and it at first calls set-language-environment then
>> overrides coding-system setups. So, at the moment, I don't
>> have a good idea other than this very ad-hoc change for 23.1.
> PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
> Experience shows that there be dragons, and we _do_ want to release
> Emacs 23.1 some time this year...
Indeed, those things should go on the trunk only.
Stefan
* bug#3607: 23.0.94; odd character in fringe.el
@ 2009-06-18 18:00 Drew Adams
0 siblings, 0 replies; 15+ messages in thread
From: Drew Adams @ 2009-06-18 18:00 UTC (permalink / raw)
To: emacs-pretest-bug
I don't claim this is a bug, but perhaps someone could take a look to
be sure.
In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
character shows in my Emacs with face `escape-glyph'. This is what
`C-u C-x =' shows:
character: =ad (173, #o255, #xad)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
code point: 0xAD
syntax: _ which means: symbol
category: b:Arabic, h:Korean, j:Japanese, l:Latin
buffer code: #xC2 #xAD
file code: #xAD (encoded by coding system iso-latin-1-unix)
display: by this font (glyph code)
uniscribe:-outline-Courier
New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x10)
hardcoded face: escape-glyph
Character code properties: customize what to show
name: SOFT HYPHEN
general-category: Cf (Other, Format)
There are text properties here:
charset iso-8859-1
face font-lock-comment-face
fontified t
Is this something weird, or is it OK? Since I have customized face
`escape-glyph', I notice this easily, but it is not really noticeable
in emacs -Q. Why would a soft hyphen character be displayed using face
`escape-glyph'?
BTW, when trying to send this, Emacs asks if I want to convert
non-ASCII chars to hexadecimal. Dunno which would be more helpful, so
I'll guess yes. The *Help* text quoted should give enough info,
anyway.
In GNU Emacs 23.0.94.1 (i386-mingw-nt5.1.2600)
of 2009-05-24 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4)'