unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#3607: 23.0.94; odd character in fringe.el
@ 2009-06-18 18:00 Drew Adams
  0 siblings, 0 replies; 15+ messages in thread
From: Drew Adams @ 2009-06-18 18:00 UTC (permalink / raw)
  To: emacs-pretest-bug

I don't claim this is a bug, but perhaps someone could take a look to
be sure.
 
In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
character shows in my Emacs with face `escape-glyph'. This is what
`C-u C-x =' shows:
 
        character: =ad (173, #o255, #xad)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
       code point: 0xAD
           syntax: _  which means: symbol
         category: b:Arabic, h:Korean, j:Japanese, l:Latin
      buffer code: #xC2 #xAD
        file code: #xAD (encoded by coding system iso-latin-1-unix)
          display: by this font (glyph code)
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x10)
   hardcoded face: escape-glyph
 
Character code properties: customize what to show
  name: SOFT HYPHEN
  general-category: Cf (Other, Format)
 
There are text properties here:
  charset              iso-8859-1
  face                 font-lock-comment-face
  fontified            t
 
Is this something weird, or is it OK? Since I have customized face
`escape-glyph', I notice this easily, but it is not really noticeable
in emacs -Q. Why would a soft hyphen character be displayed using face
`escape-glyph'?
 
BTW, when trying to send this, Emacs asks if I want to convert
non-ASCII chars to hexadecimal. Dunno which would be more helpful, so
I'll guess yes. The *Help* text quoted should give enough info,
anyway.
 
 
 
In GNU Emacs 23.0.94.1 (i386-mingw-nt5.1.2600)
 of 2009-05-24 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4)'
 







* bug#3607: 23.0.94; odd character in fringe.el
       [not found] <mailman.888.1245349050.2239.bug-gnu-emacs@gnu.org>
@ 2009-06-18 18:41 ` Teemu Likonen
  2009-06-18 18:54   ` Drew Adams
  0 siblings, 1 reply; 15+ messages in thread
From: Teemu Likonen @ 2009-06-18 18:41 UTC (permalink / raw)
  To: Drew Adams; +Cc: 3607

On 2009-06-18 11:00 (-0700), Drew Adams wrote:

> I don't claim this is a bug, but perhaps someone could take a look to
> be sure.
>  
> In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
> character shows in my Emacs with face `escape-glyph'. This is what
> `C-u C-x =' shows:

Perhaps you know most of this already but here's some information
anyway. The name is "Pavel Janík" and it displays just fine in my
system. File fringe.el is UTF-8-encoded.

When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
0xC3 and 0xAD. When some system interprets those bytes as separate
ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
(0xAD). This is what your system seems to have done:

> preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
>        code point: 0xAD

>   name: SOFT HYPHEN


So it sounds like some kind of singlebyte-multibyte encoding problem.
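
The misreading can be reproduced outside Emacs. Here is a minimal sketch in Python (not part of the original exchange; the variable names are illustrative only) that decodes the same two bytes once as UTF-8 and once as ISO-8859-1:

```python
# The two bytes that encode "í" (U+00ED) in UTF-8.
raw = b"\xc3\xad"

# Decoded as UTF-8, the pair is a single character.
as_utf8 = raw.decode("utf-8")

# Decoded as ISO-8859-1, each byte becomes its own character:
# U+00C3 (A with tilde) followed by U+00AD (SOFT HYPHEN).
as_latin1 = raw.decode("iso-8859-1")

# Re-encoding the soft hyphen as UTF-8 gives the bytes C2 AD,
# the same pair reported as "buffer code" in the *Help* output.
soft_hyphen_utf8 = "\u00ad".encode("utf-8")

print([hex(ord(c)) for c in as_utf8])
print([hex(ord(c)) for c in as_latin1])
print(soft_hyphen_utf8.hex())
```

Nothing here depends on Emacs; it only shows that the reported SOFT HYPHEN is what a single-byte decoder makes of the second byte of a UTF-8 "í".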






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-18 18:41 ` bug#3607: 23.0.94; odd character in fringe.el Teemu Likonen
@ 2009-06-18 18:54   ` Drew Adams
  2009-06-18 19:20     ` Teemu Likonen
  0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2009-06-18 18:54 UTC (permalink / raw)
  To: 'Teemu Likonen'; +Cc: 3607

> Perhaps you know most of this already but here's some information
> anyway. The name is "Pavel Janík" and it displays just fine in my
> system. File fringe.el is UTF-8-encoded.
> 
> When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
> 0xC3 and 0xAD. When some system interprets those bytes as separate
> ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
> (0xAD). This is what your system seems to have done:
> 
> > preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
> >        code point: 0xAD
> >   name: SOFT HYPHEN
> 
> So it sounds like some kind of singlebyte-multibyte encoding problem.

I see. Thanks for the info. I'm pretty ignorant about this stuff.

I see this in emacs -Q (on MS Windows), however, so I wonder if it isn't a bug.

And I wonder how you can see it as being UTF-8 encoded - are you using emacs -Q?

I don't see any local-variable thingy that would specify that the file is to be
UTF-8 encoded.







* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-18 18:54   ` Drew Adams
@ 2009-06-18 19:20     ` Teemu Likonen
  2009-06-18 21:08       ` Drew Adams
  0 siblings, 1 reply; 15+ messages in thread
From: Teemu Likonen @ 2009-06-18 19:20 UTC (permalink / raw)
  To: Drew Adams; +Cc: 3607

On 2009-06-18 11:54 (-0700), Drew Adams wrote:

> I see this in emacs -Q (on MS Windows), however, so I wonder if it
> isn't a bug.
>
> And I wonder how you can see it as being UTF-8 encoded - are you using
> emacs -Q?
>
> I don't see any local-variable thingy that would specify that the file
> is to be UTF-8 encoded.

Yes, the file shows correctly with "emacs -Q lisp/fringe.el" too. And it
is a UTF-8 encoded file.

    $ file lisp/fringe.el
    lisp/fringe.el: UTF-8 Unicode English text

My Debian GNU/Linux system uses a UTF-8 locale (as all GNU/Linux systems
do these days). Emacs probably detects my environment and uses correct
encoding settings. But I don't know how Emacs works - except everything
just works. :-)

It really seems that your default environment is something other than
UTF-8, something single-byte.






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-18 19:20     ` Teemu Likonen
@ 2009-06-18 21:08       ` Drew Adams
  2009-06-18 21:34         ` Lennart Borgman
  0 siblings, 1 reply; 15+ messages in thread
From: Drew Adams @ 2009-06-18 21:08 UTC (permalink / raw)
  To: 'Teemu Likonen'; +Cc: 3607

> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
> too. And it is a UTF-8 encoded file.
> 
>     $ file lisp/fringe.el
>     lisp/fringe.el: UTF-8 Unicode English text
> 
> My Debian GNU/Linux system uses a UTF-8 locale (as all GNU/Linux systems
> do these days). Emacs probably detects my environment and uses correct
> encoding settings. But I don't know how Emacs works - except everything
> just works. :-)
> 
> It really seems that your default environment is something other than
> UTF-8, something single-byte.

OK, thanks for checking.

IMO, if the file should be encoded in UTF-8, then the file itself should control
that - as buff-menu.el does, for instance. The user's locale shouldn't enter
into it at this level. Seems like a bug, to me. (But I'm no expert on this.)







* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-18 21:08       ` Drew Adams
@ 2009-06-18 21:34         ` Lennart Borgman
  2009-06-19  0:47           ` Kenichi Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Lennart Borgman @ 2009-06-18 21:34 UTC (permalink / raw)
  To: Drew Adams, 3607; +Cc: Teemu Likonen

On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams@oracle.com> wrote:
>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>> too. And it is a UTF-8 encoded file.
>>
>>     $ file lisp/fringe.el
>>     lisp/fringe.el: UTF-8 Unicode English text
>>
>> My Debian GNU/Linux system uses a UTF-8 locale (as all GNU/Linux systems
>> do these days). Emacs probably detects my environment and uses correct
>> encoding settings. But I don't know how Emacs works - except everything
>> just works. :-)
>>
>> It really seems that your default environment is something other than
>> UTF-8, something single-byte.
>
> OK, thanks for checking.
>
> IMO, if the file should be encoded in UTF-8, then the file itself should control
> that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> into it at this level. Seems like a bug, to me. (But I'm no expert on this.)


Yes, that must be a bug.






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-18 21:34         ` Lennart Borgman
@ 2009-06-19  0:47           ` Kenichi Handa
  2009-06-27  1:07             ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Kenichi Handa @ 2009-06-19  0:47 UTC (permalink / raw)
  To: Lennart Borgman, 3607; +Cc: tlikonen, 3607

I've just added "coding: utf-8" cookie to fringe.el.
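
For readers unfamiliar with coding cookies: one common form is a "-*- coding: NAME -*-" tag on the first line of the file. The sketch below, in Python, shows roughly how such a tag could be located. It is a simplification and an assumption on my part, not Emacs's own code: the regular expression, the function name and the sample header are all hypothetical, and it ignores the "Local Variables:" block near the end of a file, which Emacs also honors.

```python
import re

# Simplified pattern for a "-*- ... coding: NAME ... -*-" cookie.
COOKIE = re.compile(r"-\*-.*?coding:\s*([-\w.]+).*?-\*-")


def coding_cookie(first_line):
    """Return the coding-system name declared on the line, or None."""
    match = COOKIE.search(first_line)
    return match.group(1) if match else None


# Hypothetical header line; the actual line in fringe.el may differ.
header = ";;; example.el --- demo file  -*- coding: utf-8 -*-"
print(coding_cookie(header))
```

With such a declaration in the file, the choice of decoder no longer depends on the user's locale.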

---
Kenichi Handa
handa@m17n.org


In article <e01d8a50906181434t6e6c296ega704dc11fbb77b31@mail.gmail.com>, Lennart Borgman <lennart.borgman@gmail.com> writes:

> On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams@oracle.com> wrote:
>>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>>> too. And it is a UTF-8 encoded file.
>>> 
>>>     $ file lisp/fringe.el
>>>     lisp/fringe.el: UTF-8 Unicode English text
>>> 
>>> My Debian GNU/Linux system uses a UTF-8 locale (as all GNU/Linux systems
>>> do these days). Emacs probably detects my environment and uses correct
>>> encoding settings. But I don't know how Emacs works - except everything
>>> just works. :-)
>>> 
>>> It really seems that your default environment is something other than
>>> UTF-8, something single-byte.
> >
> > OK, thanks for checking.
> >
> > IMO, if the file should be encoded in UTF-8, then the file itself should control
> > that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> > into it at this level. Seems like a bug, to me. (But I'm no expert on this.)


> Yes, that must be a bug.






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-19  0:47           ` Kenichi Handa
@ 2009-06-27  1:07             ` Stefan Monnier
  2009-06-27  1:25               ` Kenichi Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-27  1:07 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tlikonen, 3607

> I've just added "coding: utf-8" cookie to fringe.el.

Thanks.  But there is another bug here: a utf-8 file should be
recognized as such even in a latin-1 locale (i.e. utf-8 should always
have (one of) the highest priority).  IIUC this is done right in
GNU/Linux but not under Windows.


        Stefan






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-27  1:07             ` Stefan Monnier
@ 2009-06-27  1:25               ` Kenichi Handa
  2009-06-27 21:44                 ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Kenichi Handa @ 2009-06-27  1:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tlikonen, 3607

In article <jwveit6iizt.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I've just added "coding: utf-8" cookie to fringe.el.

> Thanks.  But there is another bug here: a utf-8 file should be
> recognized as such even in a latin-1 locale (i.e. utf-8 should always
> have (one of) the highest priority).

Current Emacs doesn't give utf-8 higher priority than
iso-8859-1 in Latin-X language environments.  Are you
proposing such a change?  I can't decide whether that is good
or not because I'm not that familiar with such locales.

> IIUC this is done right in GNU/Linux but not under
> Windows.

?? Even on GNU/Linux (ubuntu), when I start emacs as this:

% LANG=de_DE emacs

iso-8859-1 has higher priority than utf-8.

Or, do you mean the other applications on GNU/Linux?

---
Kenichi Handa
handa@m17n.org






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-27  1:25               ` Kenichi Handa
@ 2009-06-27 21:44                 ` Stefan Monnier
  2009-06-29  7:49                   ` Kenichi Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-27 21:44 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tlikonen, 3607

> Current Emacs doesn't give utf-8 higher priority than
> iso-8859-1 in Latin-X language environments.  Are you
> proposing such a change?  I can't decide whether that is good
> or not because I'm not that familiar with such locales.

It is a good change, because the likelihood of a valid utf-8 file being
a proper latin-1 file is extremely low.
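
The reverse direction is just as telling: ordinary Latin-1 text containing an accented letter is usually not valid UTF-8 at all, so a strict UTF-8 check rarely claims it by mistake. A minimal sketch in Python, using strict decoding as a stand-in for detection (this is not Emacs's detection algorithm, and the helper name is made up for illustration):

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "Jan\u00edk"  # "Janík", the name from fringe.el

# The UTF-8 bytes pass the check; the Latin-1 bytes do not, because
# the lone byte 0xED is not followed by a UTF-8 continuation byte.
print(is_valid_utf8(name.encode("utf-8")))
print(is_valid_utf8(name.encode("iso-8859-1")))
```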

> ?? Even on GNU/Linux (ubuntu), when I start emacs as this:
> % LANG=de_DE emacs
> iso-8859-1 has higher priority than utf-8.

Duh, you're right.  Even Emacs-22 still does that.  It's wrong.

> Or, do you mean the other applications on GNU/Linux?

No, I meant Emacs, I was convinced we'd fixed it in Emacs-22 for
GNU/Linux, but it appears I was confused.


        Stefan






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-27 21:44                 ` Stefan Monnier
@ 2009-06-29  7:49                   ` Kenichi Handa
  2009-06-29  8:52                     ` Stefan Monnier
  2009-06-29 18:16                     ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: Kenichi Handa @ 2009-06-29  7:49 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tlikonen, 3607

In article <jwvbpo9gxsu.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > Current Emacs doesn't give utf-8 higher priority than
> > iso-8859-1 in Latin-X language environments.  Are you
> > proposing such a change?  I can't decide whether that is good
> > or not because I'm not that familiar with such locales.

> It is a good change, because the likelihood of a valid utf-8 file being
> a proper latin-1 file is extremely low.

Ok.  For that, we must do:

  (set-coding-system-priority 'utf-8) 

somewhere.  At first I thought it could be done by the
`setup-function' of the Latin-1 language environment.  Indeed,
when a user does C-x C-m L Latin-1 RET, it works.

But when Emacs starts up, it calls set-locale-environment,
which first calls set-language-environment and then
overrides the coding-system setup.  So, at the moment, I don't
have a good idea other than this very ad-hoc change for 23.1.

--- mule-cmds.el.~1.360.~	2009-04-09 03:03:17.000000000 +0900
+++ mule-cmds.el	2009-06-29 16:45:08.000000000 +0900
@@ -2643,6 +2643,10 @@
 		   (not (coding-system-equal coding-system
 					     locale-coding-system)))
 	  (prefer-coding-system coding-system)
+	  ;; Even if we prefer "iso-latin-1", it is better to detect
+	  ;; UTF-8.
+	  (if (eq (coding-system-base coding-system) 'iso-latin-1)
+	      (set-coding-system-priority 'utf-8))
 	  ;; Fixme: perhaps prefer-coding-system should set this too.
 	  ;; But it's not the time to do such a fundamental change.
 	  (setq default-sendmail-coding-system coding-system)

For 23.2, I think we should re-design language-info-alist.

---
Kenichi Handa
handa@m17n.org






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-29  7:49                   ` Kenichi Handa
@ 2009-06-29  8:52                     ` Stefan Monnier
  2009-06-29 11:39                       ` Kenichi Handa
  2009-06-29 18:16                     ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2009-06-29  8:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tlikonen, 3607

>> It is a good change, because the likelihood of a valid utf-8 file being
>> a proper latin-1 file is extremely low.

> Ok.  For that, we must do:
>   (set-coding-system-priority 'utf-8) 
> somewhere.  I at first thought it could be done by
> `setup-function' of Latin-1 language environment.  Actually,
> when a user does C-x C-m L Latin-1 RET, it works.

> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups.  So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.

Actually, AFAIK the "unlikely false positives" property of the utf-8
encoding is not only true when applied to latin-1 files but also to most
other encodings.  So really utf-8 should probably always be first (not
only for latin-1 environments), except maybe for some envs where there's
a known non-negligible risk of false positives.


        Stefan






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-29  8:52                     ` Stefan Monnier
@ 2009-06-29 11:39                       ` Kenichi Handa
  0 siblings, 0 replies; 15+ messages in thread
From: Kenichi Handa @ 2009-06-29 11:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tlikonen, 3607

In article <jwv63ef4e6w.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Actually, AFAIK the "unlikely false positives" property of the utf-8
> encoding is not only true when applied to latin-1 files but also to most
> other encodings.  So really utf-8 should probably always be first (not
> only for latin-1 environments), except maybe for some envs where there's
> a known non-negligible risk of false positives.

I think it is only the Latin-X environments (and perhaps
Vietnamese too) where it is mostly safe to give utf-8 higher
priority in code detection, because only they use the Latin
script, in which 8-bit characters rarely appear consecutively.
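
To illustrate why consecutive 8-bit characters matter, here is a rough sketch in Python (an illustration under my own assumptions, not a description of Emacs's detector; the names are hypothetical). It checks which single 8-bit Latin-1 characters, and which pairs of them, happen to form valid UTF-8:

```python
# Printable 8-bit range of ISO-8859-1 (0xA0..0xFF).
HIGH = range(0xA0, 0x100)


def decodes_as_utf8(data):
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone 8-bit character between ASCII letters: which ones are valid UTF-8?
lone_ok = [b for b in HIGH if decodes_as_utf8(b"a" + bytes([b]) + b"z")]

# Two 8-bit characters in a row: which pairs are valid UTF-8?
pair_ok = [(a, b) for a in HIGH for b in HIGH
           if decodes_as_utf8(bytes([a, b]))]

print(len(lone_ok), len(pair_ok), len(HIGH) ** 2)
```

No isolated 8-bit character passes, and only about one pair in ten does, so text in which such characters mostly stand alone is very unlikely to be mistaken for UTF-8. Scripts whose legacy encodings use long runs of 8-bit bytes do not have that protection.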

---
Kenichi Handa
handa@m17n.org






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-29  7:49                   ` Kenichi Handa
  2009-06-29  8:52                     ` Stefan Monnier
@ 2009-06-29 18:16                     ` Eli Zaretskii
  2009-06-29 20:48                       ` Stefan Monnier
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2009-06-29 18:16 UTC (permalink / raw)
  To: Kenichi Handa, 3607; +Cc: tlikonen

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 29 Jun 2009 16:49:32 +0900
> Cc: tlikonen@iki.fi, 3607@emacsbugs.donarmstrong.com
> 
> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups.  So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.

PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
Experience shows that there be dragons, and we _do_ want to release
Emacs 23.1 some time this year...

> --- mule-cmds.el.~1.360.~	2009-04-09 03:03:17.000000000 +0900
> +++ mule-cmds.el	2009-06-29 16:45:08.000000000 +0900
> @@ -2643,6 +2643,10 @@
>  		   (not (coding-system-equal coding-system
>  					     locale-coding-system)))
>  	  (prefer-coding-system coding-system)
> +	  ;; Even if we prefer "iso-latin-1", it is better to detect
> +	  ;; UTF-8.
> +	  (if (eq (coding-system-base coding-system) 'iso-latin-1)
> +	      (set-coding-system-priority 'utf-8))
>  	  ;; Fixme: perhaps prefer-coding-system should set this too.
>  	  ;; But it's not the time to do such a fundamental change.
>  	  (setq default-sendmail-coding-system coding-system)
> 
> For 23.2, I think we should re-design language-info-alist.

Then let's defer the whole thing to Emacs 23.2.  It's not a grave
problem, IMO, certainly not worth taking a risk of unintended
consequences.






* bug#3607: 23.0.94; odd character in fringe.el
  2009-06-29 18:16                     ` Eli Zaretskii
@ 2009-06-29 20:48                       ` Stefan Monnier
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-06-29 20:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tlikonen, 3607

>> But, when emacs starts up, it calls set-locale-environment,
>> and it at first calls set-language-environment then
>> overrides coding-system setups.  So, at the moment, I don't
>> have a good idea other than this very ad-hoc change for 23.1.

> PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
> Experience shows that there be dragons, and we _do_ want to release
> Emacs 23.1 some time this year...

Indeed, those things should go on the trunk only.


        Stefan




