bug#26396: 25.1; char-displayable-p on a latin1 tty

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#26396: 25.1; char-displayable-p on a latin1 tty
@ 2017-04-08  2:20 Kevin Ryde
  2017-04-08  7:42 ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Kevin Ryde @ 2017-04-08  2:20 UTC (permalink / raw)
  To: 26396

On a latin1 tty, char-displayable-p claims that almost all chars are
displayable, e.g. (char-displayable-p #x2022) => t.

I hoped when (terminal-coding-system) is iso-latin-1-unix that
char-displayable-p would say only relevant latin1 chars are displayable.

I believe this is how it was in emacs24 (which helped avoid a
lot of very unattractive \ display escaping, at least from code knowing
not to try to display the undisplayable).

In GNU Emacs 25.1.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2017-01-01, modified by Debian built on x86-csail-01

Important settings:
  value of $LANG: en_AU.iso88591
  locale-coding-system: iso-latin-1-unix

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-08  2:20 bug#26396: 25.1; char-displayable-p on a latin1 tty Kevin Ryde
@ 2017-04-08  7:42 ` Eli Zaretskii
  2017-04-09  5:16   ` Kevin Ryde
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-08  7:42 UTC (permalink / raw)
  To: Kevin Ryde; +Cc: 26396

> From: Kevin Ryde <user42_kevin@yahoo.com.au>
> Date: Sat, 08 Apr 2017 12:20:57 +1000
> 
> On a latin1 tty, char-displayable-p claims that almost all chars are
> displayable, eg (char-displayable-p #x2022) => t.

I cannot reproduce this.  I did

  emacs-25.1 -Q -nw
  C-x RET t latin-1 RET
  M-: (char-displayable-p #x2022) RET

and the result was nil.  How is the above different from what you
tried?

> I hoped when (terminal-coding-system) is iso-latin-1-unix that
> char-displayable-p would say only relevant latin1 chars are displayable.

It should be.  Please step through char-displayable-p, and see what
doesn't work there in your case.

Thanks.





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-08  7:42 ` Eli Zaretskii
@ 2017-04-09  5:16   ` Kevin Ryde
  2017-04-10  6:47     ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Kevin Ryde @ 2017-04-09  5:16 UTC (permalink / raw)
  To: 26396

Eli Zaretskii <eliz@gnu.org> writes:
>
>   emacs-25.1 -Q -nw
>   C-x RET t latin-1 RET
>   M-: (char-displayable-p #x2022) RET
>
> and the result was nil.

Ah, yes, me too under X and -nw in an xterm.

> How is the above different from what you tried?

Oops, sorry not to have said: this is the Linux console tty with
LANG=en_AU.iso88591, so (terminal-coding-system) is iso-latin-1-unix on
startup.  It's the Linux fbcon thingie I think, but I hope that doesn't
make a difference.  The effect (getting t) seems the same if the terminal
coding is set later by C-x RET t, as you showed.

> Please step through char-displayable-p, and see what
> doesn't work there in your case.

Ah, it gets to (internal-char-font nil #x2022) = 7, which goes to the
(<= 0 font-glyph) case and is t, not the terminal-coding-system checking
case.  If forced to the latter case it comes out nil as I had hoped.
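
The branch order described above can be modelled roughly as follows.  This
is a simplified Python sketch, not the Emacs source: `char_displayable_p`,
`font_glyph` and `coding` are illustrative names, and using None for "no
glyph table available" is an assumption of the sketch.  The glyph value 7
is the one reported in this message.

```python
def char_displayable_p(char, font_glyph, coding="latin-1"):
    """Simplified model of the tty branch order reported in this message.

    font_glyph stands in for the value of (internal-char-font nil CHAR):
    a glyph code, or None when no console glyph table is available.
    """
    if font_glyph is not None:
        # Glyph-code case: taken first, so the coding system is never checked.
        return font_glyph >= 0
    # terminal-coding-system case: displayable only if the character encodes.
    try:
        char.encode(coding)
        return True
    except UnicodeEncodeError:
        return False


bullet = "\u2022"
on_console = char_displayable_p(bullet, font_glyph=7)    # Linux console
in_xterm = char_displayable_p(bullet, font_glyph=None)   # e.g. xterm -nw
```

In this model `on_console` is true and `in_xterm` is false, which matches
the t and nil results seen on the two kinds of terminal.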

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-09  5:16   ` Kevin Ryde
@ 2017-04-10  6:47     ` Eli Zaretskii
  2017-04-10  7:05       ` Eli Zaretskii
  2017-04-11  7:22       ` Kevin Ryde
  0 siblings, 2 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-10  6:47 UTC (permalink / raw)
  To: Kevin Ryde; +Cc: 26396

> From: Kevin Ryde <user42_kevin@yahoo.com.au>
> Date: Sun, 09 Apr 2017 15:16:18 +1000
> 
> > Please step through char-displayable-p, and see what
> > doesn't work there in your case.
> 
> Ah, it gets to (internal-char-font nil #x2022) = 7, which goes to the
> (<= 0 font-glyph) case and is t, not the terminal-coding-system checking
> case.

Then Emacs is doing TRT, AFAICT: it uses the GIO_UNIMAP that's
available on your system to query the kernel about characters
displayable by the console.  Are you saying that as a matter of fact
that character cannot be displayed by the console, i.e. that the code
which uses GIO_UNIMAP somehow misbehaves?
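
For readers unfamiliar with GIO_UNIMAP: it is a Linux console ioctl that
returns an array of `struct unipair` entries from <linux/kd.h>, each a pair
of unsigned shorts (Unicode code point, font position).  The Python sketch
below only decodes such an array; the ioctl call itself is left out because
it needs a real console file descriptor.  The sample bytes are synthetic and
`parse_unipairs` is a hypothetical helper name.

```python
import struct

# struct unipair: two unsigned shorts, (unicode, fontpos), native byte order.
UNIPAIR = struct.Struct("=HH")


def parse_unipairs(buf):
    """Decode a packed array of unipair entries into {code point: font position}."""
    return {code_point: fontpos for code_point, fontpos in UNIPAIR.iter_unpack(buf)}


# Synthetic data standing in for what the kernel would return.
sample = UNIPAIR.pack(0x0041, 65) + UNIPAIR.pack(0x2022, 7)
table = parse_unipairs(sample)
```

A code point that is absent from the resulting table has no glyph in the
currently loaded console font.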





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-10  6:47     ` Eli Zaretskii
@ 2017-04-10  7:05       ` Eli Zaretskii
  2017-04-10  7:45         ` Eli Zaretskii
  2017-04-13  6:19         ` Paul Eggert
  2017-04-11  7:22       ` Kevin Ryde
  1 sibling, 2 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-10  7:05 UTC (permalink / raw)
  To: Paul Eggert, user42_kevin; +Cc: 26396

> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 26396@debbugs.gnu.org
> 
> > From: Kevin Ryde <user42_kevin@yahoo.com.au>
> > Date: Sun, 09 Apr 2017 15:16:18 +1000
> > 
> > > Please step through char-displayable-p, and see what
> > > doesn't work there in your case.
> > 
> > Ah, it gets to (internal-char-font nil #x2022) = 7, which goes to the
> > (<= 0 font-glyph) case and is t, not the terminal-coding-system checking
> > case.
> 
> Then Emacs is doing TRT, AFAICT: it uses the GIO_UNIMAP that's
> available on your system to query the kernel about characters
> displayable by the console.  Are you saying that as a matter of fact
> that character cannot be displayed by the console, i.e. that the code
> which uses GIO_UNIMAP somehow misbehaves?

Ah, I think I see the problem: the console indeed supports that
character, but since terminal-coding-system is latin-1, it cannot
encode it, so you see a question mark instead, is that right?

Paul, could you please look into this?  I think the code in
char-displayable-p which looks at the result of internal-char-font
should only accept a non-negative value if the terminal-coding-system
supports the character.  IOW, the Linux console should not be
considered as being able to display a character unless the terminal
encoding can safely encode it.
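
A rough Python sketch of the rule proposed here: a character counts as
displayable only when the console has a glyph for it and the terminal
encoding can encode it.  The function name, the `glyph_table` argument and
the font positions are hypothetical; this is not Emacs code.

```python
def displayable_on_console(char, glyph_table, coding):
    """Proposed rule: require a console glyph AND an encodable character."""
    has_glyph = ord(char) in glyph_table
    try:
        char.encode(coding)
        encodable = True
    except UnicodeEncodeError:
        encodable = False
    return has_glyph and encodable


# Hypothetical console font contents: {code point: font position}.
glyphs = {0x00E9: 130, 0x2022: 7}
```

With these assumed glyphs, U+2022 is rejected under "latin-1" but accepted
under "utf-8", while U+00E9 is accepted under both.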

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-10  7:05       ` Eli Zaretskii
@ 2017-04-10  7:45         ` Eli Zaretskii
  2017-04-13  6:19         ` Paul Eggert
  1 sibling, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-10  7:45 UTC (permalink / raw)
  To: eggert, user42_kevin; +Cc: 26396

> Date: Mon, 10 Apr 2017 10:05:01 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 26396@debbugs.gnu.org
> 
> Paul, could you please look into this?  I think the code in
> char-displayable-p which looks at the result of internal-char-font
> should only accept a non-negative value if the terminal-coding-system
> supports the character.  IOW, the Linux console should not be
> considered as being able to display a character unless the terminal
> encoding can safely encode it.

Or maybe the result of internal-char-font should be accepted only if
the terminal encoding is UTF-8 or some other Unicode encoding.





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-10  6:47     ` Eli Zaretskii
  2017-04-10  7:05       ` Eli Zaretskii
@ 2017-04-11  7:22       ` Kevin Ryde
  1 sibling, 0 replies; 35+ messages in thread
From: Kevin Ryde @ 2017-04-11  7:22 UTC (permalink / raw)
  To: 26396; +Cc: eggert

Eli Zaretskii <eliz@gnu.org> writes:
>
>
> Ah, I think I see the problem: the console indeed supports that
> character, but since terminal-coding-system is latin-1, it cannot
> encode it, so you see a question mark instead, is that right?

I get the emacs \x1234 escapes.  (I might have preferred question marks
... I'm trying the glyphless-char-display fallback slot as thin-space to
see if I like that better.)

I suppose \x suggests the emacs display code knows it can't show the
character (because it can't encode it, as you say), but char-displayable-p
doesn't agree.

> GIO_UNIMAP ...

I'm not familiar with that.  It tells which parts of Unicode have font
glyphs available, does it?  I suppose that'd be a double condition for
char-displayable-p: the character must be encodable under
terminal-coding-system, and on the console it can additionally be
verified that there's a glyph.

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-10  7:05       ` Eli Zaretskii
  2017-04-10  7:45         ` Eli Zaretskii
@ 2017-04-13  6:19         ` Paul Eggert
  2017-04-13  7:16           ` Eli Zaretskii
  2017-04-13 22:07           ` Richard Stallman
  1 sibling, 2 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-13  6:19 UTC (permalink / raw)
  To: Eli Zaretskii, user42_kevin; +Cc: 26396

Eli Zaretskii wrote:
> I think the code in
> char-displayable-p which looks at the result of internal-char-font
> should only accept a non-negative value if the terminal-coding-system
> supports the character.  IOW, the Linux console should not be
> considered as being able to display a character unless the terminal
> encoding can safely encode it.

Wouldn't it be better if Emacs ignored terminal-coding-system when the output 
device is a Linux console and Emacs therefore knows exactly which characters the 
console can display? Instead, Emacs could simply display those characters as-is. 
This would result in a better user experience, surely.





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13  6:19         ` Paul Eggert
@ 2017-04-13  7:16           ` Eli Zaretskii
  2017-04-13 20:58             ` Paul Eggert
  2017-04-13 22:07           ` Richard Stallman
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-13  7:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 12 Apr 2017 23:19:53 -0700
> 
> Eli Zaretskii wrote:
> > I think the code in
> > char-displayable-p which looks at the result of internal-char-font
> > should only accept a non-negative value if the terminal-coding-system
> > supports the character.  IOW, the Linux console should not be
> > considered as being able to display a character unless the terminal
> > encoding can safely encode it.
> 
> Wouldn't it be better if Emacs ignored terminal-coding-system when the output 
> device is a Linux console and Emacs therefore knows exactly which characters the 
> console can display? Instead, Emacs could simply display those characters as-is. 

Yes, that would be better.  But it's probably a non-trivial project,
since we'd need separate code to determine double-width glyphs,
padding glyphs, and perhaps also something special for composed
characters.  Does the Linux console allow us to figure out all of
that?

And what does "display as-is" mean in practice?  Should we send to
the console the glyph codes corresponding to Unicode points, or should
we send UTF-8 encoded characters?  (Is there some document which
describes these features in enough detail for us to figure out their
implications on Emacs display code?)

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13  7:16           ` Eli Zaretskii
@ 2017-04-13 20:58             ` Paul Eggert
  2017-04-14  3:01               ` Kevin Ryde
  2017-04-14 12:37               ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-13 20:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

On 04/13/2017 12:16 AM, Eli Zaretskii wrote:
> Yes, that would be better.  But it's probably a non-trivial project,
> since we'd need separate code to determine double-width glyphs,
> padding glyphs, and perhaps also something special for composed
> characters.  Does the Linux console allow us to figure out all of
> that?

This should not be a problem, as the Linux console has only single-width 
characters.

>
> And what does "display as-is" mean in practice?  Should we send to
> the console the glyph codes corresponding to Unicode points, or should
> we send UTF-8 encoded characters?

It depends on whether the console is in UTF-8 mode. If so, send UTF-8; 
if not, send a byte that is transformed according to the current mapping 
table into a Unicode value. I hope we don't need to bother with the 
latter possibility.
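
The two cases can be sketched in Python as below.  This is only a model of
the idea, not of any Emacs or kernel code: `bytes_for_console` is a
hypothetical name, and the identity Latin-1 mapping table is an assumption
made for the example.

```python
def bytes_for_console(char, utf8_mode, byte_to_unicode=None):
    """Return the bytes to write for CHAR, or None if no byte maps to it."""
    if utf8_mode:
        return char.encode("utf-8")
    # Non-UTF-8 mode: find a byte whose mapping-table entry is CHAR.
    for byte, code_point in (byte_to_unicode or {}).items():
        if code_point == ord(char):
            return bytes([byte])
    return None


# Hypothetical mapping table: byte N maps to U+00NN (Latin-1 identity).
latin1_map = {b: b for b in range(256)}
```

In UTF-8 mode U+2022 becomes the three bytes E2 80 A2; with the assumed
Latin-1 table there is no byte for it, so the function returns None.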

> (Is there some document which
> describes these features in enough detail for us to figure out their
> implications on Emacs display code?)

Nothing definitive, but there is:

http://www.tldp.org/LDP/LG/issue91/loozzr.html
http://man7.org/linux/man-pages/man4/console_codes.4.html





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13  6:19         ` Paul Eggert
  2017-04-13  7:16           ` Eli Zaretskii
@ 2017-04-13 22:07           ` Richard Stallman
  2017-04-13 22:18             ` Paul Eggert
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2017-04-13 22:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Wouldn't it be better if Emacs ignored terminal-coding-system when the output 
  > device is a Linux console and Emacs therefore knows exactly which characters the 
  > console can display?

Does Emacs always know for certain?

What if it is talking to that console via ssh?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.






* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13 22:07           ` Richard Stallman
@ 2017-04-13 22:18             ` Paul Eggert
  2017-04-14 19:48               ` Richard Stallman
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-13 22:18 UTC (permalink / raw)
  To: rms; +Cc: user42_kevin, 26396

On 04/13/2017 03:07 PM, Richard Stallman wrote:
>    > Wouldn't it be better if Emacs ignored terminal-coding-system when the output
>    > device is a Linux console and Emacs therefore knows exactly which characters the
>    > console can display?
>
> Does Emacs always know for certain?
If Emacs is attached directly to a console, yes. There's already some 
code that does this, by using Linux-specific syscalls.

> What if it is talking to that console via ssh?

Then Emacs won't know, and will fall back on generic terminal code. 
However, the case that prompted this thread is where part of Emacs does 
know and part does not, and the mismatch causes a problem. I'm proposing 
that different parts of Emacs treat Linux consoles more consistently.






* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13 20:58             ` Paul Eggert
@ 2017-04-14  3:01               ` Kevin Ryde
  2017-04-14 18:59                 ` Paul Eggert
  2017-04-14 12:37               ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Kevin Ryde @ 2017-04-14  3:01 UTC (permalink / raw)
  To: 26396; +Cc: Paul Eggert

Paul Eggert <eggert@cs.ucla.edu> writes:
>
> It depends on whether the console is in UTF-8 mode.  If so, send UTF-8;
> if not, send a byte that is transformed according to the current
> mapping table into a Unicode value.  I hope we don't need to bother
> with the latter possibility.

The latter is just terminal-coding-system though, isn't it?  That's what
happens now.  Would char-displayable-p always check encodability and then
ponder what further the kernel ioctls can say about viewable glyphs?





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13 20:58             ` Paul Eggert
  2017-04-14  3:01               ` Kevin Ryde
@ 2017-04-14 12:37               ` Eli Zaretskii
  2017-04-14 18:56                 ` Paul Eggert
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-14 12:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 13 Apr 2017 13:58:45 -0700
> 
> On 04/13/2017 12:16 AM, Eli Zaretskii wrote:
> > Yes, that would be better.  But it's probably a non-trivial project,
> > since we'd need separate code to determine double-width glyphs,
> > padding glyphs, and perhaps also something special for composed
> > characters.  Does the Linux console allow us to figure out all of
> > that?
> 
> This should not be a problem, as the Linux console has only single-width 
> characters.

Are you sure?  AFAIU, the Linux console supports the BMP, and some of
the characters in the BMP are double-width (a.k.a. "full-width"), for
example U+1100, U+231A, U+2B1B, and others.  What does the Linux
console do when these characters are sent to the screen driver?
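
What Unicode itself says about width can be checked with Python's standard
`unicodedata` module, as in the sketch below.  It reports the East Asian
Width property only; it says nothing about what the Linux console actually
draws, and results for newer characters depend on the Unicode version
bundled with the Python in use.  `is_double_width` is a hypothetical helper.

```python
import unicodedata


def is_double_width(char):
    """True if Unicode classifies CHAR as East Asian Wide or Fullwidth."""
    return unicodedata.east_asian_width(char) in ("W", "F")


hangul = is_double_width("\u1100")   # HANGUL CHOSEONG KIYEOK
ascii_a = is_double_width("A")
```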

> > And what does "display as-is" mean in practice?  Should we send to
> > the console the glyph codes corresponding to Unicode points, or should
> > we send UTF-8 encoded characters?
> 
> It depends on whether the console is in UTF-8 mode. If so, send UTF-8; 
> if not, send a byte that is transformed according to the current mapping 
> table into a Unicode value. I hope we don't need to bother with the 
> latter possibility.

What software puts the console in UTF-8 mode?  Is that the locale
setting?

> > (Is there some document which
> > describes these features in enough detail for us to figure out their
> > implications on Emacs display code?)
> 
> Nothing definitive, but there is:
> 
> http://www.tldp.org/LDP/LG/issue91/loozzr.html
> http://man7.org/linux/man-pages/man4/console_codes.4.html

Thanks, but that seems to be just the tip of an iceberg.  Or maybe the
issue is easier than I envisioned.

Suppose we only wanted to use this feature for UTF-8 locales.
Assuming that the OS takes care of putting the console in UTF-8 mode,
we don't need any changes in Emacs, since Emacs already sends UTF-8
sequences to the screen driver.  As the Linux console only supports
the BMP, we could then simply amend the code of char-displayable-p to
check whether a character is within the BMP, when the terminal is the
Linux console.  Do you agree with this conclusion?

OTOH, now I'm not sure I understand the need for terminal_glyph_code.
What does it do that a simple check for a Linux console and UTF-8
terminal encoding, plus a character being inside the BMP, doesn't?

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-14 12:37               ` Eli Zaretskii
@ 2017-04-14 18:56                 ` Paul Eggert
  2017-04-15  8:48                   ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-14 18:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

On 04/14/2017 05:37 AM, Eli Zaretskii wrote:
>> This should not be a problem, as the Linux console has only 
>> single-width characters. 
> Are you sure?  AFAIU, the Linux console supports the BMP, and some of
> the characters in the BMP are double-width (a.k.a. "full-width"), for
> example U+1100, U+231A, U+2B1B, and others.  What does the Linux
> console do when these characters are sent to the screen driver?

I haven't experimented with it, so I'm not 100% sure. However, as I 
understand the implementation, the console driver can support at most 
512 simultaneously-displayable characters, as this is a property of the 
classic IBM VGA design that is the greatest common denominator of 
current or recent (post-1990) PC graphics hardware. The user can specify 
what each character looks like down to the pixel level, but cannot alter 
character sizes on a character-by-character basis. In theory one could 
display double-wide characters by splitting them into halves and 
displaying each half separately, but I don't know of anyone who does 
that (it would not be practical due to that 512 limit).

>
>>> And what does "display as-is" mean in practice?  Should we send to
>>> the console the glyph codes corresponding to Unicode points, or should
>>> we send UTF-8 encoded characters?
>> It depends on whether the console is in UTF-8 mode. If so, send UTF-8;
>> if not, send a byte that is transformed according to the current mapping
>> table into a Unicode value. I hope we don't need to bother with the
>> latter possibility.
> What software puts the console in UTF-8 mode?  Is that the locale
> setting?

It's done at boot time. The escape sequences ESC % G (or ESC % 8) and 
ESC % @ get you into and out of UTF-8 mode; see 
<http://man7.org/linux/man-pages/man4/console_codes.4.html>. Common 
practice is to stay in UTF-8 mode as the alternative is worse (it has 
only 256 simultaneously-displayable characters).
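
For reference, the sequences named above written out as bytes in a small
Python sketch.  The constant and function names are made up for the
example; the sequences themselves are the ones listed in console_codes(4).

```python
ESC = b"\x1b"

ENTER_UTF8 = ESC + b"%G"       # ESC % G
ENTER_UTF8_ALT = ESC + b"%8"   # ESC % 8
LEAVE_UTF8 = ESC + b"%@"       # ESC % @


def utf8_mode_sequence(enable):
    """Return the byte sequence that enters or leaves UTF-8 mode."""
    return ENTER_UTF8 if enable else LEAVE_UTF8
```

A program would write the returned bytes to the console, for example with
`sys.stdout.buffer.write(utf8_mode_sequence(True))`.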

> http://www.tldp.org/LDP/LG/issue91/loozzr.html
> http://man7.org/linux/man-pages/man4/console_codes.4.html
> that seems to be just the tip of an iceberg.  Or maybe the
> issue is easier than I envisioned.

Both, I hope. :-)

> Suppose we only wanted to use this feature for UTF-8 locales.
> Assuming that the OS takes care of putting the console in UTF-8 mode,
> we don't need any changes in Emacs, since Emacs already sends UTF-8
> sequences to the screen driver.  As the Linux console only supports
> the BMP, we could then simply amend the code of char-displayable-p to
> check whether a character is within the BMP, when the terminal is the
> Linux console.  Do you agree with this conclusion?

No, because a character is displayable only if it's in that set of 
at-most-512 characters.
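
The difference between the two tests can be shown with a short Python
sketch.  The glyph set here is invented for the example; on a real console
it would come from the kernel.

```python
def in_bmp(char):
    """True if CHAR lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(char) <= 0xFFFF


def console_can_display(char, console_glyphs):
    """True only if CHAR is in the console's current glyph set."""
    return ord(char) in console_glyphs


# Hypothetical glyph set reported for the loaded console font.
console_glyphs = {0x0041, 0x00E9, 0x2022}

cjk = "\u4e2d"
bmp_says = in_bmp(cjk)                                 # inside the BMP
glyphs_say = console_can_display(cjk, console_glyphs)  # but no glyph loaded
```

Here the BMP test accepts U+4E2D while the glyph-set test rejects it, which
is why a BMP check alone would overstate what the console can show.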

> OTOH, now I'm not sure I understand the need for terminal_glyph_code.
> What does it do that a simple check for a Linux console and UTF-8
> terminal encoding, plus a character being inside the BMP, doesn't?

terminal_glyph_code gets the current set of at-most-512 displayable 
characters from the kernel.

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-14  3:01               ` Kevin Ryde
@ 2017-04-14 18:59                 ` Paul Eggert
  0 siblings, 0 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-14 18:59 UTC (permalink / raw)
  To: Kevin Ryde, 26396

On 04/13/2017 08:01 PM, Kevin Ryde wrote:
> Would char-displayable-p always check encodability and then
> ponder what further the kernel ioctls can say about viewable glyphs?

Yes, that's what char-displayable-p does now: it checks that the 
character is one of the at-most-512 characters that the Linux console 
can currently display. For Linux consoles this is independent of 
terminal-coding-system. The problem is that other parts of the code obey 
terminal-coding-system instead of what the Linux console can actually 
display.

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-13 22:18             ` Paul Eggert
@ 2017-04-14 19:48               ` Richard Stallman
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2017-04-14 19:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Does Emacs always know for certain?
  > If Emacs is attached directly to a console, yes. There's already some 
  > code that does this, by using Linux-specific syscalls.

  > > What if it is talking to that console via ssh?

If we can provide the data thru terminfo
then the functionality will work even thru ssh.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.






* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-14 18:56                 ` Paul Eggert
@ 2017-04-15  8:48                   ` Eli Zaretskii
  2017-04-15 21:12                     ` Paul Eggert
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-15  8:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> > Suppose we only wanted to use this feature for UTF-8 locales.
> > Assuming that the OS takes care of putting the console in UTF-8 mode,
> > we don't need any changes in Emacs, since Emacs already sends UTF-8
> > sequences to the screen driver.  As the Linux console only supports
> > the BMP, we could then simply amend the code of char-displayable-p to
> > check whether a character is within the BMP, when the terminal is the
> > Linux console.  Do you agree with this conclusion?
> 
> No, because a character is displayable only if it's in that set of 
> at-most-512 characters.
> 
> > OTOH, now I'm not sure I understand the need for terminal_glyph_code.
> > What does it do that a simple check for a Linux console and UTF-8
> > terminal encoding, plus a character being inside the BMP, doesn't?
> 
> > terminal_glyph_code gets the current set of at-most-512 displayable 
> > characters from the kernel.

Right, I missed the 512-character part.  Quite a limitation, btw.

So, the plan seems to be this:

  . make sure the terminal is in Unicode mode, and that the user
    didn't override by a call to set-terminal-coding-system
  . if a character has a glyph in the Unicode font, send a UTF-8
    encoding for the character to the screen, disregarding the
    terminal encoding as mandated by the locale
  . if the character doesn't have a glyph in the console font, treat
    it as glyphless
  . if the conditions in the first item above are not met, fall back
    to the current code which encodes using the terminal encoding
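
The four items above can be sketched as one decision function.  This is a
Python model under stated assumptions, not a proposed patch: every name is
hypothetical, and `errors="replace"` merely stands in for whatever the
existing terminal-encoding path does with unencodable characters.

```python
def console_output(char, unicode_mode, user_overrode_coding,
                   glyph_table, terminal_coding):
    """Return (kind, payload) for CHAR following the four-item plan above."""
    if not unicode_mode or user_overrode_coding:
        # Item 4: fall back to encoding with the terminal coding system.
        return ("terminal-coding",
                char.encode(terminal_coding, errors="replace"))
    if ord(char) in glyph_table:
        # Item 2: a glyph exists, so send UTF-8 regardless of the locale.
        return ("utf-8", char.encode("utf-8"))
    # Item 3: no glyph in the console font, so treat CHAR as glyphless.
    return ("glyphless", None)


glyphs = {0x2022}   # hypothetical console font contents
```

For example, with the assumed glyph set, U+2022 is sent as UTF-8 when the
conditions of the first item hold, and goes through the terminal coding
system when the user has overridden it.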

I notice that we don't use terminal_glyph_code when determining
whether a given character should be treated as glyphless, so I guess
that means we could produce something other than what
glyphless-char-display says for a given character; this should be
fixed.

Also, the above means set-locale-environment should not call
set-terminal-coding-system if the display is a Linux console that
supports this feature.

Is that right?





* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-15  8:48                   ` Eli Zaretskii
@ 2017-04-15 21:12                     ` Paul Eggert
  2017-04-16  5:59                       ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-15 21:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

Eli Zaretskii wrote:
> So, the plan seems to be this:
>
>   . make sure the terminal is in Unicode mode,

I don't think we need to worry about whether the console is in UTF-8 mode. UTF-8 
mode has been the default for years, and unless the user goes to a good deal of 
trouble (and I suspect this part of the Linux kernel hasn't been tested 
recently) we can assume UTF-8 mode.

There is a subtlety here. The console can be in UTF-8 mode for input ('stty 
iutf8' vs 'stty -iutf8'), but that's not what we're concerned about: we're 
concerned whether it's in UTF-8 mode for output. I don't see how the user can 
affect the latter other than by outputting ESC % @ and ESC % G. And I just now 
tried outputting these sequences to my Linux console but they didn't seem to 
affect anything. Without the ability to test this stuff and with no real need to 
worry about it that I can see, I suggest that we just assume UTF-8 mode.

>and that the user
>     didn't override by a call to set-terminal-coding-system

It might be simpler to not worry about this, under the argument that the Linux 
console is not a terminal in the usual sense. Certainly set-locale-environment 
should not override the fact that Emacs is connected to a Linux console. (You 
mention this below.)

>   . if a character has a glyph in the Unicode font, send a UTF-8
>     encoding for the character to the screen, disregarding the
>     terminal encoding as mandated by the locale
>   . if the character doesn't have a glyph in the console font, treat
>     it as glyphless

This sounds right.

>   . if the conditions in the first item above are not met, fall back
>     to the current code which encodes using the terminal encoding

As I mentioned above, perhaps we should not worry about those conditions and 
therefore not worry about falling back to the current code.

> I notice that we don't use terminal_glyph_code when determining
> whether a given character should be treated as glyphless, so I guess
> that means we could produce something other than what
> glyphless-char-display says for a given character; this should be
> fixed.

Sorry, I am not quite following this, but yes the various parts of Emacs should 
be consistent in this area.

> Also, the above means set-locale-environment should not call
> set-terminal-coding-system if the display is a Linux console that
> supports this feature.

This matters only if we worry about the terminal coding system in Linux 
consoles, which it isn't clear we should do.

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-15 21:12                     ` Paul Eggert
@ 2017-04-16  5:59                       ` Eli Zaretskii
  2017-04-16 20:25                         ` Paul Eggert
  2017-04-17  3:00                         ` Kevin Ryde
  0 siblings, 2 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-16  5:59 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 15 Apr 2017 14:12:40 -0700
> 
> Eli Zaretskii wrote:
> > So, the plan seems to be this:
> >
> >   . make sure the terminal is in Unicode mode,
> 
> I don't think we need to worry about whether the console is in UTF-8 mode. UTF-8 
> mode has been the default for years, and unless the user goes to a good deal of 
> trouble (and I suspect this part of the Linux kernel hasn't been tested 
> recently) we can assume UTF-8 mode.
> 
> There is a subtlety here. The console can be in UTF-8 mode for input ('stty 
> iutf8' vs 'stty -iutf8'), but that's not what we're concerned about: we're 
> concerned whether it's in UTF-8 mode for output. I don't see how the user can 
> affect the latter other than by outputting ESC % @ and ESC % G. And I just now 
> tried outputting these sequences to my Linux console but they didn't seem to 
> affect anything. Without the ability to test this stuff and with no real need to 
> worry about it that I can see, I suggest that we just assume UTF-8 mode.

That depends on how easy it is to check whether the console is in
UTF-8 mode.  Isn't that just another ioctl?  If doing so is not too
hard, I'd prefer to include such a test.  Users IME are likely to find
and (ab)use any dark corner they have at their disposal, and I'd
prefer to have a sound solution rather than leave subtle bugs that
wait to be reported.

> >and that the user
> >     didn't override by a call to set-terminal-coding-system
> 
> It might be simpler to not worry about this, under the argument that the Linux 
> console is not a terminal in the usual sense.

Once again, checking this is easy, and I'd prefer that Emacs didn't
get in users' ways of doing what they want.  We've heard over the
years from several users who make a point of using non UTF-8 locales,
and I expect them to have reasons for that.  I wouldn't want us to
break their configurations.

> > I notice that we don't use terminal_glyph_code when determining
> > whether a given character should be treated as glyphless, so I guess
> > that means we could produce something other than what
> > glyphless-char-display says for a given character; this should be
> > fixed.
> 
> Sorry, I am not quite following this, but yes the various parts of Emacs should 
> be consistent in this area.

Glyphless characters are those that cannot be displayed.  On GUI
frames, we determine that by looking up the character in the available
fonts; if none is available, we display the character as determined by
glyphless-char-display.  On TTY frames, we do it differently, and the
way we do it doesn't currently consult the char-table created by
calculate_glyph_code_table.  I'm saying that we should, because
otherwise we let certain characters be displayed with the console's
replacement glyph instead of the way mandated by glyphless-char-display.

> > Also, the above means set-locale-environment should not call
> > set-terminal-coding-system if the display is a Linux console that
> > supports this feature.
> 
> This matters only if we worry about the terminal coding system in Linux 
> consoles, which it isn't clear we should do.

I think we should.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-16  5:59                       ` Eli Zaretskii
@ 2017-04-16 20:25                         ` Paul Eggert
  2017-04-17  6:19                           ` Eli Zaretskii
  2017-04-17  3:00                         ` Kevin Ryde
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-16 20:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

Eli Zaretskii wrote:

> That depends on how easy it is to check whether the console is in
> UTF-8 mode.  Isn't that just another ioctl?

Not as far as I know, for output mode. I looked for one and could not find it.

>>> and that the user
>>>     didn't override by a call to set-terminal-coding-system
>>
>> It might be simpler to not worry about this, under the argument that the Linux
>> console is not a terminal in the usual sense.
>
> Once again, checking this is easy

I don't see offhand how to distinguish a user's call to 
set-terminal-coding-system from one that Emacs does internally as part of its 
existing heuristics. Plus, even if the user invokes set-terminal-coding-system, 
when the Linux console is in UTF-8 mode (as it invariably is these days) Emacs 
will do the wrong thing if it blindly follows the user's setting.

> We've heard over the
> years from several users who make a point of using non UTF-8 locales,

On Linux consoles? Who does that nowadays?

CJK locales have never worked on the Linux console, so the only concerns here 
are ISO 8859 Latin and Cyrillic consoles, that sort of thing. Generally 
speaking, the rare people who care about Linux console encoding and want to use 
non-ASCII characters on their Linux consoles, switched from 8-bit locales to 
UTF-8 long ago: the code was added to Linux in 2007 and UTF-8 mode was made the 
default, and users took the usual one to three years to switch. So this is all 
ancient history now by GNU/Linux standards. It's not clear that we can even test 
the old 8-bit mode any more; it didn't work on my Fedora 25 Linux console when I 
tried. It's a waste of time to write code that isn't needed and can't be tested.

> Glyphless characters are those that cannot be displayed.  On GUI
> frames, we determine that by looking up the character in the available
> fonts; if none is available, we display the character as determined by
> glyphless-char-display.  On TTY frames, we do it differently, and the
> way we do it doesn't currently consult the char-table created by
> calculate_glyph_code_table.  I'm saying that we should

Yes, exactly. A frame connected to a Linux console should act like a GUI frame 
not an ordinary tty frame, because we know which characters the console can 
display and we don't have to resort to guesswork and user settings like we do 
with an ordinary tty frame.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-16  5:59                       ` Eli Zaretskii
  2017-04-16 20:25                         ` Paul Eggert
@ 2017-04-17  3:00                         ` Kevin Ryde
  2017-04-17  3:26                           ` Paul Eggert
  1 sibling, 1 reply; 35+ messages in thread
From: Kevin Ryde @ 2017-04-17  3:00 UTC (permalink / raw)
  To: 26396; +Cc: Paul Eggert

Eli Zaretskii <eliz@gnu.org> writes:
>
> We've heard over the
> years from several users who make a point of using non UTF-8 locales,
> and I expect them to have reasons for that.  I wouldn't want us to
> break their configurations.

Debian has an easy line in /etc/default/console-setup.  The default is
now utf8 but there's lots more.  I have it latin1.

Paul Eggert <eggert@cs.ucla.edu> writes:
>
> ... The problem is that other parts of the code
> obey terminal-coding-system instead of what the Linux console can
> actually display.

I think I'd like char-displayable-p to obey terminal-coding-system too,
like those parts, since those are the bytes which are sent out.

I imagine making minimal enquiries into the console settings would
both need less thinking in emacs and better give the user the setup
they chose.  Doing stuff to change the mode doesn't sound good.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  3:00                         ` Kevin Ryde
@ 2017-04-17  3:26                           ` Paul Eggert
  2017-04-17  5:56                             ` Paul Eggert
  2017-04-17  6:24                             ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-17  3:26 UTC (permalink / raw)
  To: Kevin Ryde, 26396

Kevin Ryde wrote:

> Debian has an easy line in /etc/default/console-setup.  The default is
> now utf8 but there's lots more.  I have it latin1.

In that case I stand corrected: some people are still using non-UTF-8 Linux 
consoles. I don't know of any convenient programmatic way for Emacs to determine 
whether the console is in UTF-8 output mode, though. (I can think of complicated 
ways, involving outputting bytes to the screen and seeing what happens to the 
cursor position; but this would be destructive to the screen contents.) I agree 
Emacs shouldn't be changing the mode.
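
To illustrate the cursor-position idea mentioned above, here is a rough sketch. It is not something Emacs does, and the names are hypothetical. It assumes a probe has written "\r\xC3\xA9" (the UTF-8 bytes for U+00E9, both of which are printable in Latin-1) followed by the cursor position request ESC [ 6 n, and it only classifies the terminal's reply; it does not touch the screen itself.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch.  */
enum console_mode { MODE_UNKNOWN, MODE_UTF8, MODE_UNIBYTE };

/* Parse a cursor position report of the form ESC [ ROW ; COL R.
   Return the 1-based column, or -1 if the report is malformed.  */
static int
parse_cpr_column (const char *report)
{
  int row, col;
  char end;
  if (sscanf (report, "\033[%d;%d%c", &row, &col, &end) != 3 || end != 'R')
    return -1;
  return col;
}

/* After "\r\xC3\xA9": in UTF-8 mode the two bytes form one character,
   so the cursor ends in column 2; in an 8-bit mode each byte is its
   own character and the cursor ends in column 3.  */
static enum console_mode
classify_probe (const char *report)
{
  switch (parse_cpr_column (report))
    {
    case 2: return MODE_UTF8;
    case 3: return MODE_UNIBYTE;
    default: return MODE_UNKNOWN;
    }
}
```

As noted, actually running such a probe overwrites whatever was on that screen line, which is the destructive part.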

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  3:26                           ` Paul Eggert
@ 2017-04-17  5:56                             ` Paul Eggert
  2017-04-17  7:33                               ` Eli Zaretskii
  2017-04-17  6:24                             ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-17  5:56 UTC (permalink / raw)
  To: Kevin Ryde, 26396

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

Paul Eggert wrote:
> In that case I stand corrected: some people are still using non-UTF-8 Linux
> consoles.

I installed the attached patch into master to try to work around the problem 
that prompted the original bug report. This patch assumes that the terminal 
coding system is compatible with the Linux console output mode (either UTF-8, or 
unibyte), which I hope is good enough, as anybody whose locale is incompatible 
with the output mode will have lots of other problems anyway. Please give the 
patch a try.

[-- Attachment #2: 0001-Work-around-bug-with-unibyte-Linux-consoles.patch --]
[-- Type: text/x-diff, Size: 997 bytes --]

From 746e0bb2fc148cdb96bdde75e810dd5ce446e3a4 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 16 Apr 2017 22:50:02 -0700
Subject: [PATCH] Work around bug with unibyte Linux consoles

* src/terminal.c (terminal_glyph_code): Skip the UTF-8 stuff if
the terminal's coding system is unibyte (Bug#26396).
---
 src/terminal.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/terminal.c b/src/terminal.c
index 0b1cbe7..3d25b36 100644
--- a/src/terminal.c
+++ b/src/terminal.c
@@ -575,7 +575,9 @@ Lisp_Object
 terminal_glyph_code (struct terminal *t, int ch)
 {
 #if HAVE_STRUCT_UNIPAIR_UNICODE
-  if (t->type == output_termcap)
+  /* Heuristically assume that a terminal supporting glyph codes is in
+     UTF-8 mode if and only if its coding system is multibyte (Bug#26396).  */
+  if (t->type == output_termcap && t->terminal_coding->src_multibyte)
     {
       /* As a hack, recompute the table when CH is the maximum
 	 character.  */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-16 20:25                         ` Paul Eggert
@ 2017-04-17  6:19                           ` Eli Zaretskii
  0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-17  6:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 16 Apr 2017 13:25:42 -0700
> 
> Eli Zaretskii wrote:
> 
> > That depends on how easy it is to check whether the console is in
> > UTF-8 mode.  Isn't that just another ioctl?
> 
> Not as far as I know, for output mode. I looked for one and could not find it.

Then perhaps we should intuit that from terminal-coding-system.

> >>> and that the user
> >>>     didn't override by a call to set-terminal-coding-system
> >>
> >> It might be simpler to not worry about this, under the argument that the Linux
> >> console is not a terminal in the usual sense.
> >
> > Once again, checking this is easy
> 
> I don't see offhand how to distinguish a user's call to 
> set-terminal-coding-system from one that Emacs does internally as part of its 
> existing heuristics.

There are only a handful of such calls.  What I had in mind was to
review our heuristics and avoid calling set-terminal-coding-system
there.  Failing that, we could put a property on, say, the terminal
object to indicate whether the setting is by user or by Emacs.

> Plus, even if the user invokes set-terminal-coding-system, 
> when the Linux console is in UTF-8 mode (as it invariably is these days) Emacs 
> will do the wrong thing if it blindly follows the user's setting.

Sorry, I don't follow: what wrong thing will Emacs do in this case?
Did you mean the user invoked set-terminal-coding-system with UTF-8 as
argument or with some other encoding?

> > We've heard over the
> > years from several users who make a point of using non UTF-8 locales,
> 
> On Linux consoles? Who does that nowadays?

I think Alan does, and we now know that Kevin does as well.  Plus,
term/linux.el does this:

  (defun terminal-init-linux ()
    "Terminal initialization function for linux."
    (unless (terminal-coding-system)
      (set-terminal-coding-system 'iso-latin-1))

which I guess needs to be revised?

> CJK locales have never worked on the Linux console, so the only concerns here 
> are ISO 8859 Latin and Cyrillic consoles, that sort of thing.

What about ISO 8859-8 or ISO 8859-6?

> Generally 
> speaking, the rare people who care about Linux console encoding and want to use 
> non-ASCII characters on their Linux consoles, switched from 8-bit locales to 
> UTF-8 long ago: the code was added to Linux in 2007 and UTF-8 mode was made the 
> default, and users took the usual one to three years to switch. So this is all 
> ancient history now by GNU/Linux standards. It's not clear that we can even test 
> the old 8-bit mode any more; it didn't work on my Fedora 25 Linux console when I 
> tried. It's a waste of time to write code that isn't needed and can't be tested.

I see your point, but I think we've been burnt by such decisions in
the past, so I'd prefer to leave a fire escape for users who for some
reasons don't follow the above patterns.  And the code that supports
the 8-bit mode is already written, we just need to leave it in place
when some conditions aren't satisfied.

> > Glyphless characters are those that cannot be displayed.  On GUI
> > frames, we determine that by looking up the character in the available
> > fonts; if none is available, we display the character as determined by
> > glyphless-char-display.  On TTY frames, we do it differently, and the
> > way we do it doesn't currently consult the char-table created by
> > calculate_glyph_code_table.  I'm saying that we should
> 
> Yes, exactly. A frame connected to a Linux console should act like a GUI frame 
> not an ordinary tty frame, because we know which characters the console can 
> display and we don't have to resort to guesswork and user settings like we do 
> with an ordinary tty frame.

Well, actually a TTY frame also pretends to know which characters
the console can display, it just does a less than perfect job in the
case of a Linux console.  So I think most of the code to handle this
can be left intact, we just need to reference the char-table created
by calculate_glyph_code_table when appropriate.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  3:26                           ` Paul Eggert
  2017-04-17  5:56                             ` Paul Eggert
@ 2017-04-17  6:24                             ` Eli Zaretskii
  2017-04-17  6:41                               ` Paul Eggert
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-17  6:24 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 16 Apr 2017 20:26:01 -0700
> 
> Kevin Ryde wrote:
> 
> > Debian has an easy line in /etc/default/console-setup.  The default is
> > now utf8 but there's lots more.  I have it latin1.
> 
> In that case I stand corrected: some people are still using non-UTF-8 Linux 
> consoles. I don't know of any convenient programmatic way for Emacs to determine 
> whether the console is in UTF-8 output mode, though. (I can think of complicated 
> ways, involving outputting bytes to the screen and seeing what happens to the 
> cursor position; but this would be destructive to the screen contents.) I agree 
> Emacs shouldn't be changing the mode.

I agree.

Regarding testing it, the simplest way is to provide a user variable
which will tell Emacs about the console's mode, and ask users who
don't want the UTF-8 mode to set that variable to that effect.  Or
maybe my idea about using terminal-coding-system as an indication of
that could be workable, in which case it's a better alternative.
WDYT?

I can offer help in reviewing the patches and perhaps also writing
some of that, but I cannot test the code, as I don't have a convenient
access to a Linux console where I could run Emacs I build.

TIA

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  6:24                             ` Eli Zaretskii
@ 2017-04-17  6:41                               ` Paul Eggert
  2017-04-17  7:27                                 ` Kevin Ryde
  2017-04-17  8:08                                 ` Eli Zaretskii
  0 siblings, 2 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-17  6:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

Eli Zaretskii wrote:
> I can offer help in reviewing the patches and perhaps also writing
> some of that, but I cannot test the code, as I don't have a convenient
> access to a Linux console where I could run Emacs I build.

Rather than descend into this swamp I am hoping that the patch I installed is 
enough to solve Kevin's problem. The basic idea is to assume that the terminal's 
coding system (as per Emacs) is consistent with the Linux console UTF-8 output 
mode (as per the Linux kernel), i.e., it's the user's responsibility to get all 
the settings right.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  6:41                               ` Paul Eggert
@ 2017-04-17  7:27                                 ` Kevin Ryde
  2017-04-17  8:08                                 ` Eli Zaretskii
  1 sibling, 0 replies; 35+ messages in thread
From: Kevin Ryde @ 2017-04-17  7:27 UTC (permalink / raw)
  To: 26396; +Cc: Paul Eggert

Paul Eggert <eggert@cs.ucla.edu> writes:
>
> Rather than descend into this swamp ...

I don't want to go in a swamp :).  Behaving as much like a plain tty as
possible is fine.

> I am hoping that the patch I
> installed is enough to solve Kevin's problem.

So GIO_UNIMAP internal-font-char case applies when a multibyte
terminal-coding-system ... that being presumed to be utf8 ...
and any unibyte terminal-coding-system goes down the encodeable
case.  Sounds good.

> ...
> assume that the terminal's coding system (as per Emacs) is consistent
> with the Linux console UTF-8 output mode (as per the Linux kernel),
> i.e., it's the user's responsibility to get all the settings right.

Yep, beaut.
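
The two paths summarized above can be sketched as follows. This is a simplified model, not the code in src/terminal.c: struct pair only mirrors the (unicode, fontpos) pairs that GIO_UNIMAP returns, and the function names and the use of Latin-1 as the unibyte example are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the (unicode, fontpos) pairs returned by the GIO_UNIMAP
   ioctl; defined locally so the sketch stays self-contained.  */
struct pair { unsigned short unicode; unsigned short fontpos; };

/* UTF-8 path: a character is displayable only if the console font
   has a glyph for it.  Return the font position, or -1 if none.  */
static int
font_position (const struct pair *map, size_t n, int ch)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].unicode == ch)
      return map[i].fontpos;
  return -1;
}

/* Unibyte path, with Latin-1 as the example coding system: a
   character is displayable only if the coding system can encode it.  */
static bool
latin1_encodable (int ch)
{
  return 0 <= ch && ch <= 0xFF;
}

/* Choose the path from the terminal coding system.  */
static bool
displayable (bool coding_is_utf8, const struct pair *map, size_t n, int ch)
{
  return (coding_is_utf8
          ? font_position (map, n, ch) >= 0
          : latin1_encodable (ch));
}
```

With a Latin-1 terminal coding system, U+2022 from the original report comes out as not displayable in this model, whatever the console font contains. Because the pairs hold 16-bit code points, characters beyond the BMP never match on the UTF-8 path.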





^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  5:56                             ` Paul Eggert
@ 2017-04-17  7:33                               ` Eli Zaretskii
  2017-04-17 17:22                                 ` Paul Eggert
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-17  7:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 16 Apr 2017 22:56:08 -0700
> 
> +  if (t->type == output_termcap && t->terminal_coding->src_multibyte)

Hmm... I'm not sure this is the right condition.  In particular,
set-terminal-coding-system-internal always sets the src_multibyte
flag, for any terminal encoding.  Shouldn't we be looking at ->encoder
instead?  See setup_coding_system for how this is set up for UTF-8.

Thanks.
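
To make the distinction concrete, here is a toy model. The struct and the encoder functions are stand-ins invented for illustration, not Emacs's struct coding_system. The point is only that a flag which is set for every terminal encoding cannot discriminate between encodings, while comparing the encoder function pointer can.

```c
#include <stdbool.h>

struct toy_coding;
typedef bool (*toy_encoder) (struct toy_coding *);

/* Stand-in for a coding system: a flag plus an encoder routine.  */
struct toy_coding
{
  bool src_multibyte;
  toy_encoder encoder;
};

/* Dummy encoders; only their addresses matter in this sketch.  */
static bool toy_encode_utf_8 (struct toy_coding *c) { (void) c; return true; }
static bool toy_encode_latin_1 (struct toy_coding *c) { (void) c; return false; }

/* Model of setting the terminal coding system: the flag is set
   regardless of which encoding was requested.  */
static void
toy_set_terminal_coding (struct toy_coding *c, toy_encoder enc)
{
  c->src_multibyte = true;
  c->encoder = enc;
}

/* Flag-based test: true for every terminal coding system.  */
static bool
looks_utf8_by_flag (const struct toy_coding *c)
{
  return c->src_multibyte;
}

/* Encoder-based test: true only when the UTF-8 encoder is installed.  */
static bool
looks_utf8_by_encoder (const struct toy_coding *c)
{
  return c->encoder == toy_encode_utf_8;
}
```

In this model a Latin-1 terminal passes the flag test but fails the encoder test, which is the behavior the condition is after.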

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  6:41                               ` Paul Eggert
  2017-04-17  7:27                                 ` Kevin Ryde
@ 2017-04-17  8:08                                 ` Eli Zaretskii
  2017-04-17 18:08                                   ` Paul Eggert
  1 sibling, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-17  8:08 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 16 Apr 2017 23:41:33 -0700
> 
> Eli Zaretskii wrote:
> > I can offer help in reviewing the patches and perhaps also writing
> > some of that, but I cannot test the code, as I don't have a convenient
> > access to a Linux console where I could run Emacs I build.
> 
> Rather than descend into this swamp I am hoping that the patch I installed is 
> enough to solve Kevin's problem.

I thought we were discussing a broader issue: how to make Emacs work
better on a Linux console, not just how to fix char-displayable-p.

At the very least, I think we should teach Emacs to call
terminal_glyph_code when it decides whether a given character should
be displayed as glyphless or not.  E.g., what do you get on a Linux
console when trying to output a character beyond the BMP, like U+17001
or U+1F800?  Do you get the expected \uNNNNN representation?  And what
does Emacs display for characters from the BMP that are not supported
by the console's font?

My reading of the code is that at least some of these unsupported
characters will NOT be displayed as \uNNNNN, but rather as some
fallback glyph produced by the console itself, which is not what we
want, I think.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  7:33                               ` Eli Zaretskii
@ 2017-04-17 17:22                                 ` Paul Eggert
  0 siblings, 0 replies; 35+ messages in thread
From: Paul Eggert @ 2017-04-17 17:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

[-- Attachment #1: Type: text/plain, Size: 147 bytes --]

On 04/17/2017 12:33 AM, Eli Zaretskii wrote:
> Shouldn't we be looking at ->encoder instead?

Sure, that's easy enough. I installed the attached.


[-- Attachment #2: 0001-Tighten-recently-added-UTF-8-check.patch --]
[-- Type: text/x-patch, Size: 2014 bytes --]

From 3c9227324f43a2e331dbbfcd2f41c8d49e3f3d6b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@Penguin.CS.UCLA.EDU>
Date: Mon, 17 Apr 2017 10:19:39 -0700
Subject: [PATCH] Tighten recently-added UTF-8 check

* src/coding.c (encode_coding_utf_8): Now extern.
* src/terminal.c (terminal_glyph_code) [HAVE_STRUCT_UNIPAIR_UNICODE]:
Check for UTF-8, not just for multibyte.
---
 src/coding.c   | 2 +-
 src/coding.h   | 1 +
 src/terminal.c | 5 +++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/coding.c b/src/coding.c
index e341a71..367a975 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -1449,7 +1449,7 @@ decode_coding_utf_8 (struct coding_system *coding)
 }
 
 
-static bool
+bool
 encode_coding_utf_8 (struct coding_system *coding)
 {
   bool multibytep = coding->dst_multibyte;
diff --git a/src/coding.h b/src/coding.h
index 77f90ec..8ed851d 100644
--- a/src/coding.h
+++ b/src/coding.h
@@ -664,6 +664,7 @@ struct coding_system
 
 /* Extern declarations.  */
 extern Lisp_Object code_conversion_save (bool, bool);
+extern bool encode_coding_utf_8 (struct coding_system *);
 extern void setup_coding_system (Lisp_Object, struct coding_system *);
 extern Lisp_Object coding_charset_list (struct coding_system *);
 extern Lisp_Object coding_system_charset_list (Lisp_Object);
diff --git a/src/terminal.c b/src/terminal.c
index 3d25b36..367f2ac 100644
--- a/src/terminal.c
+++ b/src/terminal.c
@@ -576,8 +576,9 @@ terminal_glyph_code (struct terminal *t, int ch)
 {
 #if HAVE_STRUCT_UNIPAIR_UNICODE
   /* Heuristically assume that a terminal supporting glyph codes is in
-     UTF-8 mode if and only if its coding system is multibyte (Bug#26396).  */
-  if (t->type == output_termcap && t->terminal_coding->src_multibyte)
+     UTF-8 mode if and only if its coding system is UTF-8 (Bug#26396).  */
+  if (t->type == output_termcap
+      && t->terminal_coding->encoder == encode_coding_utf_8)
     {
       /* As a hack, recompute the table when CH is the maximum
 	 character.  */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17  8:08                                 ` Eli Zaretskii
@ 2017-04-17 18:08                                   ` Paul Eggert
  2017-04-17 18:32                                     ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-17 18:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

On 04/17/2017 01:08 AM, Eli Zaretskii wrote:

> My reading of the code is that at least some of these unsupported
> characters will NOT be displayed as \uNNNNN, but rather as some
> fallback glyph produced by the console itself, which is not what we
> want, I think.
Yes, that's what happens. It's not ideal, and perhaps it could be 
improved. (I hope by someone else....)

There are similar display problems even in unibyte mode on the Linux 
console. Sometimes a character above U+00FF  is displayed as '\uNNNN', 
sometimes as '?', sometimes the same character is displayed in different 
forms depending on what else is in the buffer, and I don't know why. 
(And likewise, I don't want to spend time worrying about this, as the 
1990s are long gone....)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17 18:08                                   ` Paul Eggert
@ 2017-04-17 18:32                                     ` Eli Zaretskii
  2017-04-18 17:49                                       ` Paul Eggert
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-17 18:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 17 Apr 2017 11:08:09 -0700
> 
> On 04/17/2017 01:08 AM, Eli Zaretskii wrote:
> 
> > My reading of the code is that at least some of these unsupported
> > characters will NOT be displayed as \uNNNNN, but rather as some
> > fallback glyph produced by the console itself, which is not what we
> > want, I think.
> Yes, that's what happens. It's not ideal, and perhaps it could be 
> improved. (I hope by someone else....)

Let's hope.

> There are similar display problems even in unibyte mode on the Linux 
> console. Sometimes a character above U+00FF  is displayed as '\uNNNN', 
> sometimes as '?', sometimes the same character is displayed in different 
> forms depending on what else is in the buffer, and I don't know why. 
> (And likewise, I don't want to spend time worrying about this, as the 
> 1990s are long gone....)

Yes, the TTY code that handles such characters has some very weird
logic.

Can you show an example of a character displayed in different forms
depending on buffer contents?  I'd like to look at what the code does
and why.

Thanks.





^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-17 18:32                                     ` Eli Zaretskii
@ 2017-04-18 17:49                                       ` Paul Eggert
  2017-04-18 18:19                                         ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Eggert @ 2017-04-18 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: user42_kevin, 26396

On 04/17/2017 11:32 AM, Eli Zaretskii wrote:
> Can you show an example of a character displayed in different forms
> depending on buffer contents?  I'd like to look at what the code does and why.

In master on the Linux console in non-UTF-8 mode and with a unibyte 
en_US locale, if I run 'emacs -Q' and type 'C-x 8 RET 100 RET C-x 8 RET 
200 RET' the screen looks like this:

\u0100\u0200

If I then type 'C-x 8 RET 300 RET', the '\u0200' magically changes to 
'?' and another '?' is appended, so that the screen then looks like this:

\u0100??

Presumably this is some sort of combining-character thing. However, if 
the intent is to present a combined character, shouldn't the character 
be displayed as a single '?', to better mimic the single glyph you'd see 
on an X display?

By the way, the '?'s look like ordinary question marks; they are not 
highlighted, as the \u0100 is. Shouldn't they be highlighted somehow? 
And while I have your ear, why is U+0700 SYRIAC END OF PARAGRAPH 
displayed as an ordinary '?' while U+0500 CYRILLIC CAPITAL LETTER KOMI 
DE is displayed as a highlighted '\u0500'?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* bug#26396: 25.1; char-displayable-p on a latin1 tty
  2017-04-18 17:49                                       ` Paul Eggert
@ 2017-04-18 18:19                                         ` Eli Zaretskii
  0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2017-04-18 18:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: user42_kevin, 26396

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 18 Apr 2017 10:49:37 -0700
> 
> On 04/17/2017 11:32 AM, Eli Zaretskii wrote:
> > Can you show an example of a character displayed in different forms
> > depending on buffer contents?  I'd like to look at what the code does and why.
> 
> In master on the Linux console in non-UTF-8 mode and with a unibyte 
> en_US locale, if I run 'emacs -Q' and type 'C-x 8 RET 100 RET C-x 8 RET 
> 200 RET' the screen looks like this:
> 
> \u0100\u0200
> 
> If I then type 'C-x 8 RET 300 RET', the '\u0200' magically changes to 
> '?' and another '?' is appended, so that the screen then looks like this:
> 
> \u0100??

Yes, I see that, too.

> Presumably this is some sort of combining-character thing.

Yes.  Try "C-u C-x =" on the first '?', and you will see.  Or type
"M-x auto-composition-mode RET" to disable composition and get your
original characters back.

> However, if the intent is to present a combined character, shouldn't
> the character be displayed as a single '?', to better mimic the
> single glyph you'd see on an X display?

It probably should (if we want at all to allow compositions on text
terminals, which is questionable on non UTF-8 TTYs).

> By the way, the '?'s look like ordinary question marks; they are not 
> highlighted, as the \u0100 is. Shouldn't they be highlighted somehow? 

AFAIU, the '?' should not appear at all, as glyphless-char-display
specifies hex codes for those codepoints.  This is one of the
manifestations of the fact that glyphless-char-display doesn't work
correctly on TTY frames.  This code from term.c:

  else
    {
      Lisp_Object charset_list = FRAME_TERMINAL (it->f)->charset_list;

      if (char_charset (it->char_to_display, charset_list, NULL))
	{
	  it->pixel_width = CHARACTER_WIDTH (it->char_to_display);
	  it->nglyphs = it->pixel_width;
	  if (it->glyph_row)
	    append_glyph (it);
	}
      else
	{
	  Lisp_Object acronym = lookup_glyphless_char_display (-1, it);

	  eassert (it->what == IT_GLYPHLESS);
	  produce_glyphless_glyph (it, acronym);
	}
    }

is weird, because the test in char_charset, which controls how such
characters will be displayed, makes little sense to me.  The idea was
to see if the character belongs to one of the charsets supported by
the terminal, but in practice this doesn't work.

> And while I have your ear, why is U+0700 SYRIAC END OF PARAGRAPH 
> displayed as an ordinary '?' while U+0500 CYRILLIC CAPITAL LETTER KOMI 
> DE is displayed as a highlighted '\u0500'?

AFAIU, that's a direct consequence of the above weird test.
Characters which fail the char_charset test are displayed via
produce_glyphless_glyph, which on a TTY produces \uNNNN, whereas
characters which pass the test are just appended verbatim to the
buffer that is then encoded by terminal-coding-system, and that
produces the question marks for unsupported characters, bypassing
glyphless-char-display.  That's the bug I'd like to fix.
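
The effect described above can be modelled outside Emacs. The sketch below is not the term.c code: in_terminal_charsets stands in for the result of the char_charset test and encodable for what the terminal coding system can actually encode. Both are plain inputs here rather than computed, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Write into BUF what a simplified TTY would show for CH.  */
static void
tty_render (char *buf, size_t size, int ch,
            bool in_terminal_charsets, bool encodable)
{
  if (!in_terminal_charsets)
    /* Glyphless path: hex code, as glyphless-char-display asks.  */
    snprintf (buf, size, "\\u%04X", (unsigned) ch);
  else if (encodable)
    /* Normal path: the character itself (ASCII only in this toy).  */
    snprintf (buf, size, "%c", ch);
  else
    /* Passed the charset test but cannot be encoded: the encoder
       substitutes '?', bypassing glyphless-char-display.  */
    snprintf (buf, size, "?");
}
```

If the reading above is right, U+0500 takes the first branch and U+0700 the last, which would account for the difference Paul observed.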

^ permalink raw reply	[flat|nested] 35+ messages in thread
