unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
@ 2009-10-18 21:44 Drew Adams
  2009-10-18 22:36 ` Andreas Schwab
  2009-10-19  2:15 ` Stefan Monnier
  0 siblings, 2 replies; 13+ messages in thread
From: Drew Adams @ 2009-10-18 21:44 UTC
  To: bug-gnu-emacs

This is probably a feature, not a bug. But I don't really see it
explained, and it confuses me, at least.
 
emacs -Q
M-: (read-char "aaa: ")
Hit `M-a'.
 
That returns an `a' char with acute accent - 225 (#o341, #xe1).
Similarly, for other `Meta' key sequences. `read-char-exclusive' does
the same thing.
 
The doc string does say:
"If the character has modifiers, they are resolved and reflected to the
character code if possible (e.g. C-SPC -> 0)."
 
That's a bit cryptic ("reflected to the char code"?), but I guess it
means this has something to do with character encodings?  How do I
control that in Elisp - how, for instance, do I make `read-char' treat
`M-a' as a non-character event? (I assume that most people don't use
`M-a' if they want to insert an `a' with acute accent.)
 
Maybe the doc (e.g. manual) could explain this.
 
Also, for the arg descriptions the Elisp manual refers to the
`read-event' doc (same node) for an explanation. But that doc says:
 
 If INHERIT-INPUT-METHOD is non-`nil', then the current input
 method (if any) is employed to make it possible to enter a
 non-ASCII character.  Otherwise, input method handling is disabled
 for reading this event.
 
I don't think that accented chars are ASCII chars. So is this doc
incorrect? It makes it sound as if you cannot enter a non-ASCII
character, but `M-a' seems to do just that (and it does so whether the
INHERIT-INPUT-METHOD arg is nil or t).
 
Anyway, call me confused. I hope this report helps clarify the doc for me and
for others who might be similarly confused.
 

In GNU Emacs 23.1.1 (i386-mingw-nt5.1.2600)
 of 2009-07-29 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (4.4)'
 








* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-18 21:44 bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier Drew Adams
@ 2009-10-18 22:36 ` Andreas Schwab
  2009-10-19  2:15 ` Stefan Monnier
  1 sibling, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 2009-10-18 22:36 UTC
  To: Drew Adams; +Cc: bug-gnu-emacs, 4751

"Drew Adams" <drew.adams@oracle.com> writes:

> emacs -Q
> M-: (read-char "aaa: ")
> Hit `M-a'.
>  
> That returns an `a' char with acute accent - 225 (#o341, #xe1).
> Similarly, for other `Meta' key sequences. `read-char-exclusive' does
> the same thing.
>  
> The doc string does say:
> "If the character has modifiers, they are resolved and reflected to the
> character code if possible (e.g. C-SPC -> 0)."
>  
> That's a bit cryptic ("reflected to the char code"?), but I guess it
> means this has something to do with character encodings?

Like "\M-a" (*note (elisp) Strings of Events::).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-18 21:44 bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier Drew Adams
  2009-10-18 22:36 ` Andreas Schwab
@ 2009-10-19  2:15 ` Stefan Monnier
  2009-10-19  6:11   ` Drew Adams
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-19  2:15 UTC
  To: Drew Adams; +Cc: 4751

> This is probably a feature, not a bug. But I don't really see it
> explained, and it confuses me, at least.
 
read-char is a very low-level function, and it has many quirks in
this respect (it doesn't perform all the keyboard decoding usually
performed for read-key-sequence).
Fixing them is generally very difficult.

That's why I introduced `read-key'.  I don't guarantee that it'll fix
your problems, but if you can try and use it and tell us of problems you
find with it, that would be helpful.


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-19  2:15 ` Stefan Monnier
@ 2009-10-19  6:11   ` Drew Adams
  2009-10-19 13:54     ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2009-10-19  6:11 UTC
  To: 'Stefan Monnier'; +Cc: 4751

> read-char is a very low-level function, and it has many quirks in
> this respect (it doesn't perform all the keyboard decoding usually
> performed for read-key-sequence).
> Fixing them is generally very difficult.
> 
> That's why I introduced `read-key'.  I don't guarantee that it'll fix
> your problems, but if you can try and use it and tell us of problems you
> find with it, that would be helpful.

Where is it? I don't see it in Emacs 23.1.

What does it do when you hit a key such as M-a that corresponds to a non-ASCII char?







* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-19  6:11   ` Drew Adams
@ 2009-10-19 13:54     ` Stefan Monnier
  2009-10-19 21:42       ` Drew Adams
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-19 13:54 UTC
  To: Drew Adams; +Cc: 4751

> Where is it? I don't see it in Emacs 23.1.

Indeed, it's newer than that.

> What does it do when you hit a key such as M-a that corresponds to
> a non-ASCII char?

It should give you the same thing as C-h c M-a gives you.
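
A rough sketch of what that means, assuming an Emacs new enough to have
`read-key' (so not 23.1). The name that C-h c shows for a key can be
reproduced from the event integer with `key-description'. The `read-key'
call needs interactive input, so it appears only as a comment:

```elisp
;; ?\M-a is the integer event for Meta + a.
(key-description (vector ?\M-a))     ; => "M-a"
(single-key-description ?\M-a)       ; => "M-a"

;; Interactive use (type M-a at the prompt); expected to describe the
;; key as C-h c would, assuming `read-key' returns the meta event:
;; (key-description (vector (read-key "Key: ")))
```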


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-19 13:54     ` Stefan Monnier
@ 2009-10-19 21:42       ` Drew Adams
  2009-10-20  1:20         ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2009-10-19 21:42 UTC
  To: 'Stefan Monnier'; +Cc: 4751

I guess you're saying that there is no bug here (for `read-char').

What about this?

(characterp ?\M-t) -> nil

Seems like that would mean that `read-char' should raise an error saying that
it's not a char, and that `read-char-exclusive' should ignore it.

?\M-t -> 134217844, which is beyond the limit of 4194303 for string/buffer
chars. Which means I guess it is a keyboard-only char ((elisp)Character Type).

But the doc of `characterp' doesn't say anything about being limited to
string/buffer chars. Is this a doc bug? Is it correct for `characterp' to return
nil here?

(read-char "a: "), then hit `M-t' -> 244

So the "character" read by `read-char' does not correspond to the non-characterp
integer (presumably a keyboard char) returned by ?\M-t, although the latter is
the Lisp reader syntax for a character. And the Elisp manual (node Meta-Char
Syntax) says that ?\M-t represents the meta version of ASCII `t'. So what is
`read-char' coming up with, and why?

I'm still confused. Understanding welcome.

Seems like the doc also needs a little clarification wrt which types of
"character" are intended in each of these contexts.

Apparently, for instance, the `?' character read syntax allows use of
non-string/buffer chars, such as M-t. That's not stated and not obvious. And
`read-char' interprets the key `M-t' differently, returning a different integer
from what ?\M-t returns and from what "\M-t" means when the string is used for a
key. And nowhere is it said what `characterp' tests, apart from being a
character - why it considers ?\M-t not to be a char, for instance.

And the info seems spread around a bit too much. You need to get to node
`Nonprinting Characters', for instance, to learn that "strings cannot hold meta
characters", but that, when used to represent a key, "\M-" in a string
represents the meta version of the ASCII char.

The `read-char' behavior still seems like a bug, to me, but I do want to
understand it. Why does it return 244 - what's the relation between that
integer/char (o circumflex) and the meta version of ASCII `t'?







* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-19 21:42       ` Drew Adams
@ 2009-10-20  1:20         ` Stefan Monnier
  2009-10-20  2:13           ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-20  1:20 UTC
  To: Drew Adams; +Cc: 4751

> I guess you're saying that there is no bug here (for `read-char').

Not at all.  Actually it turns out that read-key also returns
244 for M-t so the problem is elsewhere.  Hmm...


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20  1:20         ` Stefan Monnier
@ 2009-10-20  2:13           ` Stefan Monnier
  2009-10-20  2:30             ` Drew Adams
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-20  2:13 UTC
  To: Drew Adams; +Cc: 4751

>> I guess you're saying that there is no bug here (for `read-char').
> Not at all.  Actually it turns out that read-key also returns
> 244 for M-t so the problem is elsewhere.  Hmm...

Actually, for read-key, I just fixed it.
For read-char, it's done on purpose, so it's easy to change, but it's
likely to break some code somewhere.


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20  2:13           ` Stefan Monnier
@ 2009-10-20  2:30             ` Drew Adams
  2009-10-20 13:51               ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2009-10-20  2:30 UTC
  To: 'Stefan Monnier'; +Cc: 4751

> >> I guess you're saying that there is no bug here (for `read-char').
> > Not at all.  Actually it turns out that read-key also returns
> > 244 for M-t so the problem is elsewhere.  Hmm...
> 
> Actually, for read-key, I just fixed it.
> For read-char, it's done on purpose, so it's easy to change, but it's
> likely to break some code somewhere.

FWIW, this is a new bug. Prior to Emacs 23, read-char for the input `M-t'
returned ?\M-t. It returns, for M-t:

Emacs 20, 21: -134217612 = ?\M-t
Emacs 22: 134217844 (#o1000000164, #x8000074) = ?\M-t
Emacs 23: 244 (#o364, #xf4)
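
A small sketch of how these three integers relate to ?t (116). The name
`my-meta-bit' is made up for illustration, and reading the Emacs 20/21
value as "meta bit in the sign position of a narrower integer" is an
assumption, not something established in this thread:

```elisp
;; Hypothetical constant: bit 27, which ?\M-t carries in Emacs 22 and 23.
(defconst my-meta-bit (ash 1 27))    ; 134217728

(- ?t my-meta-bit)        ; => -134217612, the Emacs 20/21 value
(logior ?t my-meta-bit)   ; => 134217844 (#x8000074), the Emacs 22 value
(= (logior ?t my-meta-bit) ?\M-t)    ; => t
(logior ?t #x80)          ; => 244 (#xf4), the Emacs 23 value: 8th bit set
```

So 244 is ?t with the 8th bit set rather than with the meta modifier bit.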







* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20  2:30             ` Drew Adams
@ 2009-10-20 13:51               ` Stefan Monnier
  2009-10-20 15:05                 ` Drew Adams
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-20 13:51 UTC
  To: Drew Adams; +Cc: 4751

>> >> I guess you're saying that there is no bug here (for `read-char').
>> > Not at all.  Actually it turns out that read-key also returns
>> > 244 for M-t so the problem is elsewhere.  Hmm...
>> Actually, for read-key, I just fixed it.
>> For read-char, it's done on purpose, so it's easy to change, but it's
>> likely to break some code somewhere.
> FWIW, this is a new bug.  Prior to Emacs 23, read-char for the input `M-t'
> returned ?\M-t. It returns, for M-t:

> Emacs 20, 21: -134217612 = ?\M-t
> Emacs 22: 134217844 (#o1000000164, #x8000074) = ?\M-t
> Emacs 23: 244 (#o364, #xf4)

Yes, but it's a change that was done on purpose (at least
the code is quite explicit), so undoing it is a bit risky.
Hmm... OK, I just reverted that part of the change, we'll see
what happens.

The fundamental problem is that read-char is ill-defined: on the one
hand, it wants to return "raw undecoded events" and on the other it
wants to return chars (which in the general case need decoding, e.g. to
turn an escape sequence into <kp-3> and then into the char ?3) and uses
a far-reaching definition of "char" (basically: any event represented by
an integer).


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20 13:51               ` Stefan Monnier
@ 2009-10-20 15:05                 ` Drew Adams
  2009-10-20 19:56                   ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2009-10-20 15:05 UTC
  To: 'Stefan Monnier'; +Cc: 4751

> Yes, but it's a change that was done on purpose (at least
> the code is quite explicit), so undoing it is a bit risky.
> Hmm... OK, I just reverted that part of the change, we'll see
> what happens.
> 
> The fundamental problem is that read-char is ill-defined: on the one
> hand, it wants to return "raw undecoded events" and on the other it
> wants to return chars (which in the general case need decoding, e.g. to
> turn an escape sequence into <kp-3> and then into the char ?3) and uses
> a far-reaching definition of "char" (basically: any event represented by
> an integer).

There are those two cases: decoding vs any integer. But there are also three
cases for integers, apparently: (1) any integer, (2) any integer <= 4194303, (3)
any integer small enough that the char can be used in a string or buffer (which
limit is apparently greater than 4194303).

I'm also confused about `characterp' and the notion of a character.
`(elisp)Character Type' says that only integers/chars from 0 to 4194303 can be
in strings or buffers. But ?\M-t is 134217844, yet "\M-t" is a string with that
char (`o' circumflex), and (insert-char ?\M-t 1) inserts it in a buffer. That
doc is confusing - is it also incorrect?

There is also the ambiguity I mentioned in the too-sparse doc for `characterp':
The function returns non-nil only for a subset of the chars that can be used in
a string or buffer. What is that subset (it seems to be chars < 4194304)? This
needs to be documented, IMO. Or is there perhaps also a bug for `characterp' and
it should return t for ?\M-t? Currently, characterp seems to respect the 4194303
limit: (characterp 4194303) = t, (characterp 4194304) = nil.
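
A minimal sketch of that limit, assuming Emacs 23 or later. It uses
`max-char', `event-basic-type' and `event-modifiers', which are not
mentioned above but show where the boundary sits and how a meta event
splits into a base character and a modifier:

```elisp
(max-char)                     ; => 4194303 (#x3FFFFF)
(characterp (max-char))        ; => t
(characterp (1+ (max-char)))   ; => nil
(characterp ?\M-t)             ; => nil, since 134217844 is above the limit

;; The event still decomposes into a base character plus a modifier.
(event-basic-type ?\M-t)       ; => 116, i.e. ?t
(event-modifiers ?\M-t)        ; => (meta)
```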

Wrt `read-char' what is/are the intention(s) - use cases?

Should it too only return `characterp' chars (assuming there is no `characterp'
bug), and signal an error for a non-characterp event? In which case
`read-char-exclusive' would ignore such events and wait until getting a
`characterp' char. That is one use case - I've seen code that uses `read-char'
in a loop to accumulate a string of chars, and in some cases it seems unlikely
that what is really wanted is a string that can contain meta chars.

Or should `read-char' return any char, even one beyond the limit of being
representable in strings and buffer (whatever the correct limit is - it doesn't
seem to be 4194303)? In which case, if a user really wants a characterp char,
e.g. in order to accumulate in a string, he or she would test using `characterp'
before accumulating.
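
A sketch of that second option, where the caller does the filtering. Both
function names are hypothetical, and silently dropping non-`characterp'
events is just one possible policy:

```elisp
(defun my-events-to-string (events)
  "Return a string made of the members of EVENTS that satisfy `characterp'.
Hypothetical helper: every other event (meta chars, function keys,
mouse events) is silently dropped."
  (let (chars)
    (dolist (ev events)
      (when (characterp ev)
        (push ev chars)))
    (apply #'string (nreverse chars))))

(defun my-read-plain-char (prompt)
  "Read events using PROMPT until one satisfies `characterp'; return it.
Hypothetical helper built on `read-event'."
  (let (ev)
    (while (not (characterp (setq ev (read-event prompt)))))
    ev))

(my-events-to-string (list ?a ?\M-t 'f1 ?b))   ; => "ab"
```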

Or should `read-char' accept another arg to determine the behavior?

As you can see, I'm still confused - have questions.








* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20 15:05                 ` Drew Adams
@ 2009-10-20 19:56                   ` Stefan Monnier
  2011-09-11  4:47                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2009-10-20 19:56 UTC
  To: Drew Adams; +Cc: 4751

> I'm also confused about `characterp' and the notion of a character.
> `(elisp)Character Type' says that only integers/chars from 0 to
> 4194303 can be in strings or buffers.
> But ?\M-t is 134217844,

Yes.

> yet "\M-t" is a string with that char (`o' circumflex),

\M-t in strings is a special case inherited from ASCII days where the
meta modifier was used to represent the upper 8th bit.  We still support
it because it's very commonly (mis)used in .emacsen for key bindings,
but I'd strongly recommend staying very far away from it.

And whenever \M-t is turned into ô (or vice-versa), this is pretty
much a bug (though there are a few cases where fixing this bug may be
difficult).  The only case I know where it's a "feature" is for C-q M-t.
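
One way to follow that advice in an init file, sketched with a throwaway
keymap (`my-demo-map' is a hypothetical name and the bound command is only
an illustration): write the key as a vector or with `kbd', so the event
keeps its meta modifier bit instead of relying on the string special case.

```elisp
;; Hypothetical keymap used only for this illustration.
(defvar my-demo-map (make-sparse-keymap))

;; Vector notation: the element is the event integer, meta bit included.
(define-key my-demo-map [?\M-t] #'transpose-words)

(aref [?\M-t] 0)                         ; => 134217844
(key-description [?\M-t])                ; => "M-t"
(lookup-key my-demo-map [?\M-t])         ; => transpose-words
(lookup-key my-demo-map (kbd "M-t"))     ; => transpose-words
```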

> and (insert-char ?\M-t 1) inserts it in a buffer.

I believe the patch I installed earlier to fix read_char also fixed this one.

> a string or buffer. What is that subset (it seems to be chars <
> 4194304)? 

Yes, currently this is the subset.  In Emacs-22 it was different, and
who knows what the future may bring.

> Wrt `read-char' what is/are the intention(s) - use cases?

I really wish I knew.


        Stefan






* bug#4751: 23.1; `read-char' inserts accented chars when you use `M-' modifier
  2009-10-20 19:56                   ` Stefan Monnier
@ 2011-09-11  4:47                     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 13+ messages in thread
From: Lars Magne Ingebrigtsen @ 2011-09-11  4:47 UTC
  To: Stefan Monnier; +Cc: 4751

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> and (insert-char ?\M-t 1) inserts it in a buffer.
>
> I believe the patch I installed earlier to fix read_char also fixed this one.

This issue seems resolved, so I'm closing this report.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/





