* bug#16216: 24.3.50; <control> entries in `ucs-names'
From: Drew Adams @ 2013-12-22 2:09 UTC
To: 16216
The doc for `insert-char' and `ucs-names' is sketchy. But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."
So what are all of those `<control>' character names about? Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':
C-x 8 RET TAB C-g
C-h v ucs-names
C-s <control> C-s C-s...
And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html
This seems like a bug. But since the description of `ucs-names' is
so sketchy it's hard to assert that. If this is not a bug, then:
1. In what way is `<control>' a "CHAR-NAME" for a character with any
code point? What does CHAR-NAME mean in this case?
2. What is the purpose of the multiple `<control>' CHAR-NAMEs?
3. Why are different CHAR-CODE values associated with the same
CHAR-NAME, `<control>'? What does that mean?
4. Try `C-x 8 RET <contr TAB RET'. You get only one particular
character "named" <control>, the one with code point decimal
159. That's the character named "APPLICATION PROGRAM COMMAND".
Why that one?
In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics@gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
`configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
CPPFLAGS=-Ic:/Devel/emacs/include'
From: Eli Zaretskii @ 2013-12-22 3:55 UTC
To: Drew Adams; +Cc: 16216
> Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
>
> 1. In what way is `<control>' a "CHAR-NAME" for a character with any
> code point? What does CHAR-NAME mean in this case?
Look at UnicodeData.txt, near the beginning of the file.
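
To make the pointer concrete, here is a minimal Python sketch (not part of the original message) that parses a few lines in the semicolon-separated UnicodeData.txt format. The four sample lines are reproduced from memory of that file and should be checked against the real one; the helper names `parse_unicode_data` and `code_points_by_name_field` are hypothetical. It shows that field 1 holds the same `<control>` string for many code points, while field 10 (the Unicode 1.0 name) differs.

```python
# Sample lines in the UnicodeData.txt format (assumed to match the real file).
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
"""


def parse_unicode_data(text):
    """Yield (code_point, name_field, old_name_field) for each line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        # Field 0: code point in hex; field 1: name; field 10: Unicode 1.0 name.
        yield int(fields[0], 16), fields[1], fields[10]


def code_points_by_name_field(text):
    """Group code points by the content of field 1."""
    groups = {}
    for code_point, name, _old_name in parse_unicode_data(text):
        groups.setdefault(name, []).append(code_point)
    return groups


groups = code_points_by_name_field(SAMPLE)
for name, code_points in groups.items():
    print(name, [hex(cp) for cp in code_points])
```

In this sample, three different code points share the field-1 string `<control>`, which is the situation the report describes for `ucs-names'.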
From: Drew Adams @ 2013-12-22 5:08 UTC
To: Eli Zaretskii; +Cc: 16216
> Look at UnicodeData.txt, near the beginning of the file.
I see; thanks. And I recall now that you pointed me to that
file once before.
Still, that does not really answer the questions I posed, AFAICT.
At least not for a user of `ucs-names' or the other functions
mentioned.
If `ucs-names' essentially corresponds to UnicodeData.txt, how
about citing that in its doc? Better yet, perhaps cite this,
which seems to be the place that the fields of UnicodeData.txt
are described:
http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
Still, part of my question is about `insert-char' and
`read-char-by-name', which is really what most users will see.
(Those are admittedly not the same as `ucs-names'. But they are
currently the only consumers of the latter.)
Should the `<control>' entries of `ucs-names' be included for
the completion provided by `read-char-by-name'? You can only
choose one of them, anyway. What is the use case for that -
the reason it is included as a possibility for `C-x 8 RET'?
From: Drew Adams @ 2013-12-22 5:10 UTC
To: Eli Zaretskii; +Cc: 16216
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
(That seems to have been replaced by this:
http://www.unicode.org/reports/tr44/#UnicodeData.txt)
From: Eli Zaretskii @ 2013-12-22 18:10 UTC
To: Drew Adams; +Cc: 16216-done
> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 16216@debbugs.gnu.org
>
> > Look at UnicodeData.txt, near the beginning of the file.
>
> I see; thanks. And I recall now that you pointed me to that
> file once before.
>
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.
I looked deeper and decided that this was a bug. The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels. The 'name' property cannot have lower-case characters or
"<>" in it anyway.
So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match". (Some of the control characters have 'old-name' property, so
they still can be called out by name.)
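
The behavior described above can be modeled with a short Python sketch. This is an illustration only, not the Emacs implementation: `name_tables` and `complete` are hypothetical helpers, the sample lines are assumed to match UnicodeData.txt, and the regular expression encodes the assumption that a real character name uses only upper-case letters, digits, spaces, and hyphens. Labels such as `<control>` fail that check and are treated as "no name", while the old name in field 10 stays usable.

```python
import re
import unicodedata

# Sample lines in the UnicodeData.txt format (assumed to match the real file).
SAMPLE = [
    "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;",
    "009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]

# Assumption: real names contain only these characters, so "<control>" is a label.
VALID_NAME = re.compile(r"[A-Z0-9 -]+")


def name_tables(lines):
    """Return (names, old_names) dicts mapping a name to its code point."""
    names, old_names = {}, {}
    for line in lines:
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name, old_name = fields[1], fields[10]
        if VALID_NAME.fullmatch(name):
            names[name] = code_point      # a real 'name' property
        if old_name:
            old_names[old_name] = code_point  # the 'old-name' property
    return names, old_names


def complete(prefix, names, old_names):
    """Return the sorted candidate names that start with PREFIX."""
    candidates = set(names) | set(old_names)
    return sorted(n for n in candidates if n.startswith(prefix))


names, old_names = name_tables(SAMPLE)
print(complete("<", names, old_names))     # no candidates start with "<"
print(complete("APPL", names, old_names))  # still reachable via the old name
print(unicodedata.name("\x9f", None))      # the stdlib also reports no name
```

Python's own `unicodedata.name` agrees with the fix in spirit: for U+009F it finds no name and returns the supplied default.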
> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?
The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.
Thanks.
From: Eli Zaretskii @ 2013-12-22 18:13 UTC
To: Drew Adams; +Cc: 16216
> Date: Sat, 21 Dec 2013 21:10:50 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 16216@debbugs.gnu.org
>
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
>
> (That seems to have been replaced by this:
> http://www.unicode.org/reports/tr44/#UnicodeData.txt)
The best references are to the "latest" version:
http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt