unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#16216: 24.3.50; <control> entries in `ucs-names'
@ 2013-12-22  2:09 Drew Adams
  2013-12-22  3:55 ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Drew Adams @ 2013-12-22  2:09 UTC
  To: 16216

The doc for `insert-char' and `ucs-names' is sketchy.  But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."

So what are all of those `<control>' character names about?  Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':

 C-x 8 RET TAB C-g
 C-h v ucs-names
 C-s <control> C-s C-s...

And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html

This seems like a bug.  But since the description of `ucs-names' is
so sketchy, it's hard to assert that.  If this is not a bug, then:

1. In what way is `<control>' a "CHAR-NAME" for a character with any
   code point?  What does CHAR-NAME mean in this case?

2. What is the purpose of the multiple `<control>' CHAR-NAMEs?

3. Why are different CHAR-CODE values associated with the same
   CHAR-NAME, `<control>'?  What does that mean?

4. Try `C-x 8 RET <contr TAB RET'.  You get only one particular
   character "named" <control>, the one with code point decimal
   159.  That's the character named "APPLICATION PROGRAM COMMAND".
   Why that one?


In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
 of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics@gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
 'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
 CPPFLAGS=-Ic:/Devel/emacs/include'






* bug#16216: 24.3.50; <control> entries in `ucs-names'
  2013-12-22  2:09 Drew Adams
@ 2013-12-22  3:55 ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2013-12-22  3:55 UTC
  To: Drew Adams; +Cc: 16216

> Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> 
> 1. In what way is `<control>' a "CHAR-NAME" for a character with any
>    code point?  What does CHAR-NAME mean in this case?

Look at UnicodeData.txt, near the beginning of the file.
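
To make the pointer concrete, here is a minimal sketch in Python (not
Emacs Lisp) of how records in the UnicodeData.txt format can be read.
The three sample records are written out by hand in that format, and
the helper names `parse_records' and `codes_labelled' are hypothetical.
The sketch assumes the usual semicolon-separated layout: field 0 is the
code point, field 1 is the name or label, and field 10 is the Unicode 1
name.

```python
# Hand-written sample records in the UnicodeData.txt format (assumption:
# 15 semicolon-separated fields per record).
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
"""


def parse_records(text):
    """Return (code_point, name_field, old_name_field) for each record."""
    records = []
    for line in text.splitlines():
        fields = line.split(";")
        # Field 0: hex code point; field 1: name or label; field 10: old name.
        records.append((int(fields[0], 16), fields[1], fields[10]))
    return records


def codes_labelled(records, label):
    """Return every code point whose name field equals LABEL."""
    return [code for code, name, _old in records if name == label]


RECORDS = parse_records(SAMPLE)
```

With these samples, `codes_labelled(RECORDS, "<control>")' yields more
than one code point, which is how several CHAR-CODE values can end up
paired with the same `<control>' string when field 1 is taken as a name.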






* bug#16216: 24.3.50; <control> entries in `ucs-names'
       [not found] ` <<83lhzd8roz.fsf@gnu.org>
@ 2013-12-22  5:08   ` Drew Adams
  2013-12-22  5:10     ` Drew Adams
  2013-12-22 18:10     ` Eli Zaretskii
  0 siblings, 2 replies; 6+ messages in thread
From: Drew Adams @ 2013-12-22  5:08 UTC
  To: Eli Zaretskii; +Cc: 16216

> Look at UnicodeData.txt, near the beginning of the file.

I see; thanks.  And I recall now that you pointed me to that
file once before.

Still, that does not really answer the questions I posed, AFAICT.
At least not for a user of `ucs-names' or the other functions
mentioned.

If `ucs-names' essentially corresponds to UnicodeData.txt, how
about citing that in its doc?  Better yet, perhaps cite this,
which seems to be the place that the fields of UnicodeData.txt
are described:
http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt

Still, part of my question is about `insert-char' and
`read-char-by-name', which is really what most users will see.
(Those are admittedly not the same as `ucs-names'.  But they are
currently the only consumers of the latter.)

Should the `<control>' entries of `ucs-names' be included for
the completion provided by `read-char-by-name'?  You can only
choose one of them, anyway.  What is the use case for that -
the reason it is included as a possibility for `C-x 8 RET'?






* bug#16216: 24.3.50; <control> entries in `ucs-names'
  2013-12-22  5:08   ` bug#16216: 24.3.50; <control> entries in `ucs-names' Drew Adams
@ 2013-12-22  5:10     ` Drew Adams
  2013-12-22 18:13       ` Eli Zaretskii
  2013-12-22 18:10     ` Eli Zaretskii
  1 sibling, 1 reply; 6+ messages in thread
From: Drew Adams @ 2013-12-22  5:10 UTC
  To: Eli Zaretskii; +Cc: 16216

> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt

(That seems to have been replaced by this:
http://www.unicode.org/reports/tr44/#UnicodeData.txt)






* bug#16216: 24.3.50; <control> entries in `ucs-names'
  2013-12-22  5:08   ` bug#16216: 24.3.50; <control> entries in `ucs-names' Drew Adams
  2013-12-22  5:10     ` Drew Adams
@ 2013-12-22 18:10     ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2013-12-22 18:10 UTC
  To: Drew Adams; +Cc: 16216-done

> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 16216@debbugs.gnu.org
> 
> > Look at UnicodeData.txt, near the beginning of the file.
> 
> I see; thanks.  And I recall now that you pointed me to that
> file once before.
> 
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.

I looked deeper and decided that this was a bug.  The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels.  The 'name' property cannot have lower-case characters or
"<>" in it anyway.

So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match".  (Some of the control characters have 'old-name' property, so
they still can be called out by name.)
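
As an independent cross-check outside Emacs, the same distinction can be
observed with Python's standard `unicodedata' module, which exposes the
Unicode 'name' property.  This is only a sketch: the helper names
`name_or_none' and `unnamed_controls' are hypothetical, and the result
reflects the Unicode data bundled with the Python build, not Emacs
itself.

```python
import unicodedata


def name_or_none(code_point):
    """Return the Unicode 'name' property of CODE_POINT, or None if it has none."""
    # The second argument is the default returned when no name is defined.
    return unicodedata.name(chr(code_point), None)


def unnamed_controls():
    """Return the code points of general category Cc that have no name."""
    return [cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Cc"
            and name_or_none(cp) is None]
```

Here `name_or_none(0x9F)' returns None, while `name_or_none(0x41)'
returns "LATIN CAPITAL LETTER A".  Every character of category Cc turns
out to be unnamed, which matches the reading that "<control>" is a label
and not a value of the 'name' property.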

> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?

The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.

Thanks.






* bug#16216: 24.3.50; <control> entries in `ucs-names'
  2013-12-22  5:10     ` Drew Adams
@ 2013-12-22 18:13       ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2013-12-22 18:13 UTC
  To: Drew Adams; +Cc: 16216

> Date: Sat, 21 Dec 2013 21:10:50 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 16216@debbugs.gnu.org
> 
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt
> 
> (That seems to have been replaced by this:
> http://www.unicode.org/reports/tr44/#UnicodeData.txt)

The best references are to the "latest" version:

  http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt




