* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
@ 2011-10-02 17:38 ` Drew Adams
2011-10-02 22:31 ` Juanma Barranquero
2011-10-02 18:09 ` Thierry Volpiatto
` (4 subsequent siblings)
5 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-02 17:38 UTC (permalink / raw)
To: 9653
(Not claiming this additional question relates to a bug, in particular to this
bug report - except in so far as it asks for better doc.)
In `ucs-names', what are the CHAR-NAMEs "VARIATION SELECTOR-n" all about (for
n=17...256)? Are those actually character names?
Googling indicates that a variation selector is a metacharacter that selects one
of a set of semantically equivalent glyphs.
I no doubt do not fully understand (even after scanning the Unicode standard,
http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors and this:
http://babelstone.blogspot.com/2007/06/secret-life-of-variation-selectors.html
about it). I can, however, see the difference variation selectors can make,
e.g. here: http://www.w3.org/TR/xml-entity-names/U0FE00.html.
But why are the "VARIATION SELECTOR-n" included as CHAR-NAMEs in `ucs-names'?
IIUC, variation selectors, when used, follow characters whose
representations/appearance they modify in some sense.
Why do we treat variation selectors, in `ucs-names', as "character names", if
they are only "metacharacters", "combining marks" used to indicate how to change
the appearance of the characters they follow?
I see that the Unicode standard also refers to variation selectors as "default
ignorable characters", so I guess they are characters in some sense.
But how about providing a function that filters out all such "ignorable
characters" from `ucs-names', or at least providing a list of all such
chars?
I see this in the standard too: "If a user requires a visual distinction between
a character and a particular variant of that character, then fonts must be used
to make that distinction."
The "variation selector" information seems to be only about visual appearance,
not about names of displayable characters. Does it really belong in
`ucs-names'?
And I see that such "ignorable" stuff is apparently supposed to be invisible -
e.g., "default_ignorable_code points...are invisible, have no glyph...". If so,
how about a function that filters out all such invisible stuff from `ucs-names'
(or at least a list of such stuff)?
How about a little more doc for `ucs-names', so that any programmer who might
want to use `ucs-names' (e.g. for completion) might know how to reasonably
use/deal with such particular CHAR-NAMEs? Please do not simply say that
`ucs-names' is only "internal" so you need not describe it better. It's already
being used in various 3rd-party code.
Again, this is not really part of this bug report (which is only about "" as a
CHAR-NAME), unless you see that it is related (e.g. wrt doc). But I would like
to know more about the "ignorable characters" - how to recognize them etc. so
that I can (optionally, at least) remove them as completion candidates.
I understand that the Emacs doc does not have as its purpose to teach the
details of the Unicode standard, but perhaps a little more explanation of the
content of `ucs-names' wouldn't hurt?
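A minimal sketch of the filtering asked about above, assuming `ucs-names' returns an alist of (CHAR-NAME . CHAR-CODE) pairs as described in this thread. The `my-' function names are hypothetical and not part of Emacs. The sketch covers only the variation selectors (U+FE00..U+FE0F and U+E0100..U+E01EF), not every default ignorable code point.

```elisp
;; Sketch only: the `my-' names are hypothetical, not part of Emacs.

(defun my-variation-selector-p (code)
  "Return non-nil if CODE is a Unicode variation selector code point."
  (and (integerp code)
       (or (<= #xFE00 code #xFE0F)        ; VARIATION SELECTOR-1 .. -16
           (<= #xE0100 code #xE01EF))))   ; VARIATION SELECTOR-17 .. -256

(defun my-remove-variation-selectors (alist)
  "Return the (NAME . CODE) pairs of ALIST that are not variation selectors."
  (delq nil
        (mapcar (lambda (entry)
                  (unless (my-variation-selector-p (cdr entry))
                    entry))
                alist)))
```

Under the alist assumption, a call such as (my-remove-variation-selectors (ucs-names)) would give a candidate list without those entries.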
^ permalink raw reply [flat|nested] 49+ messages in thread
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 17:38 ` Drew Adams
@ 2011-10-02 22:31 ` Juanma Barranquero
2011-10-02 22:51 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Juanma Barranquero @ 2011-10-02 22:31 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
On Sun, Oct 2, 2011 at 19:38, Drew Adams <drew.adams@oracle.com> wrote:
> Please do not simply say that
> `ucs-names' is only "internal" so you need not describe it better. It's already
> being used in various 3rd-party code.
That's exactly what I would say. `ucs-names' (var and function) is an
implementation detail of `read-char-by-name'.
Juanma
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 22:31 ` Juanma Barranquero
@ 2011-10-02 22:51 ` Drew Adams
2011-10-02 22:55 ` Juanma Barranquero
2011-10-03 13:20 ` Jason Rumney
0 siblings, 2 replies; 49+ messages in thread
From: Drew Adams @ 2011-10-02 22:51 UTC (permalink / raw)
To: 'Juanma Barranquero'; +Cc: 9653
> > Please do not simply say that `ucs-names' is only
> > "internal" so you need not describe it better.
> > It's already being used in various 3rd-party code.
>
> That's exactly what I would say. `ucs-names' (var and function) is an
> implementation detail of `read-char-by-name'.
Which demonstrates one more time the fallacy of "internal" use and "need not
document".
It should be obvious that a data structure such as `ucs-names' is useful even
outside of reading a character name with `read-char-by-name'. I predict that
we'll find that the former is in fact more generally useful than the latter.
That could have been obvious at its creation, but it is doubly obvious now that
experience bears it out (for this relatively new fn/var) - as I mentioned wrt
its use in 3rd-party code.
Those using it notice the bug about inclusion of "" character names. Without
that use beyond `read-char-by-name' such a bug might have gone unnoticed. (Yes,
until informed otherwise, I'm assuming that including "" names is wrong.)
`ucs-names' is documented, but not as well as it should be.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 22:51 ` Drew Adams
@ 2011-10-02 22:55 ` Juanma Barranquero
2011-10-03 13:20 ` Jason Rumney
1 sibling, 0 replies; 49+ messages in thread
From: Juanma Barranquero @ 2011-10-02 22:55 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
On Mon, Oct 3, 2011 at 00:51, Drew Adams <drew.adams@oracle.com> wrote:
> (Yes,
> until informed otherwise, I'm assuming that including "" names is wrong.)
Why? It is an internal variable, and its contents are apparently
correct for its intended use. It seems a bug only if you insist on
reading into it more than the docstring really says.
> `ucs-names' is documented, but not as well as it should be.
I don't think "should" is right in this context.
Juanma
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 22:51 ` Drew Adams
2011-10-02 22:55 ` Juanma Barranquero
@ 2011-10-03 13:20 ` Jason Rumney
2011-10-03 13:56 ` Drew Adams
1 sibling, 1 reply; 49+ messages in thread
From: Jason Rumney @ 2011-10-03 13:20 UTC (permalink / raw)
To: Drew Adams; +Cc: 'Juanma Barranquero', 9653
"Drew Adams" <drew.adams@oracle.com> writes:
> It should be obvious that a data structure such as `ucs-names' is useful even
> outside of reading a character name with `read-char-by-name'. I predict that
> we'll find that the former is in fact more generally useful than the
> latter.
It should be obvious that any variable with scant documentation that
starts off "Alist of cached..." is internal to something, and probably not
very useful outside of that.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 13:20 ` Jason Rumney
@ 2011-10-03 13:56 ` Drew Adams
2011-10-03 14:00 ` Juanma Barranquero
0 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-03 13:56 UTC (permalink / raw)
To: 'Jason Rumney'; +Cc: 'Juanma Barranquero', 9653
> > It should be obvious that a data structure such as
> > `ucs-names' is useful even outside of reading a
> > character name with `read-char-by-name'.
> >
> > I predict that we'll find that the former is in fact
> > more generally useful than the latter.
>
> It should be obvious that any variable with scant documentation that
> starts off "Alist of cached..." is internal to something, and
> probably not very useful outside of that.
Bzzzzt! No, but thanks for playing. Scant documentation does not necessarily
mean anything in particular - it can mean anything at all.
And just because a variable caches a value does not necessarily mean that it is
internal to something and is useless outside that something.
More importantly, this is not so much about the variable as the function that is
its front end. It is really about the combination of the two, which together
act as a (constant) data structure.
If you are only trying to say that the var is internal to the function of the
same name, then fine. Thus we (properly) document the function, mentioning its
cache variable:
"Return [an] alist of...pairs cached in `ucs-names'."
It is that doc that could be enhanced, to help users.
But if you, like Juanma, are trying to say that `ucs-names' is internal to
function `read-char-by-name', then no. This is a good example of the falsehood
of such claims. `ucs-names' is used by 3rd-party code, and in situations where
`read-char-by-name' is not used at all.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 13:56 ` Drew Adams
@ 2011-10-03 14:00 ` Juanma Barranquero
0 siblings, 0 replies; 49+ messages in thread
From: Juanma Barranquero @ 2011-10-03 14:00 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
On Mon, Oct 3, 2011 at 15:56, Drew Adams <drew.adams@oracle.com> wrote:
> But if you, like Juanma, are trying to say that `ucs-names' is internal to
> function `read-char-by-name', then no. This is a good example of the falsehood
> of such claims. `ucs-names' is used by 3rd-party code, and in situations where
> `read-char-by-name' is not used at all.
Bzzzzt! That means nothing, but thanks for playing.
There are no private namespaces in Elisp, so every single internal
function and variable can be used in 3rd-party code, and many have
been.
Juanma
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
2011-10-02 17:38 ` Drew Adams
@ 2011-10-02 18:09 ` Thierry Volpiatto
2011-10-03 1:28 ` Stefan Monnier
` (3 subsequent siblings)
5 siblings, 0 replies; 49+ messages in thread
From: Thierry Volpiatto @ 2011-10-02 18:09 UTC (permalink / raw)
To: 9653
"Drew Adams" <drew.adams@oracle.com> writes:
> AFAIK, there is no more recent Windows binary available than this one,
> so reporting this here. (When will there be a Windows pretest binary?)
>
> In GNU Emacs 24.0.50.1 (i386-mingw-nt5.1.2600)
> of 2011-09-19 on 3249CTO
> Windowing system distributor `Microsoft Corp.', version 5.1.2600
> configured using `configure --with-gcc (4.5) --no-opt'
>
>
> Is this a bug? There are lots of entries in `ucs-names' that have "" as
> the car. Shouldn't these be filtered out? If not, what is their
> significance (and use)?
>
> The doc is very lightweight: "Alist of cached (CHAR-NAME . CHAR-CODE)
> pairs.", for the var, and "Return alist of (CHAR-NAME . CHAR-CODE)
> pairs cached in `ucs-names'.", for the function.
>
> How can "" be a CHAR-NAME, i.e., the name of any character?
>
> Please fix this, if a code bug, or document what this is about, if a doc
> bug.
Right, same here, I must filter empty entries.
Also:
- Entries with an empty cdr.
- Existing entries that cannot be inserted with `ucs-insert'.
- Sometimes characters are missing from existing categories
(e.g. MATHEMATICAL DOUBLE-STRUCK CAPITAL C doesn't exist).
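The filtering described above could be sketched as follows, assuming an alist of (CHAR-NAME . CHAR-CODE) pairs. The `my-' names are hypothetical and not part of Emacs. The sketch drops entries with an empty name or an empty cdr. It does not test whether a character can actually be inserted.

```elisp
;; Sketch only: the `my-' names are hypothetical, not part of Emacs.

(defun my-usable-ucs-entry-p (entry)
  "Return non-nil if ENTRY has a non-empty string car and an integer cdr."
  (and (consp entry)
       (stringp (car entry))
       (not (equal (car entry) ""))
       (integerp (cdr entry))))

(defun my-usable-ucs-entries (alist)
  "Return the entries of ALIST that satisfy `my-usable-ucs-entry-p'."
  (delq nil
        (mapcar (lambda (entry)
                  (and (my-usable-ucs-entry-p entry) entry))
                alist)))
```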
--
⚡ 𝕋𝕙𝕚𝕖𝕣𝕣𝕪
Get my Gnupg key:
gpg --keyserver pgp.mit.edu --recv-keys 59F29997
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
2011-10-02 17:38 ` Drew Adams
2011-10-02 18:09 ` Thierry Volpiatto
@ 2011-10-03 1:28 ` Stefan Monnier
2011-10-03 4:23 ` Kenichi Handa
2011-10-05 8:59 ` Kenichi Handa
` (2 subsequent siblings)
5 siblings, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2011-10-03 1:28 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9653
> Is this a bug? There are lots of entries in `ucs-names' that have "" as
> the car.
Indeed, that's odd.
We could filter them out, of course, but there's something fishy: the
code of ucs-names does:
    (while (<= c end)
      (if (setq name (get-char-code-property c 'name))
          (push (cons name c) names))
      (if (setq name (get-char-code-property c 'old-name))
          (push (cons name c) names))
but it turns out that `name' is never nil there, whereas it is often "".
So either we need to change ucs-names to filter out those useless
entries, or we need to change (get-char-code-property c 'name) and
(get-char-code-property c 'old-name) so it returns nil rather than the
empty string.
Handa-san, could you take a look at this? Is (get-char-code-property
c 'name) supposed to return "" when the char has no name or is it
supposed to return nil? Either way is fine by me (all those "" are
really one and the same string, so they don't waste memory).
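The first option above (filtering in the loop itself) might look like the sketch below. It is not the actual `ucs-names' code. `my-collect-char-names' and its optional LOOKUP argument are hypothetical, added so the loop can be tried with a stub instead of the real property tables. Names that are nil or "" are both skipped, so the sketch does not depend on which of the two `get-char-code-property' returns.

```elisp
;; Sketch only: `my-collect-char-names' is hypothetical, not part of Emacs.

(defun my-collect-char-names (start end &optional lookup)
  "Return an alist of (NAME . CODE) for code points START..END, inclusive.
LOOKUP is called with a code point and a property symbol; it
defaults to `get-char-code-property'.  Names that are nil or the
empty string are skipped."
  (let ((lookup (or lookup #'get-char-code-property))
        (c start)
        names name)
    (while (<= c end)
      (dolist (prop '(name old-name))
        (setq name (funcall lookup c prop))
        (when (and (stringp name) (not (equal name "")))
          (push (cons name c) names)))
      (setq c (1+ c)))
    (nreverse names)))
```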
> In `ucs-names', what are the CHAR-NAMEs "VARIATION SELECTOR-n" all about (for
> n=17...256)? Are those actually character names?
It depends on what you mean by "character". They are in the sense of
Elisp's characters, and it can be occasionally useful to be able to
manually insert them (if for nothing else, to test what happens when they
appear in a file/buffer). So it's good that C-x 8 RET allows the user
to enter them, even if you hopefully will never need to use them yourself.
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 1:28 ` Stefan Monnier
@ 2011-10-03 4:23 ` Kenichi Handa
2011-10-03 8:22 ` Andreas Schwab
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Kenichi Handa @ 2011-10-03 4:23 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 9653
In article <jwv1uuu2814.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> Handa-san, could you take a look at this? Is (get-char-code-property
> c 'name) supposed to return "" when the char has no name or is it
> supposed to return nil? Either way is fine by me (all those "" are
> really one and the same string, so they don't waste memory).
It returns "" in such a case in Emacs 24. Emacs 23 returned
nil but that behavior was fixed because Unicode Standard
Annex #44 (Unicode Character Database) says as below:
4.2.8 Default Values
[...]
* For miscellaneous properties which take strings as
values, such as the Unicode Name property, the default
value is a null string.
^^^^^^^^^^^^^
---
Kenichi Handa
handa@m17n.org
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 4:23 ` Kenichi Handa
@ 2011-10-03 8:22 ` Andreas Schwab
2011-10-04 1:14 ` Kenichi Handa
2011-10-03 13:31 ` Stefan Monnier
2011-10-04 2:19 ` Drew Adams
2 siblings, 1 reply; 49+ messages in thread
From: Andreas Schwab @ 2011-10-03 8:22 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9653
Kenichi Handa <handa@m17n.org> writes:
> It returns "" in such a case in Emacs 24. Emacs 23 returned
> nil but that behavior was fixed because Unicode Standard
> Annex #44 (Unicode Character Database) says as below:
>
> 4.2.8 Default Values
> [...]
> * For miscellaneous properties which take strings as
> values, such as the Unicode Name property, the default
> value is a null string.
> ^^^^^^^^^^^^^
That doesn't preclude interpreting them as nil.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 8:22 ` Andreas Schwab
@ 2011-10-04 1:14 ` Kenichi Handa
0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2011-10-04 1:14 UTC (permalink / raw)
To: Andreas Schwab; +Cc: 9653
In article <m239faxz4w.fsf@linux-m68k.org>, Andreas Schwab <schwab@linux-m68k.org> writes:
> Kenichi Handa <handa@m17n.org> writes:
> > It returns "" in such a case in Emacs 24. Emacs 23 returned
> > nil but that behavior was fixed because Unicode Standard
> > Annex #44 (Unicode Character Database) says as below:
> >
> > 4.2.8 Default Values
> > [...]
> > * For miscellaneous properties which take strings as
> > values, such as the Unicode Name property, the default
> > value is a null string.
> > ^^^^^^^^^^^^^
> That doesn't preclude interpreting them as nil.
Hmmm, I've thought that nil usually means no-value, not an
empty string, and Unicode says they surely have a `name'
value.
---
Kenichi Handa
handa@m17n.org
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 4:23 ` Kenichi Handa
2011-10-03 8:22 ` Andreas Schwab
@ 2011-10-03 13:31 ` Stefan Monnier
2011-10-04 1:59 ` Kenichi Handa
2011-10-04 2:19 ` Drew Adams
2 siblings, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2011-10-03 13:31 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9653
>> Handa-san, could you take a look at this? Is (get-char-code-property
>> c 'name) supposed to return "" when the char has no name or is it
>> supposed to return nil? Either way is fine by me (all those "" are
>> really one and the same string, so they don't waste memory).
> It returns "" in such a case in Emacs 24. Emacs 23 returned
> nil but that behavior was fixed because Unicode Standard
> Annex #44 (Unicode Character Database) says as below:
> 4.2.8 Default Values
> [...]
> * For miscellaneous properties which take strings as
> values, such as the Unicode Name property, the default
> value is a null string.
> ^^^^^^^^^^^^^
I'm not opposed to this change, but your answer surprises me:
- we don't have to follow any standard.
- even less so when it talks about internal APIs rather than about
externally-visible behavior.
- "null string" can mean nil just as well as it can mean "".
They actually behave quite similarly: length/concat/mapcar treat them
the same, aref signals an error in both cases, ...
So was there some other motivation (e.g. simpler implementation?
Simpler code somewhere else?)? If not (i.e. all things being equal) I'd
prefer to use nil which is ever so slightly closer to usual Elisp
practice, and matches the Emacs-23 behavior.
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 13:31 ` Stefan Monnier
@ 2011-10-04 1:59 ` Kenichi Handa
2011-10-04 12:56 ` Stefan Monnier
2011-10-06 3:53 ` Kevin Rodgers
0 siblings, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2011-10-04 1:59 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 9653
In article <jwvd3eep5vo.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > * For miscellaneous properties which take strings as
> > values, such as the Unicode Name property, the default
> > value is a null string.
> > ^^^^^^^^^^^^^
> I'm not opposed to this change, but your answer surprises me:
> - we don't have to follow any standard.
But it is better to follow a standard, especially one as
important as Unicode.
> - even less so when it talks about internal APIs rather than about
> externally-visible behavior.
I think that UCD is talking about externally visible
behaviour. Unicode says that all characters have a `Name'
property and each value is a string. So, when you ask for the
name of a specific character, you should always get a string
value.
> - "null string" can mean nil just as well as it can mean "".
But, as I wrote, nil usually means
no-value/not-specified/unassigned/unknown, which is
different from the explicit "".
> They actually behave quite similarly: length/concat/mapcar treat them
> the same, aref signals an error in both cases, ...
Similar but different. I think the difference is bigger.
insert/string-match/search-forward/etc. signal an error on
nil argument.
And these two signal different errors: wrong-type-argument
vs. args-out-of-range.
(aref nil 1)
(aref "" 1)
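A small sketch for checking the similarities and differences between nil and "" discussed above. `my-error-symbol' is a hypothetical helper, not part of Emacs. It reports which error symbol, if any, a call signals.

```elisp
;; Sketch only: `my-error-symbol' is hypothetical, not part of Emacs.

(defun my-error-symbol (thunk)
  "Call THUNK and return the symbol of the error it signals, or nil if none."
  (condition-case err
      (progn (funcall thunk) nil)
    (error (car err))))

;; `length' treats nil and "" alike: both calls return 0.
(list (length nil) (length ""))

;; `aref' signals in both cases, but with different error symbols.
(my-error-symbol (lambda () (aref nil 1)))   ; wrong-type-argument
(my-error-symbol (lambda () (aref "" 1)))    ; args-out-of-range

;; `insert' rejects nil but accepts "".
(my-error-symbol (lambda () (with-temp-buffer (insert nil))))  ; wrong-type-argument
(my-error-symbol (lambda () (with-temp-buffer (insert ""))))   ; nil (no error)
```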
> So was there some other motivation (e.g. simpler implementation?
No.
> Simpler code somewhere else?)?
Yes, hypothetically. You can safely write, for instance,
(search-forward (get-char-code-property CHAR 'name) ...)
or
(insert (get-char-code-property CHAR 'name) ...)
without checking the return value.
> If not (i.e. all things being equal) I'd
> prefer to use nil which is ever so slightly closer to usual Elisp
> practice,
Really? I've thought nil and "" are rather different objects
in Elisp. In a char-table, (aset CHAR-TABLE CHAR nil)
causes (aref CHAR-TABLE CHAR) to return the default
value, which may not be nil.
> and matches the Emacs-23 behavior.
I'm sorry for this incompatible change. As I wrote before,
when I first implemented UCD in Emacs, Unicode was not
clear about the property value of a character not explicitly
listed in the database file. So, at that time, I simply
selected nil as the default value. But recently I found
that the default value is clearly defined in the recent
versions of Unicode.
So, if you, as the Emacs maintainer, think that backward
compatibility is more important, I am not opposed to changing
the default value back to nil.
---
Kenichi Handa
handa@m17n.org
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 1:59 ` Kenichi Handa
@ 2011-10-04 12:56 ` Stefan Monnier
2011-10-06 3:53 ` Kevin Rodgers
1 sibling, 0 replies; 49+ messages in thread
From: Stefan Monnier @ 2011-10-04 12:56 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9653
>> I'm not opposed to this change, but your answer surprises me:
>> - we don't have to follow any standard.
> But it is better to follow a standard, especially one as
> important as Unicode.
Of course.
>> - even less so when it talks about internal APIs rather than about
>> externally-visible behavior.
> I think that UCD is talking about externally visible behaviour.
If so, it doesn't apply to the behavior of (get-char-code-property CHAR
'name) which is an internal detail.
>> - "null string" can mean nil just as well as it can mean "".
> But, as I wrote, nil usually means
> no-value/not-specified/unassigned/unknown, which is
> different from the explicit "".
Indeed, and that's why I prefer nil: a char's name should be pretty much
unique and descriptive, so "" really isn't a char name, it just means
"this char doesn't have a name" and in Elisp we usually represent this
with nil.
>> So was there some other motivation (e.g. simpler implementation?
> No.
Then please revert it to using nil.
>> Simpler code somewhere else?)?
> Yes, hypothetically. You can safely write, for instance,
> (search-forward (get-char-code-property CHAR 'name) ...)
> or
> (insert (get-char-code-property CHAR 'name) ...)
> without checking the return value.
I doubt there will ever be code that can do the above because the ""
case will need special treatment.
So we end up comparing things like
    (insert (or (get-char-code-property CHAR 'name) "<Unnamed>"))
with
    (insert (let ((name (get-char-code-property CHAR 'name)))
              (if (equal name "") "<Unnamed>" name)))
where nil is clearly a more convenient choice.
>> If not (i.e. all things being equal) I'd prefer to use nil which is
>> ever so slightly closer to usual Elisp practice,
> Really? I've thought nil and "" are rather different objects in Elisp.
Of course they are, nil usually means "not found" or something like
that, and I think it suits this case perfectly.
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 1:59 ` Kenichi Handa
2011-10-04 12:56 ` Stefan Monnier
@ 2011-10-06 3:53 ` Kevin Rodgers
2011-10-06 12:19 ` Juanma Barranquero
1 sibling, 1 reply; 49+ messages in thread
From: Kevin Rodgers @ 2011-10-06 3:53 UTC (permalink / raw)
To: 9653
On 10/3/11 7:59 PM, Kenichi Handa wrote:
> In article<jwvd3eep5vo.fsf-monnier+emacs@gnu.org>, Stefan Monnier<monnier@iro.umontreal.ca> writes:
...
>> - "null string" can mean nil just as well as it can mean "".
>
> But, as I wrote, nil usually means
> no-value/not-specified/unassigned/unknown, which is
> different from the explicit "".
Which seems more like a "null string"?
-*- mode: C -*-
char *null_string = NULL;
char *empty_string = "";
-*- mode: Java -*-
String null_string = null;
String empty_string = new String("");
--
Kevin Rodgers
Denver, Colorado, USA
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 3:53 ` Kevin Rodgers
@ 2011-10-06 12:19 ` Juanma Barranquero
2011-10-06 13:02 ` Andreas Schwab
0 siblings, 1 reply; 49+ messages in thread
From: Juanma Barranquero @ 2011-10-06 12:19 UTC (permalink / raw)
To: Kevin Rodgers; +Cc: 9653
On Thu, Oct 6, 2011 at 05:53, Kevin Rodgers <kevin.d.rodgers@gmail.com> wrote:
> Which seems more like a "null string"?
>
> -*- mode: C -*-
> char *null_string = NULL;
> char *empty_string = "";
empty_string, IMHO, though the names are certainly misleading. "" is
clearly a string, NULL is a null pointer to anything (or nothing).
There's nothing stringy in NULL.
I'm with Kenichi and Eli on this: I think "" is more correct/clean
than nil in this case. (or x "") is not difficult to use, but (and
(not (eq x "")) x) isn't rocket science either, and it remains to be
seen how often it is needed anyway.
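The two conversions mentioned above can be written as small helpers. This is a sketch with hypothetical `my-' names, not part of Emacs. It uses `equal' rather than `eq' for the empty-string test, because `equal' compares string contents while `eq' compares object identity.

```elisp
;; Sketch only: the `my-' names are hypothetical, not part of Emacs.

(defun my-empty-to-nil (name)
  "Return nil if NAME is the empty string, otherwise return NAME unchanged."
  (if (equal name "") nil name))

(defun my-nil-to-empty (name)
  "Return the empty string if NAME is nil, otherwise return NAME unchanged."
  (or name ""))
```

With these, a caller can pick whichever convention it prefers, regardless of what the underlying property lookup returns.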
Funnily enough, a few years back I did a change to uniquify where I
defended the idea that a basename of "" and nil were one and the same
(basename being something with a narrow definition specific to
desktop.el) and Stefan argued (and prevailed) for the relevant
function returning "" and leaving the decision of treating it like nil
to the callers... And he was right: it was a cleaner, more generic
interface.
Juanma
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 12:19 ` Juanma Barranquero
@ 2011-10-06 13:02 ` Andreas Schwab
2011-10-06 13:47 ` Juanma Barranquero
0 siblings, 1 reply; 49+ messages in thread
From: Andreas Schwab @ 2011-10-06 13:02 UTC (permalink / raw)
To: Juanma Barranquero; +Cc: Kevin Rodgers, 9653
Juanma Barranquero <lekktu@gmail.com> writes:
> I'm with Kenichi and Eli on this: I think "" is more correct/clean
> than nil in this case. (or x "") is not difficult to use, but (and
> (not (eq x "")) x) isn't rocket science either, and it remains to be
> seen how often it is needed anyway.
(or x "") is much more lispy and does not require repeating a value.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 13:02 ` Andreas Schwab
@ 2011-10-06 13:47 ` Juanma Barranquero
2011-10-06 14:01 ` Andreas Schwab
0 siblings, 1 reply; 49+ messages in thread
From: Juanma Barranquero @ 2011-10-06 13:47 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Kevin Rodgers, 9653
On Thu, Oct 6, 2011 at 15:02, Andreas Schwab <schwab@linux-m68k.org> wrote:
> (or x "") is much more lispy
There's nothing unlispy in the other code. Sometimes simplicity can be
attained, and sometimes the answer is more complex.
> and does not require repeating a value.
Yes. let-bindings exist for a reason.
But the point is, it seems probable that the "" interface is more
in line with what a get-char-code-property user would expect. At
least, it's what I would expect.
Juanma
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 13:47 ` Juanma Barranquero
@ 2011-10-06 14:01 ` Andreas Schwab
2011-10-06 14:02 ` Juanma Barranquero
0 siblings, 1 reply; 49+ messages in thread
From: Andreas Schwab @ 2011-10-06 14:01 UTC (permalink / raw)
To: Juanma Barranquero; +Cc: Kevin Rodgers, 9653
Juanma Barranquero <lekktu@gmail.com> writes:
> But the point is, it seems probable that the "" interface is more
> in line with what a get-char-code-property user would expect. At
> least, it's what I would expect.
A non-existent name is much better represented by nil.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-03 4:23 ` Kenichi Handa
2011-10-03 8:22 ` Andreas Schwab
2011-10-03 13:31 ` Stefan Monnier
@ 2011-10-04 2:19 ` Drew Adams
2011-10-04 4:02 ` Kenichi Handa
2 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-04 2:19 UTC (permalink / raw)
To: 'Kenichi Handa', 'Stefan Monnier'; +Cc: 9653
> > Is (get-char-code-property c 'name) supposed to return ""
> > when the char has no name or is it supposed to return nil?
> > Either way is fine by me...
>
> It returns "" in such a case in Emacs 24...
> ... the Unicode Name property, the default value is a null string.
I'm following your discussion about this, but I don't see how it relates much to
my question about `ucs-names'.
`ucs-names' can probably be used for more than just a COLLECTION arg for
completion. But at least in that context I don't see how using nil in place of
"" would help at all (on the contrary, I would think).
The question I had was whether & why we want to keep these nameless entries in
`ucs-names'. I have nothing against it really (I just filter them out), but I
wondered if they were intentional. It sounds like the answer to that is yes,
but I wonder what the reason for keeping them is. That's all.
Is the point in keeping these entries (regardless of whether we use "" or nil)
that we want an entry for each code point, even for the code points that do not
have names? If so, then let's be explicit about that aim (e.g. add it to the
doc string or a comment).
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 2:19 ` Drew Adams
@ 2011-10-04 4:02 ` Kenichi Handa
2011-10-04 13:43 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2011-10-04 4:02 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
In article <DAA41C92AE1641A190990F9A82239A86@us.oracle.com>, "Drew Adams" <drew.adams@oracle.com> writes:
> > > Is (get-char-code-property c 'name) supposed to return ""
> > > when the char has no name or is it supposed to return nil?
> > > Either way is fine by me...
> >
> > It returns "" in such a case in Emacs 24...
> > ... the Unicode Name property, the default value is a null string.
> I'm following your discussion about this, but I don't see how it relates much to
> my question about `ucs-names'.
As I don't know how ucs-names is used (I did not know that
such a variable existed), I have no answer to your
questions.
---
Kenichi Handa
handa@m17n.org
> `ucs-names' can probably be used for more than just a COLLECTION arg for
> completion. But at least in that context I don't see how using nil in place of
> "" would help at all (on the contrary, I would think).
> The question I had was whether & why we want to keep these nameless entries in
> `ucs-names'. I have nothing against it really (I just filter them out), but I
> wondered if they were intentional. It sounds like the answer to that is yes,
> but I wonder what the reason for keeping them is. That's all.
> Is the point in keeping these entries (regardless of whether we use "" or nil)
> that we want an entry for each code point, even for the code points that do not
> have names? If so, then let's be explicit about that aim (e.g. add it to the
> doc string or a comment).
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 4:02 ` Kenichi Handa
@ 2011-10-04 13:43 ` Drew Adams
2011-10-04 17:34 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-04 13:43 UTC (permalink / raw)
To: 'Kenichi Handa'; +Cc: 9653
> > > > Is (get-char-code-property c 'name) supposed to return ""
> > > > when the char has no name or is it supposed to return nil?
> > > > Either way is fine by me...
> > >
> > > It returns "" in such a case in Emacs 24...
> > > ... the Unicode Name property, the default value is a null string.
>
> > I'm following your discussion about this, but I don't see
> > how it relates much to my question about `ucs-names'.
>
> As I don't know how ucs-names is used (I was not aware of the
> existence of such a variable), I have no answer to your
> questions.
I was referring to the function `ucs-names', which "returns an alist of
(CHAR-NAME . CHAR-CODE) pairs cached in [variable] `ucs-names'."
I was guessing that you wrote that code. In any case, `ucs-names' is called in
`read-char-by-name'. It is essentially passed as a COLLECTION argument to
`completing-read' (not quite, but the effect is similar).
`read-char-by-name' is used by `ucs-insert' to read the character to insert.
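For the completion use case, dropping the nameless entries before they reach `completing-read' could look roughly like the sketch below. It is only an illustration: `my-sample-ucs-alist' and `my-drop-empty-names' are hypothetical names, and the sample data merely imitates the (CHAR-NAME . CHAR-CODE) alist shape described above.

```elisp
;; Hypothetical sample shaped like the (CHAR-NAME . CHAR-CODE) alist
;; discussed in this thread; it is not real `ucs-names' output.
(defvar my-sample-ucs-alist
  '(("" . 44) ("COMMA" . 44) ("" . 43) ("PLUS SIGN" . 43)
    ("CLOSING PARENTHESIS" . 41) ("RIGHT PARENTHESIS" . 41)))

(defun my-drop-empty-names (alist)
  "Return a copy of ALIST keeping only entries whose name is a non-empty string."
  (let (result)
    (dolist (entry alist)
      (let ((name (car entry)))
        ;; Skip both "" and nil, so either representation is filtered.
        (when (and (stringp name) (not (string= name "")))
          (push entry result))))
    ;; `push' reversed the order, so restore it.
    (nreverse result)))

(my-drop-empty-names my-sample-ucs-alist)
```

Assuming the alist form of `ucs-names' described in this thread, the filtered list could then be used as the COLLECTION argument, e.g. (completing-read "Char: " (my-drop-empty-names (ucs-names))).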
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 13:43 ` Drew Adams
@ 2011-10-04 17:34 ` Drew Adams
2011-10-04 18:19 ` Eli Zaretskii
0 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-04 17:34 UTC (permalink / raw)
To: 'Kenichi Handa'; +Cc: 9653
[-- Attachment #1: Type: text/plain, Size: 84 bytes --]
BTW, almost _half_ of the entries in `ucs-names' have "" as the name.
See attached.
[-- Attachment #2: throw-empty-ucs-names-entries.el.gz --]
[-- Type: application/x-gzip, Size: 112008 bytes --]
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 17:34 ` Drew Adams
@ 2011-10-04 18:19 ` Eli Zaretskii
2011-10-04 18:30 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-04 18:19 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
> From: "Drew Adams" <drew.adams@oracle.com>
> Date: Tue, 4 Oct 2011 10:34:08 -0700
> Cc: 9653@debbugs.gnu.org
>
> BTW, almost _half_ of the entries in `ucs-names' have "" as the name.
It should be clear from looking at the code of the function that this
is expected:
  (if (setq name (get-char-code-property c 'name))
      (push (cons name c) names))
  (if (setq name (get-char-code-property c 'old-name))
      (push (cons name c) names))
Now look at the "Old Name" fields of the characters in the Unicode
database and tell me what you see there.
And if this is not enough, then this excerpt from the full output of
the function should explain the rest:
("" . 44) ("COMMA" . 44) ("" . 43) ("PLUS SIGN" . 43)
("" . 42) ("ASTERISK" . 42)
("CLOSING PARENTHESIS" . 41) ("RIGHT PARENTHESIS" . 41)
("OPENING PARENTHESIS" . 40) ("LEFT PARENTHESIS" . 40)
("APOSTROPHE-QUOTE" . 39) ("APOSTROPHE" . 39)
("" . 38) ("AMPERSAND" . 38) ("" . 37) ("PERCENT SIGN" . 37)
("" . 36) ("DOLLAR SIGN" . 36) ("" . 35) ("NUMBER SIGN" . 35)
("" . 34) ("QUOTATION MARK" . 34) ("" . 33) ("EXCLAMATION MARK" . 33)
("" . 32) ("SPACE" . 32)
Any questions?
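The effect of the fragment above can be reproduced in isolation with a stand-in for `get-char-code-property'. This is a sketch, not the actual source of `ucs-names': `my-fake-char-property' and `my-collect-names' are hypothetical, and the small property table is made up to mimic a lookup that returns "" for a missing old name.

```elisp
;; Stand-in for `get-char-code-property'.  The table is invented for
;; illustration; a missing old name is represented as "".
(defun my-fake-char-property (c prop)
  "Return the fake PROP (`name' or `old-name') of character code C."
  (let ((entry (assq c '((41 "RIGHT PARENTHESIS" "CLOSING PARENTHESIS")
                         (42 "ASTERISK" "")
                         (43 "PLUS SIGN" "")))))
    (cond ((eq prop 'name) (nth 1 entry))
          ((eq prop 'old-name) (nth 2 entry)))))

(defun my-collect-names (codes)
  "Collect (NAME . CODE) pairs for CODES using the two guarded pushes."
  (let (names name)
    (dolist (c codes)
      ;; "" is non-nil in Emacs Lisp, so both guards succeed for it.
      (if (setq name (my-fake-char-property c 'name))
          (push (cons name c) names))
      (if (setq name (my-fake-char-property c 'old-name))
          (push (cons name c) names)))
    names))

(my-collect-names '(41 42 43))
```

Because each pair is pushed onto the front of the list, the old-name entry (empty or not) ends up ahead of the name entry for the same code, which is the ordering visible in the excerpt.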
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 18:19 ` Eli Zaretskii
@ 2011-10-04 18:30 ` Drew Adams
2011-10-04 20:55 ` Eli Zaretskii
2011-10-04 21:39 ` Stefan Monnier
0 siblings, 2 replies; 49+ messages in thread
From: Drew Adams @ 2011-10-04 18:30 UTC (permalink / raw)
To: 'Eli Zaretskii'; +Cc: 9653
> Any questions?
Yes, why are you explaining this? The question is not whether "this is
expected" based on the current implementation. The question is whether that
implementation does what we want. You seem to be answering the "Why?" of the
Subject line with "how".
No one gave any indication that how this happens is a mystery. The question
posed by this bug thread is whether and why we want to have such unnamed
characters in `ucs-names'. That's all. How those names happen to be there has
never been in question.
Almost half of the *many* names in `ucs-names' are non-names (empty) - that's
49,368 empty names. Is that really what we want? That's the question being
discussed.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 18:30 ` Drew Adams
@ 2011-10-04 20:55 ` Eli Zaretskii
2011-10-04 21:39 ` Stefan Monnier
1 sibling, 0 replies; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-04 20:55 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
> From: "Drew Adams" <drew.adams@oracle.com>
> Cc: <handa@m17n.org>, <9653@debbugs.gnu.org>
> Date: Tue, 4 Oct 2011 11:30:22 -0700
>
> > Any questions?
>
> Yes, why are you explaining this? The question is not whether "this is
> expected" based on the current implementation. The question is whether that
> implementation does what we want. You seem to be answering the "Why?" of the
> Subject line with "how".
>
> No one gave any indication that how this happens is a mystery. The question
> posed by this bug thread is whether and why we want to have such unnamed
> characters in `ucs-names'. That's all. How those names happen to be there has
> never been in question.
>
> Almost half of the *many* names in `ucs-names' are non-names (empty) - that's
> 49,368 empty names. Is that really what we want? That's the question being
> discussed.
All those questions were already answered half the thread away. You
are wasting everybody's time and bandwidth by adding more pointless
remarks to what is a done deal. Just sit tight and wait for Handa-san
to do what Stefan asked him to. Case closed.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 18:30 ` Drew Adams
2011-10-04 20:55 ` Eli Zaretskii
@ 2011-10-04 21:39 ` Stefan Monnier
2011-10-04 22:03 ` Drew Adams
1 sibling, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2011-10-04 21:39 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
> No one gave any indication that how this happens is a mystery. The question
> posed by this bug thread is whether and why we want to have such unnamed
> characters in `ucs-names'. That's all. How those names happen to be there has
> never been in question.
I'd have hoped my message #23 (http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9653#23)
made it all clear already. We're just discussing how to fix
the problem.
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 21:39 ` Stefan Monnier
@ 2011-10-04 22:03 ` Drew Adams
2011-10-05 4:11 ` Eli Zaretskii
0 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-04 22:03 UTC (permalink / raw)
To: 'Stefan Monnier'; +Cc: 9653
> > whether and why we want to have such unnamed
> > characters in `ucs-names'. That's all. How those names
> > happen to be there has never been in question.
>
> I'd have hoped my message #23
> (http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9653#23)
> made it all clear already. We're just discussing how to fix
> the problem.
Well, sort of, but I didn't have the impression that the question was really
closed, that you do indeed want to "filter out those useless entries", and that
you had moved on to how to filter them.
In any case, as you point out, message #23 already showed "how" the list gets
populated, which Eli repeated.
Your subsequent discussion of whether to use "" or nil to represent an empty
name (no name) didn't do much to further the impression that you really wanted
to filter out such entries. If they are to be removed from `ucs-names', what
difference does it make how you choose to represent them temporarily?
Be that as it may, I have a further question about this.
Currently the only use of `ucs-names' in the Emacs source code is in
`read-char-by-name', and in that context the 49,368 empty names are not even
available to users as character choices.
That is, the empty _names_ are not available as completion targets, but their
characters, and the alist entries with empty names, are still available to
users, because a user can alternatively enter a character code (not a name).
It is that, in part, that made me wonder whether you might really be wanting to
keep such empty-name entries.
I supposed, however, that each empty-name alist entry has an alter-ego entry
that has the same code point but also has a non-empty name: e.g. ("" . 32) &
("SPACE" . 32).
If that were the case, then I would see no use by `read-char-by-name' for the
empty-name entries. Users of `read-char-by-name' would never see or touch a
character that has no name (has an empty name), unless it were by entering a
code point that also corresponds to a character that also has a name.
That was my impression until I verified just now. I see in fact that there are
some `ucs-names' entries that, yes, constitute a pair with the same character
code, yet _both_ are empty. E.g., ("" . 11565) & ("" . 11565).
So do we want to keep empty-name entries perhaps for that reason: because there
are some characters that have no name (old or new), and we want users to be able
to read them (e.g. to insert them using `ucs-insert') by giving their char
codes?
To me it still looks like an open question.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-04 22:03 ` Drew Adams
@ 2011-10-05 4:11 ` Eli Zaretskii
2011-10-05 13:20 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-05 4:11 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
> From: "Drew Adams" <drew.adams@oracle.com>
> Cc: "'Eli Zaretskii'" <eliz@gnu.org>, <9653@debbugs.gnu.org>
> Date: Tue, 4 Oct 2011 15:03:27 -0700
>
> I supposed, however, that each empty-name alist entry has an alter-ego entry
> that has the same code point but also has a non-empty name: e.g. ("" . 32) &
> ("SPACE" . 32).
Yes, each character code is entered into the alist twice: once with
its name, and another time with its "old name", which is empty for
most of the characters. I thought the code fragment I posted made
that clear.
> I see in fact that there are
> some `ucs-names' entries that, yes, constitute a pair with the same character
> code, yet _both_ are empty. E.g., ("" . 11565) & ("" . 11565).
These are undefined characters. 11565 is hex 2d2d; you will not find
such a codepoint in the Unicode database.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-05 4:11 ` Eli Zaretskii
@ 2011-10-05 13:20 ` Drew Adams
2011-10-05 17:24 ` Eli Zaretskii
0 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2011-10-05 13:20 UTC (permalink / raw)
To: 'Eli Zaretskii'; +Cc: 9653
> > I supposed, however, that each empty-name alist entry has
> > an alter-ego entry that has the same code point but also
> > has a non-empty name: e.g. ("" . 32) & ("SPACE" . 32).
>
> Yes, each character code is entered into the alist twice: once with
> its name, and another time with its "old name", which is empty for
> most of the characters. I thought the code fragment I posted made
> that clear.
Yes, it did make it (abundantly, redundantly) clear. As it was already clear
from Stefan's earlier code-fragment post. And as it was clear to me from the
beginning, after looking at the code.
Not the question. What I was incorrectly supposing was that every character
code corresponded to either a non-empty name or a non-empty old-name. I did not
realize that both names might be empty.
> > I see in fact that there are some `ucs-names' entries that,
> > yes, constitute a pair with the same character code, yet
> > _both_ are empty. E.g., ("" . 11565) & ("" . 11565).
>
> These are undefined characters. 11565 is hex 2d2d; you will not find
> such a codepoint in the Unicode database.
Great. If they are all such, then I suppose we can in fact filter out all empty
entries.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-05 13:20 ` Drew Adams
@ 2011-10-05 17:24 ` Eli Zaretskii
0 siblings, 0 replies; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-05 17:24 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653
> From: "Drew Adams" <drew.adams@oracle.com>
> Cc: <monnier@iro.umontreal.ca>, <9653@debbugs.gnu.org>
> Date: Wed, 5 Oct 2011 06:20:21 -0700
>
> What I was incorrectly supposing was that every character
> code corresponded to either a non-empty name or a non-empty old-name. I did not
> realize that both names might be empty.
All assigned characters have a non-empty "name" attribute. A few
_also_ have a non-empty "old-name" attribute. Both attributes can be
empty only for unassigned characters.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
` (2 preceding siblings ...)
2011-10-03 1:28 ` Stefan Monnier
@ 2011-10-05 8:59 ` Kenichi Handa
2011-10-05 10:20 ` Eli Zaretskii
2012-02-17 15:55 ` Drew Adams
2018-02-13 23:35 ` Drew Adams
5 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2011-10-05 8:59 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 9653
You wrote:
> >> - even less so when it talks about internal APIs rather than about
> >> externally-visible behavior.
> > I think that UCD is talking about external visible behaviour.
>
> If so, it doesn't apply to the behavior of (get-char-code-property CHAR
> 'name) which is an internal detail.
I think what get-char-code-property returns belongs to an
external API, and currently I put this docstring on the `name'
property.
"Unicode character name.
Property value is a string."
So, when one wants to know a Unicode name of a character, he
can call get-char-code-property and it should return "".
> >> - "null string" can mean nil just as well as it can mean "".
> > But, as I wrote, nil usually means
> > no-value/not-specified/unassigned/unknown, which is
> > different from the explicit "".
>
> Indeed, and that's why I prefer nil: a char's name should be pretty much
> unique and descriptive, so "" really isn't a char name, it just means
> "this char doesn't have a name" and in Elisp we usually represent this
> with nil.
Perhaps we are asking two functionalities of
get-char-code-property. One is to return the specific name
of a character or nil if it doesn't have such a name, the
other is to just return a name that Unicode defines for a
character. As there's just one function at the moment, we
should find which functionality is more useful.
> >> So was there some other motivation (e.g. simpler implementation?
> > No.
>
> Then please revert it to using nil.
So, your opinion is that the former functionality is more
useful, right?
Then, ok, I'll change it back, and, change the above
docstring accordingly.
> >> Simpler code somewhere else?)?
> > Yes, hypothetically. You can safely write, for instance,
> > (search-forward (get-char-code-property CHAR 'name) ...)
> > or
> > (insert (get-char-code-property CHAR 'name) ...)
> > without checking the return value.
>
> I doubt there will ever be code that can do the above because the ""
> case will need special treatment.
>
> So we end up comparing things like
>
>   (insert (or (get-char-code-property CHAR 'name) "<Unnamed>"))
> with
>   (insert (let ((name (get-char-code-property CHAR 'name)))
>             (if (equal name "") "<Unnamed>" name)))
>
> where nil is clearly a more convenient choice.
I'll repeat that when one wants to know what Unicode says
about the name of a character, the answer is "", not
"<Unnamed>".
> >> If not (i.e. all things being equal) I'd prefer to use nil which is
> >> ever so slightly closer to usual Elisp practice,
> > Really? I thought nil and "" were rather different objects in Elisp.
>
> Of course they are, nil usually means "not found" or something like
> that, and I think it suits this case perfectly.
I'm not sure because there are multiple use-cases of
get-char-code-property, and nil is better only in some of
them. But, it's just "I'm not sure". If you are sure, as I
wrote above, I'll change it back.
---
Kenichi Handa
handa@m17n.org
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-05 8:59 ` Kenichi Handa
@ 2011-10-05 10:20 ` Eli Zaretskii
2011-10-05 12:40 ` Stefan Monnier
0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-05 10:20 UTC (permalink / raw)
To: Kenichi Handa; +Cc: 9653
> From: Kenichi Handa <handa@m17n.org>
> Date: Wed, 05 Oct 2011 17:59:06 +0900
> Cc: 9653@debbugs.gnu.org
>
> I think what get-char-code-property returns belongs to an
> external API, and currently I put this docstring on the `name'
> property.
>
> "Unicode character name.
> Property value is a string."
Right, and the same is in the ELisp manual:
`name'
Corresponds to the `Name' Unicode property. The value is a string
consisting of upper-case Latin letters A to Z, digits, spaces, and
hyphen `-' characters. For unassigned codepoints, the value is an
empty string.
Similar wording is there for old-name.
> I'll repeat that when one wants to know what Unicode says
> about the name of a character, the answer is "", not
> "<Unnamed>".
Correct.
> > >> If not (i.e. all things being equal) I'd prefer to use nil which is
> > >> ever so slightly closer to usual Elisp practice,
> > > Really? I thought nil and "" were rather different objects in Elisp.
> >
> > Of course they are, nil usually means "not found" or something like
> > that, and I think it suits this case perfectly.
>
> I'm not sure because there are multiple use-cases of
> get-char-code-property, and nil is better only in some of
> them. But, it's just "I'm not sure". If you are sure, as I
> wrote above, I'll change it back.
FWIW, I'm not sure, either. Stefan, can you please provide "heavier"
arguments than just what's been written in this thread?
If the issue is just to filter the empty names from what ucs-names
returns, we can do that with either nil or empty strings. But let's
also think about other users of get-char-code-property, such as
what-cursor-position etc., where we will now need to display an empty
string when we get nil. By contrast, the way get-char-code-property
is coded now its results are ready to be displayed in a manner that is
entirely consistent with the requirements of the Unicode standard, and
not really getting in the way of Emacs. So it is unclear to me why we
should disregard the standard's guidance in this particular case.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-05 10:20 ` Eli Zaretskii
@ 2011-10-05 12:40 ` Stefan Monnier
2011-10-06 18:02 ` Eli Zaretskii
0 siblings, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2011-10-05 12:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 9653
>> I'll repeat that when one wants to know what Unicode says about the
>> name of a character, the answer is "", not "<Unnamed>".
> Correct.
Doesn't matter. The point is that it's easier to turn nil into
something else (e.g. "") than to turn "" into something else (e.g. nil).
>> I'm not sure because there are multiple use-cases of
>> get-char-code-property, and nil is better only in some of them.
>> But, it's just "I'm not sure". If you are sure, as I wrote above,
>> I'll change it back.
I'm sure.
> what-cursor-position etc., where we will now need to display an empty
> string when we get nil.
Big deal: that's just an (or ... "").
And in any case no matter what the standard says, I'm pretty sure end
users would prefer to be told explicitly that a char doesn't have a name
rather than to see an empty field and wonder what that means.
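The asymmetry between the two conventions can be sketched with two tiny converters. Both helper names are hypothetical and exist only for illustration; neither is part of Emacs.

```elisp
(defun my-name-or-empty (name)
  "Turn a nil NAME into \"\"; return any other NAME unchanged."
  ;; nil is the only false value, so a plain `or' is enough.
  (or name ""))

(defun my-name-or-nil (name)
  "Turn an empty-string NAME into nil; return any other NAME unchanged."
  ;; "" is non-nil, so it has to be compared explicitly.
  (if (equal name "") nil name))

(list (my-name-or-empty nil)
      (my-name-or-empty "SPACE")
      (my-name-or-nil "")
      (my-name-or-nil "SPACE"))
```

Going from nil to a display string needs only `or', while going from "" to something else needs an explicit comparison at every call site.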
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-05 12:40 ` Stefan Monnier
@ 2011-10-06 18:02 ` Eli Zaretskii
2011-10-06 20:56 ` Stefan Monnier
0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2011-10-06 18:02 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 9653
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@m17n.org>, 9653@debbugs.gnu.org
> Date: Wed, 05 Oct 2011 08:40:13 -0400
>
> >> I'm not sure because there are multiple use-cases of
> >> get-char-code-property, and nil is better only in some of them.
> >> But, it's just "I'm not sure". If you are sure, as I wrote above,
> >> I'll change it back.
>
> I'm sure.
>
> > what-cursor-position etc., where we will now need to display an empty
> > string when we get nil.
>
> Big deal: that's just an (or ... "").
>
> And in any case no matter what the standard says, I'm pretty sure end
> users would prefer to be told explicitly that a char doesn't have a name
> rather than to see an empty field and wonder what that means.
I asked for some reasoning, but you didn't provide any. Fine. But
should we now change the `iso-10646-comment' property to be nil when
it doesn't exist? Currently, any string attribute is returned as ""
if the attribute is not defined in the Unicode database.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 18:02 ` Eli Zaretskii
@ 2011-10-06 20:56 ` Stefan Monnier
2012-01-14 18:35 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2011-10-06 20:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 9653
> I asked for some reasoning, but you didn't provide any. Fine. But
> should we now change the `iso-10646-comment' property to be nil when
> it doesn't exist? Currently, any string attribute is returned as ""
> if the attribute is not defined in the Unicode database.
As a general rule, yes.
Stefan
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-06 20:56 ` Stefan Monnier
@ 2012-01-14 18:35 ` Drew Adams
0 siblings, 0 replies; 49+ messages in thread
From: Drew Adams @ 2012-01-14 18:35 UTC (permalink / raw)
To: 'Stefan Monnier', 'Eli Zaretskii'; +Cc: 9653
Whatever happened with this bug? I thought that it was decided to remove the
empty entries and you were just discussing how best to do that. Nothing has
happened in this thread since 2011-10-11.
In the latest build I have, from 2012-01-05, the empty-name entries are still
there.
Dunno whether I pointed this out before, but such entries interfere with the
possibility of simply using `rassq' to look up a char code. Especially since
the empty entries seem to come _first_, before the non-empty entries (why?).
E.g., try to look up (rassq 11967 (ucs-names)).
There is a perfectly good entry for this char code: ("CJK RADICAL GRASS TWO" .
11967). But it is _preceded_ by this empty-name entry: ("" . 11967), which is
of course what `rassq' returns. (And those are the only entries for 11967.)
Not the end of the world, certainly, but can't we do better than this? You guys
discussed the implementation of this fix for quite a while, but the real fix
seems to have petered out.
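The `rassq' behavior, and one possible workaround until the empty entries are removed, can be sketched on a two-entry sample. `my-sample-entries' and `my-rassq-named' are hypothetical names used only for this illustration.

```elisp
;; Two entries for the same code, in the order reported above.
(defvar my-sample-entries
  '(("" . 11967) ("CJK RADICAL GRASS TWO" . 11967)))

;; `rassq' returns the first entry whose cdr is `eq' to the key,
;; which here is the empty-name entry.
(rassq 11967 my-sample-entries)

(defun my-rassq-named (code alist)
  "Return the first entry of ALIST whose cdr is CODE and whose name is non-empty."
  (let (found)
    (while (and alist (not found))
      (let ((entry (car alist)))
        (when (and (consp entry)
                   (eq (cdr entry) code)
                   (stringp (car entry))
                   (not (string= (car entry) "")))
          (setq found entry)))
      (setq alist (cdr alist)))
    found))

(my-rassq-named 11967 my-sample-entries)
```

This only works around the ordering problem on the caller's side; it does not change what `ucs-names' itself returns.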
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
` (3 preceding siblings ...)
2011-10-05 8:59 ` Kenichi Handa
@ 2012-02-17 15:55 ` Drew Adams
2018-02-13 23:35 ` Drew Adams
5 siblings, 0 replies; 49+ messages in thread
From: Drew Adams @ 2012-02-17 15:55 UTC (permalink / raw)
To: 9653
Forwarding to 9653@debbugs.gnu.org.
Otherwise, it is never sent to that bug list (so I, for one, do not receive it).
That is a deficiency in the bug-system design, IMO, and should be fixed.
It is not enough that mail to ####-done@debbugs.gnu.org gets added to the bug
thread indirectly, so you can see it using HTTP at
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=####.
Mail sent to ####-done@debbugs.gnu.org should also continue to go to the bug
mailing list (####@debbugs.gnu.org), so subscribers following the thread there
receive such messages as well.
Just because someone closes a bug does not mean that subsequent correspondence
about it should be excluded from the bug mailing list by default.
When you hit `Reply All' to reply to the bug thread, you expect your reply to be
added to that bug thread, i.e., the bug mailing list. Users should not have to
think about editing the recipients list to add the bug-thread list
(####@debbugs.gnu.org). That address should be present by default.
-----Original Message-----
Sent: Friday, February 17, 2012 7:40 AM
To: 'Stefan Monnier'; 'Kenichi Handa'
Cc: '9653-done@debbugs.gnu.org'
> >> Could you take a look at this problem and replace the ""
> >> with nil for the name of unassigned chars?
> > Done.
>
> Thank you,
Thank you for trying to fix this.
But I still see the same problem:
>> such entries interfere with the possibility of simply
>> using `rassq' to look up a char code. Especially since
>> the empty entries seem to come _first_, before the non-empty
>> entries (why?).
>>
>> E.g., try to look up (rassq 11967 (ucs-names)).
>>
>> There is a perfectly good entry for this char code: ("CJK
>> RADICAL GRASS TWO" . 11967). But it is _preceded_ by this
>> empty-name entry: ("" . 11967), which is of course what
>> `rassq' returns. (And those are the only entries for 11967.)
IOW, replacing "" by nil _only for unassigned chars_ does not solve the problem,
AFAICT. The problem is that there are ("" . <some#>) entries that precede
perfectly legitimate, assigned chars. Can you please remove those empty entries
as well?
emacs -Q
M-: (rassq 11967 (ucs-names))
That returns ("" . 11967), which is useless (for me, at least). I expect it to
return ("CJK RADICAL GRASS TWO" . 11967), which is usable.
If this was fixed after the build I'm testing (the latest Windows binary
available, from 2 days ago), then ignore what I just said. But that is anyway
what I see in this build:
In GNU Emacs 24.0.93.1 (i386-mingw-nt5.1.2600)
of 2012-02-15 on MARVIN
Windowing system distributor `Microsoft Corp.', version 5.1.2600
Configured using:
`configure --with-gcc (4.6) --no-opt --enable-checking --cflags
-ID:/devel/emacs/libs/libXpm-3.5.8/include
-ID:/devel/emacs/libs/libXpm-3.5.8/src
-ID:/devel/emacs/libs/libpng-dev_1.4.3-1/include
-ID:/devel/emacs/libs/zlib-dev_1.2.5-2/include
-ID:/devel/emacs/libs/giflib-4.1.4-1/include
-ID:/devel/emacs/libs/jpeg-6b-4/include
-ID:/devel/emacs/libs/tiff-3.8.2-1/include
-ID:/devel/emacs/libs/gnutls-3.0.9/include'
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2011-10-02 16:36 ` bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries? Drew Adams
` (4 preceding siblings ...)
2012-02-17 15:55 ` Drew Adams
@ 2018-02-13 23:35 ` Drew Adams
2018-02-15 1:11 ` Noam Postavsky
5 siblings, 1 reply; 49+ messages in thread
From: Drew Adams @ 2018-02-13 23:35 UTC (permalink / raw)
To: Kenichi Handa, Stefan Monnier; +Cc: 9653
> In article <jwvzkch5w1i.fsf-monnier+emacs@gnu.org>, Stefan Monnier
> <monnier@iro.umontreal.ca> writes:
>
> >>> Could you take a look at this problem and replace the
> >>> "" with nil for the name of unassigned chars?
> > > Done.
>
> > Thank you,
>
> No, the original bug is not yet fixed. Perhaps I
> misunderstood what you wanted me to do. I've done only this part:
> "replace the "" with nil for the name of unassigned chars".
> If what you wanted from me includes fixing this problem, please
> ask the author of ucs-names. He should be able to work on
> it much faster and more easily than I can. I've never read the code of
> ucs-names.
It would be good if someone would please fix this bug.
Why was this closed as "Done"?
Thx.
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2018-02-13 23:35 ` Drew Adams
@ 2018-02-15 1:11 ` Noam Postavsky
2018-02-15 3:17 ` Drew Adams
0 siblings, 1 reply; 49+ messages in thread
From: Noam Postavsky @ 2018-02-15 1:11 UTC (permalink / raw)
To: Drew Adams; +Cc: 9653, Stefan Monnier, Kenichi Handa
Drew Adams <drew.adams@oracle.com> writes:
> It would be good if someone would please fix this bug.
>
> Why was this closed as "Done"?
It's fixed. The timeline is:
Emacs 23: (get-char-code-property CHAR 'name) returns nil for characters
lacking names. `ucs-names' doesn't add those to the list.
Emacs 24: get-char-code-property was changed to return the non-nil value
"" instead, meaning `ucs-names' no longer filtered them.
The change is reverted, so the entries are filtered again, as in Emacs 23.
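The whole timeline hinges on the truthiness of the returned name, which a small sketch can show. `my-push-if-named' is a hypothetical helper that imitates the guard in `ucs-names'; it is not the real implementation.

```elisp
(defun my-push-if-named (name code names)
  "Return NAMES with (NAME . CODE) added in front, but only if NAME is non-nil."
  (if name
      (cons (cons name code) names)
    names))

;; nil convention (Emacs 23, and again after the revert): nothing is added.
(my-push-if-named nil 11565 nil)

;; "" convention (the Emacs 24 development builds): "" is non-nil,
;; so an empty-name entry slips through the guard.
(my-push-if-named "" 11565 nil)
```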
* bug#9653: 24.0.50; `ucs-names' - Why all of the ("" . XXX) entries?
2018-02-15 1:11 ` Noam Postavsky
@ 2018-02-15 3:17 ` Drew Adams
0 siblings, 0 replies; 49+ messages in thread
From: Drew Adams @ 2018-02-15 3:17 UTC (permalink / raw)
To: Noam Postavsky; +Cc: 9653, Stefan Monnier, Kenichi Handa
> > Why was this closed as "Done"?
>
> It's fixed. The timeline is:
>
> Emacs 23: (get-char-code-property CHAR 'name) returns nil for characters
> lacking names. `ucs-names' doesn't add those to the list.
>
> Emacs 24: get-char-code-property was changed to return the non-nil value
> "" instead, meaning `ucs-names' no longer filtered them.
>
> The change is reverted, so the entries are filtered again, as in Emacs
> 23.
OK, thanks. I guess the other parts I referred to will
not be fixed - char names "VARIATION SELECTOR-n" etc.
That's OK; we can continue to filter them out (for most
purposes).