From: Paul Eggert <eggert@cs.ucla.edu>
To: Philipp Stephani <p.stephani2@gmail.com>, Eli Zaretskii <eliz@gnu.org>
Cc: larsi@gnus.org, johnw@gnu.org, emacs-devel@gnu.org
Subject: Re: Character literals for Unicode (control) characters
Date: Mon, 14 Mar 2016 13:03:38 -0700
Message-ID: <56E7191A.60507@cs.ucla.edu>
In-Reply-To: <CAArVCkTjD+09yPfY50Xb7TbxC2hW_Fuf03pQZy0p0bfAU0hafQ@mail.gmail.com>

Thanks, here's a detailed low-level review.

> Subject: [PATCH 4/4] Use `ucs-names'.

Summary lines like "Use `ucs-names'." should not end with "." and should 
be as informative as possible within a 50-char limit.

> +#include <stdnoreturn.h>

This include reportedly doesn't work well with Microsoft compilers. Omit 
it and use _Noreturn instead of noreturn.

> +/* Signals an `invalid-read-syntax' error indicating that the
> +   character name in an \N{...} literal is invalid.  */

Use active voice "Signal an" rather than a non-sentence. Don't use grave 
quoting in comments (no quoting needed here anyway).

> +static noreturn void invalid_character_name (Lisp_Object name)

Put "static _Noreturn void" on the first line, and the rest on the next 
line; that's the usual GNU style.

> +/* Checks that CODE is a valid Unicode scalar value, and returns its
> +   value.  CODE should be parsed from the character name given by
> +   NAME.  NAME is used for error messages.  */

Active voice: "Checks" -> "Check".

> +static int check_scalar_value (Lisp_Object code, Lisp_Object name)

Put "static int" on a separate line.

> +{
> +  if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR) ||
> +      /* Don't allow surrogates.  */
> +      RANGED_INTEGERP (0xD800, code, 0xDFFF))
> +    invalid_character_name (name);
> +  return XINT (code);
> +}

RANGED_INTEGERP implies two tests for integer. Better would be an 
explicit NUMBERP check, followed by an XINT, followed by C-language 
range checks. Just use <= or < in range checks (not >= or >).

Also, don't put operators like || at the end of a line; put them at the 
start of the next line instead.
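
For example, here is a rough, untested sketch of the shape I mean. It 
works on a plain long int rather than a Lisp_Object, so the type check 
and the XINT are assumed to have been done already. The name 
scalar_value_p is made up for illustration, and the enum merely stands 
in for Emacs's MAX_UNICODE_CHAR:

```c
#include <stdbool.h>

/* Largest Unicode code point.  A stand-in for Emacs's MAX_UNICODE_CHAR,
   defined here only so that the sketch is self-contained.  */
enum { MAX_UNICODE_CHAR = 0x10FFFF };

/* Return true if CODE is a Unicode scalar value, i.e., a code point
   that is not a surrogate.  (Hypothetical helper, not in the patch.)  */
static bool
scalar_value_p (long int code)
{
  return (0 <= code && code <= MAX_UNICODE_CHAR
          /* Don't allow surrogates.  */
          && ! (0xD800 <= code && code <= 0xDFFF));
}
```

Each comparison reads left to right in increasing order, and the && 
starts its line.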

> +/* If NAME starts with PREFIX, interpret the rest as a hexadecimal
> +   number and return its value.  Raises `invalid-read-syntax' if the
> +   number is not a valid scalar value.  Returns -1 if NAME doesn't
> +   start with PREFIX.  */

Active voice. No need for grave quoting.

> +static int
> +parse_code_after_prefix (Lisp_Object name, const char* prefix)

"char* x" -> "char *x" in GNU style.

> +  if (name_len > prefix_len && name_len <= prefix_len + 8

Just use < or <= for range checks.

> +      Lisp_Object code = string_to_number (SDATA (name) + prefix_len, 16, false);
> +      if (! NILP (code))
> +        return check_scalar_value (code, name);

Why is nil treated differently from other invalid values (e.g., 
floating-point numbers)? They're all invalid character names, right?
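
To illustrate, here is a standalone sketch in which every malformed 
suffix takes the same error path. It is not the patch's code: it parses 
the digits by hand instead of calling string_to_number, and the return 
codes INVALID_NAME and NO_PREFIX and the helper hex_digit_value are 
hypothetical names of my own. In the real reader the INVALID_NAME cases 
would presumably call invalid_character_name instead:

```c
#include <string.h>

/* Hypothetical return codes, used only in this sketch.  */
enum { INVALID_NAME = -2, NO_PREFIX = -1 };

/* Return the value of the hexadecimal digit C, or -1 if C is not one.  */
static int
hex_digit_value (char c)
{
  if ('0' <= c && c <= '9')
    return c - '0';
  if ('a' <= c && c <= 'f')
    return c - 'a' + 10;
  if ('A' <= c && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* If NAME starts with PREFIX, interpret the rest as 1 to 8 hexadecimal
   digits and return the value if it is a Unicode scalar value.
   Return INVALID_NAME if the rest is invalid for any reason, and
   NO_PREFIX if NAME doesn't start with PREFIX.  */
static long int
parse_code_after_prefix (char const *name, char const *prefix)
{
  size_t prefix_len = strlen (prefix);
  if (strncmp (name, prefix, prefix_len) != 0)
    return NO_PREFIX;

  char const *digits = name + prefix_len;
  size_t ndigits = strlen (digits);
  if (! (0 < ndigits && ndigits <= 8))
    return INVALID_NAME;

  /* At most 8 hex digits, so the value fits in unsigned long.  */
  unsigned long int code = 0;
  for (size_t i = 0; i < ndigits; i++)
    {
      int digit = hex_digit_value (digits[i]);
      if (digit < 0)
        return INVALID_NAME;
      code = code * 16 + digit;
    }

  /* Reject out-of-range values and surrogates the same way.  */
  if (! (code <= 0x10FFFF) || (0xD800 <= code && code <= 0xDFFF))
    return INVALID_NAME;
  return code;
}
```

With a made-up prefix such as "U+", an empty suffix, a stray non-hex 
character, a surrogate, and an out-of-range value all yield 
INVALID_NAME, so no one kind of bad input is singled out.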

>
> +      /* Various ranges of CJK characters; see UnicodeData.txt. */
> +      if ((code >= 0x3400 && code <= 0x4DB5) ||
> +          (code >= 0x4E00 && code <= 0x9FD5) ||
> +          (code >= 0x20000 && code <= 0x2A6D6) ||
> +          (code >= 0x2A700 && code <= 0x2B734) ||
> +          (code >= 0x2B740 && code <= 0x2B81D) ||
> +          (code >= 0x2B820 && code <= 0x2CEA1))
> +        return code;

Use only <= here, and put || at the start of lines. What's the 
likelihood that the numbers in the above test will change?
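
Something like this untested sketch, where the bounds are copied 
verbatim from the patch (I have not rechecked them against 
UnicodeData.txt) and cjk_ideograph_p is a made-up name:

```c
#include <stdbool.h>

/* Return true if CODE is in one of the CJK ranges listed in the patch.
   (Hypothetical helper; the patch does the test inline.)  */
static bool
cjk_ideograph_p (int code)
{
  return ((0x3400 <= code && code <= 0x4DB5)
          || (0x4E00 <= code && code <= 0x9FD5)
          || (0x20000 <= code && code <= 0x2A6D6)
          || (0x2A700 <= code && code <= 0x2B734)
          || (0x2B740 <= code && code <= 0x2B81D)
          || (0x2B820 <= code && code <= 0x2CEA1));
}
```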

>
> +  if (! CONSP (names))
> +    invalid_syntax ("Unicode character name database not loaded");

This test is not needed, as ucs-names always returns a cons, and anyway 
even if it didn't then Fassoc would do the right thing.

> +        /* 200 characters is hopefully long enough.  Increase if
> +           not.  */
> +        char name[200];

Give a name to this constant, e.g.,

/* Bound on the length of a Unicode character name.
    As of Unicode 9.0.0 the maximum is 83, so this should be safe. */
enum { UNICODE_CHARACTER_NAME_LENGTH_BOUND = 199 };
...
    char name[UNICODE_CHARACTER_NAME_LENGTH_BOUND + 1];




