From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Paul Eggert <eggert@cs.ucla.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: Character literals for Unicode (control) characters
Date: Mon, 14 Mar 2016 13:03:38 -0700
Organization: UCLA Computer Science Department
Message-ID: <56E7191A.60507@cs.ucla.edu>
References: <87r3fsjenn.fsf@gnus.org> <56D8623F.6060806@cs.ucla.edu>
	<CAArVCkRPFrpXkE4+Z48V529gUZSirVWbotzv1EeqRDeT8aFv0w@mail.gmail.com>
	<m2fuw4732y.fsf@newartisans.com>
	<CAArVCkTYuRHjdCKvzg2zixC3zvVzrGo7sCgVR_DoeBH8u8kPLw@mail.gmail.com>
	<838u1vwqj9.fsf@gnu.org>
	<CAArVCkQtqYfyDNmWQ4qQL9QOh8HHrJTSWesG+pqgGdtYwVrc4g@mail.gmail.com>
	<56DC7227.10708@cs.ucla.edu>
	<CAArVCkR3r34qp+iM8t5FaRBVvjjE492A_EQyB05dVPe6inA34w@mail.gmail.com>
	<56DC7F18.8050103@cs.ucla.edu>
	<CAArVCkSLbOO2RH1+VC_MK7aHKuy52jHAnnf8rcednqDPiq=wqQ@mail.gmail.com>
	<83si03v0c3.fsf@gnu.org>
	<CAArVCkTjD+09yPfY50Xb7TbxC2hW_Fuf03pQZy0p0bfAU0hafQ@mail.gmail.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1457985856 22961 80.91.229.3 (14 Mar 2016 20:04:16 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 14 Mar 2016 20:04:16 +0000 (UTC)
Cc: larsi@gnus.org, johnw@gnu.org, emacs-devel@gnu.org
To: Philipp Stephani <p.stephani2@gmail.com>, Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Mar 14 21:04:07 2016
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1afYij-00061p-FR
	for ged-emacs-devel@m.gmane.org; Mon, 14 Mar 2016 21:04:05 +0100
Original-Received: from localhost ([::1]:43774 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1afYii-00081c-GS
	for ged-emacs-devel@m.gmane.org; Mon, 14 Mar 2016 16:04:04 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:41782)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1afYiU-00081K-S9
	for emacs-devel@gnu.org; Mon, 14 Mar 2016 16:03:51 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1afYiQ-0006sR-PX
	for emacs-devel@gnu.org; Mon, 14 Mar 2016 16:03:50 -0400
Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49782)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>)
	id 1afYiK-0006qQ-WE; Mon, 14 Mar 2016 16:03:41 -0400
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id A92DB160E87;
	Mon, 14 Mar 2016 13:03:39 -0700 (PDT)
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
	with ESMTP id KSSXnU1VomEP; Mon, 14 Mar 2016 13:03:38 -0700 (PDT)
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id CB5E1160FF7;
	Mon, 14 Mar 2016 13:03:38 -0700 (PDT)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id Dy9tixqFCwkp; Mon, 14 Mar 2016 13:03:38 -0700 (PDT)
Original-Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200])
	by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id AD01B160E87;
	Mon, 14 Mar 2016 13:03:38 -0700 (PDT)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.6.0
In-Reply-To: <CAArVCkTjD+09yPfY50Xb7TbxC2hW_Fuf03pQZy0p0bfAU0hafQ@mail.gmail.com>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 131.179.128.68
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:201759
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/201759>

Thanks, here's a detailed low level review.

> Subject: [PATCH 4/4] Use `ucs-names'.

Summary lines like "Use `ucs-names'." should not end with "." and should 
be as informative as possible within a 50-char limit.

> +#include <stdnoreturn.h>

This include reportedly doesn't work well with Microsoft compilers. Omit 
it and use _Noreturn instead of noreturn.

> +/* Signals an `invalid-read-syntax' error indicating that the
> +   character name in an \N{...} literal is invalid.  */

Use active voice "Signal an" rather than a non-sentence. Don't use grave 
quoting in comments (no quoting needed here anyway).

> +static noreturn void invalid_character_name (Lisp_Object name)

Put "static _Noreturn void" on the first line, and the rest on the next 
line; that's the usual GNU style.

> +/* Checks that CODE is a valid Unicode scalar value, and returns its
> +   value.  CODE should be parsed from the character name given by
> +   NAME.  NAME is used for error messages.  */

Active voice: "Checks" -> "Check".

> +static int check_scalar_value (Lisp_Object code, Lisp_Object name)

"static int" in a separate line.

> +{
> +  if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR) ||
> +      /* Don't allow surrogates.  */
> +      RANGED_INTEGERP (0xD800, code, 0xDFFF))
> +    invalid_character_name (name);
> +  return XINT (code);
> +}

RANGED_INTEGERP implies two tests for integer. Better would be an 
explicit NUMBERP check, followed by an XINT, followed by C-language 
range checks. Just use <= or < in range checks (not >= or >).

Also, don't put operators like || at the end of a line; put them at the 
start of the next line instead.

> +/* If NAME starts with PREFIX, interpret the rest as a hexadecimal
> +   number and return its value.  Raises `invalid-read-syntax' if the
> +   number is not a valid scalar value.  Returns -1 if NAME doesn't
> +   start with PREFIX.  */

Active voice. No need for grave quoting.

> +static int
> +parse_code_after_prefix (Lisp_Object name, const char* prefix)

"char* x" -> "char *x" in GNU style.

> +  if (name_len > prefix_len && name_len <= prefix_len + 8

Just use < or <= for range checks.

> +      Lisp_Object code = string_to_number (SDATA (name) + prefix_len, 
> 16, false);
> +      if (! NILP (code))
> +        return check_scalar_value (code, name);

Why is nil treated differently from other invalid values (e.g., 
floating-point numbers)? They're all invalid character names, right?

>
> +      /* Various ranges of CJK characters; see UnicodeData.txt. */
> +      if ((code >= 0x3400 && code <= 0x4DB5) ||
> +          (code >= 0x4E00 && code <= 0x9FD5) ||
> +          (code >= 0x20000 && code <= 0x2A6D6) ||
> +          (code >= 0x2A700 && code <= 0x2B734) ||
> +          (code >= 0x2B740 && code <= 0x2B81D) ||
> +          (code >= 0x2B820 && code <= 0x2CEA1))
> +        return code;

Use only <= here, and put || at the start of lines. What's the 
likelihood that the numbers in the above test will change?

>
> +  if (! CONSP (names))
> +    invalid_syntax ("Unicode character name database not loaded");

This test is not needed, as ucs-names always returns a cons, and anyway 
even if it didn't then Fassoc would do the right thing.

> +        /* 200 characters is hopefully long enough.  Increase if
> +           not.  */
> +        char name[200];

Give a name to this constant, e.g.,

/* Bound on the length of a Unicode character name.
    As of Unicode 9.0.0 the maximum is 83, so this should be safe. */
enum { UNICODE_CHARACTER_NAME_LENGTH_BOUND = 199 };
...
    char name[UNICODE_CHARACTER_NAME_LENGTH_BOUND + 1];