From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.devel Subject: Re: Character literals for Unicode (control) characters Date: Mon, 14 Mar 2016 13:03:38 -0700 Organization: UCLA Computer Science Department Message-ID: <56E7191A.60507@cs.ucla.edu> References: <87r3fsjenn.fsf@gnus.org> <56D8623F.6060806@cs.ucla.edu> <838u1vwqj9.fsf@gnu.org> <56DC7227.10708@cs.ucla.edu> <56DC7F18.8050103@cs.ucla.edu> <83si03v0c3.fsf@gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: ger.gmane.org 1457985856 22961 80.91.229.3 (14 Mar 2016 20:04:16 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 14 Mar 2016 20:04:16 +0000 (UTC) Cc: larsi@gnus.org, johnw@gnu.org, emacs-devel@gnu.org To: Philipp Stephani , Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Mar 14 21:04:07 2016 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1afYij-00061p-FR for ged-emacs-devel@m.gmane.org; Mon, 14 Mar 2016 21:04:05 +0100 Original-Received: from localhost ([::1]:43774 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1afYii-00081c-GS for ged-emacs-devel@m.gmane.org; Mon, 14 Mar 2016 16:04:04 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:41782) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1afYiU-00081K-S9 for emacs-devel@gnu.org; Mon, 14 Mar 2016 16:03:51 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1afYiQ-0006sR-PX for emacs-devel@gnu.org; Mon, 14 Mar 2016 16:03:50 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49782) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1afYiK-0006qQ-WE; Mon, 14 Mar 2016 16:03:41 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id A92DB160E87; Mon, 14 Mar 2016 13:03:39 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id KSSXnU1VomEP; Mon, 14 Mar 2016 13:03:38 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CB5E1160FF7; Mon, 14 Mar 2016 13:03:38 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Dy9tixqFCwkp; Mon, 14 Mar 2016 13:03:38 -0700 (PDT) Original-Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id AD01B160E87; Mon, 14 Mar 2016 13:03:38 -0700 (PDT) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 In-Reply-To: X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 131.179.128.68 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:201759 Archived-At: Thanks, here's a detailed low level review. > Subject: [PATCH 4/4] Use `ucs-names'. Summary lines like "Use `ucs-names'." should not end with "." and should be as informative as possible within a 50-char limit. > +#include This include reportedly doesn't work well with Microsoft compilers. Omit it and use _Noreturn instead of noreturn. > +/* Signals an `invalid-read-syntax' error indicating that the > + character name in an \N{...} literal is invalid. */ Use active voice "Signal an" rather than a non-sentence. Don't use grave quoting in comments (no quoting needed here anyway). > +static noreturn void invalid_character_name (Lisp_Object name) Put "static _Noreturn void" on the first line, and the rest on the next line; that's the usual GNU style. > +/* Checks that CODE is a valid Unicode scalar value, and returns its > + value. CODE should be parsed from the character name given by > + NAME. NAME is used for error messages. */ Active voice: "Checks" -> "Check". > +static int check_scalar_value (Lisp_Object code, Lisp_Object name) "static int" in a separate line. > +{ > + if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR) || > + /* Don't allow surrogates. */ > + RANGED_INTEGERP (0xD800, code, 0xDFFF)) > + invalid_character_name (name); > + return XINT (code); > +} RANGED_INTEGERP implies two tests for integer. Better would be an explicit NUMBERP check, followed by an XINT, followed by C-language range checks. Just use <= or < in range checks (not >= or >). Also, don't put operators like || at the end of a line; put them at the start of the next line instead. > +/* If NAME starts with PREFIX, interpret the rest as a hexadecimal > + number and return its value. Raises `invalid-read-syntax' if the > + number is not a valid scalar value. Returns -1 if NAME doesn't > + start with PREFIX. */ Active voice. No need for grave quoting. > +static int > +parse_code_after_prefix (Lisp_Object name, const char* prefix) "char* x" -> "char *x" in GNU style. > + if (name_len > prefix_len && name_len <= prefix_len + 8 Just use < or <= for range checks. > + Lisp_Object code = string_to_number (SDATA (name) + prefix_len, > 16, false); > + if (! NILP (code)) > + return check_scalar_value (code, name); Why is nil treated differently from other invalid values (e.g., floating-point numbers)? They're all invalid character names, right? > > + /* Various ranges of CJK characters; see UnicodeData.txt. */ > + if ((code >= 0x3400 && code <= 0x4DB5) || > + (code >= 0x4E00 && code <= 0x9FD5) || > + (code >= 0x20000 && code <= 0x2A6D6) || > + (code >= 0x2A700 && code <= 0x2B734) || > + (code >= 0x2B740 && code <= 0x2B81D) || > + (code >= 0x2B820 && code <= 0x2CEA1)) > + return code; Use only <= here, and put || at the start of lines. What's the likelihood that the numbers in the above test will change? > > + if (! CONSP (names)) > + invalid_syntax ("Unicode character name database not loaded"); This test is not needed, as ucs-names always returns a cons, and anyway even if it didn't then Fassoc would do the right thing. > + /* 200 characters is hopefully long enough. Increase if > + not. */ > + char name[200]; Give a name to this constant, e.g., /* Bound on the length of a Unicode character name. As of Unicode 9.0.0 the maximum is 83, so this should be safe. */ enum { UNICODE_CHARACTER_NAME_LENGTH_BOUND = 199 }; ... char name[UNICODE_CHARACTER_NAME_LENGTH_BOUND + 1];