From: MON KEY
Newsgroups: gmane.emacs.devel
Subject: Re: raw-byte and char-table
Date: Thu, 26 Aug 2010 01:30:11 -0400
To: Kenichi Handa
Cc: Stefan Monnier, emacs-devel@gnu.org

On Wed, Aug 25, 2010 at 11:34 PM, Kenichi Handa wrote:
> In article , MON KEY writes:
>
>> > Number like #x3FFFA0 is so cryptic.  The function name
>> > unibyte-char-to-multibyte is also not ideal, but I think
>> > it's better than #x3FFFA0.
>
>> Maybe I am misunderstanding, but I think the `#x' and `#o' syntax is
>> not cryptic at all in the context.
>
> I'm not arguing that the syntax is cryptic.  What I want to
> say is that it is difficult for one who reads the code to
> understand what #x3FFFA0 means.

So the syntax isn't the problem; the semantic denotation of such
numbers is. This is the realm of Tarski and McDermott[1].

Regardless, right now it is all confusing (esp. for those of us less
inclined to keep the multibyte/unibyte distinction straight).

>
>> This signals an error:
>>  (unibyte-char-to-multibyte
>>   (unibyte-char-to-multibyte 160))
>
> Yes, but is it a problem?

I would urge that it is a problem wherever the numerical denotation
has no visible/nameable/printable counterpart. Why should it be
allowed to be a problem if it can be avoided?

>
>> > We could provide a ?\NNN (or similar) notation for it.  Similarly to
>> > what we do for those bytes in multibyte strings.
>
>> Howsabout just this one for all of them:
>
>>  `#\'
>
> Do you mean that making #\240 to be read as #x3FFFA0?

Half-jokingly, yes.
(assuming the #\240 above is the code point 0xA0)

Though I _also_ had things like these in mind:

 #\8-bit-240 or #\byte-240

which would allow referencing these chars by something other than a
numeric id.

E.g. in some other dialects of Lisp there is this type of behaviour:

 CL-USER> #\ ;<- that's a #x9 after the \
 ;=> #\Tab

 CL-USER> #\ ;<- that's a #xa after the \
 ;=>
 ; #\Newline

 CL-USER> #\NO-BREAK_SPACE ;<- that's the char-name for #xa0
 ;=> #\NO-BREAK_SPACE ;<- return is as per `identity'

 CL-USER> (identity #\NO-BREAK_SPACE)
 ;=> #\NO-BREAK_SPACE

 CL-USER> (princ #\ )
 ;=>  
 ; #\NO-BREAK_SPACE

 CL-USER> (prin1 #\ )
 ;=> #\NO-BREAK_SPACE
 ; #\NO-BREAK_SPACE

 CL-USER> #\ ;<- that's a #x20 after the \
 ;=> #\

 CL-USER> (char-code #\ )
 32

 CL-USER> (describe #\ )
 ;=> #\
 ; [standard-char]
 ;
 ; :_Char-code: 32
 ; :_Char-name: Space
 ; _

The idea being that although the chars in the above example don't have
visibly "printable" representations, the `#\' reader syntax _does_
recognize them, either by char-name or by a readable identity, e.g.:

 CL-USER> (read-char)
 ^F ;<- that's a #x6 typed at the prompt
 ;=> #\Ack

Of course, introducing this type of read syntax to Emacs Lisp would
(or at least it should) imply extending it to all characters, unibyte
and multibyte... Hence the ":)" smiley in my previous response to
Stefan.

[1] McDermott, Drew (1978). Tarskian semantics, or no notation without
denotation. Cognitive Science 2:277-82.

--
/s_P\
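
For anyone trying to follow the numbers in this thread, the relation
between the byte 160 (#xA0) and #x3FFFA0 can be modelled with plain
arithmetic. What follows is only a rough Python model, not Emacs
source: the two function names mirror the Emacs Lisp ones discussed
here, while `RAW_BYTE_OFFSET` and the use of `ValueError` are made up
for illustration. It assumes that raw bytes #x80..#xFF map onto the
character codes #x3FFF80..#x3FFFFF, and the inverse is deliberately
simplified to ASCII and raw-byte codes.

```python
# Rough model of how a raw 8-bit byte relates to its character code.
# Assumption: raw bytes 0x80..0xFF occupy codes 0x3FFF80..0x3FFFFF.
RAW_BYTE_OFFSET = 0x3FFF00  # hypothetical name for the offset


def unibyte_char_to_multibyte(byte: int) -> int:
    """Map a byte (0..255) to a character code; ASCII maps to itself."""
    if not 0 <= byte <= 0xFF:
        # Models the error signalled when the argument is not a byte.
        raise ValueError(f"Not a unibyte character: {byte}")
    return byte if byte < 0x80 else byte + RAW_BYTE_OFFSET


def multibyte_char_to_unibyte(char: int) -> int:
    """Simplified inverse: handles only ASCII and raw-byte codes."""
    if 0 <= char < 0x80:
        return char
    if 0x3FFF80 <= char <= 0x3FFFFF:
        return char - RAW_BYTE_OFFSET
    raise ValueError(f"No single-byte form in this model: {char:#x}")


if __name__ == "__main__":
    code = unibyte_char_to_multibyte(160)
    print(hex(code))                        # the raw-byte code for #xA0
    print(multibyte_char_to_unibyte(code))  # back to the original byte
```

In this model the inner call of the nested form quoted above returns
0x3FFFA0, which is no longer in the range 0..255, so the outer call
raises. That is the same shape as the error under discussion, and it
shows why the bare number gives a reader no hint of where it came from.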