From: MON KEY
Newsgroups: gmane.emacs.bugs
Subject: bug#6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Date: Mon, 31 May 2010 01:35:41 -0400
References: <83vda9md09.fsf@gnu.org> <83sk5cmr8k.fsf@gnu.org> <83sk5btdcu.fsf@gnu.org>
In-Reply-To: <83sk5btdcu.fsf@gnu.org>
To: Eli Zaretskii
Cc: 6283@debbugs.gnu.org

On Sat, May 29, 2010 at 2:45 AM, Eli Zaretskii
wrote:
>
> It's not an Emacs convention to represent characters by their
> codepoints expressed in octal. It's a widely accepted practice. If
> we were to describe every convention in the world in the manual, 99%
> of the manual would be devoted to describing conventions.
>

That it is widely accepted practice is what makes it a convention.

Within Emacs Lisp it is also widely accepted practice to denote numeric
representations with # notation. This is a conflict of convention.

The purpose of demarcating the use of a particular convention in the
stead of another is to clarify when one shall be used with preference
over another. It is unconventional for the manual to use conflicting
conventions without prejudice. This is my concern.

> Again, this part of the manual is not about how Emacs represents
> characters or reads them. It's about their codes.

This is how I understood this portion of the manual. Maybe I'm
misunderstanding something fundamental about this distinction. If this
is so, I would greatly appreciate it if you could help me to see it
more clearly.

>> 0377 doesn't have a character that I'm aware of.
>
> In Unicode, it's a codepoint of LATIN SMALL LETTER Y WITH DIAERESIS.

I don't understand this.

>
> But the text says "...many non-ASCII characters have codes above octal
> 0377". It doesn't talk about a specific character here, just about
> which codepoints are below it and which are above it.

Yes, but the regexp is "[\200-\377]".

>
> I didn't say that we are going to remove these features any time soon.
> Just that the manual doesn't talk too much about this, to avoid
> confusing users with issues that are both very complicated and very
> obscure, and are rarely if at all needed on the Lisp level.
>

I certainly agree they are confusing and easily misunderstood. I
disagree, however, that these issues are all that obscure. You seem to
suggest that the notation "octal 0NNN" is commonplace, yet I personally
find this notation to be obscure.
tomato|potato <-> potato|tomato

>
> Of course. But why do you expect to find the description of such
> abuse in the manual?
>

I _do_ find them, whereas I don't find any such reference w/re the 0377
convention. This is, I guess, my concern.

Following is my attempt to come to grips with the distinction between
the numeric codepoint, integer character representations, reader
conventions etc. w/re the manual, and particularly their use in
conjunction w/ regexps.

I believe this example illustrates some reasonable familiarity with
aspects of char/code representation. But maybe this bit of code can
help to show if there is something that I am not getting???

;;; ================================================================
(let (chars-found frob-found)
  ;; Collect (CHAR START END) for every match of the regexp.
  (with-temp-buffer
    (save-excursion
      (insert 10 255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303))
    (while (search-forward-regexp "[\200-\377]" nil t)
      (let* ((md (match-data t))
             (md-char (char-before (cadr md))))
        (push `(,md-char ,(car md) ,(cadr md)) chars-found))))
  (setq chars-found (nreverse chars-found))
  ;; Re-read each matched char code through the #o octal read syntax.
  (dolist (cf chars-found
              (setq chars-found
                    `(,(setq frob-found (nreverse frob-found)) ,chars-found)))
    (push (car (read-from-string (format "#o%o" (car cf)))) frob-found))
  (setq frob-found nil)
  ;; Pair each integer with its one-character string.
  (dolist (ints (car chars-found)
                (setq chars-found
                      `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (push `(,ints . ,(char-to-string ints)) frob-found))
  (setq frob-found nil)
  ;; Build a plist of the multibyte/unibyte views of each pair.
  (dolist (d (car chars-found)
             (setq chars-found
                   `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (let* ((mltb-int (car d))
           (unib-str (cdr d))
           (unib-str->mchar (string-to-char (symbol-name (read unib-str))))
           (mltb-int->uchar (multibyte-char-to-unibyte mltb-int)))
      (push `(:mltb-int ,mltb-int :unib-str ,unib-str
              :unib-str->mchar ,unib-str->mchar
              :mltb-int->uchar ,mltb-int->uchar)
            frob-found)))
  ;; Report the results in the current buffer.
  (insert 10 (make-string 68 59) 10
          ";; With this regexp:" 10
          ";; \(search-forward-regexp \"[\\200-\\377]\" nil t\)" 10
          ";; Matched these chars:" 10
          255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303 10
          (make-string 68 59) 10)
  (pp chars-found (current-buffer))
  (insert (make-string 68 59) "\n")
  (let ((cnt 0))
    (dolist (pl (car chars-found))
      (setq cnt (1+ cnt))
      (insert 10 (make-string 68 59) 10
              (format (concat ";; :MATCH-DATA-#%d\n"
                              "\n(char-to-string (unibyte-char-to-multibyte %d)) ;<-\"%c%d\"\n"
                              "\n(insert (char-to-string (unibyte-char-to-multibyte %d))) ;<- multibyte-char\n"
                              "\n(insert (identity %S)) ;<- raw-byte\n"
                              "\n(insert (string-to-char (identity %S))) ;<- multibyte-char\n"
                              "\n(insert-byte %d 1) ;<-raw-byte unibyte-char\n"
                              "\n(insert (format \"(insert (identity #o%%o))\" (unibyte-char-to-multibyte %d)))\n")
                      cnt
                      (plist-get pl :mltb-int->uchar)
                      92
                      (string-to-number (format "%o" (plist-get pl :mltb-int->uchar)))
                      (plist-get pl :mltb-int->uchar)
                      (plist-get pl :unib-str)
                      (plist-get pl :unib-str)
                      (plist-get pl :mltb-int->uchar)
                      (plist-get pl :mltb-int->uchar))))))
;;; ================================================================

--
/s_P\
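
For reference, the notations contrasted in this message ("octal 0377" in the manual's prose, the `#o' integer syntax, and the `?\377' character syntax) can be lined up in a few forms. This is only a minimal sketch: the two `my-' helpers are hypothetical names introduced here, not part of Emacs or of the thread, and the block deals only with integer values, not with how Emacs stores raw bytes in strings or buffers.

```elisp
;; Illustrative sketch; the `my-' helper names are hypothetical.

(defun my-octal-string-to-code (digits)
  "Return the integer denoted by DIGITS, a string of octal digits."
  (string-to-number digits 8))

(defun my-code-to-octal-string (code)
  "Return the octal digits of the integer CODE as a string."
  (format "%o" code))

;; Three spellings of the same integer, 255:
(list #o377     ; integer read syntax, radix 8
      ?\377     ; character read syntax with an octal escape
      #xff)     ; integer read syntax, radix 16

;; The manual's "octal 0377" read as digits, and the way back.
(my-octal-string-to-code "0377")
(my-code-to-octal-string 255)
```

All three elements of the list are the integer 255, which is why "octal 0377" and U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS) name the same code. How the same escapes behave inside a string or regexp such as "[\200-\377]" (raw bytes versus characters) is the separate question that the code in the message probes.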