From: MON KEY <monkey@sandpframing.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 6283@debbugs.gnu.org
Subject: bug#6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Date: Mon, 31 May 2010 01:35:41 -0400
Message-ID: <AANLkTim-cnLC-CJ8QCRc7EJydSDtDj2T87fZ-yaYYz9n@mail.gmail.com>
In-Reply-To: <83sk5btdcu.fsf@gnu.org>

On Sat, May 29, 2010 at 2:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
> It's not an Emacs convention to represent characters by their
> codepoints expressed in octal.  It's a widely accepted practice.  If
> we were to describe every convention in the world in the manual, 99%
> of the manual would be devoted to describing conventions.
>

That it is a widely accepted practice is what makes it a convention.
Within Emacs Lisp it is also widely accepted practice to denote numeric
representations with #<radixN> notation. This is a conflict of
convention. The purpose of demarcating the use of a particular
convention in place of another is to clarify when one should be
preferred over the other. It is unconventional for the manual
to use conflicting conventions without prejudice. This is my concern.
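
A minimal sketch of the two conventions side by side, assuming a
current Emacs. The function name `my-377-spellings' is hypothetical
and only serves the illustration. It shows that the radix syntax and
the octal character escape read as the same integer, while a bare
leading zero is not an octal marker for the Lisp reader:

```elisp
;; Hypothetical helper, not taken from the manual.
(defun my-377-spellings ()
  "Return the value 255 as read from four different notations."
  (list #o377    ; radix syntax, octal
        #xff     ; radix syntax, hexadecimal
        ?\377    ; character syntax with an octal escape
        255))    ; plain decimal

(my-377-spellings)     ; => (255 255 255 255)

;; Unlike C, the Lisp reader takes a bare leading zero as decimal,
;; so 0377 written without #o reads as 377, not 255.
(= 0377 377)           ; => t

;; Going the other way: print an integer in the reader's octal syntax.
(format "#o%o" 255)    ; => "#o377"
```

Read this way, "octal 0377" in prose and #o377 in code name the same
number, but the prose spelling cannot be pasted into Lisp unchanged.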

> Again, this part of the manual is not about how Emacs represents
> characters or reads them.  It's about their codes.

This is how I understood this portion of the manual.
Maybe I'm misunderstanding something fundamental about this distinction.

If this is so, I would greatly appreciate it if you could help me to
see it more clearly.

>> 0377 doesn't have a character that I'm aware of.
>
> In Unicode, it's a codepoint of LATIN SMALL LETTER Y WITH DIAERESIS.

I don't understand this.

>
> But the text says "...many non-ASCII characters have codes above octal
> 0377".  It doesn't talk about a specific character here, just about
> which codepoints are below it and which are above it.

Yes, but the regexp is "[\200-\377]".
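
To make the endpoints of that character class concrete, here is a
small sketch, assuming the regexp is written as a string constant
containing only octal escapes. The variable name `my-range-regexp' is
hypothetical. It only inspects how the reader stores the string and
does not claim anything about what the regexp matches in a given
buffer:

```elisp
;; Hypothetical variable holding the regexp under discussion.
(defvar my-range-regexp "[\200-\377]"
  "Character class whose endpoints are written as octal escapes.")

;; With only octal escapes below 256, the reader builds a unibyte string.
(multibyte-string-p my-range-regexp)   ; => nil

;; Its elements are: [  octal 200  -  octal 377  ]
(append my-range-regexp nil)           ; => (91 128 45 255 93)
```

So the upper endpoint of the class is the value 255, i.e. octal 377,
held as a byte in a unibyte string.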

>
> I didn't say that we are going to remove these features any time soon.
> Just that the manual doesn't talk too much about this, to avoid
> confusing users with issues that are both very complicated and very
> obscure, and are rarely if at all needed on the Lisp level.
>

I certainly agree they are confusing and easily misunderstood.
I disagree however that these issues are all that obscure.
You seem to suggest that the notation "octal 0NNN" is commonplace, yet
I personally find this notation to be obscure.

tomato|potato <-> potato|tomato

>
> Of course.  But why do you expect to find the description of such
> abuse in the manual?
>

I _do_ find them, whereas I don't find any such reference w/re the
0377 convention.
This is, I guess, my concern.

Following is my attempt to come to grips with the distinction between
the numeric codepoint, integer character representations, reader
conventions etc. w/re the manual and particularly their use in
conjunction w/ regexps.  I believe this example illustrates some
reasonable familiarity with aspects of char/code representation.

But maybe this bit of code can help to show whether there is something
that I am not getting?

;;; ================================================================

(let (chars-found frob-found)
  (with-temp-buffer
    (save-excursion
      (insert 10 255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303))
    (while (search-forward-regexp "[\200-\377]" nil t)
      (let* ((md (match-data t))
             (md-char (char-before (cadr md))))
        (push `(,md-char ,(car md) ,(cadr md)) chars-found))))
  (setq chars-found (nreverse chars-found))
  (dolist (cf chars-found
              (setq chars-found
                    `(,(setq frob-found (nreverse frob-found))
                      ,chars-found)))
    (push (car (read-from-string (format "#o%o" (car cf)))) frob-found))
  (setq frob-found nil)
  (dolist (ints (car chars-found)
                (setq chars-found
                      `(,(setq frob-found (nreverse frob-found))
                        ,@chars-found)))
    (push `(,ints . ,(char-to-string ints)) frob-found))
  (setq frob-found nil)
  (dolist (d (car chars-found)
             (setq chars-found
                   `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (let* ((mltb-int (car d))
           (unib-str (cdr d))
           (unib-str->mchar (string-to-char (symbol-name (read unib-str))))
           (mltb-int->uchar (multibyte-char-to-unibyte mltb-int)))
      (push `(:mltb-int ,mltb-int
                        :unib-str ,unib-str
                        :unib-str->mchar ,unib-str->mchar
                        :mltb-int->uchar ,mltb-int->uchar)
            frob-found)))
  (insert 10 (make-string 68 59) 10
          ";; With this regexp:" 10
          ";; \(search-forward-regexp \"[\\200-\\377]\" nil t\)" 10
          ";; Matched these chars:" 10
          255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303 10
          (make-string 68 59) 10)
  (pp chars-found (current-buffer))
  (insert (make-string 68 59) "\n")
  (let ((cnt 0))
    (dolist (pl (car chars-found))
      (setq cnt (1+ cnt))
      (insert
       10 (make-string 68 59) 10
       (format
        (concat
         ";; :MATCH-DATA-#%d\n"
         "\n(char-to-string (unibyte-char-to-multibyte %d)) ;<-\"%c%d\"\n"
         "\n(insert (char-to-string (unibyte-char-to-multibyte %d))) ;<- multibyte-char\n"
         "\n(insert (identity %S)) ;<- raw-byte\n"
         "\n(insert (string-to-char (identity %S))) ;<- multibyte-char\n"
         "\n(insert-byte %d 1) ;<-raw-byte unibyte-char\n"
         "\n(insert (format \"(insert (identity #o%%o))\" (unibyte-char-to-multibyte %d)))\n")
        cnt
        (plist-get pl :mltb-int->uchar)
        92
        (string-to-number (format "%o" (plist-get pl :mltb-int->uchar)))
        (plist-get pl :mltb-int->uchar)
        (plist-get pl :unib-str)
        (plist-get pl :unib-str)
        (plist-get pl :mltb-int->uchar)
        (plist-get pl :mltb-int->uchar))))))

;;; ================================================================
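
The distinction the code above probes can be reduced to a much
smaller sketch. It assumes Emacs 23 or later, where a raw 8-bit byte
has its own character code separate from the Unicode codepoint with
the same number. The function name `my-byte-vs-char' is hypothetical:

```elisp
;; Hypothetical helper: relate a byte to the two things it can mean.
(defun my-byte-vs-char (byte)
  "Return a plist comparing BYTE (128..255) as codepoint and as raw byte."
  (let ((raw (unibyte-char-to-multibyte byte)))
    (list :codepoint byte                        ; e.g. U+00FF for 255
          :octal (format "#o%o" byte)            ; reader syntax for BYTE
          :raw-byte-char raw                     ; code of the raw-byte char
          :round-trip (multibyte-char-to-unibyte raw)
          :same-p (= byte raw))))                ; nil: two different codes

(my-byte-vs-char #o377)
;; => (:codepoint 255 :octal "#o377" :raw-byte-char 4194303
;;     :round-trip 255 :same-p nil)

(plist-get (my-byte-vs-char #o255) :raw-byte-char)   ; => 4194221
```

This is why 255 and 4194303, and likewise "\255" and 4194221, appear
as separate items in the `insert' form above: the first of each pair
is a codepoint or byte value, the second is the multibyte character
that stands for the raw byte.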

--
/s_P\




