unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Drew Adams <drew.adams@oracle.com>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: emacs-devel@gnu.org
Subject: RE: char equivalence classes in search - why not symmetric?
Date: Wed, 9 Sep 2015 15:52:55 -0700 (PDT)	[thread overview]
Message-ID: <4bf04d46-418d-4950-9de3-d9f9130ce8bf@default> (raw)
In-Reply-To: <87fv2o24mf.fsf@uwakimon.sk.tsukuba.ac.jp>

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

>  > I've approached this question only from a user point of
>  > view (it is useful to be able to do it).
> 
> Well, since I'm not going to do it any time soon, and you
> haven't even considered doing it yet, this thread is moot.

AFAICT, this (or similar) is the only code needed.  It fixes
the char-table entries for the equivalent chars, so each points
to the equivalence class and not just to itself.  (Currently,
only the "base" char points to the equivalence class.)

;; Add an entry for each equivalent char.
(let ((others  ()))
  (map-char-table
   (lambda (base v)
     (let ((chrs  (aref equiv base)))
       (when (consp chrs)
         (dolist (chr  (cdr chrs))
           (push (cons (string-to-char chr) (remove chr chrs))
                 others)))))
   equiv)
  (dolist (it  others)
    (let ((base   (car it))
          (chars  (cdr it)))
      (aset equiv base (append chars (aref equiv base)))))))

This code fragment is included in the attached code that updates
`character-fold-table'.  Evaluate the attached code, to try the
behavior proposed in this thread.

The attached code provides:

* A Boolean option, `char-fold-symmetric', so you can choose which
  behavior you want.  (Let users decide, instead of "flipping a
  coin" at design time.)

  If you use Customize (or the equivalent) to change the option
  value then `character-fold-table' is automatically updated to
  reflect the new option value.

* A function that updates `character-fold-table' to reflect the
  option value.  It evaluates the above code conditionally.

Just as now, you can use M-s ' to toggle char folding.  With the
option value non-nil you get the behavior proposed in this thread.
With the option value nil you get the current, more limited behavior.

[I'm no expert on char tables.  Perhaps the code could be improved.
But this seems to work OK.  I think it exhibits the proposed behavior.]

[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5948 bytes --]

;; Load this file, to evaluate these two definitions in order.
;;
;; The second is an option that lets you choose the proposed behavior
;; or the current Emacs behavior, for character folding.  The first is
;; a function that redefines the char-table used for character folding
;; (`character-fold-table'), so that it reflects the option value.
;;
;; When the option is non-nil, `character-fold-table' includes
;; equivalence entries for each member of a char-folding class (an
;; equivalence class wrt search).  When the option is nil,
;; `character-fold-table' includes equivalence entries only for the
;; "base" character of each class.
;;
;; Use M-' to toggle char folding, as usual.

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec))  (not  (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k (if fold-decomp
                                         (cons (apply #'string dec) chars)
                                       chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" """ "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
                         "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "󠀢" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "󠀢" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))

          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others  ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs  (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr  (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it  others)
                (let ((base   (car it))
                      (chars  (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------

          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
                      (if (consp i)
                          (set-char-table-range equiv i re)
                        (aset equiv i re))))
           equiv)
          equiv)))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them.  The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)


  reply	other threads:[~2015-09-09 22:52 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams
2015-09-01 15:52 ` Davis Herring
2015-09-01 16:51   ` Stefan Monnier
2015-09-01 17:51   ` Drew Adams
2015-09-01 18:40     ` Davis Herring
2015-09-01 19:09       ` Drew Adams
2015-09-01 22:45       ` Juri Linkov
2015-09-02  0:33         ` Drew Adams
2015-09-01 20:10     ` Stephen J. Turnbull
2015-09-01 16:16 ` Eli Zaretskii
     [not found]   ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
     [not found]     ` <<83r3miatvl.fsf@gnu.org>
     [not found]       ` <<21998.29683.916211.867479@a1i15.kph.uni-mainz.de>
     [not found]         ` <<9A972800-D8F0-4DA8-877E-07D5BDC2E1F9@gmail.com>
2015-09-01 17:50   ` Drew Adams
2015-09-01 18:15     ` Eli Zaretskii
2015-09-01 18:46       ` Drew Adams
2015-09-01 19:19         ` Eli Zaretskii
2015-09-01 20:15           ` Drew Adams
2015-09-08  5:36       ` Ulrich Mueller
2015-09-08  6:04         ` Jean-Christophe Helary
2015-09-08 13:31           ` Stephen J. Turnbull
2015-09-08 14:24             ` Drew Adams
2015-09-08 15:21               ` Stephen J. Turnbull
2015-09-08 16:58                 ` Drew Adams
2015-09-08 17:38                   ` Stephen J. Turnbull
2015-09-09 22:52                     ` Drew Adams [this message]
2015-09-10  3:12                       ` Drew Adams
2015-09-10 21:46                         ` Drew Adams
2015-09-08 20:15               ` Richard Stallman
2015-09-08 20:15               ` Richard Stallman
2015-09-08 21:25                 ` Drew Adams
2015-09-09 15:07                   ` Richard Stallman
2015-09-09 15:21                     ` Drew Adams
2015-09-10  2:03                       ` Richard Stallman
2015-09-10  3:23                         ` Drew Adams
2015-09-11 10:28                           ` Richard Stallman
2015-09-11 13:28                             ` Stefan Monnier
2015-09-11 16:33                               ` Drew Adams
2015-09-11 20:59                                 ` Juri Linkov
2015-09-11 23:11                                   ` Drew Adams
2015-09-12 15:28                               ` Richard Stallman
2015-09-11 16:31                             ` Drew Adams
2015-09-11 10:28                           ` Richard Stallman
2015-09-11 16:31                             ` Drew Adams
2015-09-12 15:29                               ` Richard Stallman
     [not found]             ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default>
     [not found]               ` <<E1ZZPIS-0005rf-DJ@fencepost.gnu.org>
2015-09-08 21:46                 ` Drew Adams
     [not found]               ` <<E1ZZPIT-0005s6-ST@fencepost.gnu.org>
     [not found]                 ` <<da54a6cb-90eb-481d-aa20-acfad612e709@default>
     [not found]                   ` <<E1ZZgxz-0006X2-Bg@fencepost.gnu.org>
     [not found]                     ` <<cb107072-7f90-41fb-9aff-075d50eb65bb@default>
     [not found]                       ` <<E1ZZrCm-0001x4-9a@fencepost.gnu.org>
     [not found]                         ` <<4f3b1db3-d3d2-480f-8662-fbf7c74aa67f@default>
     [not found]                           ` <<E1ZaLZR-0002Bf-8q@fencepost.gnu.org>
     [not found]                             ` <<e77f8e7b-581f-436d-816a-c8daed734ff5@default>
     [not found]                               ` <<E1ZamkM-0005d4-RN@fencepost.gnu.org>
2015-09-12 15:59                                 ` Drew Adams
2015-09-08 13:39           ` Drew Adams
2015-09-08 21:19             ` Juri Linkov
2015-09-09 15:07               ` Richard Stallman
2015-09-08 15:47         ` Eli Zaretskii
2015-09-08 16:57           ` Drew Adams
2015-09-08 21:20           ` Juri Linkov
2015-09-09  2:42             ` Eli Zaretskii
2015-09-09 11:23               ` Artur Malabarba
2015-09-09 13:32                 ` Drew Adams
2015-09-09 15:12                 ` Richard Stallman
2015-09-11 20:50                   ` Juri Linkov
     [not found]               ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com>
     [not found]                 ` <<E1ZZh2a-0003u6-Fj@fencepost.gnu.org>
2015-09-09 15:22                   ` Drew Adams
2015-09-10  2:03                     ` Richard Stallman
2015-09-10  3:15                       ` Drew Adams
2015-09-10  6:57                         ` David Kastrup
2015-09-10 15:02                           ` Drew Adams
2015-09-10 15:50                         ` Richard Stallman
2015-09-08 20:09         ` Richard Stallman
2015-09-08 21:00           ` Drew Adams
2015-09-09 15:06             ` Richard Stallman
2015-09-08 21:47           ` Ulrich Mueller
2015-09-02 15:34   ` Richard Stallman
2015-09-02 15:56     ` Drew Adams
2015-09-02 16:05     ` Eli Zaretskii
2015-09-02 21:51       ` Jean-Christophe Helary
2015-09-02 22:15         ` Drew Adams
2015-09-03 15:37           ` Richard Stallman
2015-09-03  2:41         ` Eli Zaretskii
2015-09-03  3:08           ` Jean-Christophe Helary
2015-09-03  7:28             ` Artur Malabarba
2015-09-03 17:15               ` Drew Adams
2015-09-07 13:52                 ` Nix
2015-09-07 17:07                   ` Drew Adams
2015-09-07 23:23                     ` Nix
2015-09-08  2:17                   ` Richard Stallman
2015-09-03 14:33             ` Eli Zaretskii
2015-09-03 15:00         ` Stefan Monnier
2015-09-03 16:15           ` Drew Adams
2015-09-03 16:23             ` Eli Zaretskii
2015-09-03 16:46               ` Drew Adams
2015-09-02 16:10     ` Artur Malabarba
2015-09-03 19:49     ` Pip Cet
     [not found] <<2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4bf04d46-418d-4950-9de3-d9f9130ce8bf@default \
    --to=drew.adams@oracle.com \
    --cc=emacs-devel@gnu.org \
    --cc=stephen@xemacs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).