all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Drew Adams <drew.adams@oracle.com>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: emacs-devel@gnu.org
Subject: RE: char equivalence classes in search - why not symmetric?
Date: Wed, 9 Sep 2015 20:12:37 -0700 (PDT)	[thread overview]
Message-ID: <116512ec-bdec-43de-afa9-dc01a57715e8@default> (raw)
In-Reply-To: <4bf04d46-418d-4950-9de3-d9f9130ce8bf@default>

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

> AFAICT, this (or similar) is the only code needed.

Sorry, I spoke too soon.

1. The following two lines are needed, before evaluating the code
I sent earlier.  (I've attached an update that includes them, so
you can just load/evaluate it.)

 (setq character-fold-search t)
 (load-library "character-fold")

This is due to the way the vanilla code is at the moment.  This also
means that for this testing char folding will be on, to start with.

2. The code I have is not sufficient for everything.  You can
use it to see what the behavior is for single-char entries in the
char table, which includes accented chars (chars with diacritics).
But it does not also handle multiple-char entries in the table.

For instance, you can search for "é" and get char folding, but you
cannot search for "é" and get char folding.  The first of these is
just the char named LATIN SMALL LETTER E WITH ACUTE.  The second is
plain "e" composed with "́" (the char named COMBINING ACUTE ACCENT).

Some more work would be needed to make such combinations work too.
As I said, I'm no expert on char tables.  But the attached code
should give you a good idea of what is involved.

At the end of the file I included some commented-out e chars to
search for.  (Use `C-u C-x =' on a char to see what it really is.)

[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5807 bytes --]

(setq character-fold-search t)
(load-library "character-fold")

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec))  (not  (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k (if fold-decomp
                                         (cons (apply #'string dec) chars)
                                       chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" """ "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
                         "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "󠀢" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "󠀢" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))

          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others  ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs  (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr  (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it  others)
                (let ((base   (car it))
                      (chars  (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------

          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
                      (if (consp i)
                          (set-char-table-range equiv i re)
                        (aset equiv i re))))
           equiv)
          equiv)))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them.  The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "e" "㋎" "㋍" "ⓔ" "⒠"
;;  "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")

;; No good yet: "𝚎" "ẽ" "ẻ" "ẹ" "ḛ" "ḙ" "ȩ" "ȇ" "ȅ"
;;              "ě" "ę" "ė" "ĕ" "ē" "ë" "ê" "é" "è"

  reply	other threads:[~2015-09-10  3:12 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams
2015-09-01 15:52 ` Davis Herring
2015-09-01 16:51   ` Stefan Monnier
2015-09-01 17:51   ` Drew Adams
2015-09-01 18:40     ` Davis Herring
2015-09-01 19:09       ` Drew Adams
2015-09-01 22:45       ` Juri Linkov
2015-09-02  0:33         ` Drew Adams
2015-09-01 20:10     ` Stephen J. Turnbull
2015-09-01 16:16 ` Eli Zaretskii
     [not found]   ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
     [not found]     ` <<83r3miatvl.fsf@gnu.org>
     [not found]       ` <<21998.29683.916211.867479@a1i15.kph.uni-mainz.de>
     [not found]         ` <<9A972800-D8F0-4DA8-877E-07D5BDC2E1F9@gmail.com>
2015-09-01 17:50   ` Drew Adams
2015-09-01 18:15     ` Eli Zaretskii
2015-09-01 18:46       ` Drew Adams
2015-09-01 19:19         ` Eli Zaretskii
2015-09-01 20:15           ` Drew Adams
2015-09-08  5:36       ` Ulrich Mueller
2015-09-08  6:04         ` Jean-Christophe Helary
2015-09-08 13:31           ` Stephen J. Turnbull
2015-09-08 14:24             ` Drew Adams
2015-09-08 15:21               ` Stephen J. Turnbull
2015-09-08 16:58                 ` Drew Adams
2015-09-08 17:38                   ` Stephen J. Turnbull
2015-09-09 22:52                     ` Drew Adams
2015-09-10  3:12                       ` Drew Adams [this message]
2015-09-10 21:46                         ` Drew Adams
2015-09-08 20:15               ` Richard Stallman
2015-09-08 20:15               ` Richard Stallman
2015-09-08 21:25                 ` Drew Adams
2015-09-09 15:07                   ` Richard Stallman
2015-09-09 15:21                     ` Drew Adams
2015-09-10  2:03                       ` Richard Stallman
2015-09-10  3:23                         ` Drew Adams
2015-09-11 10:28                           ` Richard Stallman
2015-09-11 13:28                             ` Stefan Monnier
2015-09-11 16:33                               ` Drew Adams
2015-09-11 20:59                                 ` Juri Linkov
2015-09-11 23:11                                   ` Drew Adams
2015-09-12 15:28                               ` Richard Stallman
2015-09-11 16:31                             ` Drew Adams
2015-09-11 10:28                           ` Richard Stallman
2015-09-11 16:31                             ` Drew Adams
2015-09-12 15:29                               ` Richard Stallman
     [not found]             ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default>
     [not found]               ` <<E1ZZPIS-0005rf-DJ@fencepost.gnu.org>
2015-09-08 21:46                 ` Drew Adams
     [not found]               ` <<E1ZZPIT-0005s6-ST@fencepost.gnu.org>
     [not found]                 ` <<da54a6cb-90eb-481d-aa20-acfad612e709@default>
     [not found]                   ` <<E1ZZgxz-0006X2-Bg@fencepost.gnu.org>
     [not found]                     ` <<cb107072-7f90-41fb-9aff-075d50eb65bb@default>
     [not found]                       ` <<E1ZZrCm-0001x4-9a@fencepost.gnu.org>
     [not found]                         ` <<4f3b1db3-d3d2-480f-8662-fbf7c74aa67f@default>
     [not found]                           ` <<E1ZaLZR-0002Bf-8q@fencepost.gnu.org>
     [not found]                             ` <<e77f8e7b-581f-436d-816a-c8daed734ff5@default>
     [not found]                               ` <<E1ZamkM-0005d4-RN@fencepost.gnu.org>
2015-09-12 15:59                                 ` Drew Adams
2015-09-08 13:39           ` Drew Adams
2015-09-08 21:19             ` Juri Linkov
2015-09-09 15:07               ` Richard Stallman
2015-09-08 15:47         ` Eli Zaretskii
2015-09-08 16:57           ` Drew Adams
2015-09-08 21:20           ` Juri Linkov
2015-09-09  2:42             ` Eli Zaretskii
2015-09-09 11:23               ` Artur Malabarba
2015-09-09 13:32                 ` Drew Adams
2015-09-09 15:12                 ` Richard Stallman
2015-09-11 20:50                   ` Juri Linkov
     [not found]               ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com>
     [not found]                 ` <<E1ZZh2a-0003u6-Fj@fencepost.gnu.org>
2015-09-09 15:22                   ` Drew Adams
2015-09-10  2:03                     ` Richard Stallman
2015-09-10  3:15                       ` Drew Adams
2015-09-10  6:57                         ` David Kastrup
2015-09-10 15:02                           ` Drew Adams
2015-09-10 15:50                         ` Richard Stallman
2015-09-08 20:09         ` Richard Stallman
2015-09-08 21:00           ` Drew Adams
2015-09-09 15:06             ` Richard Stallman
2015-09-08 21:47           ` Ulrich Mueller
2015-09-02 15:34   ` Richard Stallman
2015-09-02 15:56     ` Drew Adams
2015-09-02 16:05     ` Eli Zaretskii
2015-09-02 21:51       ` Jean-Christophe Helary
2015-09-02 22:15         ` Drew Adams
2015-09-03 15:37           ` Richard Stallman
2015-09-03  2:41         ` Eli Zaretskii
2015-09-03  3:08           ` Jean-Christophe Helary
2015-09-03  7:28             ` Artur Malabarba
2015-09-03 17:15               ` Drew Adams
2015-09-07 13:52                 ` Nix
2015-09-07 17:07                   ` Drew Adams
2015-09-07 23:23                     ` Nix
2015-09-08  2:17                   ` Richard Stallman
2015-09-03 14:33             ` Eli Zaretskii
2015-09-03 15:00         ` Stefan Monnier
2015-09-03 16:15           ` Drew Adams
2015-09-03 16:23             ` Eli Zaretskii
2015-09-03 16:46               ` Drew Adams
2015-09-02 16:10     ` Artur Malabarba
2015-09-03 19:49     ` Pip Cet
     [not found] <<2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=116512ec-bdec-43de-afa9-dc01a57715e8@default \
    --to=drew.adams@oracle.com \
    --cc=emacs-devel@gnu.org \
    --cc=stephen@xemacs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.