unofficial mirror of bug-gnu-emacs@gnu.org 
From: Drew Adams <drew.adams@oracle.com>
To: Juri Linkov <juri@linkov.net>
Cc: 22147@debbugs.gnu.org, Artur Malabarba <bruce.connor.am@gmail.com>
Subject: bug#22147: Obsolete search-forward-lax-whitespace
Date: Sat, 14 May 2016 15:22:27 -0700 (PDT)
Message-ID: <8ec0f5d4-a500-42c1-bab8-eaba00f0915c@default>
In-Reply-To: <87r3d4z7uf.fsf@mail.linkov.net>

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

> >> I mean a char-folding customization that allows a search
> >> for “ä” to match “a”.  Is this already possible?
> >
> > It sounds like you are asking for symmetric char folding: being
> > able to use any of the various A's that make up the A-characters
> > equivalence class as a search pattern and find any of those
> > characters.
> >
> > If so, I implemented that (one way, at least), and in emacs-devel
> > I proposed such behavior as a togglable option.
> >
> > It is trivial to try it, if you like: character-fold+.el.
> > http://www.emacswiki.org/emacs/download/character-fold%2b.el
> >
> > (A toggle command for it, `isearchp-toggle-symmetric-char-fold',
> > is defined in isearch+.el:
> > http://www.emacswiki.org/emacs/download/isearch%2b.el.)
> 
> I'm starting to recollect all the remaining pieces to finish this
> release blocking issue, but I can't download this library,
> because the link is broken and it seems the whole site is down.
> 
> Drew, could you please send the latest version as an attachment?

1. EmacsWiki seems to be up now.  Also, you should be able to get to
what is on EmacsWiki at the EmacsMirror: https://github.com/emacsmirror.
And you should also be able to get my libraries from MELPA.  I've
attached `character-fold+.el' anyway.  Let me know if you also want
to look at `isearch+.el' and you cannot get to it for some reason.

2. More importantly, what I wrote in `character-fold+.el' worked
only at the time I wrote it and for a while thereafter, unfortunately.
Not too long after that, Artur Malabarba rewrote `character-fold.el',
so the code I wrote is no longer appropriate.

I have not had time to look at the (fairly deep) changes he made,
or to imagine what I might do with it to obtain the symmetric
behavior I implemented for the earlier version.

3. Dunno whether what I wrote is needed or helpful for dealing
with this bug.  Perhaps you or Artur can tell.  IIUC, the part
of this bug report that I replied to seemed to be a request for
an extension of what `character-fold.el' does: symmetric folding.
But perhaps I was misunderstanding, because I don't see how that
could be a blocking bug - it was never Artur's intention to
provide symmetric folding, AFAIK.

[-- Attachment #2: character-fold+.el --]
[-- Type: application/octet-stream, Size: 12414 bytes --]

;;; character-fold+.el --- Extensions to `character-fold.el'
;;
;; Filename: character-fold+.el
;; Description: Extensions to `character-fold.el'
;; Author: Drew Adams
;; Maintainer: Drew Adams
;; Copyright (C) 2015-2016, Drew Adams, all rights reserved.
;; Created: Fri Nov 27 09:12:01 2015 (-0800)
;; Version: 0
;; Package-Requires: ()
;; Last-Updated: Sat Feb 27 15:05:20 2016 (-0800)
;;           By: dradams
;;     Update #: 93
;; URL: http://www.emacswiki.org/character-fold+.el
;; Doc URL: http://emacswiki.org/CharacterFoldPlus
;; Keywords: isearch, search, unicode
;; Compatibility: GNU Emacs: 25.x builds ON OR BEFORE 2015-12-10 
;;
;; Features that might be required by this library:
;;
;;   `backquote', `button', `bytecomp', `cconv', `character-fold',
;;   `cl-extra', `cl-lib', `help-mode', `macroexp'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;;  Extensions to Isearch character folding.
;;
;;
;;  NOTE: This library is NOT UP-TO-DATE WRT EMACS 25.  The vanilla
;;        Emacs library `character-fold.el', which this library
;;        extends, was changed in incompatible ways after this library
;;        was written.  I have not yet had a chance to update this
;;        (and am waiting for Emacs 25 to be released to do so).
;;        Sorry about that.
;;
;;
;;  Choose One-Way or Symmetric Character Folding
;;  ---------------------------------------------
;;
;;  Non-nil option `char-fold-symmetric' means that char folding is
;;  symmetric: When you search for any of an equivalence class of
;;  characters you find all of them.  This behavior applies to
;;  query-replacing also - see option `replace-character-fold'.
;;
;;  The default value of `char-fold-symmetric' is `nil', which gives
;;  the same behavior as vanilla Emacs: you find all members of the
;;  equivalence class only when you search for the base character.
;;
;;  For example, with a `nil' value you can search for "e" (a base
;;  character) to find "é", but not vice versa.  With a non-`nil'
;;  value you can search for either, to find itself and the other
;;  members of the equivalence class - the base char is not treated
;;  specially.
;;
;;  Example non-`nil' behavior:
;;
;;    Searching for any of these characters and character compositions
;;    in the search string finds all of them.  (Use `C-u C-x =' with
;;    point before a character to see complete information about it.)
;;
;;      e 𝚎 𝙚 𝘦 𝗲 𝖾 𝖊 𝕖 𝔢 𝓮 𝒆 𝑒 𝐞 e ㋎ ㋍ ⓔ ⒠
;;      ⅇ ℯ ₑ ẽ ẽ ẻ ẻ ẹ ẹ ḛ ḛ ḙ ḙ ᵉ ȩ ȩ ȇ ȇ
;;      ȅ ȅ ě ě ę ę ė ė ĕ ĕ ē ē ë ë ê ê é é è è
;;
;;    An example of a composition is "é".  Searching for that finds
;;    the same matches as searching for "é" or searching for "e".
;;
;;  If you also use library `isearch+.el' then you can toggle option
;;  `char-fold-symmetric' anytime during Isearch, using `M-s ='
;;  (command `isearchp-toggle-symmetric-char-fold').
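;;
;;  To try this from Lisp, a minimal sketch (it assumes an Emacs 25
;;  build that is compatible with this library, with the library
;;  loaded):
;;
;;    (string-match-p (character-fold-to-regexp "é") "e")
;;
;;  The expectation is a non-nil result only when
;;  `char-fold-symmetric' is non-nil.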
;;
;;
;;  NOTE:
;;
;;    To customize option `char-fold-symmetric', use either Customize
;;    or a Lisp function designed for customizing options, such as
;;    `customize-set-variable', that invokes the necessary `:set'
;;    function.
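;;
;;    For example, a minimal sketch for an init file, assuming this
;;    library is in your `load-path':
;;
;;      (require 'character-fold+)
;;      (customize-set-variable 'char-fold-symmetric t)
;;
;;    A plain `setq' changes the value without invoking the `:set'
;;    function, so `character-fold-table' is not rebuilt until
;;    `update-char-fold-table' is called.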
;;
;;
;;  CAVEAT:
;;
;;    Be aware that character-fold searching can be much slower when
;;    symmetric - there are many more possibilities to search for.
;;    If, for example, you search only for a single "e"-family
;;    character then every "e" in the buffer is a search hit (which
;;    means lazy-highlighting them all, by default).  Searching with a
;;    longer search string is much faster.
;;
;;    If you also use library `isearch+.el' then you can turn off lazy
;;    highlighting using the toggle key `M-s h L'.  This can vastly
;;    improve performance when character folding is symmetric.
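;;
;;    Without `isearch+.el', a rough equivalent (a sketch that uses
;;    the vanilla option `isearch-lazy-highlight') is:
;;
;;      (setq isearch-lazy-highlight nil)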
;;
;;
;;  Customize the Ad Hoc Character Foldings
;;  ---------------------------------------
;;
;;  In addition to the standard equivalence classes of a base
;;  character and its family of diacriticals, vanilla Emacs includes a
;;  number of ad hoc character foldings, e.g., for different quote
;;  marks.
;;
;;  Option `char-fold-ad-hoc' lets you customize this set of ad hoc
;;  foldings.  The default value is the same set provided by vanilla
;;  Emacs.
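;;
;;  For example, a sketch that adds a hypothetical folding of the
;;  ASCII hyphen to a few Unicode dash characters (this entry is
;;  only an illustration; it is not among the defaults in this file):
;;
;;    (customize-set-variable
;;     'char-fold-ad-hoc
;;     (cons '(?- "‐" "–" "−") char-fold-ad-hoc))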
;;
;;
;;
;;  Options defined here:
;;
;;    `char-fold-ad-hoc', `char-fold-symmetric'.
;;
;;  Non-interactive functions defined here:
;;
;;    `update-char-fold-table'.
;;
;;  Internal variables defined here:
;;
;;    `char-fold-decomps'.
;;
;;
;;  ***** NOTE: The following function defined in `character-fold.el'
;;              has been ADVISED HERE:
;;
;;    `character-fold-to-regexp'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change Log:
;;
;; 2015/12/01 dadams
;;     char-fold-ad-hoc: Added :set.
;; 2015/11/28 dadams
;;     Added: char-fold-ad-hoc.
;;     update-char-fold-table: Use char-fold-ad-hoc.
;; 2015/11/27 dadams
;;     Created.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or (at
;; your option) any later version.
;;
;; This program is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

(require 'character-fold)

;;;;;;;;;;;;;;;;;;;;;;;

(defvar char-fold-decomps ()
  "List of conses of a decomposition and its base char.")

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq char-fold-decomps  ())
  (setq character-fold-table
        (let* ((equiv  (make-char-table 'character-fold-table))
               (table  (unicode-property-table-internal 'decomposition))
               (func   (char-table-extra-slot table 1)))
          ;; Ensure that the table is populated.
          (map-char-table (lambda (ch val) (when (consp ch) (funcall func (car ch) val table))) table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (ch dec)
             (when (consp dec)
               (when (symbolp (car dec)) (setq dec  (cdr dec))) ; Discard a possible formatting tag.
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (and (null (cdr dec))  (eq ch (car dec)))
                 (let ((dd           dec)
                       (fold-decomp  t)
                       kk found)
                   (while (and dd  (not found))
                     (setq kk  (pop dd))
                     ;; Is KK a number or letter, per unicode standard?
                     (setq found  (memq (get-char-code-property kk 'general-category)
                                        '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter, because then
                       ;; we don't want the first letter to match the decomposition.
                       (dolist (kk  dd)
                         (when (and fold-decomp  (memq (get-char-code-property kk 'general-category)
                                                       '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp  nil)))
                     ;; No number or letter on decomposition.  Take its first char.
                     (setq found  (car-safe dec)))
                   ;; Fold a multi-char decomposition only if at least one of the chars is
                   ;; non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp  nil)
                     (dolist (kk  dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property kk 'canonical-combining-class) 0))
                         (setq fold-decomp  t))))
                   ;; Add CH to the list of chars that KK can represent.  Maybe add its decomposition
                   ;; too, so we can match multi-char representations like (format "a%c" 769).
                   (when (and found  (not (eq ch kk)))
                     (let ((chr-strgs  (cons (char-to-string ch) (aref equiv kk))))
                       (aset equiv kk (if fold-decomp
                                          (cons (apply #'string dec) chr-strgs)
                                        chr-strgs))))))))
           table)
          ;; Add some manual entries.
          (dolist (it  char-fold-ad-hoc)
            (let ((idx        (car it))
                  (chr-strgs  (cdr it)))
              (aset equiv idx (append chr-strgs (aref equiv idx)))))

          ;; This is the essential bit added by `character-fold+.el'.
          (when (and (boundp 'char-fold-symmetric)  char-fold-symmetric)
            ;; Add an entry for each equivalent char.
            (let ((others  ()))
              (map-char-table
               (lambda (base val)
                 (let ((chr-strgs  (aref equiv base)))
                   (when (consp chr-strgs)
                     (dolist (strg  (cdr chr-strgs))
                       (when (< (length strg) 2)
                         (push (cons (string-to-char strg) (remove strg chr-strgs)) others))
                       ;; Add it and its base char to `char-fold-decomps'.
                       (push (cons strg (char-to-string base)) char-fold-decomps)))))
               equiv)
              (dolist (it  others)
                (let ((base       (car it))
                      (chr-strgs  (cdr it)))
                  (aset equiv base (append chr-strgs (aref equiv base)))))))

          (map-char-table ; Convert the lists of characters we compiled into regexps.
           (lambda (ch val) (let ((re  (regexp-opt (cons (char-to-string ch) val))))
                        (if (consp ch) (set-char-table-range equiv ch re) (aset equiv ch re))))
           equiv)
          equiv)))

(defcustom char-fold-ad-hoc '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
                               "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                              (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "󠀢" "❮" "❯" "‹" "›")
                              (?` "❛" "‘" "‛" "󠀢" "❮" "‹"))
  "Ad hoc character foldings.
Each entry is a list of a character and the strings that fold into it.

The default value includes those ad hoc foldings provided by vanilla
Emacs."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type '(repeat (cons
                  (character :tag "Fold to character")
                  (repeat (string :tag "Fold from string"))))
  :group 'isearch)

(defcustom char-fold-symmetric nil
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them.  The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

(defadvice character-fold-to-regexp (before replace-decompositions activate)
  "Replace decompositions in the search string by their base chars.
This allows search to match all equivalents."
  (when char-fold-decomps
    (dolist (decomp  char-fold-decomps)
      (ad-set-arg 0  (replace-regexp-in-string (regexp-quote (car decomp)) (cdr decomp)
                                               (ad-get-arg 0) 'FIXED-CASE 'LITERAL)))))
;;;;;;;;;;;;;;;;;;;;;;;

(provide 'character-fold+)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; character-fold+.el ends here
