replace-match problem

* replace-match problem
@ 2002-05-03 15:44 Wolfgang Scherer
  2002-05-04 15:02 ` Richard Stallman
  0 siblings, 1 reply; 2+ messages in thread
From: Wolfgang Scherer @ 2002-05-03 15:44 UTC


This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.1.1 (i386-suse-linux, X toolkit, Xaw3d scroll bars)
 of 2002-03-25 on stephens
configured using `configure --with-gcc --with-pop --with-system-malloc --prefix=/usr --exec-prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --sharedstatedir=/var/lib --libexecdir=/usr/lib --with-x --with-xpm --with-jpeg --with-tiff --with-gif --with-png --with-x-toolkit=lucid --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib i386-suse-linux CC=gcc 'CFLAGS=-O2 -march=i486 -mcpu=i686 -pipe 	 -DSYSTEM_PURESIZE_EXTRA=25000 	 -DSITELOAD_PURESIZE_EXTRA=10000 	 -D_GNU_SOURCE ' LDFLAGS=-s build_alias=i386-suse-linux host_alias=i386-suse-linux target_alias=i386-suse-linux'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: POSIX
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: german
  locale-coding-system: iso-latin-1
  default-enable-multibyte-characters: nil

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

REPLACE-MATCH PROBLEM
=====================

The built-in function `replace-match' seems to behave inconsistently.
Specifically, I have a problem with the semantics of "words" and
"newtext".

From the documentation of `replace-match':

    Otherwise maybe capitalize the whole text, or maybe just word
    initials, based on the replaced text.

    [1] If the replaced text has only capital letters and has at
    least one multiletter word, convert NEWTEXT to all caps.

    [2] If the replaced text has at least one word starting with a
    capital letter, then capitalize each word in NEWTEXT.


1. The lower case and upper case examples in lines 1, 2, 6, 7, 11, 12,
   16, 17 of the table under TEST CASE could suggest that "\\&" is
   subject to case conversion.
   Lines 4, 5, 9, 10, 14, 15, 19, 20 show that this is not the case.
   (I think a clarification would be nice, e.g. "Case conversion is
    done before any special sequences are expanded.")

2. The lower case and upper case examples also suggest that the
   number of non-word constituent characters between words does not make
   a difference.
   The examples for mixed-case replaced text in lines 3, 8, 13, 18
   show that the number of non-word constituent characters does in fact
   make a difference.

   This is a consequence of replace-match in search.c not checking the
   syntax-code of the current character, which leads to the assumption
   that the second and further separators are actually the initial
   characters of a word.

3. The test examples for mixed-case replaced text in lines 4, 5, 9,
   10, 14, 15, 19, 20 show that description [2] is plainly wrong.  It
   should state that capitalization is only done when ALL words in
   the replaced text are capitalized.

   At least the code in search.c says so:

       /* Capitalize each word, if the old text has all capitalized words.  */

TEST CASE
=========

The following table was generated with a test expression that copies
INPUT with fixed case ("\\& => \\&") and then replaces the copy of
INPUT with case conversion (e.g. "\\& : your--string").

    INPUT          \& REPL       STRING-REPL

  1 my-string   => my-string   : your--string 
  2 MY-STRING   => MY-STRING   : YOUR--STRING 
  3 My-String   => My-String   : Your--String 
  4 My-string   => My-string   : your--string 
  5 my-String   => my-String   : your--string 
          
  6 my--string  => my--string  : your--string
  7 MY--STRING  => MY--STRING  : YOUR--STRING
  8 My--String  => My--String  : your--string
  9 My--string  => My--string  : your--string
 10 my--String  => my--String  : your--string
          
 11 my string   => my string   : your string 
 12 MY STRING   => MY STRING   : YOUR STRING 
 13 My String   => My String   : Your String 
 14 My string   => My string   : your string 
 15 my String   => my String   : your string 
          
 16 my  string  => my  string  : your  string
 17 MY  STRING  => MY  STRING  : YOUR  STRING
 18 My  String  => My  String  : your  string
 19 My  string  => My  string  : your  string
 20 my  String  => my  String  : your  string

EMACS search.c (no difference between 20.7, 21.1 and 21.2)
==============

>> 	  if (LOWERCASEP (c))
>> 	    {
>> 	      /* Cannot be all caps if any original char is lower case */
>>
>> 	      some_lowercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
>> 	  else if (!NOCASEP (c))
>> 	    {
>> 	      some_uppercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
>> 	  else
>> 	    {
>> 	      /* If the initial is a caseless word constituent,
>> 		 treat that like a lowercase initial.  */
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	    }

I think the following would be more correct:

      if (SYNTAX (c) == Sword)
        {
>> 	  if (LOWERCASEP (c))
>> 	    {
>> 	      /* Cannot be all caps if any original char is lower case */
>>
>> 	      some_lowercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
>> 	  else if (!NOCASEP (c))
>> 	    {
>> 	      some_uppercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
>> 	  else
>> 	    {
>> 	      /* If the initial is a caseless word constituent,
>> 		 treat that like a lowercase initial.  */
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	    }
        }

Or:

>> 	  if (LOWERCASEP (c))
>> 	    {
>> 	      /* Cannot be all caps if any original char is lower case */
>>
>> 	      some_lowercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
>> 	  else if (!NOCASEP (c))
>> 	    {
>> 	      some_uppercase = 1;
>> 	      if (SYNTAX (prevc) != Sword)
>> 		;
>> 	      else
>> 		some_multiletter_word = 1;
>> 	    }
   	  else if (SYNTAX (c) == Sword)
>> 	    {
>> 	      /* If the initial is a caseless word constituent,
>> 		 treat that like a lowercase initial.  */
>> 	      if (SYNTAX (prevc) != Sword)
>> 		some_nonuppercase_initial = 1;
>> 	    }

TEST EXPRESSION (JUST FOR REFERENCE)
====================================

;; |:debug:|
(let ((case-fold-search t)
      (case-replace nil)
      (str-wid 13)
      (line-no 1)
      (r-s (function
            (lambda (SEARCH REPL)
              (while (search-forward SEARCH nil t)
                ;; Duplicate SEARCH (with FIXEDCASE == t)
                (replace-match (format (format "%%-%ss => %%s"
                                               (- str-wid
                                                  (length (match-string 0))))
                                       "\\&" "\\&")
                               t nil)
                (goto-char (match-beginning 0))
                ;; Find copy of SEARCH
                (search-forward SEARCH nil t 2)
                ;; Replace SEARCH (with FIXEDCASE == nil) by "\\& : REPL"
                (replace-match (format (format "%%-%ss : %%s"
                                               (- str-wid
                                                  (length (match-string 0))))
                                       "\\&" REPL)
                               nil nil)
                ;; Add line number
                (beginning-of-line)
                (insert (format "%3d " line-no))
                (end-of-line)
                (setq line-no (1+ line-no)))))))
  (save-excursion
    (funcall r-s "my-string"  "your--string")
    (funcall r-s "my--string" "your--string")
    (funcall r-s "my string"  "your string")
    (funcall r-s "my  string" "your  string")
    ))
;; |:debug:|



Recent input:
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <down> <down> <down> <down> <down> <down> 
<down> <down> <up> <C-prior> C-x C-f m a i l <return> 
C-u C-c u d f C-x k <return> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <menu-bar> <help-menu> <report-emacs-bug>

Recent messages:
emacs-replace-match-bug.el has auto save data; consider M-x recover-file
Scanning buffer for index (  0%)
Scanning buffer for index (100%)
call-interactively: Quit
Wrote /usr/people/ws/emacs-init/replace-match/emacs-replace-match-bug.el [3 times]
Mark set [4 times]
Wrote /usr/people/ws/emacs-init/replace-match/emacs-replace-match-bug.el [2 times]
Mark set
(New file)
Loading emacsbug...done

* Re: replace-match problem
  2002-05-03 15:44 replace-match problem Wolfgang Scherer
@ 2002-05-04 15:02 ` Richard Stallman
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Stallman @ 2002-05-04 15:02 UTC
  Cc: emacs-devel

       This is a consequence of replace-match in search.c not checking the
       syntax-code of the current character, which leads to the assumption
       that the second and further separators are actually the initial
       characters of a word.

That seems like a bug.  Does anyone disagree?
