unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
@ 2023-05-09 20:13 Drew Adams
  2023-05-10 13:37 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Drew Adams @ 2023-05-09 20:13 UTC (permalink / raw)
  To: 63398

emacs -Q

;; So `search-upper-case' is `not-yanks', and `case-fold-search' and
;; `case-replace' are both `t'.

In *scratch* enter this text:

Test 0
test 0

At bob, use `M-x replace-string RET test 0 RET test 1 RET'

(emacs) `Replacement and Lax Matches' seems to say that the result
should be this:

Test 1
test 1

But the result is this:

test 1
test 1
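
For anyone who wants to reproduce this without the minibuffer, here is
a minimal Emacs Lisp sketch.  It assumes that a `search-forward' /
`replace-match' loop with FIXEDCASE nil behaves like the interactive
command under the `emacs -Q' defaults above.  The helper name
`my-demo-replace-in-text' is hypothetical, not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-demo-replace-in-text (text from to)
  "Return TEXT with each occurrence of FROM replaced by TO.
The search ignores case, and `replace-match' is left free to adapt
the case of TO (FIXEDCASE is nil)."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (while (search-forward from nil t)
        ;; FIXEDCASE nil: let the case heuristic run.
        ;; LITERAL t: insert TO verbatim, with no `\&' processing.
        (replace-match to nil t)))
    (buffer-string)))

(my-demo-replace-in-text "Test 0\ntest 0\n" "test 0" "test 1")
```

With the behavior reported here, the call should return
"test 1\ntest 1\n", that is, the capital "T" on the first line is lost.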

That doc says this:

  If the first argument of a replace command is all lower case, the
  command ignores case while searching for occurrences to replace-provided
  'case-fold-search' is non-'nil' and 'search-upper-case' is also
  non-'nil'.

OK, that's respected; both lines are found during the search.  Good.
___

BTW, it's unfortunate that we use an em dash char here, with no
preceding or following space chars.  Why?  Because it reads as if it
were a hyphen, producing adjective "replace-provided" modifying noun
`case-fold-search'.  Since we use fixed-width fonts by default, this is
all the more apparent.  Please reword or surround the em dash with space
chars.
___

The doc also says this, however, regarding replacement:

  In addition, when the NEWSTRING argument is all or partly lower case,
  replacement commands try to preserve the case pattern of each
  occurrence.  Thus, the command

     M-x replace-string <RET> foo <RET> bar <RET>

  replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
  with 'BAR', and a capitalized 'Foo' with 'Bar'.  (These three
  alternatives-lower case, all caps, and capitalized, are the only ones
  that 'replace-string' can distinguish.)

My reading of this is that, since "test 1" is lower-case, the
replacement should "try" (meaning what, exactly? under what
circumstances does such a trial "fail"?) to preserve the case pattern of
the first occurrence, changing "Test 0" to "Test 1".  That doesn't
happen.

Is the doc wrong?  Is my reading of it wrong?  If my reading and the doc
are right, is the behavior wrong (bugged)?
___

[It's also not very good to refer to argument NEWSTRING in a topic/node
that doesn't define it. Users have to look backward through the doc to
see if they can find out which argument this is talking about.]
___


In GNU Emacs 28.2 (build 2, x86_64-w64-mingw32)
 of 2022-09-13 built on AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.2846)

Configured using:
 'configure --with-modules --without-dbus --with-native-compilation
 --without-compress-install CFLAGS=-O2'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
XPM ZLIB

(NATIVE_COMP present but libgccjit not available)







* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-09 20:13 bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string') Drew Adams
@ 2023-05-10 13:37 ` Eli Zaretskii
  2023-05-10 14:20   ` Drew Adams
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2023-05-10 13:37 UTC (permalink / raw)
  To: Drew Adams; +Cc: 63398-done

> From: Drew Adams <drew.adams@oracle.com>
> Date: Tue, 9 May 2023 20:13:04 +0000
> 
> The doc also says this, however, regarding replacement:
> 
>   In addition, when the NEWSTRING argument is all or partly lower case,
>   replacement commands try to preserve the case pattern of each
>   occurrence.  Thus, the command
> 
>      M-x replace-string <RET> foo <RET> bar <RET>
> 
>   replaces a lower case 'foo' with a lower case 'bar', an all-caps 'FOO'
>   with 'BAR', and a capitalized 'Foo' with 'Bar'.  (These three
>   alternatives-lower case, all caps, and capitalized, are the only ones
>   that 'replace-string' can distinguish.)
> 
> My reading of this is that, since "test 1" is lower-case, the
> replacement should "try" (meaning what, exactly? under what
> circumstances does such a trial "fail"?) to preserve the case pattern of
> the first occurrence, changing "Test 0" to "Test 1".  That doesn't
> happen.
> 
> Is the doc wrong?  Is my reading of it wrong?  If my reading and the doc
> are right, is the behavior wrong (bugged)?

The manual says "try", and for a good reason.  There's a heuristic
involved that tries to DTRT.  The "when the NEWSTRING argument is all
or partly lower case" part is relevant.  What you expect will happen
if the original text doesn't include digits, as in

  Testing
  testing

  M-x replace-string RET testing RET foobar RET

> [It's also not very good to refer to argument NEWSTRING in a topic/node
> that doesn't define it. Users have to look backward through the doc to
> see if they can find out which argument this is talking about.]

Fixed.

> BTW, it's unfortunate that we use an em dash char here, with no
> preceding or following space chars.  Why?  Because it reads as if it
> were a hyphen, producing adjective "replace-provided" modifying noun
> `case-fold-search'.  Since we use fixed-width fonts by default, this is
> all the more apparent.  Please reword or surround the em dash with space
> chars.

In your post the em dash was the ASCII character '-', but on my system
it is an actual em dash -- a much longer character, thus the confusion
is unlikely.  As for why there are no spaces -- that's our style.






* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 13:37 ` Eli Zaretskii
@ 2023-05-10 14:20   ` Drew Adams
  2023-05-10 15:27     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Drew Adams @ 2023-05-10 14:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63398-done@debbugs.gnu.org

> The manual says "try", and for a good reason.  There's a heuristic
> involved that tries to DTRT. The "when the NEWSTRING argument is all
> or partly lower case" part is relevant.

Yes, I assumed that.  But seeing "try" still can make
a reader wonder.

> What you expect will happen
> if the original text doesn't include digits, as in
> 
>   Testing
>   testing
> 
>   M-x replace-string RET testing RET foobar RET

Yes, I know.  That's why I included the digits - it's
this case that seems not to follow what the doc says.

Are you perhaps connecting this with your previous
sentence, about success of a "trial" depending on
NEWSTRING being partly or all lower case?  Are you
saying that if there are non-letter chars then what
the doc says might not happen because trying doesn't
succeed?

I guess it's not clear to me whether you're saying
that the behavior isn't what it should be (per the
doc) in this case, but that's unavoidable or OK, or
you're saying that the behavior does follow the doc,
and the doc is trying to say that the behavior
follows what it says only if there are no digits?

There are several variables that can affect the
behavior, which makes trying to describe (doc) and
trying to understand (reader) the behavior not so
easy.

FWIW, this was raised by a user question on reddit:

https://www.reddit.com/r/emacs/comments/13d1a5x/replacestring_keeping_case_of_the_matched_string

You closed this as fixed, but I still find the doc
- or the behavior - unclear wrt this example.
Could you maybe (e.g. here) explain a bit more how
the behavior fits the description?

> > [It's also not very good to refer to argument NEWSTRING in a topic/node
> > that doesn't define it. Users have to look backward through the doc to
> > see if they can find out which argument this is talking about.]
> 
> Fixed.

Thx.

> In your post the em dash was the ASCII character '-',

Dunno how that happened; sorry.  It's an em dash in
the Emacs text.  And with a fixed-width font (default,
emacs -Q) the problem I cited is real.

> but on my system it is an actual em dash -- a much
> longer character, thus the confusion
> is unlikely.

How can it be a longer char, if the font is fixed width?

> As for why there are no spaces -- that's our style.

That's fine, provided the em dash is actually wider
than the fixed width.

(Typographic practice varies, but a thin space is often
or typically used on each side of an em dash.)






* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 14:20   ` Drew Adams
@ 2023-05-10 15:27     ` Eli Zaretskii
  2023-05-10 15:50       ` Drew Adams
  2023-05-10 16:46       ` Juri Linkov
  0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2023-05-10 15:27 UTC (permalink / raw)
  To: Drew Adams; +Cc: 63398-done

> From: Drew Adams <drew.adams@oracle.com>
> CC: "63398-done@debbugs.gnu.org" <63398-done@debbugs.gnu.org>
> Date: Wed, 10 May 2023 14:20:10 +0000
> 
> > What you expect will happen
> > if the original text doesn't include digits, as in
> > 
> >   Testing
> >   testing
> > 
> >   M-x replace-string RET testing RET foobar RET
> 
> Yes, I know.  That's why I included the digits - it's
> this case that seems not to follow what the doc says.

It's too bad you kept silent about that, because it took me some time
to discover the reason.  Why post riddles if you already know part
of the answer?

> Are you perhaps connecting this with your previous
> sentence, about success of a "trial" depending on
> NEWSTRING being partly or all lower case?  Are you
> saying that if there are non-letter chars then what
> the doc says might not happen because trying doesn't
> succeed?

Yes.

> I guess it's not clear to me whether you're saying
> that the behavior isn't what it should be (per the
> doc) in this case, but that's unavoidable or OK, or
> you're saying that the behavior does follow the doc,
> and the doc is trying to say that the behavior
> follows what it says only if there are no digits?

The latter.

> You closed this as fixed, but I still find the doc
> - or the behavior - unclear wrt this example.
> Could you maybe (e.g. here) explain a bit more how
> the behavior fits the description?

I don't know what exactly happens and when, and thus cannot say more.
Feel free to study the code and find out.  Or maybe someone else will
be able to describe the behavior in more detail.

> > but on my system it is an actual em dash -- a much
> > longer character, thus the confusion
> > is unlikely.
> 
> How can it be a longer char, if the font is fixed width?

The ASCII dash has whitespace around it, which em dash lacks.






* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 15:27     ` Eli Zaretskii
@ 2023-05-10 15:50       ` Drew Adams
  2023-05-10 16:46       ` Juri Linkov
  1 sibling, 0 replies; 8+ messages in thread
From: Drew Adams @ 2023-05-10 15:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63398-done@debbugs.gnu.org


> > > but on my system it is an actual em dash -- a much
> > > longer character, thus the confusion
> > > is unlikely.
> >
> > How can it be a longer char, if the font is fixed width?
> 
> The ASCII dash has whitespace around it, which em dash lacks.

In any case, what I see with emacs -Q (in Emacs 28.2)
is that the em dash, with no surrounding space chars,
seems to have the same width as all of the other
fixed-width chars.  Hence it _appears_ as if it were
adjective "replace-provided".  See attached screenshot.
___

Also, "lower case" is better as "lowercase".

https://english.stackexchange.com/a/59413

[-- Attachment #2: throw-em-space-in-Info.png --]
[-- Type: image/png, Size: 92303 bytes --]


* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 15:27     ` Eli Zaretskii
  2023-05-10 15:50       ` Drew Adams
@ 2023-05-10 16:46       ` Juri Linkov
  2023-05-10 17:03         ` Drew Adams
  1 sibling, 1 reply; 8+ messages in thread
From: Juri Linkov @ 2023-05-10 16:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63398, Drew Adams

>> You closed this as fixed, but I still find the doc
>> - or the behavior - unclear wrt this example.
>> Could you maybe (e.g. here) explain a bit more how
>> the behavior fits the description?
>
> I don't know what exactly happens and when, and thus cannot say more.
> Feel free to study the code and find out.  Or maybe someone else will
> be able to describe the behavior in more detail.

The rules of replacement case-folding are more complex than documented.
`replace-match' checks if the initial is a caseless word constituent
like "0", and treats that like a lowercase initial.

So  "test a → test b" replaces "Test A" with "Test B",
but "test 0 → test 1" replaces "Test 0" with "test 1".
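
The two cases above, plus the digit-free "Testing" example from earlier
in the thread, can be checked from Lisp with a sketch like the
following.  It works on strings rather than buffers and assumes the
standard syntax table, where digits are word constituents.  The helper
name `my-demo-replace-in-string' is hypothetical, not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-demo-replace-in-string (string from to)
  "Replace the first case-insensitive match of FROM in STRING with TO.
FIXEDCASE is nil, so `replace-match' may adapt the case of TO."
  (let ((case-fold-search t))
    (if (string-match (regexp-quote from) string)
        ;; Args: NEWTEXT, FIXEDCASE (nil), LITERAL (t), STRING.
        (replace-match to nil t string)
      string)))

(list (my-demo-replace-in-string "Test A" "test a" "test b")
      (my-demo-replace-in-string "Test 0" "test 0" "test 1")
      (my-demo-replace-in-string "Testing" "testing" "foobar"))
```

Per the behavior described in this thread, the list should be
("Test B" "test 1" "Foobar").  If the heuristic is unwanted, a non-nil
FIXEDCASE argument to `replace-match' inserts the replacement exactly
as written; for the interactive commands, setting `case-replace' to
nil has the same effect.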






* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 16:46       ` Juri Linkov
@ 2023-05-10 17:03         ` Drew Adams
  2023-05-11  6:23           ` Juri Linkov
  0 siblings, 1 reply; 8+ messages in thread
From: Drew Adams @ 2023-05-10 17:03 UTC (permalink / raw)
  To: Juri Linkov, Eli Zaretskii; +Cc: 63398@debbugs.gnu.org

> The rules of replacement case-folding are more complex than documented.
> `replace-match' checks if the initial is a caseless word constituent
> like "0", and treats that like a lowercase initial.
> 
> So  "test a → test b" replaces "Test A" with "Test B",
> but "test 0 → test 1" replaces "Test 0" with "test 1".

Can we fix this?  Should the behavior be changed?
Should the behavior remain like this and the doc
be changed?
	


* bug#63398: 28.2; Doc or behavior of replacement commands (e.g. `replace-string')
  2023-05-10 17:03         ` Drew Adams
@ 2023-05-11  6:23           ` Juri Linkov
  0 siblings, 0 replies; 8+ messages in thread
From: Juri Linkov @ 2023-05-11  6:23 UTC (permalink / raw)
  To: Drew Adams; +Cc: Eli Zaretskii, 63398@debbugs.gnu.org

>> The rules of replacement case-folding are more complex than documented.
>> `replace-match' checks if the initial is a caseless word constituent
>> like "0", and treats that like a lowercase initial.
>>
>> So  "test a → test b" replaces "Test A" with "Test B",
>> but "test 0 → test 1" replaces "Test 0" with "test 1".
>
> Can we fix this?  Should the behavior be changed?

I guess the default should never change.
But maybe the rules could be customized.

> Should the behavior remain like this and the doc
> be changed?

The current implementation of the rules is quite complex.
Not sure whether all the details can be documented succinctly.






end of thread, other threads:[~2023-05-11  6:23 UTC | newest]

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
