unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattias.engdegard@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: master 49e243c0c85: Avoid resizing mutation in subst-char-in-string, take two
Date: Tue, 14 May 2024 09:06:54 +0300
Message-ID: <861q65x6yp.fsf@gnu.org>
In-Reply-To: <8AF4F364-9030-4634-91C5-79E297E5335B@gmail.com> (message from Mattias Engdegård on Mon, 13 May 2024 21:20:24 +0200)

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Mon, 13 May 2024 21:20:24 +0200
> Cc: emacs-devel@gnu.org
> 
> 13 maj 2024 kl. 19.53 skrev Eli Zaretskii <eliz@gnu.org>:
> > 
> >> +  (if (and (not inplace)
> >> +           (if (multibyte-string-p string)
> >> +               (> (max fromchar tochar) 127)
> >> +             (> tochar 255)))
> > 
> > Is the above condition correct?  My reading of it is that if INPLACE
> > is non-nil, we use aset (which will resize a string) even if TOCHAR
> > needs more bytes than FROMCHAR.  Which seems to be in contradiction
> > with the goal of the change, as advertised by the log message: "avoid
> > resizing mutation".
> 
> I agree that it does look a bit odd, but it's intentional. First of all, the aim is to insulate non-mutating calls to the function from issues arising from mutation in the implementation. If we don't have to mutate and it's faster and/or safer not to, then we shouldn't.
> 
> Second, the function is documented to change the string in-place if INPLACE is non-nil, so in that case we have no choice but to mutate, or we might silently break reasonable code.

So I guess the log message doesn't describe this intent clearly
enough.

> > why, in the case of a multibyte STRING, does the code look at the
> > codepoints of FROMCHAR and TOCHAR and not at the number of bytes they
> > take in the internal Emacs representation of the characters?
> 
> It's a conservative approximation that is much simpler than computing the size of the internal representation. (It's also the condition proposed in bug#70784.)

Which part of bug#70784 suggested that?  (It's a very long discussion,
and the suggestion at the beginning talks only about the unibyte
case.)

More to the point, the number of bytes a character takes in a
multibyte string depends deterministically on its codepoint, so I
don't really understand why you say the codepoint test is "much
simpler".  We could have a primitive, say, char-bytes, to do that even
faster, if we want this to be as efficient as possible.  This would
allow a large subset of calls (those without INPLACE = t) to be much
faster than they are now, without resizing the string.  IOW, we would
be able to "avoid resizing mutation" in many more cases.
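
To illustrate what I mean, here is a rough Lisp sketch of a byte-based
test.  The names my-char-bytes and my-same-size-p are made up for this
example and do not exist in Emacs; a real char-bytes primitive would
presumably be written in C instead of going through a temporary
string as this sketch does.

```elisp
;; Sketch only: `my-char-bytes' and `my-same-size-p' are hypothetical
;; names used for illustration, not existing Emacs functions.

(defun my-char-bytes (ch)
  "Return the number of bytes CH takes in the internal representation.
This builds a one-character string and asks for its byte length."
  (string-bytes (char-to-string ch)))

(defun my-same-size-p (fromchar tochar)
  "Return non-nil if FROMCHAR and TOCHAR take the same number of bytes.
In that case replacing one by the other in a multibyte string cannot
change the string's byte length."
  (= (my-char-bytes fromchar) (my-char-bytes tochar)))

;; U+00E9 and U+00FC are both above 127, so the codepoint test treats
;; the substitution as possibly resizing, yet each takes two bytes.
(my-same-size-p #xe9 #xfc)   ; non-nil

;; Replacing an ASCII character by U+00E9 does change the byte length.
(my-same-size-p ?a #xe9)     ; nil
```

With a test along these lines, only substitutions that really change
the byte length would need to avoid aset.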


