* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
@ 2022-07-01 23:32 Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 6:17 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-07-01 23:32 UTC (permalink / raw)
To: 56347
[-- Attachment #1: Type: text/plain, Size: 689 bytes --]
Tags: patch
Tags: patch
The patch below simplifies code around STRING_SET_MULTIBYTE.
Any objection?
Stefan
In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnux32, GTK+ Version 3.24.34, cairo version 1.16.0)
of 2022-06-30 built on alfajor
Repository revision: 2c31b0d5b84471e8b1fa5737b37cf4c241aec036
Repository branch: work
Windowing system distributor 'The X.Org Foundation', version 11.0.12101003
System Description: Debian GNU/Linux bookworm/sid
Configured using:
'configure -C --enable-checking --enable-check-lisp-object-type --with-modules --with-cairo --with-tiff=ifavailable
'CFLAGS=-Wall -g3 -Og -Wno-pointer-sign'
PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: set-multibyte.patch --]
[-- Type: text/patch, Size: 1513 bytes --]
diff --git a/src/composite.c b/src/composite.c
index 4d69702171f..fb0c51821a5 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
for (i = SBYTES (string) - 1; i >= 0; i--)
if (!ASCII_CHAR_P (SREF (string, i)))
error ("Attempt to shape unibyte text");
- /* STRING is a pure-ASCII string, so we can convert it (or,
- rather, its copy) to multibyte and use that thereafter. */
- Lisp_Object string_copy = Fconcat (1, &string);
- STRING_SET_MULTIBYTE (string_copy);
- string = string_copy;
+ /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
}
frombyte = string_char_to_byte (string, frompos);
}
diff --git a/src/lisp.h b/src/lisp.h
index 7be2e5d38dc..9f2e7785d2f 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR) \
/* Mark STR as a multibyte string. Assure that STR contains only
ASCII characters in advance. */
-#define STRING_SET_MULTIBYTE(STR) \
- do { \
- if (XSTRING (STR)->u.s.size == 0) \
- (STR) = empty_multibyte_string; \
- else \
- XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
+#define STRING_SET_MULTIBYTE(STR) \
+ do { \
+ eassert (XSTRING (STR)->u.s.size > 0); \
+ XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
} while (false)
/* Convenience functions for dealing with Lisp strings. */
^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-01 23:32 bug#56347: Optimize/simplify STRING_SET_MULTIBYTE Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 6:17 ` Eli Zaretskii
2022-07-02 16:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 16:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 2 replies; 11+ messages in thread
From: Eli Zaretskii @ 2022-07-02 6:17 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 56347
> Date: Fri, 01 Jul 2022 19:32:05 -0400
> From: Stefan Monnier via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>
> The patch below simplifies code around STRING_SET_MULTIBYTE.
> Any objection?
Rationale? Simplification in these cases is minimal, almost
non-existent, so it cannot be the only rationale.
> --- a/src/composite.c
> +++ b/src/composite.c
> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
> for (i = SBYTES (string) - 1; i >= 0; i--)
> if (!ASCII_CHAR_P (SREF (string, i)))
> error ("Attempt to shape unibyte text");
> - /* STRING is a pure-ASCII string, so we can convert it (or,
> - rather, its copy) to multibyte and use that thereafter. */
> - Lisp_Object string_copy = Fconcat (1, &string);
> - STRING_SET_MULTIBYTE (string_copy);
> - string = string_copy;
> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
Did you actually try your change in the situations where this problem
pops up? AFAIR, the code makes a copy of the string for good reasons:
the rest of handling of the string down the line barfs if we keep a
multibyte string here.
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR) \
>
> /* Mark STR as a multibyte string. Assure that STR contains only
> ASCII characters in advance. */
> -#define STRING_SET_MULTIBYTE(STR) \
> - do { \
> - if (XSTRING (STR)->u.s.size == 0) \
> - (STR) = empty_multibyte_string; \
> - else \
> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> +#define STRING_SET_MULTIBYTE(STR) \
> + do { \
> + eassert (XSTRING (STR)->u.s.size > 0); \
> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> } while (false)
>
> /* Convenience functions for dealing with Lisp strings. */
You want to disallow uses of empty_multibyte_string? why?
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 6:17 ` Eli Zaretskii
@ 2022-07-02 16:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 16:24 ` Eli Zaretskii
2022-07-02 16:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-07-02 16:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 56347
>> The patch below simplifies code around STRING_SET_MULTIBYTE.
>> Any objection?
> Rationale?
STRING_SET_MULTIBYTE is fundamentally evil because it changes the nature
of an object. Its current definition (like that of STRING_SET_UNIBYTE)
is rather scary (it sometimes changes the nature of the arg passed to
it, and sometimes replaces the arg with something else).
>> --- a/src/composite.c
>> +++ b/src/composite.c
>> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
>> for (i = SBYTES (string) - 1; i >= 0; i--)
>> if (!ASCII_CHAR_P (SREF (string, i)))
>> error ("Attempt to shape unibyte text");
>> - /* STRING is a pure-ASCII string, so we can convert it (or,
>> - rather, its copy) to multibyte and use that thereafter. */
>> - Lisp_Object string_copy = Fconcat (1, &string);
>> - STRING_SET_MULTIBYTE (string_copy);
>> - string = string_copy;
>> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
>
> Did you actually try your change in the situations where this problem
> pops up?
I don't even know how to go about doing that, no.
> AFAIR, the code makes a copy of the string for good reasons:
> the rest of handling of the string down the line barfs if we keep a
> multibyte string here.
[ I assume you meant "barfs if we keep a *uni*byte string here". ]
Where? AFAICT `string` is only used in the subsequent code by passing
it to `fill_gstring_header` and that function only passes that arg to
`fetch_string_char_advance_no_check` and that function only looks at the
string's SDATA, so as long as the sequence of bytes is consistent with
a multibyte string (which we just checked with the ASCII_CHAR_P loop),
I don't see any problem.
>> --- a/src/lisp.h
>> +++ b/src/lisp.h
>> @@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR) \
>>
>> /* Mark STR as a multibyte string. Assure that STR contains only
>> ASCII characters in advance. */
>> -#define STRING_SET_MULTIBYTE(STR) \
>> - do { \
>> - if (XSTRING (STR)->u.s.size == 0) \
>> - (STR) = empty_multibyte_string; \
>> - else \
>> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>> +#define STRING_SET_MULTIBYTE(STR) \
>> + do { \
>> + eassert (XSTRING (STR)->u.s.size > 0); \
>> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>> } while (false)
>>
>> /* Convenience functions for dealing with Lisp strings. */
>
> You want to disallow uses of empty_multibyte_string? why?
No, I want to reduce the scope of semantics of the macro, e.g. so it can
be implemented as a function rather than a macro and so it doesn't
magically substitute empty_multibyte_string into a variable that held
something else.
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 16:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 16:24 ` Eli Zaretskii
2022-07-02 18:00 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2022-07-02 16:24 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 56347
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 56347@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 12:12:06 -0400
>
> STRING_SET_MULTIBYTE is fundamentally evil because it changes the nature
> of an object. Its current definition (like that of STRING_SET_UNIBYTE)
> is rather scary (it sometimes changes the nature of the arg passed to
> it, and sometimes replaces the arg with something else).
But do we have any alternatives?
> >> - /* STRING is a pure-ASCII string, so we can convert it (or,
> >> - rather, its copy) to multibyte and use that thereafter. */
> >> - Lisp_Object string_copy = Fconcat (1, &string);
> >> - STRING_SET_MULTIBYTE (string_copy);
> >> - string = string_copy;
> >> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
> >
> > Did you actually try your change in the situations where this problem
> > pops up?
>
> I don't even know how to go about doing that, no.
Make a character-composition rule that composes, say, two '-'
characters, and then display a buffer where you have adjacent dashes.
> > AFAIR, the code makes a copy of the string for good reasons:
> > the rest of handling of the string down the line barfs if we keep a
> > multibyte string here.
>
> [ I assume you meant "barfs if we keep a *uni*byte string here". ]
Yes.
> Where?
I don't remember, sorry.
> >> -#define STRING_SET_MULTIBYTE(STR) \
> >> - do { \
> >> - if (XSTRING (STR)->u.s.size == 0) \
> >> - (STR) = empty_multibyte_string; \
> >> - else \
> >> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> >> +#define STRING_SET_MULTIBYTE(STR) \
> >> + do { \
> >> + eassert (XSTRING (STR)->u.s.size > 0); \
> >> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> >> } while (false)
> >>
> >> /* Convenience functions for dealing with Lisp strings. */
> >
> > You want to disallow uses of empty_multibyte_string? why?
>
> No, I want to reduce the scope of semantics of the macro, e.g. so it can
> be implemented as a function rather than a macro and so it doesn't
> magically substitute empty_multibyte_string into a variable that held
> something else.
But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
empty string, isn't it?
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 16:24 ` Eli Zaretskii
@ 2022-07-02 18:00 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 18:31 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-07-02 18:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 56347
>> No, I want to reduce the scope of semantics of the macro, e.g. so it can
>> be implemented as a function rather than a macro and so it doesn't
>> magically substitute empty_multibyte_string into a variable that held
>> something else.
> But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
> empty string, isn't it?
Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
is created as unibyte or multibyte and never changes after that.
And indeed because we only have a single copy of the two possible empty
strings, they can't be changed between unibyte<->multibyte
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 18:00 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 18:31 ` Eli Zaretskii
2022-07-02 18:37 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2022-07-02 18:31 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 56347
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 56347@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:00:41 -0400
>
> >> No, I want to reduce the scope of semantics of the macro, e.g. so it can
> >> be implemented as a function rather than a macro and so it doesn't
> >> magically substitute empty_multibyte_string into a variable that held
> >> something else.
> > But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
> > empty string, isn't it?
>
> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
> is created as unibyte or multibyte and never changes after that.
That's require a much larger change, I think. It is not enough just
to add an assertion in one place, because that'd just cause
maintenance headaches for no real gain.
And I'm not even sure everyone will agree with such a radical change.
It should be discussed first.
> And indeed because we only have a single copy of the two possible empty
> strings, they can't be changed between unibyte<->multibyte
I can create an empty string without those singletons any time.
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 18:31 ` Eli Zaretskii
@ 2022-07-02 18:37 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 18:45 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-07-02 18:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 56347
>> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
>> is created as unibyte or multibyte and never changes after that.
>
> That's require a much larger change, I think. It is not enough just
> to add an assertion in one place, because that'd just cause
> maintenance headaches for no real gain.
`grep STRING_SET_MULTIBYTE` suggests it's not nearly as hard you think.
The only other use is in `aset`.
>> And indeed because we only have a single copy of the two possible empty
>> strings, they can't be changed between unibyte<->multibyte
> I can create an empty string without those singletons any time.
My experience is that it takes a fair bit of work to do so.
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 18:37 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 18:45 ` Eli Zaretskii
0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2022-07-02 18:45 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 56347
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 56347@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:37:18 -0400
>
> >> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
> >> is created as unibyte or multibyte and never changes after that.
> >
> > That's require a much larger change, I think. It is not enough just
> > to add an assertion in one place, because that'd just cause
> > maintenance headaches for no real gain.
>
> `grep STRING_SET_MULTIBYTE` suggests it's not nearly as hard you think.
>
> The only other use is in `aset`.
That's not what I meant. If you want to disallow changing the
representation of strings, STRING_SET_MULTIBYTE is just the tip of the
iceberg.
> >> And indeed because we only have a single copy of the two possible empty
> >> strings, they can't be changed between unibyte<->multibyte
> > I can create an empty string without those singletons any time.
>
> My experience is that it takes a fair bit of work to do so.
I don't see how that matters here, sorry.
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 6:17 ` Eli Zaretskii
2022-07-02 16:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 16:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 17:06 ` Eli Zaretskii
1 sibling, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-07-02 16:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 56347
>> --- a/src/composite.c
>> +++ b/src/composite.c
>> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
>> for (i = SBYTES (string) - 1; i >= 0; i--)
>> if (!ASCII_CHAR_P (SREF (string, i)))
>> error ("Attempt to shape unibyte text");
>> - /* STRING is a pure-ASCII string, so we can convert it (or,
>> - rather, its copy) to multibyte and use that thereafter. */
>> - Lisp_Object string_copy = Fconcat (1, &string);
>> - STRING_SET_MULTIBYTE (string_copy);
>> - string = string_copy;
>> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
>
> Did you actually try your change in the situations where this problem
> pops up? AFAIR, the code makes a copy of the string for good reasons:
> the rest of handling of the string down the line barfs if we keep a
> multibyte string here.
Of course, if we really do need a multibyte copy of the string, I can
change the patch to call `Fstring_to_multibyte` instead of `Fconcat`.
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
2022-07-02 16:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-07-02 17:06 ` Eli Zaretskii
2022-07-02 17:57 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2022-07-02 17:06 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 56347
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 56347@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 12:49:53 -0400
>
> >> --- a/src/composite.c
> >> +++ b/src/composite.c
> >> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
> >> for (i = SBYTES (string) - 1; i >= 0; i--)
> >> if (!ASCII_CHAR_P (SREF (string, i)))
> >> error ("Attempt to shape unibyte text");
> >> - /* STRING is a pure-ASCII string, so we can convert it (or,
> >> - rather, its copy) to multibyte and use that thereafter. */
> >> - Lisp_Object string_copy = Fconcat (1, &string);
> >> - STRING_SET_MULTIBYTE (string_copy);
> >> - string = string_copy;
> >> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
> >
> > Did you actually try your change in the situations where this problem
> > pops up? AFAIR, the code makes a copy of the string for good reasons:
> > the rest of handling of the string down the line barfs if we keep a
> > multibyte string here.
>
> Of course, if we really do need a multibyte copy of the string, I can
> change the patch to call `Fstring_to_multibyte` instead of `Fconcat`.
Why not call make_multibyte_string directly?
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-07-02 18:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-01 23:32 bug#56347: Optimize/simplify STRING_SET_MULTIBYTE Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 6:17 ` Eli Zaretskii
2022-07-02 16:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 16:24 ` Eli Zaretskii
2022-07-02 18:00 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 18:31 ` Eli Zaretskii
2022-07-02 18:37 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 18:45 ` Eli Zaretskii
2022-07-02 16:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-02 17:06 ` Eli Zaretskii
2022-07-02 17:57 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).