* Obsolete string-to-multibyte hard to replace
@ 2017-05-29 13:01 Stefan Monnier
  2017-05-29 17:45 ` Eli Zaretskii
  2017-05-30  0:21 ` Noam Postavsky
  0 siblings, 2 replies; 5+ messages in thread
From: Stefan Monnier @ 2017-05-29 13:01 UTC
  To: emacs-devel

I was looking at ruler-mode.el's bytecomp warning about string-to-multibyte
and it seems like there is no good way to write cleaner code here:

   (let (...
         (ruler
          (propertize
           (string-to-multibyte
            (make-string w ruler-mode-basic-graduation-char))
           ...))
         ...)
    [...]
          (aset ruler k (aref c (setq m (1- m))))

Here, the problem is as follows: make-string returns a unibyte string if
the initial char is ASCII and a multibyte string otherwise.
Seems innocuous, but it means that if the initial char
(ruler-mode-basic-graduation-char above) is ASCII, you can't later
insert a multibyte char into that string with `aset`.
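
To make the unibyte/multibyte split concrete, here is a minimal sketch
that only inspects what make-string returns; it does not reproduce the
`aset` step. The function name `my-ruler-string-kinds` is hypothetical,
and the sketch assumes make-string is called with two arguments only.

```elisp
;; Sketch: the representation chosen by `make-string' depends on the
;; initial character.  `my-ruler-string-kinds' is a made-up name.
(defun my-ruler-string-kinds ()
  "Return the multibyte-ness of three sample ruler strings."
  (let ((ascii-ruler (make-string 5 ?-))   ; ASCII initial char
        (wide-ruler  (make-string 5 ?é)))  ; non-ASCII initial char
    (list (multibyte-string-p ascii-ruler) ; nil: unibyte
          (multibyte-string-p wide-ruler)  ; t: multibyte
          ;; The wrapper used in ruler-mode.el forces a multibyte result.
          (multibyte-string-p (string-to-multibyte ascii-ruler)))))

(my-ruler-string-kinds)   ; => (nil t t)
```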

Replacing string-to-multibyte with decode-coding-string is clearly wrong
in the above case (I mean it'll work, but it's even more of a hack than
using string-to-multibyte).

While investigating it, I noticed that my local Emacs hacks include
a change to make-string so that it always returns a multibyte string.
(I think I made this change because an all-ASCII multibyte string carries
more precise information than an all-ASCII unibyte string: with
a multibyte string, the fact that `x->size == x->size_byte` immediately
tells us it is ASCII-only, whereas with a unibyte string we
can't know that without checking each and every byte.)

I think it'd make sense to change make-string so it always returns
a multibyte string, and maybe to also introduce a new make-unibyte-string.


        Stefan




* Re: Obsolete string-to-multibyte hard to replace
  2017-05-29 13:01 Obsolete string-to-multibyte hard to replace Stefan Monnier
@ 2017-05-29 17:45 ` Eli Zaretskii
  2017-05-30  4:03   ` Stefan Monnier
  2017-05-30  0:21 ` Noam Postavsky
  1 sibling, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2017-05-29 17:45 UTC
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 29 May 2017 09:01:24 -0400
> 
> I think it'd make sense to change make-string so it always returns
> a multibyte string, and maybe to also introduce a new make-unibyte-string.

How about an optional argument to make-string instead?

Anyway, beware of the use case of building and starting Emacs in a
non-ASCII directory, especially when the locale's codeset is not
UTF-8.  If we make such changes, these use cases must be audited to
see that they still work, because we use unibyte strings during early
stages of startup in these cases, until we figure out how to decode
them.




* Re: Obsolete string-to-multibyte hard to replace
  2017-05-29 13:01 Obsolete string-to-multibyte hard to replace Stefan Monnier
  2017-05-29 17:45 ` Eli Zaretskii
@ 2017-05-30  0:21 ` Noam Postavsky
  2017-05-30  3:28   ` Stefan Monnier
  1 sibling, 1 reply; 5+ messages in thread
From: Noam Postavsky @ 2017-05-30  0:21 UTC
  To: Stefan Monnier; +Cc: Emacs developers

On Mon, May 29, 2017 at 9:01 AM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> I was looking at ruler-mode.el's bytecomp warning about string-to-multibyte
> and it seems like there is no good way to write cleaner code here:
>
>    (let (...
>          (ruler
>           (propertize
>            (string-to-multibyte
>             (make-string w ruler-mode-basic-graduation-char))
>             ...))
>          ...)
>     [...]
>           (aset ruler k (aref c (setq m (1- m))))
>
> Here, the problem is as follows: make-string returns a unibyte string if
> the initial char is ASCII and a multibyte string otherwise.
> Seems innocuous, but it means that if the initial char
> (ruler-mode-basic-graduation-char above) is ASCII, you can't later
> insert a multibyte char into that string with `aset`.

We could collect the characters in a vector and then only convert
to string at the end. Haven't you said that strings should ideally be
treated as immutable?
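
A rough sketch of that approach, under the assumption that the ruler can
be described as a fill character plus a list of overrides. The helper
`my-build-ruler` and its arguments are hypothetical and not taken from
ruler-mode.el.

```elisp
;; Hypothetical helper: fill a vector, mutate the vector freely, and
;; convert to a string only once all characters are in place.
(defun my-build-ruler (width fill marks)
  "Return a string of WIDTH copies of FILL, overridden by MARKS.
MARKS is an alist of (INDEX . CHAR); out-of-range indexes are ignored."
  (let ((cells (make-vector width fill)))
    (dolist (mark marks)
      (when (< -1 (car mark) width)
        (aset cells (car mark) (cdr mark))))
    ;; `concat' accepts a vector of characters and returns a string,
    ;; which is multibyte whenever any character is non-ASCII.
    (concat cells)))

(my-build-ruler 5 ?- '((2 . ?¶)))   ; => "--¶--"
```

Since a vector can hold any character, the unibyte/multibyte question
only arises at the final conversion.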




* Re: Obsolete string-to-multibyte hard to replace
  2017-05-30  0:21 ` Noam Postavsky
@ 2017-05-30  3:28   ` Stefan Monnier
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Monnier @ 2017-05-30  3:28 UTC
  To: emacs-devel

> We could collect the characters in a vector and then only convert
> to string at the end.  Haven't you said that strings should ideally be
> treated as immutable?

That's indeed my opinion, but I haven't pushed that even while I was
maintainer because there's a fair bit of code out there which does
modify strings in place.


        Stefan





* Re: Obsolete string-to-multibyte hard to replace
  2017-05-29 17:45 ` Eli Zaretskii
@ 2017-05-30  4:03   ` Stefan Monnier
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Monnier @ 2017-05-30  4:03 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

>> I think it'd make sense to change make-string so it always returns
>> a multibyte string, and maybe to also introduce a new make-unibyte-string.
> How about an optional argument to make-string instead?

You mean (make-string N CHAR &optional UNIBYTE) so it returns
multibyte by default (for all chars) unless the new optional arg is provided?
Sounds fine, yes.
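
As a sketch only, that calling convention could be emulated in Lisp with
a wrapper. The name `my-make-string` is hypothetical; this is not a
change to make-string itself, and with UNIBYTE non-nil it simply falls
back to make-string's existing behavior.

```elisp
;; Hypothetical wrapper illustrating (make-string N CHAR &optional UNIBYTE).
(defun my-make-string (n char &optional unibyte)
  "Return a string of N copies of CHAR, multibyte unless UNIBYTE is non-nil.
With UNIBYTE non-nil, defer to `make-string', whose result is unibyte
only when CHAR is ASCII."
  (let ((s (make-string n char)))
    (if unibyte
        s
      ;; Returns S unchanged if it is already multibyte.
      (string-to-multibyte s))))

(multibyte-string-p (my-make-string 3 ?a))     ; => t
(multibyte-string-p (my-make-string 3 ?a t))   ; => nil
```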

> Anyway, beware of the use case of building and starting Emacs in a
> non-ASCII directory, especially when the locale's codeset is not
> UTF-8.  If we make such changes, these use cases must be audited to
> see that they still work, because we use unibyte strings during early
> stages of startup in these cases, until we figure out how to decode
> them.

The likelihood that those code-paths use make-string is fairly low, but
of course, it can't be ruled out.


        Stefan



