unofficial mirror of help-gnu-emacs@gnu.org
* Why does using aset sometimes output raw bytes?
@ 2018-12-09 15:16 Stephen Berman
  2018-12-09 15:20 ` Eli Zaretskii
  2018-12-09 17:10 ` Stefan Monnier
  0 siblings, 2 replies; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 15:16 UTC (permalink / raw)
  To: help-gnu-emacs

When I use aset to change characters in a string to certain non-ascii
characters and insert the result into a buffer, the non-ascii characters
are displayed as raw bytes.  This only happens with certain non-ascii
characters, and also only if the string being altered is bound to a
variable and aset takes that variable as argument; if aset operates
directly on the string, those same non-ascii characters are inserted as
the expected characters.  To reproduce, start emacs with -Q and evaluate
the following sexp:

(let ((s0 "aous")
      (s1 "äöüß")
      (s2 "sdfg")
      (s3 "ſðđŋ"))
  (dolist (s `((,s0 . ,s1) (,s2 . ,s3)))
    (dotimes (i 4)
      (aset (car s) i (aref (cdr s) i))))
  (insert s0 s2 "\n")
  (dotimes (i 4)
    (insert (aset "aous" i (aref "äöüß" i))))
  (dotimes (i 4)
    (insert (aset "sdfg" i (aref "ſðđŋ" i)))))

Here's what gets inserted into the buffer (I've represented the raw
bytes by ascii strings to make sure they're readable here):

\344\366\374\337ſðđŋ
äöüßſðđŋ

Is this expected, and if so, what's the explanation, i.e., why does this
happen with some non-ascii characters (e.g. äöüß) but not with others
(e.g. ſðđŋ) and why does it happen when aset gets passed a variable
for the string but not when it gets passed the string itself?

Steve Berman





* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 15:16 Why does using aset sometimes output raw bytes? Stephen Berman
@ 2018-12-09 15:20 ` Eli Zaretskii
  2018-12-09 15:46   ` Stephen Berman
  2018-12-09 17:10 ` Stefan Monnier
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-09 15:20 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stephen Berman <stephen.berman@gmx.net>
> Date: Sun, 09 Dec 2018 16:16:15 +0100
> 
> When I use aset to change characters in a string to certain non-ascii
> characters and insert the result into a buffer, the non-ascii characters
> are displayed as raw bytes.  This only happens with certain non-ascii
> characters, and also only if the string being altered is bound to a
> variable and aset takes that variable as argument; if aset operates
> directly on the string, those same non-ascii characters are inserted as
> the expected characters.  To reproduce, start emacs with -Q and evaluate
> the following sexp:
> 
> (let ((s0 "aous")
>       (s1 "äöüß")
>       (s2 "sdfg")
>       (s3 "ſðđŋ"))
>   (dolist (s `((,s0 . ,s1) (,s2 . ,s3)))
>     (dotimes (i 4)
>       (aset (car s) i (aref (cdr s) i))))
>   (insert s0 s2 "\n")
>   (dotimes (i 4)
>     (insert (aset "aous" i (aref "äöüß" i))))
>   (dotimes (i 4)
>     (insert (aset "sdfg" i (aref "ſðđŋ" i)))))
> 
> Here's what gets inserted into the buffer (I've represented the raw
> bytes by ascii strings to make sure they're readable here):
> 
> \344\366\374\337ſðđŋ
> äöüßſðđŋ
> 
> Is this expected, and if so, what's the explanation, i.e., why does this
> happen with some non-ascii characters (e.g. äöüß) but not with others
> (e.g ſðđŋ) and why does it happen when aset gets passed a variable
> for the string but not when it gets passed the string itself?

s0 and s2 originally include only pure ASCII characters, so they are
unibyte strings.  Try making them multibyte before using aset.
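
A minimal sketch of that suggestion, assuming the strings have equal
length.  The helper name `my-overwrite-chars` is hypothetical (it is not
part of Emacs or of the code quoted above); `string-to-multibyte` and
`copy-sequence` are standard functions.

```elisp
;; Hypothetical helper: overwrite the characters of a multibyte copy
;; of DST with those of SRC, leaving DST itself untouched.
(defun my-overwrite-chars (dst src)
  "Return a multibyte copy of DST whose characters are replaced by SRC's.
Assumes DST and SRC have the same length."
  ;; Copy first so the caller's string is never modified, then make
  ;; sure the copy is multibyte before any `aset'.
  (let ((s (string-to-multibyte (copy-sequence dst))))
    (dotimes (i (length s))
      (aset s i (aref src i)))
    s))

(my-overwrite-chars "aous" "äöüß")
```

With the copy made multibyte up front, the result should be a multibyte
string holding the characters rather than raw bytes.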




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 15:20 ` Eli Zaretskii
@ 2018-12-09 15:46   ` Stephen Berman
  2018-12-09 15:56     ` Stephen Berman
  2018-12-09 17:12     ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 15:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 17:20:13 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> Here's what gets inserted into the buffer (I've represented the raw
>> bytes by ascii strings to make sure they're readable here):
>> 
>> \344\366\374\337ſðđŋ
>> äöüßſðđŋ
>> 
>> Is this expected, and if so, what's the explanation, i.e., why does this
>> happen with some non-ascii characters (e.g. äöüß) but not with others
>> (e.g ſðđŋ) and why does it happen when aset gets passed a variable
>> for the string but not when it gets passed the string itself?
>
> s0 and s2 originally include only pure ASCII characters, so they are
> unibyte strings.  Try making them multibyte before using aset.

Thanks, that works.  But why are raw bytes inserted only with some
multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?  Also, is
there some way to ensure a string is handled as multibyte if it's not
known what characters it contains?  E.g., s0 in my example sexp could be
bound to some string by a function call and before applying the function
it is not known if the string is multibyte; is there some way in Lisp to
say "treat the value of s0 as multibyte (regardless of what characters
it contains)"?

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 15:46   ` Stephen Berman
@ 2018-12-09 15:56     ` Stephen Berman
  2018-12-09 17:12     ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 15:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 16:46:01 +0100 Stephen Berman <stephen.berman@gmx.net> wrote:

> On Sun, 09 Dec 2018 17:20:13 +0200 Eli Zaretskii <eliz@gnu.org> wrote:
>
>>> Here's what gets inserted into the buffer (I've represented the raw
>>> bytes by ascii strings to make sure they're readable here):
>>> 
>>> \344\366\374\337ſðđŋ
>>> äöüßſðđŋ
>>> 
>>> Is this expected, and if so, what's the explanation, i.e., why does this
>>> happen with some non-ascii characters (e.g. äöüß) but not with others
>>> (e.g ſðđŋ) and why does it happen when aset gets passed a variable
>>> for the string but not when it gets passed the string itself?
>>
>> s0 and s2 originally include only pure ASCII characters, so they are
>> unibyte strings.  Try making them multibyte before using aset.
>
> Thanks, that works.  But why are raw bytes inserted only with some
> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?  Also, is
> there some way to ensure a string is handled as multibyte if it's not
> known what characters it contains?  E.g., s0 in my example sexp could be
> bound to some string by a function call and before applying the function
> it is not known if the string is multibyte; is there some way in Lisp to
> say "treat the value of s0 as multibyte (regardless of what characters
> it contains)"?

Also, "aous" is pure ASCII too, so why don't raw bytes get inserted with
(insert (aset "aous" i (aref "äöüß" i)))?

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 15:16 Why does using aset sometimes output raw bytes? Stephen Berman
  2018-12-09 15:20 ` Eli Zaretskii
@ 2018-12-09 17:10 ` Stefan Monnier
  2018-12-09 17:20   ` Stephen Berman
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Monnier @ 2018-12-09 17:10 UTC (permalink / raw)
  To: help-gnu-emacs

> When I use aset to change characters in a string

I recommend you don't do that unless it's *really* indispensable.

> (insert (aset "aous" i (aref "äöüß" i)))

I also recommend you don't use the return value of `aset`.


        Stefan





* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 15:46   ` Stephen Berman
  2018-12-09 15:56     ` Stephen Berman
@ 2018-12-09 17:12     ` Eli Zaretskii
  2018-12-09 17:32       ` Stephen Berman
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-09 17:12 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 09 Dec 2018 16:46:01 +0100
> 
> > s0 and s2 originally include only pure ASCII characters, so they are
> > unibyte strings.  Try making them multibyte before using aset.
> 
> Thanks, that works.  But why are raw bytes inserted only with some
> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?

Because ſ doesn't fit in a single byte.  When you insert it, the
entire string is made multibyte, and then the other characters are
inserted into a multibyte string.

> Also, is there some way to ensure a string is handled as multibyte
> if it's not known what characters it contains?  E.g., s0 in my
> example sexp could be bound to some string by a function call and
> before applying the function it is not known if the string is
> multibyte;

You should generally keep away from such situations, but you haven't
said enough about what you are trying to accomplish for me to give more
practical advice.

To answer your question: you can test whether a string is multibyte
with multibyte-string-p, and you can make it multibyte if not.  The
only problematic situation is when a unibyte string includes non-ASCII
bytes; what is TRT in that situation depends on the situation.
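
As a rough sketch of that test-and-convert step: the helper
`my-ensure-multibyte` and its optional CODING argument are hypothetical
names chosen for illustration.  `multibyte-string-p`,
`string-to-multibyte` and `decode-coding-string` are standard functions.
The two branches show why non-ASCII bytes are the problematic case: the
caller has to decide what those bytes mean.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-ensure-multibyte (str &optional coding)
  "Return STR as a multibyte string.
With CODING, non-ASCII bytes of a unibyte STR are decoded using that
coding system; without it they are kept as raw-byte characters."
  (cond ((multibyte-string-p str) str)
        (coding (string-to-multibyte (decode-coding-string str coding)))
        (t (string-to-multibyte str))))

;; ASCII-only input: nothing to decide.
(multibyte-string-p (my-ensure-multibyte "aous"))
;; The byte E4 hex decoded as Latin-1 is the character ä.
(aref (my-ensure-multibyte "\344" 'latin-1) 0)
;; Without a coding system it stays a raw byte, which is a different
;; character from ä.
(aref (my-ensure-multibyte "\344") 0)
```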

> is there some way in Lisp to say "treat the value of s0 as multibyte
> (regardless of what characters it contains)"?

Not that I know of, no.  And I don't really understand how such a
thing could exist: how do you "treat as multibyte" an arbitrary byte
that is beyond 127 decimal?

> Also "aous" is also pure ASCII, so why don't raw bytes get inserted with
> (insert (aset "aous" i (aref "äöüß" i)))?

This inserts characters one by one into the current buffer, and the
buffer is multibyte, so Emacs does the conversion.  IOW, you don't
insert the string, you insert individual characters which aset
returns.
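
A small illustration of the difference, using a temporary (multibyte)
buffer.  The function name `my-aset-return-demo` is made up for this
sketch; it relies only on `aset` returning the element it stored.

```elisp
;; Hypothetical demo function, not from the original code.
(defun my-aset-return-demo ()
  "Return (RET TEXT MULTIBYTE-P) for one `aset' on an ASCII string.
RET is what `aset' returned, TEXT is the buffer text after inserting
RET, and MULTIBYTE-P says whether the modified string is multibyte."
  (let ((s (copy-sequence "aous")))
    (with-temp-buffer
      (let ((ret (aset s 0 ?ä)))
        ;; RET is a character (an integer), so this inserts the
        ;; character itself, not the string S.
        (insert ret)
        (list ret (buffer-string) (multibyte-string-p s))))))

(my-aset-return-demo)
```

The buffer receives the character ä, while the string `s` itself is
still unibyte and holds the byte E4 hex.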




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 17:10 ` Stefan Monnier
@ 2018-12-09 17:20   ` Stephen Berman
  2018-12-09 19:20     ` Stefan Monnier
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 17:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 12:10:00 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> When I use aset to change characters in a string
>
> I recommend you don't do that unless it's *really* indispensable.
>
>> (insert (aset "aous" i (aref "äöüß" i)))
>
> I also recommend you don't use the return value of `aset`.

I don't have a use case where using aset like this is indispensable; I
was just experimenting.  Are your reservations because the
implementation of aset is brittle, leading to things like the
observations I reported -- maybe too hard to fix and not worth the
trouble?  Or are there other reasons not to use aset as above?

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 17:12     ` Eli Zaretskii
@ 2018-12-09 17:32       ` Stephen Berman
  2018-12-09 17:47         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 17:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 19:12:32 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: help-gnu-emacs@gnu.org
>> Date: Sun, 09 Dec 2018 16:46:01 +0100
>> 
>> > s0 and s2 originally include only pure ASCII characters, so they are
>> > unibyte strings.  Try making them multibyte before using aset.
>> 
>> Thanks, that works.  But why are raw bytes inserted only with some
>> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?
>
> Because ſ doesn't fit in a single byte, so when you insert it, the
> entire string is made multibyte, and then the other characters are
> inserted into a multibyte string.

This seems to imply that ä, ö, ü and ß do fit in a single byte?  Yet
(multibyte-string-p "äöüß") returns t.  So I still don't understand.

>> Also, is there some way to ensure a string is handled as multibyte
>> if it's not known what characters it contains?  E.g., s0 in my
>> example sexp could be bound to some string by a function call and
>> before applying the function it is not known if the string is
>> multibyte;
>
> You should generally keep away of such situations, but you don't tell
> enough about what you are trying to accomplish to give more practical
> advice.

Nothing serious, just some experimenting.

> To answer your question: you can test whether a string is multibyte
> with multibyte-string-p, and you can make it multibyte if not.  The
> only problematic situation is when a unibyte string includes non-ASCII
> bytes; what is TRT in that situation depends on the situation.
>
>> is there some way in Lisp to say "treat the value of s0 as multibyte
>> (regardless of what characters it contains)"?
>
> Not that I know of, no.  And I don't really understand how could such
> a thing exist: how do you "treat as multibyte" an arbitrary byte that
> is beyond 127 decimal?

Actually, for the code I was experimenting with, it seems to suffice to
use (make-string len 128) as the input to aset (before, I had used
(make-string len 32), which led to raw bytes being displayed).
>
>> Also "aous" is also pure ASCII, so why don't raw bytes get inserted with
>> (insert (aset "aous" i (aref "äöüß" i)))?
>
> This inserts characters one by one into the current buffer, and the
> buffer is multibyte, so Emacs does the conversion.  IOW, you don't
> insert the string, you insert individual characters which aset
> returns.

Ah, this makes sense.  Thanks.

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 17:32       ` Stephen Berman
@ 2018-12-09 17:47         ` Eli Zaretskii
  2018-12-09 18:50           ` Stephen Berman
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-09 17:47 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 09 Dec 2018 18:32:26 +0100
> 
> >> why are raw bytes inserted only with some
> >> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?
> >
> > Because ſ doesn't fit in a single byte, so when you insert it, the
> > entire string is made multibyte, and then the other characters are
> > inserted into a multibyte string.
> 
> This seems to imply that ä, ö, ü and ß do fit in a single byte?  Yet
> (multibyte-string-p "äöüß") returns t.  So I still don't understand.

Look at the codepoints: the above are all less than FF hex, so they
can fit in a single byte.  By contrast, ſ is 17F hex, more than a
single byte can hold.  So inserting ſ into a unibyte string _must_
first make that string multibyte, whereas inserting ä etc. can leave
it unibyte.
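
The codepoint rule can be checked with a sketch like the following.  The
name `my-multibyte-after-aset-p` is hypothetical, and the results in the
comments are what the explanation above predicts; other Emacs versions
may behave differently.

```elisp
;; Hypothetical helper: store CHAR into a fresh ASCII-only (unibyte)
;; string and report whether the string became multibyte.
(defun my-multibyte-after-aset-p (char)
  "Return non-nil if storing CHAR with `aset' makes an ASCII string multibyte."
  (let ((s (copy-sequence "aous")))
    (aset s 0 char)
    (multibyte-string-p s)))

(my-multibyte-after-aset-p ?ä)   ; E4 hex fits in a byte: nil expected
(my-multibyte-after-aset-p ?ſ)   ; 17F hex does not: t expected
```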

Why (multibyte-string-p "äöüß") returns t is an unrelated issue: it
has to do with how the Lisp reader reads the string.  The result is a
multibyte string, where ä is represented by its UTF-8 sequence and not
by its single-byte codepoint E4 hex.  If you want a unibyte string
with these bytes, write "\344\366\374\337" instead; for that string
multibyte-string-p returns nil.
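
To see the two representations side by side, something like this can be
evaluated.  `my-string-info` is a hypothetical helper; `string-bytes`
and `multibyte-string-p` are standard functions.

```elisp
;; Hypothetical helper for inspecting how a string is represented.
(defun my-string-info (s)
  "Return (MULTIBYTE-P CHARS BYTES) for the string S."
  (list (multibyte-string-p s) (length s) (string-bytes s)))

;; Literal characters: the reader produces a multibyte string in which
;; each of these characters takes two bytes.
(my-string-info "äöüß")
;; Octal escapes only: the reader produces a unibyte string of four bytes.
(my-string-info "\344\366\374\337")
```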

> >> is there some way in Lisp to say "treat the value of s0 as multibyte
> >> (regardless of what characters it contains)"?
> >
> > Not that I know of, no.  And I don't really understand how could such
> > a thing exist: how do you "treat as multibyte" an arbitrary byte that
> > is beyond 127 decimal?
> 
> Actually, for the code I was experimenting with, it seems to suffice to
> use (make-string len 128) as the input to aset (before, I had used
> (make-string len 32), which led to raw bytes being displayed).

Not sure I understand what you mean by "suffice".  Feel free to ask
questions if there are some left.




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 17:47         ` Eli Zaretskii
@ 2018-12-09 18:50           ` Stephen Berman
  2018-12-09 18:55             ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 19:47:03 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: help-gnu-emacs@gnu.org
>> Date: Sun, 09 Dec 2018 18:32:26 +0100
>> 
>> >> why are raw bytes inserted only with some
>> >> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?
>> >
>> > Because ſ doesn't fit in a single byte, so when you insert it, the
>> > entire string is made multibyte, and then the other characters are
>> > inserted into a multibyte string.
>> 
>> This seems to imply that ä, ö, ü and ß do fit in a single byte?  Yet
>> (multibyte-string-p "äöüß") returns t.  So I still don't understand.
>
> Look at the codepoints: the above are all less than FF hex, so they
> can fit in a single byte.  By contrast, ſ is 17F hex, more than a
> single byte can hold.  So inserting ſ into a unibyte string _must_
> first make that string multibyte, whereas inserting ä etc. can leave
> it unibyte.
>
> Why (multibyte-string-p "äöüß") returns t is an unrelated issue: it
> has to do with how the Lisp reader reads the string.  The result is a
> multibyte string, where ä is represented by its UTF-8 sequence and not
> by its single-byte codepoint E4 hex.  If you want a unibyte string
> with these bytes, use (multibyte-string-p "\344\366\374\337") instead.

Thanks for the very clear and enlightening explanations; I feel I
understand this better now.

>> >> is there some way in Lisp to say "treat the value of s0 as multibyte
>> >> (regardless of what characters it contains)"?
>> >
>> > Not that I know of, no.  And I don't really understand how could such
>> > a thing exist: how do you "treat as multibyte" an arbitrary byte that
>> > is beyond 127 decimal?
>> 
>> Actually, for the code I was experimenting with, it seems to suffice to
>> use (make-string len 128) as the input to aset (before, I had used
>> (make-string len 32), which led to raw bytes being displayed).
>
> Not sure I understand what you mean by "suffice".  Feel free to ask
> questions if there are some left.

I was experimenting with aset to make random permutations of a string
and didn't understand why there were sometimes raw bytes in the result
(which also led to args-out-of-range errors), but using (make-string len
128) as the container for the permutations prevents that.  And with your
above explanations I now think I understand why.

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 18:50           ` Stephen Berman
@ 2018-12-09 18:55             ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-09 18:55 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 09 Dec 2018 19:50:08 +0100
> 
> I was experimenting with aset to make random permutations of a string
> and didn't understand why there were sometimes raw bytes in the result
> (which also led to args-out-of-range errors), but using (make-string len
> 128) as the container for the permutations prevents that.  And with your
> above explanations I now think I understand why.

Yes, you need to start with a multibyte string to do this kind of
thing safely.
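
A sketch of the container idea, using string reversal as a stand-in for
a permutation.  `my-reverse-string` is a hypothetical name; the only
point taken from the discussion is that (make-string len 128) yields a
multibyte string, because character 128 is not ASCII.

```elisp
;; Hypothetical helper: reverse SRC by filling a multibyte container.
(defun my-reverse-string (src)
  "Return a new multibyte string with the characters of SRC reversed."
  (let* ((len (length src))
         ;; Character 128 is non-ASCII, so the container starts out
         ;; multibyte and `aset' stores characters, not bytes.
         (out (make-string len 128)))
    (dotimes (i len)
      (aset out i (aref src (- len 1 i))))
    out))

(my-reverse-string "äöüß")
```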




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 17:20   ` Stephen Berman
@ 2018-12-09 19:20     ` Stefan Monnier
  2018-12-09 20:23       ` Eli Zaretskii
  2018-12-09 20:43       ` Stephen Berman
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Monnier @ 2018-12-09 19:20 UTC (permalink / raw)
  To: help-gnu-emacs

> I don't have a use case where using aset like this is indispensable, I
> was just experimenting.  Are your reservations because the
> implementation of aset is brittle, leading to things like the
> observations I reported -- maybe too hard to fix and not worth the
> trouble?

It's not the implementation, but the semantics of unibyte/multibyte
strings presumes that the difference doesn't matter much for ASCII-only
strings, which is mostly true but isn't true in the case of `aset`.

Also you probably expect `aset` to be constant-time, but on multibyte
strings it can take time O(N) where N is the length of the string:
Emacs's multibyte strings are designed for sequential access rather than
random access, and since chars can take a variable amount of space,
replacing one with another can require shifting things around and
allocating a new chunk of memory.
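
One way to sidestep both issues, sketched here under the assumption that
the goal is a random permutation of a string: do the element swaps on a
vector of characters, where `aset` does not have to shift anything, and
build the string once at the end.  `my-shuffle-string` is a hypothetical
name; `vconcat`, `concat` and `random` are standard functions.

```elisp
;; Hypothetical helper: Fisher-Yates shuffle on a vector of characters.
(defun my-shuffle-string (str)
  "Return a random permutation of the characters of STR as a new string."
  (let* ((v (vconcat str))          ; vector of character codes
         (i (length v)))
    (while (> i 1)
      ;; Swap the last unshuffled element with a randomly chosen one
      ;; among the first I elements.
      (let* ((j (random i))
             (tmp (aref v (1- i))))
        (aset v (1- i) (aref v j))
        (aset v j tmp))
      (setq i (1- i)))
    ;; Build the result string in one step.
    (concat v)))

(my-shuffle-string "äöüßſðđŋ")
```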

> Or are there other reasons not to use aset as above?

In most cases `aset` results in more complex and more brittle code when
working on strings.  That's not always the case, and the code without
`aset` is occasionally a lot worse, admittedly, but as a first rule
I strongly recommend staying away from it.

You'll also gain karma points along the way,


        Stefan





* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 19:20     ` Stefan Monnier
@ 2018-12-09 20:23       ` Eli Zaretskii
  2018-12-09 21:20         ` Stefan Monnier
  2018-12-09 20:43       ` Stephen Berman
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-09 20:23 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 09 Dec 2018 14:20:07 -0500
> 
> It's not the implementation, but the semantics of unibyte/multibyte
> strings presumes that the difference doesn't matter much for ASCII-only
> strings, which is mostly true but isn't true in the case of `aset`.

The same is true about concat, btw.




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 19:20     ` Stefan Monnier
  2018-12-09 20:23       ` Eli Zaretskii
@ 2018-12-09 20:43       ` Stephen Berman
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Berman @ 2018-12-09 20:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs

On Sun, 09 Dec 2018 14:20:07 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> I don't have a use case where using aset like this is indispensable, I
>> was just experimenting.  Are your reservations because the
>> implementation of aset is brittle, leading to things like the
>> observations I reported -- maybe too hard to fix and not worth the
>> trouble?
>
> It's not the implementation, but the semantics of unibyte/multibyte
> strings presumes that the difference doesn't matter much for ASCII-only
> strings, which is mostly true but isn't true in the case of `aset`.

Yes, thanks; I also appreciate this better now after Eli's explanations.

> Also you probably expect `aset` to be constant-time, but on multibyte
> strings it can take time O(N) where N is the length of the string:
> Emacs's multibyte strings are designed for sequential access rather than
> random access, and since chars can take a variable amount of space,
> replacing one with another can require shifting things around and
> allocating a new chunk of memory.

Interesting.  I was in fact wondering about just such issues because of
code posted here that permutes strings using split-string and sort,
which prompted me to try some alternatives, one of which was to use a
while-loop instead of sort and another was using a loop and aset instead
of split-string.  I guess this is well-explored and I could probably do
a web search for the most efficient algorithm, but I really just wanted
to see what I could come up with in Emacs Lisp and so bumped into these
multibyte issues.  So it's already been a useful learning experience.

>> Or are there other reasons not to use aset as above?
>
> In most cases `aset` results in more complex and more brittle code when
> working on strings.  It's not always the case and the code without
> `aset` occasionally is a lot worse, admittedly, but as a first rule,
> I strongly recommend to stay away with it.
>
> You'll also gain karma points along the way,

Thanks for the feedback.

Steve Berman




* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 20:23       ` Eli Zaretskii
@ 2018-12-09 21:20         ` Stefan Monnier
  2018-12-10  5:59           ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Monnier @ 2018-12-09 21:20 UTC (permalink / raw)
  To: help-gnu-emacs

>> It's not the implementation, but the semantics of unibyte/multibyte
>> strings presumes that the difference doesn't matter much for ASCII-only
>> strings, which is mostly true but isn't true in the case of `aset`.
> The same is true about concat, btw.

I think it's less severe (`aset` can end up changing (by side-effect)
a unibyte string to multibyte, i.e. changing the nature of the object,
which I believe is the only time we do something like that), but yes
similar problems appear elsewhere (hence the "mostly" above).
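
A sketch of that distinction; the function name `my-aset-vs-concat` is
made up for illustration.  With `aset` the change is visible through
every reference to the same string object, whereas `concat` returns a
new string and leaves its arguments alone.

```elisp
;; Hypothetical demo function.
(defun my-aset-vs-concat ()
  "Return (ALIAS-MULTIBYTE-P B-MULTIBYTE-P JOINED-MULTIBYTE-P)."
  (let* ((a (copy-sequence "abcd"))
         (alias a)                    ; second reference to the same object
         (b (copy-sequence "abcd"))
         (joined (concat b "ſ")))     ; new string; B is not modified
    (aset a 0 ?ſ)                     ; changes the object A in place
    (list (multibyte-string-p alias)
          (multibyte-string-p b)
          (multibyte-string-p joined))))

(my-aset-vs-concat)
```

The first element is non-nil because `alias` and `a` are the same
object; the second stays nil because `concat` only produced a new
multibyte string.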


        Stefan





* Re: Why does using aset sometimes output raw bytes?
  2018-12-09 21:20         ` Stefan Monnier
@ 2018-12-10  5:59           ` Eli Zaretskii
  2018-12-10 13:56             ` Stefan Monnier
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2018-12-10  5:59 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 09 Dec 2018 16:20:59 -0500
> 
> >> It's not the implementation, but the semantics of unibyte/multibyte
> >> strings presumes that the difference doesn't matter much for ASCII-only
> >> strings, which is mostly true but isn't true in the case of `aset`.
> > The same is true about concat, btw.
> 
> I think it's less severe (`aset` can end up changing (by side-effect)
> a unibyte string to multibyte, i.e. changing the nature of the object,
> which I believe is the only time we do something like that), but yes
> similar problems appear elsewhere (hence the "mostly" above).

I'm not sure the "by side effect" part is an important distinction for
users; they might be surprised anyway.  For example:

  (let ((s1 "abcd")
        (s2 "абвг"))
    (message "s1: %s concat: %s"
             (multibyte-string-p s1)
             (multibyte-string-p (concat s1 s2))))
    => s1: nil concat: t

Some will say that this "converts" a unibyte string s1 to a multibyte
one just because it was concatenated.

People should always keep these gotchas in mind when working with
strings.




* Re: Why does using aset sometimes output raw bytes?
  2018-12-10  5:59           ` Eli Zaretskii
@ 2018-12-10 13:56             ` Stefan Monnier
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Monnier @ 2018-12-10 13:56 UTC (permalink / raw)
  To: help-gnu-emacs

> I'm not sure the "by side effect" part is an important distinction for
> users,

From a language semantics point of view, it is a major difference.
In practice for users it's probably a minor detail, admittedly: code
that bumps into this corner case will likely suffer from so many other
corner case problems that this one may end up drowned in the noise.


        Stefan



