From: Harald Hanche-Olsen <hanche@math.ntnu.no>
To: monnier@iro.umontreal.ca
Cc: emacs-devel@gnu.org
Subject: Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
Date: Thu, 15 May 2008 08:11:46 +0200 (CEST)
Message-ID: <20080515.081146.48179041.hanche@math.ntnu.no>
In-Reply-To: <jwvskwk74np.fsf-monnier+emacs@gnu.org>

+ Stefan Monnier <monnier@iro.umontreal.ca>:

> I just want to see more
> examples to better understand the context and try to figure out what's
> the right way to fix the problem.  Notice that in your example,
> 
>    (setq foo (make-string 4 ?a))
>    (aset foo 1 ?å)
>    (aset foo 1 ?€) ; => Error: args out of range
> 
> the problem comes from the fact that now that we use Unicode, ?å = 229.
> So this integer is also the code of a byte, which is why the first aset
> succeeds.

Right. Or perhaps more accurately, it is why the first aset succeeds
without automagically converting foo to a multibyte string.

> Maybe the better answer is for `make-string' to always create
> multibyte strings, just like `string' now does.

Hmm. Except it doesn't, quite:

(multibyte-string-p (string ?a ?b ?c ?d)) => nil
(multibyte-string-p (string ?a ?b ?c ?å)) => t

It seems to be the presence of non-ASCII that triggers the creation of
a multibyte string, even though in this case a unibyte string could
also hold the result. In fact, the current behaviours of string and
make-string are quite similar:

(multibyte-string-p (make-string 3 ?a)) => nil
(multibyte-string-p (make-string 3 ?å)) => t

> In any case if you stay far away from `aset on strings' your life will
> be generally better, the birds will sing and the sun will shine.

8) I am willing to believe that.

> >   The most basic way to alter the contents of an existing string is with
> >   `aset' (*note Array Functions::).  `(aset STRING IDX CHAR)' stores CHAR
> >   into STRING at index IDX.  Each character occupies one or more bytes,
> >   and if CHAR needs a different number of bytes from the character
> >   already present at that index, `aset' signals an error.
> 
> > That last bit actually seems to be outdated: An error is not ALWAYS
> > signaled in the indicated situation, only sometimes.
> 
> I hope the text is correct, if not, please report it as a bug.

Okay. I'll run it past you here first, though, since my understanding
of multibyte strings is still patchy. This succeeds and returns "€a€":

(let ((str (make-string 3 ?€)))
  (aset str 1 ?a)
  str)

If I am not mistaken ?€ needs two bytes (or more?) while ?a needs one,
right? And since two (or more) is different from one, the above text
claims that aset signals an error? Or is my understanding wrong? There
is code in aset to shuffle the contents of a multibyte string around
in case of a size mismatch, however:

      if (prev_bytes != new_bytes)
	{
	  /* We must relocate the string data.  */
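
One way to check the byte counts directly, as a rough sketch:
`string-bytes' reports the size of a string's internal representation,
so comparing it before and after the aset shows whether the string data
changed size. The function name below is hypothetical and exists only
for illustration.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-aset-byte-report ()
  "Return (BYTES-BEFORE BYTES-AFTER RESULT) for the three-euro example."
  (let* ((str (make-string 3 ?€))        ; multibyte string of three euro signs
         (before (string-bytes str)))    ; size of the internal representation
    (aset str 1 ?a)                      ; store a one-byte character in the middle
    (list before (string-bytes str) str)))
```

If the two byte counts differ and no error is signaled, then the manual
text quoted above does not describe what aset actually does.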

> > (defun mew-addrstr-parse-syntax-list (str sep addrp &optional depth allow-spc)
> >   (when str
> >     (let* ((i 0) (len (length str))
> > 	   (par-cnt 0) (tmp-cnt 0) (sep-cnt 0)
> > 	   (tmp (mew-make-string len))
> > 	   c ret prevc)
> >       (catch 'max
> > 	(while (< i len)
> > 	  (setq c (aref str i)) ; <= problem occurs here
> > 	  ... deleted ...)))))
> 
> Hmm... I don't see any `aset'.

Rats. Not enough caffeine, too much work. The deleted code is a big
(cond ...), about 80 lines long, that I didn't want to burden the list
with (it performs parsing after all). I assure you that it contains
(aset tmp tmp-cnt c) in multiple places.

It could have achieved the same result by consing up a list of the
characters and using (apply 'string (nreverse char-list)), or perhaps
by appending chars to a temporary buffer, but it didn't.
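
For concreteness, here is a minimal sketch of both alternatives. The
function names and the PRED argument are made up for illustration and
are not part of Mew; the real parser does much more than filter
characters.

```elisp
;; Hypothetical helpers, not part of Mew.

(defun my-filter-chars (str pred)
  "Return a new string of the characters of STR for which PRED is non-nil.
Characters are collected in a list instead of being stored with `aset'."
  (let ((chars nil))
    (dotimes (i (length str))
      (let ((c (aref str i)))
        (when (funcall pred c)
          (push c chars))))
    ;; `string' takes characters as separate arguments, hence `apply'.
    (apply #'string (nreverse chars))))

(defun my-filter-chars-buffer (str pred)
  "Like `my-filter-chars', but accumulate the result in a temporary buffer."
  (with-temp-buffer
    (dotimes (i (length str))
      (let ((c (aref str i)))
        (when (funcall pred c)
          (insert c))))
    (buffer-string)))
```

Either way the result string is built from whatever characters were
collected, so there is no preallocated unibyte string for a later aset
to trip over.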

- Harald



