unofficial mirror of emacs-devel@gnu.org 
* (aset UNIBYTE-STRING MULTIBYTE-CHAR)
@ 2008-02-13  2:36 Kenichi Handa
  2008-02-13  2:49 ` Stefan Monnier
  2008-02-13 22:01 ` Richard Stallman
  0 siblings, 2 replies; 43+ messages in thread
From: Kenichi Handa @ 2008-02-13  2:36 UTC
  To: emacs-devel

Before the Unicode merge, this worked:
  (let ((str "a")) (aset str 0 (decode-char 'ucs #x100)))

In the emacs-unicode-2 branch, there was a discussion about
whether it is right for aset to change the multibyteness of
a string, and I changed the code to signal an error in the
above case.

However, I received reports that this change breaks some
existing Elisp packages.  The code could be changed again
to make the above example work, but doing so causes another
problem in this case:
  (let ((str "a")) (aset str 0 #xC0))

Currently, it changes STR to the unibyte string "\300" (the
same as before the Unicode merge).  But if we allow aset to
change a string's multibyteness, STR should perhaps become
the multibyte string "À" (A-grave) instead, because the
character code of A-grave is #xC0.  That, however, means we
lose an easy way to manipulate raw byte data in a unibyte
string.
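The two readings of (aset str 0 #xC0) can be made concrete with a small C sketch. Both helpers are hypothetical and not taken from Emacs; the second one assumes that the internal multibyte representation coincides with UTF-8 for this character, so A-grave (U+00C0) occupies two bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Reading 1 (hypothetical helper): STR is unibyte, so #xC0 is the raw
   byte 0xC0, which Emacs prints as "\300".  Returns bytes written.  */
static size_t
store_as_raw_byte (unsigned char *buf, unsigned int code)
{
  assert (code <= 0xFF);
  buf[0] = (unsigned char) code;
  return 1;
}

/* Reading 2 (hypothetical helper): #xC0 is the character U+00C0
   (A-grave).  Assuming the internal multibyte form matches UTF-8 here,
   it needs two bytes.  Only U+0080..U+07FF is handled.  */
static size_t
store_as_character (unsigned char *buf, unsigned int code)
{
  assert (code >= 0x80 && code <= 0x7FF);
  buf[0] = (unsigned char) (0xC0 | (code >> 6));   /* 110xxxxx */
  buf[1] = (unsigned char) (0x80 | (code & 0x3F)); /* 10xxxxxx */
  return 2;
}
```

For #xC0 the first helper writes the single byte 0xC0 and the second writes 0xC3 0x80, so the multibyte reading also changes the string's length in bytes.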

What do you think is the right thing to do here?

---
Kenichi Handa
handa@ni.aist.go.jp




* Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
@ 2008-04-15  7:11 Kenichi Handa
  2008-04-15 15:52 ` Stefan Monnier
  0 siblings, 1 reply; 43+ messages in thread
From: Kenichi Handa @ 2008-04-15  7:11 UTC
  To: emacs-devel; +Cc: kazu

The discussion of this problem has been suspended for a long
time.  I'd like to settle it.

I wrote:

> In article <jwvskzrgj6d.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > > That inefficiency may or may not be important in any given context.
> > > Fixing it in casefiddle is definitely desirable.
> > > But is it worth breaking all such packages just so that they
> > > will optimize an operation that might not use much of the time anyway?

> > Why work around the problem in `aset' if it isn't worth fixing in the
> > original code?

> But you wrote:

> > > Then, shouldn't we start the experiment of inhibiting aset
> > > on strings just now?
> > 
> > But I do not think we're ready for that.  Maybe 10 years from now...

> I want to avoid treating non-ASCII chars differently from
> ASCII.  In that case, the only solution is to make aset
> work well for multibyte characters.

The attached simple change does the work.  May I install it?

---
Kenichi Handa
handa@ni.aist.go.jp


*** lisp.h.~1.617.~	2008-04-01 15:12:13.000000000 +0900
--- lisp.h	2008-04-15 15:42:52.000000000 +0900
***************
*** 725,730 ****
--- 725,737 ----
        (STR) = empty_unibyte_string;  \
      else XSTRING (STR)->size_byte = -1; } while (0)
  
+ /* Mark STR as a multibyte string.  The caller must make sure
+    beforehand that STR contains only ASCII characters.  */
+ #define STRING_SET_MULTIBYTE(STR)  \
+   do { if (EQ (STR, empty_unibyte_string))  \
+       (STR) = empty_multibyte_string;  \
+     else XSTRING (STR)->size_byte = XSTRING (STR)->size; } while (0)
+ 
  /* Get text properties.  */
  #define STRING_INTERVALS(STR)  (XSTRING (STR)->intervals + 0)
  

*** data.c.~1.290.~	2008-03-27 20:16:37.000000000 +0900
--- data.c	2008-04-15 15:42:31.000000000 +0900
***************
*** 2093,2099 ****
        CHECK_NUMBER (newelt);
  
        if (XINT (newelt) >= 0 && ! SINGLE_BYTE_CHAR_P (XINT (newelt)))
! 	args_out_of_range (array, newelt);
        SSET (array, idxval, XINT (newelt));
      }
  
--- 2093,2109 ----
        CHECK_NUMBER (newelt);
  
        if (XINT (newelt) >= 0 && ! SINGLE_BYTE_CHAR_P (XINT (newelt)))
! 	{
! 	  int i;
! 
! 	  for (i = SBYTES (array) - 1; i >= 0; i--)
! 	    if (SREF (array, i) >= 0x80)
! 	      args_out_of_range (array, newelt);
! 	  /* ARRAY is an ASCII string.  Convert it to a multibyte
! 	     string, and try `aset' again.  */
! 	  STRING_SET_MULTIBYTE (array);
! 	  return Faset (array, idx, newelt);
! 	}
        SSET (array, idxval, XINT (newelt));
      }
  




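The control flow of the patched unibyte branch can be modelled in a few lines of C. Everything below (struct toy_string, toy_aset_unibyte, the fixed 64-byte buffer) is hypothetical and only loosely mirrors the real string object; it assumes, as the lisp.h hunk's context suggests, that size_byte is -1 for a unibyte string, and that SINGLE_BYTE_CHAR_P means a code below 256.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the string object used in the patch.  */
struct toy_string
{
  long size;                /* length in characters */
  long size_byte;           /* length in bytes, or -1 if unibyte */
  unsigned char data[64];   /* fixed buffer, enough for the sketch */
};

/* Assumed meaning of SINGLE_BYTE_CHAR_P: a code below 256.  */
static bool
toy_single_byte_char_p (long c)
{
  return c >= 0 && c < 0x100;
}

/* Models the patched branch for a unibyte string.  Returns false
   where Emacs would call args_out_of_range.  Returns true after either
   storing the byte or converting the string to multibyte; in the
   patch, Faset then retries, which is not modelled here.  */
static bool
toy_aset_unibyte (struct toy_string *s, long idx, long c)
{
  assert (s->size_byte < 0 && idx >= 0 && idx < s->size);
  if (c >= 0 && ! toy_single_byte_char_p (c))
    {
      for (long i = s->size - 1; i >= 0; i--)
        if (s->data[i] >= 0x80)
          return false;             /* args_out_of_range */
      s->size_byte = s->size;       /* STRING_SET_MULTIBYTE */
      return true;                  /* Faset would retry here */
    }
  s->data[idx] = (unsigned char) c; /* SSET */
  return true;
}
```

The conversion can get away with only setting size_byte to size because an all-ASCII string has exactly the same bytes in unibyte and multibyte form; any byte >= 0x80 would break that equivalence, which is why the patch refuses such strings.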
* Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
@ 2008-05-07 19:31 Harald Hanche-Olsen
  2008-05-14  6:54 ` Harald Hanche-Olsen
  0 siblings, 1 reply; 43+ messages in thread
From: Harald Hanche-Olsen @ 2008-05-07 19:31 UTC
  To: emacs-devel; +Cc: eliz

This works as it should in the latest CVS:

(setq foo (make-string 4 ?a))
(aset foo 1 ?€) ; <= that's a euro sign

But this fails:

(setq foo (make-string 4 ?a))
(aset foo 1 ?å)
(aset foo 1 ?€) ; => Error: args out of range

The problem seems to lie in these lines (2095-2107) from data.c:

      if (XINT (newelt) >= 0 && ! SINGLE_BYTE_CHAR_P (XINT (newelt)))
	{
	  int i;

	  for (i = SBYTES (array) - 1; i >= 0; i--)
	    if (SREF (array, i) >= 0x80)
	      args_out_of_range (array, newelt);
	  /* ARRAY is an ASCII string.  Convert it to a multibyte
	     string, and try `aset' again.  */
	  STRING_SET_MULTIBYTE (array);
	  return Faset (array, idx, newelt);
	}
      SSET (array, idxval, XINT (newelt));

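I am sure the test for bytes >= 0x80 is there for a good reason, but
it clearly screws up this case: ?å is #xE5, a single-byte character,
so the first aset stores it as a raw byte and foo stays unibyte; the
second aset then finds that byte and signals an error.  That makes the
fix rather less useful than it should have been.  I don't know Emacs
internals well enough to suggest a fix.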
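The failure can be replayed outside Emacs. This hypothetical C sketch models foo as a plain 4-byte buffer and reuses only the ASCII-only test from the quoted code; it assumes that ?å (U+00E5, below 256) is stored as the raw byte 0xE5 by the SSET path, so the string stays unibyte but is no longer pure ASCII when ?€ (U+20AC) arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The quoted code's precondition for converting a unibyte string:
   every byte must be ASCII (below 0x80).  */
static bool
all_ascii (const unsigned char *bytes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (bytes[i] >= 0x80)
      return false;
  return true;
}

/* Replays the sequence on a 4-byte unibyte buffer and reports whether
   the final (aset foo 1 ?€) would pass the check.  */
static bool
replay (bool store_aring_first)
{
  unsigned char foo[4];
  memset (foo, 'a', sizeof foo);       /* (make-string 4 ?a) */
  if (store_aring_first)
    foo[1] = 0xE5;                     /* (aset foo 1 ?å): stored as a raw byte */
  return all_ascii (foo, sizeof foo);  /* check made for (aset foo 1 ?€) */
}
```

replay (false) corresponds to the working case and returns true; replay (true) corresponds to the failing case and returns false, which is where Emacs calls args_out_of_range.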

And yes, this did bite in real life: It caused mew to choke on a
malformed spam email. No disaster obviously, but inconvenient.

- Harald

PS. My apologies for messing up threading; I wasn't on the list when
the message I am responding to was posted on 2008-04-15, so I don't
know its message-id.





Thread overview: 43+ messages
2008-02-13  2:36 (aset UNIBYTE-STRING MULTIBYTE-CHAR) Kenichi Handa
2008-02-13  2:49 ` Stefan Monnier
2008-02-13  3:48   ` Kenichi Handa
2008-02-13 15:33     ` Stefan Monnier
2008-02-13 18:06       ` Stephen J. Turnbull
2008-02-13 19:33         ` Stefan Monnier
2008-02-13 22:49         ` Miles Bader
2008-02-14  1:11           ` Stephen J. Turnbull
2008-02-14  1:17             ` Miles Bader
2008-02-14  1:40               ` Stefan Monnier
2008-02-14  1:49                 ` Miles Bader
2008-02-14 18:10                 ` Richard Stallman
2008-02-14 22:40                   ` David Kastrup
2008-02-15  1:08                     ` Stephen J. Turnbull
2008-02-15  1:17                       ` Miles Bader
2008-02-15  7:27                         ` David Kastrup
2008-02-15 12:58                     ` Richard Stallman
2008-02-14 23:37                   ` Leo
2008-02-15 12:59                     ` Richard Stallman
2008-02-14  4:20               ` Stephen J. Turnbull
2008-02-14  4:42         ` Richard Stallman
2008-02-15  1:39       ` Kenichi Handa
2008-02-15  4:27         ` Stefan Monnier
2008-02-15  8:42         ` Eli Zaretskii
2008-02-15  8:53           ` Miles Bader
2008-02-16 12:55             ` Eli Zaretskii
2008-02-16  5:53         ` Richard Stallman
2008-02-16 14:33           ` Stefan Monnier
2008-02-17 20:29             ` Richard Stallman
2008-02-18  1:15               ` Stefan Monnier
2008-02-18  4:00                 ` Kenichi Handa
2008-02-18 17:31                 ` Richard Stallman
2008-02-13 22:01 ` Richard Stallman
2008-02-13 23:13   ` Miles Bader
  -- strict thread matches above, loose matches on Subject: below --
2008-04-15  7:11 Kenichi Handa
2008-04-15 15:52 ` Stefan Monnier
2008-04-17  1:13   ` Kenichi Handa
2008-05-07 19:31 Harald Hanche-Olsen
2008-05-14  6:54 ` Harald Hanche-Olsen
2008-05-14 12:22   ` Stefan Monnier
2008-05-14 12:50     ` Harald Hanche-Olsen
2008-05-15  1:18       ` Stefan Monnier
2008-05-15  6:11         ` Harald Hanche-Olsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
