From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
Date: Sat, 16 Feb 2008 09:33:24 -0500
To: rms@gnu.org
Cc: emacs-devel@gnu.org, Kenichi Handa
In-Reply-To: (Richard Stallman's message of "Sat, 16 Feb 2008 00:53:08 -0500")

> If we are going to inhibit aset on multibyte strings, I think we
> should inhibit aset on any strings to avoid a further confusion.

> I think someone should try making it work.
> The way I suggested should not be terribly hard.

The problem is the following: while it can be made to work, it will
be inefficient.  If we just make it work, the callers will never get to
know that they're doing things in a terribly inefficient way.

The real fix is to change the caller.

BTW, I suggest the patch below to fix one such caller.


        Stefan


--- orig/src/casefiddle.c
+++ mod/src/casefiddle.c
@@ -75,23 +76,18 @@
       return obj;
     }
 
-  if (STRINGP (obj))
+  if (!STRINGP (obj))
+    wrong_type_argument (Qchar_or_string_p, obj);
+  else if (STRING_UNIBYTE (obj))
     {
-      int multibyte = STRING_MULTIBYTE (obj);
-      int i, i_byte, len;
-      int size = SCHARS (obj);
+      EMACS_INT i;
+      EMACS_INT size = SCHARS (obj);
 
       obj = Fcopy_sequence (obj);
-      for (i = i_byte = 0; i < size; i++, i_byte += len)
+      for (i = 0; i < size; i++)
         {
-          if (multibyte)
-            c = STRING_CHAR_AND_LENGTH (SDATA (obj) + i_byte, 0, len);
-          else
-            {
-              c = SREF (obj, i_byte);
-              len = 1;
-              MAKE_CHAR_MULTIBYTE (c);
-            }
+          c = SREF (obj, i);
+          MAKE_CHAR_MULTIBYTE (c);
           c1 = c;
           if (inword && flag != CASE_CAPITALIZE_UP)
             c = DOWNCASE (c);
@@ -102,24 +98,51 @@
             inword = (SYNTAX (c) == Sword);
           if (c != c1)
             {
-              if (! multibyte)
-                {
-                  MAKE_CHAR_UNIBYTE (c);
-                  SSET (obj, i_byte, c);
-                }
-              else if (ASCII_CHAR_P (c1) && ASCII_CHAR_P (c))
-                SSET (obj, i_byte, c);
-              else
-                {
-                  Faset (obj, make_number (i), make_number (c));
-                  i_byte += CHAR_BYTES (c) - len;
-                }
+              MAKE_CHAR_UNIBYTE (c);
+              if (c < 0 || c > 255)
+                error ("Non-unibyte char in unibyte string");
+              SSET (obj, i, c);
             }
         }
       return obj;
     }
+  else
+    {
+      EMACS_INT i, i_byte, len;
+      EMACS_INT size = SCHARS (obj);
+      USE_SAFE_ALLOCA;
+      unsigned char *dst, *o;
+      /* Over-allocate by 12%: this is a minor overhead, but should be
+         sufficient in 99.999% of the cases to avoid a reallocation.  */
+      EMACS_INT o_size = SBYTES (obj) + SBYTES (obj) / 8 + MAX_MULTIBYTE_LENGTH;
+      SAFE_ALLOCA (dst, void *, o_size);
+      o = dst;
 
-  wrong_type_argument (Qchar_or_string_p, obj);
+      for (i = i_byte = 0; i < size; i++, i_byte += len)
+        {
+          if ((o - dst) + MAX_MULTIBYTE_LENGTH > o_size)
+            { /* Not enough space for the next char: grow the destination.  */
+              unsigned char *old_dst = dst;
+              o_size += o_size; /* Probably overkill, but extremely rare.  */
+              SAFE_ALLOCA (dst, void *, o_size);
+              bcopy (old_dst, dst, o - old_dst);
+              o = dst + (o - old_dst);
+            }
+          c = STRING_CHAR_AND_LENGTH (SDATA (obj) + i_byte, 0, len);
+          if (inword && flag != CASE_CAPITALIZE_UP)
+            c = DOWNCASE (c);
+          else if (!UPPERCASEP (c)
+                   && (!inword || flag != CASE_CAPITALIZE_UP))
+            c = UPCASE1 (c);
+          if ((int) flag >= (int) CASE_CAPITALIZE)
+            inword = (SYNTAX (c) == Sword);
+          o += CHAR_STRING (c, o);
+        }
+      eassert (o - dst <= o_size);
+      obj = make_multibyte_string (dst, size, o - dst);
+      SAFE_FREE ();
+      return obj;
+    }
 }
 
 DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
@@ -329,10 +352,10 @@
   return Qnil;
 }
 
-Lisp_Object
+static Lisp_Object
 operate_on_word (arg, newpoint)
      Lisp_Object arg;
-     int *newpoint;
+     EMACS_INT *newpoint;
 {
   Lisp_Object val;
   int farend;
@@ -358,7 +381,7 @@
      Lisp_Object arg;
 {
   Lisp_Object beg, end;
-  int newpoint;
+  EMACS_INT newpoint;
   XSETFASTINT (beg, PT);
   end = operate_on_word (arg, &newpoint);
   casify_region (CASE_UP, beg, end);
@@ -373,7 +396,7 @@
      Lisp_Object arg;
 {
   Lisp_Object beg, end;
-  int newpoint;
+  EMACS_INT newpoint;
   XSETFASTINT (beg, PT);
   end = operate_on_word (arg, &newpoint);
   casify_region (CASE_DOWN, beg, end);
@@ -390,7 +413,7 @@
      Lisp_Object arg;
 {
   Lisp_Object beg, end;
-  int newpoint;
+  EMACS_INT newpoint;
   XSETFASTINT (beg, PT);
   end = operate_on_word (arg, &newpoint);
   casify_region (CASE_CAPITALIZE, beg, end);
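
For readers without the Emacs sources at hand, the idea behind the
multibyte branch of the patch can be sketched in standalone C.  This is
only an illustration under stated assumptions, not Emacs code: it works
on well-formed UTF-8 rather than the Emacs internal encoding, it uses
malloc/realloc instead of SAFE_ALLOCA, and the names decode_char,
encode_char, toy_upcase and upcase_copy, as well as the three-entry case
table, are hypothetical.  The point it tries to show is that a store of a
character with a different byte length presumably forces the tail of the
string to be moved each time, whereas writing into a slightly
over-allocated destination touches every byte once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHAR_BYTES 4 /* longest UTF-8 sequence this sketch emits */

/* Decode the UTF-8 character at P (assumed well-formed) and store its
   byte length in *LEN.  */
static uint32_t
decode_char (const unsigned char *p, size_t *len)
{
  if (p[0] < 0x80) { *len = 1; return p[0]; }
  if (p[0] < 0xE0) { *len = 2; return ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu); }
  if (p[0] < 0xF0)
    {
      *len = 3;
      return ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
    }
  *len = 4;
  return ((p[0] & 0x07u) << 18) | ((p[1] & 0x3Fu) << 12)
         | ((p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu);
}

/* Encode C as UTF-8 at O and return the number of bytes written.  */
static size_t
encode_char (uint32_t c, unsigned char *o)
{
  if (c < 0x80) { o[0] = (unsigned char) c; return 1; }
  if (c < 0x800)
    {
      o[0] = (unsigned char) (0xC0 | (c >> 6));
      o[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      o[0] = (unsigned char) (0xE0 | (c >> 12));
      o[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      o[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  o[0] = (unsigned char) (0xF0 | (c >> 18));
  o[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  o[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  o[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Hypothetical toy case table: ASCII letters plus two characters whose
   upper-case form has a different byte length.  */
static uint32_t
toy_upcase (uint32_t c)
{
  if (c >= 'a' && c <= 'z') return c - ('a' - 'A');
  if (c == 0x0131) return 'I';    /* 2 bytes -> 1 byte */
  if (c == 0x023F) return 0x2C7E; /* 2 bytes -> 3 bytes */
  return c;
}

/* Return a malloc'd upper-cased copy of SRC (SRC_BYTES bytes) and store
   its byte length in *OUT_BYTES; return NULL if allocation fails.  */
unsigned char *
upcase_copy (const unsigned char *src, size_t src_bytes, size_t *out_bytes)
{
  /* Over-allocate by one eighth, as the patch does, so growth is rare.  */
  size_t cap = src_bytes + src_bytes / 8 + MAX_CHAR_BYTES;
  size_t used = 0, i = 0, len;
  unsigned char *dst = malloc (cap);
  if (dst == NULL)
    return NULL;
  while (i < src_bytes)
    {
      if (used + MAX_CHAR_BYTES > cap)
        { /* Not enough room for the next character: double the buffer.  */
          unsigned char *bigger = realloc (dst, cap * 2);
          if (bigger == NULL) { free (dst); return NULL; }
          dst = bigger;
          cap *= 2;
        }
      uint32_t c = decode_char (src + i, &len);
      used += encode_char (toy_upcase (c), dst + used);
      i += len;
    }
  assert (used <= cap);
  *out_bytes = used;
  return dst;
}
```

A caller passes the source bytes and their count, receives a fresh
buffer together with its length, and frees it when done.  The source is
never modified, which is also why the patch builds a new string with
make_multibyte_string instead of calling Faset on a copy.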