From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Mon, 05 Dec 2011 16:10:11 +0900
References: <20110818090129.DC129360038@msa102.auone-net.jp>
In-Reply-To: (message from Kenichi Handa on Thu, 01 Dec 2011 10:56:12 +0900)
To: 9318@debbugs.gnu.org
Cc: kzhr@d1.dion.ne.jp, schwab@linux-m68k.org

Kenichi Handa writes:

> In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito writes:
> > Here is the patch for the code, which contains Andreas' patch.  In my
> > environment, problems are fixed.  I think it would be better that the
> > interface of encode_designation_at_bol() is changed.

> Oops, sorry, I have vaguely thought that your patch below
> has already been applied, but just noticed that it was not.
> I'll commit a slightly modified version including the
> improved interface for encode_designation_at_bol soon.

I've just installed the following changes.  As I don't have
a Cygwin environment now, could you please check whether this
change really fixes the problem?

---
Kenichi Handa
handa@m17n.org

2011-12-05  Kenichi Handa

        * coding.c (encode_designation_at_bol): New args charbuf_end and
        dst.  Return the number of produced bytes.  Callers changed.
        (coding_set_source): Return how many bytes coding->source was
        relocated.
        (coding_set_destination): Return how many bytes
        coding->destination was relocated.
        (CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
        (CODING_CHAR_CHARSET_P): Adjusted for the above changes.

2011-12-05  Kazuhiro Ito  (tiny change)

        * coding.c (CODING_CHAR_CHARSET_P): New macro.
        (encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
        macro (Bug#9318).

2011-12-05  Andreas Schwab

        The following changes are to fix Bug#9318.

        * coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
        (encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
        (encode_coding_iso_2022, encode_coding_sjis)
        (encode_coding_big5, encode_coding_charset): Use the above macros.

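The failure mode behind these entries can be illustrated outside Emacs.  What follows is only a minimal sketch, not code from coding.c: the names toy_coding, toy_relocate and toy_encode are hypothetical, and the relocation is forced with malloc/free where Emacs would relocate buffer text while loading a charset map.  The sketch also records the write position as an index and rebuilds the pointer afterwards, whereas the patch adds a byte offset to dst and dst_end; the index form is used here only because it stays within ISO C when the block really moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct coding_system: only the destination
   block matters for this sketch.  */
struct toy_coding
{
  unsigned char *destination;
  size_t size;
};

/* Move the destination to a fresh block of NEW_SIZE bytes, the way a
   large allocation might relocate buffer text.  Return 0 on success,
   -1 if memory is exhausted.  */
static int
toy_relocate (struct toy_coding *coding, size_t new_size)
{
  unsigned char *fresh = malloc (new_size);

  if (fresh == NULL)
    return -1;
  memcpy (fresh, coding->destination,
          coding->size < new_size ? coding->size : new_size);
  free (coding->destination);
  coding->destination = fresh;
  coding->size = new_size;
  return 0;
}

/* Copy TEXT into a destination that is relocated halfway through.
   Return the number of bytes stored correctly, or -1 on failure.  */
static int
toy_encode (const char *text)
{
  struct toy_coding coding;
  size_t len = strlen (text), half = len / 2, used;
  unsigned char *dst;
  int ok;

  coding.size = len + 1;
  coding.destination = malloc (coding.size);
  if (coding.destination == NULL)
    return -1;

  dst = coding.destination;
  memcpy (dst, text, half);
  dst += half;

  /* Remember the position before anything can move the block.  */
  used = (size_t) (dst - coding.destination);
  assert (used <= coding.size);
  if (toy_relocate (&coding, 2 * len + 1) != 0)
    {
      free (coding.destination);
      return -1;
    }
  /* Rebase the cursor; the old value of dst is now dangling.  */
  dst = coding.destination + used;

  memcpy (dst, text + half, len - half);
  dst += len - half;

  ok = ((size_t) (dst - coding.destination) == len
        && memcmp (coding.destination, text, len) == 0);
  free (coding.destination);
  return ok ? (int) len : -1;
}
```

Calling toy_encode ("ISO-2022-JP") exercises the relocation once; dropping the rebasing step would leave dst pointing into freed memory, which is the kind of stale pointer the new wrapper macros are meant to prevent.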
=== modified file 'src/coding.c'
--- src/coding.c	2011-11-07 01:57:07 +0000
+++ src/coding.c	2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
 static void decode_coding_raw_text (struct coding_system *);
 static int encode_coding_raw_text (struct coding_system *);
 
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
 static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
 static void coding_alloc_by_making_gap (struct coding_system *,
                                         EMACS_INT, EMACS_INT);
 static unsigned char *alloc_destination (struct coding_system *,
                                          EMACS_INT, unsigned char *);
 static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
-                                                 int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+                                      int *, int *, unsigned char *);
 static int detect_eol (const unsigned char *,
                        EMACS_INT, enum coding_category);
 static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
     }
 }
 
-/* This wrapper macro is used to preserve validity of pointers into
-   buffer text across calls to decode_char, which could cause
-   relocation of buffers if it loads a charset map, because loading a
-   charset map allocates large structures.  */
+/* These wrapper macros are used to preserve validity of pointers into
+   buffer text across calls to decode_char, encode_char, etc, which
+   could cause relocation of buffers if it loads a charset map,
+   because loading a charset map allocates large structures.  */
+
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
   do { \
+    EMACS_INT offset; \
+    \
     charset_map_loaded = 0; \
     c = DECODE_CHAR (charset, code); \
-    if (charset_map_loaded) \
+    if (charset_map_loaded \
+        && (offset = coding_set_source (coding))) \
       { \
-        const unsigned char *orig = coding->source; \
-        EMACS_INT offset; \
-        \
-        coding_set_source (coding); \
-        offset = coding->source - orig; \
         src += offset; \
         src_base += offset; \
         src_end += offset; \
       } \
   } while (0)
 
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    EMACS_INT offset; \
+    \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+  do { \
+    EMACS_INT offset; \
+    \
+    charset_map_loaded = 0; \
+    charset = char_charset (c, charset_list, code_return); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+  do { \
+    EMACS_INT offset; \
+    \
+    charset_map_loaded = 0; \
+    result = CHAR_CHARSET_P (c, charset); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
 
 /* If there are at least BYTES length of room at dst, allocate memory
    for coding->destination and update dst and dst_end.  We don't have
@@ -1015,9 +1056,14 @@
                | ((p)[-1] & 0x3F))))
 
 
-static void
+/* Update coding->source from coding->src_object, and return how many
+   bytes coding->source was changed.  */
+
+static EMACS_INT
 coding_set_source (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->source;
+
   if (BUFFERP (coding->src_object))
     {
       struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
       /* Otherwise, the source is C string and is never relocated
          automatically.  Thus we don't have to update anything.  */
     }
+  return coding->source - orig;
 }
 
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+   many bytes coding->destination was changed.  */
+
+static EMACS_INT
 coding_set_destination (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->destination;
+
   if (BUFFERP (coding->dst_object))
     {
       if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
       /* Otherwise, the destination is C string and is never relocated
          automatically.  Thus we don't have to update anything.  */
     }
+  return coding->destination - orig;
 }
 
 
@@ -2650,14 +2704,19 @@
 
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (CHAR_CHARSET_P (c, charset))
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (result)
                 code = ENCODE_CHAR (charset, c);
               else
-                charset = char_charset (c, charset_list, &code);
+                CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                     &code, charset);
             }
           else
-            charset = char_charset (c, charset_list, &code);
+            CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                 &code, charset);
           if (! charset)
             {
               c = coding->default_char;
@@ -2666,7 +2725,8 @@
                   EMIT_ONE_ASCII_BYTE (c);
                   continue;
                 }
-              charset = char_charset (c, charset_list, &code);
+              CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                   &code, charset);
             }
           dimension = CHARSET_DIMENSION (charset);
           emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
 
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset), (c)); \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
     \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
@@ -4283,15 +4344,19 @@
 
 
 /* Produce designation sequences of charsets in the line started from
-   SRC to a place pointed by DST, and return updated DST.
+   CHARBUF to a place pointed by DST, and return the number of
+   produced bytes.  DST should not directly point a buffer text area
+   which may be relocated by char_charset call.
 
    If the current block ends before any end-of-line, we may fail to
    find all the necessary designations.  */
 
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+                           int *charbuf, int *charbuf_end,
                            unsigned char *dst)
 {
+  unsigned char *orig = dst;
   struct charset *charset;
   /* Table of charsets to be designated to each graphic register.  */
   int r[4];
@@ -4309,7 +4374,7 @@
   for (reg = 0; reg < 4; reg++)
     r[reg] = -1;
 
-  while (found < 4)
+  while (charbuf < charbuf_end && found < 4)
     {
       int id;
 
@@ -4334,7 +4399,7 @@
           ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
     }
 
-  return dst;
+  return dst - orig;
 }
 
 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".  */
@@ -4378,13 +4443,26 @@
 
       if (bol_designation)
         {
-          unsigned char *dst_prev = dst;
-
           /* We have to produce designation sequences if any now.  */
-          dst = encode_designation_at_bol (coding, charbuf, dst);
-          bol_designation = 0;
+          unsigned char desig_buf[16];
+          int nbytes;
+          EMACS_INT offset;
+
+          charset_map_loaded = 0;
+          nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+                                              desig_buf);
+          if (charset_map_loaded
+              && (offset = coding_set_destination (coding)))
+            {
+              dst += offset;
+              dst_end += offset;
+            }
+          memcpy (dst, desig_buf, nbytes);
+          dst += nbytes;
           /* We are sure that designation sequences are all ASCII bytes.  */
-          produced_chars += dst - dst_prev;
+          produced_chars += nbytes;
+          bol_designation = 0;
+          ASSURE_DESTINATION (safe_room);
         }
 
       c = *charbuf++;
@@ -4455,12 +4533,17 @@
 
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (! CHAR_CHARSET_P (c, charset))
-                charset = char_charset (c, charset_list, NULL);
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (! result)
+                CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                     NULL, charset);
             }
           else
-            charset = char_charset (c, charset_list, NULL);
+            CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                 NULL, charset);
           if (!charset)
             {
               if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, NULL);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, NULL, charset);
                 }
             }
           ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
 
           if (!charset)
             {
@@ -4909,7 +4995,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
 
           if (! charset)
             {
@@ -4996,7 +5085,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
         }
       else
         {
-          charset = char_charset (c, charset_list, &code);
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
+
           if (charset)
             {
               if (CHARSET_DIMENSION (charset) == 1)
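
The changed interface of encode_designation_at_bol follows a staging pattern: the producer writes into a small local array and reports a byte count, and only the caller touches the real destination.  The sketch below shows that pattern in isolation and makes several assumptions: toy_designation_at_bol and toy_emit_line_prefix are hypothetical names, the three-byte "ESC $ x" sequences are made up rather than real ISO-2022 designations, and any character code of 0x80 or above is treated as needing one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_DESIG_MAX = 16 };    /* same size as desig_buf in the patch */

/* Scan [CHARBUF, CHARBUF_END) up to the first newline and write one
   made-up 3-byte sequence to DST for each non-ASCII character, at
   most four of them.  Return the number of bytes produced.  DST must
   have room for 12 bytes and must not be relocatable memory.  */
static int
toy_designation_at_bol (const int *charbuf, const int *charbuf_end,
                        unsigned char *dst)
{
  unsigned char *orig = dst;
  int found = 0;

  /* Stop at the end of the block as well as after four sequences.  */
  while (charbuf < charbuf_end && found < 4)
    {
      int c = *charbuf++;

      if (c == '\n')
        break;
      if (c >= 0x80)
        {
          *dst++ = 0x1B;                        /* ESC */
          *dst++ = '$';
          *dst++ = (unsigned char) ('A' + found);
          found++;
        }
    }
  return (int) (dst - orig);
}

/* Stage the sequences in a local array, then copy them to OUT.
   Return the number of bytes copied, or -1 if OUT is too small.  */
static int
toy_emit_line_prefix (const int *charbuf, const int *charbuf_end,
                      unsigned char *out, size_t out_size)
{
  unsigned char desig_buf[TOY_DESIG_MAX];
  int nbytes = toy_designation_at_bol (charbuf, charbuf_end, desig_buf);

  assert (nbytes >= 0 && nbytes <= TOY_DESIG_MAX);
  if ((size_t) nbytes > out_size)
    return -1;
  /* A real caller would refresh its destination pointers here, before
     the copy, if the producer could have relocated them.  */
  memcpy (out, desig_buf, (size_t) nbytes);
  return nbytes;
}
```

Because the producer never sees the destination, it does not matter whether the destination moved while the sequences were being computed; the caller copies the staged bytes only after its own pointers are known to be current.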