bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result

unofficial mirror of bug-gnu-emacs@gnu.org 

From: Kenichi Handa <handa@m17n.org>
To: 9318@debbugs.gnu.org
Cc: kzhr@d1.dion.ne.jp, schwab@linux-m68k.org
Subject: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Mon, 05 Dec 2011 16:10:11 +0900
Message-ID: <tl7y5urmr3g.fsf@m17n.org>
In-Reply-To: <tl7zkfdnjgj.fsf@m17n.org> (message from Kenichi Handa on Thu, 01 Dec 2011 10:56:12 +0900)

In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > Here is the patch for the code, which contains Andreas' patch.  In my
> > environment, problems are fixed.  I think it would be better that the
> > interface of encode_designation_at_bol() is changed.

> Oops, sorry, I have vaguely thought that your patch below
> has already been applied, but just noticed that it was not.
> I'll commit a slightly modified version including the
> improved interface for encode_designation_at_bol soon.

I've just installed the following changes.  As I don't have a
Cygwin environment at the moment, could you please check whether
this change really fixes the problem?

---
Kenichi Handa
handa@m17n.org

2011-12-05  Kenichi Handa  <handa@m17n.org>

	* coding.c (encode_designation_at_bol): New args charbuf_end and
	dst.  Return the number of produced bytes.  Callers changed.
	(coding_set_source): Return how many bytes coding->source was
	relocated.
	(coding_set_destination): Return how many bytes
	coding->destination was relocated.
	(CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
	(CODING_CHAR_CHARSET_P): Adjusted for the above changes.

2011-12-05  Kazuhiro Ito  <kzhr@d1.dion.ne.jp>  (tiny change)

	* coding.c (CODING_CHAR_CHARSET_P): New macro.
	(encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
	macro (Bug#9318).

2011-12-05  Andreas Schwab  <schwab@linux-m68k.org>

	The following changes are to fix Bug#9318.

	* coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
	(encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
	(encode_coding_iso_2022, encode_coding_sjis)
	(encode_coding_big5, encode_coding_charset): Use the above macros.
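
For readers who do not know coding.c, the idea behind these entries can be sketched outside Emacs: a lookup made in the middle of encoding may relocate the output storage, so the working pointers dst and dst_end have to be re-derived afterwards.  The code below is only an illustration under stated assumptions, not the Emacs implementation.  struct sink, sink_relocate, lookup_code and encode are hypothetical names, and the generation counter plays the role that charset_map_loaded plays in the patch.  Unlike the patch, which adds the byte offset returned by coding_set_destination to the pointers, the sketch saves an index before the call and rebuilds the pointers from the new base, which keeps it within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output sink; stands in for relocatable buffer text.  */
struct sink
{
  unsigned char *base;   /* start of storage; changes on relocation */
  size_t size;           /* allocated bytes */
  unsigned generation;   /* bumped on every relocation */
};

/* Move the sink's contents to fresh storage of NEW_SIZE bytes.
   Return 0 on success, -1 if allocation fails.  */
static int
sink_relocate (struct sink *s, size_t new_size)
{
  unsigned char *fresh = malloc (new_size);

  if (!fresh)
    return -1;
  memcpy (fresh, s->base, s->size < new_size ? s->size : new_size);
  free (s->base);
  s->base = fresh;
  s->size = new_size;
  s->generation++;
  return 0;
}

/* Hypothetical lookup that may relocate the sink as a side effect.
   The '!' trigger is arbitrary and exists only for the demonstration.  */
static int
lookup_code (struct sink *s, int c)
{
  if (c == '!' && sink_relocate (s, s->size * 2) < 0)
    return -1;
  return c;
}

/* Copy SRC into the sink one byte at a time and return the number of
   bytes written.  Stops early if the sink is full or a lookup fails.  */
static size_t
encode (struct sink *s, const char *src)
{
  unsigned char *dst = s->base;
  unsigned char *dst_end = s->base + s->size;

  assert (s->base != NULL);
  for (; *src; src++)
    {
      /* Remember the position as an index before the risky call.  */
      size_t used = (size_t) (dst - s->base);
      unsigned before = s->generation;
      int code = lookup_code (s, (unsigned char) *src);

      if (code < 0)
        break;
      if (s->generation != before)
        {
          /* Storage moved: rebuild both pointers from the new base.  */
          dst = s->base + used;
          dst_end = s->base + s->size;
        }
      if (dst == dst_end)
        break;
      *dst++ = (unsigned char) code;
    }
  return (size_t) (dst - s->base);
}
```

With a 4-byte sink and the input "ab!cd", the lookup for '!' relocates the storage in mid-loop; because dst is rebuilt before the next store, all five bytes end up in the new storage.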


=== modified file 'src/coding.c'
--- src/coding.c	2011-11-07 01:57:07 +0000
+++ src/coding.c	2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
 static void decode_coding_raw_text (struct coding_system *);
 static int encode_coding_raw_text (struct coding_system *);
 
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
 static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
 static void coding_alloc_by_making_gap (struct coding_system *,
                                         EMACS_INT, EMACS_INT);
 static unsigned char *alloc_destination (struct coding_system *,
                                          EMACS_INT, unsigned char *);
 static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
-                                                 int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+				      int *, int *, unsigned char *);
 static int detect_eol (const unsigned char *,
                        EMACS_INT, enum coding_category);
 static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
     }
 }
 
-/* This wrapper macro is used to preserve validity of pointers into
-   buffer text across calls to decode_char, which could cause
-   relocation of buffers if it loads a charset map, because loading a
-   charset map allocates large structures.  */
+/* These wrapper macros are used to preserve validity of pointers into
+   buffer text across calls to decode_char, encode_char, etc, which
+   could cause relocation of buffers if it loads a charset map,
+   because loading a charset map allocates large structures.  */
+
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
   do {									     \
+    EMACS_INT offset;							     \
+									     \
     charset_map_loaded = 0;						     \
     c = DECODE_CHAR (charset, code);					     \
-    if (charset_map_loaded)						     \
+    if (charset_map_loaded						     \
+	&& (offset = coding_set_source (coding)))			     \
       {									     \
-	const unsigned char *orig = coding->source;			     \
-	EMACS_INT offset;						     \
-									     \
-	coding_set_source (coding);					     \
-	offset = coding->source - orig;					     \
 	src += offset;							     \
 	src_base += offset;						     \
 	src_end += offset;						     \
       }									     \
   } while (0)
 
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code)	\
+  do {									\
+    EMACS_INT offset;							\
+									\
+    charset_map_loaded = 0;						\
+    code = ENCODE_CHAR (charset, c);					\
+    if (charset_map_loaded						\
+	&& (offset = coding_set_destination (coding)))			\
+      {									\
+	dst += offset;							\
+	dst_end += offset;						\
+      }									\
+  } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+  do {									\
+    EMACS_INT offset;							\
+									\
+    charset_map_loaded = 0;						\
+    charset = char_charset (c, charset_list, code_return);		\
+    if (charset_map_loaded						\
+	&& (offset = coding_set_destination (coding)))			\
+      {									\
+	dst += offset;							\
+	dst_end += offset;						\
+      }									\
+  } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result)	\
+  do {									\
+    EMACS_INT offset;							\
+									\
+    charset_map_loaded = 0;						\
+    result = CHAR_CHARSET_P (c, charset);				\
+    if (charset_map_loaded						\
+	&& (offset = coding_set_destination (coding)))			\
+      {									\
+	dst += offset;							\
+	dst_end += offset;						\
+      }									\
+  } while (0)
+
 
 /* If there are at least BYTES length of room at dst, allocate memory
    for coding->destination and update dst and dst_end.  We don't have
@@ -1015,9 +1056,14 @@
        | ((p)[-1] & 0x3F))))
 
 
-static void
+/* Update coding->source from coding->src_object, and return how many
+   bytes coding->source was changed.  */
+
+static EMACS_INT
 coding_set_source (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->source;
+
   if (BUFFERP (coding->src_object))
     {
       struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
       /* Otherwise, the source is C string and is never relocated
 	 automatically.  Thus we don't have to update anything.  */
     }
+  return coding->source - orig;
 }
 
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+   many bytes coding->destination was changed.  */
+
+static EMACS_INT
 coding_set_destination (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->destination;
+
   if (BUFFERP (coding->dst_object))
     {
       if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
       /* Otherwise, the destination is C string and is never relocated
 	 automatically.  Thus we don't have to update anything.  */
     }
+  return coding->destination - orig;
 }
 
 
@@ -2650,14 +2704,19 @@
 
 	  if (preferred_charset_id >= 0)
 	    {
+	      int result;
+
 	      charset = CHARSET_FROM_ID (preferred_charset_id);
-	      if (CHAR_CHARSET_P (c, charset))
+	      CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+	      if (result)
 		code = ENCODE_CHAR (charset, c);
 	      else
-		charset = char_charset (c, charset_list, &code);
+		CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+				     &code, charset);
 	    }
 	  else
-	    charset = char_charset (c, charset_list, &code);
+	    CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+				 &code, charset);
 	  if (! charset)
 	    {
 	      c = coding->default_char;
@@ -2666,7 +2725,8 @@
 		  EMIT_ONE_ASCII_BYTE (c);
 		  continue;
 		}
-	      charset = char_charset (c, charset_list, &code);
+	      CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+				   &code, charset);
 	    }
 	  dimension = CHARSET_DIMENSION (charset);
 	  emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
 
 #define ENCODE_ISO_CHARACTER(charset, c)				   \
   do {									   \
-    int code = ENCODE_CHAR ((charset), (c));				   \
+    int code;								   \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code);	   \
 									   \
     if (CHARSET_DIMENSION (charset) == 1)				   \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code);		   \
@@ -4283,15 +4344,19 @@
 
 
 /* Produce designation sequences of charsets in the line started from
-   SRC to a place pointed by DST, and return updated DST.
+   CHARBUF to the place pointed to by DST, and return the number of
+   produced bytes.  DST should not point directly into a buffer text
+   area, which may be relocated by a char_charset call.
 
    If the current block ends before any end-of-line, we may fail to
    find all the necessary designations.  */
 
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+			   int *charbuf, int *charbuf_end,
 			   unsigned char *dst)
 {
+  unsigned char *orig = dst;
   struct charset *charset;
   /* Table of charsets to be designated to each graphic register.  */
   int r[4];
@@ -4309,7 +4374,7 @@
   for (reg = 0; reg < 4; reg++)
     r[reg] = -1;
 
-  while (found < 4)
+  while (charbuf < charbuf_end && found < 4)
     {
       int id;
 
@@ -4334,7 +4399,7 @@
 	  ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
     }
 
-  return dst;
+  return dst - orig;
 }
 
 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".  */
@@ -4378,13 +4443,26 @@
 
       if (bol_designation)
 	{
-	  unsigned char *dst_prev = dst;
-
 	  /* We have to produce designation sequences if any now.  */
-	  dst = encode_designation_at_bol (coding, charbuf, dst);
-	  bol_designation = 0;
+	  unsigned char desig_buf[16];
+	  int nbytes;
+	  EMACS_INT offset;
+
+	  charset_map_loaded = 0;
+	  nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+					      desig_buf);
+	  if (charset_map_loaded
+	      && (offset = coding_set_destination (coding)))
+	    {
+	      dst += offset;
+	      dst_end += offset;
+	    }
+	  memcpy (dst, desig_buf, nbytes);
+	  dst += nbytes;
 	  /* We are sure that designation sequences are all ASCII bytes.  */
-	  produced_chars += dst - dst_prev;
+	  produced_chars += nbytes;
+	  bol_designation = 0;
+	  ASSURE_DESTINATION (safe_room);
 	}
 
       c = *charbuf++;
@@ -4455,12 +4533,17 @@
 
 	  if (preferred_charset_id >= 0)
 	    {
+	      int result;
+
 	      charset = CHARSET_FROM_ID (preferred_charset_id);
-	      if (! CHAR_CHARSET_P (c, charset))
-		charset = char_charset (c, charset_list, NULL);
+	      CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+	      if (! result)
+		CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+				     NULL, charset);
 	    }
 	  else
-	    charset = char_charset (c, charset_list, NULL);
+	    CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+				 NULL, charset);
 	  if (!charset)
 	    {
 	      if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
 	      else
 		{
 		  c = coding->default_char;
-		  charset = char_charset (c, charset_list, NULL);
+		  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+				       charset_list, NULL, charset);
 		}
 	    }
 	  ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
       else
 	{
 	  unsigned code;
-	  struct charset *charset = char_charset (c, charset_list, &code);
+	  struct charset *charset;
+	  CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+			       &code, charset);
 
 	  if (!charset)
 	    {
@@ -4909,7 +4995,8 @@
 	      else
 		{
 		  c = coding->default_char;
-		  charset = char_charset (c, charset_list, &code);
+		  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+				       charset_list, &code, charset);
 		}
 	    }
 	  if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
       else
 	{
 	  unsigned code;
-	  struct charset *charset = char_charset (c, charset_list, &code);
+	  struct charset *charset;
+	  CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+			       &code, charset);
 
 	  if (! charset)
 	    {
@@ -4996,7 +5085,8 @@
 	      else
 		{
 		  c = coding->default_char;
-		  charset = char_charset (c, charset_list, &code);
+		  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+				       charset_list, &code, charset);
 		}
 	    }
 	  if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
 	}
       else
 	{
-	  charset = char_charset (c, charset_list, &code);
+	  CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+			       &code, charset);
+
 	  if (charset)
 	    {
 	      if (CHARSET_DIMENSION (charset) == 1)
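
The change to the encode_designation_at_bol caller uses a second pattern: the bytes are first produced into a small local buffer, and only copied into the destination once any relocation has been accounted for.  A rough, self-contained sketch of that idea follows.  It is not Emacs code: storage, relocate_storage, produce_prefix and emit_with_prefix are hypothetical, the relocation is forced unconditionally for the sake of the demonstration, and the destination pointer is simply derived after the call rather than adjusted by an offset as in the patch.  The three bytes ESC $ B are the ISO-2022-JP designation of JIS X 0208, used here only as sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical relocatable output storage.  */
static unsigned char *storage;
static size_t storage_size;

/* Move the storage to a fresh block of NEW_SIZE bytes.
   Return 0 on success, -1 if allocation fails.  */
static int
relocate_storage (size_t new_size)
{
  unsigned char *fresh = malloc (new_size);

  if (!fresh)
    return -1;
  memcpy (fresh, storage, storage_size < new_size ? storage_size : new_size);
  free (storage);
  storage = fresh;
  storage_size = new_size;
  return 0;
}

/* Write a designation sequence into OUT, which must be scratch memory
   owned by the caller, and return the number of bytes produced, or -1
   on failure.  As a side effect this relocates the storage, which is
   why OUT must not point into it.  */
static int
produce_prefix (unsigned char *out, size_t out_size)
{
  static const unsigned char esc[] = { 0x1B, '$', 'B' };

  if (out_size < sizeof esc)
    return -1;
  if (relocate_storage (storage_size * 2) < 0)
    return -1;
  memcpy (out, esc, sizeof esc);
  return (int) sizeof esc;
}

/* Append the designation sequence after the first USED bytes of the
   storage.  Return the new number of used bytes, or -1 on failure.  */
static long
emit_with_prefix (size_t used)
{
  unsigned char scratch[16];
  int nbytes = produce_prefix (scratch, sizeof scratch);

  if (nbytes < 0 || used + (size_t) nbytes > storage_size)
    return -1;
  /* Only now is it safe to form a pointer into the storage.  */
  memcpy (storage + used, scratch, (size_t) nbytes);
  return (long) (used + (size_t) nbytes);
}
```

Because produce_prefix never writes through a pointer into the storage, it does not matter that the storage moves while it runs; the copy happens against the address that is current at that point.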

Thread overview: 19+ messages
2011-08-18  9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18  9:48 ` Andreas Schwab
2011-08-18 21:33   ` Kazuhiro Ito
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
2011-08-20 21:26   ` Chong Yidong
2011-08-21  0:17     ` Kazuhiro Ito
2011-08-24  9:37       ` Kazuhiro Ito
2011-08-24 12:06         ` Eli Zaretskii
2011-08-25  9:49           ` Kazuhiro Ito
2011-08-24 17:59         ` Andreas Schwab
2011-08-25  9:54           ` Kazuhiro Ito
2011-08-26 11:41             ` Kazuhiro Ito
2011-08-28  0:04               ` Kazuhiro Ito
2011-08-30 23:30                 ` Kazuhiro Ito
2011-12-01  1:56                   ` Kenichi Handa
2011-12-05  7:10                     ` Kenichi Handa [this message]
2011-12-05 11:31                       ` Kazuhiro Ito
2011-12-05  9:11 ` Paul Eggert
2011-12-06  0:30   ` Kenichi Handa
