From: Kenichi Handa <handa@m17n.org>
To: 9318@debbugs.gnu.org
Cc: kzhr@d1.dion.ne.jp, schwab@linux-m68k.org
Subject: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Mon, 05 Dec 2011 16:10:11 +0900 [thread overview]
Message-ID: <tl7y5urmr3g.fsf@m17n.org> (raw)
In-Reply-To: <tl7zkfdnjgj.fsf@m17n.org> (message from Kenichi Handa on Thu, 01 Dec 2011 10:56:12 +0900)
In article <tl7zkfdnjgj.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> In article <20110830233131.C74A61E0043@msa101.auone-net.jp>, Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> > Here is the patch for the code, which contains Andreas' patch. In my
> > environment, problems are fixed. I think it would be better that the
> > interface of encode_designation_at_bol() is changed.
> Oops, sorry, I have vaguely thought that your patch below
> has already been applied, but just noticed that it was not.
> I'll commit a slightly modified version including the
> improved interface for encode_designation_at_bol soon.
I've just installed the following changes. As I don't have
cygwin environment now, could you please check if this
change surely fix the problem?
---
Kenichi Handa
handa@m17n.org
2011-12-05 Kenichi Handa <handa@m17n.org>
* coding.c (encode_designation_at_bol): New args charbuf_end and
dst. Return the number of produced bytes. Callers changed.
(coding_set_source): Return how many bytes coding->source was
relocated.
(coding_set_destination): Return how many bytes
coding->destination was relocated.
(CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
(CODING_CHAR_CHARSET_P): Adjusted for the avove changes.
2011-12-05 Kazuhiro Ito <kzhr@d1.dion.ne.jp> (tiny change)
* coding.c (CODING_CHAR_CHARSET_P): New macro.
(encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
macro (Bug#9318).
2011-12-05 Andreas Schwab <schwab@linux-m68k.org>
The following changes are to fix Bug#9318.
* coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
(encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
(encode_coding_iso_2022, encode_coding_sjis)
(encode_coding_big5, encode_coding_charset): Use the above macros.
=== modified file 'src/coding.c'
--- src/coding.c 2011-11-07 01:57:07 +0000
+++ src/coding.c 2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
static void decode_coding_raw_text (struct coding_system *);
static int encode_coding_raw_text (struct coding_system *);
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
static void coding_alloc_by_making_gap (struct coding_system *,
EMACS_INT, EMACS_INT);
static unsigned char *alloc_destination (struct coding_system *,
EMACS_INT, unsigned char *);
static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
- int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+ int *, int *, unsigned char *);
static int detect_eol (const unsigned char *,
EMACS_INT, enum coding_category);
static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
}
}
-/* This wrapper macro is used to preserve validity of pointers into
- buffer text across calls to decode_char, which could cause
- relocation of buffers if it loads a charset map, because loading a
- charset map allocates large structures. */
+/* These wrapper macros are used to preserve validity of pointers into
+ buffer text across calls to decode_char, encode_char, etc, which
+ could cause relocation of buffers if it loads a charset map,
+ because loading a charset map allocates large structures. */
+
#define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
do { \
+ EMACS_INT offset; \
+ \
charset_map_loaded = 0; \
c = DECODE_CHAR (charset, code); \
- if (charset_map_loaded) \
+ if (charset_map_loaded \
+ && (offset = coding_set_source (coding))) \
{ \
- const unsigned char *orig = coding->source; \
- EMACS_INT offset; \
- \
- coding_set_source (coding); \
- offset = coding->source - orig; \
src += offset; \
src_base += offset; \
src_end += offset; \
} \
} while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ code = ENCODE_CHAR (charset, c); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ charset = char_charset (c, charset_list, code_return); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+ do { \
+ EMACS_INT offset; \
+ \
+ charset_map_loaded = 0; \
+ result = CHAR_CHARSET_P (c, charset); \
+ if (charset_map_loaded \
+ && (offset = coding_set_destination (coding))) \
+ { \
+ dst += offset; \
+ dst_end += offset; \
+ } \
+ } while (0)
+
/* If there are at least BYTES length of room at dst, allocate memory
for coding->destination and update dst and dst_end. We don't have
@@ -1015,9 +1056,14 @@
| ((p)[-1] & 0x3F))))
-static void
+/* Update coding->source from coding->src_object, and return how many
+ bytes coding->source was changed. */
+
+static EMACS_INT
coding_set_source (struct coding_system *coding)
{
+ const unsigned char *orig = coding->source;
+
if (BUFFERP (coding->src_object))
{
struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
/* Otherwise, the source is C string and is never relocated
automatically. Thus we don't have to update anything. */
}
+ return coding->source - orig;
}
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+ many bytes coding->destination was changed. */
+
+static EMACS_INT
coding_set_destination (struct coding_system *coding)
{
+ const unsigned char *orig = coding->destination;
+
if (BUFFERP (coding->dst_object))
{
if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
/* Otherwise, the destination is C string and is never relocated
automatically. Thus we don't have to update anything. */
}
+ return coding->destination - orig;
}
@@ -2650,14 +2704,19 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (CHAR_CHARSET_P (c, charset))
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (result)
code = ENCODE_CHAR (charset, c);
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
else
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
c = coding->default_char;
@@ -2666,7 +2725,8 @@
EMIT_ONE_ASCII_BYTE (c);
continue;
}
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
}
dimension = CHARSET_DIMENSION (charset);
emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
#define ENCODE_ISO_CHARACTER(charset, c) \
do { \
- int code = ENCODE_CHAR ((charset), (c)); \
+ int code; \
+ CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
\
if (CHARSET_DIMENSION (charset) == 1) \
ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
@@ -4283,15 +4344,19 @@
/* Produce designation sequences of charsets in the line started from
- SRC to a place pointed by DST, and return updated DST.
+ CHARBUF to a place pointed by DST, and return the number of
+ produced bytes. DST should not directly point a buffer text area
+ which may be relocated by char_charset call.
If the current block ends before any end-of-line, we may fail to
find all the necessary designations. */
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+ int *charbuf, int *charbuf_end,
unsigned char *dst)
{
+ unsigned char *orig;
struct charset *charset;
/* Table of charsets to be designated to each graphic register. */
int r[4];
@@ -4309,7 +4374,7 @@
for (reg = 0; reg < 4; reg++)
r[reg] = -1;
- while (found < 4)
+ while (charbuf < charbuf_end && found < 4)
{
int id;
@@ -4334,7 +4399,7 @@
ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
}
- return dst;
+ return dst - orig;
}
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
@@ -4378,13 +4443,26 @@
if (bol_designation)
{
- unsigned char *dst_prev = dst;
-
/* We have to produce designation sequences if any now. */
- dst = encode_designation_at_bol (coding, charbuf, dst);
- bol_designation = 0;
+ unsigned char desig_buf[16];
+ int nbytes;
+ EMACS_INT offset;
+
+ charset_map_loaded = 0;
+ nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+ desig_buf);
+ if (charset_map_loaded
+ && (offset = coding_set_destination (coding)))
+ {
+ dst += offset;
+ dst_end += offset;
+ }
+ memcpy (dst, desig_buf, nbytes);
+ dst += nbytes;
/* We are sure that designation sequences are all ASCII bytes. */
- produced_chars += dst - dst_prev;
+ produced_chars += nbytes;
+ bol_designation = 0;
+ ASSURE_DESTINATION (safe_room);
}
c = *charbuf++;
@@ -4455,12 +4533,17 @@
if (preferred_charset_id >= 0)
{
+ int result;
+
charset = CHARSET_FROM_ID (preferred_charset_id);
- if (! CHAR_CHARSET_P (c, charset))
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+ if (! result)
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
}
else
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ NULL, charset);
if (!charset)
{
if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, NULL);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, NULL, charset);
}
}
ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (!charset)
{
@@ -4909,7 +4995,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
else
{
unsigned code;
- struct charset *charset = char_charset (c, charset_list, &code);
+ struct charset *charset;
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
if (! charset)
{
@@ -4996,7 +5085,8 @@
else
{
c = coding->default_char;
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+ charset_list, &code, charset);
}
}
if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
}
else
{
- charset = char_charset (c, charset_list, &code);
+ CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+ &code, charset);
+
if (charset)
{
if (CHARSET_DIMENSION (charset) == 1)
next prev parent reply other threads:[~2011-12-05 7:10 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-18 9:01 bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on on Windows Kazuhiro Ito
2011-08-18 9:48 ` Andreas Schwab
2011-08-18 21:33 ` Kazuhiro Ito
2011-08-19 13:46 ` bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Kazuhiro Ito
2011-08-20 21:26 ` Chong Yidong
2011-08-21 0:17 ` Kazuhiro Ito
2011-08-24 9:37 ` Kazuhiro Ito
2011-08-24 12:06 ` Eli Zaretskii
2011-08-25 9:49 ` Kazuhiro Ito
2011-08-24 17:59 ` Andreas Schwab
2011-08-25 9:54 ` Kazuhiro Ito
2011-08-26 11:41 ` Kazuhiro Ito
2011-08-28 0:04 ` Kazuhiro Ito
2011-08-30 23:30 ` Kazuhiro Ito
2011-12-01 1:56 ` Kenichi Handa
2011-12-05 7:10 ` Kenichi Handa [this message]
2011-12-05 11:31 ` Kazuhiro Ito
2011-12-05 9:11 ` Paul Eggert
2011-12-06 0:30 ` Kenichi Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tl7y5urmr3g.fsf@m17n.org \
--to=handa@m17n.org \
--cc=9318@debbugs.gnu.org \
--cc=kzhr@d1.dion.ne.jp \
--cc=schwab@linux-m68k.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.