From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 20545@debbugs.gnu.org
Subject: bug#20545: Transliterating curved to straight quotes in 8-bit environments
Date: Sun, 31 May 2015 20:06:16 -0700
Message-ID: <556BCC28.3030705@cs.ucla.edu>
In-Reply-To: <83egmff2kh.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 874 bytes --]
Eli Zaretskii wrote:
> I guess we can always wait and see if we get complaints about this ;-)
While trying out a few new things I did run into a problem in batch mode:
(print FOO) can generate improperly encoded output if FOO contains characters
that the system locale's encoding cannot represent. For example, if the file
t.el uses UTF-8 encoding and contains this:

    ;; -*- coding: utf-8 -*-
    (print "‘x’ “y” `z'")

and Emacs is in an 8-bit locale, the command 'emacs --batch -l t.el' will output
the string "‘x’ “y” `z'" to stdout in a UTF-8 encoding, which is an encoding
error for this environment.
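
To make the encoding error concrete, here is a minimal sketch in C of what
UTF-8 encoding does to a curved quote. The function name utf8_encode is
hypothetical and is not part of Emacs; it only illustrates the byte layout.
U+2018 (the left single quote) becomes the three bytes E2 80 98, and an
8-bit locale such as Latin-1 would interpret each of those bytes as a
separate character rather than as one quote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Emacs function.  Encode the Unicode
   code point CP as UTF-8 into OUT, which must have room for 4 bytes.
   Return the number of bytes written.  */
size_t
utf8_encode (unsigned long cp, unsigned char *out)
{
  assert (cp <= 0x10FFFF);

  if (cp < 0x80)
    {
      /* ASCII: one byte, identical in UTF-8 and in 8-bit locales.  */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      /* Curved quotes (U+2018..U+201D) land here: three bytes.  */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}
```

Calling utf8_encode (0x2018, buf) fills buf with E2 80 98, whereas an ASCII
character such as 'x' stays a single byte that is the same in both
encodings. That is why only the non-ASCII characters are affected.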
Although this problem can occur with any non-ASCII character, diagnostics
containing curved quotes will make it more likely. I installed the attached
patch to fix this, using the same technique for standard output that Emacs is
already using for standard display.
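
The idea of transliterating through a table before writing to the stream can
be sketched in standalone C, outside Emacs. Everything named below
(ascii_for, utf8_decode, transliterate_quotes) is hypothetical, and the
quote-to-ASCII mapping is an assumption chosen for illustration. The sketch
also simplifies in one respect: it replaces any other non-ASCII character
with '?', whereas the patch encodes such characters with the locale coding
system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a display table lookup: return an ASCII
   replacement for code point CP, or 0 if there is none.  The mapping
   is an assumption made for this sketch.  */
static char
ascii_for (unsigned long cp)
{
  switch (cp)
    {
    case 0x2018: return '`';    /* LEFT SINGLE QUOTATION MARK */
    case 0x2019: return '\'';   /* RIGHT SINGLE QUOTATION MARK */
    case 0x201C:                /* LEFT DOUBLE QUOTATION MARK */
    case 0x201D: return '"';    /* RIGHT DOUBLE QUOTATION MARK */
    default: return 0;
    }
}

/* Decode one UTF-8 sequence at S into *CP and return its length in
   bytes.  S is assumed to be well-formed UTF-8; no validation is
   done.  */
static size_t
utf8_decode (const unsigned char *s, unsigned long *cp)
{
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  if ((s[0] & 0xE0) == 0xC0)
    {
      *cp = ((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
      return 2;
    }
  if ((s[0] & 0xF0) == 0xE0)
    {
      *cp = (((unsigned long) (s[0] & 0x0F) << 12)
             | ((unsigned long) (s[1] & 0x3F) << 6)
             | (s[2] & 0x3F));
      return 3;
    }
  *cp = (((unsigned long) (s[0] & 0x07) << 18)
         | ((unsigned long) (s[1] & 0x3F) << 12)
         | ((unsigned long) (s[2] & 0x3F) << 6)
         | (s[3] & 0x3F));
  return 4;
}

/* Copy the NUL-terminated UTF-8 string UTF8_IN to OUT, whose capacity
   is OUTSIZE bytes including the terminating NUL.  ASCII is copied
   unchanged, curved quotes become their ASCII replacements, and any
   other character becomes '?'.  Return the number of bytes stored,
   not counting the NUL.  */
size_t
transliterate_quotes (const char *utf8_in, char *out, size_t outsize)
{
  const unsigned char *p = (const unsigned char *) utf8_in;
  size_t n = 0;

  assert (outsize > 0);
  while (*p && n + 1 < outsize)
    {
      unsigned long cp;
      p += utf8_decode (p, &cp);
      if (cp < 0x80)
        out[n++] = (char) cp;
      else
        {
          char repl = ascii_for (cp);
          out[n++] = repl ? repl : '?';
        }
    }
  out[n] = '\0';
  return n;
}
```

Given the UTF-8 bytes of the t.el string above, transliterate_quotes yields
output that contains only ASCII, which is valid in any 8-bit locale. The
straight quotes already present in the input pass through unchanged.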
[-- Attachment #2: 0001-Treat-batch-stdout-stderr-like-standard-display.patch --]
[-- Type: text/x-patch, Size: 9035 bytes --]
From a1e999c8395b8a8cc0259bcf672043d5302425be Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 31 May 2015 20:04:05 -0700
Subject: [PATCH] Treat batch stdout/stderr like standard display
Calls like (print FOO) could generate improperly encoded or
hard-to-read output if FOO contains characters outside the system
locale. Fix this by treating batch stdout and stderr like
interactive standard display, when it comes to transliterating and
encoding characters (Bug#20545).
* doc/emacs/mule.texi (Communication Coding):
* doc/lispref/display.texi (Active Display Table):
* doc/lispref/nonascii.texi (Locales):
* etc/NEWS:
* src/coding.c (syms_of_coding):
* src/dispnew.c (syms_of_display):
Document this.
* src/print.c: Include disptab.h.
(printchar_to_stream): New function, with much of the guts of the
old Fexternal_debugging_output, except this one also uses the
standard display table.
(printchar, strout, Fexternal_debugging_output): Use it.
---
doc/emacs/mule.texi | 4 +-
doc/lispref/display.texi | 5 ++-
doc/lispref/nonascii.texi | 1 +
etc/NEWS | 6 +++
src/coding.c | 3 +-
src/dispnew.c | 1 +
src/print.c | 106 ++++++++++++++++++++++++++++++++--------------
7 files changed, 90 insertions(+), 36 deletions(-)
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 7336fdc..80b1384 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1164,7 +1164,9 @@ current language environment.
to use when encoding and decoding system strings such as system error
messages and @code{format-time-string} formats and time stamps. That
coding system is also used for decoding non-@acronym{ASCII} keyboard
-input on the X Window System. You should choose a coding system that is compatible
+input on the X Window System and for encoding text sent to the
+standard output and error streams when in batch mode. You should
+choose a coding system that is compatible
with the underlying system's text representation, which is normally
specified by one of the environment variables @env{LC_ALL},
@env{LC_CTYPE}, and @env{LANG}. (The first one, in the order
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index b12995b..357a3c3 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -6522,8 +6522,9 @@ no buffer display table.
@defvar standard-display-table
The value of this variable is the standard display table, which is
used when Emacs is displaying a buffer in a window with neither a
-window display table nor a buffer display table defined. Its default
-is @code{nil}.
+window display table nor a buffer display table defined, or when Emacs
+is outputting text to the standard output or error streams. Its
+default is @code{nil}.
@end defvar
The @file{disp-table} library defines several functions for changing
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index f160184..362c3a4 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -1959,6 +1959,7 @@ how Emacs interacts with these features.
@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
error messages and---on X Window system only---keyboard input, for
+sending batch output to the standard output and error streams, for
encoding the format argument to @code{format-time-string}, and for
decoding the return value of @code{format-time-string}.
@end defvar
diff --git a/etc/NEWS b/etc/NEWS
index d07057e..57e63b1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -947,6 +947,12 @@ integers.
** New function `set-binary-mode' allows to switch a standard stream
of the Emacs process to binary I/O mode.
+** In locales that cannot display curved quotes, ASCII approximations
+are installed in standard-display-table.
+
+** Standard output and error streams now transliterate characters via
+standard-display-table, and encode output using locale-coding-system.
+
** Miscellaneous name change
For consistency with the usual Emacs spelling, the Lisp variable
diff --git a/src/coding.c b/src/coding.c
index 776ecac..9342c38 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -11121,7 +11121,8 @@ See also the function `find-operation-coding-system'. */);
DEFVAR_LISP ("locale-coding-system", Vlocale_coding_system,
doc: /* Coding system to use with system messages.
-Also used for decoding keyboard input on X Window system. */);
+Also used for decoding keyboard input on X Window system, and for
+encoding standard output and error streams. */);
Vlocale_coding_system = Qnil;
/* The eol mnemonics are reset in startup.el system-dependently. */
diff --git a/src/dispnew.c b/src/dispnew.c
index 693dd49..7e7afa7 100644
--- a/src/dispnew.c
+++ b/src/dispnew.c
@@ -6245,6 +6245,7 @@ Each element can be:
DEFVAR_LISP ("standard-display-table", Vstandard_display_table,
doc: /* Display table to use for buffers that specify none.
+It is also used for standard output and error streams.
See `buffer-display-table' for more information. */);
Vstandard_display_table = Qnil;
diff --git a/src/print.c b/src/print.c
index a182839..65c120d 100644
--- a/src/print.c
+++ b/src/print.c
@@ -31,6 +31,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
#include "window.h"
#include "process.h"
#include "dispextern.h"
+#include "disptab.h"
#include "termchar.h"
#include "intervals.h"
#include "blockinput.h"
@@ -195,6 +196,61 @@ print_unwind (Lisp_Object saved_text)
memcpy (print_buffer, SDATA (saved_text), SCHARS (saved_text));
}
+/* Print character CH to the stdio stream STREAM. */
+
+static void
+printchar_to_stream (unsigned int ch, FILE *stream)
+{
+ Lisp_Object dv IF_LINT (= Qnil);
+ ptrdiff_t i = 0, n = 1;
+
+ if (CHAR_VALID_P (ch) && DISP_TABLE_P (Vstandard_display_table))
+ {
+ dv = DISP_CHAR_VECTOR (XCHAR_TABLE (Vstandard_display_table), ch);
+ if (VECTORP (dv))
+ {
+ n = ASIZE (dv);
+ goto next_char;
+ }
+ }
+
+ while (true)
+ {
+ if (ASCII_CHAR_P (ch))
+ {
+ putc (ch, stream);
+#ifdef WINDOWSNT
+ /* Send the output to a debugger (nothing happens if there
+ isn't one). */
+ if (print_output_debug_flag && stream == stderr)
+ OutputDebugString ((char []) {ch, '\0'});
+#endif
+ }
+ else
+ {
+ unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
+ int len = CHAR_STRING (ch, mbstr);
+ Lisp_Object encoded_ch =
+ ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
+
+ fwrite (SSDATA (encoded_ch), 1, SBYTES (encoded_ch), stream);
+#ifdef WINDOWSNT
+ if (print_output_debug_flag && stream == stderr)
+ OutputDebugString (SSDATA (encoded_ch));
+#endif
+ }
+
+ i++;
+
+ next_char:
+ for (; i < n; i++)
+ if (CHARACTERP (AREF (dv, i)))
+ break;
+ if (! (i < n))
+ break;
+ ch = XFASTINT (AREF (dv, i));
+ }
+}
/* Print character CH using method FUN. FUN nil means print to
print_buffer. FUN t means print to echo area or stdout if
@@ -226,7 +282,10 @@ printchar (unsigned int ch, Lisp_Object fun)
else if (noninteractive)
{
printchar_stdout_last = ch;
- fwrite (str, 1, len, stdout);
+ if (DISP_TABLE_P (Vstandard_display_table))
+ printchar_to_stream (ch, stdout);
+ else
+ fwrite (str, 1, len, stdout);
noninteractive_need_newline = 1;
}
else
@@ -267,7 +326,19 @@ strout (const char *ptr, ptrdiff_t size, ptrdiff_t size_byte,
}
else if (noninteractive && EQ (printcharfun, Qt))
{
- fwrite (ptr, 1, size_byte, stdout);
+ if (DISP_TABLE_P (Vstandard_display_table))
+ {
+ int len;
+ for (ptrdiff_t i = 0; i < size_byte; i += len)
+ {
+ int ch = STRING_CHAR_AND_LENGTH ((const unsigned char *) ptr + i,
+ len);
+ printchar_to_stream (ch, stdout);
+ }
+ }
+ else
+ fwrite (ptr, 1, size_byte, stdout);
+
noninteractive_need_newline = 1;
}
else if (EQ (printcharfun, Qt))
@@ -688,37 +759,8 @@ You can call print while debugging emacs, and pass it this function
to make it write to the debugging output. */)
(Lisp_Object character)
{
- unsigned int ch;
-
CHECK_NUMBER (character);
- ch = XINT (character);
- if (ASCII_CHAR_P (ch))
- {
- putc (ch, stderr);
-#ifdef WINDOWSNT
- /* Send the output to a debugger (nothing happens if there isn't
- one). */
- if (print_output_debug_flag)
- {
- char buf[2] = {(char) XINT (character), '\0'};
- OutputDebugString (buf);
- }
-#endif
- }
- else
- {
- unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
- ptrdiff_t len = CHAR_STRING (ch, mbstr);
- Lisp_Object encoded_ch =
- ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
-
- fwrite (SSDATA (encoded_ch), SBYTES (encoded_ch), 1, stderr);
-#ifdef WINDOWSNT
- if (print_output_debug_flag)
- OutputDebugString (SSDATA (encoded_ch));
-#endif
- }
-
+ printchar_to_stream (XINT (character), stderr);
return character;
}
--
2.1.0