From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 20545@debbugs.gnu.org
Subject: bug#20545: Transliterating curved to straight quotes in 8-bit environments
Date: Sun, 31 May 2015 20:06:16 -0700
Message-ID: <556BCC28.3030705@cs.ucla.edu>
In-Reply-To: <83egmff2kh.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

Eli Zaretskii wrote:
> I guess we can always wait and see if we get complaints about this ;-)

While trying out a few new things I did run into a problem in batch mode:
(print FOO) can generate improperly encoded output if FOO contains characters
that the system locale's encoding cannot represent.  For example, if the file
t.el uses UTF-8 encoding and contains this:

;; -*- coding: utf-8 -*-
(print "‘x’ “y” `z'")

and Emacs is in an 8-bit locale, the command 'emacs --batch -l t.el' will output 
the string "‘x’ “y” `z'" to stdout in a UTF-8 encoding, which is an encoding 
error for this environment.

Although this problem can occur with any non-ASCII character, diagnostics 
containing curved quotes will make it more likely.  I installed the attached 
patch to fix this, using the same technique for standard output that Emacs is 
already using for standard display.
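
To make the idea concrete outside of Emacs, here is a rough standalone sketch
in C of "transliterate through a table first, then encode what is left".  It
is not the code in the attached patch: the names toy_display_table,
toy_decode_utf8 and toy_transliterate are hypothetical, the table contents are
only an assumed set of ASCII approximations, and writing '?' for any other
non-ASCII character merely stands in for encoding with the locale's coding
system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a display table: each entry maps one
   code point to an ASCII replacement string.  */
struct translit { unsigned int ch; const char *ascii; };

static const struct translit toy_display_table[] = {
  { 0x2018, "`" },   /* LEFT SINGLE QUOTATION MARK */
  { 0x2019, "'" },   /* RIGHT SINGLE QUOTATION MARK */
  { 0x201C, "\"" },  /* LEFT DOUBLE QUOTATION MARK */
  { 0x201D, "\"" },  /* RIGHT DOUBLE QUOTATION MARK */
};

/* Decode one UTF-8 sequence at S into *CH and return its byte length.
   A malformed or truncated sequence yields U+FFFD and consumes one
   byte, so the scan never reads past the terminating NUL.  */
static size_t
toy_decode_utf8 (const unsigned char *s, unsigned int *ch)
{
  size_t need;
  if (s[0] < 0x80) { *ch = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { need = 1; *ch = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { need = 2; *ch = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { need = 3; *ch = s[0] & 0x07; }
  else { *ch = 0xFFFD; return 1; }
  for (size_t i = 1; i <= need; i++)
    {
      if ((s[i] & 0xC0) != 0x80) { *ch = 0xFFFD; return 1; }
      *ch = (*ch << 6) | (s[i] & 0x3F);
    }
  return need + 1;
}

/* Copy the UTF-8 string IN to OUT (capacity OUTSIZE, NUL-terminated),
   replacing characters found in toy_display_table and writing '?' for
   any other non-ASCII character.  Output is truncated if OUT is full.  */
static void
toy_transliterate (const char *in, char *out, size_t outsize)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t o = 0;
  if (outsize == 0)
    return;
  while (*p)
    {
      unsigned int ch;
      char fallback[2] = { 0, 0 };
      const char *rep = NULL;
      p += toy_decode_utf8 (p, &ch);
      for (size_t i = 0;
           i < sizeof toy_display_table / sizeof toy_display_table[0]; i++)
        if (toy_display_table[i].ch == ch)
          rep = toy_display_table[i].ascii;
      if (!rep)
        {
          /* No table entry: ASCII passes through, anything else is
             replaced by '?' in this toy model.  */
          fallback[0] = ch < 0x80 ? (char) ch : '?';
          rep = fallback;
        }
      for (; *rep && o + 1 < outsize; rep++)
        out[o++] = *rep;
    }
  out[o] = '\0';
}
```

Passing the UTF-8 bytes of the t.el string above to toy_transliterate leaves
only ASCII quotes in the output buffer, which any 8-bit locale can display.
The table lookup deliberately runs before the ASCII check, mirroring the order
described above: transliterate first, encode afterwards.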

[-- Attachment #2: 0001-Treat-batch-stdout-stderr-like-standard-display.patch --]
[-- Type: text/x-patch, Size: 9035 bytes --]

From a1e999c8395b8a8cc0259bcf672043d5302425be Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 31 May 2015 20:04:05 -0700
Subject: [PATCH] Treat batch stdout/stderr like standard display

Calls like (print FOO) could generate improperly encoded or
hard-to-read output if FOO contains characters outside the system
locale.  Fix this by treating batch stdout and stderr like
interactive standard display, when it comes to transliterating and
encoding characters (Bug#20545).
* doc/emacs/mule.texi (Communication Coding):
* doc/lispref/display.texi (Active Display Table):
* doc/lispref/nonascii.texi (Locales):
* etc/NEWS:
* src/coding.c (syms_of_coding):
* src/dispnew.c (syms_of_display):
Document this.
* src/print.c: Include disptab.h.
(printchar_to_stream): New function, with much of the guts of the
old Fexternal_debugging_output, except this one also uses the
standard display table.
(printchar, strout, Fexternal_debugging_output): Use it.
---
 doc/emacs/mule.texi       |   4 +-
 doc/lispref/display.texi  |   5 ++-
 doc/lispref/nonascii.texi |   1 +
 etc/NEWS                  |   6 +++
 src/coding.c              |   3 +-
 src/dispnew.c             |   1 +
 src/print.c               | 106 ++++++++++++++++++++++++++++++++--------------
 7 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 7336fdc..80b1384 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1164,7 +1164,9 @@ current language environment.
 to use when encoding and decoding system strings such as system error
 messages and @code{format-time-string} formats and time stamps.  That
 coding system is also used for decoding non-@acronym{ASCII} keyboard
-input on the X Window System.  You should choose a coding system that is compatible
+input on the X Window System and for encoding text sent to the
+standard output and error streams when in batch mode.  You should
+choose a coding system that is compatible
 with the underlying system's text representation, which is normally
 specified by one of the environment variables @env{LC_ALL},
 @env{LC_CTYPE}, and @env{LANG}.  (The first one, in the order
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index b12995b..357a3c3 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -6522,8 +6522,9 @@ no buffer display table.
 @defvar standard-display-table
 The value of this variable is the standard display table, which is
 used when Emacs is displaying a buffer in a window with neither a
-window display table nor a buffer display table defined.  Its default
-is @code{nil}.
+window display table nor a buffer display table defined, or when Emacs
+is outputting text to the standard output or error streams.  Its
+default is @code{nil}.
 @end defvar
 
 The @file{disp-table} library defines several functions for changing
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index f160184..362c3a4 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -1959,6 +1959,7 @@ how Emacs interacts with these features.
 @cindex keyboard input decoding on X
 This variable specifies the coding system to use for decoding system
 error messages and---on X Window system only---keyboard input, for
+sending batch output to the standard output and error streams, for
 encoding the format argument to @code{format-time-string}, and for
 decoding the return value of @code{format-time-string}.
 @end defvar
diff --git a/etc/NEWS b/etc/NEWS
index d07057e..57e63b1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -947,6 +947,12 @@ integers.
 ** New function `set-binary-mode' allows to switch a standard stream
 of the Emacs process to binary I/O mode.
 
+** In locales that cannot display curved quotes, ASCII approximations
+are installed in standard-display-table.
+
+** Standard output and error streams now transliterate characters via
+standard-display-table, and encode output using locale-coding-system.
+
 ** Miscellaneous name change
 
 For consistency with the usual Emacs spelling, the Lisp variable
diff --git a/src/coding.c b/src/coding.c
index 776ecac..9342c38 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -11121,7 +11121,8 @@ See also the function `find-operation-coding-system'.  */);
 
   DEFVAR_LISP ("locale-coding-system", Vlocale_coding_system,
 	       doc: /* Coding system to use with system messages.
-Also used for decoding keyboard input on X Window system.  */);
+Also used for decoding keyboard input on X Window system, and for
+encoding standard output and error streams.  */);
   Vlocale_coding_system = Qnil;
 
   /* The eol mnemonics are reset in startup.el system-dependently.  */
diff --git a/src/dispnew.c b/src/dispnew.c
index 693dd49..7e7afa7 100644
--- a/src/dispnew.c
+++ b/src/dispnew.c
@@ -6245,6 +6245,7 @@ Each element can be:
 
   DEFVAR_LISP ("standard-display-table", Vstandard_display_table,
 	       doc: /* Display table to use for buffers that specify none.
+It is also used for standard output and error streams.
 See `buffer-display-table' for more information.  */);
   Vstandard_display_table = Qnil;
 
diff --git a/src/print.c b/src/print.c
index a182839..65c120d 100644
--- a/src/print.c
+++ b/src/print.c
@@ -31,6 +31,7 @@ along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
 #include "window.h"
 #include "process.h"
 #include "dispextern.h"
+#include "disptab.h"
 #include "termchar.h"
 #include "intervals.h"
 #include "blockinput.h"
@@ -195,6 +196,61 @@ print_unwind (Lisp_Object saved_text)
   memcpy (print_buffer, SDATA (saved_text), SCHARS (saved_text));
 }
 
+/* Print character CH to the stdio stream STREAM.  */
+
+static void
+printchar_to_stream (unsigned int ch, FILE *stream)
+{
+  Lisp_Object dv IF_LINT (= Qnil);
+  ptrdiff_t i = 0, n = 1;
+
+  if (CHAR_VALID_P (ch) && DISP_TABLE_P (Vstandard_display_table))
+    {
+      dv = DISP_CHAR_VECTOR (XCHAR_TABLE (Vstandard_display_table), ch);
+      if (VECTORP (dv))
+	{
+	  n = ASIZE (dv);
+	  goto next_char;
+	}
+    }
+
+  while (true)
+    {
+      if (ASCII_CHAR_P (ch))
+	{
+	  putc (ch, stream);
+#ifdef WINDOWSNT
+	  /* Send the output to a debugger (nothing happens if there
+	     isn't one).  */
+	  if (print_output_debug_flag && stream == stderr)
+	    OutputDebugString ((char []) {ch, '\0'});
+#endif
+	}
+      else
+	{
+	  unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
+	  int len = CHAR_STRING (ch, mbstr);
+	  Lisp_Object encoded_ch =
+	    ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
+
+	  fwrite (SSDATA (encoded_ch), 1, SBYTES (encoded_ch), stream);
+#ifdef WINDOWSNT
+	  if (print_output_debug_flag && stream == stderr)
+	    OutputDebugString (SSDATA (encoded_ch));
+#endif
+	}
+
+      i++;
+
+    next_char:
+      for (; i < n; i++)
+	if (CHARACTERP (AREF (dv, i)))
+	  break;
+      if (! (i < n))
+	break;
+      ch = XFASTINT (AREF (dv, i));
+    }
+}
 
 /* Print character CH using method FUN.  FUN nil means print to
    print_buffer.  FUN t means print to echo area or stdout if
@@ -226,7 +282,10 @@ printchar (unsigned int ch, Lisp_Object fun)
       else if (noninteractive)
 	{
 	  printchar_stdout_last = ch;
-	  fwrite (str, 1, len, stdout);
+	  if (DISP_TABLE_P (Vstandard_display_table))
+	    printchar_to_stream (ch, stdout);
+	  else
+	    fwrite (str, 1, len, stdout);
 	  noninteractive_need_newline = 1;
 	}
       else
@@ -267,7 +326,19 @@ strout (const char *ptr, ptrdiff_t size, ptrdiff_t size_byte,
     }
   else if (noninteractive && EQ (printcharfun, Qt))
     {
-      fwrite (ptr, 1, size_byte, stdout);
+      if (DISP_TABLE_P (Vstandard_display_table))
+	{
+	  int len;
+	  for (ptrdiff_t i = 0; i < size_byte; i += len)
+	    {
+	      int ch = STRING_CHAR_AND_LENGTH ((const unsigned char *) ptr + i,
+					       len);
+	      printchar_to_stream (ch, stdout);
+	    }
+	}
+      else
+	fwrite (ptr, 1, size_byte, stdout);
+
       noninteractive_need_newline = 1;
     }
   else if (EQ (printcharfun, Qt))
@@ -688,37 +759,8 @@ You can call print while debugging emacs, and pass it this function
 to make it write to the debugging output.  */)
   (Lisp_Object character)
 {
-  unsigned int ch;
-
   CHECK_NUMBER (character);
-  ch = XINT (character);
-  if (ASCII_CHAR_P (ch))
-    {
-      putc (ch, stderr);
-#ifdef WINDOWSNT
-      /* Send the output to a debugger (nothing happens if there isn't
-	 one).  */
-      if (print_output_debug_flag)
-	{
-	  char buf[2] = {(char) XINT (character), '\0'};
-	  OutputDebugString (buf);
-	}
-#endif
-    }
-  else
-    {
-      unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
-      ptrdiff_t len = CHAR_STRING (ch, mbstr);
-      Lisp_Object encoded_ch =
-	ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
-
-      fwrite (SSDATA (encoded_ch), SBYTES (encoded_ch), 1, stderr);
-#ifdef WINDOWSNT
-      if (print_output_debug_flag)
-	OutputDebugString (SSDATA (encoded_ch));
-#endif
-    }
-
+  printchar_to_stream (XINT (character), stderr);
   return character;
 }
 
-- 
2.1.0


