From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.bugs
Subject: bug#20545: Transliterating curved to straight quotes in 8-bit environments
Date: Sun, 31 May 2015 20:06:16 -0700
Organization: UCLA Computer Science Department
Message-ID: <556BCC28.3030705@cs.ucla.edu>
References: <554FEFFE.7000303@cs.ucla.edu> <83k2wfma51.fsf@gnu.org>
 <555301BD.1040503@cs.ucla.edu> <83r3qkjuve.fsf@gnu.org>
 <5553822C.3000709@cs.ucla.edu> <83iobwjt3s.fsf@gnu.org>
 <55544199.3040704@cs.ucla.edu> <5557B4A7.9070604@cs.ucla.edu>
 <83k2w7ff3v.fsf@gnu.org> <5558C8CC.9030408@cs.ucla.edu>
 <83egmff2kh.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------020709010409040000080309"
Cc: 20545@debbugs.gnu.org
To: Eli Zaretskii
X-GNU-PR-Message: followup 20545
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
In-Reply-To:
 <83egmff2kh.fsf@gnu.org>

This is a multi-part message in MIME format.
--------------020709010409040000080309
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

Eli Zaretskii wrote:
> I guess we can always wait and see if we get complaints about this ;-)

While trying out a few new things, I ran into a problem in batch mode:
(print FOO) can generate improperly encoded output if FOO contains
characters outside the system locale. For example, suppose the file t.el
uses UTF-8 encoding and contains this:

   ;; -*- coding: utf-8 -*-
   (print "‘x’ “y” `z'")

If Emacs is running in an 8-bit locale, the command
'emacs --batch -l t.el' writes the string "‘x’ “y” `z'" to stdout in
UTF-8, which is an encoding error in that environment. This problem can
occur with any non-ASCII character, but diagnostics that contain curved
quotes make it more likely.

I installed the attached patch to fix this. It applies to batch standard
output and standard error the same technique that Emacs already uses for
the standard display. Characters are first transliterated via
standard-display-table, and the result is then encoded with
locale-coding-system.
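For readers who have not worked with display tables, here is a minimal standalone C sketch of the same two-step technique. It is not Emacs code. It assumes UTF-8 input, as in the t.el example, and an ISO-8859-1 locale. The table contents, the '?' fallback for characters that cannot be encoded, and all the names (display_table, display_lookup, encode_latin1, print_char_to_stream, decode_utf8, print_utf8_to_stream) are hypothetical. They loosely stand in for Vstandard_display_table, ENCODE_SYSTEM, printchar_to_stream, and the loop the patch adds to strout. A C string stands in for a display-table vector of glyphs.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical display-table entry: a character and the ASCII text
   printed in its place (standing in for a vector of glyphs).  */
struct disp_entry
{
  unsigned int ch;
  const char *glyphs;
};

/* Hypothetical ASCII approximations for curved quotes.  */
static const struct disp_entry display_table[] =
{
  { 0x2018, "`" },   /* LEFT SINGLE QUOTATION MARK */
  { 0x2019, "'" },   /* RIGHT SINGLE QUOTATION MARK */
  { 0x201C, "\"" },  /* LEFT DOUBLE QUOTATION MARK */
  { 0x201D, "\"" },  /* RIGHT DOUBLE QUOTATION MARK */
};

/* Return the replacement text for CH, or NULL if the table has none.  */
static const char *
display_lookup (unsigned int ch)
{
  for (size_t i = 0; i < sizeof display_table / sizeof *display_table; i++)
    if (display_table[i].ch == ch)
      return display_table[i].glyphs;
  return NULL;
}

/* Hypothetical stand-in for encoding with locale-coding-system in an
   ISO-8859-1 locale: code points below 256 become one byte, and
   everything else becomes '?'.  */
static void
encode_latin1 (unsigned int ch, FILE *stream)
{
  putc (ch < 0x100 ? (int) ch : '?', stream);
}

/* Transliterate CH via the table first; encode CH only when the table
   has no entry.  This sketch assumes every replacement is ASCII, so
   replacements need no further encoding.  */
static void
print_char_to_stream (unsigned int ch, FILE *stream)
{
  const char *glyphs = display_lookup (ch);
  if (glyphs)
    fputs (glyphs, stream);
  else
    encode_latin1 (ch, stream);
}

/* Decode one UTF-8 sequence of at most N bytes at P, storing its length
   in *LEN.  Invalid or truncated input yields U+FFFD with length 1.
   Overlong forms are not rejected; this is only a sketch.  */
static unsigned int
decode_utf8 (const unsigned char *p, size_t n, int *len)
{
  unsigned int c = p[0];
  int need;

  *len = 1;
  if (c < 0x80)
    return c;
  else if ((c & 0xE0) == 0xC0)
    need = 1;
  else if ((c & 0xF0) == 0xE0)
    need = 2;
  else if ((c & 0xF8) == 0xF0)
    need = 3;
  else
    return 0xFFFD;              /* stray continuation or invalid lead byte */
  if ((size_t) need >= n)
    return 0xFFFD;              /* sequence runs past the end of the input */
  c &= 0x3F >> need;            /* keep only the lead byte's payload bits */
  for (int k = 1; k <= need; k++)
    {
      if ((p[k] & 0xC0) != 0x80)
        return 0xFFFD;
      c = (c << 6) | (p[k] & 0x3F);
    }
  *len = need + 1;
  return c;
}

/* Like the patched strout: walk a UTF-8 byte string one character at a
   time and send each character through print_char_to_stream.  */
void
print_utf8_to_stream (const char *s, size_t nbytes, FILE *stream)
{
  int len;
  for (size_t i = 0; i < nbytes; i += len)
    {
      unsigned int ch = decode_utf8 ((const unsigned char *) s + i,
                                     nbytes - i, &len);
      print_char_to_stream (ch, stream);
    }
}
```

Given the UTF-8 bytes of the t.el string, this sketch writes the pure-ASCII `x' "y" `z'. A character with no table entry, such as é, is still encoded for the locale, here as the single Latin-1 byte 0xE9. Transliterating before encoding is what lets curved quotes survive in a locale that cannot represent them.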
--------------020709010409040000080309
Content-Type: text/x-patch;
 name="0001-Treat-batch-stdout-stderr-like-standard-display.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-Treat-batch-stdout-stderr-like-standard-display.patch"

>From a1e999c8395b8a8cc0259bcf672043d5302425be Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 31 May 2015 20:04:05 -0700
Subject: [PATCH] Treat batch stdout/stderr like standard display

Calls like (print FOO) could generate improperly encoded or
hard-to-read output if FOO contains characters outside the system
locale. Fix this by treating batch stdout and stderr like
interactive standard display, when it comes to transliterating and
encoding characters (Bug#20545).
* doc/emacs/mule.texi (Communication Coding):
* doc/lispref/display.texi (Active Display Table):
* doc/lispref/nonascii.texi (Locales):
* etc/NEWS:
* src/coding.c (syms_of_coding):
* src/dispnew.c (syms_of_display):
Document this.
* src/print.c: Include disptab.h.
(printchar_to_stream): New function, with much of the guts of the
old Fexternal_debugging_output, except this one also uses the
standard display table.
(printchar, strout, Fexternal_debugging_output): Use it.
---
 doc/emacs/mule.texi       |   4 +-
 doc/lispref/display.texi  |   5 ++-
 doc/lispref/nonascii.texi |   1 +
 etc/NEWS                  |   6 +++
 src/coding.c              |   3 +-
 src/dispnew.c             |   1 +
 src/print.c               | 106 ++++++++++++++++++++++++++++++++--------------
 7 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 7336fdc..80b1384 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1164,7 +1164,9 @@ current language environment.
 to use when encoding and decoding system strings such as system error
 messages and @code{format-time-string} formats and time stamps.
 That coding system is also used for decoding non-@acronym{ASCII} keyboard
-input on the X Window System. You should choose a coding system that is compatible
+input on the X Window System and for encoding text sent to the
+standard output and error streams when in batch mode. You should
+choose a coding system that is compatible
 with the underlying system's text representation, which is normally
 specified by one of the environment variables @env{LC_ALL},
 @env{LC_CTYPE}, and @env{LANG}. (The first one, in the order
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index b12995b..357a3c3 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -6522,8 +6522,9 @@ no buffer display table.
 @defvar standard-display-table
 The value of this variable is the standard display table, which is
 used when Emacs is displaying a buffer in a window with neither a
-window display table nor a buffer display table defined. Its default
-is @code{nil}.
+window display table nor a buffer display table defined, or when Emacs
+is outputting text to the standard output or error streams. Its
+default is @code{nil}.
 @end defvar
 
 The @file{disp-table} library defines several functions for changing
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index f160184..362c3a4 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -1959,6 +1959,7 @@ how Emacs interacts with these features.
 @cindex keyboard input decoding on X
 This variable specifies the coding system to use for decoding system
 error messages and---on X Window system only---keyboard input, for
+sending batch output to the standard output and error streams, for
 encoding the format argument to @code{format-time-string}, and for
 decoding the return value of @code{format-time-string}.
 @end defvar
diff --git a/etc/NEWS b/etc/NEWS
index d07057e..57e63b1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -947,6 +947,12 @@ integers.
 
 ** New function `set-binary-mode' allows to switch a standard stream
 of the Emacs process to binary I/O mode.
+** In locales that cannot display curved quotes, ASCII approximations
+are installed in standard-display-table.
+
+** Standard output and error streams now transliterate characters via
+standard-display-table, and encode output using locale-coding-system.
+
 ** Miscellaneous name change
 
 For consistency with the usual Emacs spelling, the Lisp variable
diff --git a/src/coding.c b/src/coding.c
index 776ecac..9342c38 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -11121,7 +11121,8 @@ See also the function `find-operation-coding-system'. */);
 
   DEFVAR_LISP ("locale-coding-system", Vlocale_coding_system,
                doc: /* Coding system to use with system messages.
-Also used for decoding keyboard input on X Window system. */);
+Also used for decoding keyboard input on X Window system, and for
+encoding standard output and error streams. */);
   Vlocale_coding_system = Qnil;
 
   /* The eol mnemonics are reset in startup.el system-dependently. */
diff --git a/src/dispnew.c b/src/dispnew.c
index 693dd49..7e7afa7 100644
--- a/src/dispnew.c
+++ b/src/dispnew.c
@@ -6245,6 +6245,7 @@ Each element can be:
 
   DEFVAR_LISP ("standard-display-table", Vstandard_display_table,
     doc: /* Display table to use for buffers that specify none.
+It is also used for standard output and error streams.
 See `buffer-display-table' for more information. */);
   Vstandard_display_table = Qnil;
 
diff --git a/src/print.c b/src/print.c
index a182839..65c120d 100644
--- a/src/print.c
+++ b/src/print.c
@@ -31,6 +31,7 @@ along with GNU Emacs. If not, see . */
 #include "window.h"
 #include "process.h"
 #include "dispextern.h"
+#include "disptab.h"
 #include "termchar.h"
 #include "intervals.h"
 #include "blockinput.h"
@@ -195,6 +196,61 @@ print_unwind (Lisp_Object saved_text)
   memcpy (print_buffer, SDATA (saved_text), SCHARS (saved_text));
 }
 
+/* Print character CH to the stdio stream STREAM. */
+
+static void
+printchar_to_stream (unsigned int ch, FILE *stream)
+{
+  Lisp_Object dv IF_LINT (= Qnil);
+  ptrdiff_t i = 0, n = 1;
+
+  if (CHAR_VALID_P (ch) && DISP_TABLE_P (Vstandard_display_table))
+    {
+      dv = DISP_CHAR_VECTOR (XCHAR_TABLE (Vstandard_display_table), ch);
+      if (VECTORP (dv))
+        {
+          n = ASIZE (dv);
+          goto next_char;
+        }
+    }
+
+  while (true)
+    {
+      if (ASCII_CHAR_P (ch))
+        {
+          putc (ch, stream);
+#ifdef WINDOWSNT
+          /* Send the output to a debugger (nothing happens if there
+             isn't one). */
+          if (print_output_debug_flag && stream == stderr)
+            OutputDebugString ((char []) {ch, '\0'});
+#endif
+        }
+      else
+        {
+          unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
+          int len = CHAR_STRING (ch, mbstr);
+          Lisp_Object encoded_ch =
+            ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
+
+          fwrite (SSDATA (encoded_ch), 1, SBYTES (encoded_ch), stream);
+#ifdef WINDOWSNT
+          if (print_output_debug_flag && stream == stderr)
+            OutputDebugString (SSDATA (encoded_ch));
+#endif
+        }
+
+      i++;
+
+    next_char:
+      for (; i < n; i++)
+        if (CHARACTERP (AREF (dv, i)))
+          break;
+      if (! (i < n))
+        break;
+      ch = XFASTINT (AREF (dv, i));
+    }
+}
 
 /* Print character CH using method FUN. FUN nil means print to
    print_buffer. FUN t means print to echo area or stdout if
@@ -226,7 +282,10 @@ printchar (unsigned int ch, Lisp_Object fun)
       else if (noninteractive)
         {
           printchar_stdout_last = ch;
-          fwrite (str, 1, len, stdout);
+          if (DISP_TABLE_P (Vstandard_display_table))
+            printchar_to_stream (ch, stdout);
+          else
+            fwrite (str, 1, len, stdout);
           noninteractive_need_newline = 1;
         }
       else
@@ -267,7 +326,19 @@ strout (const char *ptr, ptrdiff_t size, ptrdiff_t size_byte,
     }
   else if (noninteractive && EQ (printcharfun, Qt))
     {
-      fwrite (ptr, 1, size_byte, stdout);
+      if (DISP_TABLE_P (Vstandard_display_table))
+        {
+          int len;
+          for (ptrdiff_t i = 0; i < size_byte; i += len)
+            {
+              int ch = STRING_CHAR_AND_LENGTH ((const unsigned char *) ptr + i,
+                                               len);
+              printchar_to_stream (ch, stdout);
+            }
+        }
+      else
+        fwrite (ptr, 1, size_byte, stdout);
+
       noninteractive_need_newline = 1;
     }
   else if (EQ (printcharfun, Qt))
@@ -688,37 +759,8 @@ You can call print while debugging emacs, and pass it this function
 to make it write to the debugging output. */)
   (Lisp_Object character)
 {
-  unsigned int ch;
-
   CHECK_NUMBER (character);
-  ch = XINT (character);
-  if (ASCII_CHAR_P (ch))
-    {
-      putc (ch, stderr);
-#ifdef WINDOWSNT
-      /* Send the output to a debugger (nothing happens if there isn't
-         one). */
-      if (print_output_debug_flag)
-        {
-          char buf[2] = {(char) XINT (character), '\0'};
-          OutputDebugString (buf);
-        }
-#endif
-    }
-  else
-    {
-      unsigned char mbstr[MAX_MULTIBYTE_LENGTH];
-      ptrdiff_t len = CHAR_STRING (ch, mbstr);
-      Lisp_Object encoded_ch =
-        ENCODE_SYSTEM (make_multibyte_string ((char *) mbstr, 1, len));
-
-      fwrite (SSDATA (encoded_ch), SBYTES (encoded_ch), 1, stderr);
-#ifdef WINDOWSNT
-      if (print_output_debug_flag)
-        OutputDebugString (SSDATA (encoded_ch));
-#endif
-    }
-
+  printchar_to_stream (XINT (character), stderr);
   return character;
 }
 
-- 
2.1.0

--------------020709010409040000080309--