unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
From: Sergey Tselikh @ 2014-01-15  0:10 UTC
  To: 16448

Hello.

In a script run with emacs --script, when an (error "...") form is evaluated
with UTF-8 characters in its message, the message is not printed correctly.

The LANG environment variable is set to en_US.UTF-8 for all programs, my
terminal is x11-terms/rxvt-unicode with proper UTF-8 support, and the Emacs
version is GNU Emacs 24.3.1.


Examples (all run with LANG=en_US.UTF-8 in the environment):

$ cat error.el 
(message "hello привет")
(message "привет hello")
(error "hello привет")

$ emacs --script error.el 
hello привет
привет hello
hello ?@825B

But: 
$ emacs -nw --eval '(error "hello привет")'  
^^^ correctly displays "hello привет" in the minibuffer.


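This ?@825B is not random garbage.  I made a small table showing where it
comes from: the left half is ``echo hello привет | print-bits | cat -t'', the
right half is ``echo hello привет | high-bits-01 | print-bits | cat -t'', and
the last column marks the characters that actually appear in the output: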

h    01101000  |   h  01101000  |
e    01100101  |   e  01100101  |
l    01101100  |   l  01101100  |
l    01101100  |   l  01101100  |
o    01101111  |   o  01101111  |
     00100000  |      00100000  |
M-P  11010000  |   P  01010000  |
M-?  10111111  |   ?  00111111  |   ?
M-Q  11010001  |   Q  01010001  |
M-^@ 10000000  |   @  01000000  |   @
M-P  11010000  |   P  01010000  |
M-8  10111000  |   8  00111000  |   8
M-P  11010000  |   P  01010000  |
M-2  10110010  |   2  00110010  |   2
M-P  11010000  |   P  01010000  |
M-5  10110101  |   5  00110101  |   5
M-Q  11010001  |   Q  01010001  |
M-^B 10000010  |   B  01000010  |   B



More examples:

$ cat any-other.el 
(error "cons:%s list:%s string:%s" (cons 'на 'речке) '(на речке на том бере) "be Быть beat Бить become Становиться begin Начинать bleed Кровоточить stung Жалить sweep Выметать swell Разбухать swim Плавать swing Качать take Брать, взять")

$ emacs --script any-other.el 
cons:(=0 . @5G:5) list:(=0 @5G:5 =0 B>< 15@5) string:be KBL beat 8BL become !B0=>28BLAO begin 0G8=0BL bleed @>2>B>G8BL stung 0;8BL sweep K<5B0BL swell  071CE0BL swim ;020BL swing 0G0BL take @0BL, 27OBL

$ cat ja.el 
(setq jstr "案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.")
(message "%s" jstr)
(error "%s" jstr)

$ emacs --script ja.el 
案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.
HZ???#?LW Anzuru yori umu ga yasushi. ?moS_?? Deru kui wa utareru.



In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.17)
 of 2013-10-10 on laptop
Windowing system distributor `The X.Org Foundation', version 11.0.11403000
Configured using:
 `configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu'
 '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man'
 '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc'
 '--localstatedir=/var/lib' '--libdir=/usr/lib64'
 '--disable-silent-rules' '--disable-dependency-tracking'
 '--program-suffix=-emacs-24' '--infodir=/usr/share/info/emacs-24'
 '--enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../lib64'
 '--with-gameuser=games' '--without-compress-info' '--without-hesiod'
 '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus'
 '--with-gnutls' '--with-xml2' '--without-selinux' '--without-wide-int'
 '--with-sound' '--with-x' '--without-ns' '--with-gconf'
 '--without-gsettings' '--with-toolkit-scroll-bars' '--with-gif'
 '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm'
 '--with-imagemagick' '--with-xft' '--with-libotf' '--with-m17n-flt'
 '--with-x-toolkit=gtk2' 'GENTOO_PACKAGE=app-editors/emacs-24.3-r2'
 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
 'CFLAGS=-pipe -march=corei7-avx -mno-aes -O2' 'LDFLAGS=-Wl,-O1
 -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t


-- 
Sergey Tselikh <stselikh@gmail.com>

* bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
From: Dmitry Antipov @ 2014-01-15  4:02 UTC
  To: Sergey Tselikh; +Cc: 16448

On 01/15/2014 04:10 AM, Sergey Tselikh wrote:

> In a script run with emacs --script, when an (error "...") form is evaluated
> with UTF-8 characters in its message, the message is not printed correctly.

In batch mode, (error ...) is handled by external-debugging-output, and the
latter just does:

putc (XINT (character) & 0xFF, stderr);
                        ^^^^^^
To allow multibyte sequences here, we should use something like:

=== modified file 'src/print.c'
--- src/print.c	2014-01-01 07:43:34 +0000
+++ src/print.c	2014-01-15 03:55:39 +0000
@@ -709,8 +709,14 @@
  to make it write to the debugging output.  */)
    (Lisp_Object character)
  {
+  unsigned char str[MAX_MULTIBYTE_LENGTH];
+  unsigned int ch;
+  ptrdiff_t len;
+
    CHECK_NUMBER (character);
-  putc (XINT (character) & 0xFF, stderr);
+  ch = XINT (character);
+  len = CHAR_STRING (ch, str);
+  fwrite (str, len, 1, stderr);

  #ifdef WINDOWSNT
    /* Send the output to a debugger (nothing happens if there isn't one).  */

Dmitry

* bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
From: Eli Zaretskii @ 2014-01-15 15:35 UTC
  To: Dmitry Antipov; +Cc: 16448, stselikh

> Date: Wed, 15 Jan 2014 08:02:49 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> Cc: 16448@debbugs.gnu.org
> 
> On 01/15/2014 04:10 AM, Sergey Tselikh wrote:
> 
> > In a script run with emacs --script, when an (error "...") form is evaluated
> > with UTF-8 characters in its message, the message is not printed correctly.
> 
> In batch mode, (error ...) is handled by external-debugging-output, and the
> latter just does:
> 
> putc (XINT (character) & 0xFF, stderr);
>                         ^^^^^^
> To allow multibyte sequences here, we should use something like:
> 
> === modified file 'src/print.c'
> --- src/print.c	2014-01-01 07:43:34 +0000
> +++ src/print.c	2014-01-15 03:55:39 +0000
> @@ -709,8 +709,14 @@
>   to make it write to the debugging output.  */)
>     (Lisp_Object character)
>   {
> +  unsigned char str[MAX_MULTIBYTE_LENGTH];
> +  unsigned int ch;
> +  ptrdiff_t len;
> +
>     CHECK_NUMBER (character);
> -  putc (XINT (character) & 0xFF, stderr);
> +  ch = XINT (character);
> +  len = CHAR_STRING (ch, str);
> +  fwrite (str, len, 1, stderr);

This will only work correctly in a UTF-8 locale.  In the general case,
we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
before writing it to stderr.

Btw, the way we output text in this case cries for refactoring: we
first assemble individual characters from their multibyte sequences,
then pass those characters one by one to external-debugging-output,
which will now have to unroll each character back into its multibyte
sequence, and encode each character individually.  Something for after
the branch, I guess.

* bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
From: Eli Zaretskii @ 2014-02-01 12:00 UTC
  To: stselikh; +Cc: 16448-done, dmantipov

> Date: Wed, 15 Jan 2014 17:35:43 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 16448@debbugs.gnu.org, stselikh@gmail.com
> 
> This will only work correctly in a UTF-8 locale.  In the general case,
> we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
> before writing it to stderr.

Done in trunk revision 116232.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git