unofficial mirror of emacs-devel@gnu.org 
* Bug with UTF-8 string and dbus
@ 2010-06-08 21:39 Julien Danjou
  2010-06-09  0:43 ` Stefan Monnier
  2010-06-09  9:16 ` [PATCH] Fix D-Bus string encoding Julien Danjou
  0 siblings, 2 replies; 23+ messages in thread
From: Julien Danjou @ 2010-06-08 21:39 UTC (permalink / raw)
  To: emacs-devel


Hi,

While coding notifications.el, I found an odd bug.
When using it with some UTF-8 chars in a string (like 'title' or
'body'), Emacs raises a D-Bus error because it did not receive a reply.
But according to dbus-monitor, it did NOT send a method call.

I can reproduce it easily with:

  (dbus-call-method :session
                    "org.freedesktop.Notifications"
                    "/org/freedesktop/Notifications"
                    "org.freedesktop.Notifications"
                    "Notify"
                    :string "Emacsé")

(this is not a valid call for Notify, but anyhow it should send the
call)

I tried to break inside dbusbind.c:xd_append_arg, and what I got is:
492		  char *val = SDATA (Fstring_make_unibyte (object));
(gdb) print (char *) val
$6 = 0x253e830 "Emacs", <incomplete sequence \351>

Is this normal? If yes, how to fix? If no, where's the bug?
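
For context, the gdb output above is consistent with 'é' (U+00E9) having
been reduced to the single byte 0xE9 (octal 351), whereas its UTF-8 form
is the two bytes 0xC3 0xA9.  A lone 0xE9 announces a three-byte sequence
that never arrives, so a strict UTF-8 check rejects it.  The following is
a minimal sketch of such a check; `utf8_is_valid` is a hypothetical helper
written for illustration, not a function from Emacs or libdbus.

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict UTF-8 check (hypothetical helper, not part of Emacs or libdbus).
   Rejects overlong forms, surrogates, and code points above U+10FFFF.  */
bool
utf8_is_valid (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;        /* number of continuation bytes expected */
      unsigned long cp;   /* code point being assembled */
      unsigned long min;  /* smallest code point allowed for this length */

      if (c < 0x80)
        {
          i++;            /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if (c >= 0xE0 && c <= 0xEF)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if (c >= 0xF0 && c <= 0xF4)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i - 1 < need)
        return false;     /* incomplete sequence at end of input */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */
      i += need + 1;
    }
  return true;
}
```

With this check, "Emacs" followed by 0xC3 0xA9 passes, while "Emacs"
followed by a lone 0xE9 fails as an incomplete sequence.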

Thanks,
-- 
Julien Danjou
// ᐰ <julien@danjou.info>   http://julien.danjou.info


* Re: Bug with UTF-8 string and dbus
  2010-06-08 21:39 Bug with UTF-8 string and dbus Julien Danjou
@ 2010-06-09  0:43 ` Stefan Monnier
  2010-06-09  1:17   ` Eli Zaretskii
  2010-06-09  9:16 ` [PATCH] Fix D-Bus string encoding Julien Danjou
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2010-06-09  0:43 UTC (permalink / raw)
  To: Julien Danjou; +Cc: emacs-devel

> 492		  char *val = SDATA (Fstring_make_unibyte (object));

Fstring_make_unibyte is wrong here.
Most likely, if the D-Bus specification mandates UTF-8, the better thing
to do would be either to encode using utf-8, or to take advantage of the
fact that Emacs already uses utf-8 internally and pass just SDATA (object).


        Stefan




* Re: Bug with UTF-8 string and dbus
  2010-06-09  0:43 ` Stefan Monnier
@ 2010-06-09  1:17   ` Eli Zaretskii
  2010-06-09  6:34     ` Julien Danjou
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-06-09  1:17 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: julien, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 08 Jun 2010 20:43:37 -0400
> Cc: emacs-devel@gnu.org
> 
> > 492		  char *val = SDATA (Fstring_make_unibyte (object));
> 
> Fstring_make_unibyte is wrong here.
> Most likely, if the D-Bus specification mandates UTF-8, the better thing
> to do would be either to encode using utf-8, or to take advantage of the
> fact that Emacs already uses utf-8 internally and pass just SDATA (object).

Using unencoded SDATA would be wrong with eight-bit characters (aka
raw bytes).  I'd suggest encoding, to be on the safe side.




* Re: Bug with UTF-8 string and dbus
  2010-06-09  1:17   ` Eli Zaretskii
@ 2010-06-09  6:34     ` Julien Danjou
  2010-06-09  7:27       ` Eli Zaretskii
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Julien Danjou @ 2010-06-09  6:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel


On Wed, Jun 09 2010, Eli Zaretskii wrote:

>> > 492		  char *val = SDATA (Fstring_make_unibyte (object));
>> 
>> Fstring_make_unibyte is wrong here.
>> Most likely, if the D-Bus specification mandates UTF-8, the better thing
>> to do would be either to encode using utf-8, or to take advantage of the
>> fact that Emacs already uses utf-8 internally and pass just SDATA (object).

According to D-Bus spec[1]:

          STRING

          UTF-8 string (must be valid UTF-8). Must be nul terminated and
          contain no other nul bytes.
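
The "no other nul bytes" part of this requirement matters because a Lisp
string carries an explicit length and may contain NUL bytes, while the C
side only sees a pointer.  As a rough sketch of that one check,
`string_nul_ok` below is a hypothetical helper (not an Emacs or libdbus
function) that takes the bytes and their length as separate arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: DATA holds LEN bytes of string content followed
   by a terminating NUL at data[len].  Returns true if the terminator is
   present and the content itself contains no NUL byte.  */
bool
string_nul_ok (const char *data, size_t len)
{
  return data[len] == '\0' && memchr (data, '\0', len) == NULL;
}
```

A string that fails this check would be cut short at the first NUL by any
consumer that treats the pointer as a C string.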

> Using unencoded SDATA would be wrong with eight-bit characters (aka
> raw bytes).  I'd suggest to encode, to be on the safe side.

Any hints on how to do that?
I mean, I don't know the Emacs C API at all, but I can test some
idea/patch if pointed in the appropriate direction.

[1]  http://dbus.freedesktop.org/doc/dbus-specification.html

-- 
Julien Danjou
// ᐰ <julien@danjou.info>   http://julien.danjou.info


* Re: Bug with UTF-8 string and dbus
  2010-06-09  6:34     ` Julien Danjou
@ 2010-06-09  7:27       ` Eli Zaretskii
  2010-06-09  8:51         ` Jan Djärv
  2010-06-09  7:28       ` Jan Djärv
  2010-06-09 14:08       ` Stefan Monnier
  2 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-06-09  7:27 UTC (permalink / raw)
  To: Julien Danjou; +Cc: monnier, emacs-devel

> From: Julien Danjou <julien@danjou.info>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  emacs-devel@gnu.org
> Date: Wed, 09 Jun 2010 08:34:17 +0200
> 
> > Using unencoded SDATA would be wrong with eight-bit characters (aka
> > raw bytes).  I'd suggest to encode, to be on the safe side.
> 
> Any hint on how to do that ?

Here's one way:

   code_convert_string_norecord (SDATA (object), Qutf_8, 1);

(Make sure you include coding.h, to have Qutf_8 declared.)

You will see quite a few other places in the sources where we use this.




* Re: Bug with UTF-8 string and dbus
  2010-06-09  6:34     ` Julien Danjou
  2010-06-09  7:27       ` Eli Zaretskii
@ 2010-06-09  7:28       ` Jan Djärv
  2010-06-09 14:08       ` Stefan Monnier
  2 siblings, 0 replies; 23+ messages in thread
From: Jan Djärv @ 2010-06-09  7:28 UTC (permalink / raw)
  To: Julien Danjou; +Cc: Eli Zaretskii, Stefan Monnier, emacs-devel



Julien Danjou wrote on 2010-06-09 08.34:
> On Wed, Jun 09 2010, Eli Zaretskii wrote:
>
>>>> 492		  char *val = SDATA (Fstring_make_unibyte (object));
>>>
>>> Fstring_make_unibyte is wrong here.
>>> Most likely, if the D-Bus specification mandates UTF-8, the better thing
>>> to do would be either to encode using utf-8, or to take advantage of the
>>> fact that Emacs already uses utf-8 internally and pass just SDATA (object).
>
> According to D-Bus spec[1]:
>
>            STRING
>
>            UTF-8 string (must be valid UTF-8). Must be nul terminated and
>            contain no other nul bytes.
>
>> Using unencoded SDATA would be wrong with eight-bit characters (aka
>> raw bytes).  I'd suggest to encode, to be on the safe side.
>
> Any hint on how to do that ?

char *val = SDATA (ENCODE_UTF_8 (object));

should do it.

	Jan D.



> I mean, I don't know the Emacs C API at all, but I can test some
> idea/patch if pointed in the appropriate direction.
>
> [1]  http://dbus.freedesktop.org/doc/dbus-specification.html
>




* Re: Bug with UTF-8 string and dbus
  2010-06-09  7:27       ` Eli Zaretskii
@ 2010-06-09  8:51         ` Jan Djärv
  2010-06-09  9:30           ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Djärv @ 2010-06-09  8:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Julien Danjou, monnier, emacs-devel

2010-06-09 09:27, Eli Zaretskii wrote:
>> From: Julien Danjou<julien@danjou.info>
>> Cc: Stefan Monnier<monnier@iro.umontreal.ca>,  emacs-devel@gnu.org
>> Date: Wed, 09 Jun 2010 08:34:17 +0200
>>
>>> Using unencoded SDATA would be wrong with eight-bit characters (aka
>>> raw bytes).  I'd suggest to encode, to be on the safe side.
>>
>> Any hint on how to do that ?
>
> Here's one way:
>
>     code_convert_string_norecord (SDATA (object), Qutf_8, 1);
>
> (Make sure you include coding.h, to have Qutf_8 declared.)
>
> You will see quite a few other places we use this in the sources.

That is not right: code_convert_string_norecord takes a Lisp_Object as its
first argument.

	Jan D.





* [PATCH] Fix D-Bus string encoding.
  2010-06-08 21:39 Bug with UTF-8 string and dbus Julien Danjou
  2010-06-09  0:43 ` Stefan Monnier
@ 2010-06-09  9:16 ` Julien Danjou
  2010-06-10  0:20   ` Stefan Monnier
  1 sibling, 1 reply; 23+ messages in thread
From: Julien Danjou @ 2010-06-09  9:16 UTC (permalink / raw)
  To: emacs-devel; +Cc: Julien Danjou

Signed-off-by: Julien Danjou <julien@danjou.info>
---

This fixes the problem described in <87typdnr08.fsf@keller.adm.naquadah.org>

 src/ChangeLog  |    5 +++++
 src/dbusbind.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/src/ChangeLog b/src/ChangeLog
index 16e1b87..daa9ea7 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,8 @@
+2010-06-09  Julien Danjou  <julien@danjou.info>
+
+	* dbusbind.c (xd_append_arg): Fix string encoding: it has to be
+	valid UTF-8.
+
 2010-06-08  Dan Nicolaescu  <dann@ics.uci.edu>
 
 	* lread.c (X_OK): Remove, unused.
diff --git a/src/dbusbind.c b/src/dbusbind.c
index a72a955..4a17fb4 100644
--- a/src/dbusbind.c
+++ b/src/dbusbind.c
@@ -489,7 +489,7 @@ xd_append_arg (dtype, object, iter)
       case DBUS_TYPE_OBJECT_PATH:
       case DBUS_TYPE_SIGNATURE:
 	{
-	  char *val = SDATA (Fstring_make_unibyte (object));
+          char *val = SDATA (ENCODE_UTF_8 (object));
 	  XD_DEBUG_MESSAGE ("%c %s", dtype, val);
 	  if (!dbus_message_iter_append_basic (iter, dtype, &val))
 	    XD_SIGNAL2 (build_string ("Unable to append argument"), object);
-- 
1.7.1





* Re: Bug with UTF-8 string and dbus
  2010-06-09  8:51         ` Jan Djärv
@ 2010-06-09  9:30           ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2010-06-09  9:30 UTC (permalink / raw)
  To: Jan Djärv; +Cc: julien, monnier, emacs-devel

> Date: Wed, 09 Jun 2010 10:51:08 +0200
> From: Jan Djärv <jan.h.d@swipnet.se>
> CC: Julien Danjou <julien@danjou.info>, monnier@iro.umontreal.ca, 
>  emacs-devel@gnu.org
> 
> > Here's one way:
> >
> >     code_convert_string_norecord (SDATA (object), Qutf_8, 1);
> >
> > (Make sure you include coding.h, to have Qutf_8 declared.)
> >
> > You will see quite a few other places we use this in the sources.
> 
> That is not right, code_convert_string_norecord takes a Lisp_Object as first 
> argument.

You are right, sorry.  The first argument should be object itself
(assuming it is a Lisp string).  If it is a C string, then it should
be run through make_unibyte_string first.




* Re: Bug with UTF-8 string and dbus
  2010-06-09  6:34     ` Julien Danjou
  2010-06-09  7:27       ` Eli Zaretskii
  2010-06-09  7:28       ` Jan Djärv
@ 2010-06-09 14:08       ` Stefan Monnier
  2010-06-09 14:24         ` Julien Danjou
                           ` (2 more replies)
  2 siblings, 3 replies; 23+ messages in thread
From: Stefan Monnier @ 2010-06-09 14:08 UTC (permalink / raw)
  To: Julien Danjou; +Cc: Eli Zaretskii, emacs-devel

> According to D-Bus spec[1]:

>           STRING
>           UTF-8 string (must be valid UTF-8). Must be nul terminated and
>           contain no other nul bytes.

>> Using unencoded SDATA would be wrong with eight-bit characters (aka
>> raw bytes).  I'd suggest to encode, to be on the safe side.

Since the spec says "must be valid UTF-8", I think that SDATA(object)
might even be better than encoding to utf-8 since encoding to utf-8 will
turn eight-bit-byte chars into raw bytes which will likely not make
a valid utf-8 output.


        Stefan




* Re: Bug with UTF-8 string and dbus
  2010-06-09 14:08       ` Stefan Monnier
@ 2010-06-09 14:24         ` Julien Danjou
  2010-06-09 15:01         ` Andreas Schwab
  2010-06-09 22:19         ` Andreas Schwab
  2 siblings, 0 replies; 23+ messages in thread
From: Julien Danjou @ 2010-06-09 14:24 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel


On Wed, Jun 09 2010, Stefan Monnier wrote:

> Since the spec says "must be valid UTF-8", I think that SDATA(object)
> might even be better than encoding to utf-8 since encoding to utf-8 will
> turn eight-bit-byte chars into raw bytes which will likely not make
> a valid utf-8 output.

I clearly do not have enough insight myself to decide what's the best
approach. I can provide a good patch if needed, but I'm also quite sure
any of you can fix it correctly.

So as long as someone pushes a patch I'll be happy. :-)

-- 
Julien Danjou
// ᐰ <julien@danjou.info>   http://julien.danjou.info


* Re: Bug with UTF-8 string and dbus
  2010-06-09 14:08       ` Stefan Monnier
  2010-06-09 14:24         ` Julien Danjou
@ 2010-06-09 15:01         ` Andreas Schwab
  2010-06-09 15:39           ` Michael Albinus
  2010-06-09 18:11           ` Stefan Monnier
  2010-06-09 22:19         ` Andreas Schwab
  2 siblings, 2 replies; 23+ messages in thread
From: Andreas Schwab @ 2010-06-09 15:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Julien Danjou, Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Since the spec says "must be valid UTF-8", I think that SDATA(object)
> might even be better than encoding to utf-8 since encoding to utf-8 will
> turn eight-bit-byte chars into raw bytes which will likely not make
> a valid utf-8 output.

Neither does emacs' internal encoding.  If you have a non-utf8 string
you have lost already.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Bug with UTF-8 string and dbus
  2010-06-09 15:01         ` Andreas Schwab
@ 2010-06-09 15:39           ` Michael Albinus
  2010-06-09 18:11           ` Stefan Monnier
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Albinus @ 2010-06-09 15:39 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Julien Danjou, Eli Zaretskii, Stefan Monnier, emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> Since the spec says "must be valid UTF-8", I think that SDATA(object)
>> might even be better than encoding to utf-8 since encoding to utf-8 will
>> turn eight-bit-byte chars into raw bytes which will likely not make
>> a valid utf-8 output.
>
> Neither does emacs' internal encoding.  If you have a non-utf8 string
> you have lost already.

The following works with Julien's patch:

  (notifications-notify :title "Emacsé")

So it might be worth installing.

> Andreas.

Best regards, Michael.




* Re: Bug with UTF-8 string and dbus
  2010-06-09 15:01         ` Andreas Schwab
  2010-06-09 15:39           ` Michael Albinus
@ 2010-06-09 18:11           ` Stefan Monnier
  2010-06-09 19:45             ` Davis Herring
  2010-06-09 20:30             ` Andreas Schwab
  1 sibling, 2 replies; 23+ messages in thread
From: Stefan Monnier @ 2010-06-09 18:11 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Julien Danjou, Eli Zaretskii, emacs-devel

>> Since the spec says "must be valid UTF-8", I think that SDATA(object)
>> might even be better than encoding to utf-8 since encoding to utf-8 will
>> turn eight-bit-byte chars into raw bytes which will likely not make
>> a valid utf-8 output.
> Neither does emacs' internal encoding.

AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
characters for some things, but I don't think that makes it invalid.

> If you have a non-utf8 string
> you have lost already.

Not sure what you mean by "non-utf8 string".


        Stefan




* Re: Bug with UTF-8 string and dbus
  2010-06-09 18:11           ` Stefan Monnier
@ 2010-06-09 19:45             ` Davis Herring
  2010-06-09 20:30             ` Andreas Schwab
  1 sibling, 0 replies; 23+ messages in thread
From: Davis Herring @ 2010-06-09 19:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Julien Danjou, Eli Zaretskii, Andreas Schwab, emacs-devel

> AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
> characters for some things, but I don't think that makes it invalid.

Aren't some of those characters encoded using more bytes than the standard
allows?

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Bug with UTF-8 string and dbus
  2010-06-09 18:11           ` Stefan Monnier
  2010-06-09 19:45             ` Davis Herring
@ 2010-06-09 20:30             ` Andreas Schwab
  2010-06-09 20:42               ` David Kastrup
  1 sibling, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2010-06-09 20:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Julien Danjou, Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
> characters for some things, but I don't think that makes it invalid.

The eight-bit characters are encoded outside of the Unicode range, and a
good utf-8 decoder must treat them as invalid.
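
To make this concrete, here is a small sketch of how a raw byte is
believed to be represented.  It is modeled on my reading of the
BYTE8_TO_CHAR and BYTE8_STRING macros in Emacs's character.h, which is an
assumption here rather than something stated in this thread; the function
names `raw_byte_to_char` and `raw_byte_to_internal` are hypothetical.

```c
#include <stddef.h>

/* Assumed from Emacs's character.h (BYTE8_TO_CHAR): the character code
   Emacs assigns to raw byte B, for B in 0x80..0xFF.  */
unsigned long
raw_byte_to_char (unsigned char b)
{
  return 0x3FFF00UL + b;
}

/* Assumed from Emacs's character.h (BYTE8_STRING): store the two-byte
   internal form of raw byte B in OUT and return its length.  */
size_t
raw_byte_to_internal (unsigned char b, unsigned char out[2])
{
  out[0] = 0xC0 | ((b >> 6) & 0x01);  /* lead byte is 0xC0 or 0xC1 */
  out[1] = 0x80 | (b & 0x3F);         /* ordinary continuation byte */
  return 2;
}
```

Under these assumptions, raw byte 0xE9 gets character code 0x3FFFE9, far
above U+10FFFF, and is stored as 0xC1 0xA9.  Lead bytes 0xC0 and 0xC1 can
only start overlong encodings, so they never occur in valid UTF-8 and a
strict decoder rejects them.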

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Bug with UTF-8 string and dbus
  2010-06-09 20:30             ` Andreas Schwab
@ 2010-06-09 20:42               ` David Kastrup
  0 siblings, 0 replies; 23+ messages in thread
From: David Kastrup @ 2010-06-09 20:42 UTC (permalink / raw)
  To: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
>> characters for some things, but I don't think that makes it invalid.
>
> The eight-bit characters are encoded outside of the Unicode range, and a
> good utf-8 decoder must treat them as invalid.

Yes, that's the whole point.  Indeed, Emacs' own utf-8 decoder treats
them as invalid too: when Emacs considers the data to be in utf-8
instead of emacs-internal encoding, it will decode the respective codes
into its "raw byte" representation.  That representation again is not
legal utf-8, but a rather obvious "extension" of the utf-8 encoding
scheme.  Standard utf-8 quite artificially stops at 2^20+2^16 or
something similar (I don't remember it accurately), a consequence of
the range encodable with utf-16 using surrogate codes.
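
The limit recalled above can be worked out from the UTF-16 surrogate
ranges: a pair contributes 10 bits from the high surrogate and 10 bits
from the low one, offset by 0x10000.  A short sketch of that arithmetic
(the function name `utf16_max_code_point` is made up for illustration):

```c
/* Largest code point reachable through a UTF-16 surrogate pair.  */
unsigned long
utf16_max_code_point (void)
{
  unsigned long high = 0xDBFF - 0xD800;  /* 10 bits from the high surrogate */
  unsigned long low = 0xDFFF - 0xDC00;   /* 10 bits from the low surrogate */
  return 0x10000UL + (high << 10) + low;
}
```

The result is 0x10FFFF, which equals 2^20 + 2^16 - 1, so the figure
quoted from memory is off by one at most.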

-- 
David Kastrup





* Re: Bug with UTF-8 string and dbus
  2010-06-09 14:08       ` Stefan Monnier
  2010-06-09 14:24         ` Julien Danjou
  2010-06-09 15:01         ` Andreas Schwab
@ 2010-06-09 22:19         ` Andreas Schwab
       [not found]           ` <19472.35590.940217.577634@uwakimon.sk.tsukuba.ac.jp>
  2 siblings, 1 reply; 23+ messages in thread
From: Andreas Schwab @ 2010-06-09 22:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Julien Danjou, Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Since the spec says "must be valid UTF-8", I think that SDATA(object)
> might even be better than encoding to utf-8 since encoding to utf-8 will
> turn eight-bit-byte chars into raw bytes which will likely not make
> a valid utf-8 output.

IMHO unencoded data should never leave the internals of Emacs.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: [PATCH] Fix D-Bus string encoding.
  2010-06-09  9:16 ` [PATCH] Fix D-Bus string encoding Julien Danjou
@ 2010-06-10  0:20   ` Stefan Monnier
  2010-06-10  1:56     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2010-06-10  0:20 UTC (permalink / raw)
  To: Julien Danjou; +Cc: emacs-devel

> +	* dbusbind.c (xd_append_arg): Fix string encoding: it has to be
> +	valid UTF-8.

Thanks, I installed a slightly different fix which just uses the
string's bytes (internally stored in utf-8 already), and added checks.


        Stefan




* Re: [PATCH] Fix D-Bus string encoding.
  2010-06-10  0:20   ` Stefan Monnier
@ 2010-06-10  1:56     ` Eli Zaretskii
  2010-06-10  2:48       ` Miles Bader
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2010-06-10  1:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: julien, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 09 Jun 2010 20:20:10 -0400
> Cc: emacs-devel@gnu.org
> 
> > +	* dbusbind.c (xd_append_arg): Fix string encoding: it has to be
> > +	valid UTF-8.
> 
> Thanks, I installed a slightly different fix which just uses the
> string's bytes (internally stored in utf-8 already)

I agree with Andreas here: we should not output unencoded internal
representation of characters.  If nothing else, it will be a
maintenance burden if and when we decide to change something in the
internal representation, or if dbus will ever accept binary data.

I don't really understand why you insist on doing this over
objections, even though functionally both approaches are currently
equivalent.




* Re: [PATCH] Fix D-Bus string encoding.
  2010-06-10  1:56     ` Eli Zaretskii
@ 2010-06-10  2:48       ` Miles Bader
  2010-06-10  3:49         ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Miles Bader @ 2010-06-10  2:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: julien, Stefan Monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> Thanks, I installed a slightly different fix which just uses the
>> string's bytes (internally stored in utf-8 already)
>
> I agree with Andreas here: we should not output unencoded internal
> representation of characters.  If nothing else, it will be a
> maintenance burden if and when we decide to change something in the
> internal representation, or if dbus will ever accept binary data.

Isn't there some function in Emacs which "converts", but internally
checks to see if the desired output encoding is the same as Emacs'
internal encoding, and avoids the actual conversion in that case?  That
would be both future-proof and efficient...

[If there isn't such a function, there should be!]

-Miles

-- 
Infancy, n. The period of our lives when, according to Wordsworth, 'Heaven
lies about us.' The world begins lying about us pretty soon afterward.




* Re: [PATCH] Fix D-Bus string encoding.
  2010-06-10  2:48       ` Miles Bader
@ 2010-06-10  3:49         ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2010-06-10  3:49 UTC (permalink / raw)
  To: Miles Bader; +Cc: julien, monnier, emacs-devel

> From: Miles Bader <miles@gnu.org>
> System-Type: x86_64-unknown-linux-gnu
> Date: Thu, 10 Jun 2010 11:48:41 +0900
> Cc: julien@danjou.info, Stefan Monnier <monnier@iro.umontreal.ca>,
> 	emacs-devel@gnu.org
> Reply-To: Miles Bader <miles@gnu.org>
> 
> Isn't there some function in Emacs which "converts", but internally
> checks to see if the desired output encoding is the same as Emacs'
> internal encoding, and avoids the actual conversion in that case?

What do you mean by "encoding is the same"?  If the internal encoding
is utf-8-emacs, while the external is utf-8, are they "the same" or
not?

Anyway, I think avoiding the conversion in this case is a classic
example of premature optimization: no one has made the case that
performance matters in this case.  If you look at encode_coding_utf_8
you will see that it's no more than a fancy copy for almost every
character (notable exception being eight-bit characters, aka raw
bytes).  What exactly are we saving here by "avoiding the conversion"?
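
The "fancy copy" can be pictured with the sketch below.  It is not the
code of encode_coding_utf_8, only an illustration under the assumption
that raw bytes are stored internally as a two-byte form with lead byte
0xC0 or 0xC1; the name `encode_internal_to_utf8` is hypothetical.

```c
#include <stddef.h>

/* Copy IN to OUT unchanged, except that a two-byte raw-byte form (lead
   byte 0xC0 or 0xC1, an assumption about the internal format) collapses
   to the single byte it stands for.  OUT must have room for LEN bytes.
   Returns the number of bytes written.  */
size_t
encode_internal_to_utf8 (const unsigned char *in, size_t len,
                         unsigned char *out)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      if ((in[i] == 0xC0 || in[i] == 0xC1) && i + 1 < len)
        {
          /* Rebuild the raw byte from the low bit of the lead byte and
             the six payload bits of the continuation byte.  */
          out[n++] = (unsigned char) (0x80 | ((in[i] & 0x01) << 6)
                                      | (in[i + 1] & 0x3F));
          i++;  /* skip the continuation byte just consumed */
        }
      else
        out[n++] = in[i];  /* everything else is copied as-is */
    }
  return n;
}
```

Note that the output is valid UTF-8 only when the input held no raw
bytes; a collapsed raw byte such as 0xE9 is exactly what the D-Bus spec
forbids, so a validity check is still needed either way.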

Am I missing something?




* Re: Bug with UTF-8 string and dbus
       [not found]           ` <19472.35590.940217.577634@uwakimon.sk.tsukuba.ac.jp>
@ 2010-06-10  8:05             ` Andreas Schwab
  0 siblings, 0 replies; 23+ messages in thread
From: Andreas Schwab @ 2010-06-10  8:05 UTC (permalink / raw)
  To: Stephen J. Turnbull
  Cc: Julien Danjou, Eli Zaretskii, Stefan Monnier, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Maybe it's just a matter of terminology, but what about David's
> leading use case of TeX (as opposed to say Omega), which treats UTF-8
> as an octet stream?

If you have an octet stream then you don't have utf-8, but a unibyte
string.  My point is that the Emacs dbus interface should _always_
(en|de)code the strings it passes to/from the outside.  The internal
representation should remain internal to Emacs.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




