`decode-coding-string' question

unofficial mirror of emacs-devel@gnu.org 

* `decode-coding-string' question
@ 2006-07-03 21:35 Paul Pogonyshev
  2006-07-04  0:50 ` Kenichi Handa
  2006-07-04 12:55 ` Richard Stallman
  0 siblings, 2 replies; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-03 21:35 UTC (permalink / raw)


Say I have a string with various text properties set.  If I then
apply `decode-coding-string' to it, all the properties are lost.
Is there a way to transfer properties from the ``character beginning''
(i.e. the first of the several characters that get combined during
decoding) to the decoded character?

For example, the UTF-8 representation of the copyright sign is
"\xc2\xa9".  Can I transfer the properties of the '\xc2' character to
the copyright sign character and discard those of the '\xa9' character?

Paul


* Re: `decode-coding-string' question
  2006-07-03 21:35 `decode-coding-string' question Paul Pogonyshev
@ 2006-07-04  0:50 ` Kenichi Handa
  2006-07-04  3:27   ` Eli Zaretskii
  2006-07-04 15:31   ` Paul Pogonyshev
  2006-07-04 12:55 ` Richard Stallman
  1 sibling, 2 replies; 21+ messages in thread
From: Kenichi Handa @ 2006-07-04  0:50 UTC (permalink / raw)
  Cc: emacs-devel

In article <200607040035.01379.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:

> Say I have a string with various text properties set.  If I then
> apply `decode-coding-string' to it, all the properties are lost.
> Is there a way to transfer properties from ``character beginning''
> (i.e. first character of a number being combined during decoding)
> to the decoded character?

In the current implementation, it's impossible.  But, first
of all, why do you have text properties on a unibyte string?
I think all text processing should be done after the string
is decoded.

---
Kenichi Handa
handa@m17n.org


* Re: `decode-coding-string' question
  2006-07-04  0:50 ` Kenichi Handa
@ 2006-07-04  3:27   ` Eli Zaretskii
  2006-07-04 17:29     ` Richard Stallman
  2006-07-04 15:31   ` Paul Pogonyshev
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2006-07-04  3:27 UTC (permalink / raw)
  Cc: emacs-devel, pogonyshev

> From: Kenichi Handa <handa@m17n.org>
> Date: Tue, 04 Jul 2006 09:50:16 +0900
> Cc: emacs-devel@gnu.org
> 
> But, first of all, why do you have text properties on unibyte
> string?  I think all text processing should be done after the string
> is decoded.

Seconded.  An undecoded string is not text, strictly speaking; it's a
stream of bytes with no clear notion of a character.


* Re: `decode-coding-string' question
  2006-07-03 21:35 `decode-coding-string' question Paul Pogonyshev
  2006-07-04  0:50 ` Kenichi Handa
@ 2006-07-04 12:55 ` Richard Stallman
  2006-07-04 13:03   ` David Kastrup
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Stallman @ 2006-07-04 12:55 UTC (permalink / raw)
  Cc: emacs-devel

    Say I have a string with various text properties set.  If I then
    apply `decode-coding-string' to it, all the properties are lost.
    Is there a way to transfer properties from ``character beginning''
    (i.e. first character of a number being combined during decoding)
    to the decoded character?

It would be a nice improvement to make, but not trivial.


* Re: `decode-coding-string' question
  2006-07-04 12:55 ` Richard Stallman
@ 2006-07-04 13:03   ` David Kastrup
  2006-07-04 13:23     ` Johan Bockgård
  0 siblings, 1 reply; 21+ messages in thread
From: David Kastrup @ 2006-07-04 13:03 UTC (permalink / raw)
  Cc: emacs-devel, Paul Pogonyshev

Richard Stallman <rms@gnu.org> writes:

>     Say I have a string with various text properties set.  If I then
>     apply `decode-coding-string' to it, all the properties are lost.
>     Is there a way to transfer properties from ``character beginning''
>     (i.e. first character of a number being combined during decoding)
>     to the decoded character?
>
> It would be a nice improvement to make, but not trivial.

I think it would be more important that recoding/encoding/decoding
preserved markers.  Again, likely not trivial.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: `decode-coding-string' question
  2006-07-04 13:03   ` David Kastrup
@ 2006-07-04 13:23     ` Johan Bockgård
  0 siblings, 0 replies; 21+ messages in thread
From: Johan Bockgård @ 2006-07-04 13:23 UTC (permalink / raw)



While we're at it: I would like `format-time-string' to preserve
text properties (like `format' does).


* Re: `decode-coding-string' question
  2006-07-04  0:50 ` Kenichi Handa
  2006-07-04  3:27   ` Eli Zaretskii
@ 2006-07-04 15:31   ` Paul Pogonyshev
  2006-07-05  0:55     ` Kenichi Handa
  1 sibling, 1 reply; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-04 15:31 UTC (permalink / raw)
  Cc: Kenichi Handa

Kenichi Handa wrote:
> In article <200607040035.01379.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:
> 
> > Say I have a string with various text properties set.  If I then
> > apply `decode-coding-string' to it, all the properties are lost.
> > Is there a way to transfer properties from ``character beginning''
> > (i.e. first character of a number being combined during decoding)
> > to the decoded character?
> 
> In the current implementation, it's impossible.  But, first
> of all, why do you have text properties on unibyte string?
> I think all text processing should be done after the string
> is decoded.

Bad.  OK, here is my task: I have a C string in the sources, possibly
containing encoded characters, like

	"foo bla \xc2\xa9",

the last thing being the UTF-8 encoded copyright character.  I want to
decode the string (can do that) _and_ know where particular
characters begin.  Currently I set the text property `point' on character
beginnings, but `decode-coding-string' eats them :(  Can anyone see a
different solution, maybe ugly if nothing else?  (Except that a custom
implementation of `decode-coding-string' doesn't count as a solution
;)

Paul


* Re: `decode-coding-string' question
  2006-07-04  3:27   ` Eli Zaretskii
@ 2006-07-04 17:29     ` Richard Stallman
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Stallman @ 2006-07-04 17:29 UTC (permalink / raw)
  Cc: pogonyshev, emacs-devel, handa

    Seconded.  Undecoded string is not text, strictly speaking, it's a
    stream of bytes with no clear notion of a character.

Since text properties apply to portions of a string or buffer,
they can make sense on encoded text.  So the idea is not nonsense.
There might be uses for it.

However, I am not sure it would be worth the work to implement
this, even if someone competent wants to do it.


* Re: `decode-coding-string' question
  2006-07-04 15:31   ` Paul Pogonyshev
@ 2006-07-05  0:55     ` Kenichi Handa
  2006-07-05 16:11       ` Paul Pogonyshev
  0 siblings, 1 reply; 21+ messages in thread
From: Kenichi Handa @ 2006-07-05  0:55 UTC (permalink / raw)
  Cc: emacs-devel

In article <200607041831.18435.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:

> Bad.  OK, here is my task: I have a C string in the sources, possibly
> containing encoded characters, like

> 	"foo bla \xc2\xa9",

> the last thing being the UTF-8 copyright characters.  I want to
> decode the string (can do that) _and_ know where particular
> characters begin. Currently I set text property `point' on character
> beginnings, but `decode-coding-string' eats them :(  Can anyone see a
> different solution, maybe ugly if nothing else?  (Except that custom
> implementation of `decode-coding-string' doesn't count as a solution
> ;)

Why don't you find particular characters in the decoded
string?

---
Kenichi Handa
handa@m17n.org


* Re: `decode-coding-string' question
  2006-07-05  0:55     ` Kenichi Handa
@ 2006-07-05 16:11       ` Paul Pogonyshev
  2006-07-05 16:34         ` Stuart D. Herring
  2006-07-06  1:08         ` Kenichi Handa
  0 siblings, 2 replies; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-05 16:11 UTC (permalink / raw)
  Cc: Kenichi Handa

Kenichi Handa wrote:
> In article <200607041831.18435.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:
> 
> > Bad.  OK, here is my task: I have a C string in the sources, possibly
> > containing encoded characters, like
> 
> > 	"foo bla \xc2\xa9",
> 
> > the last thing being the UTF-8 copyright characters.  I want to
> > decode the string (can do that) _and_ know where particular
> > characters begin. Currently I set text property `point' on character
> > beginnings, but `decode-coding-string' eats them :(  Can anyone see a
> > different solution, maybe ugly if nothing else?  (Except that custom
> > implementation of `decode-coding-string' doesn't count as a solution
> > ;)
> 
> Why don't you find particular characters in the decoded
> string?

I do.  But I need to know where they begin in the buffer (containing
the encoded C string.)  I don't see a way to keep this information at
present... :(

For instance, if the buffer only contains "\xc2\xa9foo", I'd like
to receive a string with the following text properties:

  #("©foo" 0 1 (point 0) 1 2 (point 8) 2 3 (point 9) 3 4 (point 10))

The first character actually takes 8 characters in the buffer!

Paul


* Re: `decode-coding-string' question
  2006-07-05 16:11       ` Paul Pogonyshev
@ 2006-07-05 16:34         ` Stuart D. Herring
  2006-07-05 16:50           ` Paul Pogonyshev
  2006-07-06  1:08         ` Kenichi Handa
  1 sibling, 1 reply; 21+ messages in thread
From: Stuart D. Herring @ 2006-07-05 16:34 UTC (permalink / raw)
  Cc: emacs-devel

> I do.  But I need to know where they begin in the buffer (containing
> the encoded C string.)  I don't see a way to keep this information at
> present... :(
>
> For instance, if the buffer only contains "\xc2\xa9foo", I'd like
> to receive a string with the following text properties:
>
>   #("©foo" 0 1 (point 0) 1 2 (point 8) 2 3 (point 9) 3 4 (point 10))
>
> The first character actually takes 8 characters in the buffer!

This is a horrible hack, but could you take the "©foo" that you get
(without properties), turn it into "©-f-o-o-", then reencode it and look
for the '-'s you added?  It might run into trouble if there were -s in the
string already, but you could always compare the original and -ed strings
to resolve that.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.


* Re: `decode-coding-string' question
  2006-07-05 16:34         ` Stuart D. Herring
@ 2006-07-05 16:50           ` Paul Pogonyshev
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-05 16:50 UTC (permalink / raw)


Stuart D. Herring wrote:
> > I do.  But I need to know where they begin in the buffer (containing
> > the encoded C string.)  I don't see a way to keep this information at
> > present... :(
> >
> > For instance, if the buffer only contains "\xc2\xa9foo", I'd like
> > to receive a string with the following text properties:
> >
> >   #("©foo" 0 1 (point 0) 1 2 (point 8) 2 3 (point 9) 3 4 (point 10))
> >
> > The first character actually takes 8 characters in the buffer!
> 
> This is a horrible hack, but could you take the "©foo" that you get
> (without properties), turn it into "©-f-o-o-", then reencode it and look
> for the '-'s you added?  It might run into trouble if there were -s in the
> string already, but you could always compare the original and -ed strings
> to resolve that.

I probably don't have any other option; I'll have to try something like
this...  Or maybe re-encode the characters one by one...
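
A minimal sketch of the one-by-one variant, for what it's worth.  The
function name `my-annotate-source-offsets' is made up for illustration,
and the sketch assumes that ASCII characters appear literally in the C
source (one source character each) while every other byte is written as
a four-character \xNN escape:

```elisp
(defun my-annotate-source-offsets (decoded coding)
  "Return a copy of DECODED with a `point' property on each character.
The property value is the offset at which that character starts in the
C source text.  Assumption: ASCII characters are written literally and
every byte of a non-ASCII character is written as a \\xNN escape."
  (let ((result (copy-sequence decoded))
        (offset 0))
    (dotimes (i (length result))
      (let* ((ch (aref result i))
             ;; Re-encode this single character to learn its byte count.
             (nbytes (length (encode-coding-string (string ch) coding))))
        (put-text-property i (1+ i) 'point offset result)
        ;; One source character for ASCII, four per escaped byte otherwise.
        (setq offset (+ offset (if (< ch 128) 1 (* 4 nbytes))))))
    result))
```

With (my-annotate-source-offsets "©foo" 'utf-8) the `point' values come
out as 0, 8, 9 and 10, which matches the example above.  Other C escapes
(\n, \", octal and so on) are not handled here.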

Thanks!

Paul


* Re: `decode-coding-string' question
  2006-07-05 16:11       ` Paul Pogonyshev
  2006-07-05 16:34         ` Stuart D. Herring
@ 2006-07-06  1:08         ` Kenichi Handa
  2006-07-06 15:52           ` Paul Pogonyshev
  1 sibling, 1 reply; 21+ messages in thread
From: Kenichi Handa @ 2006-07-06  1:08 UTC (permalink / raw)
  Cc: emacs-devel

In article <200607051911.45299.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:

>> Why don't you find particular characters in the decoded
>> string?

> I do.  But I need to know where they begin in the buffer (containing
> the encoded C string.)  I don't see a way to keep this information at
> present... :(

How did you make that buffer?  Why don't you have an
already-decoded text in that buffer?

> For instance, if the buffer only contains "\xc2\xa9foo", I'd like
> to receive a string with the following text properties:

>   #("©foo" 0 1 (point 0) 1 2 (point 8) 2 3 (point 9) 3 4 (point 10))

> The first character actually takes 8 characters in the buffer!

They are just displayed as 8 characters, in the same way that a
control character, say Formfeed (C-l), is displayed as the 2
characters "^L".

---
Kenichi Handa
handa@m17n.org


* Re: `decode-coding-string' question
  2006-07-06  1:08         ` Kenichi Handa
@ 2006-07-06 15:52           ` Paul Pogonyshev
  2006-07-06 20:18             ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-06 15:52 UTC (permalink / raw)
  Cc: Kenichi Handa

Kenichi Handa wrote:
> In article <200607051911.45299.pogonyshev@gmx.net>, Paul Pogonyshev <pogonyshev@gmx.net> writes:
> 
> >> Why don't you find particular characters in the decoded
> >> string?
> 
> > I do.  But I need to know where they begin in the buffer (containing
> > the encoded C string.)  I don't see a way to keep this information at
> > present... :(
> 
> How did you make that buffer?  Why don't you have an
> already-decoded text in that buffer?

Because it's a C source file.  Strings have to be encoded there.

> > For instance, if the buffer only contains "\xc2\xa9foo", I'd like
> > to receive a string with the following text properties:
> 
> >   #("©foo" 0 1 (point 0) 1 2 (point 8) 2 3 (point 9) 3 4 (point 10))
> 
> > The first character actually takes 8 characters in the buffer!
> 
> They are just displayed as 8 characters, in the same way that a
> control character, say Formfeed (C-l), is displayed as the 2
> characters "^L".

They are 8 different characters in the buffer.

Paul


* Re: `decode-coding-string' question
  2006-07-06 15:52           ` Paul Pogonyshev
@ 2006-07-06 20:18             ` Eli Zaretskii
  2006-07-06 20:34               ` Paul Pogonyshev
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2006-07-06 20:18 UTC (permalink / raw)
  Cc: handa, emacs-devel

> From: Paul Pogonyshev <pogonyshev@gmx.net>
> Date: Thu, 6 Jul 2006 18:52:28 +0300
> Cc: Kenichi Handa <handa@m17n.org>
> 
> > > I do.  But I need to know where they begin in the buffer (containing
> > > the encoded C string.)  I don't see a way to keep this information at
> > > present... :(
> > 
> > How did you make that buffer?  Why don't you have an
> > already-decoded text in that buffer?
> 
> Because it's a C source file.  Strings have to be encoded there.

Paul, there's some misunderstanding here, so please bear with us.
Handa-san cannot understand how come you have undecoded characters in
the buffer, and neither can I.

The fact that it's a C file does not matter: Emacs _always_ decodes
characters when it visits the file, no matter if it's a C file or
something else.  In the text you get in your buffer the characters
should be decoded.  The question is, how come it didn't decode these
characters in your case?  Are there other non-ASCII characters in the
same file, perhaps? if so, what characters are those?  For that
matter, can you post a small sample file that, when visited in Emacs,
leaves the UTF-8 encoded characters undecoded in the buffer?  Please
post that file as a binary attachment, to avoid munging it by email
software en- and de-coding.


* Re: `decode-coding-string' question
  2006-07-06 20:18             ` Eli Zaretskii
@ 2006-07-06 20:34               ` Paul Pogonyshev
  2006-07-07  9:17                 ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-06 20:34 UTC (permalink / raw)
  Cc: handa

Eli Zaretskii wrote:
> > From: Paul Pogonyshev <pogonyshev@gmx.net>
> > Date: Thu, 6 Jul 2006 18:52:28 +0300
> > Cc: Kenichi Handa <handa@m17n.org>
> > 
> > > > I do.  But I need to know where they begin in the buffer (containing
> > > > the encoded C string.)  I don't see a way to keep this information at
> > > > present... :(
> > > 
> > > How did you make that buffer?  Why don't you have an
> > > already-decoded text in that buffer?
> > 
> > Because it's a C source file.  Strings have to be encoded there.
> 
> Paul, there's some misunderstanding here, so please bear with us.
> Handa-san cannot understand how come you have undecoded characters in
> the buffer, and neither can I.
> 
> The fact that it's a C file does not matter: Emacs _always_ decodes
> characters when it visits the file, no matter if it's a C file or
> something else.  In the text you get in your buffer the characters
> should be decoded.  The question is, how come it didn't decode these
> characters in your case?  Are there other non-ASCII characters in the
> same file, perhaps? if so, what characters are those?  For that
> matter, can you post a small sample file that, when visited in Emacs,
> leaves the UTF-8 encoded characters undecoded in the buffer?  Please
> post that file as a binary attachment, to avoid munging it by email
> software en- and de-coding.

There is indeed a misunderstanding.  The characters in the buffer _are_
decoded.  However, the characters form a C escape sequence, like "\xc2\xa9".
To know what character is encoded by this C sequence, I first translate
strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
The resulting string of length 2 is encoded in UTF-8 and I decode it
to receive the copyright character or whatever.
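
The translation step described above can be sketched roughly as follows.
The name `my-c-escapes-to-bytes' is hypothetical, and the sketch assumes
that every escape has the form \xNN with exactly two hex digits and that
all other characters in the literal are plain ASCII:

```elisp
(defun my-c-escapes-to-bytes (text)
  "Return (BYTES . OFFSETS) for TEXT, the contents of a C string literal.
BYTES is a unibyte string; OFFSETS lists where each byte starts in TEXT.
Assumption: escapes are always \\xNN with exactly two hex digits, and
every other character is ASCII."
  (let ((i 0) (bytes '()) (offsets '()))
    (while (< i (length text))
      (push i offsets)
      ;; Accept the escape only if the match starts exactly at I.
      (if (eq (string-match "\\\\x\\([[:xdigit:]]\\{2\\}\\)" text i) i)
          (progn
            (push (string-to-number (match-string 1 text) 16) bytes)
            (setq i (match-end 0)))
        (push (aref text i) bytes)
        (setq i (1+ i))))
    (cons (apply #'unibyte-string (nreverse bytes))
          (nreverse offsets))))

;; The eight buffer characters \ x c 2 \ x a 9 become two raw bytes,
;; which then decode to a single character.
(decode-coding-string (car (my-c-escapes-to-bytes "\\xc2\\xa9foo")) 'utf-8)
```

The offsets list records where each byte began in the source text, so
the start of a decoded character is the offset of its first byte.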

Phew.  Hope it is clearer now.  Anyway, it is not so important for me
anymore, since gettext doesn't support non-ASCII characters in
untranslated strings with fairly recent GNU libc.  (And yes, I tried
inserting non-ASCII characters in the untranslated strings.)

Paul


* Re: `decode-coding-string' question
  2006-07-06 20:34               ` Paul Pogonyshev
@ 2006-07-07  9:17                 ` Eli Zaretskii
  2006-07-07 16:05                   ` Paul Pogonyshev
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2006-07-07  9:17 UTC (permalink / raw)
  Cc: bug-cc-mode, emacs-devel

> From: Paul Pogonyshev <pogonyshev@gmx.net>
> Date: Thu, 6 Jul 2006 23:34:21 +0300
> Cc: handa@m17n.org
> 
> There is indeed a misunderstanding.  The characters in the buffer _are_
> decoded.  However, the characters form a C escape sequence, like "\xc2\xa9"

Right, I see the problem now.

> To know what character is encoded by this C sequence, I first translate
> strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
> The resulting string of length 2 is encoded in UTF-8 and I decode it
> to receive the copyright character or whatever.

Why not use `(decode-coding-string "\xc2\xa9" 'utf-8)' right away?  It
gives me the right character directly.

Btw, why don't we have a feature in cc-mode to transparently decode
and encode such strings when the source file is read/written?  If
detecting the encoding is an issue, we could for starters ask that
users state that in some file-local variable.


* Re: `decode-coding-string' question
  2006-07-07  9:17                 ` Eli Zaretskii
@ 2006-07-07 16:05                   ` Paul Pogonyshev
  2006-07-07 19:56                     ` David Kastrup
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-07 16:05 UTC (permalink / raw)
  Cc: bug-cc-mode

Eli Zaretskii wrote:
> > To know what character is encoded by this C sequence, I first translate
> > strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
> > The resulting string of length 2 is encoded in UTF-8 and I decode it
> > to receive the copyright character or whatever.
> 
> Why not use `(decode-coding-string "\xc2\xa9" 'utf-8)' right away?  It
> gives me the right character directly.

Because you underquoted the string.  It is actually `(decode-coding-string
"\\xc2\\xa9" 'utf-8)' and does nothing...

> Btw, why don't we have a feature in cc-mode to transparently decode
> and encode such strings when the source file is read/written?  If
> detecting the encoding is an issue, we could for starters ask that
> users state that in some file-local variable.

Given that it is not easy to decode strings (and quite slow)...

Paul


* Re: `decode-coding-string' question
  2006-07-07 16:05                   ` Paul Pogonyshev
@ 2006-07-07 19:56                     ` David Kastrup
  2006-07-07 22:54                       ` Paul Pogonyshev
  0 siblings, 1 reply; 21+ messages in thread
From: David Kastrup @ 2006-07-07 19:56 UTC (permalink / raw)
  Cc: bug-cc-mode, Eli Zaretskii, emacs-devel

Paul Pogonyshev <pogonyshev@gmx.net> writes:

> Eli Zaretskii wrote:
>> > To know what character is encoded by this C sequence, I first translate
>> > strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
>> > The resulting string of length 2 is encoded in UTF-8 and I decode it
>> > to receive the copyright character or whatever.
>> 
>> Why not use `(decode-coding-string "\xc2\xa9" 'utf-8)' right away?  It
>> gives me the right character directly.
>
> Because you underquoted the string.  It is actually `(decode-coding-string
> "\\xc2\\xa9" 'utf-8)' and does nothing...

Because you overquoted the string...

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: `decode-coding-string' question
  2006-07-07 19:56                     ` David Kastrup
@ 2006-07-07 22:54                       ` Paul Pogonyshev
  2006-07-08  8:18                         ` David Kastrup
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Pogonyshev @ 2006-07-07 22:54 UTC (permalink / raw)
  Cc: bug-cc-mode, Eli Zaretskii, David Kastrup

David Kastrup wrote:
> Paul Pogonyshev <pogonyshev@gmx.net> writes:
> 
> > Eli Zaretskii wrote:
> >> > To know what character is encoded by this C sequence, I first translate
> >> > strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
> >> > The resulting string of length 2 is encoded in UTF-8 and I decode it
> >> > to receive the copyright character or whatever.
> >> 
> >> Why not use `(decode-coding-string "\xc2\xa9" 'utf-8)' right away?  It
> >> gives me the right character directly.
> >
> > Because you underquoted the string.  It is actually `(decode-coding-string
> > "\\xc2\\xa9" 'utf-8)' and does nothing...
> 
> Because you overquoted the string...

...

The buffer contains 8 (eight) characters.  ?\\ ?x ?c ?2 ?\\ ?x ?a ?9.

Paul


* Re: `decode-coding-string' question
  2006-07-07 22:54                       ` Paul Pogonyshev
@ 2006-07-08  8:18                         ` David Kastrup
  0 siblings, 0 replies; 21+ messages in thread
From: David Kastrup @ 2006-07-08  8:18 UTC (permalink / raw)
  Cc: bug-cc-mode, Eli Zaretskii, emacs-devel

Paul Pogonyshev <pogonyshev@gmx.net> writes:

> David Kastrup wrote:
>> Paul Pogonyshev <pogonyshev@gmx.net> writes:
>> 
>> > Eli Zaretskii wrote:
>> >> > To know what character is encoded by this C sequence, I first translate
>> >> > strings "\xc2" and "\xa9" to the appropriate (undecoded!) characters.
>> >> > The resulting string of length 2 is encoded in UTF-8 and I decode it
>> >> > to receive the copyright character or whatever.
>> >> 
>> >> Why not use `(decode-coding-string "\xc2\xa9" 'utf-8)' right away?  It
>> >> gives me the right character directly.
>> >
>> > Because you underquoted the string.  It is actually `(decode-coding-string
>> > "\\xc2\\xa9" 'utf-8)' and does nothing...
>> 
>> Because you overquoted the string...
>
> ...
>
> The buffer contains 8 (eight) characters.  ?\\ ?x ?c ?2 ?\\ ?x ?a ?9.

Yes, I was aware of that.

Let us assume that you fetched the string including the double quotes
into the variable `string'.

Then take a look at (read string).  That should get you the right
amount of backslashes.

Alternatively, assume that point is on the first quote character of
the string.  Then you can use
(read (current-buffer))
for reading the string.
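
A small sketch of the first variant, with the decoding step added.  The
wrapper name `my-decode-literal' is made up for illustration, and it is
only an assumption that the Lisp reader interprets the escapes in the
literal the same way a C compiler would; that holds for simple \xNN
escapes but not for every C escape:

```elisp
(defun my-decode-literal (literal coding)
  "Decode LITERAL, the source text of a string including its quotes.
Assumption: the Lisp reader accepts the escapes used in the C source."
  ;; `read' turns the escape text into raw bytes (a unibyte string),
  ;; which `decode-coding-string' can then decode.
  (decode-coding-string (read literal) coding))

;; LITERAL here is the ten characters " \ x c 2 \ x a 9 " as they
;; would be fetched from the buffer.
(my-decode-literal "\"\\xc2\\xa9\"" 'utf-8)
```

Note that the Lisp reader consumes as many hex digits as follow `\x',
so a literal such as "\xc2\xa9foo" would not be split after `a9'.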

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


end of thread (newest message: 2006-07-08  8:18 UTC)

Thread overview: 21+ messages
2006-07-03 21:35 `decode-coding-string' question Paul Pogonyshev
2006-07-04  0:50 ` Kenichi Handa
2006-07-04  3:27   ` Eli Zaretskii
2006-07-04 17:29     ` Richard Stallman
2006-07-04 15:31   ` Paul Pogonyshev
2006-07-05  0:55     ` Kenichi Handa
2006-07-05 16:11       ` Paul Pogonyshev
2006-07-05 16:34         ` Stuart D. Herring
2006-07-05 16:50           ` Paul Pogonyshev
2006-07-06  1:08         ` Kenichi Handa
2006-07-06 15:52           ` Paul Pogonyshev
2006-07-06 20:18             ` Eli Zaretskii
2006-07-06 20:34               ` Paul Pogonyshev
2006-07-07  9:17                 ` Eli Zaretskii
2006-07-07 16:05                   ` Paul Pogonyshev
2006-07-07 19:56                     ` David Kastrup
2006-07-07 22:54                       ` Paul Pogonyshev
2006-07-08  8:18                         ` David Kastrup
2006-07-04 12:55 ` Richard Stallman
2006-07-04 13:03   ` David Kastrup
2006-07-04 13:23     ` Johan Bockgård
