* converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-29 0:27 UTC
To: emacs-devel
I am trying to use emacs to interface with a program that treats
utf-8 characters in its input and output as octal escape sequences.
So the program's output contains ascii strings like "\302\247",
which I want to display within Emacs as "§". Likewise, I want to
feed text containing utf-8 characters such as "§" into this program.
So I need to convert these utf-8 characters back to their respective
octal escape sequences. What is the proper way to achieve this?
Thanks a lot,
Roland
* Re: converting octal escape sequences to utf-8 and back
From: Leo @ 2011-05-29 2:57 UTC
To: emacs-devel
On 2011-05-29 08:27 +0800, Roland Winkler wrote:
> I am trying to use emacs to interface with a program that treats
> utf-8 characters in its input and output as octal escape sequences.
> So the program's output contains ascii strings like "\302\247",
> which I want to display within Emacs as "§". Likewise, I want to
> feed text containing utf-8 characters such as "§" into this program.
> So I need to convert these utf-8 characters back to their respective
> octal escape sequences. What is the proper way to achieve this?
>
> Thanks a lot,
>
> Roland
Are these functions useful?
encode-coding-string, decode-coding-string
encode-coding-region, decode-coding-region
Leo
* Re: converting octal escape sequences to utf-8 and back
From: Michael Welsh Duggan @ 2011-05-29 5:40 UTC
To: emacs-devel; +Cc: Roland Winkler
Leo <sdl.web@gmail.com> writes:
> On 2011-05-29 08:27 +0800, Roland Winkler wrote:
>> I am trying to use emacs to interface with a program that treats
>> utf-8 characters in its input and output as octal escape sequences.
>> So the program's output contains ascii strings like "\302\247",
>> which I want to display within Emacs as "§". Likewise, I want to
>> feed text containing utf-8 characters such as "§" into this program.
>> So I need to convert these utf-8 characters back to their respective
>> octal escape sequences. What is the proper way to achieve this?
>>
>> Thanks a lot,
>>
>> Roland
>
> Are these functions useful?
>
> encode-coding-string, decode-coding-string
> encode-coding-region, decode-coding-region
As Leo says:
(encode-coding-string "§" 'utf-8)
"\302\247"
(decode-coding-string "\302\247" 'utf-8)
"§"
--
Michael Welsh Duggan
(md5i@md5i.com)
* Re: converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-29 6:35 UTC
To: Michael Welsh Duggan; +Cc: emacs-devel
On Sun May 29 2011 Michael Welsh Duggan wrote:
> As Leo says:
>
> (encode-coding-string "§" 'utf-8)
> "\302\247"
> (decode-coding-string "\302\247" 'utf-8)
> "§"
The decoding seems to work fine this way, but not the encoding.
If I start out with the 8-character ascii string "\302\247" the
following does not give me back this 8-character string:
(with-temp-file "~/foo.txt"
(insert (encode-coding-string
(decode-coding-string "\302\247" 'utf-8) 'utf-8)))
This will ask me for the coding system I want, suggesting the
default 'raw-text. Then I end up with a file that has only two
bytes, instead of the eight bytes I want.
What am I doing wrong? Do I need anything else for this?
Roland
* Re: converting octal escape sequences to utf-8 and back
From: Michael Welsh Duggan @ 2011-05-29 20:05 UTC
To: Roland Winkler; +Cc: emacs-devel
"Roland Winkler" <winkler@gnu.org> writes:
> On Sun May 29 2011 Michael Welsh Duggan wrote:
>> As Leo says:
>>
>> (encode-coding-string "§" 'utf-8)
>> "\302\247"
>> (decode-coding-string "\302\247" 'utf-8)
>> "§"
>
> The decoding seems to work fine this way, but not the encoding.
> If I start out with the 8-character ascii string "\302\247" the
> following does not give me back this 8-character string:
>
> (with-temp-file "~/foo.txt"
> (insert (encode-coding-string
> (decode-coding-string "\302\247" 'utf-8) 'utf-8)))
>
> This will ask me for the coding system I want, suggesting the
> default 'raw-text. Then I end up with a file that has only two
> bytes, instead of the eight bytes I want.
(with-temp-file "~/foo.txt"
(insert (substring (prin1-to-string
(encode-coding-string
(decode-coding-string "\302\247" 'utf-8)'utf-8))
1 -1)))
--
Michael Welsh Duggan
(md5i@md5i.com)
* Re: converting octal escape sequences to utf-8 and back
From: Eli Zaretskii @ 2011-05-29 6:47 UTC
To: Michael Welsh Duggan; +Cc: winkler, emacs-devel
> From: Michael Welsh Duggan <md5i@md5i.com>
> Date: Sun, 29 May 2011 01:40:54 -0400
> Cc: Roland Winkler <winkler@gnu.org>
>
> As Leo says:
>
> (encode-coding-string "§" 'utf-8)
> "\302\247"
This is an illusion: what is produced are 2 bytes, but when Emacs
inserts that into the buffer where the results are displayed, the
bytes are represented as ASCII strings. Try writing the result to a
file (e.g., with write-region) and you will see that what ends up in
the file is simply the UTF-8 encoding of the character. This is not
what the OP wanted.
IOW, encode-coding-string produces the encoding specified by its 2nd
argument (the coding system), not its ASCII representation.
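To make the point above concrete, here is a minimal sketch that inspects the encoded result directly instead of relying on how it is echoed. It assumes this text is read as UTF-8, so that "§" is the character U+00A7.

```elisp
;; Encode "§" and look at the raw result rather than its printed form.
(let ((bytes (encode-coding-string "§" 'utf-8)))
  (list (multibyte-string-p bytes)   ; nil: the result is a unibyte string
        (length bytes)               ; 2: two bytes, not eight characters
        (append bytes nil)))         ; (194 167), i.e. octal 302 and 247
```

The backslash-and-digits form only appears when those two bytes are printed or displayed.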
* Re: converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-29 6:58 UTC
To: emacs-devel
On Sun, May 29 2011, Eli Zaretskii wrote:
> This is an illusion: what is produced are 2 bytes, but when Emacs
> inserts that into the buffer where the results are displayed, the
> bytes are represented as ASCII strings. Try writing the result to a
> file (e.g., with write-region) and you will see that what ends up in
> the file is simply the UTF-8 encoding of the character. This is not
> what the OP wanted.
>
> IOW, encode-coding-string produces the encoding specified by its 2nd
> argument (the coding system), not its ASCII representation.
...So it seems that emacs knows already what I would like to have (at
least its display engine). How can I achieve that I actually get this in
an output file, too? Are there some formulas that allow one to calculate
these octal sequences? Then it should be possible to construct the ascii
character sequences, too.
Roland
* Re: converting octal escape sequences to utf-8 and back
From: Eli Zaretskii @ 2011-05-29 8:35 UTC
To: Roland Winkler; +Cc: emacs-devel
> From: Roland Winkler <winkler@gnu.org>
> Date: Sun, 29 May 2011 01:58:30 -0500
>
> > IOW, encode-coding-string produces the encoding specified by its 2nd
> > argument (the coding system), not its ASCII representation.
>
> ...So it seems that emacs knows already what I would like to have (at
> least its display engine).
Nitpicking: It's not the display engine that does that, it's
eval-last-sexp.
> How can I achieve that I actually get this in an output file, too?
> Are there some formulas that allow one to calculate these octal
> sequences?
This should do what you want:
(with-output-to-string (princ (encode-coding-string STRING 'utf-8)))
(the STRING argument should be the entire string that you want to send
to that program of yours).
But this is crazy, IMO: Lisp code should not need to jump through the
hoops like that to produce such an octal representation. TRT is to
have a special encoding for this, then you could simply say
(encode-coding-string STRING 'foo)
or even just
(encode-coding-region START END 'foo)
because I presume that your original text comes from some buffer.
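Until such a coding system exists, one alternative is to build the escapes explicitly from the encoded bytes. The following is only a sketch: the function name my-text-to-octal-escapes is made up for illustration, and it assumes the target program wants every byte above 127 written as a backslash followed by three octal digits, with ASCII passed through unchanged (a literal backslash in the input is not escaped here).

```elisp
(defun my-text-to-octal-escapes (string)
  "Return STRING with each non-ASCII UTF-8 byte written as \\OOO.
Hypothetical helper; ASCII characters are copied unchanged."
  (mapconcat (lambda (byte)
               (if (< byte 128)
                   (char-to-string byte)      ; plain ASCII: keep as is
                 (format "\\%03o" byte)))     ; high byte: three octal digits
             ;; Iterating over the unibyte result yields byte values 0..255.
             (encode-coding-string string 'utf-8)
             ""))
```

With this definition, (my-text-to-octal-escapes "§") should return the 8-character ASCII string consisting of a backslash, 302, a backslash, and 247, which can then be written to a file or sent to a process without any coding-system prompt.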
* Re: converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-29 19:15 UTC
To: Eli Zaretskii; +Cc: emacs-devel
On Sun May 29 2011 Eli Zaretskii wrote:
> This should do what you want:
>
> (with-output-to-string (princ (encode-coding-string STRING 'utf-8)))
>
> (the STRING argument should be the entire string that you want to send
> to that program of yours).
Thank you. This seems to do what I want.
> But this is crazy, IMO: Lisp code should not need to jump through the
> hoops like that to produce such an octal representation. TRT is to
> have a special encoding for this
For what I want to do, this does not seem to be a bottleneck. (I am
playing with a frontend for the program djvused.) I do not know
whether my problem is common enough to justify a new encoding for it.
Certainly, implementing a new encoding goes well beyond my knowledge
of these things.
The web page http://billposer.org/Software/ListOfRepresentations.html
lists about 30 unicode escape formats which emacs possibly could
implement. I do not know how many of these formats are possibly
already implemented. For most of them this web page includes example
programs using it; and some of these programs appear to be more
common. For the particular format I need, this web page simply says
"used by various programs". So I don't know how common my particular
problem might be to justify such an effort. Maybe some other formats
described there would be yet more helpful for other people.
Roland
* Re: converting octal escape sequences to utf-8 and back
From: Thien-Thi Nguyen @ 2011-05-29 8:50 UTC
To: Roland Winkler; +Cc: emacs-devel
() Roland Winkler <winkler@gnu.org>
() Sun, 29 May 2011 01:58:30 -0500
How can I achieve that I actually get this in
an output file, too?
In *scratch*, i bind ‘C-j’ to ‘ppq’[0], reproduced here:
(defun ppq (&optional replace)
(interactive "P")
(when replace
(delete-region (point) (progn (forward-sexp 1) (point))))
(save-excursion
(insert "\n")
(pp-eval-last-sexp t)
(when (bolp)
(delete-char -1))))
The function ‘pp-eval-last-sexp’ winds up calling ‘pp-to-string’.
You can finangle it to achieve the desired result like so:
(defvar pp-to-string-double-backslash t)
(defun pp-to-string (object)
"Return a string containing the pretty-printed representation of OBJECT.
OBJECT can be any Lisp object. Quoting characters are used as needed
to make output that `read' can handle, whenever this is possible."
(with-current-buffer (generate-new-buffer " pp-to-string")
(unwind-protect
(progn
(lisp-mode-variables nil)
(set-syntax-table emacs-lisp-mode-syntax-table)
(let ((print-escape-newlines pp-escape-newlines)
(print-quoted t))
(prin1 object (current-buffer)))
(pp-buffer)
;; Begin finangling.
(when pp-to-string-double-backslash
(goto-char (point-min))
(while (search-forward "\\" nil t)
(replace-match "\\\\" t t)))
;; End finangling.
(buffer-string))
(kill-buffer (current-buffer)))))
This was the five-minute top-down way. The cleaner bottom-up way
is to specify an appropriate PRINTCHARFUN to ‘prin1’, which is at
the heart of all pp funcs. Perhaps someone else can post that.
__________________________________________________________________________________
[0] http://www.gnuvola.org/software/personal-elisp/dist/lisp/prog-env/ppq.el
* Re: converting octal escape sequences to utf-8 and back
From: Eli Zaretskii @ 2011-05-29 3:00 UTC
To: Roland Winkler; +Cc: emacs-devel
> Date: Sat, 28 May 2011 19:27:51 -0500
> From: "Roland Winkler" <winkler@gnu.org>
>
> I am trying to use emacs to interface with a program that treats
> utf-8 characters in its input and output as octal escape sequences.
> So the program's output contains ascii strings like "\302\247",
> which I want to display within Emacs as "§". Likewise, I want to
> feed text containing utf-8 characters such as "§" into this program.
> So I need to convert these utf-8 characters back to their respective
> octal escape sequences. What is the proper way to achieve this?
A new coding-system?
* Re: converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-29 3:48 UTC
To: Eli Zaretskii; +Cc: emacs-devel
On Sun May 29 2011 Eli Zaretskii wrote:
> > Date: Sat, 28 May 2011 19:27:51 -0500
> > From: "Roland Winkler" <winkler@gnu.org>
> > I am trying to use emacs to interface with a program that treats
> > utf-8 characters in its input and output as octal escape sequences.
> > So the program's output contains ascii strings like "\302\247",
> > which I want to display within Emacs as "§". Likewise, I want to
> > feed text containing utf-8 characters such as "§" into this program.
> > So I need to convert these utf-8 characters back to their respective
> > octal escape sequences. What is the proper way to achieve this?
>
> A new coding-system?
I cannot claim I understand these things. But I thought that there
was some kind of a "conversion formula" that allows one to calculate
the octal sequence for any utf-8 character and also the opposite,
get the character given the octal sequence. Is this true?
There is a little tool uni2ascii which seems to implement this, see
http://directory.fsf.org/project/uni2ascii/
But if possible I would like to achieve this from within Emacs.
Roland
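On the "conversion formula" question: the octal digits are just the UTF-8 bytes of the character written in base 8, and UTF-8 itself is a fixed bit layout. As a sketch for the two-byte range only (U+0080 to U+07FF, which covers "§" at U+00A7), the arithmetic looks like this. The function name is hypothetical, and longer sequences follow the same pattern with more continuation bytes.

```elisp
(defun my-utf8-two-byte-octal (code-point)
  "Return the octal escapes for CODE-POINT, assumed to be in U+0080..U+07FF.
Illustration of the UTF-8 bit layout only; not a general encoder."
  (let ((lead  (logior #xC0 (ash code-point -6)))        ; 110xxxxx: top 5 bits
        (trail (logior #x80 (logand code-point #x3F))))  ; 10xxxxxx: low 6 bits
    (format "\\%03o\\%03o" lead trail)))
```

For U+00A7 the lead byte is #xC2 (octal 302) and the trail byte is #xA7 (octal 247), matching the sequence in the original question. In practice encode-coding-string does this computation for every sequence length.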
* Re: converting octal escape sequences to utf-8 and back
From: Randal L. Schwartz @ 2011-05-30 22:39 UTC
To: emacs-devel
>>>>> "Roland" == Roland Winkler <winkler@gnu.org> writes:
Roland> So the program's output contains ascii strings like "\302\247",
Minor nit, but "ASCII" doesn't include anything above \177. If it has a
high-bit, it has to be something like latin-1, or (in your case) raw
UTF-8 bytes.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
* Re: converting octal escape sequences to utf-8 and back
From: Harald Hanche-Olsen @ 2011-05-31 7:14 UTC
To: emacs-devel
[merlyn@stonehenge.com (Randal L. Schwartz) (2011-05-30 22:39:51 UTC)]
> >>>>> "Roland" == Roland Winkler <winkler@gnu.org> writes:
>
> Roland> So the program's output contains ascii strings like "\302\247",
>
> Minor nit, but "ASCII" doesn't include anything above \177. If it has a
> high-bit, it has to be something like latin-1, or (in your case) raw
> UTF-8 bytes.
True, but given the octal escape sequences mentioned in the subject I
thought he meant the string that is denoted "\\302\\247" in elisp,
which is indeed ASCII. But that could be a misunderstanding on my
part.
- Harald
* Re: converting octal escape sequences to utf-8 and back
From: Randal L. Schwartz @ 2011-05-31 17:06 UTC
To: emacs-devel
>>>>> "Harald" == Harald Hanche-Olsen <hanche@math.ntnu.no> writes:
Harald> True, but given the octal escape sequences mentioned in the subject I
Harald> thought he meant the string that is denoted "\\302\\247" in elisp,
Harald> which is indeed ASCII.
No, the string "\302\247" is not ASCII. It's 8-bit-something-or-other.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
* Re: converting octal escape sequences to utf-8 and back
From: PJ Weisberg @ 2011-05-31 20:13 UTC
To: Randal L. Schwartz; +Cc: emacs-devel@gnu.org
On Tuesday, May 31, 2011, Randal L. Schwartz <merlyn@stonehenge.com> wrote:
>>>>>> "Harald" == Harald Hanche-Olsen <hanche@math.ntnu.no> writes:
>
> Harald> True, but given the octal escape sequences mentioned in the subject I
> Harald> thought he meant the string that is denoted "\\302\\247" in elisp,
> Harald> which is indeed ASCII.
>
> No, the string "\302\247" is not ASCII. It's 8-bit-something-or-other.
ASCII definitely *does* include the character \, and all the digits
0-9, in only 7 bits.
--
-PJ
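The two readings being argued over in this sub-thread can be told apart in Elisp. This small sketch compares the string written with doubled backslashes (eight ASCII characters) against the one written with single backslashes (two raw bytes).

```elisp
;; "\\302\\247" is the escaped text itself: 8 characters, all below 128.
;; "\302\247" is what the escapes denote: 2 bytes, both above 127.
(list (length "\\302\\247")
      (append "\\302\\247" nil)   ; (92 51 48 50 92 50 52 55)
      (length "\302\247")
      (append "\302\247" nil))    ; (194 167)
```

The first form is pure ASCII; the second is not.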
* Re: converting octal escape sequences to utf-8 and back
From: Stefan Monnier @ 2011-05-30 22:54 UTC
To: Roland Winkler; +Cc: emacs-devel
> I am trying to use emacs to interface with a program that treats
> utf-8 characters in its input and output as octal escape sequences.
> So the program's output contains ascii strings like "\302\247",
> which I want to display within Emacs as "§". Likewise, I want to
> feed text containing utf-8 characters such as "§" into this program.
> So I need to convert these utf-8 characters back to their respective
> octal escape sequences. What is the proper way to achieve this?
You can try to convert the \302 and \247 to/from bytes by using `read'
and `prin1' (since Emacs also uses such a notation for its own Elisp
strings). As for converting those bytes to/from chars, just use
(en|de)code-coding-string.
Stefan
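A minimal sketch of the `read' direction described above, for turning the program's escaped output back into characters. The function name is made up for illustration, and the sketch assumes the escaped text is pure ASCII and contains no unescaped double quotes, since it is wrapped in quotes and handed to the Lisp reader. The reader will also interpret other backslash escapes such as \n, which may or may not match what the external program emits.

```elisp
(defun my-octal-escapes-to-text (escaped)
  "Decode ESCAPED, ASCII text with \\OOO escapes, into characters.
Hypothetical helper relying on the Lisp reader's string syntax."
  (decode-coding-string
   ;; Reading "\302\247" as a Lisp string literal yields the raw bytes.
   (read (concat "\"" escaped "\""))
   'utf-8))
```

For example, (my-octal-escapes-to-text "\\302\\247") should return "§".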
* Re: converting octal escape sequences to utf-8 and back
From: Roland Winkler @ 2011-05-31 21:06 UTC
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel
On Mon May 30 2011 Stefan Monnier wrote:
> You can try to convert the \302 and \247 to/from bytes by using `read'
> and `prin1' (since Emacs also uses such a notation for its own Elisp
> strings). As for converting those bytes to/from chars, just use
> (en|de)code-coding-string.
Thanks! The funny thing is that I am trying to make emacs interact
with the program djvused that uses a rather lisp-like structure for
its input and output. So using `read' and `prin1' is anyway a
natural way to go.
Once my code is a bit more mature, I'll post it on
gnu-emacs-sources. Maybe it'll be helpful for others, too.
Roland