unofficial mirror of help-gnu-emacs@gnu.org
* Converting string to Unicode
@ 2005-11-04 14:02 Desilets, Alain
From: Desilets, Alain @ 2005-11-04 14:02 UTC


I am working on an Emacs mode for programming by voice (i.e. dictating computer code using a speech recognition system):

http://voicecode.iit.nrc.ca/

This mode communicates with the speech recognition engine (an application outside of Emacs) through XML messages over socket connections.

In particular, whenever a new character is typed into Emacs, Emacs sends an XML message to the SR system to notify it. This XML message contains the character that was typed as well as the name of the buffer and the position where it was typed.
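As a rough illustration of such a notification, here is a Python sketch (not the VoiceCode code). The element names `insert`, `char`, `buffer` and `position` and the helper `make_insert_message` are hypothetical, not the actual VoiceCode protocol. The message is first built as a string of characters; turning it into bytes for the socket is a separate, later step. Building it with an XML library also escapes characters such as "<" or "&" if the user happens to type them.

```python
import xml.etree.ElementTree as ET


def make_insert_message(char, buffer_name, position):
    """Build the notification as text (a str of characters, not bytes).

    Hypothetical element names; the real protocol may differ.
    """
    msg = ET.Element("insert")
    ET.SubElement(msg, "char").text = char
    ET.SubElement(msg, "buffer").text = buffer_name
    ET.SubElement(msg, "position").text = str(position)
    # encoding="unicode" makes tostring() return a str rather than bytes;
    # markup characters in the text (<, >, &) are escaped automatically.
    return ET.tostring(msg, encoding="unicode")


print(make_insert_message("é", "foo.py", 42))
# <insert><char>é</char><buffer>foo.py</buffer><position>42</position></insert>
```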

Whenever I type an accented character in Emacs, the XML message that gets generated turns out to be malformed, because the character that was typed is inserted into the XML message as a byte sequence that uses the original encoding of that character in the buffer, as opposed to the Unicode encoding that the XML message is supposed to be encoded with.
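A small reproduction of this failure, sketched in Python rather than Emacs Lisp and assuming the buffer's encoding is ISO-8859-1 (Latin-1). Without an encoding declaration or byte-order mark, an XML parser reads the bytes as UTF-8, so the single Latin-1 byte for "é" is an invalid token, while the UTF-8 bytes for the same character parse fine. The helper `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

char = "é"  # U+00E9, as typed into the buffer

# Bytes as they look if the buffer's Latin-1 encoding leaks into the message.
latin1_msg = b"<char>" + char.encode("latin-1") + b"</char>"
# The same message, encoded consistently as UTF-8.
utf8_msg = b"<char>" + char.encode("utf-8") + b"</char>"


def is_well_formed(data):
    # No XML declaration here, so the parser assumes UTF-8.
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(latin1_msg, is_well_formed(latin1_msg))  # b'<char>\xe9</char>' False
print(utf8_msg, is_well_formed(utf8_msg))      # b'<char>\xc3\xa9</char>' True
```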

So my question is this: what would be the easiest way for me to take a character that was inserted into an Emacs buffer and turn it into a Unicode character to be inserted in the XML message?

Thx

Alain Désilets, MASc 
Agent de recherches/Research Officer 
Institut de technologie de l'information du CNRC / 
NRC Institute for Information Technology 

alain.desilets@nrc-cnrc.gc.ca 
Tél/Tel (613) 990-2813 
Facsimile/télécopieur: (613) 952-7151 

Conseil national de recherches Canada, M50, 1200 chemin Montréal, 
Ottawa (Ontario) K1A 0R6 
National Research Council Canada, M50, 1200 Montreal Rd., Ottawa, ON 
K1A 0R6 

Gouvernement du Canada | Government of Canada 

 


* Re: Converting string to Unicode
@ 2005-11-04 14:33 Pascal Bourguignon
From: Pascal Bourguignon @ 2005-11-04 14:33 UTC


"Desilets, Alain" <Alain.Desilets@nrc-cnrc.gc.ca> writes:

> I am working on an Emacs mode for programming by voice
> (i.e. dictating computer code using a speech recognition system):
>
> http://voicecode.iit.nrc.ca/
>
> This mode communicates with the speech recognition engine (an
> application outside of Emacs) through XML messages over socket
> connections.
>
> In particular, whenever a new character is typed into Emacs, Emacs
> sends an XML message to the SR system to notify it. This XML message
> contains the character that was typed as well as the name of the
> buffer and the position where it was typed.
>
> Whenever I type an accented character in Emacs, the XML message
> that gets generated turns out to be malformed, because the character
> that was typed is inserted into the XML message as a byte sequence
> that uses the original encoding of that character in the buffer, as
> opposed to the Unicode encoding that the XML message is supposed to
> be encoded with.

You won't be able to resolve this problem if you continue with these
misconceptions.  Unicode is not an encoding!  Unicode is a character
set (or "character repertoire", as they call it).  There are various
encodings, some covering only subsets of the Unicode set, some
covering the whole Unicode set.  You can consider UTF-8, UTF-16,
UTF-32 (UCS-4), and some other encodings.  If you're only concerned
with a subset, for example for some European languages, you can use
ISO-8859-1 or another ISO-8859 encoding.
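To make the character-set/encoding distinction concrete, here is a short Python sketch (the helper `encodings_of` is hypothetical). The character "é" is always the code point U+00E9, yet each encoding turns it into a different byte sequence. An ISO-8859 encoding simply has no bytes for characters outside its subset: the euro sign U+20AC exists in ISO-8859-15 but not in ISO-8859-1.

```python
from typing import Dict, Optional

CODECS = ("utf-8", "utf-16-be", "utf-32-be", "iso-8859-1", "iso-8859-15")


def encodings_of(ch: str) -> Dict[str, Optional[str]]:
    """Map each codec name to the hex bytes of `ch`, or None if unsupported."""
    result: Dict[str, Optional[str]] = {}
    for codec in CODECS:
        try:
            result[codec] = ch.encode(codec).hex(" ")
        except UnicodeEncodeError:
            result[codec] = None  # character is outside this encoding's subset
    return result


for ch in ("é", "\u20ac"):
    print(f"U+{ord(ch):04X}", encodings_of(ch))
# U+00E9 {'utf-8': 'c3 a9', 'utf-16-be': '00 e9', 'utf-32-be': '00 00 00 e9', 'iso-8859-1': 'e9', 'iso-8859-15': 'e9'}
# U+20AC {'utf-8': 'e2 82 ac', 'utf-16-be': '20 ac', 'utf-32-be': '00 00 20 ac', 'iso-8859-1': None, 'iso-8859-15': 'a4'}
```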


> So my question is this. What would be the easiest way for me to take
> a character that was inserted into an Emacs buffer, and turn it into
> a Unicode character to be inserted in the XML message?

Wrong question.


The right questions are:
- What encoding should be used in the XML message?
- How do you make Emacs generate the message in this encoding?

You can use: M-x apropos RET coding RET 
or C-h i d m emacs RET m Coding Systems RET
to get more information about this second question.
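One possible answer to both questions, sketched in Python rather than Emacs Lisp: pick an encoding, declare it in the XML prolog, and convert the whole message from characters to bytes exactly once, just before it is written to the socket. In Emacs the analogous tools would presumably be `encode-coding-string`, or `set-process-coding-system` on the socket process; the helper name `encode_for_socket` is hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_for_socket(xml_text, encoding="utf-8"):
    """Prepend an XML declaration naming `encoding`, then encode once.

    Characters the chosen encoding cannot represent (e.g. the euro sign
    in ISO-8859-1) become numeric character references such as &#8364;,
    which is fine in element text and attribute values (not in names).
    """
    prolog = '<?xml version="1.0" encoding="%s"?>' % encoding.upper()
    return (prolog + xml_text).encode(encoding, errors="xmlcharrefreplace")


# UTF-8 covers all of Unicode.
utf8_payload = encode_for_socket("<char>é</char>")
print(ET.fromstring(utf8_payload).text)  # é

# ISO-8859-1 covers only a subset; the declaration tells the receiver which.
latin1_payload = encode_for_socket("<char>\u20ac</char>", "iso-8859-1")
print(latin1_payload)
print(ET.fromstring(latin1_payload).text)  # €
```

Whichever encoding is chosen, the receiver decodes according to the declaration, so the typed character comes back intact.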


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.
