From: Pascal Bourguignon
Newsgroups: gmane.emacs.help
Subject: Re: Converting string to Unicode
Date: Fri, 04 Nov 2005 15:33:49 +0100
Organization: Informatimago
Message-ID: <87ek5wiisy.fsf@thalassa.informatimago.com>

"Desilets, Alain" writes:

> I am working on an Emacs mode for programming by voice
> (i.e. dictating computer code using speech recognition system):
>
> http://voicecode.iit.nrc.ca/
>
> This mode communicates with the speech recognition engine (an
> application outside of Emacs) through XML messages over socket
> connections.
>
> In particular, whenever a new character is typed into Emacs, Emacs
> sends an XML message to the SR system to notify it. This XML message
> contains the character that was typed as well as the name of the
> buffer and the position where it was typed.
>
> Whenever I typed an accented character in Emacs, the XML message
> that gets generated turns out to be malformed, because the character
> that was typed is inserted into the XML message as a byte sequence
> that uses the original encoding of that character in the buffer, as
> opposed to the unicode encoding that the XML message is supposed to
> be encoded with.

You won't be able to resolve this problem if you continue with these
misconceptions.

Unicode is not an encoding! Unicode is a character set (or character
repertoire, as they call it).

There are various encodings, some for only subsets of the Unicode set,
some for the whole Unicode set.

You can consider UTF-8, UTF-16, UTF-32 (UCS-4), and some other
encodings. If you're only concerned with a subset, for example for
some European languages, you can use ISO-8859-1, or some other
ISO-8859 encoding.

> So my question is this. What would be the easiest way for me to take
> a character that was inserted into an Emacs buffer, and turn it into
> a unicode character to be inserted in the XML message?

Wrong question. The right questions are:

- What encoding should be used in the XML message?

- How do you make Emacs generate the message with this encoding?

You can use:

    M-x apropos RET coding RET

or

    C-h i d m emacs RET m Coding Systems RET

to get more information about this second question.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy. Nobody can be trusted with their finger
on the button. Nobody's perfect. VOTE FOR NOBODY.
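
To make the character-set versus encoding distinction concrete, here is a minimal sketch in Python (used only for illustration, outside Emacs). It encodes one character from the Unicode repertoire in three ways, then builds a tiny XML message whose declaration matches its bytes. The element name `typed` and the helpers `xml_message` and `is_valid_utf8` are hypothetical; they are not part of VoiceCode or of any message format mentioned in this thread.

```python
from xml.sax.saxutils import escape

# One character of the Unicode repertoire: U+00E9, e with acute accent.
ch = "\u00e9"

# The same character as bytes under three different encodings.
latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xA9
utf16_bytes = ch.encode("utf-16-be")    # two bytes: 0x00 0xE9


def xml_message(text: str, encoding: str) -> bytes:
    """Build a small XML document whose declared encoding is the one
    actually used to produce its bytes (hypothetical message layout)."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    body = f"<typed>{escape(text)}</typed>"
    return (declaration + body).encode(encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes produced as UTF-8 are accepted by a UTF-8 reader.
print(is_valid_utf8(xml_message(ch, "utf-8")))
# Bytes produced as ISO-8859-1 are rejected by a UTF-8 reader.
print(is_valid_utf8(xml_message(ch, "iso-8859-1")))
```

The second check fails for the same reason the original XML messages were malformed: the receiver expects one encoding while the sender emitted the character's bytes in another. Either encoding can work, provided both ends agree on it and the XML declaration states it.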