From: Helmut Eller
Newsgroups: gmane.emacs.help
Subject: Re: Parsing of multibyte strings frpom process output
Date: Tue, 08 May 2018 13:00:13 +0200
Organization: Netfront http://www.netfront.net/
To: help-gnu-emacs@gnu.org

On Tue, May 08 2018, Michael Albinus wrote:

> Hi,
>
> I call a local process ("gio list ...", to name it), which returns utf8
> multibyte codes like
>
> --8<---------------cut here---------------start------------->8---
> standard::symlink-target=/home/albinus/tmp/\xc2\x9abung
> --8<---------------cut here---------------end--------------->8---
>
> The bytes "\xc2\x9a" stand for the multibyte char ?\x9a.

The UTF-8 byte sequence \xc2\x9a encodes a control character (U+009A).
Maybe the byte sequence \xc3\x9c would make a better example, as that
corresponds to Ü (LATIN CAPITAL LETTER U WITH DIAERESIS).

> However, I
> don't know how to parse it that I could retrieve it. All what I have
> tried returns always the *two* characters ?\xc2 ?\x9a, multibyte
> encoded. How could I get just the multibyte character ?\x9a from this?

You could use (set-process-coding-system PROCESS 'utf-8) if you know that
all the output of the process is indeed utf-8 encoded.  Alternatively,
you could use 'binary as the coding system and manually call
decode-coding-string on the parts that are utf-8 encoded.  However, keep
in mind that "raw bytes" in multibyte strings have char codes in the
range #x3FFF80..#x3FFFFF.

If you want even more confusion: you could set up the process so that it
generates unibyte strings and then use decode-coding-string to create
the multibyte string.

> I know that (decode-coding-string "/home/albinus/tmp/\xc2\x9a\ bung" 'utf-8)
> does what I want. But here, the string is a string *constant*, which
> allows to write characters in hex syntax. When I read the string from
> the output buffer (after including the trailing "\ "), this does not work.

Remember, if a hexadecimal or octal escape sequence occurs in a string
literal, then the string automatically becomes a unibyte string:

  (multibyte-string-p "\xc3\x9c") => nil

Also consider these examples:

  (decode-coding-string "\xc3\x9c" 'utf-8) => "Ü"
  (decode-coding-string (string #xc3 #x9c) 'utf-8) => "Ã\234"
  (decode-coding-string (string #x3FFFc3 #x3FFF9c) 'utf-8) => "Ü"

Helmut
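
For illustration, here is a minimal sketch of the two coding-system options mentioned above. It assumes the program is run synchronously with call-process rather than as an asynchronous process object, so the coding system is chosen by binding coding-system-for-read instead of calling set-process-coding-system. The my/ helper names are hypothetical and not part of Emacs.

```elisp
(defun my/process-output-utf8 (program &rest args)
  "Run PROGRAM with ARGS and return its output decoded as UTF-8.
Hypothetical helper; assumes all of the output is valid UTF-8."
  (with-temp-buffer
    ;; `call-process' decodes the text it inserts according to
    ;; `coding-system-for-read' when that variable is non-nil.
    (let ((coding-system-for-read 'utf-8))
      (apply #'call-process program nil t nil args))
    (buffer-string)))

(defun my/process-output-bytes (program &rest args)
  "Run PROGRAM with ARGS and return its output as a unibyte string.
Hypothetical helper for the \"read as binary, decode by hand\" route."
  (with-temp-buffer
    ;; A unibyte buffer stores the output as plain bytes.
    (set-buffer-multibyte nil)
    (let ((coding-system-for-read 'binary))
      (apply #'call-process program nil t nil args))
    (buffer-string)))
```

On a system with a POSIX printf on the PATH (an assumption made only for this test), (my/process-output-bytes "printf" "\\303\\234") should give a unibyte string holding the two bytes #xc3 #x9c, and passing that string to (decode-coding-string ... 'utf-8) should give the one-character string "Ü". (my/process-output-utf8 "printf" "\\303\\234") should give the same "Ü" directly.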