From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Paul Pogonyshev
Newsgroups: gmane.emacs.devel
Subject: Re: `decode-coding-string' question
Date: Thu, 6 Jul 2006 23:34:21 +0300
Message-ID: <200607062334.21288.pogonyshev@gmx.net>
References: <200607040035.01379.pogonyshev@gmx.net> <200607061852.28566.pogonyshev@gmx.net>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1152217735 28860 80.91.229.2 (6 Jul 2006 20:28:55 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 6 Jul 2006 20:28:55 +0000 (UTC)
Cc: handa@m17n.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jul 06 22:28:52 2006
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165]) by ciao.gmane.org with esmtp (Exim 4.43) id 1FyaSl-00045r-V5 for ged-emacs-devel@m.gmane.org; Thu, 06 Jul 2006 22:28:42 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1FyaSl-0006dM-Af for ged-emacs-devel@m.gmane.org; Thu, 06 Jul 2006 16:28:39 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1FyaSW-0006b3-FK for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:24 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1FyaSU-0006Z7-UC for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:24 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1FyaSU-0006Z1-Qf for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:22 -0400
Original-Received: from [213.165.64.21] (helo=mail.gmx.net) by monty-python.gnu.org with smtp (Exim 4.52) id 1FyaSk-0001E6-QV for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:39 -0400
Original-Received: (qmail invoked by alias); 06 Jul 2006 20:28:17 -0000
Original-Received: from unknown (EHLO [194.158.209.209]) [194.158.209.209] by mail.gmx.net (mp022) with SMTP; 06 Jul 2006 22:28:17 +0200
X-Authenticated: #16844820
Original-To: emacs-devel@gnu.org, Eli Zaretskii
User-Agent: KMail/1.7.2
In-Reply-To:
Content-Disposition: inline
X-Y-GMX-Trusted: 0
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:56655
Archived-At:

Eli Zaretskii wrote:

> > From: Paul Pogonyshev
> > Date: Thu, 6 Jul 2006 18:52:28 +0300
> > Cc: Kenichi Handa
> >
> > > > I do. But I need to know where they begin in the buffer (containing
> > > > the encoded C string.) I don't see a way to keep this information at
> > > > present... :(
> > >
> > > How did you make that buffer? Why don't you have an
> > > already-decoded text in that buffer?
> >
> > Because it's a C source file. Strings have to be encoded there.
>
> Paul, there's some misunderstanding here, so please bear with us.
> Handa-san cannot understand how come you have undecoded characters in
> the buffer, and neither can I.
>
> The fact that it's a C file does not matter: Emacs _always_ decodes
> characters when it visits the file, no matter if it's a C file or
> something else. In the text you get in your buffer the characters
> should be decoded. The question is, how come it didn't decode these
> characters in your case? Are there other non-ASCII characters in the
> same file, perhaps? if so, what characters are those? For that
> matter, can you post a small sample file that, when visited in Emacs,
> leaves the UTF-8 encoded characters undecoded in the buffer? Please
> post that file as a binary attachment, to avoid munging it by email
> software en- and de-coding.

There is indeed a misunderstanding.
The characters in the buffer _are_ decoded. However, the characters
form C escape sequences, like "\xc2\xa9". To find out which character
such a C sequence encodes, I first translate the strings "\xc2" and
"\xa9" to the corresponding (undecoded!) characters. The resulting
string of length 2 is UTF-8-encoded, and I decode it to get the
copyright character or whatever it happens to be.

Phew. Hope it is clearer now.

Anyway, this is no longer so important for me, since gettext doesn't
support non-ASCII characters in untranslated strings with fairly
recent GNU libc. (And yes, I tried inserting non-ASCII characters in
the untranslated strings.)

Paul
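
For illustration only, the two-step translation described above (turn
each C escape into one undecoded byte, then decode the byte string as
UTF-8) can be sketched in Python. This is not the Emacs Lisp code under
discussion; in Emacs the final step is what `decode-coding-string' does.
The function name decode_c_hex_escapes is hypothetical, and the sketch
assumes that every escape has exactly two hex digits and that the text
between escapes is plain ASCII.

```python
import re

# One C hex escape such as \xc2. Assumption: exactly two hex digits;
# real C allows longer hex escapes, which this sketch does not handle.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_c_hex_escapes(literal: str, encoding: str = "utf-8") -> str:
    """Decode the contents of a C string literal containing \\xNN escapes.

    Hypothetical helper: each escape becomes one raw (undecoded) byte,
    and the resulting byte string is then decoded as a whole.
    """
    raw = bytearray()
    pos = 0
    for match in _HEX_ESCAPE.finditer(literal):
        # Text between escapes is assumed to be ASCII and copied as-is.
        raw += literal[pos:match.start()].encode("ascii")
        # Step 1: translate the escape to a single undecoded byte.
        raw.append(int(match.group(1), 16))
        pos = match.end()
    raw += literal[pos:].encode("ascii")
    # Step 2: decode the whole byte string in the given encoding.
    return bytes(raw).decode(encoding)


if __name__ == "__main__":
    print(decode_c_hex_escapes(r"Copyright \xc2\xa9 2006"))
```

Decoding only after all the bytes have been collected matters here: the
bytes 0xC2 and 0xA9 mean nothing individually in UTF-8 and form a
character only as a pair.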