From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Paul Pogonyshev <pogonyshev@gmx.net>
Newsgroups: gmane.emacs.devel
Subject: Re: `decode-coding-string' question
Date: Thu, 6 Jul 2006 23:34:21 +0300
Message-ID: <200607062334.21288.pogonyshev@gmx.net>
References: <200607040035.01379.pogonyshev@gmx.net>
	<200607061852.28566.pogonyshev@gmx.net> <upsgiqyga.fsf@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1152217735 28860 80.91.229.2 (6 Jul 2006 20:28:55 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 6 Jul 2006 20:28:55 +0000 (UTC)
Cc: handa@m17n.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jul 06 22:28:52 2006
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1FyaSl-00045r-V5
	for ged-emacs-devel@m.gmane.org; Thu, 06 Jul 2006 22:28:42 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1FyaSl-0006dM-Af
	for ged-emacs-devel@m.gmane.org; Thu, 06 Jul 2006 16:28:39 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1FyaSW-0006b3-FK
	for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:24 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1FyaSU-0006Z7-UC
	for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:24 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1FyaSU-0006Z1-Qf
	for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:22 -0400
Original-Received: from [213.165.64.21] (helo=mail.gmx.net)
	by monty-python.gnu.org with smtp (Exim 4.52) id 1FyaSk-0001E6-QV
	for emacs-devel@gnu.org; Thu, 06 Jul 2006 16:28:39 -0400
Original-Received: (qmail invoked by alias); 06 Jul 2006 20:28:17 -0000
Original-Received: from unknown (EHLO [194.158.209.209]) [194.158.209.209]
	by mail.gmx.net (mp022) with SMTP; 06 Jul 2006 22:28:17 +0200
X-Authenticated: #16844820
Original-To: emacs-devel@gnu.org,
 Eli Zaretskii <eliz@gnu.org>
User-Agent: KMail/1.7.2
In-Reply-To: <upsgiqyga.fsf@gnu.org>
Content-Disposition: inline
X-Y-GMX-Trusted: 0
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:56655
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/56655>

Eli Zaretskii wrote:
> > From: Paul Pogonyshev <pogonyshev@gmx.net>
> > Date: Thu, 6 Jul 2006 18:52:28 +0300
> > Cc: Kenichi Handa <handa@m17n.org>
> > 
> > > > I do.  But I need to know where they begin in the buffer (containing
> > > > the encoded C string.)  I don't see a way to keep this information at
> > > > present... :(
> > > 
> > > How did you make that buffer?  Why don't you have an
> > > already-decoded text in that buffer?
> > 
> > Because it's a C source file.  Strings have to be encoded there.
> 
> Paul, there's some misunderstanding here, so please bear with us.
> Handa-san cannot understand how come you have undecoded characters in
> the buffer, and neither can I.
> 
> The fact that it's a C file does not matter: Emacs _always_ decodes
> characters when it visits the file, no matter if it's a C file or
> something else.  In the text you get in your buffer the characters
> should be decoded.  The question is, how come it didn't decode these
> characters in your case?  Are there other non-ASCII characters in the
> same file, perhaps? if so, what characters are those?  For that
> matter, can you post a small sample file that, when visited in Emacs,
> leaves the UTF-8 encoded characters undecoded in the buffer?  Please
> post that file as a binary attachment, to avoid munging it by email
> software en- and de-coding.

There is indeed a misunderstanding.  The characters in the buffer _are_
decoded.  However, the characters form C escape sequences, like "\xc2\xa9".
To find out which character such a sequence encodes, I first translate
the strings "\xc2" and "\xa9" to the corresponding (undecoded!) characters.
The resulting string of length 2 is UTF-8 encoded, and I decode it
to get the copyright character or whatever.
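
To make the two steps concrete, here is a rough sketch of the same idea
in Python rather than Emacs Lisp.  The function name and the regular
expression are made up for illustration, and it assumes that everything
outside the \xNN escapes is plain ASCII (or at least Latin-1); it is not
the code I actually use in Emacs.

```python
import re

# Matches a C hex escape with exactly two hex digits, e.g. \xc2.
# (Real C allows more digits after \x; that case is ignored here.)
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_c_hex_escapes(text, coding="utf-8"):
    """Hypothetical helper: decode \\xNN escapes in TEXT using CODING."""
    # Step 1: replace each escape with the single raw ("undecoded")
    # character whose code point equals the byte value.
    raw = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: reinterpret those code points 0..255 as bytes.  This
    # raises UnicodeEncodeError if TEXT contains anything outside
    # Latin-1, which is the assumption stated above.
    data = raw.encode("latin-1")
    # Step 3: decode the byte sequence with the real coding system.
    return data.decode(coding)


print(decode_c_hex_escapes(r"\xc2\xa9 2006"))
```

Here "\xc2\xa9" becomes a two-byte string first and only then is decoded
into a single character, which is why the intermediate characters have to
stay undecoded.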

Phew.  I hope it is clearer now.  Anyway, it is no longer that important
to me, since gettext doesn't support non-ASCII characters in
untranslated strings with fairly recent GNU libc.  (And yes, I tried
inserting non-ASCII characters in the untranslated strings.)

Paul