From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
Message-ID: <x5vf8vri27.fsf@lola.goethe.zz>
References: <x5d5v52k4m.fsf@lola.goethe.zz>
	<200502140150.KAA29610@etlken.m17n.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1108349118 23295 80.91.229.2 (14 Feb 2005 02:45:18 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 14 Feb 2005 02:45:18 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-To: Kenichi Handa <handa@m17n.org>
In-Reply-To: <200502140150.KAA29610@etlken.m17n.org> (Kenichi Handa's
	message of "Mon, 14 Feb 2005 10:50:25 +0900 (JST)")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
X-MailScanner-To: ged-emacs-devel@m.gmane.org
Xref: main.gmane.org gmane.emacs.devel:33362
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:33362

Kenichi Handa <handa@m17n.org> writes:

> In article <x5d5v52k4m.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters.  This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match "\\^\\{2,\\}\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)"
>> 			 string)
>>       (setq output
>> 	    (concat output
>> 		    (regexp-quote (substring string
>> 					     0
>> 					     (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string.  Thus, the above substring also returns a
> multibyte string.
>
>> 		      (char-to-string
>> 		       (string-to-number (match-string 1 string) 16))))
>
> But, this char-to-string produces a unibyte string.  So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.

Oh.  Latin-1 chars.  Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as the process coding system)
appears to produce?

>>   (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case.  The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> The following call will also return an incorrect string:
>
> (preview-error-quote
>   (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> at the head of preview-error-quote.

Sigh.  XEmacs-21.4-mule does not seem to have string-as-unibyte.  I'll
have to see whether it happens to work without it on XEmacs.  If not,
I'll have to come up with something else.
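
A guard along the following lines might keep a single code path for
both Emacsen in the meantime.  It is only a sketch:
preview-sketch-as-unibyte is a made-up name, not an existing
preview-latex function, and whether passing the string through
unchanged is good enough on XEmacs is exactly the open question.

```elisp
;; Sketch only: `preview-sketch-as-unibyte' is a hypothetical helper,
;; not an existing preview-latex function.
(defun preview-sketch-as-unibyte (string)
  "Return STRING as unibyte if `string-as-unibyte' is available.
Otherwise return STRING unchanged."
  (if (fboundp 'string-as-unibyte)
      (string-as-unibyte string)
    ;; Assumption: where the function is missing (XEmacs 21.4 Mule,
    ;; as noted above), the string is passed through as-is.
    string))

;; Intended use at the head of preview-error-quote:
;;   (setq string (preview-sketch-as-unibyte string))
```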

Thanks for the analysis!

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum