From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 10:50:25 +0900 (JST)
Message-ID: <200502140150.KAA29610@etlken.m17n.org>
References: <x5d5v52k4m.fsf@lola.goethe.zz>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1108347091 19592 80.91.229.2 (14 Feb 2005 02:11:31 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 14 Feb 2005 02:11:31 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Feb 14 03:11:31 2005
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1D0Vht-00067C-G0
	for ged-emacs-devel@m.gmane.org; Mon, 14 Feb 2005 03:11:25 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1D0VxL-00045b-Aa
	for ged-emacs-devel@m.gmane.org; Sun, 13 Feb 2005 21:27:23 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1D0VvZ-0003SY-SP
	for emacs-devel@gnu.org; Sun, 13 Feb 2005 21:25:34 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1D0VvY-0003Rm-3f
	for emacs-devel@gnu.org; Sun, 13 Feb 2005 21:25:33 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1D0VuM-0002rM-D0
	for emacs-devel@gnu.org; Sun, 13 Feb 2005 21:24:18 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168)
	(Exim 4.34) id 1D0VNc-0007Ju-30; Sun, 13 Feb 2005 20:50:28 -0500
Original-Received: from nfs.m17n.org (nfs.m17n.org [192.47.44.7])
	by tsukuba.m17n.org (8.12.3/8.12.3/Debian-7.1) with ESMTP id
	j1E1oPXD031064; Mon, 14 Feb 2005 10:50:26 +0900
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by nfs.m17n.org (8.12.3/8.12.3/Debian-7.1) with ESMTP id j1E1oPPN001778;
	Mon, 14 Feb 2005 10:50:25 +0900
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id KAA29610;
	Mon, 14 Feb 2005 10:50:25 +0900 (JST)
Original-To: David Kastrup <dak@gnu.org>
In-reply-to: <x5d5v52k4m.fsf@lola.goethe.zz> (message from David Kastrup on
	Sun, 13 Feb 2005 04:50:49 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
X-MailScanner-To: ged-emacs-devel@m.gmane.org
Xref: main.gmane.org gmane.emacs.devel:33358
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:33358

In article <x5d5v52k4m.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
> I have the problem that within preview-latex there is a function that
> assembles UTF-8 strings from single characters.  This function, when
> used manually, mostly works.  It is called within a process sentinel
> and fails rather consistently there with a current CVS Emacs.  I
> include the code here since I don't know what might be involved here:
> regexp-quote, substring, char-to-string etc.  The starting string is
> taken from a buffer containing only ASCII (inserted by a process with
> coding-system 'raw-text).

It seems that you have fallen into the trap of automatic
unibyte->multibyte conversion.

> (defun preview-error-quote (string)
>   "Turn STRING with potential ^^ sequences into a regexp.
> To preserve sanity, additional ^ prefixes are matched literally,
> so the character represented by ^^^ preceding extended characters
> will not get matched, usually."
>   (let (output case-fold-search)
>     (while (string-match "\\^\\{2,\\}\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)"
> 			 string)
>       (setq output
> 	    (concat output
> 		    (regexp-quote (substring string
> 					     0
> 					     (- (match-beginning 1) 2)))

If STRING is taken from a multibyte buffer, it is a
multibyte string.  Thus, the above substring also returns a
multibyte string.

> 		    (if (match-beginning 2)
> 			(concat
> 			 "\\(?:" (regexp-quote
> 				  (substring string
> 					     (- (match-beginning 1) 2)
> 					     (match-end 0)))
> 			 "\\|"
> 			 (char-to-string
> 			  (logxor (aref string (match-beginning 2)) 64))
> 			 "\\)")
> 		      (char-to-string
> 		       (string-to-number (match-string 1 string) 16))))

But this char-to-string produces a unibyte string.  So, on
concatenating them, this unibyte string is automatically
converted to multibyte by the string-make-multibyte function,
which usually produces a multibyte string containing latin-1
chars.

> 	    string (substring string (match-end 0))))
>     (setq output (concat output (regexp-quote string)))
>     (if (featurep 'mule)
> 	(prog2
> 	    (message "%S %S " output buffer-file-coding-system)
> 	    (setq output (decode-coding-string output buffer-file-coding-system))

And this decode-coding-string treats the internal byte
sequence of the multibyte string OUTPUT as utf-8, so you get
garbage.
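
As a minimal sketch of the difference (not taken from preview-latex;
the sample string is made up for illustration, and only the predicate
results are shown because the exact characters produced by the
automatic conversion depend on the Emacs version):

```elisp
;; \303\266 are the bytes 0xC3 0xB6, the UTF-8 encoding of o-umlaut.
;; Octal escapes make the Lisp reader produce a unibyte string.
(let ((bytes "erh\303\266ht"))
  (list
   ;; The raw byte string is unibyte ...
   (multibyte-string-p bytes)
   ;; ... so decoding it as utf-8 sees exactly those bytes.
   (decode-coding-string bytes 'utf-8)
   ;; Concatenating with a multibyte string gives a multibyte result,
   ;; so the raw bytes no longer reach the decoder unchanged.
   (multibyte-string-p (concat (string-to-multibyte "um ") bytes))))
```

The first element should be nil, the second the decoded text
"erhöht", and the third t.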

> Unfortunately, when I call this stuff by hand instead from the
> process-sentinel, it mostly works

That is because the string you give to preview-error-quote
is a unibyte string in that case.  The Lisp reader generates
a unibyte string when it sees an ASCII-only string.

Ex: (multibyte-string-p "abc") => nil

This will also return an incorrect string:

(preview-error-quote
  (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))

So, the easiest fix will be to do:
  (setq string (string-as-unibyte string))
at the head of preview-error-quote.
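
A quick way to check what that conversion does, as a sketch under the
assumption that the input is ASCII-only (as it is in the case described
above); the variable names are only for illustration:

```elisp
;; An ASCII-only but multibyte string, like one taken from a
;; multibyte buffer.
(let* ((multi (string-to-multibyte "erh^^c3^^b6ht"))
       (uni   (string-as-unibyte multi)))
  (list (multibyte-string-p multi)
        (multibyte-string-p uni)
        ;; substring keeps the unibyte-ness of its argument, so the
        ;; pieces passed to concat stay unibyte too.
        (multibyte-string-p (substring uni 0 3))))
```

This should evaluate to (t nil nil): once STRING is unibyte, the
substrings taken from it are unibyte as well.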

---
Ken'ichi HANDA
handa@m17n.org