From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
References: <200502140150.KAA29610@etlken.m17n.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: emacs-devel@gnu.org
Original-To: Kenichi Handa
In-Reply-To: <200502140150.KAA29610@etlken.m17n.org> (Kenichi Handa's
 message of "Mon, 14 Feb 2005 10:50:25 +0900 (JST)")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Xref: main.gmane.org gmane.emacs.devel:33362

Kenichi Handa writes:

> In article , David Kastrup writes:
>
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters. This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match "\\^\\{2,\\}\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)"
>>                          string)
>>       (setq output
>>             (concat output
>>                     (regexp-quote (substring string
>>                                              0
>>                                              (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string. Thus, the above substring also returns a
> multibyte string.
>
>>                     (char-to-string
>>                      (string-to-number (match-string 1 string) 16))))
>
> But this char-to-string produces a unibyte string. So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.

Oh. Latin-1 chars.
Can't I tell char-to-string to produce the same sort of raw-marked
chars that raw-text (as process coding system) appears to produce?

>> (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of the multibyte string OUTPUT as utf-8, so you get
> some garbage.
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case. The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> This will also return an incorrect string:
>
> (preview-error-quote
>  (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> at the head of preview-error-quote.

Sigh. XEmacs-21.4-mule does not seem to have string-as-unibyte. I'll
have to see whether it happens to work without it on XEmacs. If not,
I'll have to come up with something else.

Thanks for the analysis!

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum