From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 21:09:46 +0100
References: <874qgf1dkv.fsf-monnier+emacs@gnu.org>
Original-To: Stefan Monnier
Cc: emacs-devel@gnu.org
In-Reply-To: (Stefan Monnier's message of "Mon, 14 Feb 2005 14:30:32 -0500")

Stefan Monnier writes:

>> Give me a clue: what happens if a process inserts stuff with
>> 'raw-text encoding into a multibyte buffer?  'raw-text is a
>> reconstructible encoding, isn't it, so the stuff will get converted
>> into some prefix byte indicating "isolated single-byte entity
>> instead of utf-8 char" and the byte itself or something, right?
>> And decode-coding-string does not want to work on something like
>> that?
>
> If you want accented chars to appear as accented chars in the
> (process) buffer (i.e. you don't want to change the AUCTeX part),
> then raw-text is not an option anyway.

Yes, I figured as much.  I had better explain what I am doing in the
first place.

AUCTeX does the basic management of the buffer: creating it,
associating processes with it, and installing a filter routine that
inserts the strings after some scanning for keyphrases and so on.

preview-latex uses all of this folderol, but turns the process output
encoding of its own processes to raw text.  This is something that
AUCTeX does _not_ yet do for its own processes.  AUCTeX's own process
output is more likely to be viewed by the user, anyway.  We can't
hope to get a really readable UTF-8 display for AUCTeX's own processes
at the moment, but AUCTeX's behavior right now leads to user-readable
output in all current cases _except_ when TeX thinks it is in some
Latin-1 locale while working on utf-8 input.
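
To make the raw-text step concrete, here is a minimal sketch of
switching only the output decoding of an already created process.
The helper name `my-process-use-raw-output' is made up for
illustration and is not taken from preview-latex or AUCTeX; the
actual code may well differ.

```elisp
;; Sketch only: hypothetical helper, not preview-latex's own code.
(defun my-process-use-raw-output (proc)
  "Make PROC decode its output as `raw-text'.
The coding system used for input sent to PROC is left unchanged."
  (set-process-coding-system
   proc
   'raw-text                             ; decoding of process output
   (cdr (process-coding-system proc)))   ; keep the existing encoding
  proc)
```

Only the process is touched here; the buffer stays multibyte, so
AUCTeX can reuse it for its next run.
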
Now with the AUCTeX processes, user readability is the most important
thing.  If AUCTeX can't locate the buffer position exactly, it will at
least locate the line, and that's tolerable for all practical
purposes.

With preview-latex, it is not tolerable.  On the other hand, the
output from preview-latex processes is usually not shown to the user
at all: having an unreadable output buffer due to raw-text encoding is
quite ok.

So that is basically the background for why we can easily make the
process raw-text, but much less easily make the buffer unibyte: AUCTeX
will use the same buffer for its next run, just erasing it, and if it
has turned unibyte, we get into trouble.

> If you don't mind about accented chars appearing as \NNN, then you
> can make the buffer unibyte and use `raw-text' as the process's
> output coding-system.  That's the more robust approach.

If the accented chars (in fact, the whole upper 8bit page) appeared as
\NNN, this would actually mostly be a _win_ over the current situation
where we fairly often get a mixture of raw bytes and nonsense
characters.

However, I am afraid that this is not quite possible right now.  We
are now in the process of preparing the last major standalone release
of preview-latex.  After that, it will get folded into AUCTeX, and we
will streamline the whole junk.

But in the next weeks, I still want to get out a preview-latex that
works with the current AUCTeX releases and vice versa.

After that, we will probably make the process encoding raw-text for
the _whole_ of AUCTeX and use a CCL-Program for preprocessing the ^^
sequences into bytes again, essentially creating an efficient
artificial illusion of a TeX outputting sane error messages in all
surroundings.

> If that option is out (i.e.
> you have to use a multibyte buffer),
> you'll have to basically recover the original byte-sequence by
> replacing the
>
>     (regexp-quote (substring string 0 (match-beginning 1)))
>
> with
>
>     (regexp-quote (encode-coding-string
>                    (substring string 0 (match-beginning 1))
>                    buffer-file-coding-system))
>
> [assuming buffer-file-coding-system is the process's output
> coding-system]

The process output coding system being raw-text.  Do I really need to
actually encode raw-text?

>     (regexp-quote (string-make-unibyte
>                    (substring string 0 (match-beginning 1))))
>
> which is basically equivalent except that you lose control over
> which coding-system is used.

I have to admit to being befuddled.  I'll probably have to experiment
until I find something that works and cross fingers.  I don't think I
have much of a chance to actually understand all of the involved
intricacies.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
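
For experimenting with the `encode-coding-string' replacement quoted
above, a small self-contained sketch may help.  It assumes utf-8 as
the coding system purely for illustration, and `my-context-regexp' is
a made-up name, not something from AUCTeX or preview-latex.  It shows
that encoding decoded text yields the original byte sequence as a
unibyte string, which `regexp-quote' then turns into a pattern that
matches those bytes.

```elisp
;; Sketch only: hypothetical helper; utf-8 stands in for whatever
;; coding system actually applies in the real setup.
(defun my-context-regexp (text coding-system)
  "Return a regexp matching the bytes that TEXT has in CODING-SYSTEM."
  (regexp-quote (encode-coding-string text coding-system)))

(let* ((decoded "na\u00efve.tex")            ; multibyte text, 9 chars
       (bytes (encode-coding-string decoded 'utf-8))
       (regexp (my-context-regexp decoded 'utf-8)))
  (list (length bytes)                       ; byte count, not char count
        (multibyte-string-p regexp)          ; the regexp is unibyte
        (string-match-p regexp bytes)))      ; position of the match
```

Passing the coding system explicitly keeps control over which one is
used, which is the difference from the `string-make-unibyte' variant
that the quoted text points out.
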