From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 21:09:46 +0100
Message-ID: <x5is4u3nud.fsf@lola.goethe.zz>
References: <x5d5v52k4m.fsf@lola.goethe.zz>
	<874qgf1dkv.fsf-monnier+emacs@gnu.org> <x5hdkf5jzi.fsf@lola.goethe.zz>
	<jwvbranhykt.fsf-monnier+emacs@gnu.org>
	<x5fyzz3vh4.fsf@lola.goethe.zz>
	<jwvu0ofggsu.fsf-monnier+emacs@gnu.org>
	<x53bvz3rxs.fsf@lola.goethe.zz>
	<jwvd5v3gdaq.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1108411806 13036 80.91.229.2 (14 Feb 2005 20:10:06 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 14 Feb 2005 20:10:06 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-To: Stefan Monnier <monnier@iro.umontreal.ca>
In-Reply-To: <jwvd5v3gdaq.fsf-monnier+emacs@gnu.org> (Stefan Monnier's
	message of "Mon, 14 Feb 2005 14:30:32 -0500")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Give me a clue: what happens if a process inserts stuff with
>> 'raw-text encoding into a multibyte buffer?  'raw-text is a
>> reconstructible encoding, isn't it, so the stuff will get converted
>> into some prefix byte indicating "isolated single-byte entity
>> instead of utf-8 char" and the byte itself or something, right?
>> And decode-coding-string does not want to work on something like
>> that?
>
> If you want accented chars to appear as accented chars in the
> (process) buffer (i.e. you don't want to change the AUCTeX part),
> then raw-text is not an option anyway.

Yes, I figured as much.  I had better explain what I am doing in the
first place.  AUCTeX does the basic management of the buffer: it
creates it, associates processes with it, and installs a filter
routine that inserts the strings after scanning them for key phrases,
and so on.

preview-latex uses all of this folderol, but sets the output coding
system of its own processes to raw-text.  This is something that
AUCTeX does _not_ yet do for its own processes.  AUCTeX's own
process output is more likely to be viewed by the user, anyway.  We
can't hope to get a really readable UTF-8 display for AUCTeX's own
processes at the moment, but AUCTeX's behavior right now leads to
user-readable output in all current cases _except_ when TeX thinks it
is in some Latin-1 locale while working on utf-8 input.

Now with the AUCTeX processes, user readability is the most important
thing.  If AUCTeX can't locate the buffer position exactly, it will at
least locate the line, and that's tolerable for all practical
purposes.

With preview-latex, it is not tolerable.  On the other hand, the
output from preview-latex processes is usually not shown to the user
at all: having an unreadable output buffer due to raw-text encoding is
quite ok.

So that is basically the background: we can easily make the process
output raw-text, but much less easily make the buffer unibyte.  AUCTeX
will use the same buffer for its next run, just erasing it, and if it
has turned unibyte, we get into trouble.

> If you don't mind about accented chars appearing as \NNN, then you
> can make the buffer unibyte and use `raw-text' as the process's
> output coding-system.  That's the more robust approach.

If the accented chars (in fact, the whole upper 8-bit page) appeared
as \NNN, this would actually mostly be a _win_ over the current
situation, where we fairly often get a mixture of raw bytes and
nonsense characters.  However, I am afraid that this is not quite
possible right now.

We are now in the process of preparing the last major standalone
release of preview-latex.  After that, it will get folded into AUCTeX,
and we will streamline the whole junk.  But in the next weeks, I still
want to get out a preview-latex that works with the current AUCTeX
releases and vice versa.

After that, we will probably make the process encoding raw-text for
the _whole_ of AUCTeX and use a CCL program to convert the ^^
sequences back into bytes, essentially creating an efficient illusion
of a TeX that outputs sane error messages in all environments.
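
As a rough sketch of what such preprocessing amounts to (in plain Lisp
rather than CCL, and assuming TeX writes each 8-bit byte as ^^ followed
by two lowercase hex digits), something like the following might do.
The function name is made up for illustration; it is not part of
AUCTeX or preview-latex.

```elisp
;; Hypothetical helper, for illustration only.
;; Assumes STRING is unibyte (e.g. raw-text process output) and that
;; `unibyte-string' is available (Emacs 23 or later).
;; It does not handle TeX's one-character forms such as ^^M.
(defun my-tex-unquote-hex (string)
  "Return STRING with each TeX `^^xx' hex escape replaced by that byte."
  (replace-regexp-in-string
   "\\^\\^\\([0-9a-f][0-9a-f]\\)"
   (lambda (match)
     ;; The match data refer to MATCH here, so group 1 is the hex pair.
     (unibyte-string (string-to-number (match-string 1 match) 16)))
   string
   t     ; FIXEDCASE: leave the replacement's case alone
   t))   ; LITERAL: a byte such as 0x5c must not be read as an escape
```

Applied to "caf^^c3^^a9", this should leave the two UTF-8 bytes of an
e-acute after "caf", which decode-coding-string can then decode as
utf-8.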

> If that option is out (i.e. you have to use a multibyte buffer),
> you'll have to basically recover the original byte-sequence by
> replacing the
>
>    (regexp-quote (substring string 0 (match-beginning 1)))
>
> with
>
>    (regexp-quote (encode-coding-string
>                   (substring string 0 (match-beginning 1))
>                   buffer-file-coding-system))
>
> [assuming buffer-file-coding-system is the process's output
> coding-system]

The process output coding system being raw-text.  Do I really need to
actually encode raw-text?

>    (regexp-quote (string-make-unibyte
>                   (substring string 0 (match-beginning 1))))
>
> which is basically equivalent except that you lose control over
> which coding-system is used.

I have to admit to being befuddled.  I'll probably have to experiment
until I find something that works and cross fingers.  I don't think I
have much of a chance to actually understand all of the involved
intricacies.
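
One experiment I might try first, assuming an Emacs in which raw bytes
in multibyte text are represented as special eight-bit characters: take
some bytes, push them into multibyte form the way a multibyte process
buffer would hold them, and see whether encoding with raw-text gives
the same bytes back.  The function name is made up for the test.

```elisp
;; Hypothetical check, for illustration only.
(defun my-raw-bytes-round-trip-p (bytes)
  "Return non-nil if unibyte BYTES are recovered by encoding as raw-text.
BYTES is first converted to multibyte form with `string-to-multibyte',
which stands in for how raw bytes sit in a multibyte buffer."
  (let* ((as-multibyte (string-to-multibyte bytes))
         (recovered (encode-coding-string as-multibyte 'raw-text)))
    (and (not (multibyte-string-p recovered))
         (string= recovered bytes))))

;; Example call with the two UTF-8 bytes of an a-umlaut:
;; (my-raw-bytes-round-trip-p "\303\244")
```

If that returns non-nil, the encoded substring should be safe to hand
to regexp-quote for matching against the original byte sequence.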

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum