all messages for Emacs-related lists mirrored at yhetil.org
From: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 21:09:46 +0100
Message-ID: <x5is4u3nud.fsf@lola.goethe.zz>
In-Reply-To: <jwvd5v3gdaq.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Mon, 14 Feb 2005 14:30:32 -0500")

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Give me a clue: what happens if a process inserts stuff with
>> 'raw-text encoding into a multibyte buffer?  'raw-text is a
>> reconstructible encoding, isn't it, so the stuff will get converted
>> into some prefix byte indicating "isolated single-byte entity
>> instead of utf-8 char" and the byte itself or something, right?
>> And decode-coding-string does not want to work on something like
>> that?
>
> If you want accented chars to appear as accented chars in the
> (process) buffer (i.e. you don't want to change the AUCTeX part),
> then raw-text is not an option anyway.

Yes, I figured as much.  I had better explain what I am doing in
the first place.  AUCTeX does the basic management of the buffer,
creating it, associating processes with it, making a filter routine
for it that inserts the strings after some scanning for keyphrases and
so on.

preview-latex uses all of this folderol, but sets the output coding
system of its own processes to raw-text.  AUCTeX does _not_ yet do this
for its own processes, whose output is more likely to be viewed by the
user anyway.  We can't hope to get a really readable UTF-8 display for
AUCTeX's own processes at the moment, but AUCTeX's current behavior
yields user-readable output in all cases _except_ when TeX thinks it
is in some Latin-1 locale while working on UTF-8 input.

Now with the AUCTeX processes, user readability is the most important
thing.  If AUCTeX can't locate the buffer position exactly, it will at
least locate the line, and that's tolerable for all practical
purposes.

With preview-latex, it is not tolerable.  On the other hand, the
output from preview-latex processes is usually not shown to the user
at all: having an unreadable output buffer due to raw-text encoding is
quite ok.

So that is basically why we can easily make the process output
raw-text, but cannot as easily make the buffer unibyte: AUCTeX will
reuse the same buffer for its next run, just erasing it, and if it has
turned unibyte, we get into trouble.
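For readers on a current Emacs, here is a minimal sketch of how raw-text output sitting in a multibyte buffer can be turned back into decoded text. It assumes Emacs 23 or later, where each byte >= 128 from a `raw-text' process is stored in a multibyte buffer as a raw-byte character (#x3FFF80..#x3FFFFF). The Emacs of this thread used a different internal representation, so this does not describe the 2005 behavior asked about above. The helper name `my-decode-raw-region' is hypothetical and is not part of AUCTeX or preview-latex.

```elisp
;; Sketch, assuming Emacs 23 or later.  `my-decode-raw-region' is a
;; hypothetical helper, not an AUCTeX or preview-latex function.
(defun my-decode-raw-region (beg end coding-system)
  "Return the text between BEG and END decoded with CODING-SYSTEM.
The region is expected to hold only ASCII and raw-byte characters,
as produced by a `raw-text' process in a multibyte buffer.  The
buffer itself is left unchanged."
  (decode-coding-string
   ;; `string-to-unibyte' turns raw-byte characters back into plain
   ;; bytes and signals an error for any other non-ASCII character.
   (string-to-unibyte (buffer-substring-no-properties beg end))
   coding-system))

;; Usage: simulate UTF-8 output for "Käse" that arrived as raw-text.
(with-temp-buffer
  (insert (string-to-multibyte "K\303\244se"))
  (my-decode-raw-region (point-min) (point-max) 'utf-8))
```

Because `string-to-unibyte' refuses anything but ASCII and raw bytes, a region that already contains wrongly decoded characters makes this fail loudly rather than produce more garbage.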

> If you don't mind about accented chars appearing as \NNN, then you
> can make the buffer unibyte and use `raw-text' as the process's
> output coding-system.  That's the more robust approach.

If the accented chars (in fact, all of the upper 128 byte values)
appeared as \NNN, this would actually mostly be a _win_ over the
current situation, where we not infrequently get a mixture of raw bytes
and nonsense characters.  However, I am afraid that this is not quite
possible right now.
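For comparison, the unibyte buffer plus `raw-text' setup Stefan recommends could look roughly like this in a current Emacs. This is a sketch under that assumption; `my-make-output-raw' is a hypothetical helper, not something AUCTeX or preview-latex provides.

```elisp
;; Sketch only.  `my-make-output-raw' is a hypothetical helper.
(defun my-make-output-raw (process)
  "Make PROCESS deliver its output into its buffer as raw bytes.
The process buffer is switched to unibyte and the decoding half of
PROCESS's coding systems is set to `raw-text'; the coding system used
for input sent to PROCESS is kept as it was."
  (with-current-buffer (process-buffer process)
    (set-buffer-multibyte nil))
  (set-process-coding-system process 'raw-text
                             (cdr (process-coding-system process))))
```

Since AUCTeX reuses and erases the same buffer, a caller would also have to switch it back with `(set-buffer-multibyte t)` before the next run that expects decoded text. That extra bookkeeping is the complication described above.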

We are now preparing the last major standalone release of
preview-latex.  After that, it will get folded into AUCTeX, and we
will streamline the whole mess.  But in the coming weeks, I still want
to get out a preview-latex that works with the current AUCTeX releases
and vice versa.

After that, we will probably make the process encoding raw-text for
the _whole_ of AUCTeX and use a CCL program to turn the ^^ sequences
back into bytes, essentially creating an efficient illusion of a TeX
that outputs sane error messages in all environments.
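The ^^ preprocessing step can be illustrated with a short sketch. Note that it swaps in `replace-regexp-in-string' for the CCL program mentioned above, so it shows the transformation, not the planned implementation. It handles only TeX's ^^xx form (two lowercase hex digits), which TeX uses when printing bytes 128-255; the ^^@-style escapes for control characters are left alone. `my-tex-unhat-string' is a hypothetical name.

```elisp
;; Sketch only: regexp replacement instead of a CCL program.
;; `my-tex-unhat-string' is a hypothetical helper.
(defun my-tex-unhat-string (string)
  "Replace each TeX ^^xx escape in unibyte STRING with the byte it denotes."
  (replace-regexp-in-string
   "\\^\\^\\([0-9a-f][0-9a-f]\\)"
   (lambda (match)
     ;; While this function runs, the match data refer to MATCH,
     ;; so group 1 is the two-digit hex code.
     (unibyte-string (string-to-number (match-string 1 match) 16)))
   string t t))

;; Usage: TeX echoing the UTF-8 bytes of "Käse" as ^^ escapes.
(decode-coding-string (my-tex-unhat-string "K^^c3^^a4se") 'utf-8)
```

The replacement is done with FIXEDCASE and LITERAL both set, so no case conversion or backslash processing touches the inserted bytes.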

> If that option is out (i.e. you have to use a multibyte buffer),
> you'll have to basically recover the original byte-sequence by
> replacing the
>
>    (regexp-quote (substring string 0 (match-beginning 1)))
>
> with
>
>    (regexp-quote (encode-coding-string
>                   (substring string 0 (match-beginning 1))
>                   buffer-file-coding-system))
>
> [assuming buffer-file-coding-system is the process's output
> coding-system]

The process output coding system is raw-text.  Do I really need to
encode at all when the coding system is raw-text?

>    (regexp-quote (string-make-unibyte
>                   (substring string 0 (match-beginning 1))))
>
> which is basically equivalent except that you lose control over
> which coding-system is used.

I have to admit to being befuddled.  I'll probably have to experiment
until I find something that works and keep my fingers crossed.  I don't
think I have much of a chance of actually understanding all of the
intricacies involved.
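A minimal sketch of the byte-level regexp Stefan describes: decoded text is re-encoded into the bytes that a unibyte raw-text buffer holds, and only then quoted. It covers only the case where the search target is a unibyte buffer of raw bytes; it does not settle the raw-text question above. `my-byte-regexp' is hypothetical, and CODING-SYSTEM stands in for whatever coding system TeX's input actually used.

```elisp
;; Sketch only.  `my-byte-regexp' is a hypothetical helper.
(defun my-byte-regexp (text coding-system)
  "Return a regexp matching the bytes of TEXT encoded with CODING-SYSTEM."
  (regexp-quote (encode-coding-string text coding-system)))

;; Usage: find "Käse" in a unibyte buffer that holds UTF-8 bytes.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert (encode-coding-string "Zeile: K\u00e4se" 'utf-8))
  (goto-char (point-min))
  (re-search-forward (my-byte-regexp "K\u00e4se" 'utf-8) nil t))
```

Passing the coding system explicitly, rather than relying on an implicit unibyte conversion, is what keeps control over which byte sequence the regexp matches.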

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


Thread overview: 32+ messages
2005-02-13  3:50 decode-coding-string gone awry? David Kastrup
2005-02-14  1:50 ` Kenichi Handa
2005-02-14  2:28   ` David Kastrup
2005-02-15  6:15   ` Richard Stallman
2005-02-15  9:31     ` David Kastrup
2005-02-15 16:17     ` Stefan Monnier
2005-02-17 10:35       ` Richard Stallman
2005-02-17 12:08       ` Kenichi Handa
2005-02-17 13:20         ` Stefan Monnier
2005-02-18  8:30           ` Kenichi Handa
2005-02-18 12:56             ` Stefan Monnier
2005-02-19  9:44             ` Richard Stallman
2005-02-18 14:12           ` Richard Stallman
2005-02-19 20:55             ` Richard Stallman
2005-02-21  1:19               ` Kenichi Handa
2005-02-22  8:41                 ` Richard Stallman
2005-02-18 14:12         ` Richard Stallman
2005-02-14 13:37 ` Stefan Monnier
2005-02-14 13:50   ` David Kastrup
2005-02-14 16:57     ` Stefan Monnier
2005-02-14 17:24       ` David Kastrup
2005-02-14 18:12         ` Stefan Monnier
2005-02-14 18:41           ` David Kastrup
2005-02-14 19:30             ` Stefan Monnier
2005-02-14 20:09               ` David Kastrup [this message]
2005-02-14 20:56                 ` Stefan Monnier
2005-02-14 21:07                   ` David Kastrup
2005-02-14 21:29                     ` Stefan Monnier
2005-02-14 21:57                       ` David Kastrup
2005-02-14 21:26                   ` David Kastrup
2005-02-15 17:28         ` Richard Stallman
2005-02-15 21:42           ` David Kastrup
