unofficial mirror of emacs-devel@gnu.org 
From: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 19:41:19 +0100
Message-ID: <x53bvz3rxs.fsf@lola.goethe.zz>
In-Reply-To: <jwvu0ofggsu.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Mon, 14 Feb 2005 13:12:03 -0500")

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> instead of being processed directly from the process filter, then
>>> you should also ensure that this buffer is unibyte.
>
>> Yuk.  The problem is that this buffer is not only processed by
>> preview-latex, but also by AUCTeX, and the versions that get combined
>> may be different.  AUCTeX uses the source code buffer's file encoding
> by default, which is fine for basically unibyte-based coding systems.
>
> If you can't change this part, then your best bet might be to do something
> like:
>
> (defun preview-error-quote (string)
>   "Turn STRING with potential ^^ sequences into a regexp.
> To preserve sanity, additional ^ prefixes are matched literally,
> so the character represented by ^^^ preceding extended characters
> will not get matched, usually."
>   (let (output case-fold-search)
>     (while (string-match "\\^*\\(\\^\\^\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)\\)+"
>                          string)
>       (setq output
>             (concat output
>                     (regexp-quote (substring string 0 (match-beginning 1)))
>                     (decode-coding-string
>                      (preview-dequote-thingies (substring string (match-beginning 1)
>                                                           (match-end 0)))
>                      buffer-file-coding-system))
>             string (substring string (match-end 0))))
>     (setq output (concat output (regexp-quote string)))
>     output))
>
> BTW, you can use the 3rd arg to string-match to avoid consing strings for
> `string'.
>
> This way you only apply decode-coding-string to the part of the
> string which is still undecoded but not to the rest.
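A minimal sketch of the third-argument suggestion: instead of cutting off the already-scanned prefix with `substring` on every iteration, the loop keeps a START index and passes it to `string-match`. The helper name `my-collect-hat-runs` is hypothetical, and it only collects the ^^ runs (it does not decode them). Like the quoted code, it binds `case-fold-search` to nil so that the bracket ranges `[@-_?]` and `[89a-f]` are matched case-sensitively.

```elisp
(defun my-collect-hat-runs (string)
  "Return the runs of TeX ^^ notation in STRING, in order.
Hypothetical helper: it advances a START index through STRING
instead of taking `substring' of the unscanned remainder each time."
  (let ((case-fold-search nil)          ; match the bracket ranges exactly
        (start 0)
        (runs nil))
    (while (string-match "\\(?:\\^\\^\\(?:[@-_?]\\|[89a-f][0-9a-f]\\)\\)+"
                         string start)
      (push (match-string 0 string) runs)
      ;; Resume the search where this match ended; no new string is
      ;; made for the remainder of STRING.
      (setq start (match-end 0)))
    (nreverse runs)))
```

For example, `(my-collect-hat-runs "abc^^c3^^a4def^^M")` returns `("^^c3^^a4" "^^M")`.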

No use.  The catch is precisely that TeX may decide to split a _single_
Unicode character into some bytes that it will let go through
unchanged, and some bytes that it will transcribe into ^^ba notation.
If decode-coding-string is to have any chance of reassembling this
junk, it must run only after the complete byte stream has been
reconstructed.  Yes, this is completely insane.  No, I can't avoid having to
deal with it somehow.
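The order of operations described above can be sketched as follows, assuming the TeX output is available as a unibyte string holding the bytes TeX wrote. The hypothetical helper `my-tex-dequote-bytes` first turns every ^^xy (two lowercase hex digits) and ^^c (control character, code of c XOR 64) into the byte it denotes; only then is the complete byte stream handed to `decode-coding-string`. The character classes mirror the regexp in the quoted code; `unibyte-string` needs Emacs 23 or later.

```elisp
(defun my-tex-dequote-bytes (string)
  "Return STRING with TeX ^^ notation replaced by the bytes it denotes.
Hypothetical helper.  STRING is assumed to be unibyte, holding the raw
bytes TeX wrote; the result is again a unibyte string."
  (let ((case-fold-search nil)          ; keep [a-f] and [@-_] case-sensitive
        (start 0)
        (chunks nil))
    (while (string-match "\\^\\^\\(?:\\([89a-f][0-9a-f]\\)\\|\\([@-_?]\\)\\)"
                         string start)
      ;; Literal bytes before the ^^ sequence pass through unchanged.
      (push (substring string start (match-beginning 0)) chunks)
      (push (unibyte-string
             (if (match-beginning 1)
                 ;; ^^xy: two lowercase hex digits give the byte #xXY.
                 (string-to-number (match-string 1 string) 16)
               ;; ^^c: control character, code of c XOR 64 (^^M -> 13).
               (logxor (aref string (match-beginning 2)) 64)))
            chunks)
      (setq start (match-end 0)))
    (push (substring string start) chunks)
    (apply #'concat (nreverse chunks))))
```

With "ü" (UTF-8 bytes C3 BC) split so that C3 went through raw and BC was written as ^^bc, `(decode-coding-string (my-tex-dequote-bytes "gr\xc3^^bcn") 'utf-8)` returns "grün". Decoding only the "^^bc" piece would instead hand `decode-coding-string` a lone continuation byte.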

Give me a clue: what happens if a process inserts stuff with 'raw-text
encoding into a multibyte buffer?  'raw-text is a reconstructible
encoding, isn't it, so the stuff will get converted into some prefix
byte indicating "isolated single-byte entity instead of utf-8 char"
and the byte itself or something, right?  And decode-coding-string
does not want to work on something like that?

I have to admit to total cluelessness.
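For the raw-text question, a minimal sketch assuming Emacs 23 or later, where the Emacs Lisp manual documents that undecoded bytes in a multibyte buffer are represented as `eight-bit` characters (the Emacs 21/22 representation at the time of this thread differed). The hypothetical helper copies the region, maps those characters back to bytes with `string-to-unibyte` (which signals an error if the region already contains a decoded non-ASCII character), and decodes everything in one go. It does not handle ^^ notation; that would still have to be turned into bytes before decoding.

```elisp
(defun my-decode-raw-text-region (beg end coding)
  "Decode the raw bytes between BEG and END with CODING; return a string.
Hypothetical helper.  Assumes the text was inserted with `raw-text'
into a multibyte buffer, so every non-ASCII byte is stored there as
an `eight-bit' character (Emacs 23 or later)."
  (let* ((raw (buffer-substring-no-properties beg end))
         ;; Map each eight-bit character back to the byte it stands for.
         ;; Signals an error if a real (decoded) non-ASCII char is present.
         (bytes (string-to-unibyte raw)))
    ;; All bytes are available at once, so a multibyte sequence that
    ;; arrived in pieces is decoded as a whole.
    (decode-coding-string bytes coding)))
```

The built-in `decode-coding-region` performs the same decoding in place when the region consists of such eight-bit characters.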

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

Thread overview: 32+ messages
2005-02-13  3:50 decode-coding-string gone awry? David Kastrup
2005-02-14  1:50 ` Kenichi Handa
2005-02-14  2:28   ` David Kastrup
2005-02-15  6:15   ` Richard Stallman
2005-02-15  9:31     ` David Kastrup
2005-02-15 16:17     ` Stefan Monnier
2005-02-17 10:35       ` Richard Stallman
2005-02-17 12:08       ` Kenichi Handa
2005-02-17 13:20         ` Stefan Monnier
2005-02-18  8:30           ` Kenichi Handa
2005-02-18 12:56             ` Stefan Monnier
2005-02-19  9:44             ` Richard Stallman
2005-02-18 14:12           ` Richard Stallman
2005-02-19 20:55             ` Richard Stallman
2005-02-21  1:19               ` Kenichi Handa
2005-02-22  8:41                 ` Richard Stallman
2005-02-18 14:12         ` Richard Stallman
2005-02-14 13:37 ` Stefan Monnier
2005-02-14 13:50   ` David Kastrup
2005-02-14 16:57     ` Stefan Monnier
2005-02-14 17:24       ` David Kastrup
2005-02-14 18:12         ` Stefan Monnier
2005-02-14 18:41           ` David Kastrup [this message]
2005-02-14 19:30             ` Stefan Monnier
2005-02-14 20:09               ` David Kastrup
2005-02-14 20:56                 ` Stefan Monnier
2005-02-14 21:07                   ` David Kastrup
2005-02-14 21:29                     ` Stefan Monnier
2005-02-14 21:57                       ` David Kastrup
2005-02-14 21:26                   ` David Kastrup
2005-02-15 17:28         ` Richard Stallman
2005-02-15 21:42           ` David Kastrup

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git