all messages for Emacs-related lists mirrored at yhetil.org
From: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 03:28:32 +0100
Message-ID: <x5vf8vri27.fsf@lola.goethe.zz>
In-Reply-To: <200502140150.KAA29610@etlken.m17n.org> (Kenichi Handa's message of "Mon, 14 Feb 2005 10:50:25 +0900 (JST)")

Kenichi Handa <handa@m17n.org> writes:

> In article <x5d5v52k4m.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters.  This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match "\\^\\{2,\\}\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)"
>> 			 string)
>>       (setq output
>> 	    (concat output
>> 		    (regexp-quote (substring string
>> 					     0
>> 					     (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string.  Thus, the above substring also returns a
> multibyte string.
>
>> 		      (char-to-string
>> 		       (string-to-number (match-string 1 string) 16))))
>
> But, this char-to-string produces a unibyte string.  So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.

Oh.  Latin-1 chars.  Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as the process coding system)
appears to produce?

>>   (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case.  The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> This will also return an incorrect string.
>
> (preview-error-quote
>   (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> at the head of preview-error-quote.

Sigh.  XEmacs-21.4-mule does not seem to have string-as-unibyte.  I'll
have to see whether it happens to work without it on XEmacs.  If not,
I'll have to come up with something else.

Thanks for the analysis!
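
The underlying idea in the analysis above (keep everything unibyte
until one final decode) can be sketched as follows.  This is only a
minimal illustration under stated assumptions, not the preview-latex
code: `my-decode-caret-hex' is a hypothetical helper name, it handles
only the ^^xx hex form (not the ^^@ control form, additional ^
prefixes, or regexp quoting), and it relies on `unibyte-string', which
is assumed to be available (Emacs 23 or later; it does not exist in
XEmacs 21.4).

```elisp
;; Hypothetical helper for illustration only; not part of preview-latex.
(defun my-decode-caret-hex (string coding)
  "Replace ^^xx hex escapes in STRING by raw bytes, then decode with CODING.
Every piece is kept unibyte until the single `decode-coding-string'
call at the end, so no automatic unibyte->multibyte conversion occurs."
  (let ((case-fold-search nil)
        (start 0)
        (parts nil))
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" string start)
      ;; Capture the match data before calling any other function.
      (let ((beg (match-beginning 0))
            (end (match-end 0))
            (byte (string-to-number (match-string 1 string) 16)))
        ;; Literal text before the escape, encoded to unibyte bytes.
        (push (encode-coding-string (substring string start beg) coding)
              parts)
        ;; The escaped byte itself, as a one-byte unibyte string.
        (push (unibyte-string byte) parts)
        (setq start end)))
    ;; Remaining literal text after the last escape.
    (push (encode-coding-string (substring string start) coding) parts)
    ;; All parts are unibyte, so `concat' yields a unibyte string.
    (decode-coding-string (apply #'concat (nreverse parts)) coding)))

;; Usage, with the fragment from the example quoted above:
;; (my-decode-caret-hex "erh^^c3^^b6ht" 'utf-8)
```

Because the literal parts are encoded before concatenation, the sketch
should behave the same whether STRING arrives as a unibyte or a
multibyte string, which is the difference between the by-hand call and
the process-sentinel call discussed above.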

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
