From: David Kastrup <dak@gnu.org>
Subject: How to create a derived encoding?
Date: Tue, 12 Oct 2004 02:10:00 +0200
Message-ID: <x5vfdgbxuv.fsf@lola.goethe.zz>


After considerable thinking about the problem, I have arrived at the
conclusion that for efficiency's sake I'd like to have an encoding
like tex-utf-8 which is derived from the normal utf-8 except that
sequences like ^^8a and similar are converted into the corresponding
byte before the bytes are combined into Unicode characters.  It would
be a bonus if such sequences stayed unchanged whenever this sort of
composition does not lead to a valid Unicode character, but that's
just a bonus.

The problem is that TeX has no clue about _characters_, but works on
byte streams, and it has the habit of transliterating some byte codes
in the above manner.  Treating the output of TeX sensibly means
converting those transliterations back into bytes _before_ assembling
Unicode characters.
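
To make the required order of operations concrete, here is a minimal
sketch in Python (not Emacs Lisp, and not an Emacs coding system).  It
assumes the transliterations take the form ^^ followed by two lowercase
hex digits, as in ^^8a above; the function name decode_tex_output is
made up for illustration.  The fallback is deliberately coarse: if the
restored bytes are not valid utf-8, the whole chunk is decoded as it
was received, so the ^^xx sequences stay unchanged.

```python
import re

# Assumption: a transliterated byte appears as ^^ plus two lowercase hex digits.
_CARET_HEX = re.compile(rb"\^\^([0-9a-f]{2})")


def decode_tex_output(raw: bytes) -> str:
    """Turn ^^xx sequences back into bytes, then decode the result as UTF-8."""
    # Step 1: undo the transliteration at the byte level.
    restored = _CARET_HEX.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    try:
        # Step 2: only now assemble Unicode characters.
        return restored.decode("utf-8")
    except UnicodeDecodeError:
        # The restored bytes are not valid UTF-8: keep the ^^xx text as received.
        return raw.decode("utf-8", errors="replace")


print(decode_tex_output(b"M^^c3^^a4rchen"))
print(decode_tex_output(b"stray ^^8a byte"))
```

The first call restores the two bytes of one utf-8 sequence before
decoding; the second leaves ^^8a alone because a lone 0x8a byte is not
valid utf-8.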

The same problem occurs with unibyte non-ASCII encodings like Latin-1.
I already have one (rather inefficient) hack to deal with that in
preview-latex, but it does not extend easily to multibyte.

So if there were a tolerably working way to derive a special encoding
(which will be used as a process output encoding) that reconverts
control sequences like the above before composing Unicode characters
from the resulting utf-8 stream, this would appear to be by far the
fastest and most convenient way to go about this problem.

Any hints on how to derive a suitably augmented encoding from an
existing one?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
