From: Simon Josefsson <jas@extundo.com>
Cc: emacs-devel@gnu.org
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sat, 15 Nov 2003 04:04:05 +0100
Message-ID: <ilu1xsas57e.fsf@latte.josefsson.org>
In-Reply-To: <200311140047.JAA06414@etlken.m17n.org> (Kenichi Handa's message of "Fri, 14 Nov 2003 09:47:51 +0900 (JST)")

Kenichi Handa <handa@m17n.org> writes:

> In article <ilun0b08by1.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> rfc2104.el now works, thanks.  But does the fix really have to
>> explicitly mention charsets like iso-latin-1?  Is there no way to
>> handle binary octet strings in emacs-unicode?  Preferably in a
>> portable way, that works on old Emacs versions and on XEmacs.
>
>>>  This is a typical problem of emacs-unicode in which
>>>  characters 128..255 are valid Unicode characters, thus, for
>>>  instance, (concat '(?a ?\300)) returns a multibyte string of
>>>  `a' and `À'.  But in the current Emacs, it returns a unibyte
>>>  string.
>>> 
>>>  I suspect a similar fix is necessary in several other
>>>  places.
>
>> Having a way to deal with data that is a pure single byte, without
>> involving coding systems, seems like a rather important thing to me.
>
> I agree with you.  Currently, I can think of these methods:

Can you think of one that would work on Emacs 21?  A stable idiom for
dealing with octets would be useful; forcing third-party packages to
try several methods can easily lead to unreadable code.

> (1) Perhaps the easiest way.
>
> Check `default-enable-multibyte-characters' or a newly
> introduced variable `byte-as-byte' to decide whether an
> integer 128..255 must be treated as a Latin-1 char or a
> byte.   So,
> (concat '(?a ?\300)) => "aÀ" (multibyte string)
> (let ((byte-as-byte t))
>   (concat '(?a ?\300))) => "a\300" (unibyte string)
>
> (2) Introduce a new function `eight-bit-char'.
>
> It converts an argument to ascii or eight-bit-char.
> (eight-bit-char ?a) => 97
> (eight-bit-char ?\300) => 4194240
> Then,
> (concat (list ?a (eight-bit-char ?\300))) => "a\300"

Both would work for me, although superficially both look like quick
hacks.
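
To make (1) and (2) concrete, here is a minimal sketch in Emacs Lisp.
All three names (`my-byte-as-byte', `my-concat-chars',
`my-eight-bit-char') are hypothetical and only illustrate the
proposals.  The sketch also assumes an Emacs that provides
`unibyte-string', which is not the case for Emacs 21, and it takes the
eight-bit offset from the value 4194240 quoted above for ?\300.

```elisp
;; Hypothetical names throughout: none of these are existing Emacs APIs.

(defvar my-byte-as-byte nil
  "When non-nil, `my-concat-chars' treats integers 128..255 as raw bytes.")

(defun my-concat-chars (chars)
  "Concatenate the list of integers CHARS into a string.
Return a unibyte string when `my-byte-as-byte' is non-nil."
  (if my-byte-as-byte
      ;; Assumption: `unibyte-string' is available (newer Emacs only).
      (apply #'unibyte-string chars)
    ;; Otherwise let `concat' decide; in a Unicode-based Emacs,
    ;; 192 is treated as the character U+00C0.
    (concat chars)))

(defun my-eight-bit-char (code)
  "Return CODE if it is ASCII, else the eight-bit char for that byte.
Assumes eight-bit chars occupy #x3FFF80..#x3FFFFF, which matches the
value 4194240 quoted for octal 300."
  (if (< code 128)
      code
    (+ #x3FFF00 code)))

;; Usage, mirroring the examples in the quoted proposal:
;; (my-concat-chars '(?a ?\300))
;; (let ((my-byte-as-byte t))
;;   (my-concat-chars '(?a ?\300)))
;; (my-eight-bit-char ?\300)
```

The dynamic variable keeps call sites unchanged but makes the result
depend on a binding far from the call, while the conversion function
is explicit at each use.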

> (3) Make a series of new functions (I think it's not good)
>
> concat vs concat-unibyte
> string vs string-unibyte
> aset vs aset-unibyte

I agree it isn't good.

> (4) Most drastic way (the cleanest but requires lots of work)
>
> The basic problem is that we don't distinguish a character
> (code) and a number.  So, we introduce a character object
> (like XEmacs).  The function `character' converts a
> character code into the corresponding character object.  The
> Lisp reader always generates a character object for ?a,
> ?\300, etc.   So:
>  (concat '(?a ?\300)) => "aÀ"
>  (concat '(?a #o300)) => "a\300"
>  (concat (list ?a (character #o300))) => "aÀ"
>  (concat (list ?a #o300 (character #o300))) => "a\300À"
>
> Note: (character X) == (decode-char 'ucs X)

This would be nice.  Characters aren't numbers (except in the internal
representation, which should be hidden), so separating the two types
is useful.  To be consistent with that, I think your `character'
function should be called `ucs-character' or similar.
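
The character/number split in (4) can be imitated at the Lisp level
with a tagged value.  The sketch below is only an illustration: the
names `my-character', `my-character-p' and `my-concat-elements' are
hypothetical, a real implementation would live in the reader and in C,
and the code assumes an Emacs in which `unibyte-char-to-multibyte'
maps the bytes 128..255 to eight-bit characters.

```elisp
;; Hypothetical illustration; none of these are existing Emacs functions.

(defun my-character (code)
  "Wrap the character code CODE in a tagged character object."
  (cons 'my-character code))

(defun my-character-p (object)
  "Return non-nil if OBJECT was made by `my-character'."
  (and (consp object) (eq (car object) 'my-character)))

(defun my-concat-elements (elements)
  "Build a string from the list ELEMENTS.
Tagged character objects contribute their character; bare integers
0..255 are treated as bytes."
  (apply #'string
         (mapcar (lambda (element)
                   (if (my-character-p element)
                       (cdr element)
                     ;; Assumption: bytes 128..255 become eight-bit
                     ;; characters, ASCII bytes stay unchanged.
                     (unibyte-char-to-multibyte element)))
                 elements)))

;; Usage, corresponding to the last example in the quoted proposal:
;; (my-concat-elements
;;  (list (my-character ?a) #o300 (my-character #o300)))
```

With the tag in place, the same integer #o300 yields either a raw byte
or the character À depending only on whether it was wrapped, which is
the distinction the proposal asks the reader to make automatically.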

>> It starts now, but when I enter a summary buffer it crashes:
>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>> 1591                      char_ranges[n_char_ranges++] = c;
>> (gdb) bt
>> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
>
> I just tried gnus but I couldn't reproduce it.  So, I need
> more help.  Could you show me the results of the following?
>
> (gdb) p n_char_ranges
> (gdb) p c
> (gdb) p string
> (gdb) xstring
> (gdb) p *$

I'll try to find time to test emacs-unicode-2 further, but no promises.

Thanks.
