* string-as-unibyte
@ 2005-07-18 21:33 Stefan Monnier
2005-07-18 22:41 ` string-as-unibyte YAMAMOTO Mitsuharu
0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2005-07-18 21:33 UTC (permalink / raw)
Cc: emacs-devel
Could you explain the need for the change below:
2005-07-16 YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>
* mac.c [TARGET_API_MAC_CARBON] (Fmac_code_convert_string):
Use Fstring_as_unibyte instead of string_make_unibyte.
My experience is that string-as-unibyte is extremely rarely the right answer
to solve a problem. If you described your motivation, I could add a comment
in the code making it clear why this is needed here (or else come up with
a better solution).
Stefan
* Re: string-as-unibyte
2005-07-18 21:33 string-as-unibyte Stefan Monnier
@ 2005-07-18 22:41 ` YAMAMOTO Mitsuharu
2005-07-18 23:52 ` string-as-unibyte YAMAMOTO Mitsuharu
2005-07-19 2:56 ` string-as-unibyte Kenichi Handa
0 siblings, 2 replies; 8+ messages in thread
From: YAMAMOTO Mitsuharu @ 2005-07-18 22:41 UTC (permalink / raw)
Cc: emacs-devel
>>>>> On Mon, 18 Jul 2005 17:33:02 -0400, Stefan Monnier <monnier@iro.umontreal.ca> said:
> Could you explain the need for the change below:
> 2005-07-16 YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>
> * mac.c [TARGET_API_MAC_CARBON] (Fmac_code_convert_string):
> Use Fstring_as_unibyte instead of string_make_unibyte.
It is at the preparation stage of code conversion. So I think the
following comment in decode_coding_string (coding.c) is also
applicable to this case.
  if (STRING_MULTIBYTE (str))
    {
      /* Decoding routines expect the source text to be unibyte.  */
      str = Fstring_as_unibyte (str);
> My experience is that string-as-unibyte is extremely rarely the
> right answer to solve a problem. If you described your motivation,
> I could add a comment in the code making it clear why this is needed
> here (or else come up with a better solution).
I was trying to make a coding system that almost works as utf-8, but
additionally does "HFS+ composition" (canonical composition with some
exclusions) on decoding.
YAMAMOTO Mitsuharu
mituharu@math.s.chiba-u.ac.jp
;; For the Carbon port, Mac OS X 10.2 or later.
(make-coding-system
 'mac-hfs+
 0
 (coding-system-mnemonic 'utf-8)
 "Like utf-8, but additionally does Mac HFS+ composition on decoding."
 (coding-system-flags 'utf-8)
 (list (cons 'safe-charsets (coding-system-get 'utf-8 'safe-charsets))
       '(post-read-conversion . mac-hfs+-post-read-conversion)
       '(pre-write-conversion . mac-hfs+-pre-write-conversion)))

(defun mac-hfs+-post-read-conversion (length)
  (save-excursion
    (save-restriction
      (narrow-to-region (point) (+ (point) length))
      (let ((str (mac-code-convert-string (buffer-string)
                                          'utf-8 'utf-8 'HFS+C)))
        (when str
          (erase-buffer)
          (insert (if enable-multibyte-characters
                      (string-as-multibyte str) str)))
        (setq length (decode-coding-region (point-min) (point-max) 'utf-8))
        ;; We are inside a post-read-conversion function, so the
        ;; original post-read-conversion for utf-8 is not
        ;; automatically called.
        (goto-char (point-min))
        (funcall (or (coding-system-get 'utf-8 'post-read-conversion)
                     'identity)
                 length)))))

(defun mac-hfs+-pre-write-conversion (beg end)
  (funcall (or (coding-system-get 'utf-8 'pre-write-conversion) 'ignore)
           beg (+ beg (encode-coding-region beg end 'utf-8))))

(setq default-file-name-coding-system 'mac-hfs+)
* Re: string-as-unibyte
2005-07-18 22:41 ` string-as-unibyte YAMAMOTO Mitsuharu
@ 2005-07-18 23:52 ` YAMAMOTO Mitsuharu
2005-07-19 2:50 ` string-as-unibyte Kenichi Handa
2005-07-19 2:56 ` string-as-unibyte Kenichi Handa
1 sibling, 1 reply; 8+ messages in thread
From: YAMAMOTO Mitsuharu @ 2005-07-18 23:52 UTC (permalink / raw)
Cc: emacs-devel
>>>>> On Tue, 19 Jul 2005 07:41:33 +0900, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> said:
> I was trying to make a coding system that almost works as utf-8, but
> additionally does "HFS+ composition" (canonical composition with
> some exclusions) on decoding.
> (when str
> (erase-buffer)
> (insert (if enable-multibyte-characters
> (string-as-multibyte str) str)))
Maybe this part should have been string-to-multibyte.
BTW, I noticed that backslashes in the docstring of
string-as-multibyte are stripped off as follows:
If you're not sure, whether to use `string-as-multibyte' or
`string-to-multibyte', use `string-to-multibyte'. Beware:
(aref (string-as-multibyte "201") 0) -> 129 (aka ?201)
(aref (string-as-multibyte "300") 0) -> 192 (aka ?300)
(aref (string-as-multibyte "300201") 0) -> 192 (aka ?300)
(aref (string-as-multibyte "300201") 1) -> 129 (aka ?201)
but
(aref (string-as-multibyte "201300") 0) -> 2240
(aref (string-as-multibyte "201300") 1) -> <error>
YAMAMOTO Mitsuharu
mituharu@math.s.chiba-u.ac.jp
* Re: string-as-unibyte
2005-07-18 23:52 ` string-as-unibyte YAMAMOTO Mitsuharu
@ 2005-07-19 2:50 ` Kenichi Handa
0 siblings, 0 replies; 8+ messages in thread
From: Kenichi Handa @ 2005-07-19 2:50 UTC (permalink / raw)
Cc: monnier, emacs-devel
In article <wl3bqbisf1.wl%mituharu@math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes:
> BTW, I noticed that backslashes in the docstring of
> string-as-multibyte are stripped off as follows:
> If you're not sure, whether to use `string-as-multibyte' or
> `string-to-multibyte', use `string-to-multibyte'. Beware:
> (aref (string-as-multibyte "201") 0) -> 129 (aka ?201)
[...]
Thank you for finding this bug. I've just installed a fix.
---
Kenichi Handa
handa@m17n.org
* Re: string-as-unibyte
2005-07-18 22:41 ` string-as-unibyte YAMAMOTO Mitsuharu
2005-07-18 23:52 ` string-as-unibyte YAMAMOTO Mitsuharu
@ 2005-07-19 2:56 ` Kenichi Handa
2005-07-19 3:49 ` string-as-unibyte YAMAMOTO Mitsuharu
2005-07-19 14:26 ` string-as-unibyte Stefan Monnier
1 sibling, 2 replies; 8+ messages in thread
From: Kenichi Handa @ 2005-07-19 2:56 UTC (permalink / raw)
Cc: monnier, emacs-devel
In article <wl4qarag9u.wl%mituharu@math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes:
>>>>>> On Mon, 18 Jul 2005 17:33:02 -0400, Stefan Monnier <monnier@iro.umontreal.ca> said:
>> Could you explain the need for the change below:
>> 2005-07-16 YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>
>> * mac.c [TARGET_API_MAC_CARBON] (Fmac_code_convert_string):
>> Use Fstring_as_unibyte instead of string_make_unibyte.
> It is at the preparation stage of code conversion. So I think the
> following comment in decode_coding_string (coding.c) is also
> applicable to this case.
> if (STRING_MULTIBYTE (str))
> {
> /* Decoding routines expect the source text to be unibyte. */
> str = Fstring_as_unibyte (str);
If a multibyte string is given to mac-code-convert-string,
and the string was made multibyte by string-to-multibyte from
a raw-byte sequence (e.g. by inserting a file as raw-text in a
multibyte buffer and extracting a string with
buffer-substring), using Fstring_as_unibyte is correct.
Please note that we don't have Fstring_to_unibyte because it
should work the same way as Fstring_as_unibyte.
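The round trip described above can be sketched roughly as follows. This is an illustration only, not code from mac.c: the helper name `example-raw-bytes-roundtrip' and the sample bytes are hypothetical, and it assumes an Emacs in which both string-to-multibyte and string-as-unibyte are available.

```elisp
;; Hypothetical helper, for illustration only.
(defun example-raw-bytes-roundtrip (raw)
  "Return non-nil if unibyte RAW survives a to-multibyte/as-unibyte round trip."
  (let* (;; Each raw byte becomes one character (eight-bit for bytes >= 0x80).
         (multi (string-to-multibyte raw))
         ;; Reinterpreting the result as unibyte should recover the bytes.
         (back (string-as-unibyte multi)))
    (and (multibyte-string-p multi)
         (not (multibyte-string-p back))
         (string= raw back))))

;; Two arbitrary raw bytes, written with octal escapes so the literal
;; is read as a unibyte string.
(example-raw-bytes-roundtrip "\303\251")
```

The point of the sketch is only that the multibyte string here holds nothing but single-byte characters, which is the situation in which Fstring_as_unibyte gives back the original bytes.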
---
Kenichi Handa
handa@m17n.org
* Re: string-as-unibyte
2005-07-19 2:56 ` string-as-unibyte Kenichi Handa
@ 2005-07-19 3:49 ` YAMAMOTO Mitsuharu
2005-07-19 14:26 ` string-as-unibyte Stefan Monnier
1 sibling, 0 replies; 8+ messages in thread
From: YAMAMOTO Mitsuharu @ 2005-07-19 3:49 UTC (permalink / raw)
Cc: monnier, emacs-devel
>>>>> On Tue, 19 Jul 2005 11:56:37 +0900, Kenichi Handa <handa@m17n.org> said:
> If a multibyte string is given to mac-code-convert-string, and the
> string is made multibyte by string-to-multibyte from the raw-byte
> sequence (ex. inserting a file by raw-text in a multibyte buffer and
> extracting a string by buffer-substring), using Fstring_as_unibyte
> is correct.
That's the case for mac-code-convert-string. Thanks for clarifying.
As for the `mac-hfs+' coding system shown in my previous mail, its
coding-system-type should have been 5 (raw-text) rather than 0
(emacs-mule), so that the leading bytes for eight-bit-control and
eight-bit-graphic are not eaten. I also had to call
(set-buffer-multibyte t) explicitly for the case of
decode-coding-string. Maybe " *code-converting-work*" in
ctext-post-read-conversion should be " *code-conversion-work*"?
YAMAMOTO Mitsuharu
mituharu@math.s.chiba-u.ac.jp
(make-coding-system
 'mac-hfs+
 5
 (coding-system-mnemonic 'utf-8)
 "Like utf-8, but additionally does Mac HFS+ composition on decoding."
 (coding-system-flags 'utf-8)
 (list (cons 'safe-charsets (coding-system-get 'utf-8 'safe-charsets))
       '(post-read-conversion . mac-hfs+-post-read-conversion)
       '(pre-write-conversion . mac-hfs+-pre-write-conversion)))

(defun mac-hfs+-post-read-conversion (length)
  (save-excursion
    (save-restriction
      (narrow-to-region (point) (+ (point) length))
      (let ((in-workbuf (string= (buffer-name) " *code-conversion-work*"))
            (str (mac-code-convert-string (buffer-string)
                                          'utf-8 'utf-8 'HFS+C)))
        (when str
          (erase-buffer)
          (insert (if enable-multibyte-characters
                      (string-to-multibyte str) str)))
        (if in-workbuf
            (set-buffer-multibyte t))
        (setq length (decode-coding-region (point-min) (point-max) 'utf-8))
        ;; We are inside a post-read-conversion function, so the
        ;; original post-read-conversion for utf-8 is not
        ;; automatically called.
        (goto-char (point-min))
        (funcall (or (coding-system-get 'utf-8 'post-read-conversion)
                     'identity)
                 length)))))

(defun mac-hfs+-pre-write-conversion (beg end)
  (funcall (or (coding-system-get 'utf-8 'pre-write-conversion) 'ignore)
           beg (+ beg (encode-coding-region beg end 'utf-8))))

(setq default-file-name-coding-system 'mac-hfs+)
* Re: string-as-unibyte
2005-07-19 2:56 ` string-as-unibyte Kenichi Handa
2005-07-19 3:49 ` string-as-unibyte YAMAMOTO Mitsuharu
@ 2005-07-19 14:26 ` Stefan Monnier
2005-07-20 0:36 ` string-as-unibyte Kenichi Handa
1 sibling, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2005-07-19 14:26 UTC (permalink / raw)
Cc: YAMAMOTO Mitsuharu, emacs-devel
> If a multibyte string is given to mac-code-convert-string,
> and the string is made multibyte by string-to-multibyte from
> the raw-byte sequence (ex. inserting a file by raw-text in a
> multibyte buffer and extracting a string by
> buffer-substring), using Fstring_as_unibyte is correct.
Indeed, but sadly so. In order to make it clear that the multibyte string
is expected to only contain single-byte chars (including eight-bit-* chars),
a comment is in order.
> Please note that we don't have Fstring_to_unibyte because it
> should work the same way as Fstring_as_unibyte.
Actually no. string-to-unibyte should signal an error if it encounters
a non-ascii non-eight-bit-* char.
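A minimal sketch of that distinction, assuming an Emacs that provides `string-to-unibyte' with the error-signaling behavior described above (no such function existed when this was written; the helper name `example-try-to-unibyte' is hypothetical):

```elisp
;; Hypothetical helper, for illustration only.
;; Assumption: `string-to-unibyte' exists and signals an error on
;; characters that are neither ASCII nor eight-bit.
(defun example-try-to-unibyte (str)
  "Return STR converted by `string-to-unibyte', or the symbol `failed'."
  (condition-case nil
      (string-to-unibyte str)
    (error 'failed)))

;; Only ASCII and eight-bit characters: the conversion can succeed.
(example-try-to-unibyte (string-to-multibyte "abc\201"))

;; A genuine non-ASCII character (here GREEK SMALL LETTER ALPHA, built
;; from its code point on the assumption of a Unicode-based Emacs) has
;; no single-byte form, so the strict conversion is expected to fail.
(example-try-to-unibyte (string #x3b1))
```

string-as-unibyte, by contrast, would not complain in the second case; it would simply expose the bytes of the internal representation.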
Stefan
* Re: string-as-unibyte
2005-07-19 14:26 ` string-as-unibyte Stefan Monnier
@ 2005-07-20 0:36 ` Kenichi Handa
0 siblings, 0 replies; 8+ messages in thread
From: Kenichi Handa @ 2005-07-20 0:36 UTC (permalink / raw)
Cc: mituharu, emacs-devel
In article <87wtnmg9iw.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> If a multibyte string is given to mac-code-convert-string,
>> and the string is made multibyte by string-to-multibyte from
>> the raw-byte sequence (ex. inserting a file by raw-text in a
>> multibyte buffer and extracting a string by
>> buffer-substring), using Fstring_as_unibyte is correct.
> Indeed, but sadly so. In order to make it clear that the multibyte string
> is expected to only contain single-byte chars (including eight-bit-* chars),
> a comment is in order.
I agree.
>> Please note that we don't have Fstring_to_unibyte because it
>> should work the same way as Fstring_as_unibyte.
> Actually no. string-to-unibyte should signal an error if it encounters
> a non-ascii non-eight-bit-* char.
Ah, hmmm, you are right. I remember to implement it in
emacs-unicode.
---
Kenichi Handa
handa@m17n.org