all messages for Emacs-related lists mirrored at yhetil.org
* bug#1051: 23.0.60; rmail decoding bug
@ 2008-09-29 17:13 Richard M. Stallman
  2008-09-29 19:03 ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Richard M. Stallman @ 2008-09-29 17:13 UTC (permalink / raw)
  To: emacs-pretest-bug

Rmail decodes this message (and many others like it)
incorrectly.  Each pair of quoted-printable characters is supposed
to convert to one character in the Emacs buffer, but instead
it shows up as two.
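For reference, here is what correct decoding should produce, as a minimal Emacs Lisp check. It assumes Emacs 23 or later, where `unibyte-string` is available. In the message below, `=C3=BA` encodes two bytes, #xC3 and #xBA, which together are the UTF-8 encoding of the single character ú:

```elisp
;; Two raw bytes, as named by the escapes =C3 and =BA.
(let ((decoded (decode-coding-string (unibyte-string #xC3 #xBA) 'utf-8)))
  ;; Decoding the byte pair as UTF-8 yields one character, U+00FA.
  (list decoded (length decoded)))
;; => ("ú" 1)
```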

    From dorascilipoti@alice.it  Sun Sep 28 14:33:13 2008
    Return-path: <dorascilipoti@alice.it>
    Envelope-to: rms@gnu.org
    Delivery-date: Sun, 28 Sep 2008 14:33:13 -0400
    Received: from mx10.gnu.org ([199.232.76.166]:39989)
	    by fencepost.gnu.org with esmtp (Exim 4.67)
	    (envelope-from <dorascilipoti@alice.it>)
	    id 1Kk14z-0002qJ-3Q
	    for rms@gnu.org; Sun, 28 Sep 2008 14:33:13 -0400
    Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60)
	    (envelope-from <dorascilipoti@alice.it>)
	    id 1Kk171-0005Bt-Ma
	    for rms@gnu.org; Sun, 28 Sep 2008 14:35:22 -0400
    X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on monty-python
    X-Spam-Level: 
    X-Spam-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00,
	    DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST autolearn=no version=3.1.0
    Received: from smtp-out28.alice.it ([85.33.2.28]:4844)
	    by monty-python.gnu.org with esmtp (Exim 4.60)
	    (envelope-from <dorascilipoti@alice.it>)
	    id 1Kk171-0005B3-6H
	    for rms@gnu.org; Sun, 28 Sep 2008 14:35:19 -0400
    Received: from FBCMMO02.fbc.local ([192.168.68.196]) by smtp-out28.alice.it with Microsoft SMTPSVC(6.0.3790.1830);
	     Sun, 28 Sep 2008 20:35:17 +0200
    Received: from FBCMCL01B03.fbc.local ([192.168.69.84]) by FBCMMO02.fbc.local with Microsoft SMTPSVC(6.0.3790.1830);
	     Sun, 28 Sep 2008 20:35:17 +0200
    Received: from [192.168.1.101] ([79.16.197.32]) by FBCMCL01B03.fbc.local with Microsoft SMTPSVC(6.0.3790.1830);
	     Sun, 28 Sep 2008 20:35:14 +0200
    Subject: Re: Hamaca
    From: Dora Scilipoti <dorascilipoti@alice.it>
    Reply-To: dorascilipoti@alice.it
    To: rms@gnu.org
    Cc: dorascilipoti@gmail.com
    In-Reply-To: <E1KjzUp-0002Ad-Un@fencepost.gnu.org>
    References: <A071D5B468700B439E1980A52A399B5D01ED0F83@FBCMST05V06.fbc.local>
	     <E1KjzUp-0002Ad-Un@fencepost.gnu.org>
    Content-Type: text/plain; charset=utf-8
    Date: Sun, 28 Sep 2008 20:37:59 +0200
    Message-Id: <1222627079.4471.16.camel@Osiris>
    Mime-Version: 1.0
    X-Mailer: Evolution 2.10.3 
    Content-Transfer-Encoding: quoted-printable
    X-OriginalArrivalTime: 28 Sep 2008 18:35:15.0239 (UTC) FILETIME=[F3002B70:01C92198]
    X-detected-operating-system: by monty-python.gnu.org: Windows 2000 SP4, XP SP1+


    > Una vez entrada como root, puedes hacer `su dora' para cambiar a la
    > cuenta `dora'.

    Claro, pero el problema es que no puedo lanzar ning=C3=BAn programa desde m=
    i
    cuenta de usuario:=20

    Xlib: connection to ":0.0" refused by server
    Xlib: No protocol specified

    He encontrado una soluci=C3=B3n parcial y provisoria copiando el
    directorio /home/dora/.evolution en /root/.evolution, de esta manera
    puedo ver mis correos anteriores.



In GNU Emacs 23.0.60.3 (mipsel-unknown-linux-gnu, GTK+ Version 2.12.11)
 of 2008-09-25 on lemote-menglan
configured using `configure  'CFLAGS=-O0 -g -Wno-pointer-sign' 'mipsel-unknown-linux-gnu' 'build_alias=mipsel-unknown-linux-gnu' 'host_alias=mipsel-unknown-linux-gnu' 'target_alias=mipsel-unknown-linux-gnu''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Mail

Minor modes in effect:
  shell-dirtrack-mode: t
  gpm-mouse-mode: t
  tooltip-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Recent input:
1 . RET C-c C-c C-d x r C-x o C-u C-n C-u C-n C-n C-@ 
C-u C-n C-n C-n C-n C-n ESC , RET P o k DEL DEL DEL 
P e DEL l e C-a C-k P l DEL O k , SPC p l e a s e SPC 
d SPC s o . C-a C-d ESC f ESC f ESC f o C-c C-c C-u 
C-p C-u C-p C-@ C-u C-n ESC w C-x C-f r e s e / DEL 
DEL u DEL DEL u s e / i t - i s TAB RET C-v C-v C-u 
C-u C-n C-u C-n C-p C-u C-y RET C-d C-d C-d C-d C-d 
C-d C-n C-u C-d C-d C-n C-u C-d C-d C-d C-p C-d C-n 
C-n C-u C-d C-d C-d ESC q C-x C-s C-p C-p C-p C-p C-@ 
C-u C-n C-n ESC w C-x C-f n e w - m a TAB RET ESC > 
C-u C-u C-p C-u C-p C-p C-u C-p C-p C-y C-x C-s C-x 
b R TAB RET C-d x C-d x SPC ESC v SPC o f l a s h TAB 
RET C-d x C-d C-d C-d C-d C-d x C-d C-d x C-x C-s p 
C-d C-d x C-d C-d x o d o r a TAB RET C-x C-f n o ESC 
b i C-e o u t DEL DEL DEL u / DEL t / r m s 0 6 3 DEL 
TAB 2 RET C-s R e : SPC h a DEL DEL H a m a C-s C-a 
C-u C-u C-p C-u C-p C-u C-p C-u C-p C-u C-p C-@ C-v 
C-u C-u C-n C-u C-n C-n ESC w C-x 4 m ESC x r e p o 
r t SPC e TAB RET

Recent messages:
Expunging deleted messages...done
Saving file /home/rms/RMAIL...
Wrote /home/rms/RMAIL
Expunging deleted messages...done
Expunging deleted messages...done
Added to /home/rms/xmail/dora.xmail
Making completion list...
Mark saved where search started
Mark set
Saved text from "From dorascilipoti@alice.it  Sun Sep 28 "

^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#1051: 23.0.60; rmail decoding bug
  2008-09-29 17:13 bug#1051: 23.0.60; rmail decoding bug Richard M. Stallman
@ 2008-09-29 19:03 ` Eli Zaretskii
  2008-09-30  4:55   ` Richard M. Stallman
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-09-29 19:03 UTC (permalink / raw)
  To: rms, 1051; +Cc: emacs-pretest-bug, bug-gnu-emacs

> From: "Richard M. Stallman" <rms@gnu.org>
> Date: Mon, 29 Sep 2008 13:13:16 -0400
> Cc: 
> 
> Rmail decodes this message (and many others like it)
> incorrectly.  Each pair of quoted-printable characters is supposed
> to convert to one character in the Emacs buffer, but instead
> it shows up as two.

I cannot reproduce this bug, either in Emacs 23.0.60 built from
this morning's CVS trunk or in Emacs 22.3.  I see a single
character for each of these pairs.

Do you see the same problem in "emacs -Q"?


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-29 19:03 ` Eli Zaretskii
@ 2008-09-30  4:55   ` Richard M. Stallman
  2008-09-30  7:32     ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Richard M. Stallman @ 2008-09-30  4:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1051

    I cannot reproduce this bug, either in Emacs 23.0.60 built from
    this morning's CVS trunk or in Emacs 22.3.  I see a single
    character for each of these pairs.

    Do you see the same problem in "emacs -Q"?

I've discovered that the problem does not happen when I visit that
message directly with C-u M-x rmail RET FILENAME RET.

It does happen when I use C-u g FILENAME to get that message
as new mail into my RMAIL file.

Both cases are the same with and without -Q.

This unfortunately leaves me with no test case I can send.


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30  4:55   ` Richard M. Stallman
@ 2008-09-30  7:32     ` Eli Zaretskii
  2008-09-30  8:40       ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-09-30  7:32 UTC (permalink / raw)
  To: rms; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1051

> From: "Richard M. Stallman" <rms@gnu.org>
> CC: 1051@emacsbugs.donarmstrong.com, emacs-pretest-bug@gnu.org,
> 	bug-gnu-emacs@gnu.org
> Date: Tue, 30 Sep 2008 00:55:08 -0400
> 
> I've discovered that the problem does not happen when I visit that
> message directly with C-u M-x rmail RET FILENAME RET.
> 
> It does happen when I use C-u g FILENAME to get that message
> as new mail into my RMAIL file.
> 
> Both cases are the same with and without -Q.
> 
> This unfortunately leaves me with no test case I can send.

I can reproduce this by reading the test case you sent twice: once
with "C-u M-x rmail", then with "C-u g".  The second time I get the
message displayed incorrectly.

I will debug this and see what I find.


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30  7:32     ` Eli Zaretskii
@ 2008-09-30  8:40       ` Eli Zaretskii
  2008-09-30 10:59         ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-09-30  8:40 UTC (permalink / raw)
  To: 1051; +Cc: emacs-pretest-bug, bug-gnu-emacs, rms, handa

> Date: Tue, 30 Sep 2008 10:32:08 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-pretest-bug@gnu.org, bug-gnu-emacs@gnu.org,
> 	1051@emacsbugs.donarmstrong.com
> 
> I can reproduce this by reading the test case you sent twice: once
> with "C-u M-x rmail", then with "C-u g".  The second time I get the
> message displayed incorrectly.
> 
> I will debug this and see what I find.

The problem is within mail-unquote-printable-region: it relies on
insert-char to insert a unibyte character, even if the target buffer
is a multibyte buffer.  In Emacs 22.x this works, but not in Emacs 23.

Perhaps Handa-san can suggest what is the best way of inserting
unibyte characters into a multibyte buffer in Emacs 23.  Obviously,
insert-file-contents does that when coding-system-for-read is bound to
no-conversion.
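The difference described above can be observed directly; this is a sketch assuming Emacs 23 or later. In a multibyte buffer, `insert-char` treats #xC3 as the Unicode character U+00C3 (Ã), whereas `insert-byte` stores it as a raw eight-bit byte (internally the character #x3FFFC3). Only raw bytes are combined into characters by a later `decode-coding-region`:

```elisp
;; `with-temp-buffer' creates a multibyte buffer by default.
(with-temp-buffer
  (insert-char #xC3 1)   ; inserts the character U+00C3 (Ã), not a byte
  (insert-byte #xC3 1)   ; inserts raw byte #xC3 as eight-bit char #x3FFFC3
  (list (char-after 1) (char-after 2)))
;; => (195 4194243)

;; Raw bytes inserted this way can then be decoded as UTF-8.
(with-temp-buffer
  (insert-byte #xC3 1)
  (insert-byte #xBA 1)
  (decode-coding-region (point-min) (point-max) 'utf-8)
  (buffer-string))
;; => "ú"
```

With `insert-char` in the second form, the buffer would hold the characters U+00C3 and U+00BA, which are not raw bytes, so the UTF-8 decoding step has no byte pair to combine.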


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30  8:40       ` Eli Zaretskii
@ 2008-09-30 10:59         ` Kenichi Handa
  2008-09-30 12:00           ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2008-09-30 10:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1051, rms

In article <uljxajac7.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> The problem is within mail-unquote-printable-region: it relies on
> insert-char to insert a unibyte character, even if the target buffer
> is a multibyte buffer.  In Emacs 22.x this works, but not in Emacs 23.

> Perhaps Handa-san can suggest what is the best way of inserting
> unibyte characters into a multibyte buffer in Emacs 23.  Obviously,
> insert-file-contents does that when coding-system-for-read is bound to
> no-conversion.

The Lisp API for that is insert-byte.

By the way, we still don't have a proper API for reading an
eight-bit character as a byte.  What we can do now for that is
something like this:
  (multibyte-char-to-unibyte (char-after POS))
or
  (encode-char (char-after POS) 'eight-bit)

It may be good to provide byte-after, following-byte, and
preceding-byte (each signaling an error if the character is
neither ASCII nor eight-bit).  What do you think?
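As a sketch of the second workaround in a current Emacs: a raw byte inserted with `insert-byte` can be read back with `encode-char` and the `eight-bit` charset. The plain character 128 (U+0080) is not a raw byte, so the same call returns nil for it:

```elisp
(with-temp-buffer
  ;; Store byte #xC3 as a raw eight-bit character in a multibyte buffer.
  (insert-byte #xC3 1)
  (list
   ;; Map the raw-byte character back to its byte value.
   (encode-char (char-after (point-min)) 'eight-bit)
   ;; U+0080 is an ordinary character, not a member of `eight-bit'.
   (encode-char 128 'eight-bit)))
;; => (195 nil)
```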

---
Kenichi Handa
handa@ni.aist.go.jp


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30 10:59         ` Kenichi Handa
@ 2008-09-30 12:00           ` Eli Zaretskii
  2008-09-30 23:30             ` Richard M. Stallman
  2008-10-01  0:29             ` Kenichi Handa
  0 siblings, 2 replies; 20+ messages in thread
From: Eli Zaretskii @ 2008-09-30 12:00 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1051, rms

> From: Kenichi Handa <handa@m17n.org>
> CC: 1051@emacsbugs.donarmstrong.com, rms@gnu.org, emacs-pretest-bug@gnu.org,
>         bug-gnu-emacs@gnu.org
> Date: Tue, 30 Sep 2008 19:59:08 +0900
> 
> In article <uljxajac7.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The problem is within mail-unquote-printable-region: it relies on
> > insert-char to insert a unibyte character, even if the target buffer
> > is a multibyte buffer.  In Emacs 22.x this works, but not in Emacs 23.
> 
> > Perhaps Handa-san can suggest what is the best way of inserting
> > unibyte characters into a multibyte buffer in Emacs 23.  Obviously,
> > insert-file-contents does that when coding-system-for-read is bound to
> > no-conversion.
> 
> The Lisp API for that is insert-byte.

Thanks, this indeed fixes the problem.

Richard, please see if the patch below fixes the problem for you as
well.

> It may be good to provide byte-after, following-byte, and
> preceding-byte (each signaling an error if the character is
> neither ASCII nor eight-bit).  What do you think?

I agree that it would be nice to have such a feature, but perhaps a
single API

  (get-byte POS)

would be enough?  This could default to point if POS is nil or
omitted, and could even read from a string if POS is a string.


2008-09-30  Eli Zaretskii  <eliz@gnu.org>

	* mail/mail-utils.el (mail-unquote-printable-region): Use
	insert-byte instead of insert-char, when the UNIBYTE arg is
	non-nil.

Index: lisp/mail/mail-utils.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/mail/mail-utils.el,v
retrieving revision 1.65
retrieving revision 1.66
diff -c -r1.65 -r1.66
*** lisp/mail/mail-utils.el	6 May 2008 07:22:25 -0000	1.65
--- lisp/mail/mail-utils.el	30 Sep 2008 11:53:21 -0000	1.66
***************
*** 141,148 ****
  		     (if unibyte
  			 (progn
  			   (replace-match "")
! 			   ;; insert-char will insert this as unibyte,
! 			   (insert-char char 1))
  		       (replace-match (make-string 1 char) t t))))
  		  (noerror
  		   (setq failed t))
--- 141,149 ----
  		     (if unibyte
  			 (progn
  			   (replace-match "")
! 			   ;; insert-byte will insert this as a
! 			   ;; corresponding eight-bit character.
! 			   (insert-byte char 1))
  		       (replace-match (make-string 1 char) t t))))
  		  (noerror
  		   (setq failed t))
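To show the patched technique end to end, here is a minimal sketch of a region unquoter in the same spirit: each `=XX` escape is replaced by the raw byte XX via `insert-byte`, and the region is decoded afterwards. The helper name `my-unquote-and-decode-region` is hypothetical; this is not the actual `mail-unquote-printable-region` code, and it ignores quoted-printable soft line breaks:

```elisp
(defun my-unquote-and-decode-region (beg end coding)
  "Replace =XX escapes between BEG and END with raw bytes, then decode with CODING.
Hypothetical helper for illustration.  Soft line breaks are not handled."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (re-search-forward "=\\([0-9A-Fa-f][0-9A-Fa-f]\\)" nil t)
        (let ((byte (string-to-number (match-string 1) 16)))
          ;; Delete the three-character escape...
          (replace-match "")
          ;; ...and put the raw byte in its place, as in the patch.
          (insert-byte byte 1)))
      ;; Turn the accumulated raw bytes into characters.
      (decode-coding-region (point-min) (point-max) coding))))

(with-temp-buffer
  (insert "ning=C3=BAn programa")
  (my-unquote-and-decode-region (point-min) (point-max) 'utf-8)
  (buffer-string))
;; => "ningún programa"
```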


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30 12:00           ` Eli Zaretskii
@ 2008-09-30 23:30             ` Richard M. Stallman
  2008-10-01  0:29             ` Kenichi Handa
  1 sibling, 0 replies; 20+ messages in thread
From: Richard M. Stallman @ 2008-09-30 23:30 UTC (permalink / raw)
  To: Eli Zaretskii, 1051; +Cc: emacs-pretest-bug, handa

    Richard, please see if the patch below fixes the problem for you as
    well.

Yes, it's fixed now.


* bug#1051: 23.0.60; rmail decoding bug
  2008-09-30 12:00           ` Eli Zaretskii
  2008-09-30 23:30             ` Richard M. Stallman
@ 2008-10-01  0:29             ` Kenichi Handa
  2008-10-01  8:14               ` get-byte (was: bug#1051: 23.0.60; rmail decoding bug) Eli Zaretskii
  1 sibling, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2008-10-01  0:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1051, rms

In article <uiqsdkfmo.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > It may be good to provide byte-after, following-byte, and
> > preceding-byte (each signaling an error if the character is
> > neither ASCII nor eight-bit).  What do you think?

> I agree that it would be nice to have such a feature, but perhaps a
> single API

>   (get-byte POS)

> would be enough?  This could default to point if POS is nil or
> omitted, and could even read from a string if POS is a string.

Ah!  How about something like this?

(defun get-byte (pos &optional string)
  "Return a byte at position POS of the current buffer.
If POS is nil, it defaults to point.
If the second optional arg STRING is non-nil, return a byte in
STRING at index POS.
An error is signaled if the character at POS is neither ASCII
nor an eight-bit character."
  ...)

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: get-byte (was: bug#1051: 23.0.60; rmail decoding bug)
  2008-10-01  0:29             ` Kenichi Handa
@ 2008-10-01  8:14               ` Eli Zaretskii
  2008-10-01 13:36                 ` get-byte Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-10-01  8:14 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: 1051@emacsbugs.donarmstrong.com, rms@gnu.org, emacs-pretest-bug@gnu.org,
>         bug-gnu-emacs@gnu.org
> Date: Wed, 01 Oct 2008 09:29:41 +0900
> 
> Ah!  How about something like this?
> 
> (defun get-byte (pos &optional string)
>   "Return a byte at position POS of the current buffer.
> If POS is nil, it defaults to point.
> If the second optional arg STRING is non-nil, return a byte in
> STRING at index POS.
> An error is signaled if the character at POS is neither ASCII
> nor an eight-bit character."
>   ...)

Yes, that's what I had in mind.


* Re: get-byte
  2008-10-01  8:14               ` get-byte (was: bug#1051: 23.0.60; rmail decoding bug) Eli Zaretskii
@ 2008-10-01 13:36                 ` Stefan Monnier
  2008-10-01 17:10                   ` get-byte Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2008-10-01 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

>> Ah!  How about something like this?
>> 
>> (defun get-byte (pos &optional string)
>> "Return a byte at position POS of the current buffer.
>> If POS is nil, it defaults to point.
>> If the second optional arg STRING is non-nil, return a byte in
>> STRING at index POS.
>> An error is signaled if the character at POS is neither ASCII
>> nor an eight-bit character."
>> ...)

> Yes, that's what I had in mind.

It really seems to me that encode-char is a better solution.  More to
the point it's the most natural and flexible solution.  E.g. it's
trivial to write get-byte using encode-char, whereas it's much less
trivial to write encode-char with get-byte.


        Stefan


* Re: get-byte
  2008-10-01 13:36                 ` get-byte Stefan Monnier
@ 2008-10-01 17:10                   ` Eli Zaretskii
  2008-11-08 13:12                     ` get-byte Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-10-01 17:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, handa

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@m17n.org>,  emacs-devel@gnu.org
> Date: Wed, 01 Oct 2008 09:36:29 -0400
> 
> >> Ah!  How about something like this?
> >> 
> >> (defun get-byte (pos &optional string)
> >> "Return a byte at position POS of the current buffer.
> >> If POS is nil, it defaults to point.
> >> If the second optional arg STRING is non-nil, return a byte in
> >> STRING at index POS.
> >> An error is signaled if the character at POS is neither ASCII
> >> nor an eight-bit character."
> >> ...)
> 
> > Yes, that's what I had in mind.
> 
> It really seems to me that encode-char is a better solution.  More to
> the point it's the most natural and flexible solution.  E.g. it's
> trivial to write get-byte using encode-char, whereas it's much less
> trivial to write encode-char with get-byte.

Then let's write get-byte using encode-char.  I don't care how trivial
it is (I think it isn't, not unless you know very well how raw bytes
are handled in Emacs buffers and strings), I think we do need such an
API.


* Re: get-byte
  2008-10-01 17:10                   ` get-byte Eli Zaretskii
@ 2008-11-08 13:12                     ` Kenichi Handa
  2008-11-08 19:27                       ` get-byte Eli Zaretskii
  2008-11-09  2:28                       ` get-byte Stefan Monnier
  0 siblings, 2 replies; 20+ messages in thread
From: Kenichi Handa @ 2008-11-08 13:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

In article <utzbwi6mb.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Then let's write get-byte using encode-char.  I don't care how trivial
> it is (I think it isn't, not unless you know very well how raw bytes
> are handled in Emacs buffers and strings), I think we do need such an
> API.

I've just installed get-byte.

----------------------------------------------------------------------
(get-byte &optional POSITION STRING)

Return the byte value of the character at point.
Optional 1st arg POSITION, if non-nil, is the position of the character
whose byte value is returned.
Optional 2nd arg STRING, if non-nil, is a string whose first
character is the target.  In this case, POSITION, if non-nil, is the
index of the target character in the string.
----------------------------------------------------------------------
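For illustration, a few calls to `get-byte` as documented above, assuming a current Emacs in which it is available. A unibyte string yields its bytes directly, a raw-byte character in a multibyte string is mapped back to its byte, and any other non-ASCII character signals an error:

```elisp
(let ((bytes (unibyte-string #xC3 #xBA)))
  (list
   ;; Unibyte string: the byte at index 0 is returned as is.
   (get-byte 0 bytes)
   ;; Multibyte string holding raw-byte characters: mapped back to bytes.
   (get-byte 1 (string-to-multibyte bytes))
   ;; ASCII characters are their own byte values.
   (get-byte 0 "a")
   ;; U+00FA (ú) is neither ASCII nor eight-bit, so an error is signaled.
   (condition-case nil
       (get-byte 0 (string #xFA))
     (error 'error))))
;; => (195 186 97 error)
```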

I wrote it in C because I think it must run very fast in
the situations where this function is called.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: get-byte
  2008-11-08 13:12                     ` get-byte Kenichi Handa
@ 2008-11-08 19:27                       ` Eli Zaretskii
  2008-11-10  1:06                         ` get-byte Kenichi Handa
  2008-11-09  2:28                       ` get-byte Stefan Monnier
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2008-11-08 19:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> CC: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 08 Nov 2008 22:12:02 +0900
> 
> In article <utzbwi6mb.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Then let's write get-byte using encode-char.  I don't care how trivial
> > it is (I think it isn't, not unless you know very well how raw bytes
> > are handled in Emacs buffers and strings), I think we do need such an
> > API.
> 
> I've just installed get-byte.

Thank you.


* Re: get-byte
  2008-11-08 13:12                     ` get-byte Kenichi Handa
  2008-11-08 19:27                       ` get-byte Eli Zaretskii
@ 2008-11-09  2:28                       ` Stefan Monnier
  2008-11-10  2:20                         ` get-byte Kenichi Handa
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2008-11-09  2:28 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, emacs-devel

>> Then let's write get-byte using encode-char.  I don't care how trivial
>> it is (I think it isn't, not unless you know very well how raw bytes
>> are handled in Emacs buffers and strings), I think we do need such an API.
> I've just installed get-byte.

As already mentioned I think the important function to provide is
`encode-char' for that functionality.  Yes, I see you provide `get-byte'
but can encode-char be used for it now?  If so how?

> I wrote it in C because I think it must run very fast in
> the situations where this function is called.

Currently I don't see it being used.  Where is it going to be used?


        Stefan


* Re: get-byte
  2008-11-08 19:27                       ` get-byte Eli Zaretskii
@ 2008-11-10  1:06                         ` Kenichi Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Kenichi Handa @ 2008-11-10  1:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

In article <uskq2qaqh.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In article <utzbwi6mb.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > Then let's write get-byte using encode-char.  I don't care how trivial
> > > it is (I think it isn't, not unless you know very well how raw bytes
> > > are handled in Emacs buffers and strings), I think we do need such an
> > > API.
> > 
> > I've just installed get-byte.

> Thank you.

I found that it doesn't work in a unibyte buffer.  I've just
installed a fix.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: get-byte
  2008-11-09  2:28                       ` get-byte Stefan Monnier
@ 2008-11-10  2:20                         ` Kenichi Handa
  2008-11-10  3:30                           ` get-byte Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2008-11-10  2:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, emacs-devel

In article <jwvej1l8wkj.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> Then let's write get-byte using encode-char.  I don't care how trivial
>>> it is (I think it isn't, not unless you know very well how raw bytes
>>> are handled in Emacs buffers and strings), I think we do need such an API.
> > I've just installed get-byte.

> As already mentioned I think the important function to provide is
> `encode-char' for that functionality.  Yes, I see you provide `get-byte'
> but can encode-char be used for it now?  If so how?

Yes, we can use encode-char to implement get-byte like this:

(defun get-byte (&optional pos string)
  (let ((multibyte (if string (multibyte-string-p string)
		     enable-multibyte-characters))
	(ch (if string (aref string (or pos 0)) 
	      (char-after (or pos (point))))))
    (if (< ch #x80)
	ch
      (if multibyte
	  (or (encode-char ch 'eight-bit)
	      (error "Not an ASCII nor an 8-bit character: %d" ch))
	ch))))

But it's 5 to 10 times slower than the C version.

> > I wrote it in C because I think it must run very fast in
> > the situations where this function is called.

> Currently I don't see it being used.  Where is it going to be used?

Anywhere you want to work with binary data that is
stored in a multibyte buffer/string.  By grepping for
multibyte-char-to-unibyte, I found these places:
quoted-printable-encode-region, ctext-post-read-conversion.
It seems that arc-mode should also use it unless it is
rewritten to use buffer-swap-text as tar-mode does.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: get-byte
  2008-11-10  2:20                         ` get-byte Kenichi Handa
@ 2008-11-10  3:30                           ` Stefan Monnier
  2008-11-10  5:02                             ` get-byte Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2008-11-10  3:30 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, emacs-devel

> Yes, we can use encode-char to implement get-byte like this:

> (defun get-byte (&optional pos string)
>   (let ((multibyte (if string (multibyte-string-p string)
> 		     enable-multibyte-characters))
> 	(ch (if string (aref string (or pos 0)) 
> 	      (char-after (or pos (point))))))
>     (if (< ch #x80)
> 	ch
>       (if multibyte
> 	  (or (encode-char ch 'eight-bit)
> 	      (error "Not an ASCII nor an 8-bit character: %d" ch))
> 	ch))))

> But it's 5 to 10 times slower than the C version.

I'm not opposed to `get-byte' being implemented in C.  I just think it
should be implementable as

  (defun get-byte (&optional pos string)
    (let ((ch (if string (aref string (or pos 0))
                (char-after pos))))
      (or (encode-char ch 'binary)
          (error "Not an ASCII nor an 8-bit character: %d" ch))))

Given that, in most cases where you'd use get-byte you could replace it
with either (encode-char (char-after POS) 'binary)
or (encode-char (aref STRING POS) 'binary).  It may still be
significantly slower than a direct C implementation of get-byte, but it
is the right functionality to provide (i.e. get-byte is only there for
optimization purposes) and in some cases get-byte is not an option
(e.g. in cases such as (mapcar (lambda (c) (... (encode-char c 'binary)
..)) <string>)).
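As Handa points out in his reply, there is no `binary` charset, so the `encode-char` calls above do not work as written. A sketch of the same mapcar-style conversion using the existing `eight-bit` charset needs an extra case for unibyte strings, whose elements are already byte values. The helper name `my-string-byte-list` is hypothetical:

```elisp
(defun my-string-byte-list (string)
  "Return a list of the byte values of the characters in STRING.
Signal an error for characters that are neither ASCII nor raw bytes.
Hypothetical helper for illustration."
  (let ((multibyte (multibyte-string-p string)))
    (mapcar (lambda (c)
              (cond
               ;; ASCII, or any element of a unibyte string, is already a byte.
               ((or (< c #x80) (not multibyte)) c)
               ;; Raw-byte character in a multibyte string: map it back.
               ((encode-char c 'eight-bit))
               (t (error "Not an ASCII nor an 8-bit character: %d" c))))
            string)))

(my-string-byte-list (string-to-multibyte (unibyte-string ?a #xC3 #xBA)))
;; => (97 195 186)
```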

>> > I wrote it in C because I think it must run very fast in
>> > the situations where this function is called.
>> Currently I don't see it being used.  Where is it going to be used?
> Anywhere you want to work with binary data that is
> stored in a multibyte buffer/string.  By grepping for
> multibyte-char-to-unibyte, I found these places:
> quoted-printable-encode-region, ctext-post-read-conversion.

I see.

> It seems that arc-mode should also use it unless it is
> rewritten to use buffer-swap-text as tar-mode does.

This one needs to be converted to use buffer-swap-text indeed.


        Stefan


* Re: get-byte
  2008-11-10  3:30                           ` get-byte Stefan Monnier
@ 2008-11-10  5:02                             ` Kenichi Handa
  2008-11-10 14:57                               ` get-byte Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2008-11-10  5:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, emacs-devel

In article <jwvy6zsnu5z.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I'm not opposed to `get-byte' being implemented in C.  I just think it
> should be implementable as

>   (defun get-byte (&optional pos string)
>     (let ((ch (if string (aref string (or pos 0))
>                 (char-after pos))))
>       (or (encode-char ch 'binary)
>           (error "Not an ASCII nor an 8-bit character: %d" ch))))

`binary' is not a character set, and even if you change it
to `eight-bit', it can't be used for unibyte buffer/string
because, for instance, (encode-char 128 'eight-bit) is nil.

Or, are you proposing to create `binary' charset to make the
above work?

> Given that, in most cases where you'd use get-byte you could replace it
> with either (encode-char (char-after POS) 'binary)
> or (encode-char (aref STRING POS) 'binary).  It may still be
> significantly slower than a direct C implementation of get-byte, but it
> is the right functionality to provide (i.e. get-byte is only there for
> optimization purposes) and in some cases get-byte is not an option
> (e.g. in cases such as (mapcar (lambda (c) (... (encode-char c 'binary)
> ..)) <string>)).

Do you have a concrete example in which you need the result
as a list?  I think that in most cases we don't need such a
list, and thus we can have this version:
  (dotimes (i (length <string>))
    (... (get-byte i string) ...))
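A runnable version of the `dotimes` pattern above, as a sketch assuming a current Emacs that has `get-byte`. The helper name `my-string-byte-list-via-get-byte` is hypothetical; it collects the byte values by index:

```elisp
(defun my-string-byte-list-via-get-byte (string)
  "Return the list of byte values of STRING, using `get-byte' by index.
Hypothetical helper for illustration."
  (let ((bytes '()))
    (dotimes (i (length string))
      ;; `get-byte' accepts both unibyte and multibyte strings.
      (push (get-byte i string) bytes))
    (nreverse bytes)))

(my-string-byte-list-via-get-byte
 (string-to-multibyte (unibyte-string ?a #xC3 #xBA)))
;; => (97 195 186)
```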

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: get-byte
  2008-11-10  5:02                             ` get-byte Kenichi Handa
@ 2008-11-10 14:57                               ` Stefan Monnier
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2008-11-10 14:57 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, emacs-devel

> Or, are you proposing to create `binary' charset to make the
> above work?

Yes, that's what I've been suggesting all along.

> Do you have a concrete example in which you need the result
> as a list?  I think that in most cases we don't need such a
> list, and thus we can have this version:
>   (dotimes (i (length <string>))
>     (... (get-byte i string) ...))

Of course, there's probably always some way around the problem; we have
a Turing-complete language after all.  But it's pretty clear that
encode-char is the fundamental functionality, whereas get-byte is just
a layer on top of it that catches/optimizes the most common cases.


        Stefan


end of thread

Thread overview: 20+ messages
2008-09-29 17:13 bug#1051: 23.0.60; rmail decoding bug Richard M. Stallman
2008-09-29 19:03 ` Eli Zaretskii
2008-09-30  4:55   ` Richard M. Stallman
2008-09-30  7:32     ` Eli Zaretskii
2008-09-30  8:40       ` Eli Zaretskii
2008-09-30 10:59         ` Kenichi Handa
2008-09-30 12:00           ` Eli Zaretskii
2008-09-30 23:30             ` Richard M. Stallman
2008-10-01  0:29             ` Kenichi Handa
2008-10-01  8:14               ` get-byte (was: bug#1051: 23.0.60; rmail decoding bug) Eli Zaretskii
2008-10-01 13:36                 ` get-byte Stefan Monnier
2008-10-01 17:10                   ` get-byte Eli Zaretskii
2008-11-08 13:12                     ` get-byte Kenichi Handa
2008-11-08 19:27                       ` get-byte Eli Zaretskii
2008-11-10  1:06                         ` get-byte Kenichi Handa
2008-11-09  2:28                       ` get-byte Stefan Monnier
2008-11-10  2:20                         ` get-byte Kenichi Handa
2008-11-10  3:30                           ` get-byte Stefan Monnier
2008-11-10  5:02                             ` get-byte Kenichi Handa
2008-11-10 14:57                               ` get-byte Stefan Monnier

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
