* decode-char & utf-8-fragment-on-decoding
@ 2002-09-04  5:56 Thomas Morgan
  2002-09-04  8:18 ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Morgan @ 2002-09-04  5:56 UTC (permalink / raw)


decode-char does not honor utf-8-fragment-on-decoding.

I tried this code in
GNU Emacs 21.3.50.2 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2002-09-03 on cricket
run with options -q and --no-site-file.

  (let ((utf-8-fragment-on-decoding nil)
        (c ?Γ))
    (= c (decode-char 'ucs (encode-char c 'ucs))))

encode-char returns 915, decode-char returns 2883, and the entire sexp
evaluates to nil.  The Unicode code point is translated into greek-iso8859-7
by decode-char even though utf-8-fragment-on-decoding is not enabled.

Is this a bug?  The following change makes decode-char act as I expected.

*** /src/emacs/lisp/international/mule.el.~1.159.~	Sat Aug 24 03:46:25 2002
--- /src/emacs/lisp/international/mule.el	Wed Sep  4 01:30:54 2002
***************
*** 331,337 ****
  	       (setq code-point (- code-point #xe000))
  	       (make-char 'mule-unicode-e000-ffff
  			  (+ (/ code-point 96) 32) (+ (% code-point 96) 32))))))
!       (if (and c (aref utf-8-translation-table-for-decode c))
  	  (aref utf-8-translation-table-for-decode c)
  	c)))))
  
--- 331,339 ----
  	       (setq code-point (- code-point #xe000))
  	       (make-char 'mule-unicode-e000-ffff
  			  (+ (/ code-point 96) 32) (+ (% code-point 96) 32))))))
!       (if (and c
! 	       utf-8-fragment-on-decoding
! 	       (aref utf-8-translation-table-for-decode c))
  	  (aref utf-8-translation-table-for-decode c)
  	c)))))
  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-04  5:56 decode-char & utf-8-fragment-on-decoding Thomas Morgan
@ 2002-09-04  8:18 ` Kenichi Handa
  2002-09-04 23:34   ` Dave Love
  0 siblings, 1 reply; 13+ messages in thread
From: Kenichi Handa @ 2002-09-04  8:18 UTC (permalink / raw)
  Cc: bug-gnu-emacs, d.love

Thomas Morgan <tlm@pocketmail.com> writes:
> decode-char does not honor utf-8-fragment-on-decoding.
> I tried this code in
> GNU Emacs 21.3.50.2 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
> of 2002-09-03 on cricket
> run with options -q and --no-site-file.

>   (let ((utf-8-fragment-on-decoding nil)
>         (c ?Γ))
>     (= c (decode-char 'ucs (encode-char c 'ucs))))

> encode-char returns 915, decode-char returns 2883, and the entire sexp
> evaluates to nil.  The Unicode code point is translated into greek-iso8859-7
> by decode-char even though utf-8-fragment-on-decoding is not enabled.

> Is this a bug?

The documentation of decode-char says that a character is
translated by utf-8-translation-table-for-decode (regardless
of utf-8-fragment-on-decoding).  Thus it's not a bug.  But,
I agree that this behavior is very confusing and not good.
And, I've recently found that utf-8 can't encode
cyrillic-iso8859-5 and greek-iso8859-7 correctly because of
this behavior.

> The following change makes decode-char act as I expected.

Thank you.  It seems to be the right fix.  I'll install it
soon.   Dave, do you see any problem with that?

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-04  8:18 ` Kenichi Handa
@ 2002-09-04 23:34   ` Dave Love
  2002-09-05  1:25     ` Kenichi Handa
  2002-09-05  5:23     ` Thomas Morgan
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Love @ 2002-09-04 23:34 UTC (permalink / raw)
  Cc: tlm, bug-gnu-emacs

Kenichi Handa <handa@etl.go.jp> writes:

> Thank you.  It seems to be the right fix.  I'll install it
> soon.   Dave, do you see any problem with that?

Yes, I think it's wrong.  I think the only bug is that the doc string
of `utf-8-fragment-on-decoding' is missing the normal `setting it
directly does not take effect' text for a Custom option with a setter.
It has no effect on how decoding is done, and `decode-char' was meant
to be consistent with how utf-8 decoding actually works.

The translation table is always used -- CCL has no way to access Lisp
variables anyway.  It only matters how it's populated (empty by
default).
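
As a rough illustration of the point about population (a sketch only:
`demo-translation-table' and `demo-translate' are made-up names, not
part of Emacs, and the target character ?G is arbitrary), an empty
translation table passes every character through unchanged, and only
the entries that have been added alter the result:

```elisp
;; Hypothetical stand-in for the real table; the name is illustrative only.
(defvar demo-translation-table (make-char-table 'translation-table)
  "An initially empty translation table.")

(defun demo-translate (char)
  "Return CHAR translated through `demo-translation-table', or CHAR itself."
  (or (aref demo-translation-table char) char))

;; Empty table: the lookup yields nil, so the character comes back unchanged.
(demo-translate #x393)                  ; => 915 (#x393)

;; Populate a single entry; every other character is still passed through.
(aset demo-translation-table #x393 ?G)
(demo-translate #x393)                  ; => 71 (?G)
(demo-translate #x394)                  ; => 916 (#x394)
```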

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-04 23:34   ` Dave Love
@ 2002-09-05  1:25     ` Kenichi Handa
  2002-09-05 22:45       ` Dave Love
  2002-09-05  5:23     ` Thomas Morgan
  1 sibling, 1 reply; 13+ messages in thread
From: Kenichi Handa @ 2002-09-05  1:25 UTC (permalink / raw)
  Cc: tlm, bug-gnu-emacs

In article <rzqd6rtuy5g.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@etl.go.jp> writes:
>>  Thank you.  It seems to be the right fix.  I'll install it
>>  soon.   Dave, do you see any problem with that?

> Yes, I think it's wrong.  I think the only bug is that the doc string
> of `utf-8-fragment-on-decoding' is missing the normal `setting it
> directly does not take effect' text for a Custom option with a setter.
> It has no effect on how decoding is done, and `decode-char' was meant
> to be consistent with how utf-8 decoding actually works.

> The translation table is always used -- CCL has no way to access Lisp
> variables anyway.  It only matters how it's populated (empty by
> default).

I see your point.  Ok, I'll cancel the change, and add this
sentence:
	Setting this variable outside customize has no effect.
in the docstring of utf-8-fragment-on-decoding.

So, we should regard it as a feature that
	(decode-char 'ucs (encode-char C 'ucs))
will not return C even if C belongs to one of mule-unicode-*
charsets.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-04 23:34   ` Dave Love
  2002-09-05  1:25     ` Kenichi Handa
@ 2002-09-05  5:23     ` Thomas Morgan
  2002-09-05 22:47       ` Dave Love
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Morgan @ 2002-09-05  5:23 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

   [utf-8-fragment-on-decoding] has no effect on how decoding is done,
   and `decode-char' was meant to be consistent with how utf-8
   decoding actually works.

I understand now that utf-8-fragment-on-decoding has no direct effect
on decoding, but it is still not clear to me what indirect effect it is
supposed to have when it is set through Custom.  Right now it applies
to CCL programs, but not to decode-char.  Is that correct?

If so, perhaps that should be documented.  It's a rather confusing point
for me because I am not an expert; there are probably other non-experts
who would also find it confusing.

If it is not correct, however, how about making decode-char look for
utf-8-translation-table-for-decode within translation-table-vector
rather than accessing the Lisp variable directly?

The Custom set function for utf-8-fragment-on-decoding never changes
the variable utf-8-translation-table-for-decode, but it does change
the corresponding member of translation-table-vector by calling
define-translation-table.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
@ 2002-09-05 17:39 Thomas Morgan
  2002-09-05 22:50 ` Dave Love
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Morgan @ 2002-09-05 17:39 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

I wrote:

   I understand now that utf-8-fragment-on-decoding has no direct effect
   on decoding, but it is still not clear to me what indirect effect it is
   supposed to have when it is set through Custom.  Right now it applies
   to CCL programs, but not to decode-char.  Is that correct?

   If so, perhaps that should be documented.

When I think about it more carefully, I see that decode-char's
doc string is unambiguous when it says "the result is translated
through the char table `utf-8-translation-table-for-decode'".

For the sake of careless readers like me, could something like this
be added after that sentence?

  (Note that this char table is independent of the translation table
  of the same name which CCL programs use, and it is not affected by
  the user's setting of utf-8-fragment-on-decoding.)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-05  1:25     ` Kenichi Handa
@ 2002-09-05 22:45       ` Dave Love
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Love @ 2002-09-05 22:45 UTC (permalink / raw)
  Cc: tlm, bug-gnu-emacs

Kenichi Handa <handa@etl.go.jp> writes:

> So, we should regard it as a feature that
> 	(decode-char 'ucs (encode-char C 'ucs))
> will not return C even if C belongs to one of mule-unicode-*
> charsets.

[... when the translation table is populated.]

If you think that's wrong, feel free to change it.  I don't feel
strongly about it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-05  5:23     ` Thomas Morgan
@ 2002-09-05 22:47       ` Dave Love
  2002-09-06  1:13         ` Thomas Morgan
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Love @ 2002-09-05 22:47 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

Thomas Morgan <tlm@pocketmail.com> writes:

> I understand now that utf-8-fragment-on-decoding has no direct effect
> on decoding, but it is still not clear to me what indirect effect it is
> supposed to have when it is set through Custom.

When you customize it, it runs setter code, the same way as global
minor mode customizations typically do.  See the code if you really
care.

> Right now it applies
> to CCL programs, but not to decode-char.  Is that correct?

I don't know what `it' refers to there.

> If so, perhaps that should be documented.  It's a rather confusing
> point for me because I am not an expert; there are probably other
> non-experts who would also find it confusing.

I don't know what you think needs documenting -- can you suggest text?
Custom is aimed at non-experts (though I use it almost exclusively).
As far as I can tell, the doc for the option says accurately what it
does and why (except that you could define arbitrary translations in
the table).

> If it is not correct, however, how about making decode-char look for
> utf-8-translation-table-for-decode within translation-table-vector
> rather than accessing the Lisp variable directly?

I don't understand.

> The Custom set function for utf-8-fragment-on-decoding never changes
> the variable utf-8-translation-table-for-decode, but it does change
> the corresponding member of translation-table-vector by calling
> define-translation-table.

Correct.

I'm afraid I don't understand what the problem is, perhaps because I
don't have enough context.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-05 17:39 Thomas Morgan
@ 2002-09-05 22:50 ` Dave Love
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Love @ 2002-09-05 22:50 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

Thomas Morgan <tlm@pocketmail.com> writes:

> For the sake of careless readers like me, could something like this
> be added after that sentence?
> 
>   (Note that this char table is independent of the translation table
>   of the same name which CCL programs use, and it is not affected by
>   the user's setting of utf-8-fragment-on-decoding.)

That seems to be the opposite of how it actually is.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-05 22:47       ` Dave Love
@ 2002-09-06  1:13         ` Thomas Morgan
  2002-09-07 23:14           ` Dave Love
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Morgan @ 2002-09-06  1:13 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

   > Right now it applies
   > to CCL programs, but not to decode-char.  Is that correct?
   
   I don't know what `it' refers to there.

The option `utf-8-fragment-on-decoding'.

utf-8-fragment-on-decoding's Custom set function changes the element
of translation-table-vector named `utf-8-translation-table-for-decode'.
CCL programs use this same table from translation-table-vector.
Therefore, utf-8-fragment-on-decoding applies to CCL programs.

On the other hand, utf-8-fragment-on-decoding's Custom set function does
not change the value of the variable `utf-8-translation-table-for-decode'.
decode-char uses this variable's value.  Therefore,
utf-8-fragment-on-decoding does not apply to decode-char.

Is this because utf-8-fragment-on-decoding is meant only to apply
to user-level operations like finding a file or reading email,
but decode-char should be counted on always to return the same
result to Lisp programs even when user-level options change?

                                * * *

   > If it is not correct, however, how about making decode-char look for
   > utf-8-translation-table-for-decode within translation-table-vector
   > rather than accessing the Lisp variable directly?

   I don't understand.

This would be a change from the current behavior, and I'm not sure
whether it is right or wrong.  But when you said

   The translation table is always used -- CCL has no way to access Lisp
   variables anyway.  It only matters how it's populated (empty by
   default).

it seemed to me that decode-char should use the same translation table
that CCL uses.  I just learned that this table, as well as being found
in translation-table-vector, is also the value of the `translation-table'
property for the symbol `utf-8-translation-table-for-decode'.  It is not,
however, necessarily the value of that symbol's variable definition.

So instead of

   (if (and c (aref utf-8-translation-table-for-decode c))
       (aref utf-8-translation-table-for-decode c)
     c)

maybe decode-char could do something like

   (let ((table (get 'utf-8-translation-table-for-decode 'translation-table)))
     (if (and c table (aref table c))
         (aref table c)
       c))
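
A self-contained sketch of that lookup, for experimenting outside
decode-char.  The names `demo-translate-via-property' and
`demo-decode-table' are hypothetical, and the ?G target is arbitrary;
the only assumption carried over from above is that the table in use
lives on the symbol's `translation-table' property:

```elisp
(defun demo-translate-via-property (symbol char)
  "Translate CHAR via SYMBOL's `translation-table' property.
Return CHAR unchanged when there is no table or no entry for CHAR."
  (let ((table (get symbol 'translation-table)))
    (or (and char table (aref table char))
        char)))

;; Hypothetical symbol whose variable value is deliberately left empty,
;; while a populated table is installed on the property only.
(defvar demo-decode-table (make-char-table 'translation-table))
(let ((installed (make-char-table 'translation-table)))
  (aset installed #x393 ?G)
  (put 'demo-decode-table 'translation-table installed))

;; Reading the variable misses the installed entry...
(aref demo-decode-table #x393)                          ; => nil
;; ...while reading the property finds it.
(demo-translate-via-property 'demo-decode-table #x393)  ; => 71 (?G)
```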

                                * * *

   I don't know what you think needs documenting -- can you suggest text?

I realized that you were right to put the explanation in decode-char
rather than utf-8-fragment-on-decoding, and thought that the following
would make the situation more clear,

   >   (Note that this char table is independent of the translation table
   >   of the same name which CCL programs use, and it is not affected by
   >   the user's setting of utf-8-fragment-on-decoding.)

but I must have misunderstood the facts.

   That seems to be the opposite of how it actually is.

Could you explain how it actually is?

I'm sorry to take up your time with something that is surely not as
important as your other activities.  I would also like to express
as a user great thanks to you and all the Emacs developers!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-06  1:13         ` Thomas Morgan
@ 2002-09-07 23:14           ` Dave Love
  2002-09-08  1:07             ` Thomas Morgan
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Love @ 2002-09-07 23:14 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

I'm sorry, I think I finally got it after confusion between two
different issues and paying insufficient attention.  Thanks for
persisting.

It's indeed a bug that

(eq utf-8-translation-table-for-decode
    (get 'utf-8-translation-table-for-decode 'translation-table))
  => nil

I think it's more consistent to fix the custom stuff to keep the
things in sync rather than just fixing decode-char, and there may be
similar problems elsewhere.  I'll look at fixing it.
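
For illustration only, here is a minimal sketch of how the two can
drift apart and what keeping them in sync might look like.  All the
`demo-' names are hypothetical, and a plain `put' stands in for
whatever the real Custom setter and define-translation-table do; this
is not the actual fix:

```elisp
;; Hypothetical symbol standing in for `utf-8-translation-table-for-decode'.
(defvar demo-table-for-decode (make-char-table 'translation-table))
(put 'demo-table-for-decode 'translation-table demo-table-for-decode)

(defun demo-install-table (table)
  "Install TABLE on the symbol property only, leaving the variable stale."
  (put 'demo-table-for-decode 'translation-table table))

(defun demo-install-table-in-sync (table)
  "Install TABLE as both the variable value and the symbol property."
  (setq demo-table-for-decode table)
  (put 'demo-table-for-decode 'translation-table table))

(defun demo-in-sync-p ()
  "Return non-nil if the variable and the property hold the same table."
  (eq demo-table-for-decode
      (get 'demo-table-for-decode 'translation-table)))

;; A setter that touches only the property reproduces the mismatch.
(demo-install-table (make-char-table 'translation-table))
(demo-in-sync-p)                        ; => nil

;; A setter that updates both keeps them identical.
(demo-install-table-in-sync (make-char-table 'translation-table))
(demo-in-sync-p)                        ; => t
```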

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-07 23:14           ` Dave Love
@ 2002-09-08  1:07             ` Thomas Morgan
  2002-09-09 22:35               ` Dave Love
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Morgan @ 2002-09-08  1:07 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

   It's indeed a bug that
   
   (eq utf-8-translation-table-for-decode
       (get 'utf-8-translation-table-for-decode 'translation-table))
     => nil

I noticed this code in international/characters.el:
   
   (modify-category-entry (make-char 'greek-iso8859-7) ?g)
   (let ((c #x370))
     (while (<= c #x3ff)
       (modify-category-entry (decode-char 'ucs c) ?g)
       (setq c (1+ c))))

My understanding is that the first line is sufficient to put all
characters from the charset `greek-iso8859-7' into the category ?g.

Then the purpose of the remaining lines must be to add Greek Unicode
characters into the category ?g, but they do not do that; instead,
they add the corresponding characters from greek-iso8859-7 in again.

So it looks like this bug has at least one practical consequence:
Greek Unicode characters are not put into the category ?g.

It occurred to me that even after this bug is fixed, the problem
would remain if utf-8-fragment-on-decoding were enabled while this
code is executed.  However, this code is executed before the user
has a chance to enable the option, right?  So as long as fragmentation
is not the default, that will be ok.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: decode-char & utf-8-fragment-on-decoding
  2002-09-08  1:07             ` Thomas Morgan
@ 2002-09-09 22:35               ` Dave Love
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Love @ 2002-09-09 22:35 UTC (permalink / raw)
  Cc: handa, bug-gnu-emacs

Thomas Morgan <tlm@pocketmail.com> writes:

> Then the purpose of the remaining lines must be to add Greek Unicode
> characters into the category ?g, but they do not do that; instead,
> they add the corresponding characters from greek-iso8859-7 in again.
> 
> So it looks like this bug has at least one practical consequence:
> Greek Unicode characters are not put into the category ?g.

Yes, that's something handa pointed out, and we'll deal with it.  I
hadn't spotted the problem since I've only been using similar code in
Emacs 21.2, not preloaded.  I don't think it's really a problem apart
from in the preloaded stuff.

I don't know how much testing the Mule changes I installed in the
development sources have got, so please carry on finding problems.

^ permalink raw reply	[flat|nested] 13+ messages in thread
