unofficial mirror of emacs-devel@gnu.org 
* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
       [not found] <F021B5CA-A186-4AFB-B650-520DBB6261C4@Web.DE>
@ 2006-09-19  3:58 ` Kenichi Handa
  2006-09-19  6:43   ` David Kastrup
  2006-09-19 22:57   ` Richard Stallman
  0 siblings, 2 replies; 25+ messages in thread
From: Kenichi Handa @ 2006-09-19  3:58 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel

In article <F021B5CA-A186-4AFB-B650-520DBB6261C4@Web.DE>, Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> Hello!
> Launched with -Q

> 	unify-8859-on-decoding-mode is nil
> 	unify-8859-on-encoding-mode is t

> I start i-search in a Unicode encoded buffer (*Help*). In an ISO
> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
> and ISO 8859-15. It is similar for ö and ü, except that these are not
> found in the ISO 8859-14 encoded buffer. Changing the value of
> unify-8859-on-decoding-mode makes no difference.

This is the story I remember.

A while ago, I proposed changing isearch so that it
translates characters by translation-table-for-input to
solve such a problem, but an objection was raised that
read-char should do that translation.  RMS asked us to check
whether such a change to read-char is really safe, but as
such a check is very difficult and time-consuming, no one
took on the job.

So, this problem is still unfixed.

I again propose to change isearch.  When we know that
changing read-char is safe in the future, we can cancel that
change in isearch.
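
As a rough illustration of the proposal (a sketch only, not the change
that was installed): the idea is to map each character of the search
string through a char-table, the way translation-table-for-input would
be consulted.  The names my-translate-string and my-demo-table are
hypothetical; a real session would pass translation-table-for-input
itself.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-translate-string (string table)
  "Return STRING with each character mapped through char-table TABLE.
Characters without an entry in TABLE are kept unchanged.  If TABLE is
not a char-table, return STRING as is."
  (if (char-table-p table)
      (concat (mapcar (lambda (ch) (or (aref table ch) ch)) string))
    string))

;; Demo table (an assumption): maps ?a to ?b and nothing else.
(defvar my-demo-table
  (let ((table (make-char-table 'translation-table)))
    (aset table ?a ?b)
    table))

;; Only the a's are translated; every other character passes through.
(my-translate-string "banana" my-demo-table)
```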

---
Kenichi Handa
handa@m17n.org

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-19  3:58 ` GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings Kenichi Handa
@ 2006-09-19  6:43   ` David Kastrup
  2006-09-19 22:57   ` Richard Stallman
  1 sibling, 0 replies; 25+ messages in thread
From: David Kastrup @ 2006-09-19  6:43 UTC (permalink / raw)
  Cc: Peter Dyballa, emacs-devel, emacs-pretest-bug

Kenichi Handa <handa@m17n.org> writes:

> Peter Dyballa <Peter_Dyballa@Web.DE> writes:
>
>> Launched with -Q
>
>> 	unify-8859-on-decoding-mode is nil
>> 	unify-8859-on-encoding-mode is t
>
>> I start i-search in a Unicode encoded buffer (*Help*). In an ISO
>> 8859-1 encoded buffer ä and Ä are found, also in ISO 8859-10, ISO
>> 8859-13 (except for Ä), ISO 8859-14, and ISO 8859-16 encoded buffers,
>> but the search fails in ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-9,
>> and ISO 8859-15. It is similar for ö and ü, except that these are not
>> found in the ISO 8859-14 encoded buffer. Changing the value of
>> unify-8859-on-decoding-mode makes no difference.
>
> This is the story I remember.
>
> A while ago, I proposed changing isearch so that it
> translates characters by translation-table-for-input to
> solve such a problem, but an objection was raised that
> read-char should do that translation.  RMS asked us to check
> whether such a change to read-char is really safe, but as
> such a check is very difficult and time-consuming, no one
> took on the job.
>
> So, this problem is still unfixed.
>
> I again propose to change isearch.  When we know that
> changing read-char is safe in the future, we can cancel that
> change in isearch.

"in the future", namely after the release, we are going to switch to
the unicode2 branch and presumably the problem will go away.  So it
does not sound like we should attempt any complicated fix for Emacs 22
that is not going to stay around, anyway.  If the one problem where
people are complaining is search-and-replace, we should fix that case
for Emacs 22 and that's it for Emacs 22, in my opinion.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-19  3:58 ` GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings Kenichi Handa
  2006-09-19  6:43   ` David Kastrup
@ 2006-09-19 22:57   ` Richard Stallman
  2006-09-20  7:10     ` Kenichi Handa
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2006-09-19 22:57 UTC (permalink / raw)
  Cc: Peter_Dyballa, emacs-devel, emacs-pretest-bug

    A while ago, I proposed changing isearch so that it
    translates characters by translation-table-for-input to
    solve such a problem, but an objection was raised that
    read-char should do that translation.  RMS asked us to check
    whether such a change to read-char is really safe, but as
    such a check is very difficult and time-consuming, no one
    took on the job.

    So, this problem is still unfixed.

    I again propose to change isearch.

Yes, let's do it that way.  Could you do it now?

David Kastrup wrote:

    "in the future", namely after the release, we are going to switch to
    the unicode2 branch and presumably the problem will go away.

That is true, so we already have our long-term solution.
For now, fixing isearch is sufficient.

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-19 22:57   ` Richard Stallman
@ 2006-09-20  7:10     ` Kenichi Handa
  2006-09-20  7:43       ` Peter Dyballa
  2006-09-21 17:20       ` Richard Stallman
  0 siblings, 2 replies; 25+ messages in thread
From: Kenichi Handa @ 2006-09-20  7:10 UTC (permalink / raw)
  Cc: Peter_Dyballa, emacs-devel, emacs-pretest-bug

In article <E1GPoWo-0007PL-Pd@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     A while ago, I proposed changing isearch so that it
>     translates characters by translation-table-for-input to
>     solve such a problem, but an objection was raised that
>     read-char should do that translation.  RMS asked us to check
>     whether such a change to read-char is really safe, but as
>     such a check is very difficult and time-consuming, no one
>     took on the job.

>     So, this problem is still unfixed.

>     I again propose to change isearch.

> Yes, let's do it that way.  Could you do it now?

Oops, it seems that my brain is seriously damaged :-(.  I
have already installed such a change (perhaps according to
your decision).

The problem is that the change took care only for a typed
character.  If isearch-string is set from a (possibly
different) buffer (e.g. by C-s C-w), the translation doesn't
happen.

So, I've just installed the attached change.  But, there
still exists a case that isearch fails.  For instance, if
your buffer's buffer-file-coding-system is iso-8859-2, and
you somehow insert a-acute of iso-8859-1, isearch won't be
able to find that a-acute.  The fix for that case is very
difficult in Emacs 22.

---
Kenichi Handa
handa@m17n.org

2006-09-20  Kenichi Handa  <handa@m17n.org>

	* isearch.el (isearch-process-search-char): Cancel the previous
	change.
	(isearch-search-string): New function.
	(isearch-search): Use isearch-search-string.
	(isearch-lazy-highlight-search): Likewise.

Index: isearch.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/isearch.el,v
retrieving revision 1.289
retrieving revision 1.290
diff -u -r1.289 -r1.290
--- isearch.el	9 Jul 2006 11:04:18 -0000	1.289
+++ isearch.el	20 Sep 2006 06:13:43 -0000	1.290
@@ -1807,8 +1807,6 @@
    ((eq   char ?|)       (isearch-fallback t nil t)))
 
   ;; Append the char to the search string, update the message and re-search.
-  (if (char-table-p translation-table-for-input)
-      (setq char (or (aref translation-table-for-input char) char)))
   (isearch-process-search-string
    (char-to-string char)
    (if (>= char ?\200)
@@ -1993,6 +1991,36 @@
      (t
       (if isearch-forward 'search-forward 'search-backward)))))
 
+(defun isearch-search-string (string bound noerror)
+  ;; Search for the first occurance of STRING or its translation.  If
+  ;; found, move point to the end of the occurance, update
+  ;; isearch-match-beg and isearch-match-end, and return point.
+  (let ((func (isearch-search-fun))
+	(len (length string))
+	pos1 pos2)
+    (setq pos1 (save-excursion (funcall func string bound noerror)))
+    (if (and (char-table-p translation-table-for-input)
+	     (> (string-bytes string) len))
+	(let (translated match-data)
+	  (dotimes (i len)
+	    (let ((x (aref translation-table-for-input (aref string i))))
+	      (when x
+		(or translated (setq translated (copy-sequence string)))
+		(aset translated i x))))
+	  (when translated
+	    (save-match-data
+	      (save-excursion
+		(if (setq pos2 (funcall func translated bound noerror))
+		    (setq match-data (match-data t)))))
+	    (when (and pos2
+		       (or (not pos1)
+			   (if isearch-forward (< pos2 pos1) (> pos2 pos1))))
+	      (setq pos1 pos2)
+	      (set-match-data match-data)))))
+    (if pos1
+	(goto-char pos1))
+    pos1))
+
 (defun isearch-search ()
   ;; Do the search with the current search string.
   (isearch-message nil t)
@@ -2008,9 +2036,7 @@
 	(setq isearch-error nil)
 	(while retry
 	  (setq isearch-success
-		(funcall
-		 (isearch-search-fun)
-		 isearch-string nil t))
+		(isearch-search-string isearch-string nil t))
 	  ;; Clear RETRY unless we matched some invisible text
 	  ;; and we aren't supposed to do that.
 	  (if (or (eq search-invisible t)
@@ -2353,7 +2379,7 @@
 	(isearch-regexp isearch-lazy-highlight-regexp)
 	(search-spaces-regexp search-whitespace-regexp))
     (condition-case nil
-	(funcall (isearch-search-fun)
+	(isearch-search-string
 		 isearch-lazy-highlight-last-string
 		 (if isearch-forward
 		     (min (or isearch-lazy-highlight-end-limit (point-max))

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-20  7:10     ` Kenichi Handa
@ 2006-09-20  7:43       ` Peter Dyballa
  2006-09-20  8:05         ` Kenichi Handa
  2006-09-21 17:20       ` Richard Stallman
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-20  7:43 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel


Am 20.09.2006 um 09:10 schrieb Kenichi Handa:

> The problem is that the change took care only for a typed
> character.  If isearch-string is set from a (possibly
> different) buffer (e.g. by C-s C-w), the translation doesn't
> happen.

My test was very simple: I opened the ISO 8859-1 encoded file (starts
with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
RET. Then I opened the other ISO Latin test files, which all have a
coding set in the first line. Then I re-used the ä via C-s C-s.

I do not set buffer-file-coding-system, it's mule-utf-8 from UTF-8 in  
LC_CTYPE or LANG. But the value is adjusted to a local value due to  
the '-*- coding: iso-8859-X; -*-' in the files' first lines.

--
Greetings

   Pete

"What do you think of Western Civilisation?"
"I think it would be a good idea!"
                          -- Mohandas Karamchand Gandhi

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-20  7:43       ` Peter Dyballa
@ 2006-09-20  8:05         ` Kenichi Handa
  2006-09-20 11:17           ` Peter Dyballa
  0 siblings, 1 reply; 25+ messages in thread
From: Kenichi Handa @ 2006-09-20  8:05 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel

In article <1B8CD230-9A54-4F2A-B0FA-5CD02730F034@web.de>, Peter Dyballa <Peter_Dyballa@web.de> writes:

> My test was very simple: I opened the ISO 8859-1 encoded file (starts
> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
> RET. Then I opened the other ISO Latin test files, which all have a
> coding set in the first line. Then I re-used the ä via C-s C-s.

That "re-using" is also a case that the previous change
didn't take care of.  Could you please try the test with the
latest code?

---
Kenichi Handa
handa@m17n.org

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-20  8:05         ` Kenichi Handa
@ 2006-09-20 11:17           ` Peter Dyballa
  2006-09-21  2:13             ` Kenichi Handa
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-20 11:17 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel


Am 20.09.2006 um 10:05 schrieb Kenichi Handa:

>> My test was very simple: I opened the ISO 8859-1 encoded file (starts
>> with ;;; -*- mode: Text; coding: iso-8859-1; -*-) and typed C-s ä C-s
>> RET. Then I opened the other ISO Latin test files, which all have a
>> coding set in the first line. Then I re-used the ä via C-s C-s.
>
> That "re-using" is also a case that the previous change
> didn't take care of.  Could you please try the test with the
> latest code?

The CVS code is from Sunday or Monday. After applying your patch  
nothing changes for my simple test (emacs-22.0.50 -Q). I did it also  
for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it  
exists there additionally/instead of ä.

For ­, U+00AD SOFT HYPHEN, the most common character, I get when
starting from ISO 8859-1:

	failure: ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-7,
	ISO 8859-8, ISO 8859-9, ISO 8859-14, ISO 8859-15
	success: ISO 8859-10, ISO 8859-13, ISO 8859-16

although in all these 13 encodings it's (oct/dec/hex) 255 - 173 - AD.
Starting the search for SOFT HYPHEN in an ISO 8859-15 encoded file,
it's found in no other ISO 8859 encoded file.
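
To double-check the numbers above, here is a small sketch.  It assumes
Emacs 23 or later, where characters are Unicode internally, so it does
not reproduce the Emacs 22 behaviour discussed in this thread.  It
prints the byte in the three notations and decodes it under a few ISO
8859 coding systems.  The helper my-decode-byte is hypothetical.

```elisp
;; The byte value in octal, decimal and hex.
(format "%o - %d - %X" #xAD #xAD #xAD)

;; Hypothetical helper, for illustration only.
(defun my-decode-byte (byte coding-system)
  "Decode the single byte BYTE under CODING-SYSTEM.
Return the character code of the decoded character."
  (string-to-char
   (decode-coding-string (unibyte-string byte) coding-system)))

;; Character code of byte #xAD in three of the encodings.
(mapcar (lambda (cs) (my-decode-byte #xAD cs))
        '(iso-8859-1 iso-8859-2 iso-8859-15))
```

With a unified internal representation, all three calls should give
the same character code, which is the behaviour the report asks for.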


This is not satisfactory. This is not unified.

--
Greetings

   Pete

Some day we may discover how to make magnets that can point in any  
direction.

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-20 11:17           ` Peter Dyballa
@ 2006-09-21  2:13             ` Kenichi Handa
  2006-09-21  8:09               ` Peter Dyballa
  2006-09-21 23:22               ` Peter Dyballa
  0 siblings, 2 replies; 25+ messages in thread
From: Kenichi Handa @ 2006-09-21  2:13 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel

In article <BA478627-9970-47CA-8EBF-A2332C4CFAEE@web.de>, Peter Dyballa <Peter_Dyballa@web.de> writes:

> The CVS code is from Sunday or Monday. After applying your patch  
> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also  
> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it  
> exists there additionally/instead of ä.

Hmmm, strange, it doesn't fail for me.  Are you sure that
Emacs is re-built after isearch.el is byte-compiled?

---
Kenichi Handa
handa@m17n.org

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-21  2:13             ` Kenichi Handa
@ 2006-09-21  8:09               ` Peter Dyballa
  2006-09-21 23:22               ` Peter Dyballa
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Dyballa @ 2006-09-21  8:09 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel


Am 21.09.2006 um 04:13 schrieb Kenichi Handa:

> In article <BA478627-9970-47CA-8EBF-A2332C4CFAEE@web.de>, Peter  
> Dyballa <Peter_Dyballa@web.de> writes:
>
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me.  Are you sure that
> Emacs is re-built after isearch.el is byte-compiled?
>

Yes, I did. I have made this mistake a few times, so I learned from it.
Actually I just re-made and installed GNU Emacs and then
byte-compiled isearch.el in /usr/local/share/emacs/22.0.50/lisp.

I'll cvs-update tomorrow or on Saturday and I'll check isearch.el  
again (I'm a bit busy today).

--
Greetings

   Pete

Time flies like an error -- but fruit flies like a banana!
                              (almost Groucho Marx)

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-20  7:10     ` Kenichi Handa
  2006-09-20  7:43       ` Peter Dyballa
@ 2006-09-21 17:20       ` Richard Stallman
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Stallman @ 2006-09-21 17:20 UTC (permalink / raw)
  Cc: Peter_Dyballa, emacs-devel, emacs-pretest-bug

    So, I've just installed the attached change.  But, there
    still exists a case that isearch fails.  For instance, if
    your buffer's buffer-file-coding-system is iso-8859-2, and
    you somehow insert a-acute of iso-8859-1, isearch won't be
    able to find that a-acute.  The fix for that case is very
    difficult in Emacs 22.

Do you think we should document this in the Emacs manual?

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-21  2:13             ` Kenichi Handa
  2006-09-21  8:09               ` Peter Dyballa
@ 2006-09-21 23:22               ` Peter Dyballa
  2006-09-22  0:44                 ` Miles Bader
  2006-09-22  1:06                 ` Kenichi Handa
  1 sibling, 2 replies; 25+ messages in thread
From: Peter Dyballa @ 2006-09-21 23:22 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel


Am 21.09.2006 um 04:13 schrieb Kenichi Handa:

> In article <BA478627-9970-47CA-8EBF-A2332C4CFAEE@web.de>, Peter  
> Dyballa <Peter_Dyballa@web.de> writes:
>
>> The CVS code is from Sunday or Monday. After applying your patch
>> nothing changes for my simple test (emacs-22.0.50 -Q). I did it also
>> for °, which can't be found in ISO 8859-7 and ISO 8859-8 although it
>> exists there additionally/instead of ä.
>
> Hmmm, strange, it doesn't fail for me.  Are you sure that
> Emacs is re-built after isearch.el is byte-compiled?
>

OK, you're right: it really works better now, I had made some
mistake! I wonder whether I picked up the characters with C-s C-w ...
As you wrote, this won't work.

Anyway, what also does not work is: C-s C-q <a non-ASCII, i.e.  
greater 177 octal code>. For those with really small keyboards this  
is the (almost?) only chance to find some of the x times 64 K  
characters in Unicode ...

--
Greetings

   Pete

Hard Disk:  A device that allows users to delete vast quantities of  
data with
             simple mnemonic commands.

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-21 23:22               ` Peter Dyballa
@ 2006-09-22  0:44                 ` Miles Bader
  2006-09-22  9:06                   ` Peter Dyballa
  2006-09-22  1:06                 ` Kenichi Handa
  1 sibling, 1 reply; 25+ messages in thread
From: Miles Bader @ 2006-09-22  0:44 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa

Peter Dyballa <Peter_Dyballa@web.de> writes:
> Anyway, what also does not work is: C-s C-q <a non-ASCII, i.e.  greater
> 177 octal code>. For those with really small keyboards this  is the
> (almost?) only chance to find some of the x times 64 K  characters in
> Unicode ...

Eh?  It works for me:

E.g., the Emacs 22 character code of "字" is octal 0156772.

If I enter C-s C-q 0156772 (followed by some other char to terminate the
octal code), it correctly adds that character to the search string (and
finds in the buffer).

-Miles

-- 
P.S.  All information contained in the above letter is false,
      for reasons of military security.

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-21 23:22               ` Peter Dyballa
  2006-09-22  0:44                 ` Miles Bader
@ 2006-09-22  1:06                 ` Kenichi Handa
  2006-09-22  9:32                   ` Peter Dyballa
  1 sibling, 1 reply; 25+ messages in thread
From: Kenichi Handa @ 2006-09-22  1:06 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]

In article <4CEE7BA9-0CEF-40CD-A081-2C707A44833B@web.de>, Peter Dyballa <Peter_Dyballa@web.de> writes:

> OK, you're right: it really works better now, I had made some
> mistake! I wonder whether I picked up the characters with C-s C-w ...
> As you wrote, this won't work.

It didn't work, but should work now.  I attached 3 files
(temp1,2,7 encoded in iso-8859-1,2,7 respectively).
  C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s
should find "­á", and
  C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s
should find "­°".

> Anyway, what also does not work is: C-s C-q <a non-ASCII, i.e.  
> greater 177 octal code>. For those with really small keyboards this  
> is the (almost?) only chance to find some of the x times 64 K  
> characters in Unicode ...

This should work now too.  For instance, "­" and "á" are
0255 and 0341 in iso-8859-1 charset.  So, if your primary
charset is iso-8859-1, C-q 255 C-q 341 RET should input
"­á".  And,
  C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET
should find "­á" even if the characters in that buffer are
from iso-8859-2.

---
Kenichi Handa
handa@m17n.org


[-- Attachment #2: temp.tar.gz --]
[-- Type: application/octet-stream, Size: 222 bytes --]

[-- Attachment #3: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22  0:44                 ` Miles Bader
@ 2006-09-22  9:06                   ` Peter Dyballa
  2006-09-22 10:31                     ` Miles Bader
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-22  9:06 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa


Am 22.09.2006 um 02:44 schrieb Miles Bader:

> Peter Dyballa <Peter_Dyballa@web.de> writes:
>> Anyway, what also does not work is: C-s C-q <a non-ASCII, i.e.   
>> greater
>> 177 octal code>. For those with really small keyboards this  is the
>> (almost?) only chance to find some of the x times 64 K  characters in
>> Unicode ...
>
> Eh?  It works for me:
>
> E.g., the Emacs 22 character code of "字" is octal 0156772.
>
> If I enter C-s C-q 0156772 (followed by some other char to  
> terminate the
> octal code), it correctly adds that character to the search string  
> (and
> finds in the buffer).


OK, I did not check in the "higher" Unicode regions, and I did not
check in a UTF-8 encoded buffer, and I did not input such long numbers
that I cannot compute; I was still in my simple ISO 8859-X test files
(your example works for me too in a UTF-8 encoded buffer). After
launching GNU Emacs 22.0.50 with -Q, the phenomenon seems to be that
input like

	C-s C-q <[23][0-7][0-7]> RET

is interpreted as trying to "name/point to" an ISO 8859-1 encoded  
character. For example:

	C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) – the
minibuffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.

C-s C-q 241 RET searches for ¡.
C-s C-q 242 RET searches for ¢.
C-s C-q 243 RET searches for £.
C-s C-q 244 RET searches for ¤ (CURRENCY SIGN, U+00A4).

Evaluating (unify-8859-on-decoding-mode t) does not change this  
specific behaviour.



What is the formula to map octal 0156772 to a Unicode slot/position?
Octal 0156772 is DDFA in hex, which is different from 5B57, 字's
position in Unicode. Or: how can I find the octal value for a given
Unicode slot (U+ABCD)? There is probably some function for this
purpose ...

--
Greetings

   Pete

"It isn't pollution that's harming the environment. It's the  
impurities in our air and water that are doing it."

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22  1:06                 ` Kenichi Handa
@ 2006-09-22  9:32                   ` Peter Dyballa
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Dyballa @ 2006-09-22  9:32 UTC (permalink / raw)
  Cc: emacs-pretest-bug, rms, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]


Am 22.09.2006 um 03:06 schrieb Kenichi Handa:

> In article <4CEE7BA9-0CEF-40CD-A081-2C707A44833B@web.de>, Peter  
> Dyballa <Peter_Dyballa@web.de> writes:
>
>> OK, you're right: it really works better now, I had made some
>> mistake! I wonder whether I picked up the characters with C-s C-w ...
>> As you wrote, this won't work.
>
> It didn't work, but should work now.  I attached 3 files
> (temp1,2,7 encoded in iso-8859-1,2,7 respectively).
>   C-x C-f temp1 RET ESC < C-n C-s C-w C-x C-f temp2 C-s C-s
> should find " á", and
>   C-x C-f temp1 RET ESC < C-n C-n C-s C-w C-x C-f temp7 C-s C-s
> should find " °".

Yes, I can confirm: it works! It also works in my own test files –
except one: the ISO 8859-6 encoded one. I was searching for SOFT
HYPHEN, U+00AD. I'll attach my test file. It could also be useful for
the possible ISO 8859-6 bug I reported recently.


>
>> Anyway, what also does not work is: C-s C-q <a non-ASCII, i.e.
>> greater 177 octal code>. For those with really small keyboards this
>> is the (almost?) only chance to find some of the x times 64 K
>> characters in Unicode ...
>
> This should work now too.  For instance, " " and "á" are
> 0255 and 0341 in iso-8859-1 charset.  So, if your primary
> charset is iso-8859-1, C-q 255 C-q 341 RET should input
> " á".  And,
>   C-x C-f temp2 ESC < C-s C-q 255 C-q 341 RET
> should find " á" even if the characters in that buffer is
> from iso-8859-2.
>

I did not try this test because it is too simple: LATIN SMALL LETTER
A WITH ACUTE (U+00E1) is at 341/225/E1 in both encodings.

Please use my answer to Miles Bader as test case! I can send you my  
other ISO 8859-X test files.


--
Greetings

   Pete

"Eternity is a terrible thought. I mean, where's it going to end?"
                                             - Tom Stoppard


[-- Attachment #2: ISO 8859-6.txt --]
[-- Type: text/plain, Size: 3314 bytes --]

;;; -*- coding: iso-8859-6; -*-
;
;	Time-stamp: <2006-09-22 00:25:10 pete>
;
;   Arabic Glyphs
;
;   oct   dec   hex    UCS2    UTF-8
;=====================================
  = 240 = 160 = A0 = U+00A0 =    C2 A0 : NO-BREAK SPACE
¤ = 244 = 164 = A4 = U+00A4 =    C2 A4 : CURRENCY SIGN
¬ = 254 = 172 = AC = U+060C =    D8 8C : ARABIC COMMA
­ = 255 = 173 = AD = U+00AD =    C2 AD : HYPHEN-MINUS
» = 273 = 187 = BB = U+061B =    D8 9B : ARABIC SEMICOLON
¿ = 277 = 191 = BF = U+061F =    D8 9F : ARABIC QUESTION MARK
Á = 301 = 193 = C1 = U+0621 =    D8 A1 : ARABIC LETTER HAMZA
 = 302 = 194 = C2 = U+0622 =    D8 A2 : ARABIC LETTER ALEF WITH MADDA ABOVE
à = 303 = 195 = C3 = U+0623 =    D8 A3 : ARABIC LETTER ALEF WITH HAMZA ABOVE
Ä = 304 = 196 = C4 = U+0624 =    D8 A4 : ARABIC LETTER WAW WITH HAMZA ABOVE
Å = 305 = 197 = C5 = U+0625 =    D8 A5 : ARABIC LETTER ALEF WITH HAMZA BELOW
Æ = 306 = 198 = C6 = U+0626 =    D8 A6 : ARABIC LETTER YEH WITH HAMZA ABOVE
Ç = 307 = 199 = C7 = U+0627 =    D8 A7 : ARABIC LETTER ALEF
È = 310 = 200 = C8 = U+0628 =    D8 A8 : ARABIC LETTER BEH
É = 311 = 201 = C9 = U+0629 =    D8 A9 : ARABIC LETTER TEH MARBUTA
Ê = 312 = 202 = CA = U+062A =    D8 AA : ARABIC LETTER TEH
Ë = 313 = 203 = CB = U+062B =    D8 AB : ARABIC LETTER THEH
Ì = 314 = 204 = CC = U+062C =    D8 AC : ARABIC LETTER JEEM
Í = 315 = 205 = CD = U+062D =    D8 AD : ARABIC LETTER HAH
Î = 316 = 206 = CE = U+062E =    D8 AE : ARABIC LETTER KHAH
Ï = 317 = 207 = CF = U+062F =    D8 AF : ARABIC LETTER DAL
Ð = 320 = 208 = D0 = U+0630 =    D8 B0 : ARABIC LETTER THAL
Ñ = 321 = 209 = D1 = U+0631 =    D8 B1 : ARABIC LETTER REH
Ò = 322 = 210 = D2 = U+0632 =    D8 B2 : ARABIC LETTER ZAIN
Ó = 323 = 211 = D3 = U+0633 =    D8 B3 : ARABIC LETTER SEEN
Ô = 324 = 212 = D4 = U+0634 =    D8 B4 : ARABIC LETTER SHEEN
Õ = 325 = 213 = D5 = U+0635 =    D8 B5 : ARABIC LETTER SAD
Ö = 326 = 214 = D6 = U+0636 =    D8 B6 : ARABIC LETTER DAD
× = 327 = 215 = D7 = U+0637 =    D8 B7 : ARABIC LETTER TAH
Ø = 330 = 216 = D8 = U+0638 =    D8 B8 : ARABIC LETTER ZAH
Ù = 331 = 217 = D9 = U+0639 =    D8 B9 : ARABIC LETTER AIN
Ú = 332 = 218 = DA = U+063A =    D8 BA : ARABIC LETTER GHAIN
à = 340 = 224 = E0 = U+0640 =    D9 80 : ARABIC TATWEEL
á = 341 = 225 = E1 = U+0641 =    D9 81 : ARABIC LETTER FEH
â = 342 = 226 = E2 = U+0642 =    D9 82 : ARABIC LETTER QAF
ã = 343 = 227 = E3 = U+0643 =    D9 83 : ARABIC LETTER KAF
ä = 344 = 228 = E4 = U+0644 =    D9 84 : ARABIC LETTER LAM
å = 345 = 229 = E5 = U+0645 =    D9 85 : ARABIC LETTER MEEM
æ = 346 = 230 = E6 = U+0646 =    D9 86 : ARABIC LETTER NOON
ç = 347 = 231 = E7 = U+0647 =    D9 87 : ARABIC LETTER HEH
è = 350 = 232 = E8 = U+0648 =    D9 88 : ARABIC LETTER WAW
é = 351 = 233 = E9 = U+0649 =    D9 89 : ARABIC LETTER ALEF MAKSURA
ê = 352 = 234 = EA = U+064A =    D9 8A : ARABIC LETTER YEH
ë = 353 = 235 = EB = U+064B =    D9 8B : ARABIC FATHATAN
ì = 354 = 236 = EC = U+064C =    D9 8C : ARABIC DAMMATAN
í = 355 = 237 = ED = U+064D =    D9 8D : ARABIC KASRATAN
î = 356 = 238 = EE = U+064E =    D9 8E : ARABIC FATHA
ï = 357 = 239 = EF = U+064F =    D9 8F : ARABIC DAMMA
ð = 360 = 240 = F0 = U+0650 =    D9 90 : ARABIC KASRA
ñ = 361 = 241 = F1 = U+0651 =    D9 91 : ARABIC SHADDA
ò = 362 = 242 = F2 = U+0652 =    D9 92 : ARABIC SUKUN

* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22  9:06                   ` Peter Dyballa
@ 2006-09-22 10:31                     ` Miles Bader
  2006-09-22 10:55                       ` Peter Dyballa
  0 siblings, 1 reply; 25+ messages in thread
From: Miles Bader @ 2006-09-22 10:31 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa

Peter Dyballa <Peter_Dyballa@web.de> writes:
> 	C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) – mini- 
> buffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.

That's because the numeric code following C-q is _not_ a unicode code
point, it's an Emacs character code.  In Emacs 22 those two things are
very different (in Emacs 23, I guess they are the same, as Emacs 23 uses
unicode for its internal codes).

You can see the "Emacs character code" of a character by hitting C-x =
on top of that character in a buffer.

E.g., C-x = says that ``„´´ has Emacs code 1234576, and indeed entering
`C-s C-q 1234576 RET' successfully searches for „ !  Similarly, the
Emacs code for ¥ is 4245, and that also works correctly following C-q.

> What is the formula to map octal 0156772 to a Unicode slot/position?
> Octal 0156772 is DDFA in hex, which is different from 5B57, 字's
> position in Unicode.

(encode-char #o156772 'ucs)
  => 23383 (#o55527, #x5b57)

> Or: how can I find the octal value for a given Unicode slot (U+ABCD)?

(decode-char 'ucs #x5b57)
  => 56826 (#o156772, #xddfa)

[There seems to be no such unicode character #xABCD known to Emacs.]

Note that (decode-char 'ucs CODE) continues to work properly in Emacs
23, even though Emacs internal codes are completely different (in Emacs
23, of course, it basically just returns its 2nd argument), so it seems
a good function to use for code portable between Emacs 22 and 23.
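
Building on the two calls above, a minimal sketch of a helper that
turns a Unicode code point into the octal digits to type after C-q.
The name my-ucs-to-quoted-octal is made up for illustration.  Going by
the results quoted above, Emacs 22 would give "156772" for U+5B57; in
Emacs 23 and later, where internal codes equal Unicode code points, it
gives "55527".

```elisp
;; Hypothetical helper, for illustration only.
(defun my-ucs-to-quoted-octal (codepoint)
  "Return the octal digits, as a string, of the Emacs character for CODEPOINT.
CODEPOINT is a Unicode code point.  Return nil if Emacs has no
corresponding character."
  (let ((char (decode-char 'ucs codepoint)))
    (and char (format "%o" char))))

;; Octal digits to type after C-q for U+5B57; the value depends on the
;; Emacs version, as described above.
(my-ucs-to-quoted-octal #x5b57)
```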

-Miles

-- 
(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can spread.


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 10:31                     ` Miles Bader
@ 2006-09-22 10:55                       ` Peter Dyballa
  2006-09-22 11:27                         ` Miles Bader
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-22 10:55 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa


Am 22.09.2006 um 12:31 schrieb Miles Bader:

> Peter Dyballa <Peter_Dyballa@web.de> writes:
>> 	C-s C-q 245 in ISO 8859-16 does not find ``„´´ (U+201E) –
>> mini-buffer tells me that ``¥´´ (\245 in ISO 8859-1) cannot be found.
>
> That's because the numeric code following C-q is _not_ a unicode code
> point, it's an Emacs character code.  In Emacs 22 those two things are
> very different (in Emacs 23, I guess they are the same, as Emacs 23 uses
> unicode for its internal codes).
>
> You can see the "Emacs character code" of a character by hitting C-x =
> on top of that character in a buffer.
>
> E.g., C-x = says that ``„´´ has Emacs code 1234576, and indeed entering
> `C-s C-q 1234576 RET' successfully searches for „ !  Similarly, the
> Emacs code for ¥ is 4245, and that also works correctly following C-q.

This might be correct from GNU Emacs's point of view, but it is not how
an Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in
some file? (Well, it actually works ...) Having to use numbers other
than the well-known three-digit ones is not the usual user experience.
The so-called character code is a known quantity and is supported by
some operating systems. (There is also the option to change the 'base'
of the character code notation from 8 to 16, to be able to input the
Unicode slot number. This should also work, IMO.)

--
Greetings

   Pete

   Basic, n.:
A programming language.  Related to certain social diseases in
that those who have it will not admit it in polite company.


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 10:55                       ` Peter Dyballa
@ 2006-09-22 11:27                         ` Miles Bader
  2006-09-22 22:54                           ` Peter Dyballa
  2006-09-23  3:34                           ` Richard Stallman
  0 siblings, 2 replies; 25+ messages in thread
From: Miles Bader @ 2006-09-22 11:27 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa

Peter Dyballa <Peter_Dyballa@web.de> writes:
> There is also the option to change the 'base' of the character code
> notation from 8 to 16

This feature is supported; see the variable `read-quoted-char-radix'.

> This might be correct from GNU Emacs's point of view, but it is not how
> an Emacs user would use it. Or can I type C-q 4245 RET to input ¥ in
> some file? (Well, it actually works ...) Having to use numbers other
> than the well-known three-digit ones is not the usual user
> experience.

I suppose that a patch such as the following could be used to support
at least unicode input in `read-quoted-char' (the function underlying C-q).

(set `read-quoted-char-charset' to `ucs' to input unicode-codes)

Whether this is a serious enough problem to consider adding a patch this
late in the release cycle, I don't know.  [I think the default value of
read-quoted-char-charset would probably have to remain nil though...]

-Miles


2006-09-22  Miles Bader  <miles@gnu.org>

	* subr.el (read-quoted-char-charset): New variable.
	(read-quoted-char): Use it.

--- orig/lisp/subr.el
+++ mod/lisp/subr.el
@@ -1539,6 +1548,17 @@
   :type '(choice (const 8) (const 10) (const 16))
   :group 'editing-basics)
 
+(defvar read-quoted-char-charset nil
+  "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used.")
+
+(custom-declare-variable-early
+ 'read-quoted-char-charset nil
+ "*The character-set used for numeric codepoints entered with `read-quoted-char'.
+If nil, Emacs' internal codepoints are used."
+  :type '(choice (const nil) (const ucs))
+  :group 'editing-basics)
+
 (defun read-quoted-char (&optional prompt)
   "Like `read-char', but do not allow quitting.
 Also, if the first character read is an octal digit,
@@ -1595,7 +1615,13 @@
 	    (t (setq code translated
 		     done t)))
       (setq first nil))
-    code))
+    (if (null read-quoted-char-charset)
+	code
+      (let ((decoded (decode-char read-quoted-char-charset code)))
+	(when (null decoded)
+	  (error "Invalid %s character: %d, #o%o, #x%x"
+		 read-quoted-char-charset code code code))
+	decoded))))
 
 (defun read-passwd (prompt &optional confirm default)
   "Read a password, prompting with PROMPT, and return it.


-- 
The secret to creativity is knowing how to hide your sources.
  --Albert Einstein


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 11:27                         ` Miles Bader
@ 2006-09-22 22:54                           ` Peter Dyballa
  2006-09-22 23:25                             ` Miles Bader
  2006-09-23  3:34                           ` Richard Stallman
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-22 22:54 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa


Am 22.09.2006 um 13:27 schrieb Miles Bader:

> Peter Dyballa <Peter_Dyballa@web.de> writes:
>> There is also the option to change the 'base' of the character code
>> notation from 8 to 16
>
> This feature is supported; see the variable `read-quoted-char-radix'.

Right, it works a bit, i.e. in the ASCII range it works well. When it
comes to ISO Latin it interprets everything as ISO Latin-1, i.e. C-s C-q
0 0 a 4 RET searches in an ISO 8859-16 encoded buffer for CURRENCY SIGN
although it is EURO SIGN in this case. A translation to the buffer-local
encoding obviously does not happen ...


(setq read-quoted-char-radix 16)
(setq read-quoted-char-charset 'ucs)

After applying your patch this behaviour does not change; it is still
assumed that the encoding is ISO Latin-1. 00A4 is categorically ``¤´´.
The improvement is that I can find via a Unicode value an ISO Latin
encoded character – is this an improvement? The file code is A4 in any
ISO Latin case, and the character is U+20AC in Unicode when in ISO
Latin-10/ISO Latin-0 or ISO Latin-9. This looks like a Do What I Mean.
Really not bad! But the real way should be C-s C-q 2 4 4 RET or C-s C-q
A 4 RET or C-s C-q 1 6 4 RET (decimal), because it searches for the
codes one expects in the encoded file, and that does not work.

--
Greetings

   Pete

Some day we may discover how to make magnets that can point in any  
direction.


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 22:54                           ` Peter Dyballa
@ 2006-09-22 23:25                             ` Miles Bader
  2006-09-23  8:45                               ` Peter Dyballa
  0 siblings, 1 reply; 25+ messages in thread
From: Miles Bader @ 2006-09-22 23:25 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa

On 9/23/06, Peter Dyballa <Peter_Dyballa@web.de> wrote:
> The improvement is that I can find via a Unicode value an ISO
> Latin encoded character – is this an improvement?

It's what you asked for -- that input codes use some well-known
encoding rather than the unfamiliar emacs codes.

> The file code is A4
> in any ISO Latin case, and the character is U+20AC in Unicode when in
> ISO Latin-10/ISO Latin-0 or ISO Latin-9. This looks like a Do What I
> Mean. Really not bad! But the real way should be C-s C-q 2 4 4 RET or
> C-s C-q A 4 RET or C-s C-q 1 6 4 RET (decimal), because it searches
> for the codes one expects in the encoded file, and that does not work.

I think that sounds awful -- I do not think users want to learn the
codepoints in all encodings they use, they simply want to be able to
enter _characters_ that they don't know how to enter via the keyboard.

UCS codepoints are good because they allow _all_ emacs characters to
be entered in a consistent way.  Having C-q use the buffer's file
encoding on the other hand seems quite annoying, because it requires
users to use different numbers depending on what the file they're
editing was saved in (and I suspect a large portion of the time, users
don't even _know_ what encoding their file uses).
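
To make the per-encoding variation concrete, here is a small sketch,
assuming an Emacs recent enough (23 or later) to provide
`unibyte-string' and to use Unicode code points internally; the helper
name `my-byte-as-char' is made up for illustration.  It decodes one file
byte under several coding systems:

```elisp
(defun my-byte-as-char (byte coding-system)
  "Return the character that the single BYTE decodes to under CODING-SYSTEM.
Hypothetical helper for illustration; not part of Emacs."
  (string-to-char
   (decode-coding-string (unibyte-string byte) coding-system)))

;; The same byte A4 is a different character depending on the encoding,
;; so a C-q that read file codes would need different input per buffer.
(list (my-byte-as-char #xa4 'iso-8859-1)    ; ¤, U+00A4
      (my-byte-as-char #xa4 'iso-8859-15)   ; €, U+20AC
      (my-byte-as-char #xa4 'iso-8859-16))  ; €, U+20AC
```

With UCS input, by contrast, the user types 20AC for the euro sign no
matter which of these encodings the file happens to use.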

Nonetheless, if you feel that is the right method, feel free to
implement it and allow us to try it out (I offered the patch above
because it is very simple and offers useful functionality, but I do
not know offhand how to implement what you want).

-Miles

-- 
Do not taunt Happy Fun Ball.


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 11:27                         ` Miles Bader
  2006-09-22 22:54                           ` Peter Dyballa
@ 2006-09-23  3:34                           ` Richard Stallman
  2006-09-23  5:18                             ` Miles Bader
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2006-09-23  3:34 UTC (permalink / raw)
  Cc: Peter_Dyballa, emacs-devel, handa, emacs-pretest-bug

    Whether this is a serious enough problem to consider adding a patch this
    late in the release cycle, I don't know.  [I think the
    default value of read-quoted-char-charset would probably have to remain
    nil though...]

Could you give a self-contained explanation of why you propose this to
be added now?


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-23  3:34                           ` Richard Stallman
@ 2006-09-23  5:18                             ` Miles Bader
  2006-09-24  2:10                               ` Richard Stallman
  0 siblings, 1 reply; 25+ messages in thread
From: Miles Bader @ 2006-09-23  5:18 UTC (permalink / raw)
  Cc: Peter_Dyballa, handa, emacs-pretest-bug, emacs-devel

Richard Stallman <rms@gnu.org> writes:
> Could you give a self-contained explanation of why you propose this to
> be added now?

I don't really care one way or another, but Peter (Dyballa) suggests
that it would be more user-friendly if non-ASCII characters
entered/searched-for via C-q <code> used a standard like unicode to
interpret <code>, rather than Emacs internal character numbers as it
does now.

[In Emacs 23, of course, Emacs internal character numbers will _be_
unicode, so the distinction will go away.]

-Miles

-- 
The car has become... an article of dress without which we feel uncertain,
unclad, and incomplete.  [Marshall McLuhan, Understanding Media, 1964]


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-22 23:25                             ` Miles Bader
@ 2006-09-23  8:45                               ` Peter Dyballa
  2006-09-24  1:51                                 ` Miles Bader
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Dyballa @ 2006-09-23  8:45 UTC (permalink / raw)
  Cc: emacs-pretest-bug, emacs-devel, rms, Kenichi Handa


Am 23.09.2006 um 01:25 schrieb Miles Bader:

> UCS codepoints are good because they allow _all_ emacs characters to
> be entered in a consistent way.  Having C-q use the buffer's file
> encoding on the other hand seems quite annoying, because it requires
> users to use different numbers depending on what the file they're
> editing was saved in (and I suspect a large portion of the time, users
> don't even _know_ what encoding their file uses).

This is a good enough method for me! (And others probably too.) The  
problem I wanted to point out is that not the file's contents but its  
presentation forms are now found. This needs to be documented, and it  
needs to be emphasised that C-s C-q uses a Unicode search and does  
not take into account the file's proper encoding. Could be there are  
just a few that care about these encoding details.

This is like pressing u on the keyboard and an x appears on screen ...

--
Greetings

   Pete

Know thyself. Need help, call GOOGLE.


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-23  8:45                               ` Peter Dyballa
@ 2006-09-24  1:51                                 ` Miles Bader
  0 siblings, 0 replies; 25+ messages in thread
From: Miles Bader @ 2006-09-24  1:51 UTC (permalink / raw)
  Cc: emacs-pretest-bug, Kenichi Handa, rms, emacs-devel

Peter Dyballa <Peter_Dyballa@web.de> writes:
> it needs to be emphasised that C-s C-q uses a Unicode search and does
> not take into account the file's proper encoding. Could be there are
> just a few that care about these encoding details.

That's misleading.  There's no "unicode search"; if the variable I added
is set to `ucs', it _converts_ a unicode codepoint entered via C-q to
Emacs' internal representation; after that, it works exactly like the
old C-q.
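
Put differently, the conversion is a single step before the usual
processing.  A standalone sketch of just that step, mirroring the end of
the patched `read-quoted-char' (the function name
`my-quoted-code-to-char' is invented here for illustration):

```elisp
(defun my-quoted-code-to-char (code charset)
  "Convert numeric CODE, read in CHARSET, to an Emacs character.
If CHARSET is nil, CODE is already an Emacs character code and is
returned unchanged.  Hypothetical helper; not part of Emacs."
  (if (null charset)
      code
    ;; `decode-char' returns nil for a code CHARSET does not define.
    (or (decode-char charset code)
        (error "Invalid %s character: %d, #o%o, #x%x"
               charset code code code))))
```

So (my-quoted-code-to-char #xe4 nil) hands back #xe4 untouched, while
(my-quoted-code-to-char #xe4 'ucs) yields whatever internal code the
running Emacs uses for ä; everything after that point is unchanged.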

Since I-search currently seems to correctly handle, for instance,
searching for a latin-1 ä in a latin-2 buffer -- even though the
underlying buffer representation is in fact different -- searching
should continue to work correctly even in "unicode C-q mode".

[However, I think that character insertion via C-q won't work as the
user expects; for instance, C-q e4 would insert a latin-1 ä even in a
latin-2 buffer -- using the default settings, this situation will get
fixed up at file write time, because unify-8859-on-encoding-mode is on
by default, but until then, the inconsistent buffer contents might
confuse a user.]

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Ghandi


* Re: GNU Emacs 22.0.50 fails to find ä in different ISO Latin encodings
  2006-09-23  5:18                             ` Miles Bader
@ 2006-09-24  2:10                               ` Richard Stallman
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Stallman @ 2006-09-24  2:10 UTC (permalink / raw)
  Cc: Peter_Dyballa, emacs-devel, emacs-pretest-bug, handa

    I don't really care one way or another, but Peter (Dyballa) suggests
    that it would be more user-friendly if non-ASCII characters
    entered/searched-for via C-q <code> used a standard like unicode to
    interpret <code>, rather than Emacs internal character numbers as it
    does now.

    [In Emacs 23, of course, Emacs internal character numbers will _be_
    unicode, so the distinction will go away.]

Let's leave it alone for now.


