bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode


* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
@ 2021-12-03 18:37 Stefan Kangas
  2021-12-03 19:27 ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Kangas @ 2021-12-03 18:37 UTC (permalink / raw)
  To: 52263

Severity: minor

I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
Emacs supports Unicode now:

    ;; The semantics of XSD regexps are defined in terms of Unicode.
    ;; Non-Unicode characters are not allowed in regular expressions and
    ;; will not match against the generated regular expressions.  A
    ;; Unicode character means a character in one of the Mule charsets
    ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
    ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
    ;; or a character translatable to such a character (i.e a character
    ;; for which `encode-char' will return non-nil).
    ;;
    ;; Unfortunately, this means that this package is currently useless
    ;; for CJK characters, since there's no mule-unicode charset for the
    ;; CJK ranges of Unicode.  We should devise a workaround for this
    ;; until the fabled Unicode version of Emacs makes an appearance.

Is that correct?






* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-03 18:37 bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode Stefan Kangas
@ 2021-12-03 19:27 ` Eli Zaretskii
  2021-12-04 13:07   ` Stefan Kangas
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2021-12-03 19:27 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Fri, 3 Dec 2021 10:37:10 -0800
> 
> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
> Emacs supports Unicode now:
> 
>     ;; The semantics of XSD regexps are defined in terms of Unicode.
>     ;; Non-Unicode characters are not allowed in regular expressions and
>     ;; will not match against the generated regular expressions.  A
>     ;; Unicode character means a character in one of the Mule charsets
>     ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>     ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>     ;; or a character translatable to such a character (i.e a character
>     ;; for which `encode-char' will return non-nil).
>     ;;
>     ;; Unfortunately, this means that this package is currently useless
>     ;; for CJK characters, since there's no mule-unicode charset for the
>     ;; CJK ranges of Unicode.  We should devise a workaround for this
>     ;; until the fabled Unicode version of Emacs makes an appearance.
> 
> Is that correct?

Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
thing that bothers me is what happens with eight-bit characters in the
XSD regexps -- are they allowed?  Emacs in general does allow them.
If xsd-regexp.el doesn't, that should be stated there.
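
For context on the eight-bit question: in current Emacs, raw eight-bit
bytes are represented as characters above the Unicode range, so an upper
bound such as the `#x110000` limit used by `xmltok-valid-char-p` excludes
them.  A minimal sketch, in which `my-unicode-codepoint-p` and
`my-raw-byte` are hypothetical names and not part of xsd-regexp.el:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-unicode-codepoint-p (ch)
  "Return non-nil if CH is an integer in the range 0..#x10FFFF."
  (and (integerp ch) (<= 0 ch #x10FFFF)))

;; A raw byte such as #x80 maps to a character outside the Unicode range.
(defvar my-raw-byte (unibyte-char-to-multibyte #x80))

(my-unicode-codepoint-p #x4E2D)       ; a CJK ideograph: non-nil
(my-unicode-codepoint-p my-raw-byte)  ; raw eight-bit byte: nil
(char-charset my-raw-byte)            ; the `eight-bit' charset
```

Whether xsd-regexp.el should reject such characters outright is the open
question here; the sketch only shows how they can be told apart.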






* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-03 19:27 ` Eli Zaretskii
@ 2021-12-04 13:07   ` Stefan Kangas
  2021-12-04 16:16     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Kangas @ 2021-12-04 13:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52263

Eli Zaretskii <eliz@gnu.org> writes:

>> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
>> Emacs supports Unicode now:
>>
>>     ;; The semantics of XSD regexps are defined in terms of Unicode.
>>     ;; Non-Unicode characters are not allowed in regular expressions and
>>     ;; will not match against the generated regular expressions.  A
>>     ;; Unicode character means a character in one of the Mule charsets
>>     ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>>     ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>>     ;; or a character translatable to such a character (i.e a character
>>     ;; for which `encode-char' will return non-nil).
>>     ;;
>>     ;; Unfortunately, this means that this package is currently useless
>>     ;; for CJK characters, since there's no mule-unicode charset for the
>>     ;; CJK ranges of Unicode.  We should devise a workaround for this
>>     ;; until the fabled Unicode version of Emacs makes an appearance.
>>
>> Is that correct?
>
> Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
> thing that bothers me is what happens with eight-bit characters in the
> XSD regexps -- are they allowed?  Emacs in general does allow them.
> If xsd-regexp.el doesn't, that should be stated there.

Hmm, so probably more work is needed here than just removing the above
comment.  There is a lot of non-trivial mule and conversion stuff going
on in that library that might need a proper look by someone that knows
this stuff well.

Perhaps this bug should also be retitled accordingly.






* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-04 13:07   ` Stefan Kangas
@ 2021-12-04 16:16     ` Eli Zaretskii
  2021-12-05 17:34       ` Stefan Kangas
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2021-12-04 16:16 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Sat, 4 Dec 2021 14:07:46 +0100
> Cc: 52263@debbugs.gnu.org
> 
> >> Is that correct?
> >
> > Probably.  The mule-Unicode-* stuff is definitely obsolete.  The only
> > thing that bothers me is what happens with eight-bit characters in the
> > XSD regexps -- are they allowed?  Emacs in general does allow them.
> > If xsd-regexp.el doesn't, that should be stated there.
> 
> Hmm, so probably more work is needed here than just removing the above
> comment.  There is a lot of non-trivial mule and conversion stuff going
> on in that library that might need a proper look by someone that knows
> this stuff well.

Mainly that file needs simplification: we in effect have a single
range of characters, with the possible exception of the codepoints
between 128 and 160.  Also, decode-char is now a no-op when the 1st
arg is 'ucs'.

I can offer help in those parts where you don't feel you understand
the issue well enough to make the simplifications.
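
The no-op behaviour can be checked directly.  A small sketch, assuming a
valid integer code point; `my-decode-char-noop-p` is a hypothetical
helper written for this illustration:

```elisp
;; Hypothetical helper: check that decoding an integer code point through
;; the `ucs' charset gives back the same integer.
(defun my-decode-char-noop-p (cp)
  "Return t if (decode-char 'ucs CP) and (encode-char CP 'ucs) both equal CP."
  (and (eq (decode-char 'ucs cp) cp)
       (eq (encode-char cp 'ucs) cp)))

;; U+4E2D is a CJK ideograph, i.e. in the range the old comment calls
;; unsupported.
(my-decode-char-noop-p #x4E2D)
```

For such arguments, `(string (decode-char 'ucs cp))` and `(string cp)`
build the same string, which is what makes the simplification possible.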






* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-04 16:16     ` Eli Zaretskii
@ 2021-12-05 17:34       ` Stefan Kangas
  2021-12-05 17:48         ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Kangas @ 2021-12-05 17:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52263

[-- Attachment #1: Type: text/plain, Size: 172 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Also, decode-char is now a no-op when the 1st arg is 'ucs'.

Interesting.  So the attached cleanup should be okay to install, then?

[-- Attachment #2: decode-char.diff --]
[-- Type: text/x-diff, Size: 5682 bytes --]

diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 92e04f9d2e..ddc228e490 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -31,7 +31,7 @@ mm-mime-mule-charset-alist
 
 (defun mm-ucs-to-char (codepoint)
   "Convert Unicode codepoint to character."
-  (or (decode-char 'ucs codepoint) ?#))
+  (or codepoint ?#))
 
 (defvar mm-coding-system-list nil)
 (defun mm-get-coding-system-list ()
diff --git a/lisp/language/hanja-util.el b/lisp/language/hanja-util.el
index 9e9213536c..fe6323d42b 100644
--- a/lisp/language/hanja-util.el
+++ b/lisp/language/hanja-util.el
@@ -6573,8 +6573,8 @@ hangul-to-hanja-char
            (hanja-filter (lambda (x) (car x))
                          (mapcar (lambda (c)
                                    (if (listp c)
-                                       (cons (decode-char 'ucs (car c)) (cdr c))
-				     (list (decode-char 'ucs c))))
+                                       (cons (car c) (cdr c))
+                                     (list c)))
                                  (aref hanja-table char)))))
     (unwind-protect
 	(when (aref hanja-conversions 2)
diff --git a/lisp/leim/quail/hangul.el b/lisp/leim/quail/hangul.el
index c03e86b33c..d069b5b68e 100644
--- a/lisp/leim/quail/hangul.el
+++ b/lisp/leim/quail/hangul.el
@@ -429,7 +429,7 @@ hangul3-input-method-internal
            (hangul3-input-method-jong char))
           (t
            (setq hangul-queue (make-vector 6 0))
-           (insert (decode-char 'ucs char))
+           (insert char)
            (move-overlay quail-overlay (point) (point))))))
 
 (defun hangul3-input-method (key)
@@ -476,7 +476,7 @@ hangul390-input-method-internal
            (hangul3-input-method-jong char))
           (t
            (setq hangul-queue (make-vector 6 0))
-           (insert (decode-char 'ucs char))
+           (insert char)
            (move-overlay quail-overlay (point) (point))))))
 
 (defun hangul390-input-method (key)
diff --git a/lisp/nxml/rng-cmpct.el b/lisp/nxml/rng-cmpct.el
index dd3000773f..1476aa0e5a 100644
--- a/lisp/nxml/rng-cmpct.el
+++ b/lisp/nxml/rng-cmpct.el
@@ -369,7 +369,7 @@ rng-c-process-escapes
     (while (re-search-forward "\\\\x+{\\([[:xdigit:]]+\\)}"
 			      (point-max)
 			      t)
-      (let* ((ch (decode-char 'ucs (string-to-number (match-string 1) 16))))
+      (let* ((ch (string-to-number (match-string 1) 16)))
 	(if (and ch (> ch 0))
 	    (let ((begin (match-beginning 0))
 		  (end (match-end 0)))
diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index 6159e00c51..503bb7d38f 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -943,7 +943,6 @@ xmltok-scan-char-ref
 	(let ((n (string-to-number (buffer-substring-no-properties start end)
 				base)))
 	  (cond ((and (integerp n) (xmltok-valid-char-p n))
-		 (setq n (xmltok-unicode-to-char n))
 		 (and n (string n)))
 		(t
 		 (xmltok-add-error "Invalid character code" start end)
@@ -971,10 +970,8 @@ xmltok-valid-char-p
 	(t (and (> n #xFFFF)
 		(< n #x110000)))))
 
-(defun xmltok-unicode-to-char (n)
-  "Return the character corresponding to Unicode scalar value N.
-Return nil if unsupported in Emacs."
-  (decode-char 'ucs n))
+(define-obsolete-function-alias 'xmltok-unicode-to-char
+  #'identity "29.1")
 
 ;;; Prolog parsing
 
diff --git a/lisp/nxml/xsd-regexp.el b/lisp/nxml/xsd-regexp.el
index 3c29803ab9..d6eaf7cc4b 100644
--- a/lisp/nxml/xsd-regexp.el
+++ b/lisp/nxml/xsd-regexp.el
@@ -287,7 +287,7 @@ xsdre-compile-char-class
 (defun xsdre-compile-single-char (ch)
   (if (memq ch '(?. ?* ?+ ?? ?\[ ?\] ?^ ?$ ?\\))
       (string ?\\ ch)
-    (string (decode-char 'ucs ch))))
+    (string ch)))
 
 (defun xsdre-char-class-to-range-list (cc)
   "Return a range-list for a symbolic char-class CC."
@@ -404,10 +404,6 @@ xsdre-range-list-to-char-alternative
 		    (cons last chars)
 		  (cons last (cons ?- chars))))))
       (setq range-list (cdr range-list)))
-    (setq chars
-	  (mapcar (lambda (c)
-		    (decode-char 'ucs c))
-		  chars))
     (when caret
       (setq chars (cons ?^ chars)))
     (when hyphen
diff --git a/lisp/xml.el b/lisp/xml.el
index 0282e3741c..e2ba02e195 100644
--- a/lisp/xml.el
+++ b/lisp/xml.el
@@ -612,8 +612,8 @@ xml-parse-string
 	(if (setq ref (match-string 2))
 	    (progn  ; Numeric char reference
 	      (setq val (save-match-data
-			  (decode-char 'ucs (string-to-number
-					     ref (if (match-string 1) 16)))))
+                          (string-to-number
+                           ref (if (match-string 1) 16))))
 	      (and (null val)
 		   xml-validating-parser
 		   (error "XML: (Validity) Invalid character reference `%s'"
@@ -898,11 +898,11 @@ xml--entity-replacement-text
 	    ref val)
 	(cond ((setq ref (match-string 1 string))
 	       ;; Decimal character reference
-	       (setq val (decode-char 'ucs (string-to-number ref)))
+               (setq val (string-to-number ref))
 	       (if val (push (string val) children)))
 	      ;; Hexadecimal character reference
 	      ((setq ref (match-string 2 string))
-	       (setq val (decode-char 'ucs (string-to-number ref 16)))
+               (setq val (string-to-number ref 16))
 	       (if val (push (string val) children)))
 	      ;; Parameter entity reference
 	      ((setq ref (match-string 3 string))
@@ -962,7 +962,7 @@ xml-substitute-special
 	(if ref
 	    ;; [4.6] Character references are included as
 	    ;; character data.
-	    (let ((val (decode-char 'ucs (string-to-number ref (if is-hex 16)))))
+            (let ((val (string-to-number ref (if is-hex 16))))
 	      (push (cond (val (string val))
 			  (xml-validating-parser
 			   (error "XML: (Validity) Undefined character `x%s'" ref))


* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-05 17:34       ` Stefan Kangas
@ 2021-12-05 17:48         ` Eli Zaretskii
  2021-12-05 18:20           ` Stefan Kangas
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2021-12-05 17:48 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Sun, 5 Dec 2021 18:34:32 +0100
> Cc: 52263@debbugs.gnu.org
> 
> > Also, decode-char is now a no-op when the 1st arg is 'ucs'.
> 
> Interesting.  So the attached cleanup should be okay to install, then?

Yes -- assuming that the 2nd argument is never a cons cell.  Which
AFAICT it never is in these places.
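
The caveat matters because `decode-char` also accepts the obsolete cons
form (HIGH . LOW) for its code-point argument, whereas `string` and
`insert` require a character.  A guarded sketch of a replacement call,
where `my-codepoint-string` is a hypothetical helper and not something
the patch adds:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-codepoint-string (cp)
  "Return a one-character string for the integer code point CP.
Signal an error for the cons form, which still needs `decode-char'."
  (if (consp cp)
      (error "Cons-cell code point %S needs `decode-char'" cp)
    (string cp)))

(my-codepoint-string #x4E2D)   ; one-character string for U+4E2D
```

In the call sites touched by the patch, the argument comes from
`string-to-number` or from integer tables, so the cons branch would
never be taken.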






* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
  2021-12-05 17:48         ` Eli Zaretskii
@ 2021-12-05 18:20           ` Stefan Kangas
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Kangas @ 2021-12-05 18:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52263

Eli Zaretskii <eliz@gnu.org> writes:

>> Interesting.  So the attached cleanup should be okay to install, then?
>
> Yes -- assuming that the 2nd argument is never a cons cell.  Which
> AFAICT it never is in these places.

I looked over all cases too, and my conclusion is also that the second
argument can never be a cons cell there.  So I installed the cleanup on
master.  Thanks.







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
