* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-03 18:37 UTC
To: 52263
Severity: minor
I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
Emacs supports Unicode now:
;; The semantics of XSD regexps are defined in terms of Unicode.
;; Non-Unicode characters are not allowed in regular expressions and
;; will not match against the generated regular expressions. A
;; Unicode character means a character in one of the Mule charsets
;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
;; or a character translatable to such a character (i.e a character
;; for which `encode-char' will return non-nil).
;;
;; Unfortunately, this means that this package is currently useless
;; for CJK characters, since there's no mule-unicode charset for the
;; CJK ranges of Unicode. We should devise a workaround for this
;; until the fabled Unicode version of Emacs makes an appearance.
Is that correct?
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-03 19:27 UTC
To: Stefan Kangas; +Cc: 52263
> From: Stefan Kangas <stefan@marxist.se>
> Date: Fri, 3 Dec 2021 10:37:10 -0800
>
> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
> Emacs supports Unicode now:
>
> ;; The semantics of XSD regexps are defined in terms of Unicode.
> ;; Non-Unicode characters are not allowed in regular expressions and
> ;; will not match against the generated regular expressions. A
> ;; Unicode character means a character in one of the Mule charsets
> ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
> ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
> ;; or a character translatable to such a character (i.e a character
> ;; for which `encode-char' will return non-nil).
> ;;
> ;; Unfortunately, this means that this package is currently useless
> ;; for CJK characters, since there's no mule-unicode charset for the
> ;; CJK ranges of Unicode. We should devise a workaround for this
> ;; until the fabled Unicode version of Emacs makes an appearance.
>
> Is that correct?
Probably. The mule-Unicode-* stuff is definitely obsolete. The only
thing that bothers me is what happens with eight-bit characters in the
XSD regexps -- are they allowed? Emacs in general does allow them.
If xsd-regexp.el doesn't, that should be stated there.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-04 13:07 UTC
To: Eli Zaretskii; +Cc: 52263
Eli Zaretskii <eliz@gnu.org> writes:
>> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
>> Emacs supports Unicode now:
>>
>> ;; The semantics of XSD regexps are defined in terms of Unicode.
>> ;; Non-Unicode characters are not allowed in regular expressions and
>> ;; will not match against the generated regular expressions. A
>> ;; Unicode character means a character in one of the Mule charsets
>> ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>> ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>> ;; or a character translatable to such a character (i.e a character
>> ;; for which `encode-char' will return non-nil).
>> ;;
>> ;; Unfortunately, this means that this package is currently useless
>> ;; for CJK characters, since there's no mule-unicode charset for the
>> ;; CJK ranges of Unicode. We should devise a workaround for this
>> ;; until the fabled Unicode version of Emacs makes an appearance.
>>
>> Is that correct?
>
> Probably. The mule-Unicode-* stuff is definitely obsolete. The only
> thing that bothers me is what happens with eight-bit characters in the
> XSD regexps -- are they allowed? Emacs in general does allow them.
> If xsd-regexp.el doesn't, that should be stated there.
Hmm, so probably more work is needed here than just removing the above
comment. There is a lot of non-trivial mule and conversion stuff going
on in that library that might need a proper look by someone that knows
this stuff well.
Perhaps this bug should also be retitled accordingly.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-04 16:16 UTC
To: Stefan Kangas; +Cc: 52263
> From: Stefan Kangas <stefan@marxist.se>
> Date: Sat, 4 Dec 2021 14:07:46 +0100
> Cc: 52263@debbugs.gnu.org
>
> >> Is that correct?
> >
> > Probably. The mule-Unicode-* stuff is definitely obsolete. The only
> > thing that bothers me is what happens with eight-bit characters in the
> > XSD regexps -- are they allowed? Emacs in general does allow them.
> > If xsd-regexp.el doesn't, that should be stated there.
>
> Hmm, so probably more work is needed here than just removing the above
> comment. There is a lot of non-trivial mule and conversion stuff going
> on in that library that might need a proper look by someone that knows
> this stuff well.
Mainly that file needs simplification: we in effect have a single
range of characters, with the possible exception of the codepoints
between 128 and 160. Also, decode-char is now a no-op when the 1st
arg is 'ucs'.
I can offer help in those parts where you don't feel you understand
the issue well enough to make the simplifications.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-05 17:34 UTC
To: Eli Zaretskii; +Cc: 52263
[-- Attachment #1: Type: text/plain, Size: 172 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
> Also, decode-char is now a no-op when the 1st arg is 'ucs'.
Interesting. So the attached cleanup should be okay to install, then?
[-- Attachment #2: decode-char.diff --]
[-- Type: text/x-diff, Size: 5682 bytes --]
diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 92e04f9d2e..ddc228e490 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -31,7 +31,7 @@ mm-mime-mule-charset-alist
(defun mm-ucs-to-char (codepoint)
"Convert Unicode codepoint to character."
- (or (decode-char 'ucs codepoint) ?#))
+ (or codepoint ?#))
(defvar mm-coding-system-list nil)
(defun mm-get-coding-system-list ()
diff --git a/lisp/language/hanja-util.el b/lisp/language/hanja-util.el
index 9e9213536c..fe6323d42b 100644
--- a/lisp/language/hanja-util.el
+++ b/lisp/language/hanja-util.el
@@ -6573,8 +6573,8 @@ hangul-to-hanja-char
(hanja-filter (lambda (x) (car x))
(mapcar (lambda (c)
(if (listp c)
- (cons (decode-char 'ucs (car c)) (cdr c))
- (list (decode-char 'ucs c))))
+ (cons (car c) (cdr c))
+ (list c)))
(aref hanja-table char)))))
(unwind-protect
(when (aref hanja-conversions 2)
diff --git a/lisp/leim/quail/hangul.el b/lisp/leim/quail/hangul.el
index c03e86b33c..d069b5b68e 100644
--- a/lisp/leim/quail/hangul.el
+++ b/lisp/leim/quail/hangul.el
@@ -429,7 +429,7 @@ hangul3-input-method-internal
(hangul3-input-method-jong char))
(t
(setq hangul-queue (make-vector 6 0))
- (insert (decode-char 'ucs char))
+ (insert char)
(move-overlay quail-overlay (point) (point))))))
(defun hangul3-input-method (key)
@@ -476,7 +476,7 @@ hangul390-input-method-internal
(hangul3-input-method-jong char))
(t
(setq hangul-queue (make-vector 6 0))
- (insert (decode-char 'ucs char))
+ (insert char)
(move-overlay quail-overlay (point) (point))))))
(defun hangul390-input-method (key)
diff --git a/lisp/nxml/rng-cmpct.el b/lisp/nxml/rng-cmpct.el
index dd3000773f..1476aa0e5a 100644
--- a/lisp/nxml/rng-cmpct.el
+++ b/lisp/nxml/rng-cmpct.el
@@ -369,7 +369,7 @@ rng-c-process-escapes
(while (re-search-forward "\\\\x+{\\([[:xdigit:]]+\\)}"
(point-max)
t)
- (let* ((ch (decode-char 'ucs (string-to-number (match-string 1) 16))))
+ (let* ((ch (string-to-number (match-string 1) 16)))
(if (and ch (> ch 0))
(let ((begin (match-beginning 0))
(end (match-end 0)))
diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index 6159e00c51..503bb7d38f 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -943,7 +943,6 @@ xmltok-scan-char-ref
(let ((n (string-to-number (buffer-substring-no-properties start end)
base)))
(cond ((and (integerp n) (xmltok-valid-char-p n))
- (setq n (xmltok-unicode-to-char n))
(and n (string n)))
(t
(xmltok-add-error "Invalid character code" start end)
@@ -971,10 +970,8 @@ xmltok-valid-char-p
(t (and (> n #xFFFF)
(< n #x110000)))))
-(defun xmltok-unicode-to-char (n)
- "Return the character corresponding to Unicode scalar value N.
-Return nil if unsupported in Emacs."
- (decode-char 'ucs n))
+(define-obsolete-function-alias 'xmltok-unicode-to-char
+ #'identity "29.1")
;;; Prolog parsing
diff --git a/lisp/nxml/xsd-regexp.el b/lisp/nxml/xsd-regexp.el
index 3c29803ab9..d6eaf7cc4b 100644
--- a/lisp/nxml/xsd-regexp.el
+++ b/lisp/nxml/xsd-regexp.el
@@ -287,7 +287,7 @@ xsdre-compile-char-class
(defun xsdre-compile-single-char (ch)
(if (memq ch '(?. ?* ?+ ?? ?\[ ?\] ?^ ?$ ?\\))
(string ?\\ ch)
- (string (decode-char 'ucs ch))))
+ (string ch)))
(defun xsdre-char-class-to-range-list (cc)
"Return a range-list for a symbolic char-class CC."
@@ -404,10 +404,6 @@ xsdre-range-list-to-char-alternative
(cons last chars)
(cons last (cons ?- chars))))))
(setq range-list (cdr range-list)))
- (setq chars
- (mapcar (lambda (c)
- (decode-char 'ucs c))
- chars))
(when caret
(setq chars (cons ?^ chars)))
(when hyphen
diff --git a/lisp/xml.el b/lisp/xml.el
index 0282e3741c..e2ba02e195 100644
--- a/lisp/xml.el
+++ b/lisp/xml.el
@@ -612,8 +612,8 @@ xml-parse-string
(if (setq ref (match-string 2))
(progn ; Numeric char reference
(setq val (save-match-data
- (decode-char 'ucs (string-to-number
- ref (if (match-string 1) 16)))))
+ (string-to-number
+ ref (if (match-string 1) 16))))
(and (null val)
xml-validating-parser
(error "XML: (Validity) Invalid character reference `%s'"
@@ -898,11 +898,11 @@ xml--entity-replacement-text
ref val)
(cond ((setq ref (match-string 1 string))
;; Decimal character reference
- (setq val (decode-char 'ucs (string-to-number ref)))
+ (setq val (string-to-number ref))
(if val (push (string val) children)))
;; Hexadecimal character reference
((setq ref (match-string 2 string))
- (setq val (decode-char 'ucs (string-to-number ref 16)))
+ (setq val (string-to-number ref 16))
(if val (push (string val) children)))
;; Parameter entity reference
((setq ref (match-string 3 string))
@@ -962,7 +962,7 @@ xml-substitute-special
(if ref
;; [4.6] Character references are included as
;; character data.
- (let ((val (decode-char 'ucs (string-to-number ref (if is-hex 16)))))
+ (let ((val (string-to-number ref (if is-hex 16))))
(push (cond (val (string val))
(xml-validating-parser
(error "XML: (Validity) Undefined character `x%s'" ref))
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-05 17:48 UTC
To: Stefan Kangas; +Cc: 52263
> From: Stefan Kangas <stefan@marxist.se>
> Date: Sun, 5 Dec 2021 18:34:32 +0100
> Cc: 52263@debbugs.gnu.org
>
> > Also, decode-char is now a no-op when the 1st arg is 'ucs'.
>
> Interesting. So the attached cleanup should be okay to install, then?
Yes -- assuming that the 2nd argument is never a cons cell. Which
AFAICT it never is in these places.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-05 18:20 UTC
To: Eli Zaretskii; +Cc: 52263
Eli Zaretskii <eliz@gnu.org> writes:
>> Interesting. So the attached cleanup should be okay to install, then?
>
> Yes -- assuming that the 2nd argument is never a cons cell. Which
> AFAICT it never is in these places.
I looked over all cases too, and my conclusion is also that the second
argument can never be a cons cell there. So I installed the cleanup on
master. Thanks.