* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-03 18:37 UTC
To: 52263

Severity: minor

I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
Emacs supports Unicode now:

;; The semantics of XSD regexps are defined in terms of Unicode.
;; Non-Unicode characters are not allowed in regular expressions and
;; will not match against the generated regular expressions. A
;; Unicode character means a character in one of the Mule charsets
;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
;; or a character translatable to such a character (i.e a character
;; for which `encode-char' will return non-nil).
;;
;; Unfortunately, this means that this package is currently useless
;; for CJK characters, since there's no mule-unicode charset for the
;; CJK ranges of Unicode. We should devise a workaround for this
;; until the fabled Unicode version of Emacs makes an appearance.

Is that correct?
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-03 19:27 UTC
To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Fri, 3 Dec 2021 10:37:10 -0800
>
> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
> Emacs supports Unicode now:
>
> ;; The semantics of XSD regexps are defined in terms of Unicode.
> ;; Non-Unicode characters are not allowed in regular expressions and
> ;; will not match against the generated regular expressions. A
> ;; Unicode character means a character in one of the Mule charsets
> ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
> ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
> ;; or a character translatable to such a character (i.e a character
> ;; for which `encode-char' will return non-nil).
> ;;
> ;; Unfortunately, this means that this package is currently useless
> ;; for CJK characters, since there's no mule-unicode charset for the
> ;; CJK ranges of Unicode. We should devise a workaround for this
> ;; until the fabled Unicode version of Emacs makes an appearance.
>
> Is that correct?

Probably. The mule-Unicode-* stuff is definitely obsolete. The only
thing that bothers me is what happens with eight-bit characters in the
XSD regexps -- are they allowed? Emacs in general does allow them.
If xsd-regexp.el doesn't, that should be stated there.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-04 13:07 UTC
To: Eli Zaretskii; +Cc: 52263

Eli Zaretskii <eliz@gnu.org> writes:

>> I believe this comment in lisp/nxml/xsd-regexp.el can be removed as
>> Emacs supports Unicode now:
>>
>> ;; The semantics of XSD regexps are defined in terms of Unicode.
>> ;; Non-Unicode characters are not allowed in regular expressions and
>> ;; will not match against the generated regular expressions. A
>> ;; Unicode character means a character in one of the Mule charsets
>> ;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
>> ;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
>> ;; or a character translatable to such a character (i.e a character
>> ;; for which `encode-char' will return non-nil).
>> ;;
>> ;; Unfortunately, this means that this package is currently useless
>> ;; for CJK characters, since there's no mule-unicode charset for the
>> ;; CJK ranges of Unicode. We should devise a workaround for this
>> ;; until the fabled Unicode version of Emacs makes an appearance.
>>
>> Is that correct?
>
> Probably. The mule-Unicode-* stuff is definitely obsolete. The only
> thing that bothers me is what happens with eight-bit characters in the
> XSD regexps -- are they allowed? Emacs in general does allow them.
> If xsd-regexp.el doesn't, that should be stated there.

Hmm, so probably more work is needed here than just removing the above
comment. There is a lot of non-trivial mule and conversion stuff going
on in that library that might need a proper look by someone that knows
this stuff well.

Perhaps this bug should also be retitled accordingly.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-04 16:16 UTC
To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Sat, 4 Dec 2021 14:07:46 +0100
> Cc: 52263@debbugs.gnu.org
>
> >> Is that correct?
> >
> > Probably. The mule-Unicode-* stuff is definitely obsolete. The only
> > thing that bothers me is what happens with eight-bit characters in the
> > XSD regexps -- are they allowed? Emacs in general does allow them.
> > If xsd-regexp.el doesn't, that should be stated there.
>
> Hmm, so probably more work is needed here than just removing the above
> comment. There is a lot of non-trivial mule and conversion stuff going
> on in that library that might need a proper look by someone that knows
> this stuff well.

Mainly that file needs simplification: we in effect have a single
range of characters, with the possible exception of the codepoints
between 128 and 160. Also, decode-char is now a no-op when the 1st
arg is 'ucs'.

I can offer help in those parts where you don't feel you understand
the issue well enough to make the simplifications.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-05 17:34 UTC
To: Eli Zaretskii; +Cc: 52263

[-- Attachment #1: Type: text/plain, Size: 172 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Also, decode-char is now a no-op when the 1st arg is 'ucs'.

Interesting. So the attached cleanup should be okay to install, then?

[-- Attachment #2: decode-char.diff --]
[-- Type: text/x-diff, Size: 5682 bytes --]

diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 92e04f9d2e..ddc228e490 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -31,7 +31,7 @@ mm-mime-mule-charset-alist
 
 (defun mm-ucs-to-char (codepoint)
   "Convert Unicode codepoint to character."
-  (or (decode-char 'ucs codepoint) ?#))
+  (or codepoint ?#))
 
 (defvar mm-coding-system-list nil)
 (defun mm-get-coding-system-list ()
diff --git a/lisp/language/hanja-util.el b/lisp/language/hanja-util.el
index 9e9213536c..fe6323d42b 100644
--- a/lisp/language/hanja-util.el
+++ b/lisp/language/hanja-util.el
@@ -6573,8 +6573,8 @@ hangul-to-hanja-char
            (hanja-filter (lambda (x) (car x))
                          (mapcar (lambda (c)
                                    (if (listp c)
-                                       (cons (decode-char 'ucs (car c)) (cdr c))
-                                     (list (decode-char 'ucs c))))
+                                       (cons (car c) (cdr c))
+                                     (list c)))
                                  (aref hanja-table char)))))
     (unwind-protect
         (when (aref hanja-conversions 2)
diff --git a/lisp/leim/quail/hangul.el b/lisp/leim/quail/hangul.el
index c03e86b33c..d069b5b68e 100644
--- a/lisp/leim/quail/hangul.el
+++ b/lisp/leim/quail/hangul.el
@@ -429,7 +429,7 @@ hangul3-input-method-internal
           (hangul3-input-method-jong char))
          (t
           (setq hangul-queue (make-vector 6 0))
-          (insert (decode-char 'ucs char))
+          (insert char)
           (move-overlay quail-overlay (point) (point))))))
 
 (defun hangul3-input-method (key)
@@ -476,7 +476,7 @@ hangul390-input-method-internal
           (hangul3-input-method-jong char))
          (t
           (setq hangul-queue (make-vector 6 0))
-          (insert (decode-char 'ucs char))
+          (insert char)
           (move-overlay quail-overlay (point) (point))))))
 
 (defun hangul390-input-method (key)
diff --git a/lisp/nxml/rng-cmpct.el b/lisp/nxml/rng-cmpct.el
index dd3000773f..1476aa0e5a 100644
--- a/lisp/nxml/rng-cmpct.el
+++ b/lisp/nxml/rng-cmpct.el
@@ -369,7 +369,7 @@ rng-c-process-escapes
     (while (re-search-forward "\\\\x+{\\([[:xdigit:]]+\\)}"
                               (point-max)
                               t)
-      (let* ((ch (decode-char 'ucs (string-to-number (match-string 1) 16))))
+      (let* ((ch (string-to-number (match-string 1) 16)))
         (if (and ch (> ch 0))
             (let ((begin (match-beginning 0))
                   (end (match-end 0)))
diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index 6159e00c51..503bb7d38f 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -943,7 +943,6 @@ xmltok-scan-char-ref
   (let ((n (string-to-number (buffer-substring-no-properties start end)
                              base)))
     (cond ((and (integerp n) (xmltok-valid-char-p n))
-           (setq n (xmltok-unicode-to-char n))
            (and n (string n)))
           (t
            (xmltok-add-error "Invalid character code" start end)
@@ -971,10 +970,8 @@ xmltok-valid-char-p
         (t (and (> n #xFFFF)
                 (< n #x110000)))))
 
-(defun xmltok-unicode-to-char (n)
-  "Return the character corresponding to Unicode scalar value N.
-Return nil if unsupported in Emacs."
-  (decode-char 'ucs n))
+(define-obsolete-function-alias 'xmltok-unicode-to-char
+  #'identity "29.1")
 
 ;;; Prolog parsing
 
diff --git a/lisp/nxml/xsd-regexp.el b/lisp/nxml/xsd-regexp.el
index 3c29803ab9..d6eaf7cc4b 100644
--- a/lisp/nxml/xsd-regexp.el
+++ b/lisp/nxml/xsd-regexp.el
@@ -287,7 +287,7 @@ xsdre-compile-char-class
 (defun xsdre-compile-single-char (ch)
   (if (memq ch '(?. ?* ?+ ?? ?\[ ?\] ?^ ?$ ?\\))
       (string ?\\ ch)
-    (string (decode-char 'ucs ch))))
+    (string ch)))
 
 (defun xsdre-char-class-to-range-list (cc)
   "Return a range-list for a symbolic char-class CC."
@@ -404,10 +404,6 @@ xsdre-range-list-to-char-alternative
                       (cons last chars)
                     (cons last (cons ?- chars))))))
       (setq range-list (cdr range-list)))
-    (setq chars
-          (mapcar (lambda (c)
-                    (decode-char 'ucs c))
-                  chars))
     (when caret
       (setq chars (cons ?^ chars)))
     (when hyphen
diff --git a/lisp/xml.el b/lisp/xml.el
index 0282e3741c..e2ba02e195 100644
--- a/lisp/xml.el
+++ b/lisp/xml.el
@@ -612,8 +612,8 @@ xml-parse-string
             (if (setq ref (match-string 2))
                 (progn ; Numeric char reference
                   (setq val (save-match-data
-                              (decode-char 'ucs (string-to-number
-                                                 ref (if (match-string 1) 16)))))
+                              (string-to-number
+                               ref (if (match-string 1) 16))))
                   (and (null val)
                        xml-validating-parser
                        (error "XML: (Validity) Invalid character reference `%s'"
@@ -898,11 +898,11 @@ xml--entity-replacement-text
             ref val)
         (cond ((setq ref (match-string 1 string))
                ;; Decimal character reference
-               (setq val (decode-char 'ucs (string-to-number ref)))
+               (setq val (string-to-number ref))
                (if val (push (string val) children)))
               ;; Hexadecimal character reference
               ((setq ref (match-string 2 string))
-               (setq val (decode-char 'ucs (string-to-number ref 16)))
+               (setq val (string-to-number ref 16))
                (if val (push (string val) children)))
               ;; Parameter entity reference
               ((setq ref (match-string 3 string))
@@ -962,7 +962,7 @@ xml-substitute-special
         (if ref
             ;; [4.6] Character references are included as
             ;; character data.
-            (let ((val (decode-char 'ucs (string-to-number ref (if is-hex 16)))))
+            (let ((val (string-to-number ref (if is-hex 16))))
               (push (cond (val (string val))
                           (xml-validating-parser
                            (error "XML: (Validity) Undefined character `x%s'" ref))
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Eli Zaretskii @ 2021-12-05 17:48 UTC
To: Stefan Kangas; +Cc: 52263

> From: Stefan Kangas <stefan@marxist.se>
> Date: Sun, 5 Dec 2021 18:34:32 +0100
> Cc: 52263@debbugs.gnu.org
>
> > Also, decode-char is now a no-op when the 1st arg is 'ucs'.
>
> Interesting. So the attached cleanup should be okay to install, then?

Yes -- assuming that the 2nd argument is never a cons cell. Which
AFAICT it never is in these places.
* bug#52263: Stale comment in xsd-regexp.el about Emacs not supporting Unicode
From: Stefan Kangas @ 2021-12-05 18:20 UTC
To: Eli Zaretskii; +Cc: 52263

Eli Zaretskii <eliz@gnu.org> writes:

>> Interesting. So the attached cleanup should be okay to install, then?
>
> Yes -- assuming that the 2nd argument is never a cons cell. Which
> AFAICT it never is in these places.

I looked over all cases too, and my conclusion is also that the second
argument can never be a cons cell there. So I installed the cleanup on
master.

Thanks.
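
The caveat raised in the thread (dropping `decode-char' is only safe when
its second argument is an integer, never a cons cell) can be illustrated
with a minimal Emacs Lisp sketch. This is not part of the installed patch:
the helper name `my-codepoint-to-string' is hypothetical, and the upper
bound #x10FFFF is an assumption about what a caller might want to accept,
not something stated in the discussion.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs or of
;; the patch in this thread.
(defun my-codepoint-to-string (n)
  "Return a one-character string for Unicode code point N, or nil.
N must be an integer in the range 0..#x10FFFF.  The obsolescent
cons form (HIGH . LOW) that `decode-char' accepts is rejected
here, since `string' cannot take a cons cell."
  (and (integerp n)
       (<= 0 n #x10FFFF)
       (string n)))

;; With an integer argument, `decode-char' on `ucs' hands back the
;; same integer, which is why the call could be dropped.
(eq (decode-char 'ucs #x4E2D) #x4E2D)

;; Integer input yields a string; a cons cell yields nil instead of
;; signaling an error from `string'.
(my-codepoint-to-string #x41)
(my-codepoint-to-string '(0 . #x41))
```

A guard of this kind would only matter at call sites that might receive a
cons cell; per the review above, none of the call sites touched by the
cleanup do.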