* Translation of http status code to text
@ 2010-03-18 19:11 Lennart Borgman
  2010-03-22  1:19 ` Juri Linkov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-18 19:11 UTC (permalink / raw)
  To: Emacs-Devel devel

Is there a list in Emacs for translating http status code to readable
text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

If not, could we please add one that can be used from different elisp
files that need this?

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text 2010-03-18 19:11 Translation of http status code to text Lennart Borgman @ 2010-03-22 1:19 ` Juri Linkov 2010-03-22 13:17 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Juri Linkov @ 2010-03-22 1:19 UTC (permalink / raw) To: Lennart Borgman; +Cc: Emacs-Devel devel > Is there a list in Emacs for translating http status code to readable > text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)? > > If not could we please add one that can be used from different elisp > files that needs this? I suppose this should be in lisp/url/url-http.el. -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text 2010-03-22 1:19 ` Juri Linkov @ 2010-03-22 13:17 ` Ted Zlatanov 2010-03-22 14:01 ` Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-22 13:17 UTC (permalink / raw) To: emacs-devel On Mon, 22 Mar 2010 03:19:32 +0200 Juri Linkov <juri@jurta.org> wrote: >> Is there a list in Emacs for translating http status code to readable >> text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)? >> >> If not could we please add one that can be used from different elisp >> files that needs this? JL> I suppose this should be in lisp/url/url-http.el. The codes are used (and listed in the comments) in url-http-parse-headers. Would it be OK to just make an alist with symbols and text, e.g. (setq url-http-codes '((206 (url-http-code-partial-content "Partial content")))) ... (assq 206 url-http-codes) or is there anything else needed? Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text 2010-03-22 13:17 ` Ted Zlatanov @ 2010-03-22 14:01 ` Stefan Monnier 2010-03-22 14:25 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2010-03-22 14:01 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > Would it be OK to just make an alist with symbols and text, e.g. The question is sufficiently general that some of the possible understandings of it are indeed OK. Of course, you can also come up with an alist of symbols and text to which I would be strongly opposed. So assuming you (can) do it right, then yes, that's of course perfectly acceptable; but details matter. > (setq url-http-codes '((206 (url-http-code-partial-content "Partial content")))) `setq' doesn't sound right, for example, Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text 2010-03-22 14:01 ` Stefan Monnier @ 2010-03-22 14:25 ` Ted Zlatanov 2010-03-22 17:06 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-22 14:25 UTC (permalink / raw) To: emacs-devel On Mon, 22 Mar 2010 10:01:24 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> Would it be OK to just make an alist with symbols and text, e.g. SM> The question is sufficiently general that some of the possible SM> understandings of it are indeed OK. Of course, you can also come up SM> with an alist of symbols and text to which I would be strongly opposed. SM> So assuming you (can) do it right, then yes, that's of course perfectly SM> acceptable; but details matter. I am not sure what you're saying. Is the format I showed, which is sufficient for the url-http.el needs, OK or should I use a plist in the cdr? Or do you mean I should put the list together and then we can massage it? >> (setq url-http-codes '((206 (url-http-code-partial-content "Partial content")))) SM> `setq' doesn't sound right, for example, I was just showing the format. It would be a proper defvar of course. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
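The difference between showing the format with `setq' and declaring it with a proper `defvar' can be sketched as follows. This is only an illustration under stated assumptions: `my-http-codes' is a hypothetical name, the entries use a flat (CODE SYMBOL TEXT) variant of the format, and only two codes are listed.

```elisp
;; Hypothetical variable; not part of url-http.el.
;; `defvar' takes the initial value first and the docstring second.
(defvar my-http-codes
  '((200 my-http-code-ok "OK")
    (206 my-http-code-partial-content "Partial content"))
  "Alist of (CODE SYMBOL TEXT) entries for HTTP status codes.")

;; Unlike `setq', a second `defvar' does not clobber an existing value,
;; so reloading the file keeps whatever the variable already holds.
(defvar my-http-codes nil
  "Alist of (CODE SYMBOL TEXT) entries for HTTP status codes.")

;; Lookup by numeric code, as in the `assq' call shown in the thread.
(assq 206 my-http-codes)
```

The `assq' lookup works here because the keys are small integers, which compare with `eq'.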
* Re: Translation of http status code to text 2010-03-22 14:25 ` Ted Zlatanov @ 2010-03-22 17:06 ` Ted Zlatanov 2010-03-22 17:55 ` Sven Joachim 2010-03-22 18:41 ` Translation of http status code to text Stefan Monnier 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-22 17:06 UTC (permalink / raw) To: emacs-devel Here's one version of the HTTP codes as an alist. I was thinking of also generating defconst calls based on this list, that's why I named everything "url-http-code-*". But maybe that's not necessary and accessor functions will be enough, so then we can s/url-http-code-// Ted === modified file 'lisp/url/url-http.el' --- lisp/url/url-http.el 2010-01-13 08:35:10 +0000 +++ lisp/url/url-http.el 2010-03-22 17:00:04 +0000 @@ -64,6 +64,56 @@ nil will explicitly close the connection to the server after every request.") +(defvar url-http-codes + "The HTTP return codes." + '( + (100 url-http-code-continue "Continue with request") + (101 url-http-code-switching-protocols "Switching protocols") + (102 url-http-code-processing "Processing (Added by DAV)") + (200 url-http-code-OK "OК") + (201 url-http-code-created "Created") + (202 url-http-code-accepted "Accepted") + (203 url-http-code-non-authoritative "Non-authoritative information") + (204 url-http-code-no-content "No content") + (205 url-http-code-reset-content "Reset content") + (206 url-http-code-partial-content "Partial content") + (207 url-http-code-multi-status "Multi-status (Added by DAV)") + (300 url-http-code-multiple-choices "Multiple choices") + (301 url-http-code-moved-permanently "Moved permanently") + (302 url-http-code-found "Found") + (303 url-http-code-see-other "See other") + (304 url-http-code-not-modified "Not modified") + (305 url-http-code-use-proxy "Use proxy") + (307 url-http-code-temporary-redirect "Temporary redirect") + (400 url-http-code-bad-request "Bad Request") + (401 url-http-code-unauthorized "Unauthorized") + (402 url-http-code-payment-required "Payment 
required") + (403 url-http-code-forbidden "Forbidden") + (404 url-http-code-not-found "Not found") + (405 url-http-code-method-not-allowed "Method not allowed") + (406 url-http-code-not-acceptable "Not acceptable") + (407 url-http-code-proxy-authentication-required "Proxy authentication required") + (408 url-http-code-request-timeout "Request time-out") + (409 url-http-code-conflict "Conflict") + (410 url-http-code-gone "Gone") + (411 url-http-code-length-required "Length required") + (412 url-http-code-precondition-failed "Precondition failed") + (413 url-http-code-request-entity-too-large "Request entity too large") + (414 url-http-code-request-uri-too-large "Request-URI too large") + (415 url-http-code-unsupported-media-type "Unsupported media type") + (416 url-http-code-requested-range-not-satisfiable "Requested range not satisfiable") + (417 url-http-code-expectation-failed "Expectation failed") + (422 url-http-code-unprocessable-entity "Unprocessable Entity (Added by DAV)") + (423 url-http-code-locked "Locked") + (424 url-http-code-failed-Dependency "Failed Dependency") + (500 url-http-code-internal-server-error "Internal server error") + (501 url-http-code-not-implemented "Not implemented") + (502 url-http-code-bad-gateway "Bad gateway") + (503 url-http-code-service-unavailable "Service unavailable") + (504 url-http-code-gateway-timeout "Gateway time-out") + (505 url-http-code-http-version-not-supported "HTTP version not supported") + (507 url-http-code-insufficient-storage "Insufficient storage"))) + ;(eval-when-compile ;; These are all macros so that they are hidden from external sight ;; when the file is byte-compiled. ^ permalink raw reply [flat|nested] 182+ messages in thread
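The accessor functions mentioned at the top of this message might look like the sketch below. All names here (`my-url-http-codes', `my-url-http-code-text', `my-url-http-code-symbol', `my-url-http-symbol-code') are hypothetical and only three entries are copied from the patch. Note that `defvar' and `defconst' expect the value before the docstring; the patch as posted has them the other way round, and the sketch uses the documented order.

```elisp
;; Hypothetical sample; the names are illustrative, not from url-http.el.
(defconst my-url-http-codes
  '((200 url-http-code-OK "OK")
    (404 url-http-code-not-found "Not found")
    (500 url-http-code-internal-server-error "Internal server error"))
  "Sample alist of (CODE SYMBOL TEXT) entries.")

(defun my-url-http-code-text (code)
  "Return the readable text for numeric CODE, or nil if unknown."
  (nth 2 (assq code my-url-http-codes)))

(defun my-url-http-code-symbol (code)
  "Return the symbol naming numeric CODE, or nil if unknown."
  (nth 1 (assq code my-url-http-codes)))

(defun my-url-http-symbol-code (symbol)
  "Return the numeric code whose entry names SYMBOL, or nil if unknown."
  (let ((entries my-url-http-codes)
        found)
    ;; Walk the alist until an entry with a matching symbol is found.
    (while (and entries (not found))
      (when (eq (nth 1 (car entries)) symbol)
        (setq found (car (car entries))))
      (setq entries (cdr entries)))
    found))
```

With accessors like these, callers never touch the list layout directly, so the "url-http-code-" prefix on the symbols could later be dropped without changing call sites that look up by number.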
* Re: Translation of http status code to text
From: Sven Joachim @ 2010-03-22 17:55 UTC
  To: Ted Zlatanov; +Cc: emacs-devel

On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:

> + (200 url-http-code-OK "OК")
                            ^
I think this is not quite OK. ;-)

Sven
* Re: Translation of http status code to text
From: Ted Zlatanov @ 2010-03-22 19:23 UTC
  To: emacs-devel

On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote:

SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>> + (200 url-http-code-OK "OК")
SJ>                          ^
SJ> I think this is not quite OK. ;-)

Sorry, can you elaborate?

Ted
* Re: Translation of http status code to text
From: Sven Joachim @ 2010-03-22 20:32 UTC
  To: Ted Zlatanov; +Cc: emacs-devel

On 2010-03-22 20:23 +0100, Ted Zlatanov wrote:

> On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote:
>
> SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>>> + (200 url-http-code-OK "OК")
> SJ>                          ^
> SJ> I think this is not quite OK. ;-)
>
> Sorry, can you elaborate?

Press C-u C-x = over the К in "OК" and notice how this differs from "K".

Sven
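The same check that C-u C-x = performs by eye can be done programmatically. The following is a minimal sketch, assuming a hypothetical helper named `my-nonascii-positions'; it uses the `[[:nonascii:]]' character class to report where a string leaves ASCII.

```elisp
(defun my-nonascii-positions (string)
  "Return a list of (INDEX . CHAR) for each non-ASCII character in STRING."
  (let ((index 0)
        result)
    ;; Search repeatedly, restarting just after the previous match.
    (while (string-match "[[:nonascii:]]" string index)
      (push (cons (match-beginning 0)
                  (aref string (match-beginning 0)))
            result)
      (setq index (match-end 0)))
    (nreverse result)))

;; "\u041a" is CYRILLIC CAPITAL LETTER KA, the look-alike from the patch.
(my-nonascii-positions "O\u041a")
;; A plain ASCII "OK" yields an empty list.
(my-nonascii-positions "OK")
```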
* Re: Translation of http status code to text
From: Ted Zlatanov @ 2010-03-22 21:31 UTC
  To: emacs-devel

On Mon, 22 Mar 2010 21:32:53 +0100 Sven Joachim <svenjoac@gmx.de> wrote:

SJ> On 2010-03-22 20:23 +0100, Ted Zlatanov wrote:
>> On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote:
>>
SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>>>> + (200 url-http-code-OK "OК")
SJ>                            ^
SJ> I think this is not quite OK. ;-)
>>
>> Sorry, can you elaborate?

SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K".

Thanks for noticing that.  I had the cyrillic-translit input method on
accidentally.  It would be nice if I could highlight non-ASCII
characters in a subtle way to avoid this, is there a relevant face?

Thanks
Ted
* Re: Translation of http status code to text
From: Juri Linkov @ 2010-03-23 9:55 UTC
  To: Ted Zlatanov; +Cc: emacs-devel

>>>>> + (200 url-http-code-OK "OК")
> SJ>                            ^
> SJ> I think this is not quite OK. ;-)
>>>
>>> Sorry, can you elaborate?
>
> SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K".
>
> Thanks for noticing that.  I had the cyrillic-translit input method on
> accidentally.  It would be nice if I could highlight non-ASCII
> characters in a subtle way to avoid this, is there a relevant face?

Maybe `escape-glyph' is a relevant face?

-- 
Juri Linkov
http://www.jurta.org/emacs/
* Re: Translation of http status code to text 2010-03-23 9:55 ` Juri Linkov @ 2010-03-23 13:08 ` Lennart Borgman 2010-03-23 14:26 ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov 2010-03-23 14:35 ` Translation of http status code to text Miles Bader 2010-03-23 14:22 ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov 1 sibling, 2 replies; 182+ messages in thread From: Lennart Borgman @ 2010-03-23 13:08 UTC (permalink / raw) To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel [-- Attachment #1: Type: text/plain, Size: 760 bytes --] 2010/3/23 Juri Linkov <juri@jurta.org>: >>>>>> + (200 url-http-code-OK "OК") >> SJ> ^ >> SJ> I think this is not quite OK. ;-) >>>> >>>> Sorry, can you elaborate? >> >> SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K". >> >> Thanks for noticing that. I had the cyrillic-translit input method on >> accidentally. It would be nice if I could highlight non-ASCII >> characters in a subtle way to avoid this, is there a relevant face? > > Maybe `escape-glyph' is a relevant face? I just wrote a little mis-named library for marking non-ascii characters. I have attached it if someone want to use it (or convert it to something more flexible and less mis-named). 
[-- Attachment #2: markchars.el --] [-- Type: text/plain, Size: 2176 bytes --] ;;; markchars.el --- Mark characters fitting certain characteristics ;; ;; Author: Lennart Borgman (lennart O borgman A gmail O com) ;; Created: 2010-03-22 Mon ;; Version: ;; Last-Updated: ;; URL: ;; Keywords: ;; Compatibility: ;; ;; Features that might be required by this library: ;; ;; None ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Commentary: ;; ;; ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Change log: ;; ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; This program is free software; you can redistribute it and/or ;; modify it under the terms of the GNU General Public License as ;; published by the Free Software Foundation; either version 3, or ;; (at your option) any later version. ;; ;; This program is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;; General Public License for more details. ;; ;; You should have received a copy of the GNU General Public License ;; along with this program; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth ;; Floor, Boston, MA 02110-1301, USA. ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Code: ;; (re-search-forward "[[:nonascii:]]") ;; äåö (defgroup markchars nil "Customization group for `markchars-mode'." :group 'convenience) (defface markchars-face '((t (:underline "light blue"))) "Face for `markchars-mode' char marking." :group 'markchars) (defconst markchars-keywords `( ("[[:nonascii:]]" (0 (put-text-property (match-beginning 0) (match-end 0) 'face 'markchars-face))) )) (define-minor-mode markchars-mode "Color up non-ascii chars." 
:group 'markchars (if markchars-mode (font-lock-add-keywords nil markchars-keywords) (font-lock-remove-keywords nil markchars-keywords)) (font-lock-fontify-buffer)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; markchars.el ends here ^ permalink raw reply [flat|nested] 182+ messages in thread
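The attached library sets the face by calling `put-text-property' from inside the font-lock keyword. A more conventional alternative, sketched here under hypothetical `my-markchars-*' names, is to let font-lock apply the face itself through a (SUBEXP FACENAME OVERRIDE) highlighter. This is an illustration of that font-lock convention, not the attachment's actual code.

```elisp
(defface my-markchars-face
  '((t (:underline "light blue")))
  "Face used by `my-markchars-mode' to mark non-ASCII characters.")

(defconst my-markchars-keywords
  ;; (MATCHER SUBEXP FACENAME OVERRIDE): highlight the whole match (0)
  ;; and `prepend' the face so existing fontification is kept.
  '(("[[:nonascii:]]" 0 'my-markchars-face prepend))
  "Font-lock keywords marking non-ASCII characters.")

(define-minor-mode my-markchars-mode
  "Underline non-ASCII characters in the current buffer."
  :lighter " mc"
  (if my-markchars-mode
      (font-lock-add-keywords nil my-markchars-keywords)
    (font-lock-remove-keywords nil my-markchars-keywords))
  ;; Refontify; `font-lock-flush' only exists in newer Emacs versions.
  (if (fboundp 'font-lock-flush)
      (font-lock-flush)
    (font-lock-fontify-buffer)))
```

Letting font-lock own the face means turning the mode off and refontifying removes the underline again, which the direct `put-text-property' call does not guarantee.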
* face for non-ASCII characters (was: Translation of http status code to text) 2010-03-23 13:08 ` Lennart Borgman @ 2010-03-23 14:26 ` Ted Zlatanov 2010-03-23 16:28 ` Lennart Borgman 2010-03-23 19:40 ` Florian Beck 2010-03-23 14:35 ` Translation of http status code to text Miles Bader 1 sibling, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-23 14:26 UTC (permalink / raw) To: emacs-devel On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: ... LB> (defface markchars-face LB> '((t (:underline "light blue"))) LB> "Face for `markchars-mode' char marking." LB> :group 'markchars) LB> (defconst markchars-keywords LB> `( LB> ("[[:nonascii:]]" LB> (0 (put-text-property (match-beginning 0) (match-end 0) LB> 'face 'markchars-face))) LB> )) This is what I was thinking of, and the effect should be subtle. The light blue underline is pretty good; it would be nice if it could be a choice between a few similarly well-chosen defaults (I'm hardly the one to provide them, though). I would make this an Emacs first-class option rather than an optional library. I think it's really useful. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters (was: Translation of http status code to text) 2010-03-23 14:26 ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov @ 2010-03-23 16:28 ` Lennart Borgman 2010-03-23 18:18 ` face for non-ASCII characters Ted Zlatanov 2010-03-23 19:40 ` Florian Beck 1 sibling, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-23 16:28 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 991 bytes --] 2010/3/23 Ted Zlatanov <tzz@lifelogs.com>: > On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > ... > LB> (defface markchars-face > LB> '((t (:underline "light blue"))) > LB> "Face for `markchars-mode' char marking." > LB> :group 'markchars) > > LB> (defconst markchars-keywords > LB> `( > LB> ("[[:nonascii:]]" > LB> (0 (put-text-property (match-beginning 0) (match-end 0) > LB> 'face 'markchars-face))) > LB> )) > > This is what I was thinking of, and the effect should be subtle. The > light blue underline is pretty good; it would be nice if it could be a > choice between a few similarly well-chosen defaults (I'm hardly the one > to provide them, though). > > I would make this an Emacs first-class option rather than an optional > library. I think it's really useful. Ok, since you found it useful I tried to finish it. See the attachment. [-- Attachment #2: markchars.el --] [-- Type: text/plain, Size: 3294 bytes --] ;;; markchars.el --- Mark chars fitting certain characteristics ;; ;; Author: Lennart Borgman (lennart O borgman A gmail O com) ;; Created: 2010-03-22 Mon ;; Version: ;; Last-Updated: ;; URL: ;; Keywords: ;; Compatibility: ;; ;; Features that might be required by this library: ;; ;; None ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Commentary: ;; ;; Mark special chars, by default nonascii chars. See `markchars-mode'. 
;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Change log: ;; ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; This program is free software; you can redistribute it and/or ;; modify it under the terms of the GNU General Public License as ;; published by the Free Software Foundation; either version 3, or ;; (at your option) any later version. ;; ;; This program is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;; General Public License for more details. ;; ;; You should have received a copy of the GNU General Public License ;; along with this program; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth ;; Floor, Boston, MA 02110-1301, USA. ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;;; Code: ;; (re-search-forward "[[:nonascii:]]") ;; äåö ;;;###autoload (defgroup markchars nil "Customization group for `markchars-mode'." :group 'convenience) (defface markchars-light '((t (:underline "light blue"))) "Light face for `markchars-mode' char marking." :group 'markchars) (defface markchars-heavy '((t (:underline "magenta"))) "Heavy face for `markchars-mode' char marking." :group 'markchars) (defcustom markchars-face 'markchars-heavy "Pointer to face used for marking chars." :type 'face :group 'markchars) (defcustom markchars-pattern "[[:nonascii:]]" "Regexp for characters to mark. This is supposed to match single characters, but you can of course \(mis)use it anyway you want. By default it matches nonascii-chars." :type 'regexp :group 'markchars) (defvar markchars-keywords nil "Keywords for font lock.") (defun markchars-set-keywords () "Set `markchars-keywords' from options." 
(set (make-local-variable 'markchars-keywords) (list (list markchars-pattern (list 0 '(put-text-property (match-beginning 0) (match-end 0) 'face markchars-face)))))) ;;;###autoload (define-minor-mode markchars-mode "Mark special characters. Which characters to mark are defined by `markchars-pattern'. The default is to mark nonascii chars with a magenta underline." :group 'markchars (if markchars-mode (progn (markchars-set-keywords) (font-lock-add-keywords nil markchars-keywords)) (font-lock-remove-keywords nil markchars-keywords)) (font-lock-fontify-buffer)) ;;;###autoload (define-globalized-minor-mode markchars-global-mode markchars-mode (lambda () (markchars-mode 1)) :group 'markchars) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; markchars.el ends here ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2010-03-23 16:28 ` Lennart Borgman @ 2010-03-23 18:18 ` Ted Zlatanov 2011-04-15 22:41 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-23 18:18 UTC (permalink / raw) To: emacs-devel On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2010/3/23 Ted Zlatanov <tzz@lifelogs.com>: >> I would make this an Emacs first-class option rather than an optional >> library. I think it's really useful. LB> Ok, since you found it useful I tried to finish it. See the attachment. Stefan, Yidong, what do you think about something like markchars.el becoming part of Emacs, so users can turn it on with a global customization? I don't think it has to be a minor mode necessarily but I'm no expert in that area. If it goes in there are several tasks: picking a variable name, picking suitable default faces, (maybe) adding a menu item, and documenting the new behavior. I would really like to make it easily available to Emacs users and will help with these tasks. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2010-03-23 18:18 ` face for non-ASCII characters Ted Zlatanov @ 2011-04-15 22:41 ` Ted Zlatanov 2011-04-15 23:07 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-15 22:41 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2147 bytes --] On Tue, 23 Mar 2010 13:18:05 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: TZ> On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> Ok, since you found it useful I tried to finish it. See the attachment. TZ> Stefan, Yidong, what do you think about something like markchars.el TZ> becoming part of Emacs, so users can turn it on with a global TZ> customization? I don't think it has to be a minor mode necessarily but TZ> I'm no expert in that area. TZ> If it goes in there are several tasks: picking a variable name, picking TZ> suitable default faces, (maybe) adding a menu item, and documenting the TZ> new behavior. I would really like to make it easily available to Emacs TZ> users and will help with these tasks. I worked on markchars.el some more (over 1 year later, heh). The result is attached; the commentary explains what it will do: ;;; Commentary: ;; ;; Mark special chars, by default nonascii chars, in modes where they ;; may be confused with regular chars. See `markchars-mode' and ;; `markchars-what'. There are two modes: confusable detection (where ;; we look for mixed scripts within a word, without using the ;; http://www.unicode.org/reports/tr39/ confusable tables) and pattern ;; detection (where any regular expressions can be matched). ;; ;; The marked text will have the 'markchars property set to either ;; 'confusable or 'pattern and the face set to either ;; `markchars-face-confusable' or `markchars-face-pattern' ;; respectively. 
So basically you can choose what to highlight; if you choose to highlight confusable characters, any strings whose characters have mixed scripts according to `char-script-table' will be highlighted. I didn't use the actual confusables.txt table from http://www.unicode.org/reports/tr39/ because it's huge and probably not very useful; the goal is to show suspicious characters and within individual scripts it's rare to find confusable characters. It's pretty easy to add more markchar properties and to make the confusable detection smarter. Let me know if you find this useful; I will propose it for the GNU ELPA if so. Thanks Ted [-- Attachment #2: markchars.el --] [-- Type: application/emacs-lisp, Size: 6397 bytes --] ^ permalink raw reply [flat|nested] 182+ messages in thread
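The mixed-script idea described above can be sketched in a few lines. This is not the code from the attachment (which is not shown in the archive); `my-string-scripts' and `my-mixed-script-p' are hypothetical helpers that only illustrate a lookup in `char-script-table'.

```elisp
(defun my-string-scripts (string)
  "Return the list of distinct scripts used by the characters of STRING."
  (let (scripts)
    (dolist (char (string-to-list string))
      ;; `char-script-table' maps each character to a script symbol.
      (let ((script (aref char-script-table char)))
        (when (and script (not (memq script scripts)))
          (push script scripts))))
    (nreverse scripts)))

(defun my-mixed-script-p (string)
  "Return non-nil if STRING mixes characters from more than one script."
  (> (length (my-string-scripts string)) 1))

;; Latin "O" followed by CYRILLIC CAPITAL LETTER KA mixes two scripts.
(my-mixed-script-p "O\u041a")
;; Plain Latin "OK" does not.
(my-mixed-script-p "OK")
```

A real implementation would apply this per word in the buffer and would probably need to ignore script-neutral characters such as punctuation; that refinement is left out of the sketch.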
* Re: face for non-ASCII characters 2011-04-15 22:41 ` Ted Zlatanov @ 2011-04-15 23:07 ` Lennart Borgman 2011-04-16 0:51 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-15 23:07 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > On Tue, 23 Mar 2010 13:18:05 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: > > TZ> On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: > LB> Ok, since you found it useful I tried to finish it. See the attachment. > > TZ> Stefan, Yidong, what do you think about something like markchars.el > TZ> becoming part of Emacs, so users can turn it on with a global > TZ> customization? I don't think it has to be a minor mode necessarily but > TZ> I'm no expert in that area. > > TZ> If it goes in there are several tasks: picking a variable name, picking > TZ> suitable default faces, (maybe) adding a menu item, and documenting the > TZ> new behavior. I would really like to make it easily available to Emacs > TZ> users and will help with these tasks. > > I worked on markchars.el some more (over 1 year later, heh). The result > is attached; the commentary explains what it will do: > > ;;; Commentary: > ;; > ;; Mark special chars, by default nonascii chars, in modes where they > ;; may be confused with regular chars. See `markchars-mode' and > ;; `markchars-what'. There are two modes: confusable detection (where > ;; we look for mixed scripts within a word, without using the > ;; http://www.unicode.org/reports/tr39/ confusable tables) and pattern > ;; detection (where any regular expressions can be matched). > ;; > ;; The marked text will have the 'markchars property set to either > ;; 'confusable or 'pattern and the face set to either > ;; `markchars-face-confusable' or `markchars-face-pattern' > ;; respectively. 
> > So basically you can choose what to highlight; if you choose to > highlight confusable characters, any strings whose characters have mixed > scripts according to `char-script-table' will be highlighted. I didn't > use the actual confusables.txt table from > http://www.unicode.org/reports/tr39/ because it's huge and probably not > very useful; the goal is to show suspicious characters and within > individual scripts it's rare to find confusable characters. > > It's pretty easy to add more markchar properties and to make the > confusable detection smarter. Let me know if you find this useful; I > will propose it for the GNU ELPA if so. > > Thanks > Ted Nice to see you are enhancing it, Ted. However I wonder if you are working on an older copy of it since it does not use idn.el. Could you please take a look at the latest version and see how idn-is-recommended compares to what you call confusables? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters
From: Ted Zlatanov @ 2011-04-16 0:51 UTC
  To: emacs-devel

On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:

LB> Nice to see you are enhancing it, Ted. However I wonder if you are
LB> working on an older copy of it since it does not use idn.el. Could you
LB> please take a look at the latest version and see how
LB> idn-is-recommended compares to what you call confusables?

Where is the latest version?  I didn't see any further messages from you
in that thread after 2010-03 so I didn't know you had updated it.

Ted
* Re: face for non-ASCII characters
From: Lennart Borgman @ 2011-04-16 9:10 UTC
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> Nice to see you are enhancing it, Ted. However I wonder if you are
> LB> working on an older copy of it since it does not use idn.el. Could you
> LB> please take a look at the latest version and see how
> LB> idn-is-recommended compares to what you call confusables?
>
> Where is the latest version?  I didn't see any further messages from you
> in that thread after 2010-03 so I didn't know you had updated it.

Oh, I am very sorry Ted. I have put almost every elisp library I have
written into nXhtml. You can find it in the nXhtml repository at
Launchpad.
* Re: face for non-ASCII characters 2011-04-16 9:10 ` Lennart Borgman @ 2011-04-16 15:05 ` Ted Zlatanov 2011-04-16 15:28 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 15:05 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2653 bytes --] On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: >> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> LB> Nice to see you are enhancing it, Ted. However I wonder if you are LB> working on an older copy of it since it does not use idn.el. Could you LB> please take a look at the latest version and see how LB> idn-is-recommended compares to what you call confusables? >> >> Where is the latest version? I didn't see any further messages from you >> in that thread after 2010-03 so I didn't know you had updated it. LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have LB> written into nXhtml. So you find it in the nXhtml repository at LB> Launchpad. I merged your changes with my version and called myself a "contrbuthor" :) I'd like to keep markchars.el a standalone library, so the attached does not require idn.el. I also set the version to 0.2. I would like to put it in the GNU ELPA, if you don't mind (it can still live in nXhtml, we can mirror it). You'll need to assign the copyright, though. The major change is that instead of detecting the range at the font-lock keyword level, I run non-IDN detection at the word markup level (just like confusables detection). I think that results in cleaner, easily extensible code--take a look and see what you think. For an IDN markup face I defined a new one. Your call on what it should be, I just set it to a white underline for now. 
This is IMO a good change: (make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2") because you had `markchars-keywords' and `markchars-used-keywords' which was confusing. `markchars--render-nonidn' is not optimized: it steps through the word in the buffer and assigns the properties to each individual character instead of each range it finds. I don't think that's a big deal but it could be done better. I couldn't reuse your non-IDN detection logic because it was not word-oriented. I would use a char-table for idn.el instead of a bool-vector. Also perhaps idn.el's .txt files and confusables.txt should simply be part of Emacs, so the IDN and confusables properties can be looked up like the other properties. Emacs already does that for many properties, see for example: (format "%S" (mapcar 'car char-code-property-alist)) (get-char-code-property ?q 'titlecase) I think that inclusion would benefit everyone, but the original .txt files are large so I'll leave it up to the experts. If they are included, `markchars--render-nonidn' would be much much smaller. Ted [-- Attachment #2: markchars.el --] [-- Type: application/emacs-lisp, Size: 7721 bytes --] ^ permalink raw reply [flat|nested] 182+ messages in thread
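The char-table suggestion can be illustrated with a small sketch. The table below is a made-up stand-in: `my-allowed-chars' and `my-char-allowed-p' are hypothetical names, and the three ranges are placeholders, not the real IDN data from idn.el's .txt files.

```elisp
;; Hypothetical table; the ranges are illustrative placeholders only.
(defvar my-allowed-chars (make-char-table 'my-allowed-chars)
  "Char-table whose value is t for characters considered allowed.")

;; A char-table stores whole ranges compactly, so large Unicode blocks
;; cost little even though the table is indexed by character code.
(set-char-table-range my-allowed-chars '(?a . ?z) t)
(set-char-table-range my-allowed-chars '(?0 . ?9) t)
(set-char-table-range my-allowed-chars ?- t)

(defun my-char-allowed-p (char)
  "Return non-nil if CHAR is marked as allowed in `my-allowed-chars'."
  (aref my-allowed-chars char))

(my-char-allowed-p ?q)
(my-char-allowed-p #x41a)
```

Compared with a bool-vector sized for every code point, a lookup like this needs no bounds handling and can later hold richer values than t and nil.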
* Re: face for non-ASCII characters 2011-04-16 15:05 ` Ted Zlatanov @ 2011-04-16 15:28 ` Lennart Borgman 2011-04-16 15:42 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 15:28 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: >>> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >>> > LB> Nice to see you are enhancing it, Ted. However I wonder if you are > LB> working on an older copy of it since it does not use idn.el. Could you > LB> please take a look at the latest version and see how > LB> idn-is-recommended compares to what you call confusables? >>> >>> Where is the latest version? I didn't see any further messages from you >>> in that thread after 2010-03 so I didn't know you had updated it. > > LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have > LB> written into nXhtml. So you find it in the nXhtml repository at > LB> Launchpad. > > I merged your changes with my version and called myself a "contrbuthor" :) :-) > I'd like to keep markchars.el a standalone library, so the attached does > not require idn.el. Ok. Do I understand it correctly that you use idn.el if it is available, otherwise not? > can mirror it). You'll need to assign the copyright, though. I have of course signed papers for Emacs. > The major change is that instead of detecting the range at the font-lock > keyword level, I run non-IDN detection at the word markup level (just > like confusables detection). I think that results in cleaner, easily > extensible code--take a look and see what you think. I have no special opinion about it. Just use what is easiest to maintain and understand. I guess it works the same? > For an IDN markup face I defined a new one. 
> Your call on what it should be, I just set it to a white underline for now.

Thanks. I have no special opinion on that.

> This is IMO a good change:
>
> (make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")
>
> because you had `markchars-keywords' and `markchars-used-keywords' which
> was confusing.

Ok.

> `markchars--render-nonidn' is not optimized: it steps through the word
> in the buffer and assigns the properties to each individual character
> instead of each range it finds. I don't think that's a big deal but it
> could be done better. I couldn't reuse your non-IDN detection logic
> because it was not word-oriented.

Please suggest a better interface from idn.el if possible to fix this.

> I would use a char-table for idn.el instead of a bool-vector. Also
> perhaps idn.el's .txt files and confusables.txt should simply be part of
> Emacs, so the IDN and confusables properties can be looked up like the
> other properties. Emacs already does that for many properties, see for
> example:
>
> (format "%S" (mapcar 'car char-code-property-alist))
> (get-char-code-property ?q 'titlecase)
>
> I think that inclusion would benefit everyone, but the original .txt
> files are large so I'll leave it up to the experts.

That was why I used a boolean vector. But I am not an expert on this.

> If they are
> included, `markchars--render-nonidn' would be much much smaller.

I do not understand what you mean here.

> Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 15:28 ` Lennart Borgman @ 2011-04-16 15:42 ` Ted Zlatanov 2011-04-16 15:50 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 15:42 UTC (permalink / raw) To: emacs-devel On Sat, 16 Apr 2011 17:28:48 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> I'd like to keep markchars.el a standalone library, so the attached does >> not require idn.el. LB> Ok. Do I understand it correctly that you use idn.el if it is LB> available, otherwise not? Yes, `idn-is-recommended' is autoloaded. It may be a problem with the fboundp call, which will return true even if the library is not available: (when (autoload 'coocoo "coo") (fboundp 'coocoo)) => t so maybe (require 'idn nil t) is a better approach, and we can define a stub replacement for `idn-is-recommended'. I didn't dig too much into it because of the possibility of inclusion I mentioned. >> The major change is that instead of detecting the range at the font-lock >> keyword level, I run non-IDN detection at the word markup level (just >> like confusables detection). I think that results in cleaner, easily >> extensible code--take a look and see what you think. LB> I have no special opinion about it. Just use what is easiest to LB> maintain and understand. I guess it works the same? Effectively it's the same, yes. >> `markchars--render-nonidn' is not optimized: it steps through the word >> in the buffer and assigns the properties to each individual character >> instead of each range it finds. I don't think that's a big deal but it >> could be done better. I couldn't reuse your non-IDN detection logic >> because it was not word-oriented. LB> Please suggest a better inferface from idn.el if possible to fix LB> this. This is the stepping algorithm at the buffer level, idn.el can't help. >> I would use a char-table for idn.el instead of a bool-vector. 
Also >> perhaps idn.el's .txt files and confusables.txt should simply be part of >> Emacs, so the IDN and confusables properties can be looked up like the >> other properties. Emacs already does that for many properties, see for >> example: >> >> (format "%S" (mapcar 'car char-code-property-alist)) >> (get-char-code-property ?q 'titlecase) >> >> I think that inclusion would benefit everyone, but the original .txt >> files are large so I'll leave it up to the experts. LB> That was why I used a boolean vector. But I am not an expert on LB> this. The char-table is meant for character lookups, that's why I suggested it instead of a bool-vector. >> If they are >> included, `markchars--render-nonidn' would be much much smaller. LB> I do not understand what you mean here. The function wouldn't need to assign a special property, the character would already have it. And the highlighting could just look for that special property instead of hunting for it with a function. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
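A minimal sketch of the char-table idea from the message above. The name `my-idn-table' and the ranges are hypothetical, chosen only for illustration; idn.el's real data and functions are not shown in this thread.

```elisp
;; Hypothetical illustration: `my-idn-table' and the ranges below are
;; assumptions, not idn.el's actual data.
(defvar my-idn-table (make-char-table 'my-idn-table nil)
  "Char-table whose entry is t for characters treated as recommended.")

;; Mark whole ranges with one call each; unmarked characters stay nil.
(set-char-table-range my-idn-table '(?a . ?z) t)
(set-char-table-range my-idn-table '(?0 . ?9) t)

(defun my-idn-recommended-p (char)
  "Return non-nil if CHAR is marked in `my-idn-table'."
  (aref my-idn-table char))

(my-idn-recommended-p ?q)     ; => t   (inside the a-z range)
(my-idn-recommended-p #x430)  ; => nil (CYRILLIC SMALL LETTER A, unmarked)
```

A char-table is indexed directly by character code, so a lookup is a plain `aref', and ranges are stored without listing every code point.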
* Re: face for non-ASCII characters 2011-04-16 15:42 ` Ted Zlatanov @ 2011-04-16 15:50 ` Lennart Borgman 2011-04-16 15:57 ` Ted Zlatanov 2011-04-16 16:00 ` face for non-ASCII characters Drew Adams 0 siblings, 2 replies; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 15:50 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > On Sat, 16 Apr 2011 17:28:48 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > >>> I'd like to keep markchars.el a standalone library, so the attached does >>> not require idn.el. > > LB> Ok. Do I understand it correctly that you use idn.el if it is > LB> available, otherwise not? > > Yes, `idn-is-recommended' is autoloaded. It may be a problem with the > fboundp call, which will return true even if the library is not > available: > > (when (autoload 'coocoo "coo") > (fboundp 'coocoo)) > > => t This is not a problem if autoload is setup correctly. > so maybe (require 'idn nil t) is a better approach, and we can define a > stub replacement for `idn-is-recommended'. I didn't dig too much into > it because of the possibility of inclusion I mentioned. I think autoloading and checking if the function is defined is the best way. (Autoloading of `idn-is-recommended' should of course not be setup if idn.el is not available.) > Effectively it's the same, yes. Good. >>> `markchars--render-nonidn' is not optimized: it steps through the word >>> in the buffer and assigns the properties to each individual character >>> instead of each range it finds. I don't think that's a big deal but it >>> could be done better. I couldn't reuse your non-IDN detection logic >>> because it was not word-oriented. > > LB> Please suggest a better inferface from idn.el if possible to fix > LB> this. > > This is the stepping algorithm at the buffer level, idn.el can't help. ... > The char-table is meant for character lookups, that's why I suggested > it instead of a bool-vector. 
So using a char-table instead in idn.el would make the interface better? (But we need someone good at this to tell if that is usable here performancewise.) > The function wouldn't need to assign a special property, the character > would already have it. And the highlighting could just look for that > special property instead of hunting for it with a function. I see, thanks. > Ted > > > ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 15:50 ` Lennart Borgman @ 2011-04-16 15:57 ` Ted Zlatanov 2011-04-16 16:01 ` Lennart Borgman 2011-04-16 16:00 ` face for non-ASCII characters Drew Adams 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 15:57 UTC (permalink / raw) To: emacs-devel On Sat, 16 Apr 2011 17:50:07 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> so maybe (require 'idn nil t) is a better approach, and we can define a >> stub replacement for `idn-is-recommended'. I didn't dig too much into >> it because of the possibility of inclusion I mentioned. LB> I think autoloading and checking if the function is defined is the LB> best way. (Autoloading of `idn-is-recommended' should of course not be LB> setup if idn.el is not available.) OK, but then this: (defcustom markchars-what `(markchars-simple-pattern markchars-confusables ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun))) ... will always add `markchars-nonidn-fun' to `markchars-what' and that will break users without idn.el. LB> So using a char-table instead in idn.el would make the interface LB> better? (But we need someone good at this to tell if that is usable LB> here performancewise.) I think it's slightly better for characters to use char-table instead of bool-vector. Semantically it's cleaner: you're using a char-table, which is meant for characters, instead of a bool-vector, which is for general data. But it's a really small difference so don't bother if you don't feel like it :) OK, so after we resolve the autoload issue above and find out if the confusables.txt and idn*.txt files can be turned into Emacs character properties, we can publish 0.2, right? Is there anything else it needs? Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 15:57 ` Ted Zlatanov @ 2011-04-16 16:01 ` Lennart Borgman 2011-04-16 16:13 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 16:01 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 17:50:07 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
>>> so maybe (require 'idn nil t) is a better approach, and we can define a
>>> stub replacement for `idn-is-recommended'. I didn't dig too much into
>>> it because of the possibility of inclusion I mentioned.
>
> LB> I think autoloading and checking if the function is defined is the
> LB> best way. (Autoloading of `idn-is-recommended' should of course not be
> LB> setup if idn.el is not available.)
>
> OK, but then this:
>
> (defcustom markchars-what
>   `(markchars-simple-pattern
>     markchars-confusables
>     ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
>   ...
>
> will always add `markchars-nonidn-fun' to `markchars-what' and that will
> break users without idn.el.

Why? (fboundp 'idn-is-recommended) only returns true if the function
is defined or autoloaded. If idn.el is not available the function
should not be autoloaded, or?

> LB> So using a char-table instead in idn.el would make the interface
> LB> better? (But we need someone good at this to tell if that is usable
> LB> here performancewise.)
>
> I think it's slightly better for characters to use char-table instead of
> bool-vector. Semantically it's cleaner: you're using a char-table,
> which is meant for characters, instead of a bool-vector, which is for
> general data. But it's a really small difference so don't bother if you
> don't feel like it :)

It sounds like a good idea, I am just waiting for some more info about
performance.
> OK, so after we resolve the autoload issue above and find out if the > confusables.txt and idn*.txt files can be turned into Emacs character > properties, we can publish 0.2, right? Is there anything else it needs? > > Ted > > > ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 16:01 ` Lennart Borgman @ 2011-04-16 16:13 ` Ted Zlatanov 2011-04-16 16:22 ` Lennart Borgman 2011-04-16 16:27 ` Drew Adams 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 16:13 UTC (permalink / raw) To: emacs-devel

On Sat, 16 Apr 2011 18:01:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:

LB> Why? (fboundp 'idn-is-recommended) only returns true if the function
LB> is defined or autoloaded. If idn.el is not available the function
LB> should not be autoloaded, or?

My example showed that `fboundp' returns t if the function is
autoloaded, even if it's not actually loaded yet.

(when (autoload 'coocoo1 "coo")
  (fboundp 'coocoo1))

=> t

(coocoo1)

=> Debugger entered--Lisp error: (file-error "Cannot open load file" "coo")

On Sat, 16 Apr 2011 09:00:08 -0700 "Drew Adams" <drew.adams@oracle.com> wrote:

DA> Why set up autoloading here? Seems like you would want to either soft-require
DA> the library (no error if not there) or simply test whether the function is
DA> defined (which presumably happens when the library is loaded).

DA> IOW, why don't you just choose between (require 'foo nil t) and (fboundp
DA> 'foobar)? What's the point of having this library autoload the other one but
DA> not soft-require it?

Lennart wants to enable its use in the defcustom conditionally, only if
idn.el is available, so things Just Work for the user:

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
  ...

I'm not sure what's the right approach, but IMO (require 'idn nil t) is
the simplest solution.

Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
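The `fboundp' behaviour described in the message above can be checked with a small sketch. The names `my-coocoo', "my-coo-missing" and `my-function-really-loaded-p' are made up for illustration, the file is assumed not to exist on `load-path', and `autoloadp' is assumed to be available (Emacs 24.3 or later).

```elisp
;; `my-coocoo' and "my-coo-missing" are hypothetical names; the file is
;; assumed not to exist anywhere on `load-path'.
(autoload 'my-coocoo "my-coo-missing")

(fboundp 'my-coocoo)                      ; => t, although nothing is loaded
(autoloadp (symbol-function 'my-coocoo))  ; => t, only an autoload stub
(require 'my-coo-missing nil t)           ; => nil, soft-require finds no file

(defun my-function-really-loaded-p (symbol)
  "Return non-nil if SYMBOL is fbound and not merely an autoload stub."
  (and (fboundp symbol)
       (not (autoloadp (symbol-function symbol)))))

(my-function-really-loaded-p 'my-coocoo)  ; => nil
(my-function-really-loaded-p 'car)        ; => t
```

A soft `require' followed by a test like this tells a function that is actually loaded apart from one that is only registered for autoloading.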
* Re: face for non-ASCII characters 2011-04-16 16:13 ` Ted Zlatanov @ 2011-04-16 16:22 ` Lennart Borgman 2011-04-16 16:27 ` Drew Adams 1 sibling, 0 replies; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 16:22 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > On Sat, 16 Apr 2011 18:01:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> Why? (fboundp 'idn-is-recommended) only returns true if the function > LB> is defined or autoloaded. If idn.el is not available the function > LB> should not be autoloaded, or? > > My example showed that `fboundp' returns t if the function is > autoloaded, even if it's not actually loaded yet. > > (when (autoload 'coocoo1 "coo") > (fboundp 'coocoo1)) > > => t > > (coocoo1) > > => Debugger entered--Lisp error: (file-error "Cannot open load file" "coo") Yes, of course. But coocoo1 should not be autoloaded if coo.el is not available, or? > Lennart wants to conditionally, if idn.el is available, to enable its > use in the defcustom so things Just Work for the user: > > (defcustom markchars-what > `(markchars-simple-pattern > markchars-confusables > ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun))) > ... > > I'm not sure what's the right approach, but IMO (require 'idn nil t) is > the simplest solution. Maybe I am beginning to understand. I suppose you want idn.el to be available even if autoloading is not used for `idn-is-recommended'? In that case this could perhaps work: (unless (fboundp 'idn-is-recommended) (require 'idn nil t)) > Ted > > > ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: face for non-ASCII characters 2011-04-16 16:13 ` Ted Zlatanov 2011-04-16 16:22 ` Lennart Borgman @ 2011-04-16 16:27 ` Drew Adams 2011-04-16 16:45 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Drew Adams @ 2011-04-16 16:27 UTC (permalink / raw) To: 'Ted Zlatanov', emacs-devel > DA> Why set up autoloading here? Seems like you would want > DA> to either soft-require the library (no error if not there) > DA> or simply test whether the function is defined (which > DA> presumably happens when the library is loaded). > DA> > DA> IOW, why don't you just choose between (require 'foo nil > DA> t) and (fboundp 'foobar)? What's the point of having > DA> this library autoload the other one but not soft-require it? > > Lennart wants to conditionally, if idn.el is available, to enable its > use in the defcustom so things Just Work for the user: > > (defcustom markchars-what > `(markchars-simple-pattern > markchars-confusables > ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun))) I understand that. But it seems to me that either we want to unconditionally load idn.el if available (in `load-path') or we do not want to do that. In the latter case, we are leaving it up to the user to load it if s?he wants. And in that case the proper test is simply `fboundp'. I still don't see the point of the middle ground you're scoping out, of autoloading but not soft-requiring. When this library (the other one, not idn.el) is loaded, do you want idn.el to have been loaded or not? If you do, then soft-require. If you want to leave that choice up to the user, then test with `fboundp'. What am I missing? > I'm not sure what's the right approach, but IMO (require 'idn > nil t) is the simplest solution. `fboundp' is just as simple. It's not about simple, I think. It's about deciding what's best for most users. Do we want this library to load idn.el whenever it is in the `load-path' or not? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 16:27 ` Drew Adams @ 2011-04-16 16:45 ` Ted Zlatanov 2011-04-16 16:48 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 16:45 UTC (permalink / raw) To: emacs-devel

Sorry if I confused you guys. I set it up to do

(require 'idn nil t)

and later test (fboundp 'idn-is-recommended), which I believe is the
right thing. Autoloading is not required and was confusing me. If the
user overrides the default and adds 'markchars-nonidn-fun to
`markchars-what' then they'll get an error. The change is trivial so I
won't repost the whole file.

Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 16:45 ` Ted Zlatanov @ 2011-04-16 16:48 ` Lennart Borgman 2011-04-16 16:55 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 16:48 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > Sorry if I confused you guys. I set it up to do > > (require 'idn nil t) > > and later test (fboundp 'idn-is-recommended) which I believe is the > right thing. autoloading is not required and was confusing me. If the > user overrides the default and adds 'markchars-nonidn-fun to > `markchars-what' then they'll get an error. The change is trivial so I > won't repost the whole file. If you do it this way and the function is autoloaded but not used you will load idn.el though it is not needed. Please see my last suggestion instead. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 16:48 ` Lennart Borgman @ 2011-04-16 16:55 ` Ted Zlatanov 2011-04-16 17:11 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-16 16:55 UTC (permalink / raw) To: emacs-devel On Sat, 16 Apr 2011 18:48:49 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: >> Sorry if I confused you guys. I set it up to do >> >> (require 'idn nil t) >> >> and later test (fboundp 'idn-is-recommended) which I believe is the >> right thing. autoloading is not required and was confusing me. If the >> user overrides the default and adds 'markchars-nonidn-fun to >> `markchars-what' then they'll get an error. The change is trivial so I >> won't repost the whole file. LB> If you do it this way and the function is autoloaded but not used you LB> will load idn.el though it is not needed. Please see my last LB> suggestion instead. Could you please do it the way you like and post your version 0.2? Even better, put it in nXhtml and we can later propose markchars.el for the GNU ELPA? Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 16:55 ` Ted Zlatanov @ 2011-04-16 17:11 ` Lennart Borgman 2011-04-18 15:48 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-16 17:11 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: > On Sat, 16 Apr 2011 18:48:49 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>: >>> Sorry if I confused you guys. I set it up to do >>> >>> (require 'idn nil t) >>> >>> and later test (fboundp 'idn-is-recommended) which I believe is the >>> right thing. autoloading is not required and was confusing me. If the >>> user overrides the default and adds 'markchars-nonidn-fun to >>> `markchars-what' then they'll get an error. The change is trivial so I >>> won't repost the whole file. > > LB> If you do it this way and the function is autoloaded but not used you > LB> will load idn.el though it is not needed. Please see my last > LB> suggestion instead. > > Could you please do it the way you like and post your version 0.2? Even > better, put it in nXhtml and we can later propose markchars.el for the > GNU ELPA? > > Thanks > Ted > > > ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-16 17:11 ` Lennart Borgman @ 2011-04-18 15:48 ` Ted Zlatanov 2011-04-18 15:53 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-18 15:48 UTC (permalink / raw) To: emacs-devel On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> Could you please do it the way you like and post your version 0.2? Even >> better, put it in nXhtml and we can later propose markchars.el for the >> GNU ELPA? (no message) Lennart, maybe you sent prematurely or something else went wrong? There was no new content in your message. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-18 15:48 ` Ted Zlatanov @ 2011-04-18 15:53 ` Lennart Borgman 2011-04-18 16:20 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-18 15:53 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>: > On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > >>> Could you please do it the way you like and post your version 0.2? Even >>> better, put it in nXhtml and we can later propose markchars.el for the >>> GNU ELPA? > > (no message) > > Lennart, maybe you sent prematurely or something else went wrong? There > was no new content in your message. Eh, yes. I was going to say that I will take your version (with my suggestion for loading) and put it in nXhtml repository. (Some troubles with my computers at the moment.) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-18 15:53 ` Lennart Borgman @ 2011-04-18 16:20 ` Ted Zlatanov 2011-04-18 17:03 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-18 16:20 UTC (permalink / raw) To: emacs-devel On Mon, 18 Apr 2011 17:53:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>: >> On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> >>>> Could you please do it the way you like and post your version 0.2? Even >>>> better, put it in nXhtml and we can later propose markchars.el for the >>>> GNU ELPA? >> >> (no message) >> >> Lennart, maybe you sent prematurely or something else went wrong? There >> was no new content in your message. LB> Eh, yes. I was going to say that I will take your version (with my LB> suggestion for loading) and put it in nXhtml repository. (Some LB> troubles with my computers at the moment.) Is it OK if I also propose it for the GNU ELPA? It doesn't depend on idn.el or anything else external, and I can set up a nightly job to mirror it from the nXhtml repository. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-18 16:20 ` Ted Zlatanov @ 2011-04-18 17:03 ` Lennart Borgman 2011-04-19 13:07 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-18 17:03 UTC (permalink / raw) To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel

2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
>
> LB> Eh, yes. I was going to say that I will take your version (with my
> LB> suggestion for loading) and put it in nXhtml repository. (Some
> LB> troubles with my computers at the moment.)
>
> Is it OK if I also propose it for the GNU ELPA? It doesn't depend on
> idn.el or anything else external, and I can set up a nightly job to
> mirror it from the nXhtml repository.

Could you please coordinate this with Reuben, who has had similar
questions and offered to help?

I think it would be best to keep nXhtml as one package at the moment
(and I think Reuben agrees with that). Later on I think it should be
split and more general things like this extracted to ELPA or Emacs.

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-18 17:03 ` Lennart Borgman @ 2011-04-19 13:07 ` Ted Zlatanov 2011-04-19 18:56 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-19 13:07 UTC (permalink / raw) To: emacs-devel On Mon, 18 Apr 2011 19:03:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>: >> LB> Eh, yes. I was going to say that I will take your version (with my LB> suggestion for loading) and put it in nXhtml repository. (Some LB> troubles with my computers at the moment.) >> >> Is it OK if I also propose it for the GNU ELPA? It doesn't depend on >> idn.el or anything else external, and I can set up a nightly job to >> mirror it from the nXhtml repository. LB> Could you please coordinate this with Reuben who have had similar LB> questions and offered to help? When? Where? You have to give me some idea of the questions Reuben asked and what he offered :) LB> I think the best would be to keep nXhtml as one package at the moment LB> (and I think Reuben agrees to that). Later on I think it should be LB> split and more general things like this extracted to ELPA or Emacs. That makes it hard for people like me, who don't use nXhtml, to use markchars.el (which as I mentioned is a standalone library). What would you suggest as a remedy instead of installing nXhtml? Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-19 13:07 ` Ted Zlatanov @ 2011-04-19 18:56 ` Lennart Borgman 2011-04-20 14:49 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-19 18:56 UTC (permalink / raw) To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel 2011/4/19 Ted Zlatanov <tzz@lifelogs.com>: > On Mon, 18 Apr 2011 19:03:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>: >>> > LB> Eh, yes. I was going to say that I will take your version (with my > LB> suggestion for loading) and put it in nXhtml repository. (Some > LB> troubles with my computers at the moment.) >>> >>> Is it OK if I also propose it for the GNU ELPA? It doesn't depend on >>> idn.el or anything else external, and I can set up a nightly job to >>> mirror it from the nXhtml repository. > > LB> Could you please coordinate this with Reuben who have had similar > LB> questions and offered to help? > > When? Where? You have to give me some idea of the questions Reuben > asked and what he offered :) He contacted me privately. Could you please ask him so I do not get it wrong? > LB> I think the best would be to keep nXhtml as one package at the moment > LB> (and I think Reuben agrees to that). Later on I think it should be > LB> split and more general things like this extracted to ELPA or Emacs. > > That makes it hard for people like me, who don't use nXhtml, to use > markchars.el (which as I mentioned is a standalone library). What would > you suggest as a remedy instead of installing nXhtml? I do not know how ELPA handles this. (But I do not think there is any problems installing all of nXhtml. Everything is autoloaded.) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-19 18:56 ` Lennart Borgman @ 2011-04-20 14:49 ` Ted Zlatanov 2011-04-20 21:38 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-20 14:49 UTC (permalink / raw) To: emacs-devel On Tue, 19 Apr 2011 20:56:41 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/19 Ted Zlatanov <tzz@lifelogs.com>: LB> I think the best would be to keep nXhtml as one package at the moment LB> (and I think Reuben agrees to that). Later on I think it should be LB> split and more general things like this extracted to ELPA or Emacs. >> >> That makes it hard for people like me, who don't use nXhtml, to use >> markchars.el (which as I mentioned is a standalone library). What would >> you suggest as a remedy instead of installing nXhtml? LB> I do not know how ELPA handles this. (But I do not think there is any LB> problems installing all of nXhtml. Everything is autoloaded.) All I want is an easy way to display suspicious characters (as I recall from when we started this discussion over a year ago). I have no problem that you're hosting markchars.el inside nXhtml and I can mirror just that one file into the GNU ELPA, but if nXhtml has to be *installed* in order to use markchars.el, I'll have to write my own version. I think installing a large, unrelated package is an unnecessary burden on those who just want to detect suspicious characters. Can you please confirm one way or the other so I know what I need to do, mirror or rewrite? Just to be clear, mirroring markchars.el does not require you to change anything. The GNU ELPA machine will pull the latest markchars.el down daily, that's all. Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-20 14:49 ` Ted Zlatanov @ 2011-04-20 21:38 ` Lennart Borgman 2011-04-21 17:35 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-20 21:38 UTC (permalink / raw) To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel

2011/4/20 Ted Zlatanov <tzz@lifelogs.com>:
>
> LB> I do not know how ELPA handles this. (But I do not think there is any
> LB> problems installing all of nXhtml. Everything is autoloaded.)
>
> All I want is an easy way to display suspicious characters (as I recall
> from when we started this discussion over a year ago). I have no
> problem that you're hosting markchars.el inside nXhtml and I can mirror
> just that one file into the GNU ELPA, but if nXhtml has to be
> *installed* in order to use markchars.el, I'll have to write my own
> version. I think installing a large, unrelated package is an
> unnecessary burden on those who just want to detect suspicious
> characters. Can you please confirm one way or the other so I know what
> I need to do, mirror or rewrite?
>
> Just to be clear, mirroring markchars.el does not require you to change
> anything. The GNU ELPA machine will pull the latest markchars.el down
> daily, that's all.

nXhtml is actually not much more monolithic than ELPA. You can use
just the basic libraries in nXhtml and get the rest downloaded
automatically when you try to access them. So it contains a kind of
"ELPA" inside.

That is not the point here, but since you brought it up I mention it.
As I tried to say, nXhtml will somehow be mirrored in ELPA, and Reuben
has said that he wants to contribute to that process. So please try to
discuss this with him.

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-20 21:38 ` Lennart Borgman @ 2011-04-21 17:35 ` Ted Zlatanov 2011-04-21 18:42 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-21 17:35 UTC (permalink / raw) To: emacs-devel On Wed, 20 Apr 2011 23:38:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/20 Ted Zlatanov <tzz@lifelogs.com>: >> LB> I do not know how ELPA handles this. (But I do not think there is any LB> problems installing all of nXhtml. Everything is autoloaded.) >> >> All I want is an easy way to display suspicious characters (as I recall >> from when we started this discussion over a year ago). I have no >> problem that you're hosting markchars.el inside nXhtml and I can mirror >> just that one file into the GNU ELPA, but if nXhtml has to be >> *installed* in order to use markchars.el, I'll have to write my own >> version. I think installing a large, unrelated package is an >> unnecessary burden on those who just want to detect suspicious >> characters. Can you please confirm one way or the other so I know what >> I need to do, mirror or rewrite? >> >> Just to be clear, mirroring markchars.el does not require you to change >> anything. The GNU ELPA machine will pull the latest markchars.el down >> daily, that's all. LB> nXhtml is not very much more monolitic than ELPA actually. You can use LB> just the basic libraries in nXhtml and get the rest downloaded LB> automatically when you try to access them. So it contains kind of LB> "ELPA" inside. LB> But that is not the point here, but since you took it up I mention it. LB> As I tried to say nXhtml will somehow be mirrored in ELPA and Reuben LB> has said that he want to contribute to that process. So please try to LB> discuss this witrh him. If you think about it, this is not a good user experience. "You should beware domain names with suspicious characters that are not in the same script. markchars.el can do it." "How do I install it?" 
"You have to install nXhtml, a web development environment." "???" Compare to 'You run (package-install "markchars").' Which one makes more sense for the users? And even assuming nXhtml is in the ELPA on its own, if they are browsing the list of packages, why would they expect a web development environment to provide what markchars.el provides? Sorry if this is not what you want to do, but it really doesn't make sense to force the user to install nXhtml in order to detect suspicious characters. So I think mirroring or rewriting markchars.el are the only options to provide a good user experience. I can't think of any others. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-21 17:35 ` Ted Zlatanov @ 2011-04-21 18:42 ` Lennart Borgman 2011-04-21 19:14 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-21 18:42 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: > On Wed, 20 Apr 2011 23:38:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/20 Ted Zlatanov <tzz@lifelogs.com>: >>> > LB> I do not know how ELPA handles this. (But I do not think there is any > LB> problems installing all of nXhtml. Everything is autoloaded.) >>> >>> All I want is an easy way to display suspicious characters (as I recall >>> from when we started this discussion over a year ago). I have no >>> problem that you're hosting markchars.el inside nXhtml and I can mirror >>> just that one file into the GNU ELPA, but if nXhtml has to be >>> *installed* in order to use markchars.el, I'll have to write my own >>> version. I think installing a large, unrelated package is an >>> unnecessary burden on those who just want to detect suspicious >>> characters. Can you please confirm one way or the other so I know what >>> I need to do, mirror or rewrite? >>> >>> Just to be clear, mirroring markchars.el does not require you to change >>> anything. The GNU ELPA machine will pull the latest markchars.el down >>> daily, that's all. > > LB> nXhtml is not very much more monolitic than ELPA actually. You can use > LB> just the basic libraries in nXhtml and get the rest downloaded > LB> automatically when you try to access them. So it contains kind of > LB> "ELPA" inside. > > LB> But that is not the point here, but since you took it up I mention it. > > LB> As I tried to say nXhtml will somehow be mirrored in ELPA and Reuben > LB> has said that he want to contribute to that process. So please try to > LB> discuss this witrh him. > > If you think about it, this is not a good user experience. 
> > "You should beware domain names with suspicious characters that are not > in the same script. markchars.el can do it." > > "How do I install it?" > > "You have to install nXhtml, a web development environment." > > "???" Yes, the name and description is misleading. I have been waiting for something like ELPA (with the additonal capabilities in that area that nXhtml provides). > Sorry if this is not what you want to do, but it really doesn't make > sense to force the user to install nXhtml in order to detect suspicious > characters. So I think mirroring or rewriting markchars.el are the only > options to provide a good user experience. I can't think of any others. You are maybe misunderstanding part of this. For me the transition is the problem (and maybe the current lack of dynamic loading of libraries from ELPA). I do agree with you where markchars.el should be in the end. It should be in ELPA or Emacs. It is the road we have to think about. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-21 18:42 ` Lennart Borgman @ 2011-04-21 19:14 ` Ted Zlatanov 2011-04-21 20:00 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-21 19:14 UTC (permalink / raw) To: emacs-devel On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> You are maybe misunderstanding part of this. For me the transition is LB> the problem (and maybe the current lack of dynamic loading of LB> libraries from ELPA). I'm not sure what you mean about dynamic loading. Package installation from the GNU ELPA (and ELPA in general) is static: you install what you want synchronously, there is no "auto-install this when I need it" functionality. Do you mean this is an issue? LB> I do agree with you where markchars.el should be in the end. It LB> should be in ELPA or Emacs. It is the road we have to think about. OK, as I understand it you also want to be able to offer markchars.el to users as a separate feature independent of nXhtml. We agree. As I said markchars.el can be simply mirrored into the GNU ELPA nightly from the nXhtml repository. Is that the road you're referring to? Or does something else about the mirroring worry you? How would you like to tell users to install it, if my proposal is not OK for some reason? Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-21 19:14 ` Ted Zlatanov @ 2011-04-21 20:00 ` Lennart Borgman 2011-04-21 20:35 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-21 20:00 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: > On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> You are maybe misunderstanding part of this. For me the transition is > LB> the problem (and maybe the current lack of dynamic loading of > LB> libraries from ELPA). > > I'm not sure what you mean about dynamic loading. What I mean here is what can be used in nXhtml: if you (require 'somelib) and somelib.el is not on your computer, then you can have it automatically downloaded from the nXhtml repository (with a possibility to check the code before actually installing it). > Package installation > from the GNU ELPA (and ELPA in general) is static: you install what you > want synchronously, there is no "auto-install this when I need it" > functionality. Do you mean this is an issue? Yes. It makes it more difficult to integrate elisp libraries. This is one of the reasons I have kept so many things in nXhtml. If ELPA had that possibility I would not hesitate to move things into ELPA. (Unfortunately, for this to work really reliably, a (simple) change in `require' is needed.) > LB> I do agree with you where markchars.el should be in the end. It > LB> should be in ELPA or Emacs. It is the road we have to think about. > > OK, as I understand it you also want to be able to offer markchars.el to > users as a separate feature independent of nXhtml. We agree. > > As I said markchars.el can be simply mirrored into the GNU ELPA nightly > from the nXhtml repository. Is that the road you're referring to? Or > does something else about the mirroring worry you? How would you like > to tell users to install it, if my proposal is not OK for some reason?
At the moment just grab it from the nXhtml repository. You can download it from the web page there.
* Re: face for non-ASCII characters 2011-04-21 20:00 ` Lennart Borgman @ 2011-04-21 20:35 ` Ted Zlatanov 2011-04-21 20:53 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-21 20:35 UTC (permalink / raw) To: emacs-devel On Thu, 21 Apr 2011 22:00:38 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: >> On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >> LB> You are maybe misunderstanding part of this. For me the transition is LB> the problem (and maybe the current lack of dynamic loading of LB> libraries from ELPA). >> >> I'm not sure what you mean about dynamic loading. LB> What I mean here is what can be used in nXhtml: If you (require LB> 'somelib) and somelib.el is not on your computer then you can have it LB> automatically downloaded from nXhtml repository (with a possibility to LB> check the code before actually installing it). I would be strongly opposed to opportunistic package installations in general, although nXhtml can use it internally of course. >> Package installation from the GNU ELPA (and ELPA in general) is >> static: you install what you want synchronously, there is no >> "auto-install this when I need it" functionality. Do you mean this >> is an issue? LB> Yes. It makes it more difficult to integrate elisp libraries. This is LB> one of the reasons I have kept so many things in nXhtml. If ELPA had LB> that possibility I would not hesitate to move things into ELPA. LB> (Unfortunately for this to work realy reliable a (simple) change in LB> `require' is needed.) You should make that dynamic loading proposal separately and maybe it will be accepted, but as far as markchars.el goes, since it will reside in nXhtml, you can still do the nXhtml opportunistic installs if the user hasn't installed it from the GNU ELPA statically. 
>> As I said markchars.el can be simply mirrored into the GNU ELPA nightly >> from the nXhtml repository. Is that the road you're referring to? Or >> does something else about the mirroring worry you? How would you like >> to tell users to install it, if my proposal is not OK for some reason? LB> At the moment just grab it from the nXhtml repository. You can LB> download it from the web page there. Sorry, is "just grab it" advice for me to mirror into the GNU ELPA, or advice for the end user? I hope it's the former but just making sure... Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-21 20:35 ` Ted Zlatanov @ 2011-04-21 20:53 ` Lennart Borgman 2011-04-21 21:18 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-21 20:53 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: > On Thu, 21 Apr 2011 22:00:38 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: >>> On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: >>> > LB> You are maybe misunderstanding part of this. For me the transition is > LB> the problem (and maybe the current lack of dynamic loading of > LB> libraries from ELPA). >>> >>> I'm not sure what you mean about dynamic loading. > > LB> What I mean here is what can be used in nXhtml: If you (require > LB> 'somelib) and somelib.el is not on your computer then you can have it > LB> automatically downloaded from nXhtml repository (with a possibility to > LB> check the code before actually installing it). > > I would be strongly opposed to opportunistic package installations in > general, although nXhtml can use it internally of course. Why is the word opportunistic used by you here? I do not have time to discuss if you do not take it seriously. Please describe exactly what it is you do not like instead. >>> Package installation from the GNU ELPA (and ELPA in general) is >>> static: you install what you want synchronously, there is no >>> "auto-install this when I need it" functionality. Do you mean this >>> is an issue? > > LB> Yes. It makes it more difficult to integrate elisp libraries. This is > LB> one of the reasons I have kept so many things in nXhtml. If ELPA had > LB> that possibility I would not hesitate to move things into ELPA. > LB> (Unfortunately for this to work realy reliable a (simple) change in > LB> `require' is needed.) 
> > You should make that dynamic loading proposal separately and maybe it > will be accepted, but as far as markchars.el goes, since it will reside > in nXhtml, you can still do the nXhtml opportunistic installs if the > user hasn't installed it from the GNU ELPA statically. You are greatly exaggerating. The difference between ELPA and nXhtml here is that nXhtml will propose that you can install a library to get things working while ELPA will not do that. So I think the way nXhtml handles this just makes things easier for the user. (And I would be glad if ELPA did this instead of nXhtml. I really do not want the code there. And I do not want markchars.el either there.) >>> As I said markchars.el can be simply mirrored into the GNU ELPA nightly >>> from the nXhtml repository. Is that the road you're referring to? Or >>> does something else about the mirroring worry you? How would you like >>> to tell users to install it, if my proposal is not OK for some reason? > > LB> At the moment just grab it from the nXhtml repository. You can > LB> download it from the web page there. > > Sorry, is "just grab it" advice for me to mirror into the GNU ELPA, or > advice for the end user? I hope it's the former but just making sure... At first sight one might think that your proposal to mirror markchars.el into ELPA is not troublesome. However you may end up with two versions of markchars.el if you mirror it into ELPA now. I would be glad to have it in ELPA - if just the automatic installation could be fixed too. But you are however of course free to do what you want. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-21 20:53 ` Lennart Borgman @ 2011-04-21 21:18 ` Ted Zlatanov 2011-04-22 12:20 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-21 21:18 UTC (permalink / raw) To: emacs-devel On Thu, 21 Apr 2011 22:53:01 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: >>>> I'm not sure what you mean about dynamic loading. >> LB> What I mean here is what can be used in nXhtml: If you (require LB> 'somelib) and somelib.el is not on your computer then you can have it LB> automatically downloaded from nXhtml repository (with a possibility to LB> check the code before actually installing it). >> >> I would be strongly opposed to opportunistic package installations in >> general, although nXhtml can use it internally of course. LB> Why is the word opportunistic used by you here? I do not have time to LB> discuss if you do not take it seriously. Please describe exactly what LB> it is you do not like instead. "Opportunistic" means it's installed when you need it as you said. "Dynamic loading" is a term easily confused with the Unix dynamic libraries, that's why I avoided it. I am not using "opportunistic" disparagingly. I am opposed to opportunistic installs because they destabilize the working environment. They may make sense in a tightly controlled environment, but for a general audience (all Emacs users) I think they are a bad idea. Most package managers I've used (Perl, Python, Ruby, Emacs, XEmacs, Unix distributions) do static installs. This is different from autoloading, where you know the library is available and you've scanned it for autoload cookies. LB> You are greatly exaggerating. The difference between ELPA and nXhtml LB> here is that nXhtml will propose that you can install a library to get LB> things working while ELPA will not do that. ELPA will install all the dependencies when it installs the library. 
So when the library is installed, you won't have surprises later. If you're talking about optional add-ons and plugins, that's a different discussion :) As I said, you should make the opportunistic/dynamic loading proposal and maybe it will be accepted. While it seems to me like a bad idea, it's entirely possible it turns out to be good! We won't know until it's discussed directly. LB> At first sight one might think that your proposal to mirror LB> markchars.el into ELPA is not troublesome. However you may end up with LB> two versions of markchars.el if you mirror it into ELPA now. That would last for at most 1 day, until the nightly synchronization catches up with the nXhtml repository. I think that's OK. The nXhtml repository will still be the primary repository. LB> I would be glad to have it in ELPA - if just the automatic LB> installation could be fixed too. Can you give a specific scenario where markchars.el in both the GNU ELPA and in nXhtml would be a problem? I want to understand what needs to be fixed. LB> But you are however of course free to do what you want. Sure, but I'd rather collaborate if I can. The easiest thing (just keep markchars.el in the GNU ELPA) is not the best thing for the users. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
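As an illustration of the static install described in the message above, an init-file fragment might look roughly like this. It is a minimal sketch, assuming package.el is available and that a package named `markchars' is published in one of the archives listed in `package-archives'; nothing here is taken from the thread itself.

```elisp
;; Sketch only: assumes a package called `markchars' exists in an
;; archive listed in `package-archives'.
(require 'package)
(package-initialize)

(unless (package-installed-p 'markchars)
  ;; Fetch the current archive index, then install explicitly.
  ;; Declared dependencies are installed in the same step, so nothing
  ;; further is downloaded later when the library is actually used.
  (package-refresh-contents)
  (package-install 'markchars))
```

The install happens once, at a moment the user chose, which is the property the message contrasts with installing on demand.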
* Re: face for non-ASCII characters 2011-04-21 21:18 ` Ted Zlatanov @ 2011-04-22 12:20 ` Lennart Borgman 2011-04-22 12:49 ` Stephen J. Turnbull 2011-04-22 14:20 ` Ted Zlatanov 0 siblings, 2 replies; 182+ messages in thread From: Lennart Borgman @ 2011-04-22 12:20 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: > On Thu, 21 Apr 2011 22:53:01 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>: >>>>> I'm not sure what you mean about dynamic loading. >>> > LB> What I mean here is what can be used in nXhtml: If you (require > LB> 'somelib) and somelib.el is not on your computer then you can have it > LB> automatically downloaded from nXhtml repository (with a possibility to > LB> check the code before actually installing it). >>> >>> I would be strongly opposed to opportunistic package installations in >>> general, although nXhtml can use it internally of course. > > LB> Why is the word opportunistic used by you here? I do not have time to > LB> discuss if you do not take it seriously. Please describe exactly what > LB> it is you do not like instead. > > "Opportunistic" means it's installed when you need it as you said. > "Dynamic loading" is a term easily confused with the Unix dynamic > libraries, that's why I avoided it. I am not using "opportunistic" > disparagingly. Oh, I see. Then my answer was inappropriate. My excuses. And I am glad I was wrong. > I am opposed to opportunistic installs because they destabilize the > working environment. They may make sense in a tightly controlled > environment, but for a general audience (all Emacs users) I think they > are a bad idea. Most package managers I've used (Perl, Python, Ruby, > Emacs, XEmacs, Unix distributions) do static installs. I can surely see the problem, but if the opportunistic installer asks (and make it possible to check) before each install I do not think it is an additional problem when using Emacs. 
For another comparison, think about firewalls. They act much like the opportunistic installer I suggest when they ask whether you want to allow a program to do this or that. > This is different from autoloading, where you know the library is > available and you've scanned it for autoload cookies. This can be done by an opportunistic installer. In fact the opportunistic installer in nXhtml allows you to check the file. > LB> You are greatly exaggerating. The difference between ELPA and nXhtml > LB> here is that nXhtml will propose that you can install a library to get > LB> things working while ELPA will not do that. > > ELPA will install all the dependencies when it installs the library. So > when the library is installed, you won't have surprises later. If > you're talking about optional add-ons and plugins, that's a different > discussion :) It is not always clear what dependencies there are, since that may depend on how you are using a library. That is why I think an opportunistic installer is good. > As I said, you should make the opportunistic/dynamic loading proposal > and maybe it will be accepted. While it seems to me like a bad idea, > it's entirely possible it turns out to be good! We won't know until > it's discussed directly. I do not have time to discuss it much now, but please see my explanations above. > LB> At first sight one might think that your proposal to mirror > LB> markchars.el into ELPA is not troublesome. However you may end up with > LB> two versions of markchars.el if you mirror it into ELPA now. > > That would last for at most 1 day, until the nightly synchronization > catches up with the nXhtml repository. I think that's OK. The nXhtml > repository will still be the primary repository. A misunderstanding. I was referring to two versions in different locations on the user's computer. > LB> I would be glad to have it in ELPA - if just the automatic > LB> installation could be fixed too.
> > Can you give a specific scenario where markchars.el in both the GNU ELPA > and in nXhtml would be a problem? I want to understand what needs to be > fixed. Please see just above. > LB> But you are however of course free to do what you want. > > Sure, but I'd rather collaborate if I can. The easiest thing (just keep > markchars.el in the GNU ELPA) is not the best thing for the users. Good. I am not sure either but want to give you my concerns. Please feel free to handle it the way you think is best at the moment. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-22 12:20 ` Lennart Borgman @ 2011-04-22 12:49 ` Stephen J. Turnbull 2011-04-22 13:23 ` Lennart Borgman 2011-04-22 14:20 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Stephen J. Turnbull @ 2011-04-22 12:49 UTC (permalink / raw) To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel Lennart Borgman writes: > > As I said, you should make the opportunistic/dynamic loading proposal > > and maybe it will be accepted. While it seems to me like a bad idea, > > it's entirely possible it turns out to be good! We won't know until > > it's discussed directly. > > I do not have time to discuss it much now, but please see my > explanations above. It was discussed at the time that ELPA became officially sanctioned by the Emacs project, and Richard's reaction then was almost identical to Ted's now, except Richard left less room for changing his mind on the issue. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-22 12:49 ` Stephen J. Turnbull @ 2011-04-22 13:23 ` Lennart Borgman 2011-04-23 0:50 ` Richard Stallman 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-22 13:23 UTC (permalink / raw) To: Stephen J. Turnbull; +Cc: Ted Zlatanov, emacs-devel On Fri, Apr 22, 2011 at 2:49 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote: > Lennart Borgman writes: > > > > As I said, you should make the opportunistic/dynamic loading proposal > > > and maybe it will be accepted. While it seems to me like a bad idea, > > > it's entirely possible it turns out to be good! We won't know until > > > it's discussed directly. > > > > I do not have time to discuss it much now, but please see my > > explanations above. > > It was discussed at the time that ELPA became officially sanctioned by > the Emacs project, and Richard's reaction then was almost identical to > Ted's now, except Richard left less room for changing his mind on the > issue. I believe RMS rejection was not so much because of instability but insecurity and that the user should have control. It was after that I added the possibility to review and reject the opportunistic install, just before the library is going to be installed. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-22 13:23 ` Lennart Borgman @ 2011-04-23 0:50 ` Richard Stallman 2011-04-23 7:13 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Richard Stallman @ 2011-04-23 0:50 UTC (permalink / raw) To: Lennart Borgman; +Cc: stephen, tzz, emacs-devel I already made a decision about this question. The users' freedom is at stake, not just convenience. To decide it again would be extra work that doesn't need to be done and that I don't have time for. -- Dr Richard Stallman President, Free Software Foundation 51 Franklin St Boston MA 02110 USA www.fsf.org, www.gnu.org Skype: No way! That's nonfree (freedom-denying) software. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-23 0:50 ` Richard Stallman @ 2011-04-23 7:13 ` Lennart Borgman 2011-04-25 17:54 ` Richard Stallman 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-23 7:13 UTC (permalink / raw) To: rms; +Cc: stephen, tzz, emacs-devel On Sat, Apr 23, 2011 at 2:50 AM, Richard Stallman <rms@gnu.org> wrote: > I already made a decision about this question. The users' freedom is > at stake, not just convenience. This was a decision you made against a version of opportunistic install where the user could not decide against it. After that I changed it so the user can review and decide at each opportunistic install. I can not see that the users' freedom in any way is at stake with the new version. > To decide it again would be extra work that doesn't need to be done > and that I don't have time for. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-23 7:13 ` Lennart Borgman @ 2011-04-25 17:54 ` Richard Stallman 2011-04-26 18:26 ` Chong Yidong 0 siblings, 1 reply; 182+ messages in thread From: Richard Stallman @ 2011-04-25 17:54 UTC (permalink / raw) To: Lennart Borgman; +Cc: stephen, tzz, emacs-devel After that I changed it so the user can review and decide at each opportunistic install. This is not a binary question, it's a scale. You want to take a step towards the wrong end of the scale. We are just introducing the separate package archive, which is a big step. Now is too soon to consider another step. The risk here is hard to calculate, so we must be cautious. -- Dr Richard Stallman President, Free Software Foundation 51 Franklin St Boston MA 02110 USA www.fsf.org, www.gnu.org Skype: No way! That's nonfree (freedom-denying) software. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-25 17:54 ` Richard Stallman @ 2011-04-26 18:26 ` Chong Yidong 2011-04-26 19:05 ` Ted Zlatanov 2011-04-27 12:41 ` Lennart Borgman 0 siblings, 2 replies; 182+ messages in thread From: Chong Yidong @ 2011-04-26 18:26 UTC (permalink / raw) To: rms; +Cc: stephen, Lennart Borgman, tzz, emacs-devel Richard Stallman <rms@gnu.org> writes: > After that I changed it so the user can review and decide at each > opportunistic install. > > This is not a binary question, it's a scale. You want to take a step > towards the wrong end of the scale. We are just introducing the > separate package archive, which is a big step. Now is too soon to > consider another step. The risk here is hard to calculate, so we must > be cautious. For what it's worth, I don't think general-case opportunistic installation is a good idea, even from a strictly technical standpoint. One can imagine special instances where one might wish to prompt the user, at a very specific point during the execution of a command, to install a package. In fact, this can be done easily using the existing package.el infrastructure (or even without package.el; this is Lisp, after all). But such program behavior should not be encouraged in general. In the vast majority of foreseeable Emacs-relevant situations, opportunistic installation is a false optimization. It introduces more moving parts into a program and obnoxiously makes the user wait for a download in the middle of a command---all for the purpose of saving trivial amounts of disk space. In practically all instances, package dependencies are the proper way to go. So the functionality that Lennart is promoting is not something that I'm interested in at all.
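The special case mentioned above, prompting the user at one specific point to install a package, could be sketched with stock package.el calls roughly as follows. The function name `my-require-or-offer' is hypothetical and is not part of package.el or nXhtml; the sketch also assumes the archive index has already been fetched with `package-refresh-contents'.

```elisp
(require 'package)

(defun my-require-or-offer (feature &optional package)
  "Load FEATURE, offering to install PACKAGE first if it is missing.
PACKAGE defaults to FEATURE.  Return non-nil if FEATURE ends up loaded.
Hypothetical helper, for illustration only."
  (or (require feature nil t)           ; already available: nothing to do
      (let ((pkg (or package feature)))
        ;; Nothing is downloaded unless the user explicitly agrees.
        (when (y-or-n-p (format "Feature `%s' is missing.  Install package `%s'? "
                                feature pkg))
          (package-install pkg)
          (require feature nil t)))))
```

A command would call `(my-require-or-offer 'idn)' at the one place where the optional library matters. Whether this pattern should be encouraged is exactly what the thread disputes; the sketch only shows that no change to `require' is needed for the narrow case.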
* Re: face for non-ASCII characters 2011-04-26 18:26 ` Chong Yidong @ 2011-04-26 19:05 ` Ted Zlatanov 2011-04-26 20:29 ` Chong Yidong 2011-04-27 12:41 ` Lennart Borgman 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-26 19:05 UTC (permalink / raw) To: emacs-devel On Tue, 26 Apr 2011 14:26:13 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: CY> Richard Stallman <rms@gnu.org> writes: >> After that I changed it so the user can review and decide at each >> opportunistic install. >> >> This is not a binary question, it's a scale. You want to take a step >> towards the wrong end of the scale. We are just introducing the >> separate package archive, which is a big step. Now is too soon to >> consider another step. The risk here is hard to calculate, so we must >> be cautious. CY> For what it's worth, I don't think general-case opportunistic CY> installation is a good idea, even from a strictly technical standpoint. CY> One can imagine special instances where one might wish to prompt the CY> user, at a very specific point during the execution of a command, to CY> install a package. In fact, this can be done easily using the existing CY> package.el infrastructure (or even without package.el; this is Lisp, CY> after all). OTOH it's very nice to be told "hey, you could have X if you installed package Y." I think it's good to know about these opportunities at customization time. For example, markchars.el has optional behavior if idn.el is installed. It would be nice if the customization for `markchars-what' connected the 'markchars-nonidn-fun option with the idn.el package, so that 1) the user knows there's a connection, and 2) the user can choose to enable 'markchars-nonidn-fun and *at that time* must install idn.el as well. 
This could be done in customization code like this:

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
  "Things to mark, a list of regular expressions or symbols."
  :type `(repeat (choice :tag "Marking choices"
                         (const :required-feature 'idn ; NEW Custom tag
                                :tag "Non IDN chars (Unicode.org tr39 suggestions)"
                                markchars-nonidn-fun)
                         (const :tag "Confusables" markchars-confusables)
                         (const :tag "`markchars-simple-pattern'"
                                markchars-simple-pattern)
                         (regexp :tag "Arbitrary pattern")))
  :group 'markchars)

Note the awkwardness of the default, which could also be remedied by this new tag.

Ted
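Since `:required-feature' is only a proposal here and not an existing Custom keyword, the same connection can be approximated today with a plain lookup table. The following is a rough sketch under that assumption; the variable and function names are hypothetical and do not come from markchars.el.

```elisp
(defvar my-markchars-option-features
  '((markchars-nonidn-fun . idn))
  "Alist mapping an option symbol to the feature it needs.
Hypothetical table standing in for the proposed `:required-feature' tag.")

(defun my-markchars-missing-features (options)
  "Return the features needed by OPTIONS that cannot be found.
OPTIONS is a list of symbols or regexps, as in `markchars-what'.
A feature counts as found if it is loaded or is on `load-path'."
  (let (missing)
    (dolist (opt options (nreverse missing))
      (let ((feature (cdr (assq opt my-markchars-option-features))))
        (when (and feature
                   (not (featurep feature))
                   (not (locate-library (symbol-name feature))))
          (push feature missing))))))
```

A `:set' function on the option could call this and tell the user which package to install, instead of silently dropping `markchars-nonidn-fun' from the default.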
* Re: face for non-ASCII characters 2011-04-26 19:05 ` Ted Zlatanov @ 2011-04-26 20:29 ` Chong Yidong 2011-04-27 3:45 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Chong Yidong @ 2011-04-26 20:29 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Ted Zlatanov <tzz@lifelogs.com> writes: > OTOH it's very nice to be told "hey, you could have X if you installed > package Y." I think it's good to know about these opportunities at > customization time. That's more similar to "recommended packages" in OS-level packaging. I don't object, in principle, to adding functionality along those lines. But the markchars.el and idn.el situation actually points to an important danger. IIUC, the straightforward solution would be to make idn.el a dependency, but that's been made difficult by the tangle of copyright and other problems in nXhtml. So this kind of functionality might tend to encourage end-runs around mainline Emacs ("come use my version, and all this extra functionality is magically turned on!"). ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-26 20:29 ` Chong Yidong @ 2011-04-27 3:45 ` Ted Zlatanov 2011-04-27 4:42 ` Stephen J. Turnbull 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-27 3:45 UTC (permalink / raw) To: emacs-devel On Tue, 26 Apr 2011 16:29:39 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: CY> Ted Zlatanov <tzz@lifelogs.com> writes: >> OTOH it's very nice to be told "hey, you could have X if you installed >> package Y." I think it's good to know about these opportunities at >> customization time. CY> That's more similar to "recommended packages" in OS-level packaging. CY> I don't object, in principle, to adding functionality along those lines. CY> But the markchars.el and idn.el situation actually points to an CY> important danger. IIUC, the straightforward solution would be to make CY> idn.el a dependency, but that's been made difficult by the tangle of CY> copyright and other problems in nXhtml. So this kind of functionality CY> might tend to encourage end-runs around mainline Emacs ("come use my CY> version, and all this extra functionality is magically turned on!"). We could enforce the rule that you're only allowed to recommend other packages that live within the same ELPA or something even more carefully controlled (a fsf-kevin-bacon-number equal or greater than the current one, if you know what I mean). So the Emacs core can recommend anything else in the core. The GNU ELPA can recommend things in the GNU ELPA or the Emacs core. Tom Tromey's ELPA can recommend things in itself or the GNU ELPA or the Emacs core... Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
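The recommendation rule proposed in the message above (a package may only recommend packages from the same archive or a more trusted one) can be sketched roughly as follows. This is only an illustration: the archive symbols, the variable `my-archive-trust-order', and both function names are hypothetical and are not part of package.el.

```elisp
;; Hypothetical sketch: archives listed from most trusted to least trusted.
;; None of these names exist in Emacs or package.el.
(defvar my-archive-trust-order '(emacs-core gnu-elpa tromey-elpa)
  "Hypothetical archive symbols, most trusted first.")

(defun my-archive-rank (archive)
  "Return the position of ARCHIVE in `my-archive-trust-order', or nil."
  (let ((tail (memq archive my-archive-trust-order)))
    (and tail (- (length my-archive-trust-order) (length tail)))))

(defun my-recommendation-allowed-p (from to)
  "Return non-nil if a package in archive FROM may recommend one in TO.
The recommendation is allowed when TO is at least as trusted as FROM.
Archives missing from `my-archive-trust-order' are always refused."
  (let ((from-rank (my-archive-rank from))
        (to-rank (my-archive-rank to)))
    (and from-rank to-rank (<= to-rank from-rank))))
```

With this ordering, (my-recommendation-allowed-p 'gnu-elpa 'emacs-core) is non-nil, while (my-recommendation-allowed-p 'emacs-core 'gnu-elpa) is nil.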
* Re: face for non-ASCII characters 2011-04-27 3:45 ` Ted Zlatanov @ 2011-04-27 4:42 ` Stephen J. Turnbull 2011-05-02 18:18 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Stephen J. Turnbull @ 2011-04-27 4:42 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Ted Zlatanov writes: > On Tue, 26 Apr 2011 16:29:39 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: > CY> So this kind of functionality might tend to encourage end-runs > CY> around mainline Emacs ("come use my version, and all this extra > CY> functionality is magically turned on!"). That ship sailed when Richard released Emacs under a free license. Cf. the Mac versions, or the 15-year history of Mule before it was included in the mainline, or Epoch. Experience shows that most users return to the mainline. > We could enforce the rule that you're only allowed to recommend other > packages that live within the same ELPA or something even more carefully > controlled (a fsf-kevin-bacon-number equal or greater than the current > one, if you know what I mean). You can't (and shouldn't try to) enforce anything. If somebody finds the restrictions onerous, they'll fork the package library (or all of Emacs, whichever seems more convenient at the time). If others find the restrictions onerous, they'll install the forked version. That's what free software means. Eventually there will be a "google this package and install it for me" functionality. I think a better way to go is to add a way to filter on that functionality such that the user can white- or blacklist certain URLs, and the package manager will query for URLs not matched by either list. Default the whitelist to (<GNU ELPA>) and the blacklist to t (i.e., everything not in the whitelist). Cf. the Debian free, contrib, non-free hierarchy, and Gentoo's allowed-licenses facility. ^ permalink raw reply [flat|nested] 182+ messages in thread
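The white-/blacklist filter suggested in the message above might look something like the sketch below. It is not an existing package.el feature: the variables, the function names, and the regexp used for the GNU ELPA host are all assumptions made for illustration. The whitelist is consulted first, so a blacklist of t means "refuse everything not whitelisted", and a URL matched by neither list yields `ask'.

```elisp
;; Hypothetical sketch of a URL filter; these names are not part of package.el.
(defvar my-install-whitelist '("\\`https?://elpa\\.gnu\\.org/")
  "Hypothetical list of regexps for allowed URLs, or t to allow everything.")

(defvar my-install-blacklist t
  "Hypothetical list of regexps for refused URLs, or t to refuse everything.")

(defun my-url-list-match-p (url list)
  "Return non-nil if URL matches LIST.
LIST is either t (match everything) or a list of regexps."
  (or (eq list t)
      (let ((case-fold-search t)
            (found nil))
        (dolist (re list found)
          (when (string-match-p re url)
            (setq found t))))))

(defun my-install-decision (url)
  "Return `allow', `deny' or `ask' for URL.
The whitelist is checked before the blacklist."
  (cond ((my-url-list-match-p url my-install-whitelist) 'allow)
        ((my-url-list-match-p url my-install-blacklist) 'deny)
        (t 'ask)))
```

A user who prefers to be queried about unknown sources would set the hypothetical `my-install-blacklist' to nil or to a list of specific regexps.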
* Re: face for non-ASCII characters 2011-04-27 4:42 ` Stephen J. Turnbull @ 2011-05-02 18:18 ` Ted Zlatanov 2011-05-03 1:50 ` Stephen J. Turnbull 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-02 18:18 UTC (permalink / raw) To: emacs-devel On Wed, 27 Apr 2011 13:42:28 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: SJT> Ted Zlatanov writes: >> We could enforce the rule that you're only allowed to recommend other >> packages that live within the same ELPA or something even more carefully >> controlled (a fsf-kevin-bacon-number equal or greater than the current >> one, if you know what I mean). SJT> You can't (and shouldn't try to) enforce anything. Within GNU Emacs and the GNU ELPA, many things can (and are) enforced. Obviously I'm not speaking about things outside that sphere. SJT> If somebody finds the restrictions onerous, they'll fork the SJT> package library (or all of Emacs, whichever seems more convenient SJT> at the time). If others find the restrictions onerous, they'll SJT> install the forked version. That's what free software means. That's even more bizarre, defining free software by someone's actions. SJT> Eventually there will be a "google this package and install it for me" SJT> functionality. I think a better way to go is to add a way to filter SJT> on that functionality such that the user can white- or blacklist SJT> certain URLs, and the package manager will query for URLs not matched SJT> by either list. Default the whitelist to (<GNU ELPA>) and the SJT> blacklist to t (ie, everything not in the whitelist). This is what package.el does, minus the Google dependency and using symbolic package names instead of URLs. The user just adds whatever archive URLs he likes to the list. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
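For readers unfamiliar with the mechanism mentioned above, adding an archive to package.el's list is typically done along these lines. The archive name "example" and its URL are placeholders, not a real archive; `package-archives' is the alist of (NAME . LOCATION) pairs that package.el consults.

```elisp
(require 'package)

;; Append a placeholder archive; "example" and its URL are hypothetical.
(add-to-list 'package-archives
             '("example" . "https://example.org/packages/")
             t)
```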
* Re: face for non-ASCII characters 2011-05-02 18:18 ` Ted Zlatanov @ 2011-05-03 1:50 ` Stephen J. Turnbull 2011-05-03 14:45 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Stephen J. Turnbull @ 2011-05-03 1:50 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Ted Zlatanov writes: > SJT> You can't (and shouldn't try to) enforce anything. > > Within GNU Emacs and the GNU ELPA, many things can (and are) enforced. > Obviously I'm not speaking about things outside that sphere. Including a passel of copyleft licenses, whose justification for existence is reaching outside that sphere; it is not obvious what you are talking about until you say it. > That's even more bizarre, defining free software by someone's > actions. *sigh* There is no other way to define it than by what people are permitted to do, ie, their actions. What is permitted will be done, by somebody somewhere sometime. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-05-03 1:50 ` Stephen J. Turnbull @ 2011-05-03 14:45 ` Ted Zlatanov 2011-05-03 21:21 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-03 14:45 UTC (permalink / raw) To: emacs-devel On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: SJT> What is permitted will be done, by somebody somewhere sometime. I hereby permit the ignition of the Horsehead Nebula in my honor. (It's worth a try, and would improve the Milky Way real estate values.) Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-05-03 14:45 ` Ted Zlatanov @ 2011-05-03 21:21 ` Lennart Borgman 2011-05-04 14:41 ` Stephen J. Turnbull 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-05-03 21:21 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/5/3 Ted Zlatanov <tzz@lifelogs.com>: > On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: > > SJT> What is permitted will be done, by somebody somewhere sometime. > > I hereby permit the ignition of the Horsehead Nebula in my honor. > > (It's worth a try, and would improve the Milky Way real estate values.) So now the experimental verification of the rule has started. We just have to wait and see. If the ignition never ever happens we are done. C U ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-05-03 21:21 ` Lennart Borgman @ 2011-05-04 14:41 ` Stephen J. Turnbull 0 siblings, 0 replies; 182+ messages in thread From: Stephen J. Turnbull @ 2011-05-04 14:41 UTC (permalink / raw) To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel Lennart Borgman writes: > 2011/5/3 Ted Zlatanov <tzz@lifelogs.com>: > > On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: > > > > SJT> What is permitted will be done, by somebody somewhere sometime. > > > > I hereby permit the ignition of the Horsehead Nebula in my honor. > > > > (It's worth a try, and would improve the Milky Way real estate values.) > > So now the experimental verification of the rule has started. Ted doesn't have the authority to permit that. You wait in vain. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: face for non-ASCII characters 2011-04-26 18:26 ` Chong Yidong 2011-04-26 19:05 ` Ted Zlatanov @ 2011-04-27 12:41 ` Lennart Borgman 1 sibling, 0 replies; 182+ messages in thread From: Lennart Borgman @ 2011-04-27 12:41 UTC (permalink / raw) To: Chong Yidong; +Cc: tzz, stephen, rms, emacs-devel On Tue, Apr 26, 2011 at 8:26 PM, Chong Yidong <cyd@stupidchicken.com> wrote: > Richard Stallman <rms@gnu.org> writes: > >> After that I changed it so the user can review and decide at each >> opportunistic install. >> >> This is not a binary question, it's a scale. You want to take a step >> towards the wrong end of the scale. We are just introducing the >> separate package archive, which is a big step. Now is too soon to >> consider another step. The risk here is hard to calculate, so we must >> be cautious. ... > > But such program behavior should not be encouraged in general. In the > vast majority of foreseeable Emacs-relevant situations, opportunistic > installation is a false optimization. It introduces more moving parts > into a program and obnoxiously makes the user wait for a download in the > middle of a command---all for the purpose of saving trivial amounts of > disk space. In practically all instances, package dependencies are the > proper way to go. I think this contains some misunderstandings. Opportunistic install is not (mainly) for saving disk space. And relying on package dependencies alone is, in my opinion, too coarse. The main advantage of opportunistic install is a simpler structure. It is basically the same thing as (require ...), but from the network if needed. Using only package dependencies makes it hard to "announce" new possibilities. In the case of nXhtml I have tried to announce possibilities by placing them in menus. If only package dependencies were used, that would mean that installing the menus for nXhtml would install all of nXhtml. ^ permalink raw reply [flat|nested] 182+ messages in thread
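The idea of "(require ...), but from the network if needed", combined with the earlier point that the user should confirm each install, could be sketched as below. This is not nXhtml's implementation, which is not shown in this thread; the function name is hypothetical. It relies only on `require' with its NOERROR argument, `y-or-n-p', and `package-install', and it assumes package.el has already been initialized and its archive contents fetched.

```elisp
(require 'package)

(defun my-require-or-offer-install (feature package)
  "Try to load FEATURE; if it is unavailable, offer to install PACKAGE.
Return non-nil if FEATURE is available afterwards.
Hypothetical sketch: assumes package.el is initialized and that
PACKAGE, once installed, provides FEATURE."
  (or (require feature nil t)           ; NOERROR: return nil instead of signaling
      (and (y-or-n-p (format "Feature `%s' is missing; install package `%s'? "
                             feature package))
           (progn
             (package-install package)
             (require feature nil t)))))
```

When the feature is already present, the user is never prompted, so the common case behaves exactly like a plain `require'.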
* Re: face for non-ASCII characters 2011-04-22 12:20 ` Lennart Borgman 2011-04-22 12:49 ` Stephen J. Turnbull @ 2011-04-22 14:20 ` Ted Zlatanov 2011-04-22 17:12 ` Lennart Borgman 2011-04-26 3:09 ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov 1 sibling, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-04-22 14:20 UTC (permalink / raw) To: emacs-devel On Fri, 22 Apr 2011 14:20:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> I can surely see the problem, but if the opportunistic installer asks LB> (and make it possible to check) before each install I do not think it LB> is an additional problem when using Emacs. At the very least it's a burden on the user. What programs do you know that use this system? If the prevailing norm is to do static installs, that suggests that users prefer it (I can't believe no one thought "let's do opportunistic installs!" before). LB> For another comparison think about the firewalls. They effectively act LB> similar to such an opportunistic installer as I suggest when they ask LB> you if you want a program to be able to do that and that. I think the difference here is between installing software and enabling services. >> ELPA will install all the dependencies when it installs the library. So >> when the library is installed, you won't have surprises later. If >> you're talking about optional add-ons and plugins, that's a different >> discussion :) LB> It is not clear all the time what dependencies there are since that LB> may depend on how you are using a library. That is why I think an LB> opportunistic installer is good. OK, so we're talking about plugins, not package dependencies. Those may be useful in a limited context, e.g. within nXhtml itself. Emacs may even get facilities to support them generally some day. But plugins are not packages. I don't think markchars.el is a plugin. 
It does not depend on nXhtml and does not enhance it in a special way; it's a general package. So perhaps our misunderstanding is semantic :) LB> At first sight one might think that your proposal to mirror LB> markchars.el into ELPA is not troublesome. However you may end up with LB> two versions of markchars.el if you mirror it into ELPA now. >> >> That would last for at most 1 day, until the nightly synchronization >> catches up with the nXhtml repository. I think that's OK. The nXhtml >> repository will still be the primary repository. LB> A misunderstanding. I was referring to two versions in different LB> locations on the users computers. Ah. package.el installs the two versions of the library in different locations and will activate only one. Thus the user has control over the versions and can upgrade. Does nXhtml do that? In any case, as long as nXhtml puts its plugin directory in front of package.el on the load-path, markchars.el will be loaded from the install location nXhtml specifies. >> Sure, but I'd rather collaborate if I can. The easiest thing (just keep >> markchars.el in the GNU ELPA) is not the best thing for the users. LB> Good. I am not sure either but want to give you my concerns. Please LB> feel free to handle it the way you think is best at the moment. OK, I'll mirror it. I don't expect it to become a problem. LB> I believe RMS rejection was not so much because of instability but LB> insecurity and that the user should have control. It was after that I LB> added the possibility to review and reject the opportunistic install, LB> just before the library is going to be installed. As I said, you have to make a proposal and defend it. It may turn out to be really great, we won't know until it's up for review. But I think you should frame it as a "plugin facility" instead of a package manager to give it a good chance to be accepted. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
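The load-path point above can be illustrated with a small sketch: whichever directory comes first in `load-path' wins when two copies of a library are installed. The helper name below is hypothetical; `locate-library' is the standard way to check which file would actually be loaded.

```elisp
(defun my-prefer-directory (dir)
  "Move DIR to the front of `load-path' so its libraries shadow later copies.
Hypothetical helper for illustration only."
  (setq load-path (cons dir (remove dir load-path))))

;; After calling (my-prefer-directory "/path/to/plugins/"), evaluating
;; (locate-library "markchars") reports which copy of markchars.el
;; would be loaded, if any copy is on `load-path'.
```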
* Re: face for non-ASCII characters 2011-04-22 14:20 ` Ted Zlatanov @ 2011-04-22 17:12 ` Lennart Borgman 2011-04-26 3:14 ` package management proposals for Emacs (was: face for non-ASCII characters) Ted Zlatanov 2011-04-26 3:09 ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-22 17:12 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/22 Ted Zlatanov <tzz@lifelogs.com>: > On Fri, 22 Apr 2011 14:20:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> I can surely see the problem, but if the opportunistic installer asks > LB> (and make it possible to check) before each install I do not think it > LB> is an additional problem when using Emacs. > > At the very least it's a burden on the user. What programs do you know > that use this system? If the prevailing norm is to do static installs, > that suggests that users prefer it (I can't believe no one thought > "let's do opportunistic installs!" before). Web browsers do it all the time. > LB> For another comparison think about the firewalls. They effectively act > LB> similar to such an opportunistic installer as I suggest when they ask > LB> you if you want a program to be able to do that and that. > > I think the difference here is between installing software and enabling > services. There is no difference that I can see when it comes to stability (which was what I believe you suggested as the most important). >>> ELPA will install all the dependencies when it installs the library. So >>> when the library is installed, you won't have surprises later. If >>> you're talking about optional add-ons and plugins, that's a different >>> discussion :) > > LB> It is not clear all the time what dependencies there are since that > LB> may depend on how you are using a library. That is why I think an > LB> opportunistic installer is good. 
> > OK, so we're talking about plugins, not package dependencies. ?? Elisp libraries work the same way AFAICS. > Those may > be useful in a limited context, e.g. within nXhtml itself. Emacs may > even get facilities to support them generally some day. But plugins are > not packages. > > I don't think markchars.el is a plugin. It does not depend on nXhtml > and does not enhance it in a special way; it's a general package. So > perhaps our misunderstanding is semantic :) I can't find any sense in what you say here. Could you explain what differences you see? > LB> A misunderstanding. I was referring to two versions in different > LB> locations on the users computers. > > Ah. package.el installs the two versions of the library in different > locations and will activate only one. Thus the user has control over > the versions and can upgrade. Does nXhtml do that? No, it did not make sense to finish the system for opportunistic install (since ELPA was to be used). I made it more as an example of how it can be built. (But it works, of course.) > In any case, as long as nXhtml puts its plugin directory in front of > package.el on the load-path, markchars.el will be loaded from the > install location nXhtml specifies. Yes. >>> Sure, but I'd rather collaborate if I can. The easiest thing (just keep >>> markchars.el in the GNU ELPA) is not the best thing for the users. > > LB> Good. I am not sure either but want to give you my concerns. Please > LB> feel free to handle it the way you think is best at the moment. > > OK, I'll mirror it. I don't expect it to become a problem. Why not mirror idn.el too then? > LB> I believe RMS rejection was not so much because of instability but > LB> insecurity and that the user should have control. It was after that I > LB> added the possibility to review and reject the opportunistic install, > LB> just before the library is going to be installed. > > As I said, you have to make a proposal and defend it.
It may turn out > to be really great, we won't know until it's up for review. But I think > you should frame it as a "plugin facility" instead of a package manager > to give it a good chance to be accepted. I think it would make the most sense as an enhancement to ELPA, since the libraries within ELPA should be good to use, so we do not have to worry about something bad getting installed this way. ^ permalink raw reply [flat|nested] 182+ messages in thread
* package management proposals for Emacs (was: face for non-ASCII characters) 2011-04-22 17:12 ` Lennart Borgman @ 2011-04-26 3:14 ` Ted Zlatanov 2011-04-26 8:10 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-26 3:14 UTC (permalink / raw) To: emacs-devel On Fri, 22 Apr 2011 19:12:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> I think it would make most sense as an enhancement to ELPA since the LB> libraries within ELPA should be good to use so we do not have to worry LB> about something bad getting installed this way. OK, you can propose it later when the GNU ELPA (note the difference between "ELPA" and "GNU ELPA" btw) has been around for a bit. Meanwhile check out emacsmirror and el-get; they are interesting and go in a different direction from nXhtml's package management and from the ELPA model, and there's plenty of overlap. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: package management proposals for Emacs (was: face for non-ASCII characters) 2011-04-26 3:14 ` package management proposals for Emacs (was: face for non-ASCII characters) Ted Zlatanov @ 2011-04-26 8:10 ` Lennart Borgman 2011-04-26 21:46 ` Richard Stallman 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-26 8:10 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/26 Ted Zlatanov <tzz@lifelogs.com>: > On Fri, 22 Apr 2011 19:12:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > LB> I think it would make most sense as an enhancement to ELPA since the > LB> libraries within ELPA should be good to use so we do not have to worry > LB> about something bad getting installed this way. > > OK, you can propose it later when the GNU ELPA (note the difference > between "ELPA" and "GNU ELPA" btw) has been around for a bit. Yes, I want the software used for GNU ELPA to stabilize first and then propose it. > Meanwhile check out emacsmirror and el-get, they are interesting and go > in a different direction from nXhtml's package management and from the > ELPA model, and there's plenty of overlap. > > Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: package management proposals for Emacs (was: face for non-ASCII characters) 2011-04-26 8:10 ` Lennart Borgman @ 2011-04-26 21:46 ` Richard Stallman 2011-04-27 1:19 ` package management proposals for Emacs Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Richard Stallman @ 2011-04-26 21:46 UTC (permalink / raw) To: Lennart Borgman; +Cc: tzz, emacs-devel > OK, you can propose it later when the GNU ELPA (note the difference > between "ELPA" and "GNU ELPA" btw) has been around for a bit. Can't we choose some other name for our package archive that doesn't cause confusion with another one? -- Dr Richard Stallman President, Free Software Foundation 51 Franklin St Boston MA 02110 USA www.fsf.org, www.gnu.org Skype: No way! That's nonfree (freedom-denying) software. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: package management proposals for Emacs 2011-04-26 21:46 ` Richard Stallman @ 2011-04-27 1:19 ` Stefan Monnier 2011-04-27 3:36 ` Ted Zlatanov 2011-04-27 21:14 ` Richard Stallman 0 siblings, 2 replies; 182+ messages in thread From: Stefan Monnier @ 2011-04-27 1:19 UTC (permalink / raw) To: rms; +Cc: tzz, Lennart Borgman, emacs-devel >> OK, you can propose it later when the GNU ELPA (note the difference >> between "ELPA" and "GNU ELPA" btw) has been around for a bit. > Can't we choose some other name for our package archive > that doesn't cause confusion with another one? Ours is an ELPA, it's just not the same one as the one originally set up by the original author of package.el. So what we need is to avoid using "ELPA" without specifying which one (GNU's, Tom's, or some other one). Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: package management proposals for Emacs 2011-04-27 1:19 ` package management proposals for Emacs Stefan Monnier @ 2011-04-27 3:36 ` Ted Zlatanov 2011-04-27 21:14 ` Richard Stallman 1 sibling, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-04-27 3:36 UTC (permalink / raw) To: emacs-devel On Tue, 26 Apr 2011 22:19:35 -0300 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >>> OK, you can propose it later when the GNU ELPA (note the difference >>> between "ELPA" and "GNU ELPA" btw) has been around for a bit. >> Can't we choose some other name for our package archive >> that doesn't cause confusion with another one? SM> Ours is an ELPA, it's just not the same one as the one originally set up SM> by the original author of package.el. So what we need is to avoid SM> using "ELPA" without specifying which one (GNU's, Tom's, or some other one). It adds to the confusion that Tom's ELPA calls itself "the ELPA." I try to be consistent in the use of "GNU ELPA," at least, but we already had a discussion about this. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: package management proposals for Emacs 2011-04-27 1:19 ` package management proposals for Emacs Stefan Monnier 2011-04-27 3:36 ` Ted Zlatanov @ 2011-04-27 21:14 ` Richard Stallman 1 sibling, 0 replies; 182+ messages in thread From: Richard Stallman @ 2011-04-27 21:14 UTC (permalink / raw) To: Stefan Monnier; +Cc: tzz, lennart.borgman, emacs-devel Ours is an ELPA, it's just not the same one as the one originally set up by the original author of package.el. So what we need is to avoid using "ELPA" without specifying which one (GNU's, Tom's, or some other one). Sure, but that's a mistake waiting to happen. Why not call it the GNU Emacs Package Archive, GEPA? -- Dr Richard Stallman President, Free Software Foundation 51 Franklin St Boston MA 02110 USA www.fsf.org, www.gnu.org Skype: No way! That's nonfree (freedom-denying) software. ^ permalink raw reply [flat|nested] 182+ messages in thread
* markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) 2011-04-22 14:20 ` Ted Zlatanov 2011-04-22 17:12 ` Lennart Borgman @ 2011-04-26 3:09 ` Ted Zlatanov 2011-04-26 8:13 ` Lennart Borgman 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-04-26 3:09 UTC (permalink / raw) To: emacs-devel On Fri, 22 Apr 2011 09:20:00 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: >>> Sure, but I'd rather collaborate if I can. The easiest thing (just keep >>> markchars.el in the GNU ELPA) is not the best thing for the users. ... TZ> OK, I'll mirror it. I don't expect it to become a problem. When markchars.el is up to version 0.2 (I try to use semantic versioning, so that became 0.2.0) in the nXhtml Launchpad repository, let me know and I'll set up mirroring. Until then I've checked my version 0.2.0 into the GNU ELPA Bazaar repository. I did not deploy it yet. Chong or Stefan, can you please look at markchars.el (and epoch-view.el, which I also checked in tonight) and let me know if I need to fix anything before deploying? idn.el depends on nXhtml: (require 'nxhtml-base) (defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir)) so I don't think it can be mirrored yet. Let me know when and if that changes and how you want to do it. With the GNU ELPA we can make idn.el a package tarball that also has the confusables.txt file. Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) 2011-04-26 3:09 ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov @ 2011-04-26 8:13 ` Lennart Borgman 2011-04-26 15:28 ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-04-26 8:13 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2011/4/26 Ted Zlatanov <tzz@lifelogs.com>: > On Fri, 22 Apr 2011 09:20:00 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: > >>>> Sure, but I'd rather collaborate if I can. The easiest thing (just keep >>>> markchars.el in the GNU ELPA) is not the best thing for the users. > ... > TZ> OK, I'll mirror it. I don't expect it to become a problem. > > When markchars.el is up to version 0.2 (I try to use semantic > versioning, so that became 0.2.0) in the nXhtml Launchpad repository, > let me know and I'll set up mirroring. Until then I've checked my > version 0.2.0 into the GNU ELPA Bazaar repository. I did not deploy it > yet. Ok. > Chong or Stefan, can you please look at markchars.el (and epoch-view.el, > which I also checked in tonight) and let me know if I need to fix > anything before deploying? > > idn.el depends on nXhtml: > > (require 'nxhtml-base) > (defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir)) Ah, I forgot that dependency. I will have a look at it. Do you have a better proposal for this? > so I don't think it can be mirrored yet. Let me know when and if that > changes and how you want to do it. With the GNU ELPA we can make idn.el > a package tarball that also has the confusables.txt file. Thanks. I can't see any copyright problem with confusables.txt, but I am not sure. It would be good to have this checked. > Thanks > Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) 2011-04-26 8:13 ` Lennart Borgman @ 2011-04-26 15:28 ` Ted Zlatanov 2011-05-13 19:42 ` idn.el and confusables.txt Stefan Monnier 2011-05-31 10:42 ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-04-26 15:28 UTC (permalink / raw) To: emacs-devel On Tue, 26 Apr 2011 10:13:34 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> 2011/4/26 Ted Zlatanov <tzz@lifelogs.com>: >> idn.el depends on nXhtml: >> >> (require 'nxhtml-base) >> (defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir)) LB> Ah, I forgot that dependency. I will have a look at it. Do you have a LB> better proposal for this? I would look at the Unicode copyrights on confusables.txt and figure out if they need to be considered. Since Emacs already includes many transformed Unicode files, I imagine it's OK, but don't know for sure. Once we figure that out, I will bundle idn.el with the confusables.txt and make it a tarball-style ELPA package. IMHO idn.el and confusables.txt should go into the Emacs trunk so they can generate first-class character properties for `C-u x ='. Stefan, Chong, what do you think? Thanks Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
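One common way to remove a dependency like the `nxhtml-install-dir' one quoted above, once the data file ships in the same package as the library, is to locate the data relative to the library file itself. The sketch below shows that general idiom only; it is not idn.el's actual code, and the variable name and the "uts39/" subdirectory layout are assumptions.

```elisp
;; Hypothetical replacement for a data-directory variable: resolve the
;; directory relative to wherever this file is loaded from, instead of
;; relative to another package's install directory.
(defvar my-uts39-datadir
  (expand-file-name
   "uts39/"
   (file-name-directory
    ;; `load-file-name' is set while a file is being loaded;
    ;; the fallbacks cover interactive evaluation in a buffer.
    (or load-file-name buffer-file-name default-directory)))
  "Hypothetical directory holding confusables.txt and related data files.")
```

With a tarball-style package, the data directory would then simply sit next to the .el file inside the installed package directory.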
* Re: idn.el and confusables.txt 2011-04-26 15:28 ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov @ 2011-05-13 19:42 ` Stefan Monnier 2011-05-13 20:19 ` Ted Zlatanov 2011-05-14 8:06 ` Eli Zaretskii 2011-05-31 10:42 ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov 1 sibling, 2 replies; 182+ messages in thread From: Stefan Monnier @ 2011-05-13 19:42 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > IMHO idn.el and confusables.txt should go into the Emacs trunk so they > can generate first-class character properties for `C-u x ='. Stefan, > Chong, what do you think? I don't know enough about the way we handle Unicode tables to know. It does sound like confusables.txt could be turned into a lisp/international/uni-confusables.el, but I don't know whether there is a large benefit from having it part of Emacs as opposed to having it in GNU ELPA. As for idn.el, I haven't seen the file, and don't know what uses it, so I can't judge. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-13 19:42 ` idn.el and confusables.txt Stefan Monnier @ 2011-05-13 20:19 ` Ted Zlatanov 2011-05-14 8:13 ` Eli Zaretskii 2011-05-14 8:06 ` Eli Zaretskii 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-13 20:19 UTC (permalink / raw) To: emacs-devel On Fri, 13 May 2011 16:42:57 -0300 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> IMHO idn.el and confusables.txt should go into the Emacs trunk so they >> can generate first-class character properties for `C-u x ='. Stefan, >> Chong, what do you think? SM> I don't know enough about the way we handle Unicode tables to know. SM> It does sound like confusables.txt could be turned into SM> a lisp/international/uni-confusables.el, but I don't know whether there SM> is a large benefit from having it part of Emacs as opposed to having it SM> in GNU ELPA. As for idn.el, I haven't seen the file, and don't know SM> what uses it, so I can't judge. Looking at lisp/international/uni-uppercase.el for example, it seems possible to generate something similar from confusables.txt that would live in the GNU ELPA and can be loaded by other packages. I don't see anything that requires the code to live in Emacs. Ditto for idn.el. But it would be nice if we could reuse the admin/unidata infrastructure to generate uni-confusables.el and uni-idn.el. Could that be connected with the GNU ELPA somehow? If Juanma Barranquero or Kenichi Handa (two recent committers to that area) could comment, that would be great. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-13 20:19 ` Ted Zlatanov @ 2011-05-14 8:13 ` Eli Zaretskii 0 siblings, 0 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 8:13 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Fri, 13 May 2011 15:19:34 -0500 > > But it would be nice if we could reuse the admin/unidata infrastructure > to generate uni-confusables.el and uni-idn.el. AFAIK, admin/unidata/unidata-gen.el can only parse the format of the UnicodeData.txt file. confusables.txt is in different format, so I don't think you can reuse unidata-gen.el for that. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-13 19:42 ` idn.el and confusables.txt Stefan Monnier 2011-05-13 20:19 ` Ted Zlatanov @ 2011-05-14 8:06 ` Eli Zaretskii 2011-05-14 8:56 ` Lennart Borgman ` (2 more replies) 1 sibling, 3 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 8:06 UTC (permalink / raw) To: Stefan Monnier; +Cc: tzz, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 13 May 2011 16:42:57 -0300 > Cc: emacs-devel@gnu.org > > > IMHO idn.el and confusables.txt should go into the Emacs trunk so they > > can generate first-class character properties for `C-u x ='. Stefan, > > Chong, what do you think? > > I don't know enough about the way we handle Unicode tables to know. We create char-tables from them. But I'm not sure I understand the question, so maybe my answer is not helpful. > It does sound like confusables.txt could be turned into > a lisp/international/uni-confusables.el, but I don't know whether there > is a large benefit from having it part of Emacs as opposed to having it > in GNU ELPA. As for idn.el, I haven't seen the file, and don't know > what uses it, so I can't judge. What is idn.el? where can I see it? And how and where would we like to use it? I searched the relevant threads (which were all spin-offs of other threads, which didn't help searching for the info), but didn't find any pointers. Apologies if I missed something. You see, the uni-*.el files we create out of the Unicode DB are not used anywhere in application code, AFAIK. We use them to display character properties in the likes of "C-u C-x =", and that's it. I'm not even sure they are organized in a way that makes them useful. E.g., when I needed to use the Unicode bidirectional properties for bidi reordering, I eventually was forced to create my own tables (see src/biditype.h and src/bidimirror.h, and the corresponding Awk scripts in admin/unidata/) which lend themselves well to using them in real-life code. 
So I'd really like to avoid introducing yet another huge table whose only effects are to show one more property in "C-u C-x =" and bloat the ELisp manual some more. Can we please have some preliminary ideas and design for using the "confusables" information and the IDNA protocol in Emacs, before we decide whether and how to include them? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 8:06 ` Eli Zaretskii @ 2011-05-14 8:56 ` Lennart Borgman 2011-05-14 9:36 ` Eli Zaretskii 2011-05-14 13:40 ` Ted Zlatanov 2011-05-15 13:06 ` Kenichi Handa 2 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2011-05-14 8:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: tzz, Stefan Monnier, emacs-devel On Sat, May 14, 2011 at 10:06 AM, Eli Zaretskii <eliz@gnu.org> wrote: > > What is idn.el? where can I see it? It is currently part of nXhtml: http://bazaar.launchpad.net/~nxhtml/nxhtml/main/files/head:/util/ > And how and where would we like > to use it? I searched the relevant threads (which were all spin-offs > of other threads, which didn't help searching for the info), but > didn't find any pointers. Apologies if I missed something. It is used by markchars.el. (Which I have not yet updated I think.) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 8:56 ` Lennart Borgman @ 2011-05-14 9:36 ` Eli Zaretskii 0 siblings, 0 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 9:36 UTC (permalink / raw) To: Lennart Borgman; +Cc: tzz, monnier, emacs-devel > From: Lennart Borgman <lennart.borgman@gmail.com> > Date: Sat, 14 May 2011 10:56:33 +0200 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, tzz@lifelogs.com, emacs-devel@gnu.org > > On Sat, May 14, 2011 at 10:06 AM, Eli Zaretskii <eliz@gnu.org> wrote: > > > > What is idn.el? where can I see it? > > It is currently part of nXhtml: > http://bazaar.launchpad.net/~nxhtml/nxhtml/main/files/head:/util/ Thanks. > > And how and where would we like > > to use it? I searched the relevant threads (which were all spin-offs > > of other threads, which didn't help searching for the info), but > > didn't find any pointers. Apologies if I missed something. > > It is used by markchars.el. (Which I have not yet updated I think.) So we are discussing addition of a feature that is only used by an unbundled package, and then only to highlight certain characters, is that right? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 8:06 ` Eli Zaretskii 2011-05-14 8:56 ` Lennart Borgman @ 2011-05-14 13:40 ` Ted Zlatanov 2011-05-14 14:38 ` Eli Zaretskii 2011-05-15 13:06 ` Kenichi Handa 2 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-14 13:40 UTC (permalink / raw) To: emacs-devel On Sat, 14 May 2011 11:06:52 +0300 Eli Zaretskii <eliz@gnu.org> wrote: >> It does sound like confusables.txt could be turned into >> a lisp/international/uni-confusables.el, but I don't know whether there >> is a large benefit from having it part of Emacs as opposed to having it >> in GNU ELPA. As for idn.el, I haven't seen the file, and don't know >> what uses it, so I can't judge. EZ> What is idn.el? where can I see it? And how and where would we like EZ> to use it? I searched the relevant threads (which were all spin-offs EZ> of other threads, which didn't help searching for the info), but EZ> didn't find any pointers. Apologies if I missed something. Both idn.el and confusables.txt can be used by markchars.el to show suspicious characters, especially in URLs but also in other situations (e.g. accidentally putting a Cyrillic о instead of the Latin o in e-mail text). The latest markchars.el is in the GNU ELPA, though the primary location may continue to be nXhtml. EZ> You see, the uni-*.el files we create out of the Unicode DB are not EZ> used anywhere in application code, AFAIK. We use them to display EZ> character properties in the likes of "C-u C-x =", and that's it. I'm EZ> not even sure they are organized in a way that makes them useful. markchars.el could use other Unicode properties if people ask. But specifically regarding the ones I'm proposing for inclusion, since we've started using the GNU ELPA more and markchars.el lives in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA instead of the Emacs trunk. 
EZ> So I'd really like to avoid introducing yet another huge table whose EZ> only effects are to show one more property in "C-u C-x =" and bloat EZ> the ELisp manual some more. IMO it's not a huge table and should not bloat the manual significantly if it was in the trunk. There is useful extra information for each character (the characters it can be confused with) which would grow the char-table if it was included. Also the char-table doesn't have to cover the Asian confusables--I'm not sure anyone would need those. So there's some vagueness as far as the memory usage. EZ> Can we please have some preliminary ideas and design for using the EZ> "confusables" information and the IDNA protocol in Emacs, before we EZ> decide whether and how to include them? We've had literally hundreds of messages on this over the last year, unfortunately over many threads as you pointed out. My best attempt to propose their usage would be as supplements to markchars.el, I can't think of other uses currently. EZ> So we are discussing addition of a feature that is only used by an EZ> unbundled package, and then only to highlight certain characters, is EZ> that right? uni-confusables.el and uni-idn.el will define Unicode properties and not a feature per se. Thus they are really reference tables and not functional changes to Emacs. I point this out because the burden of including them in the Emacs trunk is not large: some memory usage, keeping the Unicode data updated, and the conversion scripts. On Sat, 14 May 2011 11:13:46 +0300 Eli Zaretskii <eliz@gnu.org> wrote: EZ> AFAIK, admin/unidata/unidata-gen.el can only parse the format of the EZ> UnicodeData.txt file. confusables.txt is in different format, so I EZ> don't think you can reuse unidata-gen.el for that. OK, I'll set a converter up to live in the trunk or in the GNU ELPA when the maintainers decide where uni-confusables.el and uni-idn.el should be. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 13:40 ` Ted Zlatanov @ 2011-05-14 14:38 ` Eli Zaretskii 2011-05-14 15:30 ` Ted Zlatanov 2011-05-14 17:25 ` Stefan Monnier 0 siblings, 2 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 14:38 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Sat, 14 May 2011 08:40:48 -0500 > > EZ> You see, the uni-*.el files we create out of the Unicode DB are not > EZ> used anywhere in application code, AFAIK. We use them to display > EZ> character properties in the likes of "C-u C-x =", and that's it. I'm > EZ> not even sure they are organized in a way that makes them useful. > > markchars.el could use other Unicode properties if people ask. I'm talking about the details. The way we currently set the tables in uni-*.el is that many of the values are symbols. For example: (get-char-code-property ?1 'general-category) => Nd (get-char-code-property ?א 'bidi-class) => R (get-char-code-property ?\( 'mirrored) => Y The `Nd', `R', and `Y' are symbols. Now, suppose you wanted to use these values in some code that needs to be fast -- how would you feel about having to write multi-branch `cond' forms to compare the value against all the possibilities? For bidi reordering, which runs in the innermost loop of the display engine, using the `bidi-class' or `mirrored' properties that are symbols would be prohibitively expensive. For now, with markchars.el, all you need is a boolean value for each character. However, in other use cases, some other Lisp code will want the paired character. Yet another application will want to compare characters such that confusable pairs will compare equal. Can a single table satisfy all these needs efficiently? Maybe it can, but we need to design that table carefully. 
> But specifically regarding the ones I'm proposing for inclusion, > since we've started using the GNU ELPA more and markchars.el lives > in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA > instead of the Emacs trunk. I'm not arguing about where to put them. I'm saying that for such basic infrastructure, we should consider the possible uses before we rush into implementation. Otherwise, we will again repeat the same mistake, whose result is that the only real user of bidirectional properties cannot use uni-bidi.el! > EZ> So I'd really like to avoid introducing yet another huge table whose > EZ> only effects are to show one more property in "C-u C-x =" and bloat > EZ> the ELisp manual some more. > > IMO it's not a huge table ??? It's a char-table that can be indexed by any character supported by Emacs. Even if you count only the characters mentioned in confusables.txt, there are 20 thousand of them. char-tables are memory-efficient, but their footprint is not negligible. The bloat may be insignificant by comparison, but if the _only_ useful effect is the bloat, why should we do that? > Also the char-table doesn't have to > cover the Asian confusables--I'm not sure anyone would need those. Well, the Unicode consortium definitely thought they were needed. Either we follow established standards, or we don't. ^ permalink raw reply [flat|nested] 182+ messages in thread
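A minimal sketch of the symbol-valued lookup described in the message above, assuming a stock Emacs in which `get-char-code-property` is available. The helper name `my-bidi-direction` is hypothetical and only illustrates why a consumer ends up with a multi-branch `cond`:

```elisp
;; Hypothetical helper, not part of Emacs or markchars.el.
(defun my-bidi-direction (char)
  "Return a coarse direction symbol for CHAR based on its `bidi-class'."
  (let ((class (get-char-code-property char 'bidi-class)))
    ;; The property value is a symbol such as L, R, AL or EN, so
    ;; dispatching on it means comparing against each candidate in turn.
    (cond ((eq class 'L) 'left-to-right)
          ((memq class '(R AL)) 'right-to-left)
          ((memq class '(EN AN)) 'number)
          (t 'other))))
```

Calling `(my-bidi-direction ?a)` or `(my-bidi-direction #x05D0)` walks the branches until one matches. That is acceptable for occasional use, but it is the per-character cost the message argues against for code in an inner loop.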
* Re: idn.el and confusables.txt 2011-05-14 14:38 ` Eli Zaretskii @ 2011-05-14 15:30 ` Ted Zlatanov 2011-05-14 16:42 ` Eli Zaretskii 2011-05-14 17:25 ` Stefan Monnier 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-14 15:30 UTC (permalink / raw) To: emacs-devel On Sat, 14 May 2011 17:38:11 +0300 Eli Zaretskii <eliz@gnu.org> wrote: >> From: Ted Zlatanov <tzz@lifelogs.com> >> Date: Sat, 14 May 2011 08:40:48 -0500 >> EZ> You see, the uni-*.el files we create out of the Unicode DB are not EZ> used anywhere in application code, AFAIK. We use them to display EZ> character properties in the likes of "C-u C-x =", and that's it. I'm EZ> not even sure they are organized in a way that makes them useful. >> >> markchars.el could use other Unicode properties if people ask. EZ> I'm talking about the details. The way we currently set the tables in EZ> uni-*.el is that many of the values are symbols. For example: EZ> (get-char-code-property ?1 'general-category) => Nd EZ> (get-char-code-property ?א 'bidi-class) => R EZ> (get-char-code-property ?\( 'mirrored) => Y EZ> The `Nd', `R', and `Y' are symbols. EZ> Now, suppose you wanted to use these values in some code that needs to EZ> be fast -- how would you feel about having to write multi-branch EZ> `cond' forms to compare the value against all the possibilities? It wouldn't be ideal, surely, but most glyphs are not confusable so the lookup would fail. I might write some of it in C if performance was an issue, or try to inline the conditions with macros, or cache the lookups. But I don't know if markchars.el needs to be terribly fast. It runs at the font-lock level and IIUC that's opportunistic and not time-critical like the display code. For instance, unmodified text is not rechecked, right? EZ> For now, with markchars.el, all you need is a boolean value for each EZ> character. However, in other use cases, some other Lisp code will EZ> want the paired character. 
Yet another application will want to EZ> compare characters such that confusable pairs will compare equal. Can EZ> a single table satisfy all these needs efficiently? Maybe it can, but EZ> we need to design that table carefully. Two char-tables would be enough: one small table for the confusable -> target mapping, and one even smaller for the reverse target -> (confusable list) mapping. The reverse lookup table could be stored in an extra slot of the primary lookup table. markchars.el could use this mapping to show more information than just underlining the characters. A tooltip could show why the glyph is confusable, for instance. >> Also the char-table doesn't have to >> cover the Asian confusables--I'm not sure anyone would need those. EZ> Well, the Unicode consortium definitely thought they were needed. EZ> Either we follow established standards, or we don't. You're right. Also there are Asian characters that could be confused for Latin characters so it's not safe to exclude them. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 15:30 ` Ted Zlatanov @ 2011-05-14 16:42 ` Eli Zaretskii 2011-05-14 17:06 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 16:42 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Sat, 14 May 2011 10:30:37 -0500 > > It wouldn't be ideal, surely, but most glyphs are not confusable so the > lookup would fail. For some value of "most": there are 20K entries in confusables.txt. > I might write some of it in C if performance was an issue C won't help, if you need to access the same char-table and compare with half a dozen possible symbols. > or try to inline the conditions with macros, or cache the lookups. Isn't it better to design the table for efficient use to begin with? > But I don't know if markchars.el needs to be terribly fast. I hope we are not introducing another character property for a single use. Some use, some day might need to do it fast. > It runs at the font-lock level and IIUC that's opportunistic and not > time-critical like the display code. For instance, unmodified text is > not rechecked, right? No, you cannot count on that. E.g., fontification-functions are always called with a region that starts at the beginning of a line, even if part of that line is already fontified. > Two char-tables would be enough: one small table for the confusable -> > target mapping, and one even smaller for the reverse target -> > (confusable list) mapping. The reverse lookup table could be stored in > an extra slot of the primary lookup table. Doesn't confusables.txt include both mappings already? If so, you don't need the reverse table. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 16:42 ` Eli Zaretskii @ 2011-05-14 17:06 ` Ted Zlatanov 2011-05-14 20:59 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-14 17:06 UTC (permalink / raw) To: emacs-devel On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <eliz@gnu.org> wrote: EZ> Isn't it better to design the table for efficient use to begin with? Yes, and I ask you and the other experts on char-tables to help with that design. I am far from an expert on that topic. >> But I don't know if markchars.el needs to be terribly fast. EZ> I hope we are not introducing another character property for a EZ> single use. Some use, some day might need to do it fast. This is premature optimization. I only have a single use in hand. Let's make sure markchars.el is fast and we can optimize for other uses when they are needed. >> Two char-tables would be enough: one small table for the confusable -> >> target mapping, and one even smaller for the reverse target -> >> (confusable list) mapping. The reverse lookup table could be stored in >> an extra slot of the primary lookup table. EZ> Doesn't confusables.txt include both mappings already? If so, you EZ> don't need the reverse table. I thought the lookups would be faster with a reverse mapping in one of the scenarios you listed (looking up all the characters that might be confused with a given one). But I realized it doesn't need to be. Let's say C1, C2, and C3 are confusables mapped to C1. Then the mapping is C1 -> (C2, C3); C2 -> C1; and C3 -> C1. The algorithm is "if a character maps to an atom it's confusable with it, if it maps to a list the whole list is confusable to this character." So to find all the confusables mapped to a character you need at most two lookups. In addition to the character mapping we also need a confusable data type, which can be SL/SA (single-script) or ML/MA (mixed-script). I don't know where to store that. 
Maybe we can just have two char-tables for the two data types. There aren't going to be more data types AFAIK. But markchars.el can definitely use the knowledge that the confusable is within a single script or not. Does all of that make sense? Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
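The atom-or-list scheme proposed above can be sketched as follows. This is only an illustration under stated assumptions: the table `my-confusables-table`, the function `my-confusables-of`, and the choice of sample entries are hypothetical, and a real table would be generated from confusables.txt rather than filled by hand.

```elisp
;; Hypothetical table; real data would come from confusables.txt.
(defvar my-confusables-table (make-char-table 'my-confusables)
  "Map a confusable to its target, or a target to a list of confusables.")

;; Illustrative group: C1 = Latin a, C2 = U+0430, C3 = U+FF41.
(aset my-confusables-table ?a (list #x0430 #xFF41)) ; C1 -> (C2 C3)
(aset my-confusables-table #x0430 ?a)               ; C2 -> C1
(aset my-confusables-table #xFF41 ?a)               ; C3 -> C1

(defun my-confusables-of (char)
  "Return the other members of CHAR's confusable group, or nil."
  (let ((value (aref my-confusables-table char)))
    (cond ((null value) nil)
          ;; A list value means CHAR is the target: one lookup.
          ((listp value) (copy-sequence value))
          ;; An atom value means CHAR maps to a target: a second
          ;; lookup on the target finds the siblings.
          (t (cons value
                   (remq char (aref my-confusables-table value)))))))
```

Evaluating `(my-confusables-of #x0430)` performs the two lookups described in the message: one to reach the target, one to fetch the target's list. The sketch only handles single-character targets.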
* Re: idn.el and confusables.txt 2011-05-14 17:06 ` Ted Zlatanov @ 2011-05-14 20:59 ` Eli Zaretskii 2011-05-15 1:22 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-14 20:59 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Sat, 14 May 2011 12:06:04 -0500 > > On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <eliz@gnu.org> wrote: > > EZ> Isn't it better to design the table for efficient use to begin with? > > Yes, and I ask you and the other experts on char-tables to help with > that design. I am far from an expert on that topic. I don't think char-table expertise will help here. The main issue is what uses of this data we envision. > >> But I don't know if markchars.el needs to be terribly fast. > > EZ> I hope we are not introducing another character property for a > EZ> single use. Some use, some day might need to do it fast. > > This is premature optimization. I call it foresight ;-) > I only have a single use in hand. > Let's make sure markchars.el is fast and we can optimize for other uses > when they are needed. No, let's think what uses we envision, and make the data structure convenient for those uses. > Let's say C1, C2, and C3 are confusables mapped to C1. Then the mapping > is C1 -> (C2, C3); C2 -> C1; and C3 -> C1. > > The algorithm is "if a character maps to an atom it's confusable with > it, if it maps to a list the whole list is confusable to this > character." Should it be a list or a string? How would you use this mapping? > So to find all the confusables mapped to a character you > need at most two lookups. The RHS of a mapping can be several characters, in which case there's no reverse mapping and no "confusables mapped to a character", I think. > In addition to the character mapping we also need a confusable data > type, which can be SL/SA (single-script) or ML/MA (mixed-script). What would be a possible use of that? 
* Re: idn.el and confusables.txt 2011-05-14 20:59 ` Eli Zaretskii @ 2011-05-15 1:22 ` Ted Zlatanov 2011-05-15 5:56 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-15 1:22 UTC (permalink / raw) To: emacs-devel On Sat, 14 May 2011 23:59:22 +0300 Eli Zaretskii <eliz@gnu.org> wrote: >> Let's say C1, C2, and C3 are confusables mapped to C1. Then the mapping >> is C1 -> (C2, C3); C2 -> C1; and C3 -> C1. >> >> The algorithm is "if a character maps to an atom it's confusable with >> it, if it maps to a list the whole list is confusable to this >> character." EZ> Should it be a list or a string? How would you use this mapping? It could be any type of sequence, I guess. Strings are more compact but for small amounts of data (typically 1-3 characters) I'm not sure if that matters. For 1 character in particular I'm pretty sure it's more efficient to store the character directly than any sequence. markchars.el would use it as follows: look at all the characters of a word. If any are of a different script S2 from the majority script S1, highlight them (we do this now with `markchars-face-confusable'). New functionality: now if any of the S2 characters are multi-script confusables that map to a character in the majority script S1, highlight them specially with the new variable `markchars-face-confusable-multi-script' and give them a tooltip to say they are confusable with a particular character. New functionality: if any of the word characters, regardless of script, are confusables of the single-script type, highlight them with `markchars-face-confusable'. But see below about normalization. EZ> The RHS of a mapping can be several characters, in which case there's EZ> no reverse mapping and no "confusables mapped to a character", I EZ> think. OK. I was thinking of using the transitivity information but that's not very useful so never mind. 
>> In addition to the character mapping we also need a confusable data >> type, which can be SL/SA (single-script) or ML/MA (mixed-script). EZ> What would be a possible use of that? Single-script confusables can be an accident and are usually due to combining, e.g. parenthesized numbers: 2485 ; 0028 006C 0038 0029 ; SL #* ( ⒅ → (l8) ) PARENTHESIZED NUMBER EIGHTEEN → LEFT PARENTHESIS, LATIN SMALL LETTER L, DIGIT EIGHT, RIGHT PARENTHESIS # →(18)→ ...although there are many cases where that's not true: 0399 ; 0031 ; SA # ( Ι → 1 ) GREEK CAPITAL LETTER IOTA → DIGIT ONE # →l→ 0417 ; 0033 ; SA # ( З → 3 ) CYRILLIC CAPITAL LETTER ZE → DIGIT THREE # As a general rule I'd say that if the mapping is to a single character with the SL/SA single-script property, chances are it's a true confusable. Otherwise it could be legitimate and we'd need to convert the string to a normalized form, which is probably slow (do you know?) Mixed-script confusables are more dangerous because they look exactly like the other character and are less likely to be an accident, e.g. FF01 ; 0021 ; ML #* ( ！ → ! ) FULLWIDTH EXCLAMATION MARK → EXCLAMATION MARK # →ǃ→ 0430 ; 0061 ; ML # ( а → a ) CYRILLIC SMALL LETTER A → LATIN SMALL LETTER A # so I would make them more noticeable and would skip any normalization. Thus my new functionality proposals above. There are also whole-script confusables, e.g. "scope" in Latin and "ѕсоре" in Cyrillic (example from http://unicode.org/reports/tr39/) but I think those are covered by the rules above already and don't merit special treatment. Finally, confusables.txt has transitivity mappings that explain how the mapping was derived. I don't think that's particularly useful for markchars.el. I can't think of any other uses for the confusables.txt data beyond those listed above. 
Based on all this, I think it's best to make the confusables char-table values atoms or sequences (strings or lists) but split them into two char-tables for the single-script and multi-script mappings. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-15 1:22 ` Ted Zlatanov @ 2011-05-15 5:56 ` Eli Zaretskii 2011-05-15 12:14 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-15 5:56 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Sat, 14 May 2011 20:22:44 -0500 > > EZ> Should it be a list or a string? How would you use this mapping? > > It could be any type of sequence, I guess. Strings are more compact but > for small amounts of data (typically 1-3 characters) I'm not sure if > that matters. For 1 character in particular I'm pretty sure it's more > efficient to store the character directly than any sequence. > > markchars.el would use it as follows: look at all the characters of a > word. If any are of a different script S2 from the majority script S1, > highlight them (we do this now with `markchars-face-confusable'). > > New functionality: now if any of the S2 characters are multi-script > confusables that map to a character in the majority script S1, highlight > them specially with the new variable > `markchars-face-confusable-multi-script' and give them a tooltip to say > they are confusable with a particular character. > > New functionality: if any of the word characters, regardless of script, > are confusables of the single-script type, highlight them with > `markchars-face-confusable'. But see below about normalization. These all examine portions of a buffer ("words") for being a match to some string or regexp. So I think having strings in the char-table will be more convenient, because you could then use looking-at, string=, string-match, etc. > As a general rule I'd say that if the mapping is to a single character > with the SL/SA single-script property, chances are it's a true > confusable. Otherwise it could be legitimate and we'd need to convert > the string to a normalized form, which is probably slow (do you know?) What do you mean by "normalized form"? 
> Based on all this, I think it's best to make the confusables char-table > values atoms or sequences (strings or lists) but split them into two > char-tables for the single-script and multi-script mappings. If we were to implement the full IDNA protocol, would the above be enough? Or will we need additional information? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-15 5:56 ` Eli Zaretskii @ 2011-05-15 12:14 ` Ted Zlatanov 2011-05-16 12:38 ` Eli Zaretskii 2011-05-17 15:32 ` Ted Zlatanov 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-05-15 12:14 UTC (permalink / raw) To: emacs-devel On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <eliz@gnu.org> wrote: EZ> These all examine portions of a buffer ("words") for being a match to EZ> some string or regexp. So I think having strings in the char-table EZ> will be more convenient, because you could then use looking-at, EZ> string=, string-match, etc. Oh, good point. OK, strings it is. I'll write the converter. >> As a general rule I'd say that if the mapping is to a single character >> with the SL/SA single-script property, chances are it's a true >> confusable. Otherwise it could be legitimate and we'd need to convert >> the string to a normalized form, which is probably slow (do you know?) EZ> What do you mean by "normalized form"? Unicode has a normalization algorithm to see if two strings are informationally the same regardless of the combining characters and other sequences within. But thinking about it, even if normalization says they're the same, it's still a potential problem for the user, so we can skip normalization and always mark those. >> Based on all this, I think it's best to make the confusables char-table >> values atoms or sequences (strings or lists) but split them into two >> char-tables for the single-script and multi-script mappings. EZ> If we were to implement the full IDNA protocol, would the above be EZ> enough? Or will we need additional information? Oh, all this has been for confusables (TR39) only. IDNA and uni-idn.el will have their own needs! IIUC, Lennart used IDNA only as a character set in markchars.el (I didn't write that functionality and he maintains idn.el), but there are more security issues with it we may need to handle. 
IDNA is better described in http://unicode.org/reports/tr46/ and the links at the end of that document (a whole bunch of RFCs). I'm not interested in implementing the IDNA code beyond supporting the current character set detection because I don't think IDNA is popular enough, but maybe Lennart and others want to do it. For further possible markchars.el functionality, take a look at http://www.unicode.org/reports/tr36/ (Unicode Security Considerations). It talks about the confusables issues, IDNA issues, and bidi issues among others. It's a really good explanation of what security-related functionality is needed from the confusables char-table and potentially other places in Emacs. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
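A minimal sketch of the string-valued table agreed on above, assuming each confusable character maps directly to its target string. The names `my-confusable-strings`, `my-confusable-skeleton` and `my-confusable-equal-p` are hypothetical, the two entries are the ones quoted earlier in the thread, and this is a simplification rather than the full skeleton algorithm of TR39 (which also involves normalization):

```elisp
;; Hypothetical table; real data would come from confusables.txt.
(defvar my-confusable-strings (make-char-table 'my-confusable-strings)
  "Map a confusable character to the string it can be mistaken for.")

(aset my-confusable-strings #x0430 "a")    ; CYRILLIC SMALL LETTER A
(aset my-confusable-strings #x2485 "(l8)") ; PARENTHESIZED NUMBER EIGHTEEN

(defun my-confusable-skeleton (string)
  "Return STRING with every mapped character replaced by its target."
  (mapconcat (lambda (char)
               ;; Unmapped characters are kept as they are.
               (or (aref my-confusable-strings char)
                   (char-to-string char)))
             string ""))

(defun my-confusable-equal-p (a b)
  "Return non-nil if A and B are equal once confusables are mapped."
  (string= (my-confusable-skeleton a) (my-confusable-skeleton b)))
```

Because the values are strings, the result can be passed straight to `string=`, `string-match` and similar functions, which is the convenience argued for in the thread. A multi-character target such as "(l8)" needs no special handling.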
* Re: idn.el and confusables.txt 2011-05-15 12:14 ` Ted Zlatanov @ 2011-05-16 12:38 ` Eli Zaretskii 2011-05-16 18:31 ` Ted Zlatanov 2011-05-17 15:32 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-16 12:38 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Sun, 15 May 2011 07:14:47 -0500 > > EZ> If we were to implement the full IDNA protocol, would the above be > EZ> enough? Or will we need additional information? > > Oh, all this has been for confusables (TR39) only. IDNA and uni-idn.el > will have their own needs! Granted, but I was asking whether IDNA implementation will need anything from confusables.txt beyond what we discussed. That is, will a mapping of a character to a string be all IDNA needs to use the information in confusables.txt? > IDNA is better described in http://unicode.org/reports/tr46/ and the > links at the end of that document (a whole bunch of RFCs). I'm not > interested in implementing the IDNA code beyond supporting the current > character set detection because I don't think IDNA is popular enough, > but maybe Lennart and others want to do it. > > For further possible markchars.el functionality, take a look at > http://www.unicode.org/reports/tr36/ (Unicode Security Considerations). > It talks about the confusables issues, IDNA issues, and bidi issues > among others. It's a really good explanation of what security-related > functionality is needed from the confusables char-table and potentially > other places in Emacs. Yes, I'm familiar with these specs, but I didn't try to design an implementation of IDNA, so the answers to the above questions are not clear to me. If someone can explain how would an IDNA implementation use confusables.txt, we could make sure that the tables we produce from it will be good for IDNA as well, if and when someone will want to implement it. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-16 12:38 ` Eli Zaretskii @ 2011-05-16 18:31 ` Ted Zlatanov 2011-05-17 17:59 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-16 18:31 UTC (permalink / raw) To: emacs-devel On Mon, 16 May 2011 15:38:38 +0300 Eli Zaretskii <eliz@gnu.org> wrote: >> From: Ted Zlatanov <tzz@lifelogs.com> >> Date: Sun, 15 May 2011 07:14:47 -0500 >> EZ> If we were to implement the full IDNA protocol, would the above be EZ> enough? Or will we need additional information? >> >> Oh, all this has been for confusables (TR39) only. IDNA and uni-idn.el >> will have their own needs! EZ> Granted, but I was asking whether IDNA implementation will need EZ> anything from confusables.txt beyond what we discussed. That is, will EZ> a mapping of a character to a string be all IDNA needs to use the EZ> information in confusables.txt? I can't think of any other uses, so yes. Lennart may want to comment. >> IDNA is better described in http://unicode.org/reports/tr46/ and the >> links at the end of that document (a whole bunch of RFCs). I'm not >> interested in implementing the IDNA code beyond supporting the current >> character set detection because I don't think IDNA is popular enough, >> but maybe Lennart and others want to do it. >> >> For further possible markchars.el functionality, take a look at >> http://www.unicode.org/reports/tr36/ (Unicode Security Considerations). >> It talks about the confusables issues, IDNA issues, and bidi issues >> among others. It's a really good explanation of what security-related >> functionality is needed from the confusables char-table and potentially >> other places in Emacs. EZ> Yes, I'm familiar with these specs, but I didn't try to design an EZ> implementation of IDNA, so the answers to the above questions are not EZ> clear to me. 
If someone can explain how would an IDNA implementation EZ> use confusables.txt, we could make sure that the tables we produce EZ> from it will be good for IDNA as well, if and when someone will want EZ> to implement it. I may be misunderstanding, but I think you're sort of implying that IDNA (in domain names and URLs) is the main use for confusables.txt. I think file names, shell interactions, and general text (especially source code) are also important uses of that data because confusables in those settings can introduce security issues. Sorry if this is clear to you and everyone else. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-16 18:31 ` Ted Zlatanov @ 2011-05-17 17:59 ` Eli Zaretskii 0 siblings, 0 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-17 17:59 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > From: Ted Zlatanov <tzz@lifelogs.com> > Date: Mon, 16 May 2011 13:31:44 -0500 > > >> For further possible markchars.el functionality, take a look at > >> http://www.unicode.org/reports/tr36/ (Unicode Security Considerations). > >> It talks about the confusables issues, IDNA issues, and bidi issues > >> among others. It's a really good explanation of what security-related > >> functionality is needed from the confusables char-table and potentially > >> other places in Emacs. > > EZ> Yes, I'm familiar with these specs, but I didn't try to design an > EZ> implementation of IDNA, so the answers to the above questions are not > EZ> clear to me. If someone can explain how would an IDNA implementation > EZ> use confusables.txt, we could make sure that the tables we produce > EZ> from it will be good for IDNA as well, if and when someone will want > EZ> to implement it. > > I may be misunderstanding, but I think you're sort of implying that IDNA > (in domain names and URLs) is the main use for confusables.txt. No, I wasn't implying that. But it's certainly one possible use. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-15 12:14 ` Ted Zlatanov 2011-05-16 12:38 ` Eli Zaretskii @ 2011-05-17 15:32 ` Ted Zlatanov 2011-05-18 18:15 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-17 15:32 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1014 bytes --] On Sun, 15 May 2011 07:14:47 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: TZ> On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <eliz@gnu.org> wrote: EZ> These all examine portions of a buffer ("words") for being a match to EZ> some string or regexp. So I think having strings in the char-table EZ> will be more convenient, because you could then use looking-at, EZ> string=, string-match, etc. TZ> Oh, good point. OK, strings it is. I'll write the converter. Here's the converter. It reads the confusables.txt file and generates a char-table with strings as values. I'll package the converter and the resulting uni-confusables.el library and put them on the GNU ELPA. Could you tell me the best way to write uni-confusables.el? In what format should I provide the char-tables in the ELisp code? The uni-* files in Emacs use `define-char-code-property' because they deal with a single property, so should the properties here be 'confusables-single-script and 'confusables-multiple-script? Thanks Ted [-- Attachment #2: gen-confusables.el --] [-- Type: application/emacs-lisp, Size: 2498 bytes --] ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-17 15:32 ` Ted Zlatanov @ 2011-05-18 18:15 ` Ted Zlatanov 0 siblings, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2011-05-18 18:15 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1141 bytes --] On Tue, 17 May 2011 10:32:03 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: TZ> Here's the converter. It reads the confusables.txt file and generates a TZ> char-table with strings as values. I'll package the converter and the TZ> resulting uni-confusables.el library and put them on the GNU ELPA. TZ> Could you tell me the best way to write uni-confusables.el? In what TZ> format should I provide the char-tables in the ELisp code? The shortest format turned out to be a range enumeration, because the native char-table dump was much bigger (700K vs. 100K). So I wrote `gen-confusables-write' to create the "uni-confusables.el" file that defines the two char-tables and then populates them. As a bonus, two ERT tests (one per single/multiple type) are also generated dynamically based on the data found in the confusables.txt file. gen-confusables.el is a pretty unholy mix of Lisp and string manipulations, but since I am the only real user I don't mind. You can test it with http://www.unicode.org/Public/security/revision-04/confusables.txt (I'm not including the resulting uni-confusables.el here because it's over 100K). Ted [-- Attachment #2: gen-confusables.el --] [-- Type: application/emacs-lisp, Size: 4409 bytes --]

;;; gen-confusables.el --- generate uni-confusables.el from confusables.txt

;; Copyright (C) 2011 Teodor Zlatanov

;; Author: Teodor Zlatanov <tzz@lifelogs.com>

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;;; Code:

(require 'cl)

(defvar gen-confusables-char-table-single)
(defvar gen-confusables-char-table-multiple)

(defun gen-confusables-read (file)
  (interactive "fConfusables filename: \n")
  (flet ((reader (h) (string-to-number h 16)))
    (let ((stable (make-char-table 'confusables-single-script))
          (mtable (make-char-table 'confusables-multiple-script))
          (count 0)
          (confusable-line-regexp (concat "^\\([[:xdigit:]]+\\)" ; \x+
                                          " ;\t"
                                          ;; \x+ separated by spaces
                                          "\\([[:space:][:xdigit:]]+\\)"
                                          " ;\t"
                                          "\\([SM]\\)[LA]"))) ; SL, SA, ML, MA
      (setq gen-confusables-char-table-single stable)
      (setq gen-confusables-char-table-multiple mtable)
      (with-temp-buffer
        (insert-file-contents file)
        (goto-char (point-min))
        (while (re-search-forward confusable-line-regexp nil t)
          (incf count)
          (when (and (called-interactively-p)
                     (zerop (mod count 100)))
            (message "processed %d lines" count))
          (let* ((from (match-string 1))
                 (to (match-string 2))
                 (class (match-string 3))
                 (table (if (string-equal "S" class) stable mtable)))
            (set-char-table-range
             table
             (reader from)
             (concat (mapcar 'reader (split-string to))))))))))

(defun gen-confusables-write (file)
  (interactive "fDumped filename: \n")
  (let ((coding-system-for-write 'utf-8-emacs))
    (with-temp-file file
      (insert ";; Copyright (C) 1991-2009, 2010 Unicode, Inc.
;; This file was generated from the Unicode confusables list at
;; http://www.unicode.org/Public/security/revision-04/confusables.txt.
;; See lisp/international/README in the Emacs trunk
;; for the copyright and permission notice.\n\n")
      (dolist (type '(single multiple))
        (let* ((tablesym (intern (format "uni-confusables-char-table-%s" type)))
               (oursym (intern (format "gen-confusables-char-table-%s" type)))
               (ourtable (symbol-value oursym))
               (ourtablename (symbol-name oursym))
               (tablename (symbol-name tablesym))
               (prop (format "confusables-%s-script" type))
               props)
          (insert (format "(defvar %s (make-char-table '%s))\n\n" tablename prop))
          (map-char-table
           (lambda (k v) (setq props (cons k (cons v props))))
           ourtable)
          (insert (format "(let ((k nil) (v nil) (ranges '%S))\n" props))
          (insert (format " (while ranges (setq k (pop ranges) v (pop ranges)) (set-char-table-range %s k v)))\n\n" tablename))
          (insert (format "(ert-deftest uni-confusables-test-%s ()\n" type))
          (dolist (offset '(100 200 800 3000 3500))
            (insert (format " (should (string-equal (char-table-range %s %d) %S))\n"
                            tablename
                            (nth (* 2 offset) props)
                            (nth (1+ (* 2 offset)) props))))
          (insert ")\n\n")))
      (insert "
;; Local Variables:
;; coding: utf-8
;; no-byte-compile: t
;; End:

;; uni-confusables.el ends here"))))

(provide 'gen-confusables)

;;; gen-confusables.el ends here

^ permalink raw reply [flat|nested] 182+ messages in thread
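A minimal sketch of how a char-table with string values, as discussed in this message, might be consumed with `string=`. Everything named here is hypothetical: `my-confusables-table`, `my-confusable-skeleton` and `my-confusable-p` are illustrative names, and the two mappings are entered by hand as stand-ins for the data that would come from confusables.txt.

```elisp
;; Hypothetical stand-in for a table generated from confusables.txt.
(defvar my-confusables-table (make-char-table 'my-confusables)
  "Char-table mapping a character to the string it can be confused with.")

;; Two hand-entered sample mappings (assumed, not read from the data file).
(set-char-table-range my-confusables-table #x0430 "a") ; CYRILLIC SMALL LETTER A
(set-char-table-range my-confusables-table #x043E "o") ; CYRILLIC SMALL LETTER O

(defun my-confusable-skeleton (string)
  "Return STRING with each character replaced by its table entry, if any."
  (mapconcat (lambda (ch)
               (or (aref my-confusables-table ch)
                   (char-to-string ch)))
             string ""))

(defun my-confusable-p (a b)
  "Return non-nil if A and B differ but map to the same skeleton."
  (and (not (string= a b))
       (string= (my-confusable-skeleton a)
                (my-confusable-skeleton b))))
```

Because the values are strings, the comparison needs nothing beyond `string=`; the same skeleton could equally be fed to `string-match` or `looking-at`.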
* Re: idn.el and confusables.txt 2011-05-14 14:38 ` Eli Zaretskii 2011-05-14 15:30 ` Ted Zlatanov @ 2011-05-14 17:25 ` Stefan Monnier 1 sibling, 0 replies; 182+ messages in thread From: Stefan Monnier @ 2011-05-14 17:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Ted Zlatanov, emacs-devel > I'm not arguing about where to put them. I'm saying that for such > basic infrastructure, we should consider the possible uses before we > rush into implementation. Otherwise, we will again repeat the same I recommend we keep it in the GNU ELPA for now, so it can more easily be changed in the future. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-14 8:06 ` Eli Zaretskii 2011-05-14 8:56 ` Lennart Borgman 2011-05-14 13:40 ` Ted Zlatanov @ 2011-05-15 13:06 ` Kenichi Handa 2011-05-15 17:34 ` Eli Zaretskii 2 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-05-15 13:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: tzz, monnier, emacs-devel In article <83iptdg0yr.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > You see, the uni-*.el files we create out of the Unicode DB are not > used anywhere in application code, AFAIK. We use them to display > character properties in the likes of "C-u C-x =", and that's it. composite.el uses `general-category' and `canonical-combining-class'. ucs-normalize.el uses `decomposition' and `canonical-combining-class'. mule-cmds.el uses `name' and `old-name' for read-char-by-name. > I'm > not even sure they are organized in a way that makes them useful. > E.g., when I needed to use the Unicode bidirectional properties for > bidi reordering, I eventually was forced to create my own tables (see > src/biditype.h and src/bidimirror.h, and the corresponding Awk scripts > in admin/unidata/) which lend themselves well to using them in > real-life code. Why did you have to create another table? Was it because get-char-code-property is defined in Lisp and not efficient to call from C? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
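For readers unfamiliar with the properties named in this message, here is a small sketch of querying them from Lisp with `get-char-code-property`. The helper name `my-char-props` is hypothetical; the property symbols are the ones listed above, and the sketch assumes an Emacs whose uni-*.el files are installed.

```elisp
(defun my-char-props (ch)
  "Return a plist of a few Unicode properties of character CH."
  (list :name (get-char-code-property ch 'name)
        :category (get-char-code-property ch 'general-category)
        :combining (get-char-code-property ch 'canonical-combining-class)))

;; For ?A the category is the symbol Lu (uppercase letter) and the
;; combining class is 0; for U+0301 COMBINING ACUTE ACCENT the
;; combining class is 230.
(my-char-props ?A)
(my-char-props #x0301)
```

Each call goes through the Lisp-level accessor, which may load the corresponding uni-*.el file on first use.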
* Re: idn.el and confusables.txt 2011-05-15 13:06 ` Kenichi Handa @ 2011-05-15 17:34 ` Eli Zaretskii 2011-05-18 5:23 ` handa 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-15 17:34 UTC (permalink / raw) To: Kenichi Handa; +Cc: tzz, monnier, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: monnier@iro.umontreal.ca, tzz@lifelogs.com, emacs-devel@gnu.org > Date: Sun, 15 May 2011 22:06:23 +0900 > > In article <83iptdg0yr.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > You see, the uni-*.el files we create out of the Unicode DB are not > > used anywhere in application code, AFAIK. We use them to display > > character properties in the likes of "C-u C-x =", and that's it. > > composite.el uses `general-category' and `canonical-combining-class'. > ucs-normalize.el uses `decomposition' and `canonical-combining-class'. > mule-cmds.el uses `name' and `old-name' for read-char-by-name. Are functions defined by ucs-normalize.el used anywhere? > Why did you have to create another table? Was it because > get-char-code-property is defined in Lisp and not efficient > to call from C? Yes, calling a Lisp function (one that calls `load' at that!) in the lowest level of display engine was out of the question. But there were several other reasons as well: . get-char-code-property returns a property list in which bidi types are recorded as symbols, while I needed them as small numeric values of a C enumerated type (see bidi_type_t), to fit in a small number of bits in `struct glyph'. . The data structures manipulated by get-char-code-property include complications (e.g., a function in the extra slot) for which I could find no documentation, so I couldn't figure out whether it would be possible to replace get-char-code-property by a simple call to CHAR_TABLE_REF. . Even if I could use CHAR_TABLE_REF, the additional call to plist-get means more overhead. 
bidi_get_type, the function which needs to look up the bidirectional type of an arbitrary character, runs in the innermost loop of the display engine, and is called at least once (sometimes more) for every character in the displayed portion of the buffer, so it must be very efficient. . For bidi-mirrored property, the data in the `mirrored' property recorded by uni-mirrored.el is simply inadequate: the value is a boolean (albeit in a form of symbols `Y' and `N'). What I needed was for each character its mirrored character, if there is one; this data was simply not available in uni-mirrored.el. The corresponding function bidi_mirror_char is also called for a large percentage of displayed characters, and must be efficient. It was extremely frustrating to have all that data at my fingertips and not be able to use it for the purposes of bidi.c, which at first seems like a first-class client of Unicode DB. What I wanted was something similar to C ctype macros in simplicity and efficiency, but nothing quite like that was available. A char-table comes close, but it must be a simple table with numerical values -- and that is what bidi.c currently uses, leaving uni-bidi.el unused. ^ permalink raw reply [flat|nested] 182+ messages in thread
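The "simple table with numerical values" described in this message can be illustrated from Lisp. This is only a sketch of the idea, not the C tables that bidi.c uses: the names `my-bidi-codes`, `my-bidi-table`, `my-bidi-type` and `my-bidi-type-symbol`, the integer codes, and the three sample ranges are all made up for illustration.

```elisp
(defconst my-bidi-codes '((L . 1) (R . 2) (EN . 3))
  "Hypothetical small integer codes for a few bidi types.")

(defvar my-bidi-table
  (let ((table (make-char-table 'my-bidi-type 0))) ; 0 means "not listed"
    (dolist (entry '(((?A . ?Z) . L)             ; Latin capitals
                     ((#x05D0 . #x05EA) . R)     ; Hebrew letters
                     ((?0 . ?9) . EN)))          ; ASCII digits
      (set-char-table-range table (car entry)
                            (cdr (assq (cdr entry) my-bidi-codes))))
    table)
  "Char-table holding one small integer per character.")

(defun my-bidi-type (ch)
  "Return the integer type code of CH, or 0 if CH is not in the table."
  (aref my-bidi-table ch))

(defun my-bidi-type-symbol (ch)
  "Return the symbol for the type code of CH, or nil if CH is not listed."
  (car (rassq (my-bidi-type ch) my-bidi-codes)))
```

Storing the integer directly keeps the lookup to a single table reference, while the symbol is recovered only when a Lisp caller asks for it.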
* Re: idn.el and confusables.txt 2011-05-15 17:34 ` Eli Zaretskii @ 2011-05-18 5:23 ` handa 2011-05-18 7:38 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: handa @ 2011-05-18 5:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: tzz, monnier, emacs-devel In article <83wrhreukg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > Are functions defined by ucs-normalize.el used anywhere? What do you mean by "anywhere"? That file provides a coding system utf-8-hfs that uses functions in that file. And I don't know whether a specific coding system is used or not except for what I use. In addition, I plan to use the normalization facility to display, for instance, the sequence 'A' '́' as 'Á' when a font doesn't support OTF GPOS features. > > Why did you have to create another table? Was it because > > get-char-code-property is defined in Lisp and not efficient > > to call from C? > Yes, calling a Lisp function (one that calls `load' at that!) in the > lowest level of display engine was out of the question. Why? If that loading happens just once, what's the problem? For automatic composition, we are already doing that. > But there were several other reasons as well: > . get-char-code-property returns a property list in which bidi types > are recorded as symbols, while I needed them as small numeric > values of a C enumerated type (see bidi_type_t), to fit in a small > number of bits in `struct glyph'. > . The data structures manipulated by get-char-code-property include > complications (e.g., a function in the extra slot) for which I > could find no documentation, so I couldn't figure out whether it > would be possible to replace get-char-code-property by a simple > call to CHAR_TABLE_REF. > . Even if I could use CHAR_TABLE_REF, the additional call to > plist-get means more overhead. 
bidi_get_type, the function which > needs to look up the bidirectional type of an arbitrary character, > runs in the innermost loop of the display engine, and is called at > least once (sometimes more) for every character in the displayed > portion of the buffer, so it must be very efficient. > . For bidi-mirrored property, the data in the `mirrored' property > recorded by uni-mirrored.el is simply inadequate: the value is a > boolean (albeit in a form of symbols `Y' and `N'). What I needed > was for each character its mirrored character, if there is one; > this data was simply not available in uni-mirrored.el. The > corresponding function bidi_mirror_char is also called for a large > percentage of displayed characters, and must be efficient. > It was extremely frustrating to have all that data at my fingertips > and not be able to use it for the purposes of bidi.c, which at first > seems like a first-class client of Unicode DB. What I wanted was > something similar to C ctype macros in simplicity and efficiency, but > nothing quite like that was available. A char-table comes close, but > it must be a simple table with numerical values -- and that is what > bidi.c currently uses, leaving uni-bidi.el unused. I understand your situation, and it's unfortunate that you didn't consult me to improve the situation. I think it's not difficult to write a C code that makes uni-* char-tables more easily be accessed from C without running Lisp. As for the value type of symbol vs integer in uni-bidi.el, it's possible to write a code to expose integer value to C, and the corresponding symbol to Lisp. As for the uni-mirrored.el, I think it's ok to change the value type to what you want (character or nil). --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-18 5:23 ` handa @ 2011-05-18 7:38 ` Eli Zaretskii 2011-05-18 7:59 ` handa 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-05-18 7:38 UTC (permalink / raw) To: handa; +Cc: monnier, emacs-devel > From: handa <handa@m17n.org> > Cc: monnier@iro.umontreal.ca, tzz@lifelogs.com, emacs-devel@gnu.org > Date: Wed, 18 May 2011 14:23:26 +0900 > > > > Why did you have to create another table? Was it because > > > get-char-code-property is defined in Lisp and not efficient > > > to call from C? > > > Yes, calling a Lisp function (one that calls `load' at that!) in the > > lowest level of display engine was out of the question. > > Why? If that loading is just once, what's the problem? For > automatic composition, we are already doing that. When I started working on bidirectional support, I didn't know that (as well as many other intimate details of the display engine I know now). Loading and calling Lisp code in the middle of redisplay buffer iteration surely sounded scary. I also didn't know then whether we would enable the bidi reordering by default, or just when and where needed. The delay of loading is significant, and I wanted to avoid that. Maybe these all were just shadows of a dwarf, I don't know. I guess the real reason is that I felt alone in this adventure (I still do), and needed to get things done as quickly and as simply as possible, and in a way that I understood well enough to design and write reliable code. The uni-* files and the corresponding properties were not documented back then, except in short doc strings, so it was hard for me to know whether changing them to what bidi.c needed was feasible. With the little time I had to work on that, and the job being so huge, I couldn't afford spending too much time on research and discussions. > I understand your situation, and it's unfortunate that you > didn't consult me to improve the situation. 
I think it's > not difficult to write a C code that makes uni-* char-tables > more easily be accessed from C without running Lisp. It's not too late: Emacs 24 is not yet released, and won't be for some time. If uni-bidi can be changed to suit the requirements of bidi.c, I will gladly change bidi.c to use it and throw away the private table it uses now. TIA. > As for the value type of symbol vs integer in uni-bidi.el, > it's possible to write a code to expose integer value to > C, and the corresponding symbol to Lisp. That would be great, thanks. > As for the uni-mirrored.el, I think it's ok to change the > value type to what you want (character or nil). AFAIK, this needs a change in how uni-mirrored.el is generated, because UnicodeData.txt doesn't include the necessary information. So admin/unidata/unidata-gen.el's way of gleaning the info from UnicodeData.txt is not appropriate for such a change. Maybe we should simply expose to Lisp the table built from BidiMirroring.txt, which bidi.c already creates and uses. ^ permalink raw reply [flat|nested] 182+ messages in thread
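To illustrate the "character or nil" value type discussed in this message, the following sketch builds a tiny mirroring table by hand. The names `my-mirroring-table` and `my-mirror-char` are hypothetical and the four pairs are typed in manually; a real table would be generated from BidiMirroring.txt rather than written out like this.

```elisp
(defvar my-mirroring-table
  (let ((table (make-char-table 'my-mirroring)))
    (dolist (pair '((?\( . ?\)) (?\[ . ?\]) (?\{ . ?\}) (?< . ?>)))
      ;; Mirroring is symmetric, so record both directions.
      (set-char-table-range table (car pair) (cdr pair))
      (set-char-table-range table (cdr pair) (car pair)))
    table)
  "Char-table mapping a character to its mirrored character, or nil.")

(defun my-mirror-char (ch)
  "Return the mirrored counterpart of CH, or nil if CH has none."
  (aref my-mirroring-table ch))
```

With this value type a caller gets the replacement character in one lookup, instead of a yes/no flag that still leaves the counterpart unknown.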
* Re: idn.el and confusables.txt 2011-05-18 7:38 ` Eli Zaretskii @ 2011-05-18 7:59 ` handa 2011-05-18 8:13 ` Eli Zaretskii 2011-06-17 8:15 ` Kenichi Handa 0 siblings, 2 replies; 182+ messages in thread From: handa @ 2011-05-18 7:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel In article <E1QMbKx-0003tU-B3@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > AFAIK, this needs a change in how uni-mirrored.el is generated, > because UnicodeData.txt doesn't include the necessary information. So > admin/unidata/unidata-gen.el's way of gleaning the info from > UnicodeData.txt is not appropriate for such a change. > Maybe we should simply expose to Lisp the table built from > BidiMirroring.txt, which bidi.c already creates and uses. Now there are several *.txt files that can't be derived from UnicodeData.txt, and I want to provide Emacs the facility to utilize all such Unicode data. For that, I want to improve unidata-gen.el to handle those data files. But, at first, I'll work on improving C interface to uni-*.el. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-18 7:59 ` handa @ 2011-05-18 8:13 ` Eli Zaretskii 2011-06-17 8:15 ` Kenichi Handa 1 sibling, 0 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-05-18 8:13 UTC (permalink / raw) To: handa; +Cc: monnier, emacs-devel > From: handa <handa@m17n.org> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Wed, 18 May 2011 16:59:15 +0900 > > Now there are several *.txt files that can't be derived from > UnicodeData.txt, and I want to provide Emacs the facility to > utilize all such Unicode data. For that, I want to improve > unidata-gen.el to handle those data files. > > But, at first, I'll work on improving C interface to > uni-*.el. Thank you. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-05-18 7:59 ` handa 2011-05-18 8:13 ` Eli Zaretskii @ 2011-06-17 8:15 ` Kenichi Handa 2011-06-17 15:12 ` Eli Zaretskii 1 sibling, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-06-17 8:15 UTC (permalink / raw) To: eliz; +Cc: emacs-devel In article <tl7r57wo2wc.fsf@m17n.org>, handa <handa@m17n.org> writes: > But, at first, I'll work on improving C interface to > uni-*.el. Attached is the first version for that. It provides two C functions (excerpt from chartab.c). /* Unicode character property This section provides a convenient and efficient way to get a Unicode character property from C code (from Lisp, you can use get-char-code-property). The typical usage is to get a char-table for a specific property at a proper initialization time as this: Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); and get a property value for character CH as this: Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table); In this case, what you actually get is an index number to the vector of property values (symbols nil, L, R, etc). See the comment of uniprop_table_lookup for the detail. */ Eli, I arranged that the bidi_class obtained as above is an integer value that is the same as enum bidi_type_t defined in dispextern.h. And, uniprop_table (intern ("mirroring")) gives a char-table for mirroring characters. Could you check if the attached patch gives sufficient facility to be used in the bidi code? --- Kenichi Handa handa@m17n.org === modified file 'admin/ChangeLog' --- admin/ChangeLog 2011-04-10 16:33:22 +0000 +++ admin/ChangeLog 2011-06-17 07:27:55 +0000 @@ -1,3 +1,29 @@ +2011-06-17 Kenichi Handa <handa@m17n.org> + + * unidata/unidata-gen.el (unidata-dir): New variable. + (unidata-setup-list): Expand unidata-text-file in unidata-dir. + (unidata-prop-alist): INDEX element may be a function. New + optional element VAL-LIST (for general-category and bidi-class). + New entry `mirroring'. 
+ (unidata-get-character): Adjusted for the new compression format + of char-table element. + (unidata-gen-table-character): New arg IGNORE. Adjusted for the + above changes. + (unidata-encode-val): Assume the first element of VAL-LIST is (nil + . 0). + (unidata-gen-table): Change argument DEFAULT-VALUE to VAL-LIST. + (unidata-gen-table-symbol): New arg VAL-LIST. + (unidata-gen-table-integer): Likewise. + (unidata-gen-table-numeric): Likewise. + (unidata-gen-table-name): New arg IGNORE. + (unidata-gen-table-decomposition): Likewise. + (unidata-gen-mirroring-list): New funciton. + (unidata-gen-files): New arg DATA-DIR. Adjusted for the change of + unidata-prop-alist. + + * unidata/Makefile.in (${DSTDIR}/charprop.el): New arg to + unidata-gen-files. + 2011-03-07 Chong Yidong <cyd@stupidchicken.com> * Version 23.3 released. === modified file 'admin/unidata/Makefile.in' --- admin/unidata/Makefile.in 2011-01-14 17:18:41 +0000 +++ admin/unidata/Makefile.in 2011-06-16 03:46:42 +0000 @@ -33,9 +33,10 @@ ${DSTDIR}/charprop.el: unidata-gen.elc unidata.txt ELC=`/bin/pwd`/unidata-gen.elc; \ - DATA=`/bin/pwd`/unidata.txt; \ + DATADIR=`/bin/pwd`; \ + DATA=unidata.txt; \ cd ${DSTDIR}; \ - ${RUNEMACS} -batch --load $${ELC} -f unidata-gen-files $${DATA} + ${RUNEMACS} -batch --load $${ELC} -f unidata-gen-files $${DATADIR} $${DATA} ../../src/biditype.h: UnicodeData.txt gawk -F";" -f biditype.awk $< > $@ === modified file 'admin/unidata/unidata-gen.el' --- admin/unidata/unidata-gen.el 2011-01-14 17:18:41 +0000 +++ admin/unidata/unidata-gen.el 2011-06-17 07:07:19 +0000 @@ -41,7 +41,7 @@ ;; uni-name.el, uni-category.el, uni-combining.el, uni-bidi.el, ;; uni-decomposition.el, uni-decimal.el, uni-digit.el, uni-numeric.el, ;; uni-mirrored.el, uni-old-name.el, uni-comment.el, uni-uppercase.el, -;; uni-lowercase.el, uni-titlecase.el +;; uni-lowercase.el, uni-titlecase.el, uni-mirroring.el ;; They each contain a single form of this format: ;; (char-code-property-register PROP CHAR-TABLE) ;; 
where PROP is the same as above, and CHAR-TABLE is a @@ -49,8 +49,9 @@ ;; ;; When they are installed in .../lisp/international/, the file ;; "charprop.el" is preloaded in loadup.el. The other files are -;; automatically loaded when the functions `get-char-code-property' -;; and `put-char-code-property' are called. +;; automatically loaded when the Lisp functions +;; `get-char-code-property' and `put-char-code-property', and C +;; function uniprop_table are called. ;; ;; FORMAT OF A CHAR TABLE ;; @@ -70,7 +71,8 @@ ;; The char table has four extra slots: ;; 1st: property symbol -;; 2nd: function to call to get a property value +;; 2nd: function to call to get a property value, +;; or an index number of C function to uncompress the data ;; 3nd: function to call to put a property value ;; 4th: function to call to get a description of a property value ;; 5th: data referred by the above functions @@ -82,6 +84,11 @@ (defvar unidata-list nil) +;; Name of the directory containing files of Unicode Character +;; Database. + +(defvar unidata-dir nil) + (defun unidata-setup-list (unidata-text-file) (let* ((table (list nil)) (tail table) @@ -90,6 +97,7 @@ ("^<.*Surrogate" . nil) ("^<.*Private Use" . PRIVATE\ USE))) val char name) + (setq unidata-text-file (expand-file-name unidata-text-file unidata-dir)) (or (file-readable-p unidata-text-file) (error "File not readable: %s" unidata-text-file)) (with-temp-buffer @@ -136,10 +144,13 @@ ;; Alist of this form: ;; (PROP INDEX GENERATOR FILENAME) ;; PROP: character property -;; INDEX: index to each element of unidata-list for PROP +;; INDEX: index to each element of unidata-list for PROP. +;; It may be a function that generates an alist of character codes +;; vs. the corresponding property values. 
;; GENERATOR: function to generate a char-table ;; FILENAME: filename to store the char-table ;; DESCRIBER: function to call to get a description string of property value +;; VAL-LIST: list of specially ordered property values (defconst unidata-prop-alist '((name @@ -152,7 +163,9 @@ Property value is one of the following symbols: Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So, Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn" - unidata-describe-general-category) + unidata-describe-general-category + (Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po + Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn)) (canonical-combining-class 3 unidata-gen-table-integer "uni-combining.el" "Unicode canonical combining class. @@ -164,7 +177,8 @@ Property value is one of the following symbols: L, LRE, LRO, R, AL, RLE, RLO, PDF, EN, ES, ET, AN, CS, NSM, BN, B, S, WS, ON" - unidata-describe-bidi-class) + unidata-describe-bidi-class + (L R EN AN BN B AL LRE LRO RLE RLO PDF ES ET CS NSM S WS ON)) (decomposition 5 unidata-gen-table-decomposition "uni-decomposition.el" "Unicode decomposition mapping. @@ -188,7 +202,7 @@ (mirrored 9 unidata-gen-table-symbol "uni-mirrored.el" "Unicode bidi mirrored flag. -Property value is a symbol `Y' or `N'.") +Property value is a symbol `Y' or `N'. See also the property `mirroring'.") (old-name 10 unidata-gen-table-name "uni-old-name.el" "Unicode old names as published in Unicode 1.0. @@ -211,7 +225,12 @@ 14 unidata-gen-table-character "uni-titlecase.el" "Unicode simple titlecase mapping. Property value is a character." - string))) + string) + (mirroring + unidata-gen-mirroring-list unidata-gen-table-character "uni-mirroring.el" + "Unicode bidi-mirroring characters. +Property value is a character that has the corresponding mirroring image, +or nil for non-mirrored character."))) ;; Functions to access the above data. 
(defsubst unidata-prop-index (prop) (nth 1 (assq prop unidata-prop-alist))) @@ -219,6 +238,7 @@ (defsubst unidata-prop-file (prop) (nth 3 (assq prop unidata-prop-alist))) (defsubst unidata-prop-docstring (prop) (nth 4 (assq prop unidata-prop-alist))) (defsubst unidata-prop-describer (prop) (nth 5 (assq prop unidata-prop-alist))) +(defsubst unidata-prop-val-list (prop) (nth 6 (assq prop unidata-prop-alist))) \f ;; SIMPLE TABLE @@ -227,14 +247,18 @@ ;; values of succeeding character codes are usually different, we use ;; a char-table described here to store such values. ;; -;; If succeeding 128 characters has no property, a char-table has the -;; symbol t for them. Otherwise a char-table has a string of the -;; following format for them. +;; A char-table divides character code space (#x0..#x3FFFFF) into +;; #x8000 blocks (each block contains 128 characters). + +;; If all characters of a block have no property, a char-table has the +;; symbol nil for that block. Otherwise a char-table has a string of +;; the following format for it. ;; -;; The first character of the string is FIRST-INDEX. -;; The Nth (N > 0) character of the string is a property value of the -;; character (BLOCK-HEAD + FIRST-INDEX + N - 1), where BLOCK-HEAD is -;; the first of the characters in the block. +;; The first character of the string is '\0'. +;; The second character of the string is FIRST-INDEX. +;; The Nth (N > 1) character of the string is a property value of the +;; character (BLOCK-HEAD + FIRST-INDEX + N - 2), where BLOCK-HEAD is +;; the first character of the block. ;; ;; The 4th extra slot of a char-table is nil. 
@@ -247,9 +271,9 @@ (let* ((len (length val)) (block-head (lsh (lsh char -7) 7)) (vec (make-vector 128 nil)) - (first-index (aref val 0))) + (first-index (aref val 1))) (dotimes (i (1- len)) - (let ((elt (aref val (1+ i)))) + (let ((elt (aref val (+ 2 i)))) (if (> elt 0) (aset vec (+ first-index i) elt)))) (dotimes (i 128) @@ -266,13 +290,15 @@ (funcall (char-table-extra-slot table 1) char current-val table)) (aset table char val)))) -(defun unidata-gen-table-character (prop) +(defun unidata-gen-table-character (prop ignore) (let ((table (make-char-table 'char-code-property-table)) (prop-idx (unidata-prop-index prop)) (vec (make-vector 128 0)) (tail unidata-list) elt range val idx slot) - (set-char-table-range table (cons 0 (max-char)) t) + (if (functionp prop-idx) + (setq tail (funcall prop-idx) + prop-idx 1)) (while tail (setq elt (car tail) tail (cdr tail)) (setq range (car elt) @@ -301,7 +327,7 @@ (setq first-index last-index))) (setq tail (cdr tail))) (when first-index - (let ((str (string first-index)) + (let ((str (string 0 first-index)) c) (while (<= first-index last-index) (setq str (format "%s%c" str (or (aref vec first-index) 0)) @@ -311,7 +337,7 @@ (set-char-table-extra-slot table 0 prop) (byte-compile 'unidata-get-character) (byte-compile 'unidata-put-character) - (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-character)) + (set-char-table-extra-slot table 1 0) (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-character)) table)) @@ -463,30 +489,34 @@ ;; Encode the character property value VAL into an integer value by ;; VAL-LIST. By side effect, VAL-LIST is modified. ;; VAL-LIST has this form: -;; (t (VAL1 . VAL-CODE1) (VAL2 . VAL-CODE2) ...) -;; If VAL is one of VALn, just return VAL-CODEn. Otherwise, -;; VAL-LIST is modified to this: -;; (t (VAL . (1+ VAL-CODE1)) (VAL1 . VAL-CODE1) (VAL2 . VAL-CODE2) ...) +;; ((nil . 0) (VAL1 . 1) (VAL2 . 2) ...) +;; If VAL is one of VALn, just return n. 
+;; Otherwise, VAL-LIST is modified to this: +;; ((nil . 0) (VAL1 . 1) (VAL2 . 2) ... (VAL . n+1)) (defun unidata-encode-val (val-list val) (let ((slot (assoc val val-list)) val-code) (if slot (cdr slot) - (setq val-code (if (cdr val-list) (1+ (cdr (nth 1 val-list))) 1)) - (setcdr val-list (cons (cons val val-code) (cdr val-list))) + (setq val-code (length val-list)) + (nconc val-list (list (cons val val-code))) val-code))) ;; Generate a char-table for the character property PROP. -(defun unidata-gen-table (prop val-func default-value) +(defun unidata-gen-table (prop val-func val-list) (let ((table (make-char-table 'char-code-property-table)) (prop-idx (unidata-prop-index prop)) - (val-list (list t)) (vec (make-vector 128 0)) tail elt range val val-code idx slot prev-range-data) - (set-char-table-range table (cons 0 (max-char)) default-value) + (setq val-list (cons nil (copy-sequence val-list))) + (setq tail val-list val-code 0) + ;; Convert (nil A B ...) to ((nil . 0) (A . 1) (B . 2) ...) 
+ (while tail + (setcar tail (cons (car tail) val-code)) + (setq tail (cdr tail) val-code (1+ val-code))) (setq tail unidata-list) (while tail (setq elt (car tail) tail (cdr tail)) @@ -534,7 +564,7 @@ (if val-code (aset vec (- range start) val-code)) (setq tail (cdr tail))) - (setq str "" val-code -1 count 0) + (setq str "\000" val-code -1 count 0) (mapc #'(lambda (x) (if (= val-code x) (setq count (1+ count)) @@ -559,34 +589,33 @@ (setq str (concat str (string val-code))))) (set-char-table-range table (cons start limit) str)))))) - (setq val-list (nreverse (cdr val-list))) (set-char-table-extra-slot table 0 prop) (set-char-table-extra-slot table 4 (vconcat (mapcar 'car val-list))) table)) -(defun unidata-gen-table-symbol (prop) +(defun unidata-gen-table-symbol (prop val-list) (let ((table (unidata-gen-table prop #'(lambda (x) (and (> (length x) 0) (intern x))) - 0))) + val-list))) (byte-compile 'unidata-get-symbol) (byte-compile 'unidata-put-symbol) - (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-symbol)) + (set-char-table-extra-slot table 1 1) (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-symbol)) table)) -(defun unidata-gen-table-integer (prop) +(defun unidata-gen-table-integer (prop val-list) (let ((table (unidata-gen-table prop #'(lambda (x) (and (> (length x) 0) (string-to-number x))) - t))) + val-list))) (byte-compile 'unidata-get-integer) (byte-compile 'unidata-put-integer) - (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-integer)) + (set-char-table-extra-slot table 1 1) (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-integer)) table)) -(defun unidata-gen-table-numeric (prop) +(defun unidata-gen-table-numeric (prop val-list) (let ((table (unidata-gen-table prop #'(lambda (x) (if (string-match "/" x) @@ -595,10 +624,10 @@ (substring x (match-end 0)))) (if (> (length x) 0) (string-to-number x)))) - t))) + val-list))) (byte-compile 'unidata-get-numeric) (byte-compile 'unidata-put-numeric) 
- (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-numeric)) + (set-char-table-extra-slot table 1 1) (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-numeric)) table)) @@ -1025,7 +1054,7 @@ idx (1+ i))))) (nreverse (cons (intern (substring str idx)) l)))))) -(defun unidata-gen-table-name (prop) +(defun unidata-gen-table-name (prop ignore) (let* ((table (unidata-gen-table-word-list prop 'unidata-split-name)) (word-tables (char-table-extra-slot table 4))) (byte-compile 'unidata-get-name) @@ -1064,7 +1093,7 @@ (nreverse l))))) -(defun unidata-gen-table-decomposition (prop) +(defun unidata-gen-table-decomposition (prop ignore) (let* ((table (unidata-gen-table-word-list prop 'unidata-split-decomposition)) (word-tables (char-table-extra-slot table 4))) (byte-compile 'unidata-get-decomposition) @@ -1171,6 +1200,19 @@ (string ?')))) val " ")) +(defun unidata-gen-mirroring-list () + (let ((head (list nil)) + tail) + (with-temp-buffer + (insert-file-contents (expand-file-name "BidiMirroring.txt" unidata-dir)) + (goto-char (point-min)) + (setq tail head) + (while (re-search-forward "^\\([0-9A-F]+\\);\\s +\\([0-9A-F]+\\)" nil t) + (let ((char (string-to-number (match-string 1) 16)) + (mirror (match-string 2))) + (setq tail (setcdr tail (list (list char mirror))))))) + (cdr head))) + ;; Verify if we can retrieve correct values from the generated ;; char-tables. @@ -1212,13 +1254,16 @@ ;; The entry function. It generates files described in the header ;; comment of this file. 
-(defun unidata-gen-files (&optional unidata-text-file) - (or unidata-text-file - (setq unidata-text-file (car command-line-args-left) +(defun unidata-gen-files (&optional data-dir unidata-text-file) + (or data-dir + (setq data-dir (car command-line-args-left) + command-line-args-left (cdr command-line-args-left) + unidata-text-file (car command-line-args-left) command-line-args-left (cdr command-line-args-left))) - (unidata-setup-list unidata-text-file) (let ((coding-system-for-write 'utf-8-unix) - (charprop-file "charprop.el")) + (charprop-file "charprop.el") + (unidata-dir data-dir)) + (unidata-setup-list unidata-text-file) (with-temp-file charprop-file (insert ";; Automatically generated by unidata-gen.el.\n") (dolist (elt unidata-prop-alist) @@ -1227,6 +1272,7 @@ (file (unidata-prop-file prop)) (docstring (unidata-prop-docstring prop)) (describer (unidata-prop-describer prop)) + (val-list (unidata-prop-val-list prop)) table) ;; Filename in this comment line is extracted by sed in ;; Makefile. @@ -1235,15 +1281,15 @@ prop file docstring)) (with-temp-file file (message "Generating %s..." file) - (setq table (funcall generator prop)) + (setq table (funcall generator prop val-list)) (when describer (unless (subrp (symbol-function describer)) (byte-compile describer) (setq describer (symbol-function describer))) (set-char-table-extra-slot table 3 describer)) (insert ";; Copyright (C) 1991-2009 Unicode, Inc. -;; This file was generated from the Unicode data file at -;; http://www.unicode.org/Public/UNIDATA/UnicodeData.txt. +;; This file was generated from the Unicode data files at +;; http://www.unicode.org/Public/UNIDATA/. 
;; See lisp/international/README for the copyright and permission notice.\n" (format "(define-char-code-property '%S %S %S)\n" prop table docstring) @@ -1251,7 +1297,8 @@ ";; coding: utf-8\n" ";; no-byte-compile: t\n" ";; End:\n\n" - (format ";; %s ends here\n" file))))) + (format ";; %s ends here\n" file)) + (message "Generating %s...done" file)))) (message "Writing %s..." charprop-file) (insert ";; Local Variables:\n" ";; coding: utf-8\n" === modified file 'lisp/ChangeLog' --- lisp/ChangeLog 2011-05-18 03:42:33 +0000 +++ lisp/ChangeLog 2011-06-17 07:33:43 +0000 @@ -1,3 +1,28 @@ +2011-06-17 Kenichi Handa <handa@m17n.org> + + * international/mule-cmds.el (char-code-property-alist): Moved to + to src/chartab.c. + (get-char-code-property): Call get-unicode-property-internal where + necessary. + + * international/charprop.el: + * international/uni-bidi.el: + * international/uni-category.el: + * international/uni-combining.el: + * international/uni-comment.el: + * international/uni-decimal.el: + * international/uni-decomposition.el: + * international/uni-digit.el: + * international/uni-lowercase.el: + * international/uni-mirrored.el: + * international/uni-name.el: + * international/uni-numeric.el: + * international/uni-old-name.el: + * international/uni-titlecase.el: + * international/uni-uppercase.el: Regenerate. + + * international/uni-mirroring.el: New file. + 2011-05-18 Glenn Morris <rgm@gnu.org> * emacs-lisp/authors.el (authors-fixed-entries): Remove fakemail.c === modified file 'lisp/international/charprop.el' --- lisp/international/charprop.el 2010-06-09 15:46:41 +0000 +++ lisp/international/charprop.el 2011-06-17 07:07:51 +0000 @@ -1,8 +1,4 @@ -;; Copyright (C) 1991-2010 Unicode, Inc. -;; This file was generated from the Unicode data file at -;; http://www.unicode.org/Public/UNIDATA/UnicodeData.txt. -;; See lisp/international/README for the copyright and permission notice. - +;; Automatically generated by unidata-gen.el. 
;; FILE: uni-name.el (define-char-code-property 'name "uni-name.el" "Unicode character name. @@ -45,7 +41,7 @@ ;; FILE: uni-mirrored.el (define-char-code-property 'mirrored "uni-mirrored.el" "Unicode bidi mirrored flag. -Property value is a symbol `Y' or `N'.") +Property value is a symbol `Y' or `N'. See also the property `mirroring'.") ;; FILE: uni-old-name.el (define-char-code-property 'old-name "uni-old-name.el" "Unicode old names as published in Unicode 1.0. @@ -66,6 +62,11 @@ (define-char-code-property 'titlecase "uni-titlecase.el" "Unicode simple titlecase mapping. Property value is a character.") +;; FILE: uni-mirroring.el +(define-char-code-property 'mirroring "uni-mirroring.el" + "Unicode bidi-mirroring characters. +Property value is a character that has the corresponding mirroring image, +or nil for non-mirrored character.") ;; Local Variables: ;; coding: utf-8 ;; no-byte-compile: t === modified file 'lisp/international/mule-cmds.el' --- lisp/international/mule-cmds.el 2011-02-28 01:07:29 +0000 +++ lisp/international/mule-cmds.el 2011-06-04 04:33:19 +0000 @@ -2709,16 +2709,6 @@ \f ;;; Character property -;; Each element has the form (PROP . TABLE). -;; PROP is a symbol representing a character property. -;; TABLE is a char-table containing the property value for each character. -;; TABLE may be a name of file to load to build a char-table. -;; Don't modify this variable directly but use `define-char-code-property'. - -(defvar char-code-property-alist nil - "Alist of character property name vs char-table containing property values. 
-Internal use only.") - (put 'char-code-property-table 'char-table-extra-slots 5) (defun define-char-code-property (name table &optional docstring) @@ -2776,10 +2766,11 @@ (if (stringp (cdr slot)) (load (cdr slot) nil t)) (setq table (cdr slot) - value (aref table char) func (char-table-extra-slot table 1)) - (if (functionp func) - (setq value (funcall func char value table))) + (cond ((integerp func) + (setq value (get-unicode-property-internal char table))) + ((functionp func) + (setq value (funcall func char (aref table char) table)))) value) (plist-get (aref char-code-property-table char) propname)))) === modified file 'lisp/international/uni-bidi.el' Binary files lisp/international/uni-bidi.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-bidi.el 2011-06-17 07:07:49 +0000 differ === modified file 'lisp/international/uni-category.el' Binary files lisp/international/uni-category.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-category.el 2011-06-17 07:07:49 +0000 differ === modified file 'lisp/international/uni-combining.el' Binary files lisp/international/uni-combining.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-combining.el 2011-06-17 07:07:49 +0000 differ === modified file 'lisp/international/uni-comment.el' Binary files lisp/international/uni-comment.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-comment.el 2011-06-17 07:07:50 +0000 differ === modified file 'lisp/international/uni-decimal.el' Binary files lisp/international/uni-decimal.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-decimal.el 2011-06-17 07:07:50 +0000 differ === modified file 'lisp/international/uni-decomposition.el' Binary files lisp/international/uni-decomposition.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-decomposition.el 2011-06-17 07:07:50 +0000 differ === modified file 'lisp/international/uni-digit.el' Binary files lisp/international/uni-digit.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-digit.el 2011-06-17 07:07:50 
+0000 differ === modified file 'lisp/international/uni-lowercase.el' Binary files lisp/international/uni-lowercase.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-lowercase.el 2011-06-17 07:07:51 +0000 differ === modified file 'lisp/international/uni-mirrored.el' Binary files lisp/international/uni-mirrored.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-mirrored.el 2011-06-17 07:07:50 +0000 differ === added file 'lisp/international/uni-mirroring.el' Binary files lisp/international/uni-mirroring.el 1970-01-01 00:00:00 +0000 and lisp/international/uni-mirroring.el 2011-06-17 07:07:51 +0000 differ === modified file 'lisp/international/uni-name.el' Binary files lisp/international/uni-name.el 2010-09-05 02:06:39 +0000 and lisp/international/uni-name.el 2011-06-17 07:07:49 +0000 differ === modified file 'lisp/international/uni-numeric.el' Binary files lisp/international/uni-numeric.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-numeric.el 2011-06-17 07:07:50 +0000 differ === modified file 'lisp/international/uni-old-name.el' Binary files lisp/international/uni-old-name.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-old-name.el 2011-06-17 07:07:50 +0000 differ === modified file 'lisp/international/uni-titlecase.el' Binary files lisp/international/uni-titlecase.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-titlecase.el 2011-06-17 07:07:51 +0000 differ === modified file 'lisp/international/uni-uppercase.el' Binary files lisp/international/uni-uppercase.el 2010-06-09 15:46:41 +0000 and lisp/international/uni-uppercase.el 2011-06-17 07:07:50 +0000 differ === modified file 'src/ChangeLog' --- src/ChangeLog 2011-05-18 03:00:08 +0000 +++ src/ChangeLog 2011-06-17 07:32:14 +0000 @@ -1,3 +1,18 @@ +2011-06-17 Kenichi Handa <handa@m17n.org> + + * chartab.c (sub_char_table_set): Delete it. + (char_table_get_bottom): New function. + (char_table_set): Use char_table_get_bottom. + (UNIPROP_COMPRESSED_FORM_P): New macro. 
+ (uniprop_table_uncompress_simple) + (uniprop_table_uncompress_run_length): New functions. + (uniprop_uncompress, uniprop_uncompressor_count): New variables. + (uniprop_table, uniprop_table_lookup) + (Fget_unicode_property_internal): New functions. + (syms_of_chartab): Defsubr + Sget_unicode_property_internal. Defvar_lisp + char-code-property-alist. + 2011-05-18 Christoph Scholtes <cschol2112@googlemail.com> * menu.c: Include limits.h (fixes the MS-Windows build broken by === modified file 'src/chartab.c' --- src/chartab.c 2011-05-12 07:07:06 +0000 +++ src/chartab.c 2011-06-16 03:54:55 +0000 @@ -330,7 +330,7 @@ return val; } - +#if 0 static void sub_char_table_set (Lisp_Object table, int c, Lisp_Object val) { @@ -354,6 +354,29 @@ sub_char_table_set (sub, c, val); } } +#endif + +static Lisp_Object +char_table_get_bottom (struct Lisp_Char_Table *tbl, int c) +{ + Lisp_Object *sub = tbl->contents + CHARTAB_IDX (c, 0, 0); + int depth; + + if (! SUB_CHAR_TABLE_P (*sub)) + *sub = make_sub_char_table (1, c & ~(chartab_chars[0] - 1), *sub); + for (depth = 1; depth < 3; depth++) + { + int min_char = XINT (XSUB_CHAR_TABLE (*sub)->min_char); + int idx = CHARTAB_IDX (c, depth, min_char); + + sub = XSUB_CHAR_TABLE (*sub)->contents + idx; + if (! SUB_CHAR_TABLE_P (*sub)) + *sub = make_sub_char_table (depth + 1, c & ~(chartab_chars[depth] - 1), + *sub); + } + return *sub; +} + Lisp_Object char_table_set (Lisp_Object table, int c, Lisp_Object val) @@ -367,18 +390,11 @@ } else { - int i = CHARTAB_IDX (c, 0, 0); - Lisp_Object sub; + Lisp_Object sub_table = char_table_get_bottom (tbl, c); - sub = tbl->contents[i]; - if (! 
SUB_CHAR_TABLE_P (sub)) - { - sub = make_sub_char_table (1, i * chartab_chars[0], sub); - tbl->contents[i] = sub; - } - sub_char_table_set (sub, c, val); + XSUB_CHAR_TABLE (sub_table)->contents[c & (chartab_chars[2] - 1)] = val; if (ASCII_CHAR_P (c)) - tbl->ascii = char_table_ascii (table); + tbl->ascii = sub_table; } return val; } @@ -984,6 +1000,213 @@ } \f +/* Unicode character property + + This section provides a convenient and efficient way to get a + Unicode character property from C code (from Lisp, you can use + get-char-code-property). + + The typical usage is to get a char-table for a specific property at + a proper initialization time as this: + + Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); + + and get a property value for character CH as this: + + Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table); + + In this case, what you actually get is an index number to the + vector of property values (symbols nil, L, R, etc). See the + comment of uniprop_table_lookup for the detail. */ + +/* Nonzero iff OBJ is a string representing Unicode character + properties of 128 succeeding characters (the bottom level of a + char-table) by a compressed format. We are sure that no property + value has a string starting from NULL character. */ + +#define UNIPROP_COMPRESSED_FORM_P(OBJ) (STRINGP (OBJ) && (SREF (OBJ, 0) == 0)) + + +/* The first element of uniprop_uncompress (which see). + This decodes the compressed data of "SIMPLE TABLE" (see the comment + in admin/unidata/unidata-gen.el. */ + +static Lisp_Object +uniprop_table_uncompress_simple (Lisp_Object str, + struct Lisp_Sub_Char_Table *subtbl) +{ + const unsigned char *p, *pend; + int i, idx; + + p = SDATA (str) + 1, pend = p + SBYTES (str); + idx = STRING_CHAR_ADVANCE (p); + for (i = 0; i < idx; i++) + subtbl->contents[idx] = Qnil; + while (p < pend && idx < chartab_chars[2]) + { + int v = STRING_CHAR_ADVANCE (p); + subtbl->contents[idx++] = v > 0 ? 
make_number (v) : Qnil; + } + while (idx < chartab_chars[2]) + subtbl->contents[idx++] = Qnil; + return Qnil; +} + +/* The second element of uniprop_uncompress (which see). + This decodes the compressed data of "RUN-LENGTH TABLE" (see the + comment in admin/unidata/unidata-gen.el. */ + +static Lisp_Object +uniprop_table_uncompress_run_length (Lisp_Object str, + struct Lisp_Sub_Char_Table *subtbl) +{ + const unsigned char *p, *pend; + int idx; + + p = SDATA (str) + 1, pend = p + SBYTES (str); + for (idx = 0; p < pend; ) + { + int v = STRING_CHAR_ADVANCE (p); + int count = 1; + int len; + + if (p < pend) + { + count = STRING_CHAR_AND_LENGTH (p, len); + if (count < 128) + count = 1; + else + { + count -= 128; + p += len; + } + } + while (count-- > 0) + subtbl->contents[idx++] = make_number (v); + } + return Qnil; +} + +#if 0 +/* It seems that we don't need this function because C code won't need + to get a property that is compressed in this form. */ + +/* The third element of uniprop_uncompress (which see). + This decodes the compressed data of "WORD-LIST TABLE" (see the + comment in admin/unidata/unidata-gen.el. */ + +static Lisp_Object +uniprop_table_uncompress_word_list (Lisp_Object str, + struct Lisp_Sub_Char_Table *subtbl) +{ + return Qnil; +} +#endif + +/* Array of functions that decode the compressed property values for + consecutive 128 characters in STR, and store each value in the + sub-chartable SUBTBL. */ + +static Lisp_Object (*uniprop_uncompress []) (Lisp_Object, + struct Lisp_Sub_Char_Table *) + = { uniprop_table_uncompress_simple, + uniprop_table_uncompress_run_length}; + +/* How many elements uniprop_uncompress has. */ + +static int uniprop_uncompressor_count = + (sizeof uniprop_uncompress) / (sizeof uniprop_uncompress[0]); + +/* Return a char-table for Unicode character property PROPNAME. This + function may load a Lisp file and thus may cause + garbage-collection. 
*/ + +Lisp_Object +uniprop_table (Lisp_Object propname) +{ + Lisp_Object val, table; + + val = Fassq (propname, Vchar_code_property_alist); + if (! CONSP (val)) + return Qnil; + table = XCDR (val); + if (STRINGP (table)) + { + Lisp_Object result = Fload (AREF (table, 0), Qt, Qt, Qt, Qt); + if (NILP (result)) + return Qnil; + table = XCDR (val); + } + if (NILP (table)) + return Qnil; + val = XCHAR_TABLE (table)->extras[1]; + if (INTEGERP (val) + && XINT (val) >= 0 && XINT (val) < uniprop_uncompressor_count) + return table; + return Qnil; +} + +/* Return a Unicode character property of character C stored in TABLE. + TABLE must be what the function uniprop_table returns. + + If XCHAR_TABLE (TABLE)->extras[4] is a vector, the returned value + is an index number to that vector which contains the actual + property value. */ + +Lisp_Object +uniprop_table_lookup (int c, Lisp_Object table) +{ + struct Lisp_Char_Table *tbl = XCHAR_TABLE (table); + struct Lisp_Sub_Char_Table *subtbl; + Lisp_Object sub_table, val; + + if (ASCII_CHAR_P (c)) + { + if (SUB_CHAR_TABLE_P (tbl->ascii)) + return XSUB_CHAR_TABLE (tbl->ascii)->contents[c]; + /* Always unfold the bottom sub-table for ASCII chars. */ + tbl->ascii = sub_table = char_table_get_bottom (tbl, c); + subtbl = XSUB_CHAR_TABLE (sub_table); + val = subtbl->contents[c]; + if (UNIPROP_COMPRESSED_FORM_P (val)) + uniprop_uncompress[XINT (tbl->extras[1])] (val, subtbl); + } + else + { + val = char_table_ref (table, c); + if (! UNIPROP_COMPRESSED_FORM_P (val)) + return val; + sub_table = char_table_get_bottom (tbl, c); + subtbl = XSUB_CHAR_TABLE (sub_table); + uniprop_uncompress[XINT (tbl->extras[1])] (val, subtbl); + } + return subtbl->contents[c - XINT (subtbl->min_char)]; +} + +DEFUN ("get-unicode-property-internal", Fget_unicode_property_internal, + Sget_unicode_property_internal, 2, 2, 0, + doc: /* Get Unicode character property of CH stored in TABLE. +Internal use only. 
*/) + (Lisp_Object ch, Lisp_Object table) +{ + Lisp_Object val; + + CHECK_CHARACTER (ch); + CHECK_CHAR_TABLE (table); + val = uniprop_table_lookup (XINT (ch), table); + if (INTEGERP (val) + && VECTORP (XCHAR_TABLE (table)->extras[4])) + { + Lisp_Object val_vec = XCHAR_TABLE (table)->extras[4]; + if (XINT (val) > 0 && XINT (val) < ASIZE (val_vec)) + val = AREF (val_vec, XINT (val) - 1); + else + val = Qnil; + } + return val; +} + +\f void syms_of_chartab (void) { @@ -998,4 +1221,17 @@ defsubr (&Sset_char_table_default); defsubr (&Soptimize_char_table); defsubr (&Smap_char_table); + defsubr (&Sget_unicode_property_internal); + + /* Each element has the form (PROP . TABLE). + PROP is a symbol representing a character property. + TABLE is a char-table containing the property value for each character. + TABLE may be a name of file to load to build a char-table. + This variable should be modified only through + `define-char-code-property'. */ + + DEFVAR_LISP ("char-code-property-alist", Vchar_code_property_alist, + doc: /* Alist of character property name vs char-table containing property values. +Internal use only. */); + Vchar_code_property_alist = Qnil; } ^ permalink raw reply [flat|nested] 182+ messages in thread
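For readers following the patch above, the two compressed block formats can be illustrated with a small standalone C sketch. This is not the code from chartab.c. It assumes the string has already been split into integer character codes and that value codes in the run-length form stay below 128; the names BLOCK_SIZE, decode_simple and decode_run_length are made up for illustration. The simple form follows the comment in unidata-gen.el (a 0 marker, then FIRST-INDEX, then the values). The run-length form follows one reading of uniprop_table_uncompress_run_length (a value, optionally followed by a code of 128 plus the repeat count).

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 128   /* characters per bottom-level block */

/* Decode a "simple table" block.  codes[0] is the 0 marker, codes[1]
   is FIRST-INDEX, and codes[2..] are the values of consecutive
   characters starting at FIRST-INDEX.  0 stands for "no property".  */
static void
decode_simple (const int *codes, size_t len, int out[BLOCK_SIZE])
{
  size_t i;
  int idx;

  assert (codes != NULL && out != NULL);
  for (i = 0; i < BLOCK_SIZE; i++)
    out[i] = 0;
  if (len < 2 || codes[0] != 0)
    return;
  idx = codes[1];
  for (i = 2; i < len && idx >= 0 && idx < BLOCK_SIZE; i++)
    out[idx++] = codes[i];
}

/* Decode a "run-length table" block.  After the 0 marker, each value
   may be followed by a code >= 128, meaning "repeat the value
   (code - 128) times"; without such a code the value occurs once.
   Assumption: the value codes themselves are below 128.  */
static void
decode_run_length (const int *codes, size_t len, int out[BLOCK_SIZE])
{
  size_t i = 1;
  int idx = 0;
  int k;

  assert (codes != NULL && out != NULL);
  for (k = 0; k < BLOCK_SIZE; k++)
    out[k] = 0;
  if (len < 1 || codes[0] != 0)
    return;
  while (i < len && idx < BLOCK_SIZE)
    {
      int v = codes[i++];
      int count = 1;

      if (i < len && codes[i] >= 128)
        count = codes[i++] - 128;
      while (count-- > 0 && idx < BLOCK_SIZE)
        out[idx++] = v;
    }
}
```

In both forms the leading 0 is what lets a reader tell a compressed block apart from an ordinary property value, which is the role UNIPROP_COMPRESSED_FORM_P plays in the patch.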
* Re: idn.el and confusables.txt 2011-06-17 8:15 ` Kenichi Handa @ 2011-06-17 15:12 ` Eli Zaretskii 2011-06-21 2:07 ` Kenichi Handa 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-17 15:12 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel [Resending, as the original message didn't get through.] > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Fri, 17 Jun 2011 17:15:23 +0900 > > In article <tl7r57wo2wc.fsf@m17n.org>, handa <handa@m17n.org> writes: > > > But, at first, I'll work on improving C interface to > > uni-*.el. > > Attached is the first version for that. It provides two C > functions (excerpt from chartab.c). Thanks! > Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); > > and get a property value for character CH as this: > > Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table); > > In this case, what you actually get is an index number to the > vector of property values (symbols nil, L, R, etc). See the > comment of uniprop_table_lookup for the detail. */ > > Eli, I arranged that bidi_class got as above is an integer > value that is the same as enum bidi_type_t defined in > dispextern.h. This sounds good, but I think we should add a comment to both dispextern.h and where you create the properties stating that the order must be kept in sync if changes are made in either of these two places. > And, uniprop_table (intern ("mirroring")) gives a char-table for > mirroring character. And what is the entry in that table for characters that don't have a mirrored character defined by BidiMirroring.txt? Is it the character itself or nil? Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3 filesystems. Can't we use the same uni-mirrored.el file for both tables? > Could you check if the attached patch gives sufficient > facility to be used in the bidi code? I will do that, thanks.
But please wait for a while, as I'd like to finish the changes needed for string reordering first. Thanks again for working on this. ^ permalink raw reply [flat|nested] 182+ messages in thread
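The point about keeping the two orderings in sync could also be backed by a check rather than a comment alone. Below is a minimal sketch under stated assumptions: enum demo_bidi_type is a hypothetical stand-in for the C-side enum (not copied from dispextern.h), demo_val_list repeats the bidi-class VAL-LIST from the patch with nil in slot 0, and demo_first_mismatch is an invented helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the C-side enum.  Value 0 is reserved for
   "no property", matching the leading nil of the value vector.  */
enum demo_bidi_type
{
  DEMO_UNKNOWN = 0,
  DEMO_L, DEMO_R, DEMO_EN, DEMO_AN, DEMO_BN, DEMO_B, DEMO_AL,
  DEMO_LRE, DEMO_LRO, DEMO_RLE, DEMO_RLO, DEMO_PDF,
  DEMO_ES, DEMO_ET, DEMO_CS, DEMO_NSM, DEMO_S, DEMO_WS, DEMO_ON,
  DEMO_TYPE_COUNT
};

/* The bidi-class VAL-LIST from the patch, with nil prepended.  */
static const char *const demo_val_list[] = {
  "nil", "L", "R", "EN", "AN", "BN", "B", "AL",
  "LRE", "LRO", "RLE", "RLO", "PDF",
  "ES", "ET", "CS", "NSM", "S", "WS", "ON"
};

/* Names keyed by enum value, so reordering the enum moves the names.  */
static const char *const demo_enum_names[DEMO_TYPE_COUNT] = {
  [DEMO_UNKNOWN] = "nil", [DEMO_L] = "L", [DEMO_R] = "R",
  [DEMO_EN] = "EN", [DEMO_AN] = "AN", [DEMO_BN] = "BN",
  [DEMO_B] = "B", [DEMO_AL] = "AL", [DEMO_LRE] = "LRE",
  [DEMO_LRO] = "LRO", [DEMO_RLE] = "RLE", [DEMO_RLO] = "RLO",
  [DEMO_PDF] = "PDF", [DEMO_ES] = "ES", [DEMO_ET] = "ET",
  [DEMO_CS] = "CS", [DEMO_NSM] = "NSM", [DEMO_S] = "S",
  [DEMO_WS] = "WS", [DEMO_ON] = "ON"
};

/* The two lists must at least have the same length.  */
_Static_assert (sizeof demo_val_list / sizeof demo_val_list[0]
                == DEMO_TYPE_COUNT,
                "value list and enum differ in length");

/* Return the first index at which the orderings disagree, or -1.  */
static int
demo_first_mismatch (void)
{
  int i;

  for (i = 0; i < DEMO_TYPE_COUNT; i++)
    if (strcmp (demo_val_list[i], demo_enum_names[i]) != 0)
      return i;
  return -1;
}
```

A check of this kind only catches drift between two C-side lists; the Lisp-side list in unidata-gen.el would still need the comment suggested above, or a generated header.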
* Re: idn.el and confusables.txt 2011-06-17 15:12 ` Eli Zaretskii @ 2011-06-21 2:07 ` Kenichi Handa 2011-06-21 2:53 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 182+ messages in thread From: Kenichi Handa @ 2011-06-21 2:07 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <83vcw4wkxy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); > > > > and get a property value for character CH as this: > > > > Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table); > > > > In this case, what you actually get is an index number to the > > vector of property values (symbols nil, L, R, etc). See the > > comment of uniprop_table_lookup for the detail. */ > > > > Eli, I arranged that bidi_class got as above is an integer > > value that is the same as enum bidi_type_t defined in > > dispextern.h. > This sounds good, but I think we should add a comment to both > dispextern.h and where you create the properties stating that > the order must be kept in sync if changes are made in either of these > two places. Ok. By the way, I have an idea for making the normal CHAR_TABLE_REF macro usable for getting an element of such a table, so please wait until I implement it. > > And, uniprop_table (intern ("mirroring")) gives a char-table for > > mirroring character. > And what is the entry in that table for characters that don't have a > mirrored character defined by BidiMirroring.txt? Is it the character > itself or nil? It's nil. We could make it the character itself, but that would make the uni-mirroring.el file big. Currently, the size is just 3605 bytes (half of bidimirror.h). > Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3 > filesystems. Can't we use the same uni-mirrored.el file for both > tables? If we agree to break backward compatibility, I'll do that. At the moment, at least in Emacs code, uni-mirrored.el is not used.
--- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
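The trade-off described above (store nil for characters without a mirror, and let the caller fall back to the character itself) can be sketched in standalone C. This is only an illustration, not the char-table implementation: the sorted pair array, struct demo_mirror_pair, demo_mirroring and demo_mirror_or_self are hypothetical names, -1 plays the role of nil, and the handful of pairs is a small excerpt of well-known bracket pairs rather than the full BidiMirroring.txt data.

```c
#include <assert.h>
#include <stddef.h>

struct demo_mirror_pair
{
  int from;   /* character code */
  int to;     /* its mirror image */
};

/* A small excerpt of mirrored pairs, sorted by `from'.  Characters
   that are absent have no mirror, so they cost no storage.  */
static const struct demo_mirror_pair demo_pairs[] = {
  { 0x0028, 0x0029 }, { 0x0029, 0x0028 },   /* ( ) */
  { 0x003C, 0x003E }, { 0x003E, 0x003C },   /* < > */
  { 0x005B, 0x005D }, { 0x005D, 0x005B },   /* [ ] */
  { 0x007B, 0x007D }, { 0x007D, 0x007B }    /* { } */
};

/* Return the mirror image of C, or -1 (standing in for nil) if C has
   no entry.  Uses binary search over the sorted array.  */
static int
demo_mirroring (int c)
{
  size_t lo = 0;
  size_t hi = sizeof demo_pairs / sizeof demo_pairs[0];

  assert (c >= 0);
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (demo_pairs[mid].from == c)
        return demo_pairs[mid].to;
      if (demo_pairs[mid].from < c)
        lo = mid + 1;
      else
        hi = mid;
    }
  return -1;
}

/* What a caller does with the "nil" answer: keep the character.  */
static int
demo_mirror_or_self (int c)
{
  int m = demo_mirroring (c);

  return m < 0 ? c : m;
}
```

Keeping the fallback in the caller is what keeps the stored data small: only characters that actually have a mirror image need an entry.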
* Re: idn.el and confusables.txt 2011-06-21 2:07 ` Kenichi Handa @ 2011-06-21 2:53 ` Eli Zaretskii 2011-06-21 3:29 ` Kenichi Handa 2011-07-07 6:10 ` C interface to Unicode character property char-tables Kenichi Handa 2011-08-06 16:52 ` Using uniprop_table_lookup (was: idn.el and confusables.txt) Eli Zaretskii 2 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-21 2:53 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 11:07:02 +0900 > > By the way, I have an idea for making the normal > CHAR_TABLE_REF macro usable for getting an element of such > a table, so please wait until I implement it. Thanks, will do. > > And what is the entry in that table for characters that don't have a > > mirrored character defined by BidiMirroring.txt? Is it the character > > itself or nil? > > It's nil. We could make it the character itself, but that > would make the uni-mirroring.el file big. Currently, the size is > just 3605 bytes (half of bidimirror.h). nil is fine; it's what bidi.c expects. > > Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3 > > filesystems. Can't we use the same uni-mirrored.el file for both > > tables? > > If we agree to break backward compatibility, I'll do > that. At the moment, at least in Emacs code, > uni-mirrored.el is not used. I'm not sure what backward compatibility this would break. Adding another table to a file can hardly break anything, can it? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-06-21 2:53 ` Eli Zaretskii @ 2011-06-21 3:29 ` Kenichi Handa 2011-06-21 6:11 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-06-21 3:29 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <83zklbvqrd.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3 > > > filesystems. Can't we use the same uni-mirrored.el file for both > > > tables? > > > > If we agree to break backward compatibility, I'll do > > that. At the moment, at least in Emacs code, > > uni-mirrored.el is not used. > I'm not sure what backward compatibility this would break. Adding > another table to a file can hardly break anything, can it? Ah, I misunderstood what you meant. Yes, it's possible to add one more define-char-code-property call in uni-mirrored.el. One disadvantage is that even if you need only the uni-mirroring property, the uni-mirrored property is also loaded. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-06-21 3:29 ` Kenichi Handa @ 2011-06-21 6:11 ` Eli Zaretskii 2011-06-21 7:22 ` Kenichi Handa 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-21 6:11 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 12:29:21 +0900 > > Yes, it's possible to add one more define-char-code-property call in > uni-mirrored.el. One disadvantage is that even if you need only the > uni-mirroring property, the uni-mirrored property is also loaded. That should be fine, because both tables are very small. Btw, this begs a question: should we preload uni-bidi.el and uni-mirrored.el in Emacs 24? If we don't, they will be loaded right away on startup, I think, since bidi.c will need them. Stefan, Chong, WDYT? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-06-21 6:11 ` Eli Zaretskii @ 2011-06-21 7:22 ` Kenichi Handa 2011-06-21 7:34 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-06-21 7:22 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <E1QYuB6-0004u1-8i@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > From: Kenichi Handa <handa@m17n.org> > > Cc: emacs-devel@gnu.org > > Date: Tue, 21 Jun 2011 12:29:21 +0900 > > > > Yes, it's possible to add one more define-char-code-property call in > > uni-mirrorred.el. One disadvantage is that you only need > > uni-mirroring property but uni-mirrored property is also loaded. > That should be fine, because both tables are very small. > Btw, this begs a question: should we preload uni-bidi.el and > uni-mirrored.el in Emacs 24? If we don't, it will be loaded right > away on startup, I think, since bidi.c will need it. Stefan, Chong, > WDYT? Isn't it possible to load it where we surely know that the table is necessary? For instance, in bidi_init_it. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-06-21 7:22 ` Kenichi Handa @ 2011-06-21 7:34 ` Eli Zaretskii 2011-06-21 8:02 ` Kenichi Handa 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-21 7:34 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 16:22:23 +0900 > > > Btw, this begs a question: should we preload uni-bidi.el and > > uni-mirrored.el in Emacs 24? If we don't, it will be loaded right > > away on startup, I think, since bidi.c will need it. Stefan, Chong, > > WDYT? > > Isn't it possible to load it where we surely know that the > table is necessary? For instance, in bidi_init_it. Yes, of course. But when bidi-display-reordering is turned on by default, bidi_init will _always_ load these 2 files at startup, whenever it needs to display the first thing on the Emacs frame (mode line or menu/tool bar, I think). AFAIK, it has been our policy since before I remember to preload every Lisp file that is always loaded at startup. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: idn.el and confusables.txt 2011-06-21 7:34 ` Eli Zaretskii @ 2011-06-21 8:02 ` Kenichi Handa 2011-06-21 10:30 ` bidi at startup (was: idn.el and confusables.txt) Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-06-21 8:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <E1QYvTe-0002I4-79@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Isn't it possible to load it where we surely know that the > > table is necessary? For instance, in bidi_init_it. > Yes, of course. But when bidi-display-reordering is turned on by > default, bidi_init will _always_ load these 2 files at startup, > whenever it needs to display the first thing on the Emacs frame (mode > line or menu/tool bar, I think). AFAIK, it has been our policy since > before I remember to preload every Lisp file that is always loaded at > startup. Is it used when we run emacs with -batch? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: bidi at startup (was: idn.el and confusables.txt) 2011-06-21 8:02 ` Kenichi Handa @ 2011-06-21 10:30 ` Eli Zaretskii 2011-06-21 15:12 ` bidi at startup Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-21 10:30 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 17:02:46 +0900 > > > Yes, of course. But when bidi-display-reordering is turned on by > > default, bidi_init will _always_ load these 2 files at startup, > > whenever it needs to display the first thing on the Emacs frame (mode > > line or menu/tool bar, I think). AFAIK, it has been our policy since > > before I remember to preload every Lisp file that is always loaded at > > startup. > > Is it used when we run emacs with -batch? That's an excellent question, one that I meant to raise a bit later, but since we are talking... I think in -batch mode Emacs should not reorder bidirectional text. Mainly because if the underlying terminal is bidi-aware, it will reorder it by itself, and if it isn't aware, most probably it doesn't support the corresponding scripts anyway, and scripts that need reordering will come out garbled. So I think bidi-display-reordering should be set early during startup, but only in interactive sessions, i.e. dynamically. Comments? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: bidi at startup 2011-06-21 10:30 ` bidi at startup (was: idn.el and confusables.txt) Eli Zaretskii @ 2011-06-21 15:12 ` Stefan Monnier 2011-06-21 17:13 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2011-06-21 15:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa >> > Yes, of course. But when bidi-display-reordering is turned on by >> > default, bidi_init will _always_ load these 2 files at startup, >> > whenever it needs to display the first thing on the Emacs frame (mode >> > line or menu/tool bar, I think). AFAIK, it has been our policy since >> > before I remember to preload every Lisp file that is always loaded at >> > startup. Yes, if the file is needed when you just run "emacs" with an empty .emacs file, then it definitely should be preloaded. >> Is it used when we run emacs with -batch? That doesn't matter for preloading. > I think in -batch mode Emacs should not reorder bidirectional text. > Mainly because if the underlying terminal is bidi-aware, it will > reorder it by itself, and if it isn't aware, most probably it doesn't > support the corresponding scripts anyway, and scripts that need > reordering will come out garbled. > So I think bidi-display-reordering should be set early during startup, > but only in interactive sessions, i.e. dynamically. As mentioned a few days ago in another thread, thinking in terms of "interactive/noninteractive" session is the wrong way to look at it. Instead, you want to think about which terminal is used. E.g. think of the case of an Emacs server that did not daemonize: some terminals will be GUIs, others will be ttys, and there'll be one more which is just stdin/stdout. IIUC you're saying that the special initial terminal (bound to stdin/stdout) should not do bidi-processing, which sounds fine. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: bidi at startup 2011-06-21 15:12 ` bidi at startup Stefan Monnier @ 2011-06-21 17:13 ` Eli Zaretskii 2011-06-22 15:32 ` Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-06-21 17:13 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel, handa > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Kenichi Handa <handa@m17n.org>, emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 11:12:12 -0400 > > >> Is it used when we run emacs with -batch? > > That doesn't matter for preloading. I think it does, because temacs is normally run with -batch. > > I think in -batch mode Emacs should not reorder bidirectional text. > > Mainly because if the underlying terminal is bidi-aware, it will > > reorder it by itself, and if it isn't aware, most probably it doesn't > > support the corresponding scripts anyway, and scripts that need > > reordering will come out garbled. > > > So I think bidi-display-reordering should be set early during startup, > > but only in interactive sessions, i.e. dynamically. > > As mentioned a few days ago in another thread, thinking in terms of > "interactive/noninteractive" session is the wrong way to look at it. Maybe in general, but not in this case, see below. > Instead, you want to think about which terminal is used. E.g. think of > the case of an Emacs server that did not daemonize: some terminals will > be GUIs, others will be ttys The device where Emacs displays is of no importance, as long as the display engine supports it, because the bidi reordering works on all of them. > and there'll be one more which is just > stdin/stdout. IIUC you're saying that the special initial terminal > (bound to stdin/stdout) should not do bidi-processing, which sounds fine. No. stdout is not a display device in this context; a text terminal is. When Emacs writes to stdout from `message' and such likes, the result doesn't trigger redisplay, and thus doesn't invoke the reordering. 
Reordering happens when the result of `message' is inserted into some buffer, like the minibuffer, because that triggers redisplay, which reorders. What I was saying was that "emacs -batch" does not need reordering services, because text we write to stdout should not be reordered. Therefore, I think bidi-display-reordering should be initialized to nil in xdisp.c, then set to t during startup if Emacs is invoked interactively or as a daemon. Do you agree? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: bidi at startup 2011-06-21 17:13 ` Eli Zaretskii @ 2011-06-22 15:32 ` Stefan Monnier 0 siblings, 0 replies; 182+ messages in thread From: Stefan Monnier @ 2011-06-22 15:32 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, handa >> >> Is it used when we run emacs with -batch? >> That doesn't matter for preloading. > I think it does, because temacs is normally run with -batch. I don't see how that's relevant. > No. stdout is not a display device in this context; a text terminal > is. When Emacs writes to stdout from `message' and such likes, the > result doesn't trigger redisplay, and thus doesn't invoke the > reordering. Good. > What I was saying was that "emacs -batch" does not need reordering > services, because text we write to stdout should not be reordered. > Therefore, I think bidi-display-reordering should be initialized to > nil in xdisp.c, then set to t during startup if Emacs is invoked > interactively or as a daemon. Do you agree? No. I still don't see the harm in setting it to t all the time. From what you're saying bidi only happens during redisplay and redisplay does not happen in --batch mode, so the setting of bidi-display-reordering is irrelevant in such a situation and might as well be the same as in other cases. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* C interface to Unicode character property char-tables. 2011-06-21 2:07 ` Kenichi Handa 2011-06-21 2:53 ` Eli Zaretskii @ 2011-07-07 6:10 ` Kenichi Handa 2011-08-06 16:52 ` Using uniprop_table_lookup (was: idn.el and confusables.txt) Eli Zaretskii 2 siblings, 0 replies; 182+ messages in thread From: Kenichi Handa @ 2011-07-07 6:10 UTC (permalink / raw) To: Kenichi Handa; +Cc: eliz, emacs-devel In article <tl7aadcj5tl.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > Ok. By the way, as I got an idea of making the normal > CHAR_TABLE_REF macro usable for getting an element of such > table, please wait until I implement it. I've just committed that change to the trunk. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Using uniprop_table_lookup (was: idn.el and confusables.txt) 2011-06-21 2:07 ` Kenichi Handa 2011-06-21 2:53 ` Eli Zaretskii 2011-07-07 6:10 ` C interface to Unicode character property char-tables Kenichi Handa @ 2011-08-06 16:52 ` Eli Zaretskii 2011-08-09 0:55 ` Kenichi Handa 2 siblings, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2011-08-06 16:52 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 21 Jun 2011 11:07:02 +0900 > > In article <83vcw4wkxy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); > > > > > > and get a property value for character CH as this: > > > > > > Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table); > > > > > > > In this case, what you actually get is an index number to the > > > vector of property values (symbols nil, L, R, etc). See the > > > comment of uniprop_table_lookup for the detail. */ > > > > > > Eli, I arranged that bidi_class got as above is an integer > > > value that is the same as enum bidi_type_t defined in > > > dispextern.h. > > > This sounds good, but I think we should add a comment to both > > dispextern.h and where you create the properties regarding the fact > > that the order must be in sync if any changes are made in any of these > > two places. > > Ok. By the way, as I got an idea of making the normal > CHAR_TABLE_REF macro usable for getting an element of such > table, please wait until I implement it. I'm getting ready to use this feature instead of src/biditype.h. I understand that CHAR_TABLE_REF is usable for this job already, and the updated way of using uniprop_table in bidi.c is this: Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); ... Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table); If so, do I need to staticpro the bidi_class_table variable above? 
It will be initialized only once, at startup time, and then used every time bidi.c needs to look up the bidirectional properties of a character. Thanks. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Using uniprop_table_lookup (was: idn.el and confusables.txt) 2011-08-06 16:52 ` Using uniprop_table_lookup (was: idn.el and confusables.txt) Eli Zaretskii @ 2011-08-09 0:55 ` Kenichi Handa 2011-08-09 1:32 ` Using uniprop_table_lookup Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-08-09 0:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <838vr6wkh4.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > I'm getting ready to use this feature instead of src/biditype.h. I > understand that CHAR_TABLE_REF is usable for this job already, and the > updated way of using uniprop_table in bidi.c is this: > Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); > ... > Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table); > If so, do I need to staticpro the bidi_class_table variable above? No. All returned tables by uniprop_table are stored in Vchar_code_property_alist which is DEFVAR_LISPed in chartab.c. It means that the next call of uniprop_table with the same PROP argument just returns the already loaded one, and thus fast. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Using uniprop_table_lookup 2011-08-09 0:55 ` Kenichi Handa @ 2011-08-09 1:32 ` Stefan Monnier 2011-08-09 4:31 ` Kenichi Handa 0 siblings, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2011-08-09 1:32 UTC (permalink / raw) To: Kenichi Handa; +Cc: Eli Zaretskii, emacs-devel >> I'm getting ready to use this feature instead of src/biditype.h. I >> understand that CHAR_TABLE_REF is usable for this job already, and the >> updated way of using uniprop_table in bidi.c is this: >> Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class")); >> ... >> Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table); >> If so, do I need to staticpro the bidi_class_table variable above? > No. All returned tables by uniprop_table are stored in > Vchar_code_property_alist which is DEFVAR_LISPed in > chartab.c. It means that the next call of uniprop_table > with the same PROP argument just returns the already loaded > one, and thus fast. But since Eli says that he initializes bidi_class_table only once and never touches it again, if he doesn't staticpro that variable and someone changes Vchar_code_property_alist, you could end up with bidi_class_table pointing to a dead object. There's no point wasting time trying to figure out whether a global Lisp_Object should be staticpro'd or not: just staticpro it and move on. In 99% of the cases it's the only safe choice and the remaining 1% is not worth the trouble worrying about. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Using uniprop_table_lookup 2011-08-09 1:32 ` Using uniprop_table_lookup Stefan Monnier @ 2011-08-09 4:31 ` Kenichi Handa 2011-08-15 8:57 ` Eli Zaretskii 0 siblings, 1 reply; 182+ messages in thread From: Kenichi Handa @ 2011-08-09 4:31 UTC (permalink / raw) To: Stefan Monnier; +Cc: eliz, emacs-devel In article <jwvty9rfk32.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes: > > No. All returned tables by uniprop_table are stored in > > Vchar_code_property_alist which is DEFVAR_LISPed in > > chartab.c. It means that the next call of uniprop_table > > with the same PROP argument just returns the already loaded > > one, and thus fast. > But since Eli says that he initializes bidi_class_table only once and > never touches it again, if he doesn't staticpro that variable and > someone changes Vchar_code_property_alist, you could end up > with bidi_class_table pointing to a dead object. Ah, yes, you are right. I misunderstood. > There's no point wasting time trying to figure out whether a global > Lisp_Object should be staticpro'd or not: just staticpro it and > move on. In 99% of the cases it's the only safe choice and the > remaining 1% is not worth the trouble worrying about. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Using uniprop_table_lookup 2011-08-09 4:31 ` Kenichi Handa @ 2011-08-15 8:57 ` Eli Zaretskii 0 siblings, 0 replies; 182+ messages in thread From: Eli Zaretskii @ 2011-08-15 8:57 UTC (permalink / raw) To: Kenichi Handa; +Cc: monnier, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: eliz@gnu.org, emacs-devel@gnu.org > Date: Tue, 09 Aug 2011 13:31:00 +0900 > > In article <jwvty9rfk32.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes: > > > No. All returned tables by uniprop_table are stored in > > > Vchar_code_property_alist which is DEFVAR_LISPed in > > > chartab.c. It means that the next call of uniprop_table > > > with the same PROP argument just returns the already loaded > > > one, and thus fast. > > > But since Eli says that he initializes bidi_class_table only once and > > never touches it again, if he doesn't staticpro that variable and > > someone changes Vchar_code_property_alist, you could end up > > with bidi_class_table pointing to a dead object. > > Ah, yes, you are right. I misunderstood. I have another question. The commentary in chartab.c says: and get a property value for character CH as this: Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table); In this case, what you actually get is an index number to the vector of property values (symbols nil, L, R, etc). What does it mean by "in this case"? is this something specific to the "bidi-class" property? If yes, what will the call to CHAR_TABLE_REF yield in the general case? It is essential to document that, for the C code to be able to use any Unicode character property through this API. The commentary also says: o The second extra slot is a Lisp function, an index (integer) to the array uniprop_decoder[], or nil. If it is a Lisp function, we can't use such a table from C (at the moment). If it is nil, it means that we don't have to decode values. 
o The third extra slot is a Lisp function, an index (integer) to the array uniprop_encoder[], or nil. If it is a Lisp function, we can't use such a table from C (at the moment). If it is nil, it means that we don't have to encode values. This doesn't say anything about the indices into uniprop_decoder[] and uniprop_encoder[] arrays, which I believe are values that need to be handled by the C code. Can you tell something about that? I would like to improve this commentary with the additional information. TIA ^ permalink raw reply [flat|nested] 182+ messages in thread
* uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) 2011-04-26 15:28 ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov 2011-05-13 19:42 ` idn.el and confusables.txt Stefan Monnier @ 2011-05-31 10:42 ` Ted Zlatanov 2011-06-08 10:42 ` uni-confusables 0.1 is on the Emacs ELPA branch Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-05-31 10:42 UTC (permalink / raw) To: emacs-devel I've put uni-confusables.el on the ELPA branch. Stefan, could you please check it? I tried to stick to the *-pkg.el format you specified in archive-contents.el, but I may have missed something. It's the first multi-file package I've bundled. Also I don't know how the license should look. If you prefer I could just put uni-confusables.el there, but I thought the generator and the source database were important too. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: uni-confusables 0.1 is on the Emacs ELPA branch 2011-05-31 10:42 ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov @ 2011-06-08 10:42 ` Ted Zlatanov 2011-06-08 15:22 ` Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2011-06-08 10:42 UTC (permalink / raw) To: emacs-devel On Tue, 31 May 2011 05:42:55 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: TZ> I've put uni-confusables.el on the ELPA branch. Stefan, could you TZ> please check it? I tried to stick to the *-pkg.el format you specified TZ> in archive-contents.el, but I may have missed something. It's the first TZ> multi-file package I've bundled. Also I don't know how the license TZ> should look. (to clarify, the license should be as part of Emacs) TZ> If you prefer I could just put uni-confusables.el there, but I thought TZ> the generator and the source database were important too. Ping on the questions above? Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: uni-confusables 0.1 is on the Emacs ELPA branch 2011-06-08 10:42 ` uni-confusables 0.1 is on the Emacs ELPA branch Ted Zlatanov @ 2011-06-08 15:22 ` Stefan Monnier 0 siblings, 0 replies; 182+ messages in thread From: Stefan Monnier @ 2011-06-08 15:22 UTC (permalink / raw) To: emacs-devel TZ> I've put uni-confusables.el on the ELPA branch. Stefan, could you TZ> please check it? I tried to stick to the *-pkg.el format you specified TZ> in archive-contents.el, but I may have missed something. It's the first TZ> multi-file package I've bundled. Also I don't know how the license TZ> should look. > (to clarify, the license should be as part of Emacs) I'm not sure which is best. All the single files there seem to say "part of Emacs" and most of the multi-file packages seem to say "not part of Emacs". I don't think it matters much, as long as the text is correct (i.e. it doesn't say "not part of GNU Emacs" followed by "GNU Emacs is Free Software blabla" since that makes the blurb legally meaningless). So write it whichever fits best your intentions. TZ> If you prefer I could just put uni-confusables.el there, but I thought TZ> the generator and the source database were important too. > Ping on the questions above? Including the generator is important, indeed. The source database is not that important since it can be downloaded from other places, but feel free to include it. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: face for non-ASCII characters 2011-04-16 15:50 ` Lennart Borgman 2011-04-16 15:57 ` Ted Zlatanov @ 2011-04-16 16:00 ` Drew Adams 1 sibling, 0 replies; 182+ messages in thread From: Drew Adams @ 2011-04-16 16:00 UTC (permalink / raw) To: 'Lennart Borgman', 'Ted Zlatanov'; +Cc: emacs-devel > > Yes, `idn-is-recommended' is autoloaded. It may be a > > problem with the fboundp call, which will return true even > > if the library is not available: > > (when (autoload 'coocoo "coo")(fboundp 'coocoo)) => t > > so maybe (require 'idn nil t) is a better approach > > I think autoloading and checking if the function is defined is the > best way. (Autoloading of `idn-is-recommended' should of course not be > setup if idn.el is not available.) I'm not following this thread, so ignore if this question doesn't help: Why set up autoloading here? Seems like you would want to either soft-require the library (no error if not there) or simply test whether the function is defined (which presumably happens when the library is loaded). IOW, why don't you just choose between (require 'foo nil t) and (fboundp 'foobar)? What's the point of having this library autoload the other one but not soft-require it? (FWIW, it sounds a priori like you should maybe just use the `fboundp' test, and not worry about automatically loading the other library.) Again, ignore if irrelevant to your discussion. ^ permalink raw reply [flat|nested] 182+ messages in thread
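The difference between the two checks discussed above can be seen in a few lines. This is only a sketch: `my-coocoo' and the library "my-coo" are made-up names that are assumed not to exist on the load-path, and `my-idn-usable-p' is a hypothetical helper that assumes idn.el provides the feature `idn'.

```elisp
;; `my-coocoo' and the library "my-coo" are hypothetical and assumed absent.
(autoload 'my-coocoo "my-coo")

;; Registering the autoload is enough to make the symbol fboundp,
;; even though actually calling it would fail to load "my-coo".
(fboundp 'my-coocoo)            ; non-nil

;; A soft require reports the real situation without signaling an error.
(require 'my-coo nil t)         ; nil when "my-coo" cannot be found

;; Hypothetical helper combining both checks for idn.el.
(defun my-idn-usable-p ()
  "Return non-nil if idn.el loads and defines `idn-is-recommended'.
This assumes that idn.el provides the feature `idn'."
  (and (require 'idn nil t)
       (fboundp 'idn-is-recommended)))
```

With this shape the caller never depends on whether an autoload cookie happened to be set up; the soft require decides availability and `fboundp' only confirms the function is there.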
* Re: face for non-ASCII characters 2010-03-23 14:26 ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov 2010-03-23 16:28 ` Lennart Borgman @ 2010-03-23 19:40 ` Florian Beck 1 sibling, 0 replies; 182+ messages in thread From: Florian Beck @ 2010-03-23 19:40 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel Ted Zlatanov wrote: > On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: > > ... > LB> (defface markchars-face > LB> '((t (:underline "light blue"))) > LB> "Face for `markchars-mode' char marking." > LB> :group 'markchars) > > LB> (defconst markchars-keywords > LB> `( > LB> ("[[:nonascii:]]" > LB> (0 (put-text-property (match-beginning 0) (match-end 0) > LB> 'face 'markchars-face))) > LB> )) > > This is what I was thinking of, and the effect should be subtle. The > light blue underline is pretty good; it would be nice if it could be a > choice between a few similarly well-chosen defaults (I'm hardly the one > to provide them, though). > > I would make this an Emacs first-class option rather than an optional > library. I think it's really useful. You could always do something like (set-fontset-font "fontset-coding" ;; or whatever '(128 . 999999) (font-spec :family "LastResort" :size 20) ;; ok, not so subtle nil 'prepend) > > Ted > > > > ^ permalink raw reply [flat|nested] 182+ messages in thread
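For comparison, the quoted markchars keywords could be wrapped in a small buffer-local minor mode roughly as follows. This is a minimal sketch and not the actual markchars.el: the names `my-markchars', `my-markchars-keywords' and `my-markchars-mode' are invented for illustration, the face is named without a "-face" suffix as Emacs convention suggests, and `font-lock-flush' assumes Emacs 25 or later.

```elisp
(defface my-markchars
  '((t (:underline "light blue")))
  "Face used to mark non-ASCII characters (illustrative name).")

(defconst my-markchars-keywords
  ;; `prepend' merges the underline with whatever face is already there,
  ;; so the marking does not replace other highlighting.
  '(("[[:nonascii:]]+" (0 'my-markchars prepend)))
  "Font-lock keywords that mark runs of non-ASCII characters.")

(define-minor-mode my-markchars-mode
  "Underline non-ASCII characters in the current buffer."
  :lighter " MarkC"
  (if my-markchars-mode
      (font-lock-add-keywords nil my-markchars-keywords)
    (font-lock-remove-keywords nil my-markchars-keywords))
  ;; Ask font-lock to refontify so the change becomes visible.
  (font-lock-flush))
```

Letting font-lock apply the face through the keyword's facename slot, rather than calling `put-text-property' from the matcher form, means the highlighting is removed cleanly when the mode is switched off.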
* Re: Translation of http status code to text 2010-03-23 13:08 ` Lennart Borgman 2010-03-23 14:26 ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov @ 2010-03-23 14:35 ` Miles Bader 1 sibling, 0 replies; 182+ messages in thread From: Miles Bader @ 2010-03-23 14:35 UTC (permalink / raw) To: Lennart Borgman; +Cc: Juri Linkov, Ted Zlatanov, emacs-devel Don't use "-face" as a suffix for face names. -Miles -- Is it true that nothing can be known? If so how do we know this? -Woody Allen ^ permalink raw reply [flat|nested] 182+ messages in thread
* highlighting non-ASCII characters (was: Translation of http status code to text) 2010-03-23 9:55 ` Juri Linkov 2010-03-23 13:08 ` Lennart Borgman @ 2010-03-23 14:22 ` Ted Zlatanov 2010-03-23 16:50 ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams 2010-03-24 9:27 ` Juri Linkov 1 sibling, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-23 14:22 UTC (permalink / raw) To: emacs-devel On Tue, 23 Mar 2010 11:55:17 +0200 Juri Linkov <juri@jurta.org> wrote: >> Thanks for noticing that. I had the cyrillic-translit input method on >> accidentally. It would be nice if I could highlight non-ASCII >> characters in a subtle way to avoid this, is there a relevant face? JL> Maybe `escape-glyph' is a relevant face? I don't think it applies to non-ASCII by default, at least for me it doesn't. I can set up font-lock for myself, of course, but was wondering if Emacs has a simple built-in way to highlight them. Also it would be good if the effect did not interfere with other faces, so e.g. a subtle fading or underline effect overlaid with the regular face would be nicer than just asking the user to customize it themselves (although that option is still valuable). I don't know much about Emacs faces so any help is welcome. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters (was: Translation of http statuscode to text) 2010-03-23 14:22 ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov @ 2010-03-23 16:50 ` Drew Adams 2010-03-23 21:49 ` highlighting non-ASCII characters Stefan Monnier 2010-03-24 9:27 ` Juri Linkov 1 sibling, 1 reply; 182+ messages in thread From: Drew Adams @ 2010-03-23 16:50 UTC (permalink / raw) To: 'Ted Zlatanov', emacs-devel > JL> Maybe `escape-glyph' is a relevant face? > > I don't think it applies to non-ASCII by default, at least > for me it doesn't.... Also it would be good if the effect > did not interfere with other faces I agree. `escape-glyph' is designed for a different use. Its intended use is given by its doc string, and it is very precise and clear: "Face for characters displayed as sequences using `^' or `\'. In my case, I customize `escape-glyph' to be something very noticeable (orange background, with blue foreground and 1-pixel blue box). I want to see something like ^G really stand out. I don't want to see non-ascii chars handled the same way. It is a too-common mistake, IMO, to try too hard to reuse existing faces. It complicates use (including customization) of those faces if they end up signifying a collection of disparate things. We should have the reflex, when tempted to reuse a face, to think twice. In general, just say no to such y-a-qu'a impulses. ;-) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-23 16:50 ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams @ 2010-03-23 21:49 ` Stefan Monnier 2010-03-23 21:53 ` Drew Adams 0 siblings, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2010-03-23 21:49 UTC (permalink / raw) To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel > In my case, I customize `escape-glyph' to be something very noticeable > (orange background, with blue foreground and 1-pixel blue box). > I want to see something like ^G really stand out. I don't want to see > non-ascii chars handled the same way. Actually, in the original case (a weird non-ASCII K in the middle of ASCII text), I would want it to stand out. The problem there is how to define "weird". Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters 2010-03-23 21:49 ` highlighting non-ASCII characters Stefan Monnier @ 2010-03-23 21:53 ` Drew Adams 2010-03-24 0:45 ` Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Drew Adams @ 2010-03-23 21:53 UTC (permalink / raw) To: 'Stefan Monnier'; +Cc: 'Ted Zlatanov', emacs-devel > > In my case, I customize `escape-glyph' to be something very > > noticeable (orange background, with blue foreground and > > 1-pixel blue box). I want to see something like ^G really > > stand out. I don't want to see > > non-ascii chars handled the same way. > > Actually, in the original case (a weird non-ASCII K in the middle of > ASCII text), I would want it to stand out. The problem there > is how to define "weird". Whether you want non-ASCII chars to also stand out is a separate question. My point is that I don't want to see them handled _the same way_ as ^G et compagnie. Having separate faces lets users get the behavior they want. If you then want to customize the two faces to look the same, no problem. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-23 21:53 ` Drew Adams @ 2010-03-24 0:45 ` Stefan Monnier 2010-03-24 1:03 ` Ted Zlatanov ` (3 more replies) 0 siblings, 4 replies; 182+ messages in thread From: Stefan Monnier @ 2010-03-24 0:45 UTC (permalink / raw) To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel >> > In my case, I customize `escape-glyph' to be something very >> > noticeable (orange background, with blue foreground and >> > 1-pixel blue box). I want to see something like ^G really >> > stand out. I don't want to see >> > non-ascii chars handled the same way. >> Actually, in the original case (a weird non-ASCII K in the middle of >> ASCII text), I would want it to stand out. The problem there >> is how to define "weird". > Whether you want non-ASCII chars to also stand out is a separate > question. My point is that I don't want to see them handled _the same > way_ as ^G et compagnie. What I'm saying is that there are two issues: non-ASCII chars in general (which I personally don't want to display in any special manner: they're just as normal as ASCII chars), and then there are "chars that are out of place or that may not be what they look like", such as the weird "K" in the other message's "OK" (which to me, is similar to the NBSP char in that it is meant to be displayed in the same way as some other char, so we want to call the attention of the user to the difference). > Having separate faces lets users get the behavior they want. If you > then want to customize the two faces to look the same, no problem. I don't insist on using escape-glyph for those chars, indeed (I don't really care which face is used for them). What I care about is figuring out how to define programmatically "chars that are out of place or that may not be what they look like". Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 0:45 ` Stefan Monnier @ 2010-03-24 1:03 ` Ted Zlatanov 2010-03-24 2:47 ` Stefan Monnier 2010-03-24 2:09 ` Drew Adams ` (2 subsequent siblings) 3 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-24 1:03 UTC (permalink / raw) To: emacs-devel

On Tue, 23 Mar 2010 20:45:48 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

SM> What I'm saying is that there are two issues: non-ASCII chars in general
SM> (which I personally don't want to display in any special manner:
SM> they're just as normal as ASCII chars), and then there are "chars that
SM> are out of place or that may not be what they look like", such as the
SM> weird "K" in the other message's "OK" (which to me, is similar to the
SM> NBSP char in that it is meant to be displayed in the same way as some
SM> other char, so we want to call the attention of the user to the
SM> difference).
...
SM> I don't insist on using escape-glyph for those chars, indeed (I don't
SM> really care which face is used for them). What I care about is figuring
SM> out how to define programmatically "chars that are out of place or that
SM> may not be what they look like".

How about this:

show-nonascii-characters: t, 'majority-paragraph, majority-line,
                          'minority-line, 'minority-paragraph,
                          'suspicious, a function, or nil (default)

show-nonascii-characters-face: customizable from a list of presets,
                               escape-glyph, or a custom face

The rules:

t                  = always
majority-paragraph = highlight when they are 0-90% of the paragraph
majority-line      = ditto for the line
minority-line      = highlight when they are 0-20% of the line
minority-paragraph = ditto for the paragraph
suspicious         = 3-5 characters per visual line
function           = passed a list of regions of non-ASCII characters
nil                = the default, don't highlight

I'm sure we'll tune this but as someone who writes non-ASCII characters a lot, this would make sense. I'd personally use t but I can see how that could be annoying. I think you're in favor of 'suspicious.

The idea is to make this easy to set up and available in any mode or globally. It's easy for an experienced user to set up something atrocious, but good choices for the colors and the rules are, I think, the tricky and valuable part.

Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
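[Editorial sketch] The proportion-based rules in the proposal above can be illustrated with a small ratio check. This is only a sketch: the function names `my-nonascii-ratio` and `my-nonascii-minority-p` are hypothetical, and the 0.2 default simply mirrors the 20% figure given for minority-line. It works on a string rather than on buffer text and does no highlighting.

```elisp
;; Hypothetical helpers, not part of Emacs or of the proposal.

(defun my-nonascii-ratio (string)
  "Return the fraction of characters in STRING that are non-ASCII.
Return 0.0 for an empty string."
  (let ((total (length string))
        (count 0))
    (dotimes (i total)
      ;; Code points above 127 are outside ASCII.
      (when (> (aref string i) 127)
        (setq count (1+ count))))
    (if (zerop total)
        0.0
      (/ (float count) total))))

(defun my-nonascii-minority-p (string &optional threshold)
  "Return non-nil if STRING has a small share of non-ASCII characters.
The share must be above zero and at most THRESHOLD (default 0.2)."
  (let ((ratio (my-nonascii-ratio string)))
    (and (> ratio 0.0)
         (<= ratio (or threshold 0.2)))))
```

A caller would apply the predicate to the text of a line or paragraph and only then decide whether to add a face.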
* Re: highlighting non-ASCII characters 2010-03-24 1:03 ` Ted Zlatanov @ 2010-03-24 2:47 ` Stefan Monnier 2010-03-24 4:20 ` Eli Zaretskii 2010-03-24 10:05 ` Ted Zlatanov 0 siblings, 2 replies; 182+ messages in thread From: Stefan Monnier @ 2010-03-24 2:47 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > show-nonascii-characters: t, 'majority-paragraph, majority-line, > 'minority-line, 'minority-paragraph, > 'suspicious, a function, or nil (default) The name is wrong, I think: I'd probably want to highlight ASCII chars that are out of place, just as with non-ASCII chars. Also, I'm not sure if proportion compared to total text (or line) is a good metric to decide whether it's suspicious. I don't have much better to suggest, tho. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 2:47 ` Stefan Monnier @ 2010-03-24 4:20 ` Eli Zaretskii 2010-03-24 5:14 ` Jason Rumney 2010-03-24 10:05 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Eli Zaretskii @ 2010-03-24 4:20 UTC (permalink / raw) To: Stefan Monnier; +Cc: tzz, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Tue, 23 Mar 2010 22:47:37 -0400 > Cc: emacs-devel@gnu.org > > Also, I'm not sure if proportion compared to total text (or line) is > a good metric to decide whether it's suspicious. I don't have much > better to suggest, tho. If we go for such a metric, it would need to be augmented by a database of words where a small number of such characters is ``normal'', not to be highlighted. This is for words like naïve. Otherwise the feature will be an annoyance. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 4:20 ` Eli Zaretskii @ 2010-03-24 5:14 ` Jason Rumney 2010-03-24 13:25 ` Stefan Monnier 0 siblings, 1 reply; 182+ messages in thread From: Jason Rumney @ 2010-03-24 5:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: tzz, Stefan Monnier, emacs-devel On 24/03/2010 12:20, Eli Zaretskii wrote: > If we go for such a metric, it would need to be augmented by a > database of words where a small number of such characters is > ``normal'', not to be highlighted. This is for words like naïve. > Otherwise the feature will be an annoyance. > It's also dependent on which characters they are - Cyrillic, Han, Greek, Hebrew etc should be expected to appear in long runs, perhaps with runs of ASCII and/or other characters interleaved. Latin-1 on the other hand would normally appear individually or in very short runs mixed in with ASCII. There is no single heuristic that can be used to identify "suspicious" characters. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 5:14 ` Jason Rumney @ 2010-03-24 13:25 ` Stefan Monnier 2010-03-24 15:06 ` Jason Rumney 0 siblings, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2010-03-24 13:25 UTC (permalink / raw) To: Jason Rumney; +Cc: Eli Zaretskii, tzz, emacs-devel >> If we go for such a metric, it would need to be augmented by a >> database of words where a small number of such characters is >> ``normal'', not to be highlighted. This is for words like naïve. >> Otherwise the feature will be an annoyance. > It's also dependent on which characters they are - Cyrillic, Han, Greek, > Hebrew etc should be expected to appear in long runs, perhaps with runs of > ASCII and/or other characters interleaved. Latin-1 on the other hand would > normally appear individually or in very short runs mixed in with ASCII. > There is no single heuristic that can be used to identify > "suspicious" characters. Maybe within the scope of "social engineering attacks where some text (e.g. a host name) looks like something well-known but actually uses funny Unicode chars", some people have come up with something like a list of known problems (typically Unicode points that are usually displayed identically). Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 13:25 ` Stefan Monnier @ 2010-03-24 15:06 ` Jason Rumney 2010-03-24 19:47 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Jason Rumney @ 2010-03-24 15:06 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, tzz, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> There is no single heuristic that can be used to identify >> "suspicious" characters. > > Maybe within the scope of "social engineering attacks where some text > (e.g. a host name) looks like something well-known but actually uses > funny Unicode chars", some people have come up with something like > a list of known problems (typically Unicode points that are usually > displayed identically). Yes, within that context suspicious has a clear enough meaning. But the cases others were discussing seemed more like cases of mismatched encodings that they wanted to be flagged more obviously. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 15:06 ` Jason Rumney @ 2010-03-24 19:47 ` Ted Zlatanov 0 siblings, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-24 19:47 UTC (permalink / raw) To: emacs-devel On Wed, 24 Mar 2010 23:06:48 +0800 Jason Rumney <jasonr@gnu.org> wrote: JR> Yes, within that [social engineering] context suspicious has a clear JR> enough meaning. But the cases others were discussing seemed more JR> like cases of mismatched encodings that they wanted to be flagged JR> more obviously. All of those use cases are valid, but the social engineering is IMO the most important reason why we should consider it for first-class inclusion in Emacs. On Wed, 24 Mar 2010 17:21:40 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> Maybe "Recommended Identifier Profiles for IDN" should be implemented LB> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt) LB> How about a bool vector (see make-bool-vector) for this? I don't know if we should be planning the storage mechanism yet, but you can certainly put that together. Don't optimize this prematurely, just use a list at first. It's important to decide what we're doing with this feature; as you see there's several use cases and I want to be sure we can cover them all (hence my proposal with several classes of characters and matching rules) before jumping into the implementation. I'm pretty sure we'll at least need the "confusables" or homoglyphs so maybe that's a good start. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 2:47 ` Stefan Monnier 2010-03-24 4:20 ` Eli Zaretskii @ 2010-03-24 10:05 ` Ted Zlatanov 2010-03-24 16:21 ` Lennart Borgman 2010-03-25 7:12 ` Juri Linkov 1 sibling, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-24 10:05 UTC (permalink / raw) To: emacs-devel

On Tue, 23 Mar 2010 22:47:37 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> show-nonascii-characters: t, 'majority-paragraph, majority-line,
>> 'minority-line, 'minority-paragraph,
>> 'suspicious, a function, or nil (default)

SM> The name is wrong, I think: I'd probably want to highlight ASCII chars
SM> that are out of place, just as with non-ASCII chars.

Although Unicode calls them "confusable" I think that's a terrible name. So how about show-out-of-place-glyphs as an alist and 'homoglyphs as a key option (see http://en.wikipedia.org/wiki/Homoglyph and, amazingly, http://homoglyphs.net/)?

show-out-of-place-glyphs: alist; keys can be 'ascii, 'nonascii, or 'homoglyphs. Maybe we can also allow a general regex. Values can be 'always, 'majority-paragraph, majority-line, 'minority-line, 'minority-paragraph, 'suspicious (with the same rules I proposed earlier). A function should also be possible. Optional second value is a face, defaulting to `out-of-place-glyph'.

That lets us map an interesting class of characters to a heuristic that determines whether they are out of place. So Stefan might have (and this could be the Emacs default)

(setq show-out-of-place-glyphs '((homoglyphs suspicious)))

but I would have

(setq show-out-of-place-glyphs '((nonascii always face1)))

which includes Stefan's setting.

SM> Also, I'm not sure if proportion compared to total text (or line) is
SM> a good metric to decide whether it's suspicious. I don't have much
SM> better to suggest, tho.

I based it on what I would find useful. I think the majority of people will want 'suspicious and let Emacs choose a default. So maybe the {majority,minority}-* options are superfluous.

On Wed, 24 Mar 2010 06:20:51 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

EZ> If we go for such a metric, it would need to be augmented by a
EZ> database of words where a small number of such characters is
EZ> ``normal'', not to be highlighted. This is for words like naïve.
EZ> Otherwise the feature will be an annoyance.

That's in the extended ASCII charset which would probably be included in the ASCII definition above, although it certainly has homoglyphs to upper-range Unicode (I revised the proposal to distinguish between highlighting homoglyphs and non-ASCII). I think regular English doesn't have many common words that would be outside the extended ASCII charset.

On Wed, 24 Mar 2010 13:14:13 +0800 Jason Rumney <jasonr@gnu.org> wrote:

JR> It's also dependent on which characters they are - Cyrillic, Han,
JR> Greek, Hebrew etc should be expected to appear in long runs, perhaps
JR> with runs of ASCII and/or other characters interleaved. Latin-1 on
JR> the other hand would normally appear individually or in very short
JR> runs mixed in with ASCII.

Agreed, and that can be fine-tuned.

JR> There is no single heuristic that can be used to identify "suspicious"
JR> characters.

So we'll provide several. I'd rather have something useful than try to make it perfect.

On Tue, 23 Mar 2010 19:09:18 -0700 "Drew Adams" <drew.adams@oracle.com> wrote:

>> What I'm saying is that there are two issues: non-ASCII chars in
>> general (which I personally don't want to display in any special
>> manner: they're just as normal as ASCII chars), and then there are
>> "chars that are out of place or that may not be what they look like",
>> such as the weird "K" in the other message's "OK" (which to me, is
>> similar to the NBSP char in that it is meant to be displayed in the
>> same way as some other char, so we want to call the attention of the
>> user to the difference).

I hope you'll go along with "homoglyphs" as I propose; I think that's what you mean :)

On Wed, 24 Mar 2010 14:00:47 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:

SJT> There were long threads on Python-dev about this with respect to the
SJT> PEPs implementing Unicode. The bottom line was basically that the
SJT> recommendations of the Unicode Security Considerations UTR #36 should
SJT> be followed with respect to "characters that may not be what they look
SJT> like".

This is relevant, thanks for the pointer. See

http://unicode.org/reports/tr36/

which links to:

http://www.unicode.org/reports/tr39/#Confusable_Detection

which can also be used to build a table of homoglyphs (as in http://homoglyphs.net).

Thanks!
Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
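[Editorial sketch] One way to read the alist proposal in the message above is that every entry is a list of the form (CLASS RULE [FACE]). The sketch below makes that assumption explicit; `my-out-of-place-glyphs` and `my-out-of-place-rule` are hypothetical names, and `face1` is just the placeholder face from the message.

```elisp
;; Hypothetical variable mirroring the proposed `show-out-of-place-glyphs'.
;; Assumption: each entry has the form (CLASS RULE [FACE]).
(defvar my-out-of-place-glyphs
  '((homoglyphs suspicious)
    (nonascii always face1))
  "Alist mapping a character class to a highlighting rule and optional face.")

(defun my-out-of-place-rule (class)
  "Return (RULE FACE) configured for CLASS, or nil if CLASS has no entry.
FACE falls back to the symbol `out-of-place-glyph' when none is given."
  (let ((entry (assq class my-out-of-place-glyphs)))
    (when entry
      (list (nth 1 entry)
            (or (nth 2 entry) 'out-of-place-glyph)))))
```

Looking entries up with `assq` keeps the symbol keys cheap to compare; a regex key, as floated in the message, would need a different lookup.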
* Re: highlighting non-ASCII characters 2010-03-24 10:05 ` Ted Zlatanov @ 2010-03-24 16:21 ` Lennart Borgman 2010-03-24 19:34 ` Lennart Borgman 2010-03-25 7:11 ` Juri Linkov 2010-03-25 7:12 ` Juri Linkov 1 sibling, 2 replies; 182+ messages in thread From: Lennart Borgman @ 2010-03-24 16:21 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2010/3/24 Ted Zlatanov <tzz@lifelogs.com>: > On Wed, 24 Mar 2010 14:00:47 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: > > SJT> There were long threads on Python-dev about this with respect to the > SJT> PEPs implementing Unicode. The bottom line was basically that the > SJT> recommendations of the Unicode Security Considerations UTR #36 should > SJT> be followed with respect to "characters that may not be what they look > SJT> like". > > This is relevant, thanks for the pointer. See > > http://unicode.org/reports/tr36/ > > which links to: > > http://www.unicode.org/reports/tr39/#Confusable_Detection > > which can also be used to build a table of homoglyphs (as in http://homoglyphs.net). Maybe "Recommended Identifier Profiles for IDN" should be implemented in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt) How about a bool vector (see make-bool-vector) for this? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 16:21 ` Lennart Borgman @ 2010-03-24 19:34 ` Lennart Borgman 2010-03-26 17:35 ` Ted Zlatanov 2010-03-25 7:11 ` Juri Linkov 1 sibling, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-24 19:34 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Wed, Mar 24, 2010 at 5:21 PM, Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> Maybe "Recommended Identifier Profiles for IDN" should be implemented
> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)
>
> How about a bool vector (see make-bool-vector) for this?

I wrote a little file for that part. (Too much trouble trying to report some difficult bugs I have not reported yet. I need to do some more relaxing thing like this... ;-)

The attached file sets up IDN chars as above. How about defining a character class [:idnchars:]?

[-- Attachment #2: idn.el --]
[-- Type: text/plain, Size: 3460 bytes --]

;;; idn.el --- Recommended Identifier Profiles for IDN
;;
;; Author: Lennart Borgman (lennart O borgman A gmail O com)
;; Created: 2010-03-24 Wed
;; Version: 0.1
;; Last-Updated: 2010-03-24 Wed
;; URL:
;; Keywords:
;; Compatibility:
;;
;; Features that might be required by this library:
;;
;;   None
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;; Functions for handling IDN chars defined by
;; `http://www.unicode.org/reports/tr39/'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change log:
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

(defvar uts39-datadir "c:/uts39/data/")

(defun idn-init (bv)
  "Set the entries of bool vector BV that idnchars.txt lists to t.
Return non-nil if the data file was found and read."
  (let* ((idnchars-file (expand-file-name "idnchars.txt" uts39-datadir))
         (idnchars-old (find-buffer-visiting idnchars-file))
         (idnchars-buf (or idnchars-old
                           (if (not (file-exists-p idnchars-file))
                               ;; `message' returns its string, so return
                               ;; nil explicitly when the file is missing.
                               (progn
                                 (message "Can't find file %S" idnchars-file)
                                 nil)
                             (find-file-noselect idnchars-file))))
         here
         ;; Only four-digit code points are matched, which fits the
         ;; 65536-entry vector created below.
         (range-patt (rx bol
                         (group (repeat 4 (any xdigit)))
                         (optional ".."
                                   (group (repeat 4 (any xdigit)))))))
    (when idnchars-buf
      (with-current-buffer idnchars-buf
        (setq here (point))
        (save-restriction
          (widen)
          (goto-char (point-min))
          (while (re-search-forward range-patt nil t)
            ;; Group 1 is the first code point, group 2 the optional
            ;; end of the range.
            (let* ((str-beg (match-string 1))
                   (str-end (match-string 2))
                   (beg (string-to-number str-beg 16))
                   (end (or (when str-end (string-to-number str-end 16))
                            beg)))
              (dotimes (ii (1+ (- end beg)))
                (aset bv (+ ii beg) t)))))
        (goto-char here))
      (unless idnchars-old (kill-buffer idnchars-buf))
      t)))

(defconst idn-chars
  ;; 65536 entries, so that indices 0 through #xFFFF are all valid.
  (let ((bv (make-bool-vector (* 256 256) nil)))
    (when (idn-init bv)
      ;; (string-to-number "002D" 16)
      ;; Make a quick sanity check:
      (unless (and (not (aref bv 44))
                   (aref bv 45))
        (message "idn-chars: Bad idn data in file idnchars.txt"))
      bv))
  "Boolean vector with recommended IDN chars.
See URL `http://www.unicode.org/reports/tr39/'.

Data is initialized from the file idnchars.txt in the directory
`uts39-datadir'.  This file is fetched from the above location.")

(provide 'idn)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; idn.el ends here

^ permalink raw reply [flat|nested] 182+ messages in thread
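[Editorial sketch] A bool vector like the one in the attachment above could be consulted as follows to find the first character of a string that falls outside the allowed set. This is a self-contained sketch that does not read idnchars.txt: `my-make-allowed-vector` and `my-first-disallowed-char` are hypothetical names, and the allowed ranges in the usage note are made up for illustration.

```elisp
;; Hypothetical helpers; they do not depend on idn.el or on idnchars.txt.

(defun my-make-allowed-vector (ranges)
  "Return a bool vector of 65536 entries with RANGES set to t.
RANGES is a list of (FIRST . LAST) code point pairs, both inclusive."
  (let ((bv (make-bool-vector 65536 nil)))
    (dolist (range ranges)
      (let ((c (car range)))
        (while (<= c (cdr range))
          (aset bv c t)
          (setq c (1+ c)))))
    bv))

(defun my-first-disallowed-char (string allowed)
  "Return the index of the first char in STRING not set in ALLOWED.
ALLOWED is a bool vector indexed by code point.  Characters beyond the
end of the vector count as disallowed.  Return nil if all are allowed."
  (let ((i 0)
        (len (length string))
        found)
    (while (and (not found) (< i len))
      (let ((c (aref string i)))
        (if (and (< c (length allowed)) (aref allowed c))
            (setq i (1+ i))
          (setq found i))))
    found))
```

For example, with `(my-make-allowed-vector '((?a . ?z) (?0 . ?9) (?- . ?-)))` as the allowed set, a host-name label containing a Cyrillic letter would be reported at the position of that letter.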
* Re: highlighting non-ASCII characters 2010-03-24 19:34 ` Lennart Borgman @ 2010-03-26 17:35 ` Ted Zlatanov 2010-03-26 20:43 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-26 17:35 UTC (permalink / raw) To: emacs-devel

On Wed, 24 Mar 2010 20:34:41 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:

LB> The attached file sets up IDN chars as above. How about defining a
LB> character class [:idnchars:]?

The IDN character class could be useful. The list changes so rarely that it can be hard-coded like the POSIX classes IMO. I think this would be done in src/regex.c by defining RECC_IDNCHARS for instance. This could highlight when non-IDN characters are used in a domain name.

But IDN characters are separate from the "confusables" (homoglyphs) we should discuss, which are much more problematic and more complex because they are not just a character class.

On Thu, 25 Mar 2010 09:11:35 +0200 Juri Linkov <juri@jurta.org> wrote:

JL> I think it would be more useful to implement this spec:
JL> http://www.unicode.org/reports/tr39/data/confusables.txt
JL> "Visually Confusable Characters: Provides a mapping for visual
JL> confusables for use in further restricting identifiers for security".

JL> It's very large, but it seems it's still incomplete. I can't find
JL> a "confusable" mapping for the problem I reported:
JL> BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN

We can have a [:confusable:] character class defined in src/regex.c. That lets us find these characters. It could be generated from the TXT database and augmented with our own mappings. But there's grouping information, so maybe that should be available too. For highlighting we don't need grouping information, but the user would find it useful to look at a glyph and find out that it looks like 3 other glyphs. So this can be in a Lisp-level data structure like a hashtable with list values.

I looked at whitespace.el and it looks generally suitable for this kind of highlighting. I can't decide if the work should augment whitespace.el or if it should be a new library called visible.el (because the name whitespace.el is so specific).

On Thu, 25 Mar 2010 15:07:04 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:

LB> To me it looks like IDN is the most important. Is not this a
LB> derivative work from "confusables"?

I think they are separate logically. TR39 cares about "confusables" in the context of IDN but Emacs has a wider view as a general text editor, IIUC.

Ted

^ permalink raw reply [flat|nested] 182+ messages in thread
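[Editorial sketch] The "hashtable with list values" mentioned in the message above might look like the following. The names `my-confusable-groups`, `my-register-confusables` and `my-confusables-of` are hypothetical, and the single group registered here (Latin K, Greek Kappa, Cyrillic Ka, Kelvin sign) is hand-picked for illustration rather than read from confusables.txt.

```elisp
;; Hypothetical grouping table: each character maps to the list of the
;; other characters it can be mistaken for.
(defvar my-confusable-groups (make-hash-table :test 'eql)
  "Hash table mapping a character to its look-alike characters.")

(defun my-register-confusables (chars)
  "Record that every character in the list CHARS looks like the others."
  (dolist (c chars)
    ;; Store the group minus the character itself.
    (puthash c (delq c (copy-sequence chars)) my-confusable-groups)))

(defun my-confusables-of (char)
  "Return the list of characters registered as look-alikes of CHAR."
  (gethash char my-confusable-groups))

;; Illustrative data only: Latin K, Greek capital Kappa (U+039A),
;; Cyrillic capital Ka (U+041A) and the Kelvin sign (U+212A).
(my-register-confusables (list ?K #x039A #x041A #x212A))
```

Keeping the group as a plain list makes it easy to show the user the other glyphs a character resembles, which is the use case described above.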
* Re: highlighting non-ASCII characters 2010-03-26 17:35 ` Ted Zlatanov @ 2010-03-26 20:43 ` Ted Zlatanov 2010-03-26 22:50 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Ted Zlatanov @ 2010-03-26 20:43 UTC (permalink / raw) To: emacs-devel On Fri, 26 Mar 2010 12:35:36 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: TZ> We can have a [:confusable:] character class defined in src/regex.c. TZ> That lets us find these characters. It could be generated from the TXT TZ> database and augmented with our own mappings. But there's grouping TZ> information, so maybe that should be available too. For highlighting we TZ> don't need grouping information, but the user would find it useful to TZ> look at a glyph and find out that it looks like 3 other glyphs. So this TZ> can be in a Lisp-level data structure like a hashtable with list values. I forgot to mention this RFC is relevant as well, section 2.2.6: http://www.ietf.org/rfc/rfc4690.txt Like the IDN character class, the discussion centers on homoglyphs inside domain names, but it mentions general relationship-based confusable detection and points to further RFCs. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-26 20:43 ` Ted Zlatanov @ 2010-03-26 22:50 ` Lennart Borgman 2010-03-29 18:38 ` Ted Zlatanov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-26 22:50 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2010/3/26 Ted Zlatanov <tzz@lifelogs.com>: > On Fri, 26 Mar 2010 12:35:36 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: > > TZ> We can have a [:confusable:] character class defined in src/regex.c. > TZ> That lets us find these characters. It could be generated from the TXT > TZ> database and augmented with our own mappings. But there's grouping > TZ> information, so maybe that should be available too. For highlighting we > TZ> don't need grouping information, but the user would find it useful to > TZ> look at a glyph and find out that it looks like 3 other glyphs. So this > TZ> can be in a Lisp-level data structure like a hashtable with list values. > > I forgot to mention this RFC is relevant as well, section 2.2.6: > > http://www.ietf.org/rfc/rfc4690.txt > > Like the IDN character class, the discussion centers on homoglyphs > inside domain names, but it mentions general relationship-based > confusable detection and points to further RFCs. Thanks, but what would the difference be to try confusable detection instead of chars outside IDN? I believe marking confusable chars would also mean marking those confusable chars that have been decided on in IDN. Perhaps they could also be handled in another way (for example helping switching/rotating the confusable chars, all or one-by-one). Would that be useful? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-26 22:50 ` Lennart Borgman @ 2010-03-29 18:38 ` Ted Zlatanov 2010-03-29 18:48 ` Drew Adams 2010-03-29 20:19 ` Stefan Monnier 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-29 18:38 UTC (permalink / raw) To: emacs-devel On Fri, 26 Mar 2010 23:50:26 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> Thanks, but what would the difference be to try confusable detection LB> instead of chars outside IDN? They are two different character sets. The IDN chars are fewer and have a specific purpose for IDN; confusable characters are any Unicode characters mapped to homoglyphs. LB> I believe marking confusable chars would also mean marking those LB> confusable chars that have been decided on in IDN. Yes, probably. But that's accidental. I still think the character classes [:idn:] (revised name from before) and [:confusable:] (or [:homoglyph:]) would make sense as a first step, then we can decide how to highlight them. LB> Perhaps they could also be handled in another way (for example helping LB> switching/rotating the confusable chars, all or one-by-one). Would LB> that be useful? I don't know what you mean by switching/rotating, sorry. LB> Still I think it could be a good idea to restrict identifier names to LB> chars within IDN. We're talking about generic text in Emacs, so I think IDN should be limited to places in Emacs where domain names show up (e.g. URLs). Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters 2010-03-29 18:38 ` Ted Zlatanov @ 2010-03-29 18:48 ` Drew Adams 2010-03-29 20:20 ` Stefan Monnier 2010-03-29 20:19 ` Stefan Monnier 1 sibling, 1 reply; 182+ messages in thread From: Drew Adams @ 2010-03-29 18:48 UTC (permalink / raw) To: 'Ted Zlatanov', emacs-devel I haven't been following this thread; sorry. But it occurred to me that besides different categories of such critters there might be different levels of fontification details that users might want to see. For example, for some users or for some purposes, it might be useful to see different kinds of quote marks distinguished (e.g. different kinds of curly quotes that might be homoglyphs or curly vs straight quotes, which are not homoglyphs). For other users or for other purposes such highlighting would be a distraction. Font-lock verbosity levels allow for this kind of thing. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-29 18:48 ` Drew Adams @ 2010-03-29 20:20 ` Stefan Monnier 0 siblings, 0 replies; 182+ messages in thread From: Stefan Monnier @ 2010-03-29 20:20 UTC (permalink / raw) To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel > Font-lock verbosity levels allow for this kind of thing. font-lock verbosity level must die! Stefan "because it's too coarse: just because you want fine-grained for some things shouldn't force you to suffer from the information overload of other things" ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-29 18:38 ` Ted Zlatanov 2010-03-29 18:48 ` Drew Adams @ 2010-03-29 20:19 ` Stefan Monnier 2010-03-29 20:51 ` Lennart Borgman 2010-03-29 21:05 ` Ted Zlatanov 1 sibling, 2 replies; 182+ messages in thread From: Stefan Monnier @ 2010-03-29 20:19 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > Yes, probably. But that's accidental. I still think the character > classes [:idn:] (revised name from before) and [:confusable:] (or > [:homoglyph:]) would make sense as a first step, then we can decide how > to highlight them. The homoglyph data would be a useful starting point for the feature I imagine, indeed. But from the message that started this thread, "K" is a homoglyph, yet highlighting it everywhere doesn't sound like a good idea, so basically we need to associate with each homoglyph char a context where it is expected and only highlight it when it appears in a different context (or maybe rather when it appears in the context of its peer). Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
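[Editorial sketch] The idea of flagging a character only when it appears in the context of its peers can be approximated by comparing scripts within a word. The sketch below uses Emacs's `char-script-table` and flags characters whose script differs from the most frequent script in the word. The function name `my-out-of-script-positions` is hypothetical, and this is a much cruder test than a real confusables lookup: it flags any minority-script character, look-alike or not.

```elisp
;; Hypothetical helper built on the standard `char-script-table'.
(defun my-out-of-script-positions (word)
  "Return the positions in WORD whose script is not the majority script.
The majority script is the one most characters of WORD belong to."
  (let (counts majority best result)
    ;; Tally how many characters belong to each script.
    (dotimes (i (length word))
      (let* ((script (aref char-script-table (aref word i)))
             (cell (assq script counts)))
        (if cell
            (setcdr cell (1+ (cdr cell)))
          (push (cons script 1) counts))))
    ;; Pick the script with the highest count.
    (setq best 0)
    (dolist (cell counts)
      (when (> (cdr cell) best)
        (setq best (cdr cell)
              majority (car cell))))
    ;; Collect the positions that do not use that script.
    (dotimes (i (length word))
      (unless (eq (aref char-script-table (aref word i)) majority)
        (push i result)))
    (nreverse result)))
```

A fuller version would first check that the odd character is registered as confusable with a character of the majority script, so that legitimately mixed text is left alone.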
* Re: highlighting non-ASCII characters 2010-03-29 20:19 ` Stefan Monnier @ 2010-03-29 20:51 ` Lennart Borgman 2010-03-30 13:22 ` Ted Zlatanov 2010-03-29 21:05 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-29 20:51 UTC (permalink / raw) To: Stefan Monnier; +Cc: Ted Zlatanov, emacs-devel

On Mon, Mar 29, 2010 at 10:19 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> Yes, probably. But that's accidental. I still think the character
>> classes [:idn:] (revised name from before) and [:confusable:] (or
>> [:homoglyph:]) would make sense as a first step, then we can decide how
>> to highlight them.
>
> The homoglyph data would be a useful starting point for the feature
> I imagine, indeed. But from the message that started this thread, "K"
> is a homoglyph, yet highlighting it everywhere doesn't sound like a good
> idea, so basically we need to associate with each homoglyph char
> a context where it is expected and only highlight it when it appears in
> a different context (or maybe rather when it appears in the context of
> its peer).

I agree (and I agree with most of what Ted said too).

However, just highlighting non-IDN chars seems useful enough. I think it should be done everywhere (because it is simple and probably does not hurt; IDN seems mostly useful for variables too, for example), or optionally only in strings (the only URL context we can actually guess).

For the moment I have implemented this as fontification. Having it as a char class that is flexibly initialized would be better. Perhaps my routines for reading the chars can be used there too.

The homoglyph context thing is maybe more difficult. I did not try to read carefully so I do not know much. I guess there is something like char value ranges to use? Does someone know which document those ranges can be read from (by some elisp code)?

^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-29 20:51 ` Lennart Borgman @ 2010-03-30 13:22 ` Ted Zlatanov 0 siblings, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-30 13:22 UTC (permalink / raw) To: emacs-devel On Mon, 29 Mar 2010 22:51:02 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: LB> However just hilighting non-IDN chars seems useful enough. I think it LB> should be done everywhere (because it is simple and probably does not LB> hurt, IDN seems mostly useful for variables to for examples), or LB> optionally only in strings (the only URL context we can actually LB> guess). LB> For the moment I have implemented this as fontification. Having it as LB> a char class that is flexibly initialized would be better. Perhaps my LB> routines for reading the chars can be used there too. Look at Categories in the ELisp manual (what Stefan referred to when he mentioned category-table). If you can implement your reader that way it would be great. It's much better than modifying regexp.c :) LB> The homoglyph context thing is maybe more difficult. I did not try to LB> read carefully so I do not know much. I guess there is something like LB> char value ranges to use, or? Someone knows which document that where LB> those ranges can be read (by some elisp code)? The confusables text file will give you all of them for the category table. But you also need to group them by homoglyph (probably with a hashtable), so I'd write a custom reader. If you don't get to it, I will eventually :) The two text files (IDN and confusables) would have to live inside Emacs somewhere and the reader will load them when it's loaded. LB> My impression is that IDN is a work in progress so it might be good LB> idea to read in the characters from a file if possible (and let the LB> user reread that file later if necessary). Re-reading the file is a really, really rare occurrence for the user so I would make it an internal function. 
You can always call it directly while developing, but end users will never need to. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
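The reader Ted describes (a category for the whole set plus a hash table for grouping) could look roughly like the following. This is only a sketch under stated assumptions: every `my-` name is hypothetical, the category letter `C` is arbitrary, the line format "source ; target ; type # comment" is assumed from the confusables.txt data file, and a fresh category table is used instead of the standard one so that nothing global is modified. The TRIM argument of `split-string` needs a reasonably recent Emacs.

```elisp
(require 'subr-x)                       ; for `string-trim' on older Emacsen

;; Hypothetical storage: confusable char -> list of target code points.
(defvar my-confusables-map (make-hash-table :test 'eql))

;; Hypothetical private category table, so the standard one is untouched.
(defvar my-confusables-category-table (make-category-table))

;; Define category ?C only once, so re-evaluating this file is harmless.
(unless (category-docstring ?C my-confusables-category-table)
  (define-category ?C "Confusable character (hypothetical category)"
    my-confusables-category-table))

(defun my-confusables-parse-line (line)
  "Return (SOURCE . TARGETS) for a data LINE, or nil for comments and blanks.
Assumes the format \"source ; target ; type # comment\" with hex code points."
  (let ((data (string-trim (car (split-string line "#")))))
    (unless (string= data "")
      (let ((fields (split-string data ";" nil "[ \t]+")))
        (cons (string-to-number (nth 0 fields) 16)
              (mapcar (lambda (hex) (string-to-number hex 16))
                      (split-string (nth 1 fields) " " t)))))))

(defun my-confusables-load (lines)
  "Record every entry in LINES in the hash table and in category ?C."
  (dolist (line lines)
    (let ((entry (my-confusables-parse-line line)))
      (when entry
        (puthash (car entry) (cdr entry) my-confusables-map)
        (modify-category-entry (car entry) ?C
                               my-confusables-category-table)))))
```

In a buffer that has called `(set-category-table my-confusables-category-table)`, the regexp `"\\cC"` would then match any loaded confusable, with no change to regexp.c.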
* Re: highlighting non-ASCII characters 2010-03-29 20:19 ` Stefan Monnier 2010-03-29 20:51 ` Lennart Borgman @ 2010-03-29 21:05 ` Ted Zlatanov 2010-03-29 21:31 ` Lennart Borgman ` (2 more replies) 1 sibling, 3 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-29 21:05 UTC (permalink / raw) To: emacs-devel On Mon, 29 Mar 2010 16:19:07 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> Yes, probably. But that's accidental. I still think the character >> classes [:idn:] (revised name from before) and [:confusable:] (or >> [:homoglyph:]) would make sense as a first step, then we can decide how >> to highlight them. SM> The homoglyph data would be a useful starting point for the feature SM> I imagine, indeed. But from the message that started this thread, "K" SM> is a homoglyph, yet highlighting it everywhere doesn't sound like a good SM> idea, so basically we need to associate with each homoglyph char SM> a context where it is expected and only highlight it when it appears in SM> a different context (or maybe rather when it appears in the context of SM> its peer). (I had a "lightbulb moment" I should have had long ago: "confusable" is a character property, while "homoglyph" is a glyph property; thus the character class should be [:confusable:] and "homoglyph" should be used in the face name as long as it's not, er, confusing.) I know the goal is to match in context and I may take whitespace.el as a guide in this regard, but I have to start with a [:confusable:] character class. I'll also add a [:idn:] class as discussed. Is that OK or are you concerned about code bloat in regexp.c? Afterwards we can set up the map between each confusable character and the set of characters it can match; this is also in the data file. That lets us look in context and apply the rules I proposed. So for example if Cyrillic K is confusable with Roman K and we see Roman characters around, that's suspicious. 
But Cyrillic "zhe" is not confusable with any Roman characters so it wouldn't be as suspicious. On Mon, 29 Mar 2010 11:48:28 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: DA> But it occurred to me that besides different categories of such critters there DA> might be different levels of fontification details that users might want to see. DA> For example, for some users or for some purposes, it might be useful to see DA> different kinds of quote marks distinguished (e.g. different kinds of curly DA> quotes that might be homoglyphs or curly vs straight quotes, which are not DA> homoglyphs). For other users or for other purposes such highlighting would be a DA> distraction. I'll set up a flexible mechanism, probably patterned after whitespace.el, to do this kind of highlighting. So the users will be able to extend it if needed. I don't know about curly vs. straight quotes. I don't think that's a significant problem, whereas a Cyrillic K in Roman text can actually cause problems and security compromises. I'm not against the idea, I have just never seen it become an issue, and there's a million ways to combine quotation marks depending on the context. What's the specific case that you're thinking of? Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
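The "in context" rule above can be approximated with the script information Emacs already keeps in `char-script-table`. The sketch below is a simplification, not the proposed design: it flags any word that mixes Latin with another script and does not consult the confusables data at all, so it would also flag a Cyrillic "zhe" inside a Latin word. The function names are hypothetical.

```elisp
(defun my-word-scripts (word)
  "Return the list of distinct scripts used by the characters of WORD."
  (let (scripts)
    (dolist (ch (string-to-list word))
      (let ((script (aref char-script-table ch)))
        ;; Skip characters with no script entry; keep each script once.
        (when (and script (not (memq script scripts)))
          (push script scripts))))
    (nreverse scripts)))

(defun my-suspicious-word-p (word)
  "Return non-nil if WORD mixes Latin characters with another script."
  (let ((scripts (my-word-scripts word)))
    (and (memq 'latin scripts)
         (> (length scripts) 1))))

;; "O\u041a" is Latin O followed by CYRILLIC CAPITAL LETTER KA.
(my-suspicious-word-p "O\u041a")
```

A fuller version would presumably combine this with the confusables map, so that only characters that actually have a Latin lookalike count as suspicious.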
* Re: highlighting non-ASCII characters 2010-03-29 21:05 ` Ted Zlatanov @ 2010-03-29 21:31 ` Lennart Borgman 2010-03-29 21:32 ` Drew Adams 2010-03-30 1:45 ` Stefan Monnier 2 siblings, 0 replies; 182+ messages in thread From: Lennart Borgman @ 2010-03-29 21:31 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel 2010/3/29 Ted Zlatanov <tzz@lifelogs.com>: > > I know the goal is to match in context and I may take whitespace.el as a > guide in this regard, but I have to start with a [:confusable:] > character class. I'll also add a [:idn:] class as discussed. Is that > OK or are you concerned about code bloat in regexp.c? My impression is that IDN is a work in progress so it might be good idea to read in the characters from a file if possible (and let the user reread that file later if necessary). ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters 2010-03-29 21:05 ` Ted Zlatanov 2010-03-29 21:31 ` Lennart Borgman @ 2010-03-29 21:32 ` Drew Adams 2010-03-30 13:15 ` Ted Zlatanov 2010-03-30 1:45 ` Stefan Monnier 2 siblings, 1 reply; 182+ messages in thread From: Drew Adams @ 2010-03-29 21:32 UTC (permalink / raw) To: 'Ted Zlatanov', emacs-devel > DA> But it occurred to me that besides different categories > DA> of such critters there might be different levels of > DA> fontification details that users might want to see. > > DA> For example, for some users or for some purposes, it > DA> might be useful to see different kinds of quote marks > DA> distinguished (e.g. different kinds of curly quotes > DA> that might be homoglyphs or curly vs straight quotes, > DA> which are not homoglyphs). For other users or for other > DA> purposes such highlighting would be a distraction. > > I'll set up a flexible mechanism, probably patterned after > whitespace.el, to do this kind of highlighting. So the users will be > able to extend it if needed. > > I don't know about curly vs. straight quotes. I don't think that's a > significant problem, whereas a Cyrillic K in Roman text can actually > cause problems and security compromises. I'm not against the idea, I > have just never seen it become an issue, and there's a million ways to > combine quotation marks depending on the context. What's the specific > case that you're thinking of? Nothing special. I know that in Framemaker we consistently replace all curly quotes with straight quotes, mainly so that copy+paste of doc examples will work in applications (e.g. SQL). The point was that besides multiple kinds of curly-quote characters that are true homoglyphs (dunno if such exist; I'm assuming they do), it can also be useful to highlight a character (e.g. curly quote) that is similar to but not identical to another character (e.g. straight quote). 
That use would be contextual, of course: you might turn it on the same way you might turn on an on-the-fly spell-check, to let you know whenever you type or paste the wrong char. Probably this need would be taken care of by what you described in your first paragraphs (not quoted here): being able to highlight similar looking chars that are not necessarily exact homoglyphs. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-29 21:32 ` Drew Adams @ 2010-03-30 13:15 ` Ted Zlatanov 2010-03-30 14:04 ` Drew Adams 2010-03-30 16:18 ` Juri Linkov 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-30 13:15 UTC (permalink / raw) To: emacs-devel On Mon, 29 Mar 2010 14:32:33 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: DA> The point was that besides multiple kinds of curly-quote characters that are DA> true homoglyphs (dunno if such exist; I'm assuming they do), There are definitely some funky quotes in Unicode so let's assume so. DA> it can also be useful to highlight a character (e.g. curly quote) DA> that is similar to but not identical to another character DA> (e.g. straight quote). OK, but can you say how it's useful in a specific example? In SQL, Perl, Java, Lisp, and TeX editing I would not need the *glyphs* highlighted because the mode would detect the mismatch, e.g. in Perl $result = `run command here[wrong backward quote here]; # comment here would highlight "comment here" as part of the command. IOW they are syntactically significant so a mismatch is not likely to go unnoticed anyway by the regular font-lock and the parser. In regular text it's legitimate to have any combination of quote marks so I don't see the benefit of looking for suspicious combinations. In domain names quote marks of any kind are suspicious :) Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters 2010-03-30 13:15 ` Ted Zlatanov @ 2010-03-30 14:04 ` Drew Adams 2010-03-30 14:17 ` Lennart Borgman 2010-03-30 14:42 ` Ted Zlatanov 2010-03-30 16:18 ` Juri Linkov 1 sibling, 2 replies; 182+ messages in thread From: Drew Adams @ 2010-03-30 14:04 UTC (permalink / raw) To: 'Ted Zlatanov', emacs-devel > DA> it can also be useful to highlight a character (e.g. curly quote) > DA> that is similar to but not identical to another character > DA> (e.g. straight quote). > > OK, but can you say how it's useful in a specific example? In SQL, > Perl, Java, Lisp, and TeX editing I would not need the *glyphs* > highlighted because the mode would detect the mismatch, e.g. in Perl > > $result = `run command here[wrong backward quote here]; # comment here > > would highlight "comment here" as part of the command. IOW they are > syntactically significant so a mismatch is not likely to go unnoticed > anyway by the regular font-lock and the parser. > > In regular text it's legitimate to have any combination of quote marks > so I don't see the benefit of looking for suspicious combinations. In > domain names quote marks of any kind are suspicious :) Perhaps you're assuming that the code will be used in Emacs, so you say that Emacs treats all such quotes similarly or highlights them anyway etc. (so no problem). Emacs might be used to write raw documentation (e.g. including code samples) that is used to generate HTML or PDF or... Readers of that doc might then copy and paste such examples into an app other than Emacs for execution - an app that does not treat all such quotes similarly. Just one hypothetical example, extrapolated from why we use straight quotes in our use of Framemaker. Beyond that, I would think that there might be a number of use cases where one might want to visually distinguish characters that are difficult to distinguish - either exact homoglyphs or approximate ones. That's all. 
Remember that it was only recently that Emacs itself started to treat a non-breaking space in Lisp code the same as a regular space. Nothing guarantees that an app DTRT with chars that look similar but are different. And there's the opposite potential problem: not distinguishing similar chars visually in the case where they do have different behaviors in some app. Suppose you prepare code (for example) in Emacs for use in some other context, and you want to be made aware when you use the wrong char, to avoid a problem downstream. Anyway, you get the point, I think. If you don't think there is a problem, I'm OK with that. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-30 14:04 ` Drew Adams @ 2010-03-30 14:17 ` Lennart Borgman 2010-03-30 14:42 ` Ted Zlatanov 1 sibling, 0 replies; 182+ messages in thread From: Lennart Borgman @ 2010-03-30 14:17 UTC (permalink / raw) To: Drew Adams; +Cc: Ted Zlatanov, emacs-devel On Tue, Mar 30, 2010 at 4:04 PM, Drew Adams <drew.adams@oracle.com> wrote: > > And there's the opposite potential problem: not distinguishing similar chars > visually in the case where they do have different behaviors in some app. Suppose > you prepare code (for example) in Emacs for use in some other context, and you > want to be made aware when you use the wrong char, to avoid a problem > downstream. Even though I understand IDN chars are limited to URL-like things I expect them to form the basis of attempts to solve also problems like this. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-30 14:04 ` Drew Adams 2010-03-30 14:17 ` Lennart Borgman @ 2010-03-30 14:42 ` Ted Zlatanov 1 sibling, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-30 14:42 UTC (permalink / raw) To: emacs-devel On Tue, 30 Mar 2010 07:04:05 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: DA> Perhaps you're assuming that the code will be used in Emacs, so you say that DA> Emacs treats all such quotes similarly or highlights them anyway etc. (so no DA> problem). DA> Emacs might be used to write raw documentation (e.g. including code samples) DA> that is used to generate HTML or PDF or... Readers of that doc might then copy DA> and paste such examples into an app other than Emacs for execution - an app that DA> does not treat all such quotes similarly. DA> Just one hypothetical example, extrapolated from why we use straight quotes in DA> our use of Framemaker. I see what you mean. It would require copy from a text mode (any programming modes would show the problem with mismatched quotes). I don't know if Emacs should try to prevent this problem; the effort spent is far larger than the potential benefit. DA> Beyond that, I would think that there might be a number of use cases where one DA> might want to visually distinguish characters that are difficult to distinguish DA> - either exact homoglyphs or approximate ones. That's all. So far we've discussed exact homoglyphs, but I think once we have the basic UCS confusables we can augment it with more characters we think are confusable. DA> And there's the opposite potential problem: not distinguishing similar chars DA> visually in the case where they do have different behaviors in some app. Suppose DA> you prepare code (for example) in Emacs for use in some other context, and you DA> want to be made aware when you use the wrong char, to avoid a problem DA> downstream. OK. DA> Anyway, you get the point, I think. If you don't think there is a DA> problem, I'm OK with that. 
I think it's sufficient to start with the idnchars.txt and confusables.txt databases (the latter parsed into an auxiliary hashtable as well, to find what homoglyph group a character belongs to). From what you have said, it seems useful to augment the confusables.txt database with "similar but not exact homoglyphs" eventually but it's not as important. I'll try to put something together soon, unless Lennart beats me to it, to at least read the databases and generate the category-tables. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-30 13:15 ` Ted Zlatanov 2010-03-30 14:04 ` Drew Adams @ 2010-03-30 16:18 ` Juri Linkov 1 sibling, 0 replies; 182+ messages in thread From: Juri Linkov @ 2010-03-30 16:18 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > DA> it can also be useful to highlight a character (e.g. curly quote) > DA> that is similar to but not identical to another character > DA> (e.g. straight quote). > > OK, but can you say how it's useful in a specific example? In SQL, > Perl, Java, Lisp, and TeX editing I would not need the *glyphs* > highlighted because the mode would detect the mismatch, e.g. in Perl > > $result = `run command here[wrong backward quote here]; # comment here > > would highlight "comment here" as part of the command. IOW they are > syntactically significant so a mismatch is not likely to go unnoticed > anyway by the regular font-lock and the parser. This is a significant problem. Often non-ASCII typographical characters (such as en-dash, em-dash used in the documentation for the long command line arguments, and different quotation marks used to quote command line arguments) copied from a Web page to Emacs look like ASCII characters. They can cause security compromises when pasted into the M-x shell command line unnoticed. In the best case, a shell command just signals an error, in the worst case it overwrites existing files. -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
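As a rough illustration of the shell case, text could be checked for typographical lookalikes before it is sent to a command line. The list of characters below is a small hand-picked assumption (dashes and curly quotes only), not data from confusables.txt, and both names are hypothetical.

```elisp
;; Hypothetical, deliberately short list of typographical lookalikes.
(defvar my-typographic-lookalikes
  '(?\u2013   ; EN DASH
    ?\u2014   ; EM DASH
    ?\u2018   ; LEFT SINGLE QUOTATION MARK
    ?\u2019   ; RIGHT SINGLE QUOTATION MARK
    ?\u201c   ; LEFT DOUBLE QUOTATION MARK
    ?\u201d)) ; RIGHT DOUBLE QUOTATION MARK

(defun my-find-typographic-chars (string)
  "Return a list of (INDEX . CHAR) for each lookalike character in STRING."
  (let ((index 0)
        found)
    (dolist (ch (string-to-list string))
      (when (memq ch my-typographic-lookalikes)
        (push (cons index ch) found))
      (setq index (1+ index)))
    (nreverse found)))

;; An en dash where an ASCII hyphen was intended:
(my-find-typographic-chars "rm \u2013rf")
```

A caller could refuse to send the text, or highlight the reported positions, whenever the result is non-nil.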
* Re: highlighting non-ASCII characters 2010-03-29 21:05 ` Ted Zlatanov 2010-03-29 21:31 ` Lennart Borgman 2010-03-29 21:32 ` Drew Adams @ 2010-03-30 1:45 ` Stefan Monnier 2 siblings, 0 replies; 182+ messages in thread From: Stefan Monnier @ 2010-03-30 1:45 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > I know the goal is to match in context and I may take whitespace.el as a > guide in this regard, but I have to start with a [:confusable:] > character class. I'll also add a [:idn:] class as discussed. Is that > OK or are you concerned about code bloat in regexp.c? It seems that character categories (see category-table and friends) are a better choice. Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread

* Re: highlighting non-ASCII characters 2010-03-24 16:21 ` Lennart Borgman 2010-03-24 19:34 ` Lennart Borgman @ 2010-03-25 7:11 ` Juri Linkov 2010-03-25 14:07 ` Lennart Borgman 1 sibling, 1 reply; 182+ messages in thread From: Juri Linkov @ 2010-03-25 7:11 UTC (permalink / raw) To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel > Maybe "Recommended Identifier Profiles for IDN" should be implemented > in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt) I think it would be more useful to implement this spec: http://www.unicode.org/reports/tr39/data/confusables.txt "Visually Confusable Characters: Provides a mapping for visual confusables for use in further restricting identifiers for security". It's very large, but it seems it's still incomplete. I can't find a "confusable" mapping for the problem I reported: BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-25 7:11 ` Juri Linkov @ 2010-03-25 14:07 ` Lennart Borgman 2010-03-25 17:32 ` Juri Linkov 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-25 14:07 UTC (permalink / raw) To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel On Thu, Mar 25, 2010 at 8:11 AM, Juri Linkov <juri@jurta.org> wrote: >> Maybe "Recommended Identifier Profiles for IDN" should be implemented >> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt) > > I think it would be more useful to implement this spec: > http://www.unicode.org/reports/tr39/data/confusables.txt > "Visually Confusable Characters: Provides a mapping for visual > confusables for use in further restricting identifiers for security". I have already implemented marking of non-ascii, non-IDN chars (it is in the nXhtml repository currently). To me it looks like IDN is the most important. Is not this a derivative work from "confusables"? Perhaps it is useful to implement "confusables" too. Do you have any idea of what to mark there then? (And what would actually differ from the IDN case?) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-25 14:07 ` Lennart Borgman @ 2010-03-25 17:32 ` Juri Linkov 2010-03-26 0:32 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Juri Linkov @ 2010-03-25 17:32 UTC (permalink / raw) To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel > To me it looks like IDN is the most important. Is not this a > derivative work from "confusables"? How IDN will help to highlight characters that look like other characters? > Perhaps it is useful to implement "confusables" too. Do you have any > idea of what to mark there then? You can mark all characters mentioned in confusables.txt. > (And what would actually differ from the IDN case?) What do you want to use IDN for? -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-25 17:32 ` Juri Linkov @ 2010-03-26 0:32 ` Lennart Borgman 2010-03-26 13:38 ` Stephen Berman 0 siblings, 1 reply; 182+ messages in thread From: Lennart Borgman @ 2010-03-26 0:32 UTC (permalink / raw) To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel On Thu, Mar 25, 2010 at 6:32 PM, Juri Linkov <juri@jurta.org> wrote: >> To me it looks like IDN is the most important. Is not this a >> derivative work from "confusables"? > > How IDN will help to highlight characters that look like other characters? All this is new to me so I might have misunderstood it. However it looks to me like the IDN (which I have supposed means IDentifier Name, but maybe that is wrong) chars are selected so that the characters chosen are not easily mistaken for other chars. And I suppose that as many characters as possible have been included in IDN. If my assumptions are right, would then the non-IDN chars not be the ones to mark? >> Perhaps it is useful to implement "confusables" too. Do you have any >> idea of what to mark there then? > > You can mark all characters mentioned in confusables.txt. Yes, I thought of that (if someone can explain why and when it may be useful). >> (And what would actually differ from the IDN case?) > > What do you want to use IDN for? I thought they were chosen to more safely create URLs, programming symbols etc. Is not that the purpose? ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-26 0:32 ` Lennart Borgman @ 2010-03-26 13:38 ` Stephen Berman 2010-03-26 22:44 ` Lennart Borgman 0 siblings, 1 reply; 182+ messages in thread From: Stephen Berman @ 2010-03-26 13:38 UTC (permalink / raw) To: emacs-devel On Fri, 26 Mar 2010 01:32:53 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: > On Thu, Mar 25, 2010 at 6:32 PM, Juri Linkov <juri@jurta.org> wrote: >>> To me it looks like IDN is the most important. Is not this a >>> derivative work from "confusables"? >> >> How IDN will help to highlight characters that look like other characters? > > > All this is new to me so I might have misunderstood it. However it > looks to me like the IDN (which I have supposed means IDentifier Name, > but maybe that is wrong) Internationalized Domain Names (IDN), see http://unicode.org/reports/tr36/#international_domain_names Steve Berman ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-26 13:38 ` Stephen Berman @ 2010-03-26 22:44 ` Lennart Borgman 0 siblings, 0 replies; 182+ messages in thread From: Lennart Borgman @ 2010-03-26 22:44 UTC (permalink / raw) To: Stephen Berman; +Cc: emacs-devel On Fri, Mar 26, 2010 at 2:38 PM, Stephen Berman <stephen.berman@gmx.net> wrote: >> >> All this is new to me so I might have misunderstood it. However it >> looks to me like the IDN (which I have supposed means IDentifier Name, >> but maybe that is wrong) > > Internationalized Domain Names (IDN), see > http://unicode.org/reports/tr36/#international_domain_names Oh, thanks. Seems like I have mixed my thoughts about what it can be used to with the currently official intended use. Still I think it could be a good idea to restrict identifier names to chars within IDN. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 10:05 ` Ted Zlatanov 2010-03-24 16:21 ` Lennart Borgman @ 2010-03-25 7:12 ` Juri Linkov 1 sibling, 0 replies; 182+ messages in thread From: Juri Linkov @ 2010-03-25 7:12 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > show-out-of-place-glyphs: alist; keys can be 'ascii, 'nonascii, > or 'homoglyphs. Maybe we can also allow a general regex. > > Values can be 'always, 'majority-paragraph, majority-line, > 'minority-line, 'minority-paragraph, 'suspicious (with the same rules I > proposed earlier). A function should also be possible. Optional second > value is a face, defaulting to `out-of-place-glyph'. If you are going to implement this package, I suggest you to look at whitespace.el: how it defines where to highlight whitespace character, how it defines rules for different classes of whitespace. I don't propose to extend it to visualize suspicious characters and homoglyphs in addition to whitespace, but really it has very similar functionality. -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
* RE: highlighting non-ASCII characters 2010-03-24 0:45 ` Stefan Monnier 2010-03-24 1:03 ` Ted Zlatanov @ 2010-03-24 2:09 ` Drew Adams 2010-03-24 5:00 ` Stephen J. Turnbull 2010-03-24 9:28 ` Juri Linkov 3 siblings, 0 replies; 182+ messages in thread From: Drew Adams @ 2010-03-24 2:09 UTC (permalink / raw) To: 'Stefan Monnier'; +Cc: 'Ted Zlatanov', emacs-devel > >> > In my case, I customize `escape-glyph' to be something very > >> > noticeable (orange background, with blue foreground and > >> > 1-pixel blue box). I want to see something like ^G really > >> > stand out. I don't want to see > >> > non-ascii chars handled the same way. > >> > >> Actually, in the original case (a weird non-ASCII K in the > >> middle of ASCII text), I would want it to stand out. > >> The problem there > >> is how to define "weird". > > > Whether you want non-ASCII chars to also stand out is a separate > > question. My point is that I don't want to see them handled > > _the same way_ as ^G et compagnie. > > What I'm saying is that there are two issues: non-ASCII chars > in general (which I personally don't want to display in any > special manner: > they're just as normal as ASCII chars), and then there are "chars that > are out of place or that may not be what they look like", such as the > weird "K" in the other message's "OK" (which to me, is similar to the > NBSP char in that it is meant to be displayed in the same way as some > other char, so we want to call the attention of the user to the > difference). > > > Having separate faces lets users get the behavior they want. If you > > then want to customize the two faces to look the same, no problem. > > I don't insist on using escape-glyph for those chars, indeed (I don't > really care which face is used for them). What I care about > is figuring out how to define programmatically "chars that are > out of place or that may not be what they look like". Sounds good to me. 
When you figure it out, you can have a way to highlight "weird" non-ASCII that's different from escape-glyph. And if someone also wants a way to highlight non-weird non-ASCII, that can be done too. ;-) ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 0:45 ` Stefan Monnier 2010-03-24 1:03 ` Ted Zlatanov 2010-03-24 2:09 ` Drew Adams @ 2010-03-24 5:00 ` Stephen J. Turnbull 2010-03-24 9:28 ` Juri Linkov 3 siblings, 0 replies; 182+ messages in thread From: Stephen J. Turnbull @ 2010-03-24 5:00 UTC (permalink / raw) To: Stefan Monnier; +Cc: 'Ted Zlatanov', Drew Adams, emacs-devel Stefan Monnier writes: > I don't insist on using escape-glyph for those chars, indeed (I don't > really care which face is used for them). What I care about is figuring > out how to define programmatically "chars that are out of place or that > may not be what they look like". There were long threads on Python-dev about this with respect to the PEPs implementing Unicode. The bottom line was basically that the recommendations of the Unicode Security Considerations UTR #36 should be followed with respect to "characters that may not be what they look like". "Out of place" is harder. ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 0:45 ` Stefan Monnier ` (2 preceding siblings ...) 2010-03-24 5:00 ` Stephen J. Turnbull @ 2010-03-24 9:28 ` Juri Linkov 2010-03-24 13:15 ` Ted Zlatanov 3 siblings, 1 reply; 182+ messages in thread From: Juri Linkov @ 2010-03-24 9:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: 'Ted Zlatanov', emacs-devel > What I'm saying is that there are two issues: non-ASCII chars in general > (which I personally don't want to display in any special manner: > they're just as normal as ASCII chars), and then there are "chars that > are out of place or that may not be what they look like", such as the > weird "K" in the other message's "OK" (which to me, is similar to the > NBSP char in that it is meant to be displayed in the same way as some > other char, so we want to call the attention of the user to the > difference). There is another case where non-highlighted out of place chars cause problems: http://thread.gmane.org/gmane.emacs.devel/121494 When I look at NBSP chars in cc-engine.el, I see them as `EQUALS SIGN' (actually `C-u C-x =' says they are `BOX DRAWINGS DOUBLE HORIZONTAL'). That's because of the wrong coding used (there is no coding: tag, so they are displayed with my default koi8). When Miles reported this problem with NBSP chars in cc-engine.el, I found nothing suspicious because equals-sign chars don't look out of place. And only when I grepped for non-ASCII chars, I discovered them. What should be a heuristics to highlight these problematic chars that look like normal chars? -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-24 9:28 ` Juri Linkov @ 2010-03-24 13:15 ` Ted Zlatanov 0 siblings, 0 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-24 13:15 UTC (permalink / raw) To: emacs-devel On Wed, 24 Mar 2010 11:28:16 +0200 Juri Linkov <juri@jurta.org> wrote: JL> There is another case where non-highlighted out of place chars JL> cause problems: http://thread.gmane.org/gmane.emacs.devel/121494 JL> When I look at NBSP chars in cc-engine.el, I see them as `EQUALS SIGN' JL> (actually `C-u C-x =' says they are `BOX DRAWINGS DOUBLE HORIZONTAL'). JL> That's because of the wrong coding used (there is no coding: tag, JL> so they are displayed with my default koi8). When Miles reported this JL> problem with NBSP chars in cc-engine.el, I found nothing suspicious JL> because equals-sign chars don't look out of place. And only when JL> I grepped for non-ASCII chars, I discovered them. JL> What should be a heuristics to highlight these problematic chars JL> that look like normal chars? If you look at my latest proposal for showing out-of-place characters, '(homoglyphs suspicious) will probably work for this case as well. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: highlighting non-ASCII characters 2010-03-23 14:22 ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov 2010-03-23 16:50 ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams @ 2010-03-24 9:27 ` Juri Linkov 1 sibling, 0 replies; 182+ messages in thread From: Juri Linkov @ 2010-03-24 9:27 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > I can set up font-lock for myself, of course, but was wondering > if Emacs has a simple built-in way to highlight them. This is what `hi-lock.el' is intended for. You can put (highlight-regexp "[[:nonascii:]]" 'escape-glyph) to any hook you want with any face you want. Or to add file-local patterns like Hi-lock:(("[[:nonascii:]]" (0 (quote escape-glyph) t))). -- Juri Linkov http://www.jurta.org/emacs/ ^ permalink raw reply [flat|nested] 182+ messages in thread
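Putting the `highlight-regexp` call on a hook, as suggested above, might look like this. The function name is hypothetical and `text-mode-hook` is just one possible choice of hook; the regexp and face are the ones from the message.

```elisp
(require 'hi-lock)

(defun my-highlight-nonascii ()
  "Highlight every non-ASCII character in the current buffer."
  (highlight-regexp "[[:nonascii:]]" 'escape-glyph))

;; Run it in every buffer that enters text-mode or a mode derived from it.
(add-hook 'text-mode-hook #'my-highlight-nonascii)
```

Swapping `'escape-glyph` for another face keeps this highlighting visually distinct from control characters such as ^G, which is the separation Drew asks for elsewhere in the thread.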
* Re: Translation of http status code to text 2010-03-22 17:06 ` Ted Zlatanov 2010-03-22 17:55 ` Sven Joachim @ 2010-03-22 18:41 ` Stefan Monnier 2010-03-22 19:15 ` Ted Zlatanov 1 sibling, 1 reply; 182+ messages in thread From: Stefan Monnier @ 2010-03-22 18:41 UTC (permalink / raw) To: Ted Zlatanov; +Cc: emacs-devel > Here's one version of the HTTP codes as an alist. I was thinking of > also generating defconst calls based on this list, that's why I named > everything "url-http-code-*". But maybe that's not necessary and > accessor functions will be enough, so then we can s/url-http-code-// The symbol part depends on the rest of the patch: what do you use the symbol for? Do you even need it? Stefan ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text 2010-03-22 18:41 ` Translation of http status code to text Stefan Monnier @ 2010-03-22 19:15 ` Ted Zlatanov 2010-03-23 9:54 ` Juri Linkov 2010-03-23 12:57 ` Stefan Monnier 0 siblings, 2 replies; 182+ messages in thread From: Ted Zlatanov @ 2010-03-22 19:15 UTC (permalink / raw) To: emacs-devel On Mon, 22 Mar 2010 14:41:46 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> Here's one version of the HTTP codes as an alist. I was thinking of >> also generating defconst calls based on this list, that's why I named >> everything "url-http-code-*". But maybe that's not necessary and >> accessor functions will be enough, so then we can s/url-http-code-// SM> The symbol part depends on the rest of the patch: what do you use the SM> symbol for? Do you even need it? I was going to change url-http.el like so, in url-http-parse-headers: (case url-http-response-status ((url-http-code-multiple-choices) ; was 300, uses defconst ... nil) ...) and I thought generating the defconsts would be best without dynamically generating the symbol name as "url-http-code-%s" from the symbol in the table. But instead I can say (url-http-code-number 'multiple-choices). It's really a matter of taste. Ted ^ permalink raw reply [flat|nested] 182+ messages in thread
* Re: Translation of http status code to text
From: Juri Linkov @ 2010-03-23 9:54 UTC
To: Ted Zlatanov; +Cc: emacs-devel

> SM> The symbol part depends on the rest of the patch: what do you use the
> SM> symbol for? Do you even need it?
>
> I was going to change url-http.el like so, in url-http-parse-headers:
>
> (case url-http-response-status
>   ((url-http-code-multiple-choices) ; was 300, uses defconst
>    ...
>    nil)
>   ...)
>
> and I thought generating the defconsts would be best without dynamically
> generating the symbol name as "url-http-code-%s" from the symbol in the
> table.
>
> But instead I can say (url-http-code-number 'multiple-choices). It's
> really a matter of taste.

Please don't add symbols. Developers prefer to refer to numbers.

The "number to text" mapping is necessary only to display codes
in a readable form to users. Please re-read Lennart's original
request:

  Is there a list in Emacs for translating http status code to readable
  text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

--
Juri Linkov
http://www.jurta.org/emacs/
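The number-to-text lookup asked for at the start of the thread can be sketched as follows. This is only an illustration, assuming an alist in the (CODE SYMBOL TEXT) shape discussed in this thread; the names `my-http-codes` and `my-http-code-text` are hypothetical and the table is abbreviated to three entries, so it is not the definition that ended up in url-http.el.

```elisp
;; Hypothetical, abbreviated table in the (CODE SYMBOL TEXT) shape
;; discussed in this thread; not the actual url-http.el definition.
(defconst my-http-codes
  '((200 OK "OK")
    (206 partial-content "Partial content")
    (404 not-found "Not found"))
  "Sample mapping of HTTP status numbers to symbols and text.")

(defun my-http-code-text (code)
  "Return the readable text for HTTP status CODE, or nil if unknown."
  ;; `assq' finds the entry whose car is CODE; the text is the third element.
  (nth 2 (assq code my-http-codes)))

(my-http-code-text 404)   ; → "Not found"
(my-http-code-text 999)   ; → nil
```

With a helper like this, callers that only want to display a status can keep working with the numeric code and never touch the symbol column.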
* Re: Translation of http status code to text
From: joakim @ 2010-03-23 10:54 UTC
To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel

Juri Linkov <juri@jurta.org> writes:

>> SM> The symbol part depends on the rest of the patch: what do you use the
>> SM> symbol for? Do you even need it?
>>
>> I was going to change url-http.el like so, in url-http-parse-headers:
>>
>> (case url-http-response-status
>>   ((url-http-code-multiple-choices) ; was 300, uses defconst
>>    ...
>>    nil)
>>   ...)
>>
>> and I thought generating the defconsts would be best without dynamically
>> generating the symbol name as "url-http-code-%s" from the symbol in the
>> table.
>>
>> But instead I can say (url-http-code-number 'multiple-choices). It's
>> really a matter of taste.
>
> Please don't add symbols. Developers prefer to refer to numbers.
>
> The "number to text" mapping is necessary only to display codes
> in a readable form to users. Please re-read Lennart's original
> request:
>
>   Is there a list in Emacs for translating http status code to readable
>   text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

As a developer I'd like both numbers and symbols. Maybe that's just me. I
think this is a bit particular to the HTTP protocol, since some codes
are extremely well known, whereas others are not.

--
Joakim Verona
* Re: Translation of http status code to text
From: Ted Zlatanov @ 2010-03-23 15:02 UTC
To: emacs-devel

On Tue, 23 Mar 2010 11:54:18 +0200 Juri Linkov <juri@jurta.org> wrote:

JL> Please don't add symbols. Developers prefer to refer to numbers.

I considered that but it really seemed useful to have symbolic codes as
I was working through the list of text explanations.

On Tue, 23 Mar 2010 11:54:40 +0100 joakim@verona.se wrote:

j> As a developer I'd like both numbers and symbols. Maybe thats just me. I
j> think this is a bit particular for the http protocol, since some codes
j> are extremely well known, whereas others are not.

Ditto.

On Tue, 23 Mar 2010 08:57:15 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

SM> I see, then it's OK to add symbols, I guess, but in this case use
SM> shorter ones, so you can do:

SM> (case (cadr (assq status-number url-http-codes))
SM>   (OK ...)
SM>   (moved-permanently ...)
SM>   (proxy-authentication-required ...)
SM>   (accepted ...)
SM>   ...)

Yeah, that was the other option. It works too.

SM> And in that case, a better option is to create the symbols
SM> programmatically from the error string:

SM> (defconst url-http-codes
SM>   (mapcar (lambda (x)
SM>             (let ((s (subst-char-in-string ?\s ?- (cadr x))))
SM>               (when (string-match "-(.*)" s)
SM>                 (setq s (substring s 0 (match-beginning 0))))
SM>               (list (car x) (intern (downcase s)) (cadr x))))
SM>           '((100 "Continue with request")
SM>             ...)))

From the perspective of the programmer it's better to see the symbols
(grep won't work with this approach). Also "OK" looks weird as the "ok"
status code, and "time-out" in the error code is odd IMHO. These are
matters of taste but I hope you'll allow that the manually generated
table is a little better. It only has to be done once, after all.

Attached is another attempt which uses the symbols in the code and
doesn't dynamically generate the symbols.

Ted

[-- Attachment #2: url-http-codes.patch --]
[-- Type: text/x-diff, Size: 11201 bytes --]

=== modified file 'lisp/url/url-http.el'
--- lisp/url/url-http.el 2010-01-13 08:35:10 +0000
+++ lisp/url/url-http.el 2010-03-23 14:54:14 +0000
@@ -64,6 +64,55 @@
 nil will explicitly close the connection to the server after every
 request.")
 
+(defconst url-http-codes
+  '((100 continue "Continue with request")
+    (101 switching-protocols "Switching protocols")
+    (102 processing "Processing (Added by DAV)")
+    (200 OK "OK")
+    (201 created "Created")
+    (202 accepted "Accepted")
+    (203 non-authoritative "Non-authoritative information")
+    (204 no-content "No content")
+    (205 reset-content "Reset content")
+    (206 partial-content "Partial content")
+    (207 multi-status "Multi-status (Added by DAV)")
+    (300 multiple-choices "Multiple choices")
+    (301 moved-permanently "Moved permanently")
+    (302 found "Found")
+    (303 see-other "See other")
+    (304 not-modified "Not modified")
+    (305 use-proxy "Use proxy")
+    (307 temporary-redirect "Temporary redirect")
+    (400 bad-request "Bad Request")
+    (401 unauthorized "Unauthorized")
+    (402 payment-required "Payment required")
+    (403 forbidden "Forbidden")
+    (404 not-found "Not found")
+    (405 method-not-allowed "Method not allowed")
+    (406 not-acceptable "Not acceptable")
+    (407 proxy-authentication-required "Proxy authentication required")
+    (408 request-timeout "Request time-out")
+    (409 conflict "Conflict")
+    (410 gone "Gone")
+    (411 length-required "Length required")
+    (412 precondition-failed "Precondition failed")
+    (413 request-entity-too-large "Request entity too large")
+    (414 request-uri-too-large "Request-URI too large")
+    (415 unsupported-media-type "Unsupported media type")
+    (416 requested-range-not-satisfiable "Requested range not satisfiable")
+    (417 expectation-failed "Expectation failed")
+    (422 unprocessable-entity "Unprocessable Entity (Added by DAV)")
+    (423 locked "Locked")
+    (424 failed-Dependency "Failed Dependency")
+    (500 internal-server-error "Internal server error")
+    (501 not-implemented "Not implemented")
+    (502 bad-gateway "Bad gateway")
+    (503 service-unavailable "Service unavailable")
+    (504 gateway-timeout "Gateway time-out")
+    (505 http-version-not-supported "HTTP version not supported")
+    (507 insufficient-storage "Insufficient storage"))
+  "The HTTP return codes and their text.")
+
 ;(eval-when-compile
 ;; These are all macros so that they are hidden from external sight
 ;; when the file is byte-compiled.
@@ -435,7 +484,9 @@
         (delete-process url-http-process)))))
   (let ((buffer (current-buffer))
         (class nil)
-        (success nil))
+        (success nil)
+        ;; other status symbols: jewelry and luxury cars
+        (status-symbol (cadr (assq status-number url-http-codes))))
     (setq class (/ url-http-response-status 100))
     (url-http-debug "Parsed HTTP headers: class=%d status=%d" class url-http-response-status)
     (url-http-handle-cookies)
@@ -463,8 +514,8 @@
        ;; 205 Reset content
        ;; 206 Partial content
        ;; 207 Multi-status (Added by DAV)
-       (case url-http-response-status
-         ((204 205)
+       (case status-symbol
+         ((no-content reset-content)
           ;; No new data, just stay at the same document
           (url-mark-buffer-as-dead buffer)
           (setq success t))
@@ -485,8 +536,8 @@
        ;; 307 Temporary redirect
        (let ((redirect-uri (or (mail-fetch-field "Location")
                                (mail-fetch-field "URI"))))
-         (case url-http-response-status
-           (300
+         (case status-symbol
+           (multiple-choices ; 300
             ;; Quoth the spec (section 10.3.1)
             ;; -------------------------------
             ;; The requested resource corresponds to any one of a set of
@@ -503,7 +554,7 @@
             ;; We do not support agent-driven negotiation, so we just
             ;; redirect to the preferred URI if one is provided.
             nil)
-           ((301 302 307)
+           ((moved-permanently found temporary-redirect) ; 301 302 307
             ;; If the 301|302 status code is received in response to a
             ;; request other than GET or HEAD, the user agent MUST NOT
             ;; automatically redirect the request unless it can be
@@ -519,20 +570,20 @@
                url-http-method url-http-response-status)
               (setq url-http-method "GET"
                     url-http-data nil)))
-           (303
+           (see-other ; 303
             ;; The response to the request can be found under a different
             ;; URI and SHOULD be retrieved using a GET method on that
             ;; resource.
             (setq url-http-method "GET"
                   url-http-data nil))
-           (304
+           (not-modified ; 304
             ;; The 304 response MUST NOT contain a message-body.
             (url-http-debug "Extracting document from cache... (%s)"
                             (url-cache-create-filename (url-view-url t)))
             (url-cache-extract (url-cache-create-filename (url-view-url t)))
             (setq redirect-uri nil
                   success t))
-           (305
+           (use-proxy ; 305
             ;; The requested resource MUST be accessed through the
             ;; proxy given by the Location field. The Location field
             ;; gives the URI of the proxy. The recipient is expected
@@ -620,51 +671,51 @@
        ;; 422 Unprocessable Entity (Added by DAV)
        ;; 423 Locked
        ;; 424 Failed Dependency
-       (case url-http-response-status
-         (401
+       (case status-symbol
+         (unauthorized ; 401
           ;; The request requires user authentication. The response
           ;; MUST include a WWW-Authenticate header field containing a
           ;; challenge applicable to the requested resource. The
           ;; client MAY repeat the request with a suitable
           ;; Authorization header field.
           (url-http-handle-authentication nil))
-         (402
+         (payment-required ; 402
           ;; This code is reserved for future use
           (url-mark-buffer-as-dead buffer)
           (error "Somebody wants you to give them money"))
-         (403
+         (forbidden ; 403
           ;; The server understood the request, but is refusing to
           ;; fulfill it. Authorization will not help and the request
           ;; SHOULD NOT be repeated.
           (setq success t))
-         (404
+         (not-found ; 404
           ;; Not found
           (setq success t))
-         (405
+         (method-not-allowed ; 405
           ;; The method specified in the Request-Line is not allowed
           ;; for the resource identified by the Request-URI. The
           ;; response MUST include an Allow header containing a list of
           ;; valid methods for the requested resource.
           (setq success t))
-         (406
+         (not-acceptable ; 406
           ;; The resource identified by the request is only capable of
           ;; generating response entities which have content
           ;; characteristics nota cceptable according to the accept
           ;; headers sent in the request.
           (setq success t))
-         (407
+         (proxy-authentication-required ; 407
           ;; This code is similar to 401 (Unauthorized), but indicates
           ;; that the client must first authenticate itself with the
           ;; proxy. The proxy MUST return a Proxy-Authenticate header
           ;; field containing a challenge applicable to the proxy for
           ;; the requested resource.
           (url-http-handle-authentication t))
-         (408
+         (request-timeout ; 408
           ;; The client did not produce a request within the time that
           ;; the server was prepared to wait. The client MAY repeat
           ;; the request without modifications at any later time.
           (setq success t))
-         (409
+         (conflict ; 409
           ;; The request could not be completed due to a conflict with
           ;; the current state of the resource. This code is only
           ;; allowed in situations where it is expected that the user
@@ -673,11 +724,11 @@
           ;; information for the user to recognize the source of the
           ;; conflict.
           (setq success t))
-         (410
+         (gone ; 410
           ;; The requested resource is no longer available at the
           ;; server and no forwarding address is known.
           (setq success t))
-         (411
+         (length-required ; 411
           ;; The server refuses to accept the request without a defined
           ;; Content-Length. The client MAY repeat the request if it
           ;; adds a valid Content-Length header field containing the
@@ -687,29 +738,29 @@
           ;; `url-http-create-request' automatically calculates the
           ;; content-length.
           (setq success t))
-         (412
+         (precondition-failed ; 412
           ;; The precondition given in one or more of the
           ;; request-header fields evaluated to false when it was
           ;; tested on the server.
           (setq success t))
-         ((413 414)
+         ((request-entity-too-large request-uri-too-large) ; 413 414
           ;; The server is refusing to process a request because the
           ;; request entity|URI is larger than the server is willing or
           ;; able to process.
           (setq success t))
-         (415
+         (unsupported-media-type ; 415
           ;; The server is refusing to service the request because the
           ;; entity of the request is in a format not supported by the
           ;; requested resource for the requested method.
           (setq success t))
-         (416
+         (requested-range-not-satisfiable ; 416
           ;; A server SHOULD return a response with this status code if
           ;; a request included a Range request-header field, and none
           ;; of the range-specifier values in this field overlap the
           ;; current extent of the selected resource, and the request
           ;; did not include an If-Range request-header field.
           (setq success t))
-         (417
+         (expectation-failed ; 417
           ;; The expectation given in an Expect request-header field
           ;; could not be met by this server, or, if the server is a
           ;; proxy, the server has unambiguous evidence that the
@@ -736,16 +787,16 @@
        ;; 507 Insufficient storage
        (setq success t)
        (case url-http-response-status
-         (501
+         (not-implemented ; 501
           ;; The server does not support the functionality required to
           ;; fulfill the request.
           nil)
-         (502
+         (bad-gateway ; 502
           ;; The server, while acting as a gateway or proxy, received
           ;; an invalid response from the upstream server it accessed
           ;; in attempting to fulfill the request.
           nil)
-         (503
+         (service-unavailable ; 503
           ;; The server is currently unable to handle the request due
           ;; to a temporary overloading or maintenance of the server.
           ;; The implication is that this is a temporary condition
@@ -754,19 +805,19 @@
           ;; header. If no Retry-After is given, the client SHOULD
           ;; handle the response as it would for a 500 response.
           nil)
-         (504
+         (gateway-timeout ; 504
           ;; The server, while acting as a gateway or proxy, did not
           ;; receive a timely response from the upstream server
           ;; specified by the URI (e.g. HTTP, FTP, LDAP) or some other
           ;; auxiliary server (e.g. DNS) it needed to access in
           ;; attempting to complete the request.
           nil)
-         (505
+         (http-version-not-supported ; 505
           ;; The server does not support, or refuses to support, the
           ;; HTTP protocol version that was used in the request
           ;; message.
           nil)
-         (507 ; DAV
+         (insufficient-storage ; 507 (DAV)
           ;; The method could not be performed on the resource
           ;; because the server is unable to store the representation
           ;; needed to successfully complete the request. This
* Re: Translation of http status code to text
From: Stefan Monnier @ 2010-03-24 3:22 UTC
To: Ted Zlatanov; +Cc: emacs-devel

> From the perspective of the programmer it's better to see the symbols
> (grep won't work with this approach). Also "OK" looks weird as the "ok"
> status code, and "time-out" in the error code is odd IMHO. These are
> matters of taste but I hope you'll allow that the manually generated
> table is a little better. It only has to be done once, after all.

ok,

        Stefan
* Re: Translation of http status code to text
From: Glenn Morris @ 2010-03-24 17:35 UTC
To: Ted Zlatanov; +Cc: emacs-devel

+ ;; other status symbols: jewelry and luxury cars
+ (status-symbol (cadr (assq status-number url-http-codes))))

`status-number' doesn't seem to be defined anywhere; should it be
url-http-response-status?
* Re: Translation of http status code to text
From: Ted Zlatanov @ 2010-03-24 19:37 UTC
To: emacs-devel

On Wed, 24 Mar 2010 13:35:10 -0400 Glenn Morris <rgm@gnu.org> wrote:

GM> + ;; other status symbols: jewelry and luxury cars
GM> + (status-symbol (cadr (assq status-number url-http-codes))))

GM> `status-number' doesn't seem to be defined anywhere; should it be
GM> url-http-response-status ?

Sorry about that, I had the right version but screwed up the commit. I
can't commit currently but maybe you could? The (trivial) patch is below.

Thanks
Ted

[-- Attachment #2: url-http-status.patch --]
[-- Type: text/x-diff, Size: 1082 bytes --]

=== modified file 'lisp/url/ChangeLog'
--- lisp/url/ChangeLog 2010-03-24 09:30:28 +0000
+++ lisp/url/ChangeLog 2010-03-24 19:29:55 +0000
@@ -1,5 +1,9 @@
 2010-03-24 Teodor Zlatanov <tzz@lifelogs.com>
 
+ * url-http.el (url-http-parse-headers): Fix wrong variable name.
+
+2010-03-24 Teodor Zlatanov <tzz@lifelogs.com>
+
  * url-http.el (url-http-codes): New variable to hold a mapping of
  HTTP status codes' numbers, their symbolic name, and their text.
  (url-http-parse-headers): Use it, leaving the original numeric

=== modified file 'lisp/url/url-http.el'
--- lisp/url/url-http.el 2010-03-24 09:30:28 +0000
+++ lisp/url/url-http.el 2010-03-24 19:29:08 +0000
@@ -486,7 +486,7 @@
         (class nil)
         (success nil)
         ;; other status symbols: jewelry and luxury cars
-        (status-symbol (cadr (assq status-number url-http-codes))))
+        (status-symbol (cadr (assq url-http-response-status url-http-codes))))
     (setq class (/ url-http-response-status 100))
     (url-http-debug "Parsed HTTP headers: class=%d status=%d" class url-http-response-status)
     (url-http-handle-cookies)
* Re: Translation of http status code to text
From: Ted Zlatanov @ 2010-03-25 1:16 UTC
To: emacs-devel

On Wed, 24 Mar 2010 14:37:24 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:

TZ> On Wed, 24 Mar 2010 13:35:10 -0400 Glenn Morris <rgm@gnu.org> wrote:

GM> + ;; other status symbols: jewelry and luxury cars
GM> + (status-symbol (cadr (assq status-number url-http-codes))))

GM> `status-number' doesn't seem to be defined anywhere; should it be
GM> url-http-response-status ?

TZ> Sorry about that, I had the right version but screwed up the commit. I
TZ> can't commit currently but maybe you could? The (trivial) patch is below.

It's fixed.

Ted
* Re: Translation of http status code to text
From: Stefan Monnier @ 2010-03-23 12:57 UTC
To: Ted Zlatanov; +Cc: emacs-devel

>>> Here's one version of the HTTP codes as an alist. I was thinking of
>>> also generating defconst calls based on this list, that's why I named
>>> everything "url-http-code-*". But maybe that's not necessary and
>>> accessor functions will be enough, so then we can s/url-http-code-//

SM> The symbol part depends on the rest of the patch: what do you use the
SM> symbol for? Do you even need it?

> I was going to change url-http.el like so, in url-http-parse-headers:

> (case url-http-response-status
>   ((url-http-code-multiple-choices) ; was 300, uses defconst
>    ...
>    nil)
>   ...)

I see, then it's OK to add symbols, I guess, but in this case use
shorter ones, so you can do:

(case (cadr (assq status-number url-http-codes))
  (OK ...)
  (moved-permanently ...)
  (proxy-authentication-required ...)
  (accepted ...)
  ...)

And in that case, a better option is to create the symbols
programmatically from the error string:

(defconst url-http-codes
  (mapcar (lambda (x)
            (let ((s (subst-char-in-string ?\s ?- (cadr x))))
              (when (string-match "-(.*)" s)
                (setq s (substring s 0 (match-beginning 0))))
              (list (car x) (intern (downcase s)) (cadr x))))
          '((100 "Continue with request")
            ...)))

-- Stefan
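To see what the programmatic approach produces, here is Stefan's expression applied to a three-entry sample table. This is a sketch only: the variable name `my-generated-http-codes` is hypothetical and the input list is abbreviated, so it is not the table from the patch.

```elisp
;; Hypothetical variable name; the transformation is the one Stefan
;; posted, applied to an abbreviated sample of (CODE TEXT) pairs.
(defconst my-generated-http-codes
  (mapcar (lambda (x)
            ;; Replace spaces with dashes in the text.
            (let ((s (subst-char-in-string ?\s ?- (cadr x))))
              ;; Drop a trailing parenthetical such as "-(Added-by-DAV)".
              ;; Unescaped parentheses are literal in Emacs regexps.
              (when (string-match "-(.*)" s)
                (setq s (substring s 0 (match-beginning 0))))
              (list (car x) (intern (downcase s)) (cadr x))))
          '((100 "Continue with request")
            (207 "Multi-status (Added by DAV)")
            (404 "Not found"))))

;; my-generated-http-codes is now:
;; ((100 continue-with-request "Continue with request")
;;  (207 multi-status "Multi-status (Added by DAV)")
;;  (404 not-found "Not found"))
```

Note that 100 comes out as `continue-with-request` rather than the shorter `continue` used in the hand-written table, so the generated names follow the text exactly and cannot be found by grepping the source for the symbol.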