Translation of http status code to text


* Translation of http status code to text
@ 2010-03-18 19:11 Lennart Borgman
  2010-03-22  1:19 ` Juri Linkov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-18 19:11 UTC (permalink / raw)
  To: Emacs-Devel devel

Is there a list in Emacs for translating http status code to readable
text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

If not, could we please add one that can be used from the different elisp
files that need this?


* Re: Translation of http status code to text
  2010-03-18 19:11 Translation of http status code to text Lennart Borgman
@ 2010-03-22  1:19 ` Juri Linkov
  2010-03-22 13:17   ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Juri Linkov @ 2010-03-22  1:19 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emacs-Devel devel

> Is there a list in Emacs for translating http status code to readable
> text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?
>
> If not, could we please add one that can be used from the different elisp
> files that need this?

I suppose this should be in lisp/url/url-http.el.

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: Translation of http status code to text
  2010-03-22  1:19 ` Juri Linkov
@ 2010-03-22 13:17   ` Ted Zlatanov
  2010-03-22 14:01     ` Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 13:17 UTC (permalink / raw)
  To: emacs-devel

On Mon, 22 Mar 2010 03:19:32 +0200 Juri Linkov <juri@jurta.org> wrote: 

>> Is there a list in Emacs for translating http status code to readable
>> text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?
>> 
>> If not, could we please add one that can be used from the different elisp
>> files that need this?

JL> I suppose this should be in lisp/url/url-http.el.

The codes are used (and listed in the comments) in
url-http-parse-headers.  Would it be OK to just make an alist with
symbols and text, e.g.

(setq url-http-codes '((206 (url-http-code-partial-content "Partial content"))))
...
(assq 206 url-http-codes)

or is there anything else needed?

Ted
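
As a rough illustration of the format proposed above, the lookup could be
sketched like this.  It is only a sketch under stated assumptions:
`my-http-codes' and `my-http-code-entry' are hypothetical names (not part of
url-http.el), the table is trimmed to three codes, and `defconst' is used
here in place of the `setq' shown above.

```elisp
;; Hypothetical names; a trimmed table for illustration only.
(defconst my-http-codes
  '((200 (ok "OK"))
    (206 (partial-content "Partial content"))
    (404 (not-found "Not found")))
  "Alist mapping an HTTP status number to a (SYMBOL TEXT) pair.")

(defun my-http-code-entry (code)
  "Return the (SYMBOL TEXT) pair for status CODE, or nil if unknown."
  ;; `assq' compares keys with `eq', which is fine for small integers.
  (cadr (assq code my-http-codes)))

(my-http-code-entry 206)          ; => (partial-content "Partial content")
(nth 1 (my-http-code-entry 206))  ; => "Partial content"
(my-http-code-entry 999)          ; => nil
```

With this nesting every caller has to unwrap the pair; a flat
(NUMBER SYMBOL TEXT) entry would let `nth' be applied to the `assq' result
directly.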


* Re: Translation of http status code to text
  2010-03-22 13:17   ` Ted Zlatanov
@ 2010-03-22 14:01     ` Stefan Monnier
  2010-03-22 14:25       ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Stefan Monnier @ 2010-03-22 14:01 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> Would it be OK to just make an alist with symbols and text, e.g.

The question is sufficiently general that some of the possible
understandings of it are indeed OK.  Of course, you can also come up
with an alist of symbols and text to which I would be strongly opposed.

So assuming you (can) do it right, then yes, that's of course perfectly
acceptable; but details matter.

> (setq url-http-codes '((206 (url-http-code-partial-content "Partial content"))))

`setq' doesn't sound right, for example,

        Stefan


* Re: Translation of http status code to text
  2010-03-22 14:01     ` Stefan Monnier
@ 2010-03-22 14:25       ` Ted Zlatanov
  2010-03-22 17:06         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 14:25 UTC (permalink / raw)
  To: emacs-devel

On Mon, 22 Mar 2010 10:01:24 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> Would it be OK to just make an alist with symbols and text, e.g.
SM> The question is sufficiently general that some of the possible
SM> understandings of it are indeed OK.  Of course, you can also come up
SM> with an alist of symbols and text to which I would be strongly opposed.

SM> So assuming you (can) do it right, then yes, that's of course perfectly
SM> acceptable; but details matter.

I am not sure what you're saying.  Is the format I showed, which is
sufficient for the url-http.el needs, OK or should I use a plist in the
cdr?

Or do you mean I should put the list together and then we can massage it?

>> (setq url-http-codes '((206 (url-http-code-partial-content "Partial content"))))

SM> `setq' doesn't sound right, for example,

I was just showing the format.  It would be a proper defvar of course.

Ted


* Re: Translation of http status code to text
  2010-03-22 14:25       ` Ted Zlatanov
@ 2010-03-22 17:06         ` Ted Zlatanov
  2010-03-22 17:55           ` Sven Joachim
  2010-03-22 18:41           ` Translation of http status code to text Stefan Monnier
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 17:06 UTC (permalink / raw)
  To: emacs-devel

Here's one version of the HTTP codes as an alist.  I was thinking of
also generating defconst calls based on this list, that's why I named
everything "url-http-code-*".  But maybe that's not necessary and
accessor functions will be enough, so then we can s/url-http-code-//

Ted

=== modified file 'lisp/url/url-http.el'
--- lisp/url/url-http.el	2010-01-13 08:35:10 +0000
+++ lisp/url/url-http.el	2010-03-22 17:00:04 +0000
@@ -64,6 +64,56 @@
 nil will explicitly close the connection to the server after every
 request.")
 
+(defvar url-http-codes
+  '(
+    (100 url-http-code-continue                        "Continue with request")
+    (101 url-http-code-switching-protocols             "Switching protocols")
+    (102 url-http-code-processing                      "Processing (Added by DAV)")
+    (200 url-http-code-OK                              "OК")
+    (201 url-http-code-created                         "Created")
+    (202 url-http-code-accepted                        "Accepted")
+    (203 url-http-code-non-authoritative               "Non-authoritative information")
+    (204 url-http-code-no-content                      "No content")
+    (205 url-http-code-reset-content                   "Reset content")
+    (206 url-http-code-partial-content                 "Partial content")
+    (207 url-http-code-multi-status                    "Multi-status (Added by DAV)")
+    (300 url-http-code-multiple-choices                "Multiple choices")
+    (301 url-http-code-moved-permanently               "Moved permanently")
+    (302 url-http-code-found                           "Found")
+    (303 url-http-code-see-other                       "See other")
+    (304 url-http-code-not-modified                    "Not modified")
+    (305 url-http-code-use-proxy                       "Use proxy")
+    (307 url-http-code-temporary-redirect              "Temporary redirect")
+    (400 url-http-code-bad-request                     "Bad Request")
+    (401 url-http-code-unauthorized                    "Unauthorized")
+    (402 url-http-code-payment-required                "Payment required")
+    (403 url-http-code-forbidden                       "Forbidden")
+    (404 url-http-code-not-found                       "Not found")
+    (405 url-http-code-method-not-allowed              "Method not allowed")
+    (406 url-http-code-not-acceptable                  "Not acceptable")
+    (407 url-http-code-proxy-authentication-required   "Proxy authentication required")
+    (408 url-http-code-request-timeout                 "Request time-out")
+    (409 url-http-code-conflict                        "Conflict")
+    (410 url-http-code-gone                            "Gone")
+    (411 url-http-code-length-required                 "Length required")
+    (412 url-http-code-precondition-failed             "Precondition failed")
+    (413 url-http-code-request-entity-too-large        "Request entity too large")
+    (414 url-http-code-request-uri-too-large           "Request-URI too large")
+    (415 url-http-code-unsupported-media-type          "Unsupported media type")
+    (416 url-http-code-requested-range-not-satisfiable "Requested range not satisfiable")
+    (417 url-http-code-expectation-failed              "Expectation failed")
+    (422 url-http-code-unprocessable-entity            "Unprocessable Entity (Added by DAV)")
+    (423 url-http-code-locked                          "Locked")
+    (424 url-http-code-failed-Dependency               "Failed Dependency")
+    (500 url-http-code-internal-server-error           "Internal server error")
+    (501 url-http-code-not-implemented                 "Not implemented")
+    (502 url-http-code-bad-gateway                     "Bad gateway")
+    (503 url-http-code-service-unavailable             "Service unavailable")
+    (504 url-http-code-gateway-timeout                 "Gateway time-out")
+    (505 url-http-code-http-version-not-supported      "HTTP version not supported")
+    (507 url-http-code-insufficient-storage            "Insufficient storage"))
+  "The HTTP return codes.")
+
 ;(eval-when-compile
 ;; These are all macros so that they are hidden from external sight
 ;; when the file is byte-compiled.







* Re: Translation of http status code to text
  2010-03-22 17:06         ` Ted Zlatanov
@ 2010-03-22 17:55           ` Sven Joachim
  2010-03-22 19:23             ` Ted Zlatanov
  2010-03-22 18:41           ` Translation of http status code to text Stefan Monnier
  1 sibling, 1 reply; 182+ messages in thread
From: Sven Joachim @ 2010-03-22 17:55 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:

> +    (200 url-http-code-OK                              "OК")
                                                            ^ 
I think this is not quite OK. ;-)

Sven





* Re: Translation of http status code to text
  2010-03-22 17:06         ` Ted Zlatanov
  2010-03-22 17:55           ` Sven Joachim
@ 2010-03-22 18:41           ` Stefan Monnier
  2010-03-22 19:15             ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Stefan Monnier @ 2010-03-22 18:41 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> Here's one version of the HTTP codes as an alist.  I was thinking of
> also generating defconst calls based on this list, that's why I named
> everything "url-http-code-*".  But maybe that's not necessary and
> accessor functions will be enough, so then we can s/url-http-code-//

The symbol part depends on the rest of the patch: what do you use the
symbol for?  Do you even need it?


        Stefan





* Re: Translation of http status code to text
  2010-03-22 18:41           ` Translation of http status code to text Stefan Monnier
@ 2010-03-22 19:15             ` Ted Zlatanov
  2010-03-23  9:54               ` Juri Linkov
  2010-03-23 12:57               ` Stefan Monnier
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 19:15 UTC (permalink / raw)
  To: emacs-devel

On Mon, 22 Mar 2010 14:41:46 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> Here's one version of the HTTP codes as an alist.  I was thinking of
>> also generating defconst calls based on this list, that's why I named
>> everything "url-http-code-*".  But maybe that's not necessary and
>> accessor functions will be enough, so then we can s/url-http-code-//

SM> The symbol part depends on the rest of the patch: what do you use the
SM> symbol for?  Do you even need it?

I was going to change url-http.el like so, in url-http-parse-headers:

	 (case url-http-response-status
	   ((url-http-code-multiple-choices) ; was 300, uses defconst
...
	    nil)
...)

and I thought generating the defconsts would be best without dynamically
generating the symbol name as "url-http-code-%s" from the symbol in the
table.

But instead I can say (url-http-code-number 'multiple-choices).  It's
really a matter of taste.

Ted
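
The accessor alternative mentioned above might look roughly like the sketch
below.  Everything named `my-...' is hypothetical, and the flat
(NUMBER SYMBOL TEXT) table is trimmed to three entries.  One detail worth
keeping in mind: `case' clause keys are compared literally and are not
evaluated, so a `defconst' name written as a key would be matched as a
symbol rather than as its numeric value.

```elisp
;; Hypothetical names; a trimmed table for illustration only.
(defconst my-http-codes-flat
  '((300 multiple-choices  "Multiple choices")
    (301 moved-permanently "Moved permanently")
    (404 not-found         "Not found"))
  "List of (NUMBER SYMBOL TEXT) entries for some HTTP status codes.")

(defun my-http-code-number (symbol)
  "Return the status number whose symbolic name is SYMBOL, or nil."
  (let ((entries my-http-codes-flat)
        (found nil))
    ;; Walk the table until an entry with a matching symbol turns up.
    (while (and entries (not found))
      (when (eq (nth 1 (car entries)) symbol)
        (setq found (car (car entries))))
      (setq entries (cdr entries)))
    found))

(my-http-code-number 'multiple-choices)  ; => 300
(my-http-code-number 'no-such-status)    ; => nil
```

A call such as (= status (my-http-code-number 'multiple-choices)) then works
in a `cond', since the number is computed at run time.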


* Re: Translation of http status code to text
  2010-03-22 17:55           ` Sven Joachim
@ 2010-03-22 19:23             ` Ted Zlatanov
  2010-03-22 20:32               ` Sven Joachim
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 19:23 UTC (permalink / raw)
  To: emacs-devel

On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote: 

SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>> +    (200 url-http-code-OK                              "OК")
SJ>                                                             ^ 
SJ> I think this is not quite OK. ;-)

Sorry, can you elaborate?

Ted






* Re: Translation of http status code to text
  2010-03-22 19:23             ` Ted Zlatanov
@ 2010-03-22 20:32               ` Sven Joachim
  2010-03-22 21:31                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Sven Joachim @ 2010-03-22 20:32 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

On 2010-03-22 20:23 +0100, Ted Zlatanov wrote:

> On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote: 
>
> SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>>> +    (200 url-http-code-OK                              "OК")
> SJ>                                                             ^ 
> SJ> I think this is not quite OK. ;-)
>
> Sorry, can you elaborate?

Press C-u C-x = over the К in "OК" and notice how this differs from "K".

Sven
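
`C-x =' runs `what-cursor-position', and the prefix argument asks for the
detailed description of the character at point.  The same check can be
sketched in Lisp; `my-non-ascii-chars' below is a hypothetical helper, and
the string uses the escape \u041a for CYRILLIC CAPITAL LETTER KA so that the
example does not depend on how this message is encoded.

```elisp
(defun my-non-ascii-chars (string)
  "Return a list of (POSITION . CHAR) for each non-ASCII char in STRING."
  (let ((pos 0)
        (result nil))
    ;; Repeatedly search from POS and record every match.
    (while (string-match "[[:nonascii:]]" string pos)
      (push (cons (match-beginning 0)
                  (aref string (match-beginning 0)))
            result)
      (setq pos (match-end 0)))
    (nreverse result)))

(my-non-ascii-chars "O\u041a")  ; => ((1 . 1050)), i.e. U+041A at index 1
(my-non-ascii-chars "OK")       ; => nil; Latin K is character 75
```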





* Re: Translation of http status code to text
  2010-03-22 20:32               ` Sven Joachim
@ 2010-03-22 21:31                 ` Ted Zlatanov
  2010-03-23  9:55                   ` Juri Linkov
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-22 21:31 UTC (permalink / raw)
  To: emacs-devel

On Mon, 22 Mar 2010 21:32:53 +0100 Sven Joachim <svenjoac@gmx.de> wrote: 

SJ> On 2010-03-22 20:23 +0100, Ted Zlatanov wrote:
>> On Mon, 22 Mar 2010 18:55:57 +0100 Sven Joachim <svenjoac@gmx.de> wrote: 
>> 
SJ> On 2010-03-22 18:06 +0100, Ted Zlatanov wrote:
>>>> +    (200 url-http-code-OK                              "OК")
SJ> ^ 
SJ> I think this is not quite OK. ;-)
>> 
>> Sorry, can you elaborate?

SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K".

Thanks for noticing that.  I had the cyrillic-translit input method on
accidentally.  It would be nice if I could highlight non-ASCII
characters in a subtle way to avoid this, is there a relevant face?

Thanks
Ted






* Re: Translation of http status code to text
  2010-03-22 19:15             ` Ted Zlatanov
@ 2010-03-23  9:54               ` Juri Linkov
  2010-03-23 10:54                 ` joakim
  2010-03-23 15:02                 ` Ted Zlatanov
  2010-03-23 12:57               ` Stefan Monnier
  1 sibling, 2 replies; 182+ messages in thread
From: Juri Linkov @ 2010-03-23  9:54 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> SM> The symbol part depends on the rest of the patch: what do you use the
> SM> symbol for?  Do you even need it?
>
> I was going to change url-http.el like so, in url-http-parse-headers:
>
> 	 (case url-http-response-status
> 	   ((url-http-code-multiple-choices) ; was 300, uses defconst
> ...
> 	    nil)
> ...)
>
> and I thought generating the defconsts would be best without dynamically
> generating the symbol name as "url-http-code-%s" from the symbol in the
> table.
>
> But instead I can say (url-http-code-number 'multiple-choices).  It's
> really a matter of taste.

Please don't add symbols.  Developers prefer to refer to numbers.

The "number to text" mapping is necessary only to display codes
in a readable form to users.  Please re-read Lennart's original
request:

  Is there a list in Emacs for translating http status code to readable
  text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

-- 
Juri Linkov
http://www.jurta.org/emacs/
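
A number-to-text mapping of the kind requested needs very little code.  The
following is a minimal sketch, assuming a plain (NUMBER . TEXT) alist; the
names `my-http-status-texts' and `my-http-status-text' are hypothetical, and
the fallback string for unknown codes is an assumption, not something
proposed in this thread.

```elisp
;; Hypothetical names; a trimmed table for illustration only.
(defconst my-http-status-texts
  '((200 . "OK")
    (404 . "Not found")
    (503 . "Service unavailable"))
  "Alist mapping an HTTP status number to readable text.")

(defun my-http-status-text (code)
  "Return readable text for status CODE, or a generic fallback string."
  (or (cdr (assq code my-http-status-texts))
      (format "Unknown status %d" code)))

(format "HTTP %d: %s" 404 (my-http-status-text 404))
;; => "HTTP 404: Not found"
(my-http-status-text 299)
;; => "Unknown status 299"
```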





* Re: Translation of http status code to text
  2010-03-22 21:31                 ` Ted Zlatanov
@ 2010-03-23  9:55                   ` Juri Linkov
  2010-03-23 13:08                     ` Lennart Borgman
  2010-03-23 14:22                     ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
  0 siblings, 2 replies; 182+ messages in thread
From: Juri Linkov @ 2010-03-23  9:55 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

>>>>> +    (200 url-http-code-OK                              "OК")
> SJ> ^
> SJ> I think this is not quite OK. ;-)
>>>
>>> Sorry, can you elaborate?
>
> SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K".
>
> Thanks for noticing that.  I had the cyrillic-translit input method on
> accidentally.  It would be nice if I could highlight non-ASCII
> characters in a subtle way to avoid this, is there a relevant face?

Maybe `escape-glyph' is a relevant face?

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: Translation of http status code to text
  2010-03-23  9:54               ` Juri Linkov
@ 2010-03-23 10:54                 ` joakim
  2010-03-23 15:02                 ` Ted Zlatanov
  1 sibling, 0 replies; 182+ messages in thread
From: joakim @ 2010-03-23 10:54 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel

Juri Linkov <juri@jurta.org> writes:

>> SM> The symbol part depends on the rest of the patch: what do you use the
>> SM> symbol for?  Do you even need it?
>>
>> I was going to change url-http.el like so, in url-http-parse-headers:
>>
>> 	 (case url-http-response-status
>> 	   ((url-http-code-multiple-choices) ; was 300, uses defconst
>> ...
>> 	    nil)
>> ...)
>>
>> and I thought generating the defconsts would be best without dynamically
>> generating the symbol name as "url-http-code-%s" from the symbol in the
>> table.
>>
>> But instead I can say (url-http-code-number 'multiple-choices).  It's
>> really a matter of taste.
>
> Please don't add symbols.  Developers prefer to refer to numbers.
>
> The "number to text" mapping is necessary only to display codes
> in a readable form to users.  Please re-read Lennart's original
> request:
>
>   Is there a list in Emacs for translating http status code to readable
>   text (like http://en.wikipedia.org/wiki/List_of_HTTP_status_codes)?

As a developer I'd like both numbers and symbols. Maybe that's just me. I
think this is a bit particular for the http protocol, since some codes
are extremely well known, whereas others are not.

-- 
Joakim Verona





* Re: Translation of http status code to text
  2010-03-22 19:15             ` Ted Zlatanov
  2010-03-23  9:54               ` Juri Linkov
@ 2010-03-23 12:57               ` Stefan Monnier
  1 sibling, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-23 12:57 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

>>> Here's one version of the HTTP codes as an alist.  I was thinking of
>>> also generating defconst calls based on this list, that's why I named
>>> everything "url-http-code-*".  But maybe that's not necessary and
>>> accessor functions will be enough, so then we can s/url-http-code-//
SM> The symbol part depends on the rest of the patch: what do you use the
SM> symbol for?  Do you even need it?
> I was going to change url-http.el like so, in url-http-parse-headers:

> 	 (case url-http-response-status
> 	   ((url-http-code-multiple-choices) ; was 300, uses defconst
> ...
> 	    nil)
> ...)

I see, then it's OK to add symbols, I guess, but in this case use
shorter ones, so you can do:

  (case (cadr (assq status-number url-http-codes))
    (OK ...)
    (moved-permanently ...)
    (proxy-authentication-required ...)
    (accepted ...)
    ...)

And in that case, a better option is to create the symbols
programmatically from the error string:

  (defconst url-http-codes
    (mapcar (lambda (x)
              (let ((s (subst-char-in-string ?\s ?- (cadr x))))
                (when (string-match "-(.*)" s)
                  (setq s (substring s 0 (match-beginning 0))))
                (list (car x) (intern (downcase s)) (cadr x))))
     '((100 "Continue with request")
       ...)))


-- Stefan
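
To make the effect of the generated symbols concrete, here is the same
`mapcar' applied to four entries taken from the table posted earlier in the
thread.  The name `my-http-codes-generated' is hypothetical and the input is
trimmed; the result shown in the comment follows from the text strings
alone.

```elisp
(defconst my-http-codes-generated
  (mapcar (lambda (x)
            ;; Replace spaces with dashes (this returns a fresh string).
            (let ((s (subst-char-in-string ?\s ?- (cadr x))))
              ;; In Emacs regexps an unescaped paren is a literal, so this
              ;; strips a trailing "-(Added-by-DAV)" style suffix.
              (when (string-match "-(.*)" s)
                (setq s (substring s 0 (match-beginning 0))))
              (list (car x) (intern (downcase s)) (cadr x))))
          '((100 "Continue with request")
            (200 "OK")
            (207 "Multi-status (Added by DAV)")
            (408 "Request time-out")))
  "Entries of the form (NUMBER SYMBOL TEXT) derived from the text.")

;; my-http-codes-generated
;; => ((100 continue-with-request "Continue with request")
;;     (200 ok "OK")
;;     (207 multi-status "Multi-status (Added by DAV)")
;;     (408 request-time-out "Request time-out"))
```

The symbols mirror the wording of the text exactly, so any change to a
string also changes the symbol that code has to match against.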





* Re: Translation of http status code to text
  2010-03-23  9:55                   ` Juri Linkov
@ 2010-03-23 13:08                     ` Lennart Borgman
  2010-03-23 14:26                       ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
  2010-03-23 14:35                       ` Translation of http status code to text Miles Bader
  2010-03-23 14:22                     ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
  1 sibling, 2 replies; 182+ messages in thread
From: Lennart Borgman @ 2010-03-23 13:08 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

2010/3/23 Juri Linkov <juri@jurta.org>:
>>>>>> +    (200 url-http-code-OK                              "OК")
>> SJ> ^
>> SJ> I think this is not quite OK. ;-)
>>>>
>>>> Sorry, can you elaborate?
>>
>> SJ> Press C-u C-x = over the К in "OК" and notice how this differs from "K".
>>
>> Thanks for noticing that.  I had the cyrillic-translit input method on
>> accidentally.  It would be nice if I could highlight non-ASCII
>> characters in a subtle way to avoid this, is there a relevant face?
>
> Maybe `escape-glyph' is a relevant face?

I just wrote a little mis-named library for marking non-ASCII
characters. I have attached it in case someone wants to use it (or convert
it to something more flexible and less mis-named).

[-- Attachment #2: markchars.el --]
[-- Type: text/plain, Size: 2176 bytes --]

;;; markchars.el --- Mark characters fitting certain characteristics
;;
;; Author: Lennart Borgman (lennart O borgman A gmail O com)
;; Created: 2010-03-22 Mon
;; Version:
;; Last-Updated:
;; URL:
;; Keywords:
;; Compatibility:
;;
;; Features that might be required by this library:
;;
;;   None
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change log:
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

;; (re-search-forward "[[:nonascii:]]")
;; äåö

(defgroup markchars nil
  "Customization group for `markchars-mode'."
  :group 'convenience)

(defface markchars-face
  '((t (:underline "light blue")))
  "Face for `markchars-mode' char marking."
  :group 'markchars)

(defconst markchars-keywords
  `(
    ("[[:nonascii:]]"
     (0 (put-text-property (match-beginning 0) (match-end 0)
                           'face 'markchars-face)))
    ))

(define-minor-mode markchars-mode
  "Color up non-ascii chars."
  :group 'markchars
  (if markchars-mode
      (font-lock-add-keywords nil markchars-keywords)
    (font-lock-remove-keywords nil markchars-keywords))
  (font-lock-fontify-buffer))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; markchars.el ends here


* highlighting non-ASCII characters (was: Translation of http status code to text)
  2010-03-23  9:55                   ` Juri Linkov
  2010-03-23 13:08                     ` Lennart Borgman
@ 2010-03-23 14:22                     ` Ted Zlatanov
  2010-03-23 16:50                       ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams
  2010-03-24  9:27                       ` Juri Linkov
  1 sibling, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-23 14:22 UTC (permalink / raw)
  To: emacs-devel

On Tue, 23 Mar 2010 11:55:17 +0200 Juri Linkov <juri@jurta.org> wrote: 

>> Thanks for noticing that.  I had the cyrillic-translit input method on
>> accidentally.  It would be nice if I could highlight non-ASCII
>> characters in a subtle way to avoid this, is there a relevant face?

JL> Maybe `escape-glyph' is a relevant face?

I don't think it applies to non-ASCII by default, at least for me it
doesn't.  I can set up font-lock for myself, of course, but was
wondering if Emacs has a simple built-in way to highlight them.

Also it would be good if the effect did not interfere with other faces,
so e.g. a subtle fading or underline effect overlaid with the regular
face would be nicer than just asking the user to customize it themselves
(although that option is still valuable).  I don't know much about Emacs
faces so any help is welcome.

Ted
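
One way to get an effect that is layered over the regular face is
font-lock's `prepend' override, which merges the new face with whatever
face is already on the text instead of replacing it.  The sketch below
assumes that approach; `my-nonascii', `my-nonascii-keywords' and
`my-nonascii-mode' are hypothetical names, and `font-lock-flush' requires a
reasonably recent Emacs.

```elisp
(defface my-nonascii
  '((t (:underline "light blue")))
  "Hypothetical face for marking non-ASCII characters.
Only the underline is specified, so colors set by other faces survive.")

(defconst my-nonascii-keywords
  ;; (MATCHER SUBEXP FACE OVERRIDE): `prepend' merges with existing faces.
  '(("[[:nonascii:]]" 0 'my-nonascii prepend))
  "Font-lock keywords that underline every non-ASCII character.")

(define-minor-mode my-nonascii-mode
  "Underline non-ASCII characters in the current buffer."
  :lighter " nonascii"
  (if my-nonascii-mode
      (font-lock-add-keywords nil my-nonascii-keywords)
    (font-lock-remove-keywords nil my-nonascii-keywords))
  ;; Ask font-lock to refontify so the change becomes visible.
  (font-lock-flush))
```

Enabling it is then a matter of M-x my-nonascii-mode in the buffer of
interest, or adding the mode to a hook such as `prog-mode-hook'.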


* face for non-ASCII characters (was: Translation of http status code to text)
  2010-03-23 13:08                     ` Lennart Borgman
@ 2010-03-23 14:26                       ` Ted Zlatanov
  2010-03-23 16:28                         ` Lennart Borgman
  2010-03-23 19:40                         ` Florian Beck
  2010-03-23 14:35                       ` Translation of http status code to text Miles Bader
  1 sibling, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-23 14:26 UTC (permalink / raw)
  To: emacs-devel

On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

...
LB> (defface markchars-face
LB>   '((t (:underline "light blue")))
LB>   "Face for `markchars-mode' char marking."
LB>   :group 'markchars)

LB> (defconst markchars-keywords
LB>   `(
LB>     ("[[:nonascii:]]"
LB>      (0 (put-text-property (match-beginning 0) (match-end 0)
LB>                            'face 'markchars-face)))
LB>     ))

This is what I was thinking of, and the effect should be subtle.  The
light blue underline is pretty good; it would be nice if it could be a
choice between a few similarly well-chosen defaults (I'm hardly the one
to provide them, though).  

I would make this an Emacs first-class option rather than an optional
library.  I think it's really useful.

Ted


* Re: Translation of http status code to text
  2010-03-23 13:08                     ` Lennart Borgman
  2010-03-23 14:26                       ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
@ 2010-03-23 14:35                       ` Miles Bader
  1 sibling, 0 replies; 182+ messages in thread
From: Miles Bader @ 2010-03-23 14:35 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Juri Linkov, Ted Zlatanov, emacs-devel

Don't use "-face" as a suffix for face names.

-Miles

-- 
Is it true that nothing can be known?  If so how do we know this?  -Woody Allen





* Re: Translation of http status code to text
  2010-03-23  9:54               ` Juri Linkov
  2010-03-23 10:54                 ` joakim
@ 2010-03-23 15:02                 ` Ted Zlatanov
  2010-03-24  3:22                   ` Stefan Monnier
  2010-03-24 17:35                   ` Glenn Morris
  1 sibling, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-23 15:02 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1913 bytes --]

On Tue, 23 Mar 2010 11:54:18 +0200 Juri Linkov <juri@jurta.org> wrote: 

JL> Please don't add symbols.  Developers prefer to refer to numbers.

I considered that but it really seemed useful to have symbolic codes as
I was working through the list of text explanations.

On Tue, 23 Mar 2010 11:54:40 +0100 joakim@verona.se wrote: 

j> As a developer I'd like both numbers and symbols. Maybe that's just me. I
j> think this is a bit particular for the http protocol, since some codes
j> are extremely well known, whereas others are not.

Ditto.

On Tue, 23 Mar 2010 08:57:15 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

SM> I see, then it's OK to add symbols, I guess, but in this case use
SM> shorter ones, so you can do:

SM>   (case (cadr (assq status-number url-http-codes))
SM>     (OK ...)
SM>     (moved-permanently ...)
SM>     (proxy-authentication-required ...)
SM>     (accepted ...)
SM>     ...)

Yeah, that was the other option.  It works too.

SM> And in that case, a better option is to create the symbols
SM> programmatically from the error string:

SM>   (defconst url-http-codes
SM>     (mapcar (lambda (x)
SM>               (let ((s (subst-char-in-string ?\s ?- (cadr x))))
SM>                 (when (string-match "-(.*)" s)
SM>                   (setq s (substring s 0 (match-beginning 0))))
SM>                 (list (car x) (intern (downcase s)) (cadr x))))
SM>      '((100 "Continue with request")
SM>        ...)))

From the perspective of the programmer it's better to see the symbols
(grep won't work with this approach).  Also "OK" looks weird as the "ok"
status code and "time-out" in the error code is odd IMHO.  These are
matters of taste but I hope you'll allow that the manually generated
table is a little better.  It only has to be done once, after all.

Attached is another attempt which uses the symbols in the code and
doesn't dynamically generate the symbols.

Ted


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: url-http-codes.patch --]
[-- Type: text/x-diff, Size: 11201 bytes --]

=== modified file 'lisp/url/url-http.el'
--- lisp/url/url-http.el	2010-01-13 08:35:10 +0000
+++ lisp/url/url-http.el	2010-03-23 14:54:14 +0000
@@ -64,6 +64,55 @@
 nil will explicitly close the connection to the server after every
 request.")
 
+(defconst url-http-codes
+  '((100 continue			 "Continue with request")
+    (101 switching-protocols		 "Switching protocols")
+    (102 processing			 "Processing (Added by DAV)")
+    (200 OK				 "OK")
+    (201 created			 "Created")
+    (202 accepted			 "Accepted")
+    (203 non-authoritative		 "Non-authoritative information")
+    (204 no-content			 "No content")
+    (205 reset-content			 "Reset content")
+    (206 partial-content		 "Partial content")
+    (207 multi-status			 "Multi-status (Added by DAV)")
+    (300 multiple-choices		 "Multiple choices")
+    (301 moved-permanently		 "Moved permanently")
+    (302 found				 "Found")
+    (303 see-other			 "See other")
+    (304 not-modified			 "Not modified")
+    (305 use-proxy			 "Use proxy")
+    (307 temporary-redirect		 "Temporary redirect")
+    (400 bad-request			 "Bad Request")
+    (401 unauthorized			 "Unauthorized")
+    (402 payment-required		 "Payment required")
+    (403 forbidden			 "Forbidden")
+    (404 not-found			 "Not found")
+    (405 method-not-allowed		 "Method not allowed")
+    (406 not-acceptable			 "Not acceptable")
+    (407 proxy-authentication-required	 "Proxy authentication required")
+    (408 request-timeout		 "Request time-out")
+    (409 conflict			 "Conflict")
+    (410 gone				 "Gone")
+    (411 length-required		 "Length required")
+    (412 precondition-failed		 "Precondition failed")
+    (413 request-entity-too-large	 "Request entity too large")
+    (414 request-uri-too-large		 "Request-URI too large")
+    (415 unsupported-media-type		 "Unsupported media type")
+    (416 requested-range-not-satisfiable "Requested range not satisfiable")
+    (417 expectation-failed		 "Expectation failed")
+    (422 unprocessable-entity		 "Unprocessable Entity (Added by DAV)")
+    (423 locked				 "Locked")
+    (424 failed-dependency		 "Failed Dependency")
+    (500 internal-server-error		 "Internal server error")
+    (501 not-implemented		 "Not implemented")
+    (502 bad-gateway			 "Bad gateway")
+    (503 service-unavailable		 "Service unavailable")
+    (504 gateway-timeout		 "Gateway time-out")
+    (505 http-version-not-supported	 "HTTP version not supported")
+    (507 insufficient-storage		 "Insufficient storage"))
+  "The HTTP return codes and their text.")
+
 ;(eval-when-compile
 ;; These are all macros so that they are hidden from external sight
 ;; when the file is byte-compiled.
@@ -435,7 +484,9 @@
 	(delete-process url-http-process)))))
   (let ((buffer (current-buffer))
 	(class nil)
-	(success nil))
+	(success nil)
+	;; other status symbols: jewelry and luxury cars
+	(status-symbol (cadr (assq url-http-response-status url-http-codes))))
     (setq class (/ url-http-response-status 100))
     (url-http-debug "Parsed HTTP headers: class=%d status=%d" class url-http-response-status)
     (url-http-handle-cookies)
@@ -463,8 +514,8 @@
        ;; 205 Reset content
        ;; 206 Partial content
        ;; 207 Multi-status (Added by DAV)
-       (case url-http-response-status
-	 ((204 205)
+       (case status-symbol
+	 ((no-content reset-content)
 	  ;; No new data, just stay at the same document
 	  (url-mark-buffer-as-dead buffer)
 	  (setq success t))
@@ -485,8 +536,8 @@
        ;; 307 Temporary redirect
        (let ((redirect-uri (or (mail-fetch-field "Location")
 			       (mail-fetch-field "URI"))))
-	 (case url-http-response-status
-	   (300
+	 (case status-symbol
+	   (multiple-choices	    ; 300
 	    ;; Quoth the spec (section 10.3.1)
 	    ;; -------------------------------
 	    ;; The requested resource corresponds to any one of a set of
@@ -503,7 +554,7 @@
 	    ;; We do not support agent-driven negotiation, so we just
 	    ;; redirect to the preferred URI if one is provided.
 	    nil)
-	   ((301 302 307)
+	   ((moved-permanently found temporary-redirect) ; 301 302 307
 	    ;; If the 301|302 status code is received in response to a
 	    ;; request other than GET or HEAD, the user agent MUST NOT
 	    ;; automatically redirect the request unless it can be
@@ -519,20 +570,20 @@
 			      url-http-method url-http-response-status)
 	      (setq url-http-method "GET"
 		    url-http-data nil)))
-	   (303
+	   (see-other			; 303
 	    ;; The response to the request can be found under a different
 	    ;; URI and SHOULD be retrieved using a GET method on that
 	    ;; resource.
 	    (setq url-http-method "GET"
 		  url-http-data nil))
-	   (304
+	   (not-modified		; 304
 	    ;; The 304 response MUST NOT contain a message-body.
 	    (url-http-debug "Extracting document from cache... (%s)"
 			    (url-cache-create-filename (url-view-url t)))
 	    (url-cache-extract (url-cache-create-filename (url-view-url t)))
 	    (setq redirect-uri nil
 		  success t))
-	   (305
+	   (use-proxy			; 305
 	    ;; The requested resource MUST be accessed through the
 	    ;; proxy given by the Location field.  The Location field
 	    ;; gives the URI of the proxy.  The recipient is expected
@@ -620,51 +671,51 @@
        ;; 422 Unprocessable Entity (Added by DAV)
        ;; 423 Locked
        ;; 424 Failed Dependency
-       (case url-http-response-status
-	 (401
+       (case status-symbol
+	 (unauthorized			; 401
 	  ;; The request requires user authentication.  The response
 	  ;; MUST include a WWW-Authenticate header field containing a
 	  ;; challenge applicable to the requested resource.  The
 	  ;; client MAY repeat the request with a suitable
 	  ;; Authorization header field.
 	  (url-http-handle-authentication nil))
-	 (402
+	 (payment-required              ; 402
 	  ;; This code is reserved for future use
 	  (url-mark-buffer-as-dead buffer)
 	  (error "Somebody wants you to give them money"))
-	 (403
+	 (forbidden			; 403
 	  ;; The server understood the request, but is refusing to
 	  ;; fulfill it.  Authorization will not help and the request
 	  ;; SHOULD NOT be repeated.
 	  (setq success t))
-	 (404
+	 (not-found			; 404
 	  ;; Not found
 	  (setq success t))
-	 (405
+	 (method-not-allowed		; 405
 	  ;; The method specified in the Request-Line is not allowed
 	  ;; for the resource identified by the Request-URI.  The
 	  ;; response MUST include an Allow header containing a list of
 	  ;; valid methods for the requested resource.
 	  (setq success t))
-	 (406
+	 (not-acceptable		; 406
 	  ;; The resource identified by the request is only capable of
 	  ;; generating response entities which have content
 	  ;; characteristics nota cceptable according to the accept
 	  ;; headers sent in the request.
 	  (setq success t))
-	 (407
+	 (proxy-authentication-required ; 407
 	  ;; This code is similar to 401 (Unauthorized), but indicates
 	  ;; that the client must first authenticate itself with the
 	  ;; proxy.  The proxy MUST return a Proxy-Authenticate header
 	  ;; field containing a challenge applicable to the proxy for
 	  ;; the requested resource.
 	  (url-http-handle-authentication t))
-	 (408
+	 (request-timeout		; 408
 	  ;; The client did not produce a request within the time that
 	  ;; the server was prepared to wait.  The client MAY repeat
 	  ;; the request without modifications at any later time.
 	  (setq success t))
-	 (409
+	 (conflict			; 409
 	  ;; The request could not be completed due to a conflict with
 	  ;; the current state of the resource.  This code is only
 	  ;; allowed in situations where it is expected that the user
@@ -673,11 +724,11 @@
 	  ;; information for the user to recognize the source of the
 	  ;; conflict.
 	  (setq success t))
-	 (410
+	 (gone                          ; 410
 	  ;; The requested resource is no longer available at the
 	  ;; server and no forwarding address is known.
 	  (setq success t))
-	 (411
+	 (length-required		; 411
 	  ;; The server refuses to accept the request without a defined
 	  ;; Content-Length.  The client MAY repeat the request if it
 	  ;; adds a valid Content-Length header field containing the
@@ -687,29 +738,29 @@
 	  ;; `url-http-create-request' automatically calculates the
 	  ;; content-length.
 	  (setq success t))
-	 (412
+	 (precondition-failed		; 412
 	  ;; The precondition given in one or more of the
 	  ;; request-header fields evaluated to false when it was
 	  ;; tested on the server.
 	  (setq success t))
-	 ((413 414)
+	 ((request-entity-too-large request-uri-too-large) ; 413 414
 	  ;; The server is refusing to process a request because the
 	  ;; request entity|URI is larger than the server is willing or
 	  ;; able to process.
 	  (setq success t))
-	 (415
+	 (unsupported-media-type	; 415
 	  ;; The server is refusing to service the request because the
 	  ;; entity of the request is in a format not supported by the
 	  ;; requested resource for the requested method.
 	  (setq success t))
-	 (416
+	 (requested-range-not-satisfiable ; 416
 	  ;; A server SHOULD return a response with this status code if
 	  ;; a request included a Range request-header field, and none
 	  ;; of the range-specifier values in this field overlap the
 	  ;; current extent of the selected resource, and the request
 	  ;; did not include an If-Range request-header field.
 	  (setq success t))
-	 (417
+	 (expectation-failed		; 417
 	  ;; The expectation given in an Expect request-header field
 	  ;; could not be met by this server, or, if the server is a
 	  ;; proxy, the server has unambiguous evidence that the
@@ -736,16 +787,16 @@
        ;; 507 Insufficient storage
        (setq success t)
-       (case url-http-response-status
+       (case status-symbol
-	 (501
+	 (not-implemented		; 501
 	  ;; The server does not support the functionality required to
 	  ;; fulfill the request.
 	  nil)
-	 (502
+	 (bad-gateway			; 502
 	  ;; The server, while acting as a gateway or proxy, received
 	  ;; an invalid response from the upstream server it accessed
 	  ;; in attempting to fulfill the request.
 	  nil)
-	 (503
+	 (service-unavailable		; 503
 	  ;; The server is currently unable to handle the request due
 	  ;; to a temporary overloading or maintenance of the server.
 	  ;; The implication is that this is a temporary condition
@@ -754,19 +805,19 @@
 	  ;; header.  If no Retry-After is given, the client SHOULD
 	  ;; handle the response as it would for a 500 response.
 	  nil)
-	 (504
+	 (gateway-timeout		; 504
 	  ;; The server, while acting as a gateway or proxy, did not
 	  ;; receive a timely response from the upstream server
 	  ;; specified by the URI (e.g. HTTP, FTP, LDAP) or some other
 	  ;; auxiliary server (e.g. DNS) it needed to access in
 	  ;; attempting to complete the request.
 	  nil)
-	 (505
+	 (http-version-not-supported	; 505
 	  ;; The server does not support, or refuses to support, the
 	  ;; HTTP protocol version that was used in the request
 	  ;; message.
 	  nil)
-	 (507				; DAV
+	 (insufficient-storage		; 507 (DAV)
 	  ;; The method could not be performed on the resource
 	  ;; because the server is unable to store the representation
 	  ;; needed to successfully complete the request.  This
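
A table shaped like the one in the patch can also answer the original
question of this thread, turning a numeric status into readable text.
The following is only a minimal sketch: `my-http-codes' is a
hypothetical three-entry excerpt, and both helper names are
assumptions made for illustration, not part of url-http.el.

```elisp
;; Hypothetical excerpt of the patch's table; each entry is
;; (NUMBER SYMBOL TEXT).
(defconst my-http-codes
  '((200 OK                   "OK")
    (404 not-found            "Not found")
    (507 insufficient-storage "Insufficient storage"))
  "Excerpt of HTTP return codes, their symbols and their text.")

(defun my-http-status-symbol (code)
  "Return the symbol for numeric status CODE, or nil if unknown."
  ;; `assq' is fine here because the keys are small integers.
  (nth 1 (assq code my-http-codes)))

(defun my-http-status-text (code)
  "Return the readable text for numeric status CODE, or nil if unknown."
  (nth 2 (assq code my-http-codes)))
```

With these definitions, (my-http-status-text 404) gives "Not found",
and an unlisted code simply gives nil, so callers can fall back to
printing the number.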



* Re: face for non-ASCII characters (was: Translation of http status  code to text)
  2010-03-23 14:26                       ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
@ 2010-03-23 16:28                         ` Lennart Borgman
  2010-03-23 18:18                           ` face for non-ASCII characters Ted Zlatanov
  2010-03-23 19:40                         ` Florian Beck
  1 sibling, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-23 16:28 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 991 bytes --]

2010/3/23 Ted Zlatanov <tzz@lifelogs.com>:
> On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> ...
> LB> (defface markchars-face
> LB>   '((t (:underline "light blue")))
> LB>   "Face for `markchars-mode' char marking."
> LB>   :group 'markchars)
>
> LB> (defconst markchars-keywords
> LB>   `(
> LB>     ("[[:nonascii:]]"
> LB>      (0 (put-text-property (match-beginning 0) (match-end 0)
> LB>                            'face 'markchars-face)))
> LB>     ))
>
> This is what I was thinking of, and the effect should be subtle.  The
> light blue underline is pretty good; it would be nice if it could be a
> choice between a few similarly well-chosen defaults (I'm hardly the one
> to provide them, though).
>
> I would make this an Emacs first-class option rather than an optional
> library.  I think it's really useful.

Ok, since you found it useful I tried to finish it. See the attachment.

[-- Attachment #2: markchars.el --]
[-- Type: text/plain, Size: 3294 bytes --]

;;; markchars.el --- Mark chars fitting certain characteristics
;;
;; Author: Lennart Borgman (lennart O borgman A gmail O com)
;; Created: 2010-03-22 Mon
;; Version:
;; Last-Updated:
;; URL:
;; Keywords:
;; Compatibility:
;;
;; Features that might be required by this library:
;;
;;   None
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;; Mark special chars, by default nonascii chars. See `markchars-mode'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change log:
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

;; (re-search-forward "[[:nonascii:]]")
;; äåö

;;;###autoload
(defgroup markchars nil
  "Customization group for `markchars-mode'."
  :group 'convenience)

(defface markchars-light
  '((t (:underline "light blue")))
  "Light face for `markchars-mode' char marking."
  :group 'markchars)

(defface markchars-heavy
  '((t (:underline "magenta")))
  "Heavy face for `markchars-mode' char marking."
  :group 'markchars)

(defcustom markchars-face 'markchars-heavy
  "Pointer to face used for marking chars."
  :type 'face
  :group 'markchars)

(defcustom markchars-pattern "[[:nonascii:]]"
  "Regexp for characters to mark.
This is supposed to match single characters, but you can of
course \(mis)use it any way you want.

By default it matches nonascii-chars."
  :type 'regexp
  :group 'markchars)

(defvar markchars-keywords nil
  "Keywords for font lock.")

(defun markchars-set-keywords ()
  "Set `markchars-keywords' from options."
  (set (make-local-variable 'markchars-keywords)
       (list
        (list markchars-pattern
              (list 0 '(put-text-property (match-beginning 0) (match-end 0)
                                         'face markchars-face))))))

;;;###autoload
(define-minor-mode markchars-mode
  "Mark special characters.
Which characters to mark are defined by `markchars-pattern'.

The default is to mark nonascii chars with a magenta underline."
  :group 'markchars
  (if markchars-mode
      (progn
        (markchars-set-keywords)
        (font-lock-add-keywords nil markchars-keywords))
    (font-lock-remove-keywords nil markchars-keywords))
  (font-lock-fontify-buffer))

;;;###autoload
(define-globalized-minor-mode markchars-global-mode markchars-mode
  (lambda () (markchars-mode 1))
  :group 'markchars)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; markchars.el ends here
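
To see what the default `markchars-pattern' would pick up without
turning on font-lock, the regexp can be tried in a temporary buffer.
This is a small sketch only; `my-nonascii-matches' is a hypothetical
helper name and is not part of markchars.el.

```elisp
(defun my-nonascii-matches (text)
  "Return a list of (POSITION . STRING) for each non-ASCII char in TEXT.
POSITION is the 1-based buffer position where the match begins."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (hits)
      ;; Same regexp as the default value of `markchars-pattern'.
      (while (re-search-forward "[[:nonascii:]]" nil t)
        (push (cons (match-beginning 0) (match-string 0)) hits))
      (nreverse hits))))
```

For example, (my-nonascii-matches "na\u00efve") reports a single match
at position 3, while a pure ASCII string yields nil.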


* RE: highlighting non-ASCII characters (was: Translation of http statuscode to text)
  2010-03-23 14:22                     ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
@ 2010-03-23 16:50                       ` Drew Adams
  2010-03-23 21:49                         ` highlighting non-ASCII characters Stefan Monnier
  2010-03-24  9:27                       ` Juri Linkov
  1 sibling, 1 reply; 182+ messages in thread
From: Drew Adams @ 2010-03-23 16:50 UTC (permalink / raw)
  To: 'Ted Zlatanov', emacs-devel

> JL> Maybe `escape-glyph' is a relevant face?
> 
> I don't think it applies to non-ASCII by default, at least
> for me it doesn't.... Also it would be good if the effect
> did not interfere with other faces

I agree. `escape-glyph' is designed for a different use. Its intended use is
given by its doc string, and it is very precise and clear:

"Face for characters displayed as sequences using `^' or `\'."

In my case, I customize `escape-glyph' to be something very noticeable (orange
background, with blue foreground and 1-pixel blue box). I want to see something
like ^G really stand out. I don't want to see non-ascii chars handled the same
way.

It is a too-common mistake, IMO, to try too hard to reuse existing faces. It
complicates use (including customization) of those faces if they end up
signifying a collection of disparate things.

We should have the reflex, when tempted to reuse a face, to think twice. In
general, just say no to such y-a-qu'a impulses. ;-)


* Re: face for non-ASCII characters
  2010-03-23 16:28                         ` Lennart Borgman
@ 2010-03-23 18:18                           ` Ted Zlatanov
  2011-04-15 22:41                             ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-23 18:18 UTC (permalink / raw)
  To: emacs-devel

On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2010/3/23 Ted Zlatanov <tzz@lifelogs.com>:
>> I would make this an Emacs first-class option rather than an optional
>> library.  I think it's really useful.

LB> Ok, since you found it useful I tried to finish it. See the attachment.

Stefan, Yidong, what do you think about something like markchars.el
becoming part of Emacs, so users can turn it on with a global
customization?  I don't think it has to be a minor mode necessarily but
I'm no expert in that area.

If it goes in there are several tasks: picking a variable name, picking
suitable default faces, (maybe) adding a menu item, and documenting the
new behavior.  I would really like to make it easily available to Emacs
users and will help with these tasks.

Ted


* Re: face for non-ASCII characters
  2010-03-23 14:26                       ` face for non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
  2010-03-23 16:28                         ` Lennart Borgman
@ 2010-03-23 19:40                         ` Florian Beck
  1 sibling, 0 replies; 182+ messages in thread
From: Florian Beck @ 2010-03-23 19:40 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov wrote:
> On Tue, 23 Mar 2010 14:08:42 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 
> 
> ...
> LB> (defface markchars-face
> LB>   '((t (:underline "light blue")))
> LB>   "Face for `markchars-mode' char marking."
> LB>   :group 'markchars)
> 
> LB> (defconst markchars-keywords
> LB>   `(
> LB>     ("[[:nonascii:]]"
> LB>      (0 (put-text-property (match-beginning 0) (match-end 0)
> LB>                            'face 'markchars-face)))
> LB>     ))
> 
> This is what I was thinking of, and the effect should be subtle.  The
> light blue underline is pretty good; it would be nice if it could be a
> choice between a few similarly well-chosen defaults (I'm hardly the one
> to provide them, though).  
> 
> I would make this an Emacs first-class option rather than an optional
> library.  I think it's really useful.

You could always do something like

(set-fontset-font "fontset-coding" ;; or whatever
	'(128 . 999999)
	(font-spec :family "LastResort" :size 20) ;; ok, not so subtle
	nil 'prepend)



> 
> Ted
> 
> 
> 
> 






* Re: highlighting non-ASCII characters
  2010-03-23 16:50                       ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams
@ 2010-03-23 21:49                         ` Stefan Monnier
  2010-03-23 21:53                           ` Drew Adams
  0 siblings, 1 reply; 182+ messages in thread
From: Stefan Monnier @ 2010-03-23 21:49 UTC (permalink / raw)
  To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel

> In my case, I customize `escape-glyph' to be something very noticeable
> (orange background, with blue foreground and 1-pixel blue box).
> I want to see something like ^G really stand out.  I don't want to see
> non-ascii chars handled the same way.

Actually, in the original case (a weird non-ASCII K in the middle of
ASCII text), I would want it to stand out.  The problem there is how to
define "weird".


        Stefan





* RE: highlighting non-ASCII characters
  2010-03-23 21:49                         ` highlighting non-ASCII characters Stefan Monnier
@ 2010-03-23 21:53                           ` Drew Adams
  2010-03-24  0:45                             ` Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Drew Adams @ 2010-03-23 21:53 UTC (permalink / raw)
  To: 'Stefan Monnier'; +Cc: 'Ted Zlatanov', emacs-devel

> > In my case, I customize `escape-glyph' to be something very 
> > noticeable (orange background, with blue foreground and
> > 1-pixel blue box). I want to see something like ^G really
> > stand out.  I don't want to see
> > non-ascii chars handled the same way.
> 
> Actually, in the original case (a weird non-ASCII K in the middle of
> ASCII text), I would want it to stand out.  The problem there 
> is how to define "weird".

Whether you want non-ASCII chars to also stand out is a separate question. My
point is that I don't want to see them handled _the same way_ as ^G et
compagnie.

Having separate faces lets users get the behavior they want. If you then want to
customize the two faces to look the same, no problem.






* Re: highlighting non-ASCII characters
  2010-03-23 21:53                           ` Drew Adams
@ 2010-03-24  0:45                             ` Stefan Monnier
  2010-03-24  1:03                               ` Ted Zlatanov
                                                 ` (3 more replies)
  0 siblings, 4 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-24  0:45 UTC (permalink / raw)
  To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel

>> > In my case, I customize `escape-glyph' to be something very 
>> > noticeable (orange background, with blue foreground and
>> > 1-pixel blue box). I want to see something like ^G really
>> > stand out.  I don't want to see
>> > non-ascii chars handled the same way.
>> Actually, in the original case (a weird non-ASCII K in the middle of
>> ASCII text), I would want it to stand out.  The problem there 
>> is how to define "weird".

> Whether you want non-ASCII chars to also stand out is a separate
> question. My point is that I don't want to see them handled _the same
> way_ as ^G et compagnie.

What I'm saying is that there are two issues: non-ASCII chars in general
(which I personally don't want to display in any special manner:
they're just as normal as ASCII chars), and then there are "chars that
are out of place or that may not be what they look like", such as the
weird "K" in the other message's "OK" (which to me, is similar to the
NBSP char in that it is meant to be displayed in the same way as some
other char, so we want to call the attention of the user to the
difference).

> Having separate faces lets users get the behavior they want.  If you
> then want to customize the two faces to look the same, no problem.

I don't insist on using escape-glyph for those chars, indeed (I don't
really care which face is used for them).  What I care about is figuring
out how to define programmatically "chars that are out of place or that
may not be what they look like".

        Stefan


* Re: highlighting non-ASCII characters
  2010-03-24  0:45                             ` Stefan Monnier
@ 2010-03-24  1:03                               ` Ted Zlatanov
  2010-03-24  2:47                                 ` Stefan Monnier
  2010-03-24  2:09                               ` Drew Adams
                                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-24  1:03 UTC (permalink / raw)
  To: emacs-devel

On Tue, 23 Mar 2010 20:45:48 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

SM> What I'm saying is that there are two issues: non-ASCII chars in general
SM> (which I personally don't want to display in any special manner:
SM> they're just as normal as ASCII chars), and then there are "chars that
SM> are out of place or that may not be what they look like", such as the
SM> weird "K" in the other message's "OK" (which to me, is similar to the
SM> NBSP char in that it is meant to be displayed in the same way as some
SM> other char, so we want to call the attention of the user to the
SM> difference).
...
SM> I don't insist on using escape-glyph for those chars, indeed (I don't
SM> really care which face is used for them).  What I care about is figuring
SM> out how to define programmatically "chars that are out of place or that
SM> may not be what they look like".

How about this:

show-nonascii-characters: t, 'majority-paragraph, 'majority-line,
                          'minority-line, 'minority-paragraph, 
                          'suspicious, a function, or nil (default) 

show-nonascii-characters-face: customizable from a list of presets,
escape-glyph, or a custom face

The rules:

t = always

majority-paragraph = highlight when they are 0-90% of the paragraph

majority-line = ditto for the line

minority-line = highlight when they are 0-20% of the line

minority-paragraph = ditto for the paragraph

suspicious = 3-5 characters per visual line

function = passed a list of regions of non-ASCII characters

nil = the default, don't highlight

I'm sure we'll tune this but as someone who writes non-ASCII characters
a lot, this would make sense.  I'd personally use t but I can see how
that could be annoying.  I think you're in favor of 'suspicious.

The idea is to make this easy to set up and available in any mode or
globally.  It's easy for an experienced user to set up something
atrocious, but good choices for the colors and the rules are, I think,
the tricky and valuable part.

Ted
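
The percentage rules above all need the share of non-ASCII characters
in a line or paragraph.  A rough sketch of that measurement follows;
the function names and the way the 20% limit is applied are
assumptions for illustration, not an existing Emacs option.

```elisp
(defun my-nonascii-ratio (string)
  "Return the fraction of characters in STRING that are non-ASCII."
  (if (string= string "")
      0.0
    (let ((count 0))
      ;; `append' turns the string into a list of character codes.
      (dolist (ch (append string nil))
        (when (> ch 127)
          (setq count (1+ count))))
      (/ (float count) (length string)))))

(defun my-minority-line-p (line)
  "Return non-nil if LINE has some non-ASCII chars, but at most 20%."
  (let ((ratio (my-nonascii-ratio line)))
    (and (> ratio 0.0) (<= ratio 0.2))))
```

Under this reading of the `minority-line' rule, a mostly ASCII line
containing one accented letter qualifies, while a line written
entirely in Cyrillic does not.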


* RE: highlighting non-ASCII characters
  2010-03-24  0:45                             ` Stefan Monnier
  2010-03-24  1:03                               ` Ted Zlatanov
@ 2010-03-24  2:09                               ` Drew Adams
  2010-03-24  5:00                               ` Stephen J. Turnbull
  2010-03-24  9:28                               ` Juri Linkov
  3 siblings, 0 replies; 182+ messages in thread
From: Drew Adams @ 2010-03-24  2:09 UTC (permalink / raw)
  To: 'Stefan Monnier'; +Cc: 'Ted Zlatanov', emacs-devel

> >> > In my case, I customize `escape-glyph' to be something very 
> >> > noticeable (orange background, with blue foreground and
> >> > 1-pixel blue box). I want to see something like ^G really
> >> > stand out.  I don't want to see
> >> > non-ascii chars handled the same way.
> >>
> >> Actually, in the original case (a weird non-ASCII K in the 
> >> middle of ASCII text), I would want it to stand out.
> >> The problem there 
> >> is how to define "weird".
> 
> > Whether you want non-ASCII chars to also stand out is a separate
> > question. My point is that I don't want to see them handled 
> > _the same way_ as ^G et compagnie.
> 
> What I'm saying is that there are two issues: non-ASCII chars 
> in general (which I personally don't want to display in any
> special manner:
> they're just as normal as ASCII chars), and then there are "chars that
> are out of place or that may not be what they look like", such as the
> weird "K" in the other message's "OK" (which to me, is similar to the
> NBSP char in that it is meant to be displayed in the same way as some
> other char, so we want to call the attention of the user to the
> difference).
> 
> > Having separate faces lets users get the behavior they want.  If you
> > then want to customize the two faces to look the same, no problem.
> 
> I don't insist on using escape-glyph for those chars, indeed (I don't
> really care which face is used for them).  What I care about 
> is figuring out how to define programmatically "chars that are
> out of place or that may not be what they look like".

Sounds good to me. When you figure it out, you can have a way to highlight
"weird" non-ASCII that's different from escape-glyph. And if someone also wants
a way to highlight non-weird non-ASCII, that can be done too. ;-) 






* Re: highlighting non-ASCII characters
  2010-03-24  1:03                               ` Ted Zlatanov
@ 2010-03-24  2:47                                 ` Stefan Monnier
  2010-03-24  4:20                                   ` Eli Zaretskii
  2010-03-24 10:05                                   ` Ted Zlatanov
  0 siblings, 2 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-24  2:47 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> show-nonascii-characters: t, 'majority-paragraph, 'majority-line,
>                           'minority-line, 'minority-paragraph,
>                           'suspicious, a function, or nil (default)

The name is wrong, I think: I'd probably want to highlight ASCII chars
that are out of place, just as with non-ASCII chars.

Also, I'm not sure if proportion compared to total text (or line) is
a good metric to decide whether it's suspicious.  I don't have much
better to suggest, tho.


        Stefan





* Re: Translation of http status code to text
  2010-03-23 15:02                 ` Ted Zlatanov
@ 2010-03-24  3:22                   ` Stefan Monnier
  2010-03-24 17:35                   ` Glenn Morris
  1 sibling, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-24  3:22 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From the perspective of the programmer it's better to see the symbols
> (grep won't work with this approach).  Also "OK" looks weird as the "ok"
> status code and "time-out" in the error code is odd IMHO.  These are
> matters of taste but I hope you'll allow that the manually generated
> table is a little better.  It only has to be done once, after all.

ok,


        Stefan





* Re: highlighting non-ASCII characters
  2010-03-24  2:47                                 ` Stefan Monnier
@ 2010-03-24  4:20                                   ` Eli Zaretskii
  2010-03-24  5:14                                     ` Jason Rumney
  2010-03-24 10:05                                   ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2010-03-24  4:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tzz, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 23 Mar 2010 22:47:37 -0400
> Cc: emacs-devel@gnu.org
> 
> Also, I'm not sure if proportion compared to total text (or line) is
> a good metric to decide whether it's suspicious.  I don't have much
> better to suggest, tho.

If we go for such a metric, it would need to be augmented by a
database of words where a small number of such characters is
``normal'', not to be highlighted.  This is for words like naïve.
Otherwise the feature will be an annoyance.
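
The exception database could be as simple as a word list consulted
before highlighting.  The sketch below is hypothetical throughout:
the variable, its two sample entries and the predicate name are
assumptions, and a real list would need to be far larger and probably
language-specific.

```elisp
(defvar my-normal-nonascii-words '("na\u00efve" "caf\u00e9")
  "Hypothetical list of words whose non-ASCII characters are normal.")

(defun my-suspicious-word-p (word)
  "Return non-nil if WORD has non-ASCII chars and is not a known word."
  ;; Test the original WORD first, then compare its lowercase form
  ;; against the exception list.
  (and (string-match-p "[[:nonascii:]]" word)
       (not (member (downcase word) my-normal-nonascii-words))))
```

With this list, "naïve" is left alone, pure ASCII words are never
flagged, and any other word containing a non-ASCII character is
reported.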






* Re: highlighting non-ASCII characters
  2010-03-24  0:45                             ` Stefan Monnier
  2010-03-24  1:03                               ` Ted Zlatanov
  2010-03-24  2:09                               ` Drew Adams
@ 2010-03-24  5:00                               ` Stephen J. Turnbull
  2010-03-24  9:28                               ` Juri Linkov
  3 siblings, 0 replies; 182+ messages in thread
From: Stephen J. Turnbull @ 2010-03-24  5:00 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 'Ted Zlatanov', Drew Adams, emacs-devel

Stefan Monnier writes:

 > I don't insist on using escape-glyph for those chars, indeed (I don't
 > really care which face is used for them).  What I care about is figuring
 > out how to define programmatically "chars that are out of place or that
 > may not be what they look like".

There were long threads on Python-dev about this with respect to the
PEPs implementing Unicode.  The bottom line was basically that the
recommendations of the Unicode Security Considerations UTR #36 should
be followed with respect to "characters that may not be what they look
like".

"Out of place" is harder.






* Re: highlighting non-ASCII characters
  2010-03-24  4:20                                   ` Eli Zaretskii
@ 2010-03-24  5:14                                     ` Jason Rumney
  2010-03-24 13:25                                       ` Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Jason Rumney @ 2010-03-24  5:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tzz, Stefan Monnier, emacs-devel

On 24/03/2010 12:20, Eli Zaretskii wrote:

> If we go for such a metric, it would need to be augmented by a
> database of words where a small number of such characters is
> ``normal'', not to be highlighted.  This is for words like naïve.
> Otherwise the feature will be an annoyance.
>    

It's also dependent on which characters they are - Cyrillic, Han, Greek, 
Hebrew etc should be expected to appear in long runs, perhaps with runs 
of ASCII and/or other characters interleaved.  Latin-1 on the other hand 
would normally appear individually or in very short runs mixed in with 
ASCII.

There is no single heuristic that can be used to identify "suspicious" 
characters.


* Re: highlighting non-ASCII characters
  2010-03-23 14:22                     ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
  2010-03-23 16:50                       ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams
@ 2010-03-24  9:27                       ` Juri Linkov
  1 sibling, 0 replies; 182+ messages in thread
From: Juri Linkov @ 2010-03-24  9:27 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> I can set up font-lock for myself, of course, but was wondering
> if Emacs has a simple built-in way to highlight them.

This is what `hi-lock.el' is intended for.

You can put (highlight-regexp "[[:nonascii:]]" 'escape-glyph)
in any hook you want, with any face you want.  Or add file-local
patterns like Hi-lock:(("[[:nonascii:]]" (0 (quote escape-glyph) t))).
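
A minimal sketch of the hook approach, assuming `emacs-lisp-mode-hook'
as the hook of choice.  The helper name `my-highlight-nonascii' is
hypothetical and not part of hi-lock.el.

```elisp
;; Sketch: highlight every non-ASCII character in Emacs Lisp buffers.
(require 'hi-lock)

(defun my-highlight-nonascii ()
  "Highlight non-ASCII characters in the current buffer.
`my-highlight-nonascii' is a hypothetical helper name."
  (highlight-regexp "[[:nonascii:]]" 'escape-glyph))

;; Any mode hook would do; this one is only an example.
(add-hook 'emacs-lisp-mode-hook #'my-highlight-nonascii)
```

Wrapping the call in a named function rather than a lambda makes it
easy to take off the hook again with `remove-hook'.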

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: highlighting non-ASCII characters
  2010-03-24  0:45                             ` Stefan Monnier
                                                 ` (2 preceding siblings ...)
  2010-03-24  5:00                               ` Stephen J. Turnbull
@ 2010-03-24  9:28                               ` Juri Linkov
  2010-03-24 13:15                                 ` Ted Zlatanov
  3 siblings, 1 reply; 182+ messages in thread
From: Juri Linkov @ 2010-03-24  9:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 'Ted Zlatanov', emacs-devel

> What I'm saying is that there are two issues: non-ASCII chars in general
> (which I personally don't want to display in any special manner:
> they're just as normal as ASCII chars), and then there are "chars that
> are out of place or that may not be what they look like", such as the
> weird "K" in the other message's "OK" (which to me, is similar to the
> NBSP char in that it is meant to be displayed in the same way as some
> other char, so we want to call the attention of the user to the
> difference).

There is another case where non-highlighted out of place chars
cause problems: http://thread.gmane.org/gmane.emacs.devel/121494

When I look at NBSP chars in cc-engine.el, I see them as `EQUALS SIGN'
(actually `C-u C-x =' says they are `BOX DRAWINGS DOUBLE HORIZONTAL').
That's because of the wrong coding used (there is no coding: tag,
so they are displayed with my default koi8).  When Miles reported this
problem with NBSP chars in cc-engine.el, I found nothing suspicious
because equals-sign chars don't look out of place.  And only when
I grepped for non-ASCII chars, I discovered them.

What should the heuristic be for highlighting these problematic chars
that look like normal chars?

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: highlighting non-ASCII characters
  2010-03-24  2:47                                 ` Stefan Monnier
  2010-03-24  4:20                                   ` Eli Zaretskii
@ 2010-03-24 10:05                                   ` Ted Zlatanov
  2010-03-24 16:21                                     ` Lennart Borgman
  2010-03-25  7:12                                     ` Juri Linkov
  1 sibling, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-24 10:05 UTC (permalink / raw)
  To: emacs-devel

On Tue, 23 Mar 2010 22:47:37 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> show-nonascii-characters: t, 'majority-paragraph, 'majority-line,
>> 'minority-line, 'minority-paragraph,
>> 'suspicious, a function, or nil (default)

SM> The name is wrong, I think: I'd probably want to highlight ASCII chars
SM> that are out of place, just as with non-ASCII chars.

Although Unicode calls them "confusable" I think that's a terrible name.
So how about show-out-of-place-glyphs as an alist and 'homoglyphs as
a key option (see http://en.wikipedia.org/wiki/Homoglyph and, amazingly,
http://homoglyphs.net/)?

show-out-of-place-glyphs: alist; keys can be 'ascii, 'nonascii,
or 'homoglyphs.  Maybe we can also allow a general regex.

Values can be 'always, 'majority-paragraph, 'majority-line,
'minority-line, 'minority-paragraph, 'suspicious (with the same rules I
proposed earlier).  A function should also be possible.  Optional second
value is a face, defaulting to `out-of-place-glyph'.

That lets us map an interesting class of characters to a heuristic that
determines whether they are out of place.

So Stefan might have (and this could be the Emacs default)

(setq show-out-of-place-glyphs '((homoglyphs suspicious)))

but I would have

(setq show-out-of-place-glyphs '((nonascii always face1)))

which includes Stefan's setting.
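
A minimal sketch of how such an alist could be read, assuming each
entry has the shape (CLASS HEURISTIC [FACE]).  The names
`my-out-of-place-glyphs' and `my-glyph-rule' are hypothetical; nothing
like this exists in Emacs yet.

```elisp
;; Hypothetical option; each entry is (CLASS HEURISTIC [FACE]).
(defvar my-out-of-place-glyphs
  '((homoglyphs suspicious)
    (nonascii always face1))
  "Sketch of the proposed alist: character class -> heuristic and face.")

(defun my-glyph-rule (class)
  "Return (HEURISTIC FACE) for CLASS, or nil if CLASS is not configured.
FACE falls back to `out-of-place-glyph', the default named in the proposal."
  (let ((entry (assq class my-out-of-place-glyphs)))
    (when entry
      (list (nth 1 entry)
            (or (nth 2 entry) 'out-of-place-glyph)))))
```

With this table, (my-glyph-rule 'homoglyphs) gives the heuristic
`suspicious' with the default face, and an unconfigured class such as
'ascii gives nil.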

SM> Also, I'm not sure if proportion compared to total text (or line) is
SM> a good metric to decide whether it's suspicious.  I don't have much
SM> better to suggest, tho.

I based it on what I would find useful.  I think the majority of people
will want 'suspicious and let Emacs choose a default.  So maybe the
{majority,minority}-* options are superfluous.
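
For reference, a proportion-based rule such as 'minority-line could be
sketched as below.  The function names and the 0.5 threshold are
assumptions made for illustration, not part of any existing package.

```elisp
(defun my-nonascii-ratio (string)
  "Return the fraction of characters in STRING that are non-ASCII.
Return 0.0 for the empty string."
  (if (zerop (length string))
      0.0
    (let ((count 0))
      (dolist (ch (string-to-list string))
        (when (> ch 127)
          (setq count (1+ count))))
      (/ (float count) (length string)))))

(defun my-minority-line-p (line)
  "Return non-nil if non-ASCII chars are present but a minority in LINE.
The 0.5 threshold is an assumed cut-off."
  (let ((ratio (my-nonascii-ratio line)))
    (and (> ratio 0.0) (< ratio 0.5))))
```

A word like "naïve" would count as a minority line under this rule,
which is exactly the false positive Eli describes.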

On Wed, 24 Mar 2010 06:20:51 +0200 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> If we go for such a metric, it would need to be augmented by a
EZ> database of words where a small number of such characters is
EZ> ``normal'', not to be highlighted.  This is for words like naïve.
EZ> Otherwise the feature will be an annoyance.

That's in the extended ASCII charset which would probably be included in
the ASCII definition above, although it certainly has homoglyphs to
upper-range Unicode (I revised the proposal to distinguish between
highlighting homoglyphs and non-ASCII).  I think regular English doesn't
have many common words that would be outside the extended ASCII charset.

On Wed, 24 Mar 2010 13:14:13 +0800 Jason Rumney <jasonr@gnu.org> wrote: 

JR> It's also dependent on which characters they are - Cyrillic, Han,
JR> Greek, Hebrew etc should be expected to appear in long runs, perhaps
JR> with runs of ASCII and/or other characters interleaved.  Latin-1 on
JR> the other hand would normally appear individually or in very short
JR> runs mixed in with ASCII.

Agreed, and that can be fine-tuned.

JR> There is no single heuristic that can be used to identify "suspicious"
JR> characters.

So we'll provide several.  I'd rather have something useful than try to
make it perfect.

On Tue, 23 Mar 2010 19:09:18 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: 

>> What I'm saying is that there are two issues: non-ASCII chars in
>> general (which I personally don't want to display in any special
>> manner: they're just as normal as ASCII chars), and then there are
>> "chars that are out of place or that may not be what they look like",
>> such as the weird "K" in the other message's "OK" (which to me, is
>> similar to the NBSP char in that it is meant to be displayed in the
>> same way as some other char, so we want to call the attention of the
>> user to the difference).

I hope you'll go along with "homoglyphs" as I propose, I think that's
what you mean :)

On Wed, 24 Mar 2010 14:00:47 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: 

SJT> There were long threads on Python-dev about this with respect to the
SJT> PEPs implementing Unicode.  The bottom line was basically that the
SJT> recommendations of the Unicode Security Considerations UTR #36 should
SJT> be followed with respect to "characters that may not be what they look
SJT> like".

This is relevant, thanks for the pointer.  See

http://unicode.org/reports/tr36/

which links to:

http://www.unicode.org/reports/tr39/#Confusable_Detection

which can also be used to build a table of homoglyphs (as in http://homoglyphs.net).

Thanks!
Ted


* Re: highlighting non-ASCII characters
  2010-03-24  9:28                               ` Juri Linkov
@ 2010-03-24 13:15                                 ` Ted Zlatanov
  0 siblings, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-24 13:15 UTC (permalink / raw)
  To: emacs-devel

On Wed, 24 Mar 2010 11:28:16 +0200 Juri Linkov <juri@jurta.org> wrote: 

JL> There is another case where non-highlighted out of place chars
JL> cause problems: http://thread.gmane.org/gmane.emacs.devel/121494

JL> When I look at NBSP chars in cc-engine.el, I see them as `EQUALS SIGN'
JL> (actually `C-u C-x =' says they are `BOX DRAWINGS DOUBLE HORIZONTAL').
JL> That's because of the wrong coding used (there is no coding: tag,
JL> so they are displayed with my default koi8).  When Miles reported this
JL> problem with NBSP chars in cc-engine.el, I found nothing suspicious
JL> because equals-sign chars don't look out of place.  And only when
JL> I grepped for non-ASCII chars, I discovered them.

JL> What should the heuristic be for highlighting these problematic chars
JL> that look like normal chars?

If you look at my latest proposal for showing out-of-place characters,
'(homoglyphs suspicious) will probably work for this case as well.

Ted


* Re: highlighting non-ASCII characters
  2010-03-24  5:14                                     ` Jason Rumney
@ 2010-03-24 13:25                                       ` Stefan Monnier
  2010-03-24 15:06                                         ` Jason Rumney
  0 siblings, 1 reply; 182+ messages in thread
From: Stefan Monnier @ 2010-03-24 13:25 UTC (permalink / raw)
  To: Jason Rumney; +Cc: Eli Zaretskii, tzz, emacs-devel

>> If we go for such a metric, it would need to be augmented by a
>> database of words where a small number of such characters is
>> ``normal'', not to be highlighted.  This is for words like naïve.
>> Otherwise the feature will be an annoyance.
> It's also dependent on which characters they are - Cyrillic, Han, Greek,
> Hebrew etc should be expected to appear in long runs, perhaps with runs of
> ASCII and/or other characters interleaved.  Latin-1 on the other hand would
> normally appear individually or in very short runs mixed in with ASCII.

> There is no single heuristic that can be used to identify
> "suspicious" characters.

Maybe within the scope of "social engineering attacks where some text
(e.g. a host name) looks like something well-known but actually uses
funny Unicode chars", some people have come up with something like
a list of known problems (typically Unicode points that are usually
displayed identically).


        Stefan





* Re: highlighting non-ASCII characters
  2010-03-24 13:25                                       ` Stefan Monnier
@ 2010-03-24 15:06                                         ` Jason Rumney
  2010-03-24 19:47                                           ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Jason Rumney @ 2010-03-24 15:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, tzz, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> There is no single heuristic that can be used to identify
>> "suspicious" characters.
>
> Maybe within the scope of "social engineering attacks where some text
> (e.g. a host name) looks like something well-known but actually uses
> funny Unicode chars", some people have come up with something like
> a list of known problems (typically Unicode points that are usually
> displayed identically).

Yes, within that context suspicious has a clear enough meaning. But the
cases others were discussing seemed more like cases of mismatched
encodings that they wanted to be flagged more obviously.





* Re: highlighting non-ASCII characters
  2010-03-24 10:05                                   ` Ted Zlatanov
@ 2010-03-24 16:21                                     ` Lennart Borgman
  2010-03-24 19:34                                       ` Lennart Borgman
  2010-03-25  7:11                                       ` Juri Linkov
  2010-03-25  7:12                                     ` Juri Linkov
  1 sibling, 2 replies; 182+ messages in thread
From: Lennart Borgman @ 2010-03-24 16:21 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2010/3/24 Ted Zlatanov <tzz@lifelogs.com>:
> On Wed, 24 Mar 2010 14:00:47 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
>
> SJT> There were long threads on Python-dev about this with respect to the
> SJT> PEPs implementing Unicode.  The bottom line was basically that the
> SJT> recommendations of the Unicode Security Considerations UTR #36 should
> SJT> be followed with respect to "characters that may not be what they look
> SJT> like".
>
> This is relevant, thanks for the pointer.  See
>
> http://unicode.org/reports/tr36/
>
> which links to:
>
> http://www.unicode.org/reports/tr39/#Confusable_Detection
>
> which can also be used to build a table of homoglyphs (as in http://homoglyphs.net).


Maybe "Recommended Identifier Profiles for IDN" should be implemented
in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)

How about a bool vector (see make-bool-vector) for this?





* Re: Translation of http status code to text
  2010-03-23 15:02                 ` Ted Zlatanov
  2010-03-24  3:22                   ` Stefan Monnier
@ 2010-03-24 17:35                   ` Glenn Morris
  2010-03-24 19:37                     ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Glenn Morris @ 2010-03-24 17:35 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel


+	;; other status symbols: jewelry and luxury cars
+	(status-symbol (cadr (assq status-number url-http-codes))))

`status-number' doesn't seem to be defined anywhere; should it be
url-http-response-status ?





* Re: highlighting non-ASCII characters
  2010-03-24 16:21                                     ` Lennart Borgman
@ 2010-03-24 19:34                                       ` Lennart Borgman
  2010-03-26 17:35                                         ` Ted Zlatanov
  2010-03-25  7:11                                       ` Juri Linkov
  1 sibling, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-24 19:34 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Wed, Mar 24, 2010 at 5:21 PM, Lennart Borgman
<lennart.borgman@gmail.com> wrote:
>
> Maybe "Recommended Identifier Profiles for IDN" should be implemented
> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)
>
> How about a bool vector (see make-bool-vector) for this?

I wrote a little file for that part. (Too much trouble trying to
report some difficult bugs I have not reported yet. I need to do some
more relaxing thing like this... ;-)

The attached file sets up IDN chars as above. How about defining a
character class [:idnchars:]?

[-- Attachment #2: idn.el --]
[-- Type: text/plain, Size: 3460 bytes --]

;;; idn.el --- Recommended Identifier Profiles for IDN
;;
;; Author: Lennart Borgman (lennart O borgman A gmail O com)
;; Created: 2010-03-24 Wed
;; Version: 0.1
;; Last-Updated: 2010-03-24 Wed
;; URL:
;; Keywords:
;; Compatibility:
;;
;; Features that might be required by this library:
;;
;;   None
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;; Functions for handling IDN chars defined by
;; `http://www.unicode.org/reports/tr39/'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change log:
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

(defvar uts39-datadir "c:/uts39/data/")

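(defun idn-init (bv)
  "Set the elements of bool vector BV that are listed in idnchars.txt.
Read the file from `uts39-datadir'.  Return t if the file was read,
nil if it could not be found."
  (let* ((idnchars-file (expand-file-name "idnchars.txt" uts39-datadir))
         (idnchars-old (find-buffer-visiting idnchars-file))
         (idnchars-buf (or idnchars-old
                           (if (file-exists-p idnchars-file)
                               (find-file-noselect idnchars-file)
                             ;; `message' returns its string, so return
                             ;; nil explicitly to signal failure.
                             (message "Can't find file %S" idnchars-file)
                             nil)))
         ;; A data line starts with a code point (4 to 6 hex digits),
         ;; optionally followed by ".." and the end of a range.
         (range-patt (rx bol
                         (group (repeat 4 6 (any xdigit)))
                         (optional ".."
                                   (group (repeat 4 6 (any xdigit)))))))
    (when idnchars-buf
      (with-current-buffer idnchars-buf
        (save-excursion
          (save-restriction
            (widen)
            (goto-char (point-min))
            (while (re-search-forward range-patt nil t)
              ;; Group 1 is the first code point; group 2, if present,
              ;; is the last code point of the range.
              (let* ((str-beg (match-string 1))
                     (str-end (match-string 2))
                     (beg (string-to-number str-beg 16))
                     (end (if str-end (string-to-number str-end 16) beg)))
                (dotimes (ii (1+ (- end beg)))
                  ;; Skip code points that do not fit in BV.
                  (when (< (+ ii beg) (length bv))
                    (aset bv (+ ii beg) t))))))))
      (unless idnchars-old (kill-buffer idnchars-buf))
      t)))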

(defconst idn-chars
  ;; One element per BMP code point, U+0000..U+FFFF.
  (let ((bv (make-bool-vector (* 256 256) nil)))
    (when (idn-init bv)
      ;; Quick sanity check: U+002C (comma, 44) must not be an IDN
      ;; char, while U+002D (hyphen-minus, 45) must be one.
      (unless (and (not (aref bv 44))
                   (aref bv 45))
        (message "idn-chars: Bad idn data in file idnchars.txt"))
      bv))
  "Boolean vector with recommended IDN chars.
See URL `http://www.unicode.org/reports/tr39/'.

Data is initialized from the file idnchars.txt in the directory
`uts39-datadir'.  This file is fetched from the above location.")

(provide 'idn)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; idn.el ends here


* Re: Translation of http status code to text
  2010-03-24 17:35                   ` Glenn Morris
@ 2010-03-24 19:37                     ` Ted Zlatanov
  2010-03-25  1:16                       ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-24 19:37 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

On Wed, 24 Mar 2010 13:35:10 -0400 Glenn Morris <rgm@gnu.org> wrote: 

GM> +	;; other status symbols: jewelry and luxury cars
GM> +	(status-symbol (cadr (assq status-number url-http-codes))))

GM> `status-number' doesn't seem to be defined anywhere; should it be
GM> url-http-response-status ?

Sorry about that, I had the right version but screwed up the commit.  I
can't commit currently but maybe you could?  The (trivial) patch is below.

Thanks
Ted


[-- Attachment #2: url-http-status.patch --]
[-- Type: text/x-diff, Size: 1082 bytes --]

=== modified file 'lisp/url/ChangeLog'
--- lisp/url/ChangeLog	2010-03-24 09:30:28 +0000
+++ lisp/url/ChangeLog	2010-03-24 19:29:55 +0000
@@ -1,5 +1,9 @@
 2010-03-24  Teodor Zlatanov  <tzz@lifelogs.com>
 
+	* url-http.el (url-http-parse-headers): Fix wrong variable name.
+
+2010-03-24  Teodor Zlatanov  <tzz@lifelogs.com>
+
 	* url-http.el (url-http-codes): New variable to hold a mapping of
 	HTTP status codes' numbers, their symbolic name, and their text.
 	(url-http-parse-headers): Use it, leaving the original numeric

=== modified file 'lisp/url/url-http.el'
--- lisp/url/url-http.el	2010-03-24 09:30:28 +0000
+++ lisp/url/url-http.el	2010-03-24 19:29:08 +0000
@@ -486,7 +486,7 @@
 	(class nil)
 	(success nil)
 	;; other status symbols: jewelry and luxury cars
-	(status-symbol (cadr (assq status-number url-http-codes))))
+	(status-symbol (cadr (assq url-http-response-status url-http-codes))))
     (setq class (/ url-http-response-status 100))
     (url-http-debug "Parsed HTTP headers: class=%d status=%d" class url-http-response-status)
     (url-http-handle-cookies)
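
The lookup that this patch corrects can be tried in isolation.  The
sketch below uses a hypothetical three-entry table, `my-http-codes', in
the (NUMBER SYMBOL TEXT) shape the ChangeLog describes; the entries and
helper names are illustrative and are not copied from url-http.el.

```elisp
;; Hypothetical excerpt in the (NUMBER SYMBOL TEXT) shape described in
;; the ChangeLog entry; the real table is `url-http-codes'.
(defvar my-http-codes
  '((200 OK "OK")
    (404 not-found "Not found")
    (500 internal-server-error "Internal server error")))

(defun my-http-status-symbol (status)
  "Return the symbolic name for numeric STATUS, or nil if unknown."
  (cadr (assq status my-http-codes)))

(defun my-http-status-text (status)
  "Return the readable text for numeric STATUS, or nil if unknown."
  (nth 2 (assq status my-http-codes)))
```

`assq' is enough here because the keys are small integers, which
compare with `eq'.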



* Re: highlighting non-ASCII characters
  2010-03-24 15:06                                         ` Jason Rumney
@ 2010-03-24 19:47                                           ` Ted Zlatanov
  0 siblings, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-24 19:47 UTC (permalink / raw)
  To: emacs-devel

On Wed, 24 Mar 2010 23:06:48 +0800 Jason Rumney <jasonr@gnu.org> wrote: 

JR> Yes, within that [social engineering] context suspicious has a clear
JR> enough meaning. But the cases others were discussing seemed more
JR> like cases of mismatched encodings that they wanted to be flagged
JR> more obviously.

All of those use cases are valid, but the social engineering is IMO the
most important reason why we should consider it for first-class
inclusion in Emacs.

On Wed, 24 Mar 2010 17:21:40 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> Maybe "Recommended Identifier Profiles for IDN" should be implemented
LB> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)

LB> How about a bool vector (see make-bool-vector) for this?

I don't know if we should be planning the storage mechanism yet, but you
can certainly put that together.  Don't optimize this prematurely, just
use a list at first.

It's important to decide what we're doing with this feature; as you see
there are several use cases and I want to be sure we can cover them all
(hence my proposal with several classes of characters and matching
rules) before jumping into the implementation.  I'm pretty sure we'll at
least need the "confusables" or homoglyphs so maybe that's a good start.

Ted


* Re: Translation of http status code to text
  2010-03-24 19:37                     ` Ted Zlatanov
@ 2010-03-25  1:16                       ` Ted Zlatanov
  0 siblings, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-25  1:16 UTC (permalink / raw)
  To: emacs-devel

On Wed, 24 Mar 2010 14:37:24 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> On Wed, 24 Mar 2010 13:35:10 -0400 Glenn Morris <rgm@gnu.org> wrote: 
GM> +	;; other status symbols: jewelry and luxury cars
GM> +	(status-symbol (cadr (assq status-number url-http-codes))))

GM> `status-number' doesn't seem to be defined anywhere; should it be
GM> url-http-response-status ?

TZ> Sorry about that, I had the right version but screwed up the commit.  I
TZ> can't commit currently but maybe you could?  The (trivial) patch is below.

It's fixed.

Ted






* Re: highlighting non-ASCII characters
  2010-03-24 16:21                                     ` Lennart Borgman
  2010-03-24 19:34                                       ` Lennart Borgman
@ 2010-03-25  7:11                                       ` Juri Linkov
  2010-03-25 14:07                                         ` Lennart Borgman
  1 sibling, 1 reply; 182+ messages in thread
From: Juri Linkov @ 2010-03-25  7:11 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel

> Maybe "Recommended Identifier Profiles for IDN" should be implemented
> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)

I think it would be more useful to implement this spec:
http://www.unicode.org/reports/tr39/data/confusables.txt
"Visually Confusable Characters: Provides a mapping for visual
confusables for use in further restricting identifiers for security".

It's very large, but it seems it's still incomplete.  I can't find
a "confusable" mapping for the problem I reported:

  BOX DRAWINGS DOUBLE HORIZONTAL   ->   EQUALS SIGN
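
A minimal sketch of a confusables lookup with a locally added mapping.
The three-entry table is illustrative only and is not copied from
confusables.txt; `my-confusables' and `my-find-confusables' are
hypothetical names.

```elisp
;; Illustrative table only, NOT copied from confusables.txt.
;; Each entry is (CONFUSABLE-CHAR . LOOKALIKE-CHAR).
(defvar my-confusables
  '((#x212A . ?K)   ; KELVIN SIGN vs LATIN CAPITAL LETTER K
    (#x0430 . ?a)   ; CYRILLIC SMALL LETTER A vs LATIN SMALL LETTER A
    (#x2550 . ?=))  ; local addition: BOX DRAWINGS DOUBLE HORIZONTAL vs =
  "Sample mapping from confusable characters to their lookalikes.")

(defun my-find-confusables (string)
  "Return a list of (INDEX CHAR LOOKALIKE) for confusable chars in STRING."
  (let ((index 0)
        (result '()))
    (dolist (ch (string-to-list string))
      (let ((hit (assq ch my-confusables)))
        (when hit
          (push (list index ch (cdr hit)) result)))
      (setq index (1+ index)))
    (nreverse result)))
```

For a table the size of the real data file, a hash table or char-table
would likely be a better fit than an alist.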

-- 
Juri Linkov
http://www.jurta.org/emacs/


* Re: highlighting non-ASCII characters
  2010-03-24 10:05                                   ` Ted Zlatanov
  2010-03-24 16:21                                     ` Lennart Borgman
@ 2010-03-25  7:12                                     ` Juri Linkov
  1 sibling, 0 replies; 182+ messages in thread
From: Juri Linkov @ 2010-03-25  7:12 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> show-out-of-place-glyphs: alist; keys can be 'ascii, 'nonascii,
> or 'homoglyphs.  Maybe we can also allow a general regex.
>
> Values can be 'always, 'majority-paragraph, 'majority-line,
> 'minority-line, 'minority-paragraph, 'suspicious (with the same rules I
> proposed earlier).  A function should also be possible.  Optional second
> value is a face, defaulting to `out-of-place-glyph'.

If you are going to implement this package, I suggest you look at
whitespace.el: how it defines where to highlight whitespace characters,
and how it defines rules for different classes of whitespace.

I don't propose to extend it to visualize suspicious characters
and homoglyphs in addition to whitespace, but really it has very
similar functionality.

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: highlighting non-ASCII characters
  2010-03-25  7:11                                       ` Juri Linkov
@ 2010-03-25 14:07                                         ` Lennart Borgman
  2010-03-25 17:32                                           ` Juri Linkov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-25 14:07 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel

On Thu, Mar 25, 2010 at 8:11 AM, Juri Linkov <juri@jurta.org> wrote:
>> Maybe "Recommended Identifier Profiles for IDN" should be implemented
>> in Emacs? (See http://www.unicode.org/reports/tr39/data/idnchars.txt)
>
> I think it would be more useful to implement this spec:
> http://www.unicode.org/reports/tr39/data/confusables.txt
> "Visually Confusable Characters: Provides a mapping for visual
> confusables for use in further restricting identifiers for security".


I have already implemented marking of non-ascii, non-IDN chars (it is
in the nXhtml repository currently).

To me it looks like IDN is the most important. Is not this a
derivative work from "confusables"?

Perhaps it is useful to implement "confusables" too. Do you have any
idea of what to mark there then? (And what would actually differ from
the IDN case?)





* Re: highlighting non-ASCII characters
  2010-03-25 14:07                                         ` Lennart Borgman
@ 2010-03-25 17:32                                           ` Juri Linkov
  2010-03-26  0:32                                             ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Juri Linkov @ 2010-03-25 17:32 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel

> To me it looks like IDN is the most important. Is not this a
> derivative work from "confusables"?

How will IDN help to highlight characters that look like other characters?

> Perhaps it is useful to implement "confusables" too. Do you have any
> idea of what to mark there then?

You can mark all characters mentioned in confusables.txt.

> (And what would actually differ from the IDN case?)

What do you want to use IDN for?

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: highlighting non-ASCII characters
  2010-03-25 17:32                                           ` Juri Linkov
@ 2010-03-26  0:32                                             ` Lennart Borgman
  2010-03-26 13:38                                               ` Stephen Berman
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-26  0:32 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Ted Zlatanov, emacs-devel

On Thu, Mar 25, 2010 at 6:32 PM, Juri Linkov <juri@jurta.org> wrote:
>> To me it looks like IDN is the most important. Is not this a
>> derivative work from "confusables"?
>
> How IDN will help to highlight characters that look like other characters?

All this is new to me so I might have misunderstood it. However it
looks to me like the IDN (which I have supposed means IDentifier Name,
but maybe that is wrong) chars are selected so that the characters
chosen are not easily mistaken for other chars.

And I suppose that as many characters as possible have been included in IDN.

If my assumptions are right, would then the non-IDN chars not be the
ones to mark?

>> Perhaps it is useful to implement "confusables" too. Do you have any
>> idea of what to mark there then?
>
> You can mark all characters mentioned in confusables.txt.

Yes, I thought of that (if someone can explain why and when it may be useful).

>> (And what would actually differ from the IDN case?)
>
> What do you want to use IDN for?

I thought they were chosen so that URLs, programming symbols etc. can
be created more safely. Isn't that the purpose?


* Re: highlighting non-ASCII characters
  2010-03-26  0:32                                             ` Lennart Borgman
@ 2010-03-26 13:38                                               ` Stephen Berman
  2010-03-26 22:44                                                 ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Stephen Berman @ 2010-03-26 13:38 UTC (permalink / raw)
  To: emacs-devel

On Fri, 26 Mar 2010 01:32:53 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:

> On Thu, Mar 25, 2010 at 6:32 PM, Juri Linkov <juri@jurta.org> wrote:
>>> To me it looks like IDN is the most important. Is not this a
>>> derivative work from "confusables"?
>>
>> How IDN will help to highlight characters that look like other characters?
>
>
> All this is new to me so I might have misunderstood it. However it
> looks to me like the IDN (which I have supposed means IDentifier Name,
> but maybe that is wrong)

Internationalized Domain Names (IDN), see
http://unicode.org/reports/tr36/#international_domain_names

Steve Berman

* Re: highlighting non-ASCII characters
  2010-03-24 19:34                                       ` Lennart Borgman
@ 2010-03-26 17:35                                         ` Ted Zlatanov
  2010-03-26 20:43                                           ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-26 17:35 UTC (permalink / raw)
  To: emacs-devel

On Wed, 24 Mar 2010 20:34:41 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> The attached file sets up IDN chars as above. How about defining a
LB> character class [:idnchars:]?

The IDN character class could be useful.  The list changes so rarely
that it can be hard-coded like the POSIX classes IMO.  I think this
would be done in src/regex.c by defining RECC_IDNCHARS for instance.
This could highlight when non-IDN characters are used in a domain name.
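A minimal sketch of that check, written in Python rather than as a regex.c
class so it can be run on its own. The allow-list below is a hypothetical
stand-in, not the real IDN table, and the function names are made up for
illustration.

```python
# Hypothetical allow-list as inclusive code point ranges. The real list
# would be generated from the IDN data file, not typed in by hand.
ALLOWED_RANGES = [
    (0x002D, 0x002D),  # hyphen-minus
    (0x0030, 0x0039),  # digits 0-9
    (0x0061, 0x007A),  # lowercase a-z
    (0x00E0, 0x00F6),  # a few Latin-1 lowercase letters, for illustration
]


def is_allowed(ch):
    """Return True if CH falls inside one of the allowed ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ALLOWED_RANGES)


def disallowed_in_domain(domain):
    """Return (index, char) pairs for characters outside the allow-list.

    Dots are label separators and are skipped.
    """
    return [(i, ch) for i, ch in enumerate(domain)
            if ch != "." and not is_allowed(ch)]
```

The returned positions are what a highlighter would put a face on. Note
that the sample list only covers lowercase, so a caller would lowercase the
domain first.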

But IDN characters are separate from the "confusables" (homoglyphs) we
should discuss, which are much more problematic and more complex because
they are not just a character class.

On Thu, 25 Mar 2010 09:11:35 +0200 Juri Linkov <juri@jurta.org> wrote: 

JL> I think it would be more useful to implement this spec:
JL> http://www.unicode.org/reports/tr39/data/confusables.txt
JL> "Visually Confusable Characters: Provides a mapping for visual
JL> confusables for use in further restricting identifiers for security".

JL> It's very large, but it seems it's still incomplete.  I can't find
JL> a "confusable" mapping for the problem I reported:

JL>   BOX DRAWINGS DOUBLE HORIZONTAL   ->   EQUALS SIGN

We can have a [:confusable:] character class defined in src/regex.c.
That lets us find these characters.  It could be generated from the TXT
database and augmented with our own mappings.  But there's grouping
information, so maybe that should be available too.  For highlighting we
don't need grouping information, but the user would find it useful to
look at a glyph and find out that it looks like 3 other glyphs.  So this
can be in a Lisp-level data structure like a hashtable with list values.
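As a rough illustration of the grouping idea (in Python, not Lisp, so it
can be run standalone), the sketch below parses lines in the
"source ; target ; type # comment" layout used by confusables.txt and
builds a table from each character to the other characters it resembles.
The sample entries, including the box-drawing one standing in for "our own
mappings", and the function names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sample in the confusables.txt field layout. A real run
# would read the downloaded file instead of this string.
SAMPLE = """\
# comment lines and blank lines are ignored

0441 ; 0063 ; MA # CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
041A ; 004B ; MA # CYRILLIC CAPITAL LETTER KA -> LATIN CAPITAL LETTER K
212A ; 004B ; MA # KELVIN SIGN -> LATIN CAPITAL LETTER K
2550 ; 003D ; MA # local addition: BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN
"""


def decode(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(text):
    """Map each target string to the set of sources that look like it."""
    groups = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        groups[decode(fields[1])].add(decode(fields[0]))
    return groups


def build_lookup(groups):
    """Map every member of a group to a sorted list of its look-alikes."""
    lookup = {}
    for target, sources in groups.items():
        members = sources | {target}
        for ch in members:
            lookup[ch] = sorted(members - {ch})
    return lookup
```

The keys of the lookup table are the set a highlighter would match on; the
values answer the "which other glyphs does this look like" question.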

I looked at whitespace.el and it looks generally suitable for this kind
of highlighting.  I can't decide if the work should augment
whitespace.el or if it should be a new library called visible.el
(because the name whitespace.el is so specific).

On Thu, 25 Mar 2010 15:07:04 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> To me it looks like IDN is the most important. Is not this a
LB> derivative work from "confusables"?

I think they are separate logically.  TR39 cares about "confusables" in
the context of IDN but Emacs has a wider view as a general text editor,
IIUC.

Ted

* Re: highlighting non-ASCII characters
  2010-03-26 17:35                                         ` Ted Zlatanov
@ 2010-03-26 20:43                                           ` Ted Zlatanov
  2010-03-26 22:50                                             ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-26 20:43 UTC (permalink / raw)
  To: emacs-devel

On Fri, 26 Mar 2010 12:35:36 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> We can have a [:confusable:] character class defined in src/regex.c.
TZ> That lets us find these characters.  It could be generated from the TXT
TZ> database and augmented with our own mappings.  But there's grouping
TZ> information, so maybe that should be available too.  For highlighting we
TZ> don't need grouping information, but the user would find it useful to
TZ> look at a glyph and find out that it looks like 3 other glyphs.  So this
TZ> can be in a Lisp-level data structure like a hashtable with list values.

I forgot to mention this RFC is relevant as well, section 2.2.6:

http://www.ietf.org/rfc/rfc4690.txt

Like the IDN character class, the discussion centers on homoglyphs
inside domain names, but it mentions general relationship-based
confusable detection and points to further RFCs.

Ted

* Re: highlighting non-ASCII characters
  2010-03-26 13:38                                               ` Stephen Berman
@ 2010-03-26 22:44                                                 ` Lennart Borgman
  0 siblings, 0 replies; 182+ messages in thread
From: Lennart Borgman @ 2010-03-26 22:44 UTC (permalink / raw)
  To: Stephen Berman; +Cc: emacs-devel

On Fri, Mar 26, 2010 at 2:38 PM, Stephen Berman <stephen.berman@gmx.net> wrote:
>>
>> All this is new to me so I might have misunderstood it. However it
>> looks to me like the IDN (which I have supposed means IDentifier Name,
>> but maybe that is wrong)
>
> Internationalized Domain Names (IDN), see
> http://unicode.org/reports/tr36/#international_domain_names

Oh, thanks. It seems I have mixed up my thoughts about what it could be
used for with its current official intended use.

Still I think it could be a good idea to restrict identifier names to
chars within IDN.

* Re: highlighting non-ASCII characters
  2010-03-26 20:43                                           ` Ted Zlatanov
@ 2010-03-26 22:50                                             ` Lennart Borgman
  2010-03-29 18:38                                               ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-26 22:50 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2010/3/26 Ted Zlatanov <tzz@lifelogs.com>:
> On Fri, 26 Mar 2010 12:35:36 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:
>
> TZ> We can have a [:confusable:] character class defined in src/regex.c.
> TZ> That lets us find these characters.  It could be generated from the TXT
> TZ> database and augmented with our own mappings.  But there's grouping
> TZ> information, so maybe that should be available too.  For highlighting we
> TZ> don't need grouping information, but the user would find it useful to
> TZ> look at a glyph and find out that it looks like 3 other glyphs.  So this
> TZ> can be in a Lisp-level data structure like a hashtable with list values.
>
> I forgot to mention this RFC is relevant as well, section 2.2.6:
>
> http://www.ietf.org/rfc/rfc4690.txt
>
> Like the IDN character class, the discussion centers on homoglyphs
> inside domain names, but it mentions general relationship-based
> confusable detection and points to further RFCs.


Thanks, but how would trying confusable detection differ from marking
chars outside IDN?

I believe marking confusable chars would also mean marking those
confusable chars that have been decided on in IDN.

Perhaps they could also be handled in another way (for example helping
switching/rotating the confusable chars, all or one-by-one). Would
that be useful?

* Re: highlighting non-ASCII characters
  2010-03-26 22:50                                             ` Lennart Borgman
@ 2010-03-29 18:38                                               ` Ted Zlatanov
  2010-03-29 18:48                                                 ` Drew Adams
  2010-03-29 20:19                                                 ` Stefan Monnier
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-29 18:38 UTC (permalink / raw)
  To: emacs-devel

On Fri, 26 Mar 2010 23:50:26 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> Thanks, but how would trying confusable detection differ from marking
LB> chars outside IDN?

They are two different character sets.  The IDN chars are fewer and have
a specific purpose for IDN; confusable characters are any Unicode
characters mapped to homoglyphs.

LB> I believe marking confusable chars would also mean marking those
LB> confusable chars that have been decided on in IDN.

Yes, probably.  But that's accidental.  I still think the character
classes [:idn:] (revised name from before) and [:confusable:] (or
[:homoglyph:]) would make sense as a first step, then we can decide how
to highlight them.

LB> Perhaps they could also be handled in another way (for example helping
LB> switching/rotating the confusable chars, all or one-by-one). Would
LB> that be useful?

I don't know what you mean by switching/rotating, sorry.

LB> Still I think it could be a good idea to restrict identifier names to
LB> chars within IDN.

We're talking about generic text in Emacs, so I think IDN should be
limited to places in Emacs where domain names show up (e.g. URLs).

Ted

* RE: highlighting non-ASCII characters
  2010-03-29 18:38                                               ` Ted Zlatanov
@ 2010-03-29 18:48                                                 ` Drew Adams
  2010-03-29 20:20                                                   ` Stefan Monnier
  2010-03-29 20:19                                                 ` Stefan Monnier
  1 sibling, 1 reply; 182+ messages in thread
From: Drew Adams @ 2010-03-29 18:48 UTC (permalink / raw)
  To: 'Ted Zlatanov', emacs-devel

I haven't been following this thread; sorry.

But it occurred to me that besides different categories of such critters there
might be different levels of fontification details that users might want to see.

For example, for some users or for some purposes, it might be useful to see
different kinds of quote marks distinguished (e.g. different kinds of curly
quotes that might be homoglyphs or curly vs straight quotes, which are not
homoglyphs). For other users or for other purposes such highlighting would be a
distraction.

Font-lock verbosity levels allow for this kind of thing.

* Re: highlighting non-ASCII characters
  2010-03-29 18:38                                               ` Ted Zlatanov
  2010-03-29 18:48                                                 ` Drew Adams
@ 2010-03-29 20:19                                                 ` Stefan Monnier
  2010-03-29 20:51                                                   ` Lennart Borgman
  2010-03-29 21:05                                                   ` Ted Zlatanov
  1 sibling, 2 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-29 20:19 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> Yes, probably.  But that's accidental.  I still think the character
> classes [:idn:] (revised name from before) and [:confusable:] (or
> [:homoglyph:]) would make sense as a first step, then we can decide how
> to highlight them.

The homoglyph data would be a useful starting point for the feature
I imagine, indeed.  But from the message that started this thread, "K"
is a homoglyph, yet highlighting it everywhere doesn't sound like a good
idea, so basically we need to associate with each homoglyph char
a context where it is expected and only highlight it when it appears in
a different context (or maybe rather when it appears in the context of
its peer).

        Stefan

* Re: highlighting non-ASCII characters
  2010-03-29 18:48                                                 ` Drew Adams
@ 2010-03-29 20:20                                                   ` Stefan Monnier
  0 siblings, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-29 20:20 UTC (permalink / raw)
  To: Drew Adams; +Cc: 'Ted Zlatanov', emacs-devel

> Font-lock verbosity levels allow for this kind of thing.

font-lock verbosity level must die!


        Stefan "because it's too coarse: just because you want
                fine-grained for some things shouldn't force you to
                suffer from the information overload of other things"

* Re: highlighting non-ASCII characters
  2010-03-29 20:19                                                 ` Stefan Monnier
@ 2010-03-29 20:51                                                   ` Lennart Borgman
  2010-03-30 13:22                                                     ` Ted Zlatanov
  2010-03-29 21:05                                                   ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2010-03-29 20:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Ted Zlatanov, emacs-devel

On Mon, Mar 29, 2010 at 10:19 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>> Yes, probably.  But that's accidental.  I still think the character
>> classes [:idn:] (revised name from before) and [:confusable:] (or
>> [:homoglyph:]) would make sense as a first step, then we can decide how
>> to highlight them.
>
> The homoglyph data would be a useful starting point for the feature
> I imagine, indeed.  But from the message that started this thread, "K"
> is a homoglyph, yet highlighting it everywhere doesn't sound like a good
> idea, so basically we need to associate with each homoglyph char
> a context where it is expected and only highlight it when it appears in
> a different context (or maybe rather when it appears in the context of
> its peer).

I agree (and I agree to most of what Ted said too).

However just highlighting non-IDN chars seems useful enough. I think it
should be done everywhere (because it is simple and probably does not
hurt; IDN seems mostly useful for variables too, for example), or
optionally only in strings (the only URL context we can actually
guess).

For the moment I have implemented this as fontification. Having it as
a char class that is flexibly initialized would be better. Perhaps my
routines for reading the chars can be used there too.

The homoglyph context thing is maybe more difficult. I did not try to
read carefully so I do not know much. I guess there is something like
char value ranges to use? Does anyone know of a document from which
those ranges can be read (by some elisp code)?

* Re: highlighting non-ASCII characters
  2010-03-29 20:19                                                 ` Stefan Monnier
  2010-03-29 20:51                                                   ` Lennart Borgman
@ 2010-03-29 21:05                                                   ` Ted Zlatanov
  2010-03-29 21:31                                                     ` Lennart Borgman
                                                                       ` (2 more replies)
  1 sibling, 3 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-29 21:05 UTC (permalink / raw)
  To: emacs-devel

On Mon, 29 Mar 2010 16:19:07 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> Yes, probably.  But that's accidental.  I still think the character
>> classes [:idn:] (revised name from before) and [:confusable:] (or
>> [:homoglyph:]) would make sense as a first step, then we can decide how
>> to highlight them.

SM> The homoglyph data would be a useful starting point for the feature
SM> I imagine, indeed.  But from the message that started this thread, "K"
SM> is a homoglyph, yet highlighting it everywhere doesn't sound like a good
SM> idea, so basically we need to associate with each homoglyph char
SM> a context where it is expected and only highlight it when it appears in
SM> a different context (or maybe rather when it appears in the context of
SM> its peer).

(I had a "lightbulb moment" I should have had long ago: "confusable" is
a character property, while "homoglyph" is a glyph property; thus the
character class should be [:confusable:] and "homoglyph" should be used
in the face name as long as it's not, er, confusing.)

I know the goal is to match in context and I may take whitespace.el as a
guide in this regard, but I have to start with a [:confusable:]
character class.  I'll also add a [:idn:] class as discussed.  Is that
OK or are you concerned about code bloat in regex.c?

Afterwards we can set up the map between each confusable character and
the set of characters it can match; this is also in the data file.  That
lets us look in context and apply the rules I proposed.  So for example
if Cyrillic K is confusable with Roman K and we see Roman characters
around, that's suspicious.  But Cyrillic "zhe" is not confusable with
any Roman characters so it wouldn't be as suspicious.
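To make the context rule concrete, here is a small Python sketch (not the
proposed Emacs implementation). It assumes a tiny hand-written confusable
map and guesses a character's script from the first word of its Unicode
name, which is only a rough approximation of the real script property.

```python
import unicodedata

# Hypothetical miniature map: confusable char -> the char it looks like.
CONFUSABLE_WITH = {
    "\u041a": "K",  # CYRILLIC CAPITAL LETTER KA
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
}


def script_of(ch):
    """Guess the script from the Unicode name; None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def suspicious_chars(text):
    """Return (index, char) for confusables sitting next to a letter
    from the script of the character they imitate."""
    hits = []
    for i, ch in enumerate(text):
        peer = CONFUSABLE_WITH.get(ch)
        if peer is None:
            continue
        peer_script = script_of(peer)
        # Immediate left and right neighbours, where they exist.
        neighbours = text[max(0, i - 1):i] + text[i + 1:i + 2]
        if any(script_of(n) == peer_script for n in neighbours):
            hits.append((i, ch))
    return hits
```

A Cyrillic K followed by Roman letters is reported, a Cyrillic K inside a
Cyrillic word is not, and "zhe" is never reported because it has no entry
in the map.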

On Mon, 29 Mar 2010 11:48:28 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: 

DA> But it occurred to me that besides different categories of such critters there
DA> might be different levels of fontification details that users might want to see.

DA> For example, for some users or for some purposes, it might be useful to see
DA> different kinds of quote marks distinguished (e.g. different kinds of curly
DA> quotes that might be homoglyphs or curly vs straight quotes, which are not
DA> homoglyphs). For other users or for other purposes such highlighting would be a
DA> distraction.

I'll set up a flexible mechanism, probably patterned after
whitespace.el, to do this kind of highlighting.  So the users will be
able to extend it if needed.

I don't know about curly vs. straight quotes.  I don't think that's a
significant problem, whereas a Cyrillic K in Roman text can actually
cause problems and security compromises.  I'm not against the idea, I
have just never seen it become an issue, and there's a million ways to
combine quotation marks depending on the context.  What's the specific
case that you're thinking of?

Ted

* Re: highlighting non-ASCII characters
  2010-03-29 21:05                                                   ` Ted Zlatanov
@ 2010-03-29 21:31                                                     ` Lennart Borgman
  2010-03-29 21:32                                                     ` Drew Adams
  2010-03-30  1:45                                                     ` Stefan Monnier
  2 siblings, 0 replies; 182+ messages in thread
From: Lennart Borgman @ 2010-03-29 21:31 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2010/3/29 Ted Zlatanov <tzz@lifelogs.com>:
>
> I know the goal is to match in context and I may take whitespace.el as a
> guide in this regard, but I have to start with a [:confusable:]
> character class.  I'll also add a [:idn:] class as discussed.  Is that
> OK or are you concerned about code bloat in regex.c?

My impression is that IDN is a work in progress so it might be a good
idea to read in the characters from a file if possible (and let the
user reread that file later if necessary).

* RE: highlighting non-ASCII characters
  2010-03-29 21:05                                                   ` Ted Zlatanov
  2010-03-29 21:31                                                     ` Lennart Borgman
@ 2010-03-29 21:32                                                     ` Drew Adams
  2010-03-30 13:15                                                       ` Ted Zlatanov
  2010-03-30  1:45                                                     ` Stefan Monnier
  2 siblings, 1 reply; 182+ messages in thread
From: Drew Adams @ 2010-03-29 21:32 UTC (permalink / raw)
  To: 'Ted Zlatanov', emacs-devel

> DA> But it occurred to me that besides different categories 
> DA> of such critters there might be different levels of
> DA> fontification details that users might want to see.
> 
> DA> For example, for some users or for some purposes, it 
> DA> might be useful to see different kinds of quote marks
> DA> distinguished (e.g. different kinds of curly quotes
> DA> that might be homoglyphs or curly vs straight quotes,
> DA> which are not homoglyphs). For other users or for other
> DA> purposes such highlighting would be a distraction.
> 
> I'll set up a flexible mechanism, probably patterned after
> whitespace.el, to do this kind of highlighting.  So the users will be
> able to extend it if needed.
> 
> I don't know about curly vs. straight quotes.  I don't think that's a
> significant problem, whereas a Cyrillic K in Roman text can actually
> cause problems and security compromises.  I'm not against the idea, I
> have just never seen it become an issue, and there's a million ways to
> combine quotation marks depending on the context.  What's the specific
> case that you're thinking of?

Nothing special. I know that in FrameMaker we consistently replace all curly
quotes with straight quotes, mainly so that copy+paste of doc examples will work
in applications (e.g. SQL).

The point was that besides multiple kinds of curly-quote characters that are
true homoglyphs (dunno if such exist; I'm assuming they do), it can also be
useful to highlight a character (e.g. curly quote) that is similar to but not
identical to another character (e.g. straight quote).

That use would be contextual, of course: you might turn it on the same way you
might turn on an on-the-fly spell-check, to let you know whenever you type or
paste the wrong char.

Probably this need would be taken care of by what you described in your first
paragraphs (not quoted here): being able to highlight similar looking chars that
are not necessarily exact homoglyphs.
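A sketch of the curly-quote case in Python, assuming a small hand-picked
table of "similar but not identical" pairs. The table and function names
are illustrative and do not come from any Unicode data file.

```python
# Hypothetical table of curly quotes and their straight counterparts.
STRAIGHT_FOR = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}


def find_curly_quotes(text):
    """Return (index, curly, straight) for every curly quote in TEXT."""
    return [(i, ch, STRAIGHT_FOR[ch])
            for i, ch in enumerate(text) if ch in STRAIGHT_FOR]


def straighten(text):
    """Replace every curly quote with its straight counterpart."""
    return text.translate(str.maketrans(STRAIGHT_FOR))
```

The first function gives the positions to highlight as you type or paste;
the second does the wholesale replacement described for doc examples.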

* Re: highlighting non-ASCII characters
  2010-03-29 21:05                                                   ` Ted Zlatanov
  2010-03-29 21:31                                                     ` Lennart Borgman
  2010-03-29 21:32                                                     ` Drew Adams
@ 2010-03-30  1:45                                                     ` Stefan Monnier
  2 siblings, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2010-03-30  1:45 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> I know the goal is to match in context and I may take whitespace.el as a
> guide in this regard, but I have to start with a [:confusable:]
> character class.  I'll also add a [:idn:] class as discussed.  Is that
> OK or are you concerned about code bloat in regex.c?

It seems that character categories (see category-table and friends) are
a better choice.


        Stefan

* Re: highlighting non-ASCII characters
  2010-03-29 21:32                                                     ` Drew Adams
@ 2010-03-30 13:15                                                       ` Ted Zlatanov
  2010-03-30 14:04                                                         ` Drew Adams
  2010-03-30 16:18                                                         ` Juri Linkov
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-30 13:15 UTC (permalink / raw)
  To: emacs-devel

On Mon, 29 Mar 2010 14:32:33 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: 

DA> The point was that besides multiple kinds of curly-quote characters that are
DA> true homoglyphs (dunno if such exist; I'm assuming they do), 

There are definitely some funky quotes in Unicode so let's assume so.

DA> it can also be useful to highlight a character (e.g. curly quote)
DA> that is similar to but not identical to another character
DA> (e.g. straight quote).

OK, but can you say how it's useful in a specific example?  In SQL,
Perl, Java, Lisp, and TeX editing I would not need the *glyphs*
highlighted because the mode would detect the mismatch, e.g. in Perl

$result = `run command here[wrong backward quote here]; # comment here

would highlight "comment here" as part of the command.  IOW they are
syntactically significant so a mismatch is not likely to go unnoticed
anyway by the regular font-lock and the parser.

In regular text it's legitimate to have any combination of quote marks
so I don't see the benefit of looking for suspicious combinations.  In
domain names quote marks of any kind are suspicious :)

Ted

* Re: highlighting non-ASCII characters
  2010-03-29 20:51                                                   ` Lennart Borgman
@ 2010-03-30 13:22                                                     ` Ted Zlatanov
  0 siblings, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-30 13:22 UTC (permalink / raw)
  To: emacs-devel

On Mon, 29 Mar 2010 22:51:02 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> However just highlighting non-IDN chars seems useful enough. I think it
LB> should be done everywhere (because it is simple and probably does not
LB> hurt; IDN seems mostly useful for variables too, for example), or
LB> optionally only in strings (the only URL context we can actually
LB> guess).

LB> For the moment I have implemented this as fontification. Having it as
LB> a char class that is flexibly initialized would be better. Perhaps my
LB> routines for reading the chars can be used there too.

Look at Categories in the ELisp manual (what Stefan referred to when he
mentioned category-table).  If you can implement your reader that way it
would be great.  It's much better than modifying regex.c :)

LB> The homoglyph context thing is maybe more difficult. I did not try to
LB> read carefully so I do not know much. I guess there is something like
LB> char value ranges to use? Does anyone know of a document from which
LB> those ranges can be read (by some elisp code)?

The confusables text file will give you all of them for the category
table.  But you also need to group them by homoglyph (probably with a
hashtable), so I'd write a custom reader.  If you don't get to it, I
will eventually :)

The two text files (IDN and confusables) would have to live inside Emacs
somewhere and the reader will load them when it's loaded.

LB> My impression is that IDN is a work in progress so it might be a
LB> good idea to read in the characters from a file if possible (and let the
LB> user reread that file later if necessary).

Re-reading the file is a really, really rare occurrence for the user so
I would make it an internal function.  You can always call it directly
while developing, but end users will never need to.

Ted

* RE: highlighting non-ASCII characters
  2010-03-30 13:15                                                       ` Ted Zlatanov
@ 2010-03-30 14:04                                                         ` Drew Adams
  2010-03-30 14:17                                                           ` Lennart Borgman
  2010-03-30 14:42                                                           ` Ted Zlatanov
  2010-03-30 16:18                                                         ` Juri Linkov
  1 sibling, 2 replies; 182+ messages in thread
From: Drew Adams @ 2010-03-30 14:04 UTC (permalink / raw)
  To: 'Ted Zlatanov', emacs-devel

> DA> it can also be useful to highlight a character (e.g. curly quote)
> DA> that is similar to but not identical to another character
> DA> (e.g. straight quote).
> 
> OK, but can you say how it's useful in a specific example?  In SQL,
> Perl, Java, Lisp, and TeX editing I would not need the *glyphs*
> highlighted because the mode would detect the mismatch, e.g. in Perl
> 
> $result = `run command here[wrong backward quote here]; # comment here
> 
> would highlight "comment here" as part of the command.  IOW they are
> syntactically significant so a mismatch is not likely to go unnoticed
> anyway by the regular font-lock and the parser.
> 
> In regular text it's legitimate to have any combination of quote marks
> so I don't see the benefit of looking for suspicious combinations.  In
> domain names quote marks of any kind are suspicious :)

Perhaps you're assuming that the code will be used in Emacs, so you say that
Emacs treats all such quotes similarly or highlights them anyway etc. (so no
problem).

Emacs might be used to write raw documentation (e.g. including code samples)
that is used to generate HTML or PDF or... Readers of that doc might then copy
and paste such examples into an app other than Emacs for execution - an app that
does not treat all such quotes similarly. 

Just one hypothetical example, extrapolated from why we use straight quotes in
our use of FrameMaker.

Beyond that, I would think that there might be a number of use cases where one
might want to visually distinguish characters that are difficult to distinguish
- either exact homoglyphs or approximate ones. That's all.

Remember that it was only recently that Emacs itself started to treat a
non-breaking space in Lisp code the same as a regular space. Nothing guarantees
that an app DTRT with chars that look similar but are different.

And there's the opposite potential problem: not distinguishing similar chars
visually in the case where they do have different behaviors in some app. Suppose
you prepare code (for example) in Emacs for use in some other context, and you
want to be made aware when you use the wrong char, to avoid a problem
downstream.

Anyway, you get the point, I think. If you don't think there is a problem, I'm
OK with that.

* Re: highlighting non-ASCII characters
  2010-03-30 14:04                                                         ` Drew Adams
@ 2010-03-30 14:17                                                           ` Lennart Borgman
  2010-03-30 14:42                                                           ` Ted Zlatanov
  1 sibling, 0 replies; 182+ messages in thread
From: Lennart Borgman @ 2010-03-30 14:17 UTC (permalink / raw)
  To: Drew Adams; +Cc: Ted Zlatanov, emacs-devel

On Tue, Mar 30, 2010 at 4:04 PM, Drew Adams <drew.adams@oracle.com> wrote:
>
> And there's the opposite potential problem: not distinguishing similar chars
> visually in the case where they do have different behaviors in some app. Suppose
> you prepare code (for example) in Emacs for use in some other context, and you
> want to be made aware when you use the wrong char, to avoid a problem
> downstream.


Even though I understand IDN chars are limited to URL-like things I
expect them to form the basis of attempts to solve also problems like
this.

* Re: highlighting non-ASCII characters
  2010-03-30 14:04                                                         ` Drew Adams
  2010-03-30 14:17                                                           ` Lennart Borgman
@ 2010-03-30 14:42                                                           ` Ted Zlatanov
  1 sibling, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2010-03-30 14:42 UTC (permalink / raw)
  To: emacs-devel

On Tue, 30 Mar 2010 07:04:05 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: 

DA> Perhaps you're assuming that the code will be used in Emacs, so you say that
DA> Emacs treats all such quotes similarly or highlights them anyway etc. (so no
DA> problem).

DA> Emacs might be used to write raw documentation (e.g. including code samples)
DA> that is used to generate HTML or PDF or... Readers of that doc might then copy
DA> and paste such examples into an app other than Emacs for execution - an app that
DA> does not treat all such quotes similarly. 

DA> Just one hypothetical example, extrapolated from why we use straight quotes in
DA> our use of FrameMaker.

I see what you mean.  It would require copying from a text mode (any
programming modes would show the problem with mismatched quotes).  I
don't know if Emacs should try to prevent this problem; the effort spent
is far larger than the potential benefit.

DA> Beyond that, I would think that there might be a number of use cases where one
DA> might want to visually distinguish characters that are difficult to distinguish
DA> - either exact homoglyphs or approximate ones. That's all.

So far we've discussed exact homoglyphs, but I think once we have the
basic UCS confusables we can augment it with more characters we think
are confusable.

DA> And there's the opposite potential problem: not distinguishing similar chars
DA> visually in the case where they do have different behaviors in some app. Suppose
DA> you prepare code (for example) in Emacs for use in some other context, and you
DA> want to be made aware when you use the wrong char, to avoid a problem
DA> downstream.

OK.

DA> Anyway, you get the point, I think. If you don't think there is a
DA> problem, I'm OK with that.

I think it's sufficient to start with the idnchars.txt and
confusables.txt databases (the latter parsed into an auxiliary hashtable
as well, to find what homoglyph group a character belongs to).  From
what you have said, it seems useful to augment the confusables.txt
database with "similar but not exact homoglyphs" eventually but it's not
as important.

I'll try to put something together soon, unless Lennart beats me to it,
to at least read the databases and generate the category-tables.
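
A minimal sketch of the auxiliary hash table idea, assuming each data
line of confusables.txt has the form "SOURCE ; TARGET ; TYPE # comment"
with hexadecimal code points.  The name `my-confusables-parse' is
hypothetical and the sketch only keeps single-character sources:

```elisp
(defun my-confusables-parse (text)
  "Parse TEXT in the assumed confusables.txt format into a hash table.
Keys are source characters; values are the target strings they may be
confused with.  The line format is an assumption of this sketch."
  (let ((table (make-hash-table :test 'eql)))
    (dolist (line (split-string text "\n" t))
      ;; Drop the trailing comment, then split the fields on semicolons.
      (let* ((data (car (split-string line "#")))
             (fields (split-string data ";")))
        (when (>= (length fields) 2)
          (let ((source (split-string (nth 0 fields) "[ \t]+" t))
                (target (split-string (nth 1 fields) "[ \t]+" t)))
            ;; Only single-character sources are kept in this sketch.
            (when (and (= (length source) 1) target)
              (puthash (string-to-number (car source) 16)
                       (mapconcat (lambda (hex)
                                    (string (string-to-number hex 16)))
                                  target "")
                       table))))))
    table))
```

Two characters that map to the same target string can then be treated
as members of the same homoglyph group.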

Ted


* Re: highlighting non-ASCII characters
  2010-03-30 13:15                                                       ` Ted Zlatanov
  2010-03-30 14:04                                                         ` Drew Adams
@ 2010-03-30 16:18                                                         ` Juri Linkov
  1 sibling, 0 replies; 182+ messages in thread
From: Juri Linkov @ 2010-03-30 16:18 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> DA> it can also be useful to highlight a character (e.g. curly quote)
> DA> that is similar to but not identical to another character
> DA> (e.g. straight quote).
>
> OK, but can you say how it's useful in a specific example?  In SQL,
> Perl, Java, Lisp, and TeX editing I would not need the *glyphs*
> highlighted because the mode would detect the mismatch, e.g. in Perl
>
> $result = `run command here[wrong backward quote here]; # comment here
>
> would highlight "comment here" as part of the command.  IOW they are
> syntactically significant so a mismatch is not likely to go unnoticed
> anyway by the regular font-lock and the parser.

This is a significant problem.  Often non-ASCII typographical characters
(such as en-dash, em-dash used in the documentation for the long command
line arguments, and different quotation marks used to quote command line
arguments) copied from a Web page to Emacs look like ASCII characters.
They can cause security compromises when pasted into the M-x shell
command line unnoticed.  In the best case, a shell command just signals
an error, in the worst case it overwrites existing files.
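
As a rough illustration of the check this calls for, the sketch below
lists the non-ASCII characters in a string before it is handed to a
shell.  The function name `my-non-ascii-positions' is hypothetical; it
is not part of Emacs or of markchars.el:

```elisp
(defun my-non-ascii-positions (string)
  "Return a list of (INDEX . CHAR) for every non-ASCII character in STRING."
  (let ((index 0)
        result)
    ;; `[:ascii:]' matches code points 0..127, so the negation finds
    ;; typographic dashes, curly quotes and similar characters.
    (while (string-match "[^[:ascii:]]" string index)
      (push (cons (match-beginning 0)
                  (aref string (match-beginning 0)))
            result)
      (setq index (match-end 0)))
    (nreverse result)))
```

A caller could refuse to send the command, or ask for confirmation,
whenever the returned list is non-empty.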

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: face for non-ASCII characters
  2010-03-23 18:18                           ` face for non-ASCII characters Ted Zlatanov
@ 2011-04-15 22:41                             ` Ted Zlatanov
  2011-04-15 23:07                               ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-15 22:41 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

On Tue, 23 Mar 2010 13:18:05 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote: 
LB> Ok, since you found it useful I tried to finish it. See the attachment.

TZ> Stefan, Yidong, what do you think about something like markchars.el
TZ> becoming part of Emacs, so users can turn it on with a global
TZ> customization?  I don't think it has to be a minor mode necessarily but
TZ> I'm no expert in that area.

TZ> If it goes in there are several tasks: picking a variable name, picking
TZ> suitable default faces, (maybe) adding a menu item, and documenting the
TZ> new behavior.  I would really like to make it easily available to Emacs
TZ> users and will help with these tasks.

I worked on markchars.el some more (over 1 year later, heh).  The result
is attached; the commentary explains what it will do:

;;; Commentary:
;;
;; Mark special chars, by default nonascii chars, in modes where they
;; may be confused with regular chars. See `markchars-mode' and
;; `markchars-what'.  There are two modes: confusable detection (where
;; we look for mixed scripts within a word, without using the
;; http://www.unicode.org/reports/tr39/ confusable tables) and pattern
;; detection (where any regular expressions can be matched).
;;
;; The marked text will have the 'markchars property set to either
;; 'confusable or 'pattern and the face set to either
;; `markchars-face-confusable' or `markchars-face-pattern'
;; respectively.

So basically you can choose what to highlight; if you choose to
highlight confusable characters, any strings whose characters have mixed
scripts according to `char-script-table' will be highlighted.  I didn't
use the actual confusables.txt table from
http://www.unicode.org/reports/tr39/ because it's huge and probably not
very useful; the goal is to show suspicious characters and within
individual scripts it's rare to find confusable characters.
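
A minimal sketch of the mixed-script idea, not the code in the attached
markchars.el: it looks up each character of a word in
`char-script-table' and reports whether more than one script occurs.
The `my-' names are hypothetical:

```elisp
(defun my-word-scripts (word)
  "Return the list of distinct scripts used by the characters of WORD."
  (let (scripts)
    (dolist (char (string-to-list word))
      ;; `char-script-table' maps a character to a script symbol or nil.
      (let ((script (aref char-script-table char)))
        (when (and script (not (memq script scripts)))
          (push script scripts))))
    (nreverse scripts)))

(defun my-word-mixed-script-p (word)
  "Return non-nil if WORD contains characters from more than one script."
  (> (length (my-word-scripts word)) 1))
```

For example, a word spelled with Latin letters plus one Cyrillic letter
that looks like a Latin "a" would be reported as mixed, while the
all-Latin spelling would not.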

It's pretty easy to add more markchar properties and to make the
confusable detection smarter.  Let me know if you find this useful; I
will propose it for the GNU ELPA if so.

Thanks
Ted

[-- Attachment #2: markchars.el --]
[-- Type: application/emacs-lisp, Size: 6397 bytes --]


* Re: face for non-ASCII characters
  2011-04-15 22:41                             ` Ted Zlatanov
@ 2011-04-15 23:07                               ` Lennart Borgman
  2011-04-16  0:51                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-15 23:07 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Tue, 23 Mar 2010 13:18:05 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:
>
> TZ> On Tue, 23 Mar 2010 17:28:01 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:
> LB> Ok, since you found it useful I tried to finish it. See the attachment.
>
> TZ> Stefan, Yidong, what do you think about something like markchars.el
> TZ> becoming part of Emacs, so users can turn it on with a global
> TZ> customization?  I don't think it has to be a minor mode necessarily but
> TZ> I'm no expert in that area.
>
> TZ> If it goes in there are several tasks: picking a variable name, picking
> TZ> suitable default faces, (maybe) adding a menu item, and documenting the
> TZ> new behavior.  I would really like to make it easily available to Emacs
> TZ> users and will help with these tasks.
>
> I worked on markchars.el some more (over 1 year later, heh).  The result
> is attached; the commentary explains what it will do:
>
> ;;; Commentary:
> ;;
> ;; Mark special chars, by default nonascii chars, in modes where they
> ;; may be confused with regular chars. See `markchars-mode' and
> ;; `markchars-what'.  There are two modes: confusable detection (where
> ;; we look for mixed scripts within a word, without using the
> ;; http://www.unicode.org/reports/tr39/ confusable tables) and pattern
> ;; detection (where any regular expressions can be matched).
> ;;
> ;; The marked text will have the 'markchars property set to either
> ;; 'confusable or 'pattern and the face set to either
> ;; `markchars-face-confusable' or `markchars-face-pattern'
> ;; respectively.
>
> So basically you can choose what to highlight; if you choose to
> highlight confusable characters, any strings whose characters have mixed
> scripts according to `char-script-table' will be highlighted.  I didn't
> use the actual confusables.txt table from
> http://www.unicode.org/reports/tr39/ because it's huge and probably not
> very useful; the goal is to show suspicious characters and within
> individual scripts it's rare to find confusable characters.
>
> It's pretty easy to add more markchar properties and to make the
> confusable detection smarter.  Let me know if you find this useful; I
> will propose it for the GNU ELPA if so.
>
> Thanks
> Ted


Nice to see you are enhancing it, Ted. However I wonder if you are
working on an older copy of it since it does not use idn.el. Could you
please take a look at the latest version and see how
idn-is-recommended compares to what you call confusables?




* Re: face for non-ASCII characters
  2011-04-15 23:07                               ` Lennart Borgman
@ 2011-04-16  0:51                                 ` Ted Zlatanov
  2011-04-16  9:10                                   ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16  0:51 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> Nice to see you are enhancing it, Ted. However I wonder if you are
LB> working on an older copy of it since it does not use idn.el. Could you
LB> please take a look at the latest version and see how
LB> idn-is-recommended compares to what you call confusables?

Where is the latest version?  I didn't see any further messages from you
in that thread after 2010-03 so I didn't know you had updated it.

Ted


* Re: face for non-ASCII characters
  2011-04-16  0:51                                 ` Ted Zlatanov
@ 2011-04-16  9:10                                   ` Lennart Borgman
  2011-04-16 15:05                                     ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16  9:10 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> Nice to see you are enhancing it, Ted. However I wonder if you are
> LB> working on an older copy of it since it does not use idn.el. Could you
> LB> please take a look at the latest version and see how
> LB> idn-is-recommended compares to what you call confusables?
>
> Where is the latest version?  I didn't see any further messages from you
> in that thread after 2010-03 so I didn't know you had updated it.

Oh, I am very sorry Ted. I have put almost every elisp library I have
written into nXhtml. So you will find it in the nXhtml repository at
Launchpad.




* Re: face for non-ASCII characters
  2011-04-16  9:10                                   ` Lennart Borgman
@ 2011-04-16 15:05                                     ` Ted Zlatanov
  2011-04-16 15:28                                       ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 15:05 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
>> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>> 
LB> Nice to see you are enhancing it, Ted. However I wonder if you are
LB> working on an older copy of it since it does not use idn.el. Could you
LB> please take a look at the latest version and see how
LB> idn-is-recommended compares to what you call confusables?
>> 
>> Where is the latest version?  I didn't see any further messages from you
>> in that thread after 2010-03 so I didn't know you had updated it.

LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have
LB> written into nXhtml. So you find it in the nXhtml repository at
LB> Launchpad.

I merged your changes with my version and called myself a "contrbuthor" :)

I'd like to keep markchars.el a standalone library, so the attached does
not require idn.el.  I also set the version to 0.2.  I would like to put
it in the GNU ELPA, if you don't mind (it can still live in nXhtml, we
can mirror it).  You'll need to assign the copyright, though.

The major change is that instead of detecting the range at the font-lock
keyword level, I run non-IDN detection at the word markup level (just
like confusables detection).  I think that results in cleaner, easily
extensible code--take a look and see what you think.

For an IDN markup face I defined a new one.  Your call on what it should
be, I just set it to a white underline for now.

This is IMO a good change:

(make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")

because you had `markchars-keywords' and `markchars-used-keywords' which
was confusing.

`markchars--render-nonidn' is not optimized: it steps through the word
in the buffer and assigns the properties to each individual character
instead of each range it finds.  I don't think that's a big deal but it
could be done better.  I couldn't reuse your non-IDN detection logic
because it was not word-oriented.
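
To illustrate what marking ranges rather than individual characters
could look like, here is a sketch that works on a string instead of a
buffer.  It is not the attached `markchars--render-nonidn'; the name
`my-flagged-ranges' and the predicate argument are hypothetical:

```elisp
(defun my-flagged-ranges (string predicate)
  "Return (START . END) ranges of STRING whose characters fail PREDICATE.
END is exclusive.  Adjacent failing characters share one range."
  (let ((index 0)
        (end (length string))
        start
        ranges)
    (while (< index end)
      (if (funcall predicate (aref string index))
          ;; An accepted character closes any open range.
          (when start
            (push (cons start index) ranges)
            (setq start nil))
        ;; A rejected character opens a range unless one is already open.
        (unless start
          (setq start index)))
      (setq index (1+ index)))
    (when start
      (push (cons start end) ranges))
    (nreverse ranges)))
```

Each resulting range could then receive its properties in a single
`put-text-property' call instead of one call per character.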

I would use a char-table for idn.el instead of a bool-vector.  Also
perhaps idn.el's .txt files and confusables.txt should simply be part of
Emacs, so the IDN and confusables properties can be looked up like the
other properties.  Emacs already does that for many properties, see for
example:

(format "%S" (mapcar 'car char-code-property-alist))
(get-char-code-property ?q 'titlecase)

I think that inclusion would benefit everyone, but the original .txt
files are large so I'll leave it up to the experts.  If they are
included, `markchars--render-nonidn' would be much much smaller.
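
A small sketch of the char-table alternative.  The table contents are
placeholder data chosen for illustration, not the real idnchars.txt
ranges, and the `my-idn-allowed' names are hypothetical:

```elisp
(defvar my-idn-allowed-table
  (let ((table (make-char-table 'my-idn-allowed nil)))
    ;; Placeholder ranges only; real data would come from idnchars.txt.
    (set-char-table-range table '(?a . ?z) t)
    (set-char-table-range table '(?0 . ?9) t)
    (aset table ?- t)
    table)
  "Char-table whose value is t for characters this sketch treats as allowed.")

(defun my-idn-allowed-p (char)
  "Return non-nil if CHAR is marked as allowed in `my-idn-allowed-table'."
  (aref my-idn-allowed-table char))
```

Because a char-table is indexed directly by character, the lookup is a
plain `aref' and ranges can be filled in one call.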

Ted

[-- Attachment #2: markchars.el --]
[-- Type: application/emacs-lisp, Size: 7721 bytes --]


* Re: face for non-ASCII characters
  2011-04-16 15:05                                     ` Ted Zlatanov
@ 2011-04-16 15:28                                       ` Lennart Borgman
  2011-04-16 15:42                                         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 15:28 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
>>> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>>>
> LB> Nice to see you are enhancing it, Ted. However I wonder if you are
> LB> working on an older copy of it since it does not use idn.el. Could you
> LB> please take a look at the latest version and see how
> LB> idn-is-recommended compares to what you call confusables?
>>>
>>> Where is the latest version?  I didn't see any further messages from you
>>> in that thread after 2010-03 so I didn't know you had updated it.
>
> LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have
> LB> written into nXhtml. So you find it in the nXhtml repository at
> LB> Launchpad.
>
> I merged your changes with my version and called myself a "contrbuthor" :)

:-)

> I'd like to keep markchars.el a standalone library, so the attached does
> not require idn.el.

Ok. Do I understand it correctly that you use idn.el if it is
available, otherwise not?

> can mirror it).  You'll need to assign the copyright, though.

I have of course signed papers for Emacs.

> The major change is that instead of detecting the range at the font-lock
> keyword level, I run non-IDN detection at the word markup level (just
> like confusables detection).  I think that results in cleaner, easily
> extensible code--take a look and see what you think.

I have no special opinion about it. Just use what is easiest to
maintain and understand. I guess it works the same?

> For an IDN markup face I defined a new one.  Your call on what it should
> be, I just set it to a white underline for now.

Thanks. I have no special opinion on that.

> This is IMO a good change:
>
> (make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")
>
> because you had `markchars-keywords' and `markchars-used-keywords' which
> was confusing.

Ok.

> `markchars--render-nonidn' is not optimized: it steps through the word
> in the buffer and assigns the properties to each individual character
> instead of each range it finds.  I don't think that's a big deal but it
> could be done better.  I couldn't reuse your non-IDN detection logic
> because it was not word-oriented.

Please suggest a better interface from idn.el if possible to fix this.

> I would use a char-table for idn.el instead of a bool-vector.  Also
> perhaps idn.el's .txt files and confusables.txt should simply be part of
> Emacs, so the IDN and confusables properties can be looked up like the
> other properties.  Emacs already does that for many properties, see for
> example:
>
> (format "%S" (mapcar 'car char-code-property-alist))
> (get-char-code-property ?q 'titlecase)
>
> I think that inclusion would benefit everyone, but the original .txt
> files are large so I'll leave it up to the experts.

That was why I used a boolean vector. But I am not an expert on this.

> If they are
> included, `markchars--render-nonidn' would be much much smaller.

I do not understand what you mean here.

> Ted
>
>




* Re: face for non-ASCII characters
  2011-04-16 15:28                                       ` Lennart Borgman
@ 2011-04-16 15:42                                         ` Ted Zlatanov
  2011-04-16 15:50                                           ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 15:42 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 17:28:48 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

>> I'd like to keep markchars.el a standalone library, so the attached does
>> not require idn.el.

LB> Ok. Do I understand it correctly that you use idn.el if it is
LB> available, otherwise not?

Yes, `idn-is-recommended' is autoloaded.  There may be a problem with the
fboundp call, which will return true even if the library is not
available:

(when (autoload 'coocoo "coo")
  (fboundp 'coocoo))

=> t

so maybe (require 'idn nil t) is a better approach, and we can define a
stub replacement for `idn-is-recommended'.  I didn't dig too much into
it because of the possibility of inclusion I mentioned.
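
A sketch of the soft-require approach with a fallback.  It assumes that
`idn-is-recommended' takes a single character argument; the `my-' names
are hypothetical and the permissive fallback is only one possible
choice:

```elisp
;; The third argument to `require' makes it return nil instead of
;; signaling an error when the library cannot be found.
(defvar my-idn-available (require 'idn nil t)
  "Non-nil if the optional idn library could be loaded.")

(defun my-idn-recommended-p (char)
  "Return non-nil if CHAR is acceptable.
Use `idn-is-recommended' when idn.el is loaded; otherwise accept
every character, so nothing is flagged without the library."
  (if (and my-idn-available (fboundp 'idn-is-recommended))
      (idn-is-recommended char)
    t))
```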

>> The major change is that instead of detecting the range at the font-lock
>> keyword level, I run non-IDN detection at the word markup level (just
>> like confusables detection).  I think that results in cleaner, easily
>> extensible code--take a look and see what you think.

LB> I have no special opinion about it. Just use what is easiest to
LB> maintain and understand. I guess it works the same?

Effectively it's the same, yes.

>> `markchars--render-nonidn' is not optimized: it steps through the word
>> in the buffer and assigns the properties to each individual character
>> instead of each range it finds.  I don't think that's a big deal but it
>> could be done better.  I couldn't reuse your non-IDN detection logic
>> because it was not word-oriented.

LB> Please suggest a better interface from idn.el if possible to fix
LB> this.

This is the stepping algorithm at the buffer level, idn.el can't help.

>> I would use a char-table for idn.el instead of a bool-vector.  Also
>> perhaps idn.el's .txt files and confusables.txt should simply be part of
>> Emacs, so the IDN and confusables properties can be looked up like the
>> other properties.  Emacs already does that for many properties, see for
>> example:
>> 
>> (format "%S" (mapcar 'car char-code-property-alist))
>> (get-char-code-property ?q 'titlecase)
>> 
>> I think that inclusion would benefit everyone, but the original .txt
>> files are large so I'll leave it up to the experts.

LB> That was why I used a boolean vector. But I am not an expert on
LB> this.

The char-table is meant for character lookups, that's why I suggested
it instead of a bool-vector.

>> If they are
>> included, `markchars--render-nonidn' would be much much smaller.

LB> I do not understand what you mean here.

The function wouldn't need to assign a special property, the character
would already have it.  And the highlighting could just look for that
special property instead of hunting for it with a function.

Ted


* Re: face for non-ASCII characters
  2011-04-16 15:42                                         ` Ted Zlatanov
@ 2011-04-16 15:50                                           ` Lennart Borgman
  2011-04-16 15:57                                             ` Ted Zlatanov
  2011-04-16 16:00                                             ` face for non-ASCII characters Drew Adams
  0 siblings, 2 replies; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 15:50 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 17:28:48 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
>>> I'd like to keep markchars.el a standalone library, so the attached does
>>> not require idn.el.
>
> LB> Ok. Do I understand it correctly that you use idn.el if it is
> LB> available, otherwise not?
>
> Yes, `idn-is-recommended' is autoloaded.  It may be a problem with the
> fboundp call, which will return true even if the library is not
> available:
>
> (when (autoload 'coocoo "coo")
>  (fboundp 'coocoo))
>
> => t

This is not a problem if autoload is set up correctly.

> so maybe (require 'idn nil t) is a better approach, and we can define a
> stub replacement for `idn-is-recommended'.  I didn't dig too much into
> it because of the possibility of inclusion I mentioned.

I think autoloading and checking if the function is defined is the
best way. (Autoloading of `idn-is-recommended' should of course not be
set up if idn.el is not available.)

> Effectively it's the same, yes.

Good.

>>> `markchars--render-nonidn' is not optimized: it steps through the word
>>> in the buffer and assigns the properties to each individual character
>>> instead of each range it finds.  I don't think that's a big deal but it
>>> could be done better.  I couldn't reuse your non-IDN detection logic
>>> because it was not word-oriented.
>
> LB> Please suggest a better interface from idn.el if possible to fix
> LB> this.
>
> This is the stepping algorithm at the buffer level, idn.el can't help.
...
> The char-table is meant for character lookups, that's why I suggested
> it instead of a bool-vector.

So using a char-table instead in idn.el would make the interface
better? (But we need someone good at this to tell if that is usable
here performancewise.)

> The function wouldn't need to assign a special property, the character
> would already have it.  And the highlighting could just look for that
> special property instead of hunting for it with a function.

I see, thanks.

> Ted
>
>
>




* Re: face for non-ASCII characters
  2011-04-16 15:50                                           ` Lennart Borgman
@ 2011-04-16 15:57                                             ` Ted Zlatanov
  2011-04-16 16:01                                               ` Lennart Borgman
  2011-04-16 16:00                                             ` face for non-ASCII characters Drew Adams
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 15:57 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 17:50:07 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

>> so maybe (require 'idn nil t) is a better approach, and we can define a
>> stub replacement for `idn-is-recommended'.  I didn't dig too much into
>> it because of the possibility of inclusion I mentioned.

LB> I think autoloading and checking if the function is defined is the
LB> best way. (Autoloading of `idn-is-recommended' should of course not be
LB> setup if idn.el is not available.)

OK, but then this:

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
...

will always add `markchars-nonidn-fun' to `markchars-what' and that will
break users without idn.el.

LB> So using a char-table instead in idn.el would make the interface
LB> better? (But we need someone good at this to tell if that is usable
LB> here performancewise.)

I think it's slightly better for characters to use char-table instead of
bool-vector.  Semantically it's cleaner: you're using a char-table,
which is meant for characters, instead of a bool-vector, which is for
general data.  But it's a really small difference so don't bother if you
don't feel like it :)

OK, so after we resolve the autoload issue above and find out if the
confusables.txt and idn*.txt files can be turned into Emacs character
properties, we can publish 0.2, right?  Is there anything else it needs?

Ted


* RE: face for non-ASCII characters
  2011-04-16 15:50                                           ` Lennart Borgman
  2011-04-16 15:57                                             ` Ted Zlatanov
@ 2011-04-16 16:00                                             ` Drew Adams
  1 sibling, 0 replies; 182+ messages in thread
From: Drew Adams @ 2011-04-16 16:00 UTC (permalink / raw)
  To: 'Lennart Borgman', 'Ted Zlatanov'; +Cc: emacs-devel

> > Yes, `idn-is-recommended' is autoloaded.  It may be a 
> > problem with the fboundp call, which will return true even
> > if the library is not available:
> > (when (autoload 'coocoo "coo")(fboundp 'coocoo)) => t
> > so maybe (require 'idn nil t) is a better approach
> 
> I think autoloading and checking if the function is defined is the
> best way. (Autoloading of `idn-is-recommended' should of course not be
> setup if idn.el is not available.)

I'm not following this thread, so ignore if this question doesn't help:

Why set up autoloading here?  Seems like you would want to either soft-require
the library (no error if not there) or simply test whether the function is
defined (which presumably happens when the library is loaded).

IOW, why don't you just choose between (require 'foo nil t) and (fboundp
'foobar)?  What's the point of having this library autoload the other one but
not soft-require it?

(FWIW, it sounds a priori like you should maybe just use the `fboundp' test, and
not worry about automatically loading the other library.)

Again, ignore if irrelevant to your discussion.


* Re: face for non-ASCII characters
  2011-04-16 15:57                                             ` Ted Zlatanov
@ 2011-04-16 16:01                                               ` Lennart Borgman
  2011-04-16 16:13                                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 16:01 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 17:50:07 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
>>> so maybe (require 'idn nil t) is a better approach, and we can define a
>>> stub replacement for `idn-is-recommended'.  I didn't dig too much into
>>> it because of the possibility of inclusion I mentioned.
>
> LB> I think autoloading and checking if the function is defined is the
> LB> best way. (Autoloading of `idn-is-recommended' should of course not be
> LB> setup if idn.el is not available.)
>
> OK, but then this:
>
> (defcustom markchars-what
>  `(markchars-simple-pattern
>    markchars-confusables
>    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
> ...
>
> will always add `markchars-nonidn-fun' to `markchars-what' and that will
> break users without idn.el.

Why? (fboundp 'idn-is-recommended) only returns true if the function
is defined or autoloaded. If idn.el is not available the function
should not be autoloaded, or?

> LB> So using a char-table instead in idn.el would make the interface
> LB> better? (But we need someone good at this to tell if that is usable
> LB> here performancewise.)
>
> I think it's slightly better for characters to use char-table instead of
> bool-vector.  Semantically it's cleaner: you're using a char-table,
> which is meant for characters, instead of a bool-vector, which is for
> general data.  But it's a really small difference so don't bother if you
> don't feel like it :)

It sounds like a good idea, I am just waiting for some more info about
performance.

> OK, so after we resolve the autoload issue above and find out if the
> confusables.txt and idn*.txt files can be turned into Emacs character
> properties, we can publish 0.2, right?  Is there anything else it needs?
>
> Ted
>
>
>




* Re: face for non-ASCII characters
  2011-04-16 16:01                                               ` Lennart Borgman
@ 2011-04-16 16:13                                                 ` Ted Zlatanov
  2011-04-16 16:22                                                   ` Lennart Borgman
  2011-04-16 16:27                                                   ` Drew Adams
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 16:13 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 18:01:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> Why? (fboundp 'idn-is-recommended) only returns true if the function
LB> is defined or autoloaded. If idn.el is not available the function
LB> should not be autoloaded, or?

My example showed that `fboundp' returns t if the function is
autoloaded, even if it's not actually loaded yet.

(when (autoload 'coocoo1 "coo")
  (fboundp 'coocoo1))

=> t

(coocoo1)

=> Debugger entered--Lisp error: (file-error "Cannot open load file" "coo")
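
One way to tell the two cases apart, sketched under the assumption that
an autoload object has the form (autoload FILE ...): look at the
function cell and, for an autoload, check that FILE can be found on
`load-path'.  The name `my-function-really-available-p' is
hypothetical:

```elisp
(defun my-function-really-available-p (function)
  "Return non-nil if FUNCTION is defined or autoloads from a findable file."
  (when (fboundp function)
    (let ((definition (symbol-function function)))
      (if (eq (car-safe definition) 'autoload)
          ;; Only an autoload stub: check that its file can be located.
          (and (locate-library (nth 1 definition)) t)
        ;; A real definition (subr, lambda, byte-code, macro).
        t))))
```

Such a test could replace the bare `fboundp' call when deciding whether
to offer a feature that depends on an optional library.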

On Sat, 16 Apr 2011 09:00:08 -0700 "Drew Adams" <drew.adams@oracle.com> wrote: 

DA> Why set up autoloading here?  Seems like you would want to either soft-require
DA> the library (no error if not there) or simply test whether the function is
DA> defined (which presumably happens when the library is loaded).

DA> IOW, why don't you just choose between (require 'foo nil t) and (fboundp
DA> 'foobar)?  What's the point of having this library autoload the other one but
DA> not soft-require it?

Lennart wants to enable its use in the defcustom conditionally, if
idn.el is available, so things Just Work for the user:

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
...

I'm not sure what's the right approach, but IMO (require 'idn nil t) is
the simplest solution.

Ted


* Re: face for non-ASCII characters
  2011-04-16 16:13                                                 ` Ted Zlatanov
@ 2011-04-16 16:22                                                   ` Lennart Borgman
  2011-04-16 16:27                                                   ` Drew Adams
  1 sibling, 0 replies; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 16:22 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 18:01:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> Why? (fboundp 'idn-is-recommended) only returns true if the function
> LB> is defined or autoloaded. If idn.el is not available the function
> LB> should not be autoloaded, or?
>
> My example showed that `fboundp' returns t if the function is
> autoloaded, even if it's not actually loaded yet.
>
> (when (autoload 'coocoo1 "coo")
>  (fboundp 'coocoo1))
>
> => t
>
> (coocoo1)
>
> => Debugger entered--Lisp error: (file-error "Cannot open load file" "coo")

Yes, of course. But coocoo1 should not be autoloaded if coo.el is not
available, or?



> Lennart wants to conditionally, if idn.el is available, enable its
> use in the defcustom so things Just Work for the user:
>
> (defcustom markchars-what
>  `(markchars-simple-pattern
>    markchars-confusables
>    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
> ...
>
> I'm not sure what's the right approach, but IMO (require 'idn nil t) is
> the simplest solution.

Maybe I am beginning to understand. I suppose you want idn.el to be
available even if autoloading is not used for `idn-is-recommended'?

In that case this could perhaps work:

(unless (fboundp 'idn-is-recommended)
   (require 'idn nil t))

> Ted
>
>
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* RE: face for non-ASCII characters
  2011-04-16 16:13                                                 ` Ted Zlatanov
  2011-04-16 16:22                                                   ` Lennart Borgman
@ 2011-04-16 16:27                                                   ` Drew Adams
  2011-04-16 16:45                                                     ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Drew Adams @ 2011-04-16 16:27 UTC (permalink / raw)
  To: 'Ted Zlatanov', emacs-devel

> DA> Why set up autoloading here?  Seems like you would want 
> DA> to either soft-require the library (no error if not there)
> DA> or simply test whether the function is defined (which
> DA> presumably happens when the library is loaded).
> DA>
> DA> IOW, why don't you just choose between (require 'foo nil 
> DA> t) and (fboundp 'foobar)?  What's the point of having
> DA> this library autoload the other one but not soft-require it?
> 
> Lennart wants to conditionally, if idn.el is available, enable its
> use in the defcustom so things Just Work for the user:
> 
> (defcustom markchars-what
>   `(markchars-simple-pattern
>     markchars-confusables
>     ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))

I understand that.  But it seems to me that either we want to unconditionally
load idn.el if available (in `load-path') or we do not want to do that.  In the
latter case, we are leaving it up to the user to load it if s?he wants.  And in
that case the proper test is simply `fboundp'.

I still don't see the point of the middle ground you're scoping out, of
autoloading but not soft-requiring.

When this library (the other one, not idn.el) is loaded, do you want idn.el to
have been loaded or not?  If you do, then soft-require.  If you want to leave
that choice up to the user, then test with `fboundp'.

What am I missing?

> I'm not sure what's the right approach, but IMO (require 'idn 
> nil t) is the simplest solution.

`fboundp' is just as simple.  It's not about simple, I think.

It's about deciding what's best for most users.  Do we want this library to load
idn.el whenever it is in the `load-path' or not?

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-16 16:27                                                   ` Drew Adams
@ 2011-04-16 16:45                                                     ` Ted Zlatanov
  2011-04-16 16:48                                                       ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 16:45 UTC (permalink / raw)
  To: emacs-devel

Sorry if I confused you guys.  I set it up to do

(require 'idn nil t)

and later test (fboundp 'idn-is-recommended) which I believe is the
right thing.  autoloading is not required and was confusing me.  If the
user overrides the default and adds 'markchars-nonidn-fun to
`markchars-what' then they'll get an error.  The change is trivial so I
won't repost the whole file.
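
Roughly, the arrangement looks like the sketch below. The option is
renamed `demo-markchars-what' here, and its docstring, :type and :group
are assumptions made only so the form is complete; the real definition in
markchars.el may differ. The soft `require' has to come first because the
default value is computed when the `defcustom' is evaluated.

```elisp
;; Load idn.el if it is on `load-path'; the third argument suppresses the error.
(require 'idn nil t)

;; Hypothetical option name; docstring, :type and :group are assumptions,
;; since the thread elides the rest of the real `markchars-what' definition.
(defcustom demo-markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ;; Included by default only when `idn-is-recommended' is fboundp
    ;; after the soft-require above.
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
  "List of things to mark (illustrative sketch)."
  :type '(repeat symbol)
  :group 'convenience)
```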

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-16 16:45                                                     ` Ted Zlatanov
@ 2011-04-16 16:48                                                       ` Lennart Borgman
  2011-04-16 16:55                                                         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 16:48 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> Sorry if I confused you guys.  I set it up to do
>
> (require 'idn nil t)
>
> and later test (fboundp 'idn-is-recommended) which I believe is the
> right thing.  autoloading is not required and was confusing me.  If the
> user overrides the default and adds 'markchars-nonidn-fun to
> `markchars-what' then they'll get an error.  The change is trivial so I
> won't repost the whole file.

If you do it this way and the function is autoloaded but not used you
will load idn.el though it is not needed. Please see my last
suggestion instead.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-16 16:48                                                       ` Lennart Borgman
@ 2011-04-16 16:55                                                         ` Ted Zlatanov
  2011-04-16 17:11                                                           ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-16 16:55 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 18:48:49 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
>> Sorry if I confused you guys.  I set it up to do
>> 
>> (require 'idn nil t)
>> 
>> and later test (fboundp 'idn-is-recommended) which I believe is the
>> right thing.  autoloading is not required and was confusing me.  If the
>> user overrides the default and adds 'markchars-nonidn-fun to
>> `markchars-what' then they'll get an error.  The change is trivial so I
>> won't repost the whole file.

LB> If you do it this way and the function is autoloaded but not used you
LB> will load idn.el though it is not needed. Please see my last
LB> suggestion instead.

Could you please do it the way you like and post your version 0.2?  Even
better, put it in nXhtml and we can later propose markchars.el for the
GNU ELPA?

Thanks
Ted




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-16 16:55                                                         ` Ted Zlatanov
@ 2011-04-16 17:11                                                           ` Lennart Borgman
  2011-04-18 15:48                                                             ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-16 17:11 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 18:48:49 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
>>> Sorry if I confused you guys.  I set it up to do
>>>
>>> (require 'idn nil t)
>>>
>>> and later test (fboundp 'idn-is-recommended) which I believe is the
>>> right thing.  autoloading is not required and was confusing me.  If the
>>> user overrides the default and adds 'markchars-nonidn-fun to
>>> `markchars-what' then they'll get an error.  The change is trivial so I
>>> won't repost the whole file.
>
> LB> If you do it this way and the function is autoloaded but not used you
> LB> will load idn.el though it is not needed. Please see my last
> LB> suggestion instead.
>
> Could you please do it the way you like and post your version 0.2?  Even
> better, put it in nXhtml and we can later propose markchars.el for the
> GNU ELPA?
>
> Thanks
> Ted
>
>
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-16 17:11                                                           ` Lennart Borgman
@ 2011-04-18 15:48                                                             ` Ted Zlatanov
  2011-04-18 15:53                                                               ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-18 15:48 UTC (permalink / raw)
  To: emacs-devel

On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

>> Could you please do it the way you like and post your version 0.2?  Even
>> better, put it in nXhtml and we can later propose markchars.el for the
>> GNU ELPA?

(no message)

Lennart, maybe you sent prematurely or something else went wrong?  There
was no new content in your message.

Ted




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-18 15:48                                                             ` Ted Zlatanov
@ 2011-04-18 15:53                                                               ` Lennart Borgman
  2011-04-18 16:20                                                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-18 15:53 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
> On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
>>> Could you please do it the way you like and post your version 0.2?  Even
>>> better, put it in nXhtml and we can later propose markchars.el for the
>>> GNU ELPA?
>
> (no message)
>
> Lennart, maybe you sent prematurely or something else went wrong?  There
> was no new content in your message.

Eh, yes. I was going to say that I will take your version (with my
suggestion for loading) and put it in nXhtml repository. (Some
troubles with my computers at the moment.)



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-18 15:53                                                               ` Lennart Borgman
@ 2011-04-18 16:20                                                                 ` Ted Zlatanov
  2011-04-18 17:03                                                                   ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-18 16:20 UTC (permalink / raw)
  To: emacs-devel

On Mon, 18 Apr 2011 17:53:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
>> On Sat, 16 Apr 2011 19:11:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>> 
>>>> Could you please do it the way you like and post your version 0.2?  Even
>>>> better, put it in nXhtml and we can later propose markchars.el for the
>>>> GNU ELPA?
>> 
>> (no message)
>> 
>> Lennart, maybe you sent prematurely or something else went wrong?  There
>> was no new content in your message.

LB> Eh, yes. I was going to say that I will take your version (with my
LB> suggestion for loading) and put it in nXhtml repository. (Some
LB> troubles with my computers at the moment.)

Is it OK if I also propose it for the GNU ELPA?  It doesn't depend on
idn.el or anything else external, and I can set up a nightly job to
mirror it from the nXhtml repository.

Ted




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-18 16:20                                                                 ` Ted Zlatanov
@ 2011-04-18 17:03                                                                   ` Lennart Borgman
  2011-04-19 13:07                                                                     ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-18 17:03 UTC (permalink / raw)
  To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel

2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
>
> LB> Eh, yes. I was going to say that I will take your version (with my
> LB> suggestion for loading) and put it in nXhtml repository. (Some
> LB> troubles with my computers at the moment.)
>
> Is it OK if I also propose it for the GNU ELPA?  It doesn't depend on
> idn.el or anything else external, and I can set up a nightly job to
> mirror it from the nXhtml repository.

Could you please coordinate this with Reuben who has had similar
questions and offered to help?

I think the best would be to keep nXhtml as one package at the moment
(and I think Reuben agrees to that). Later on I think it should be
split and more general things like this extracted to ELPA or Emacs.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-18 17:03                                                                   ` Lennart Borgman
@ 2011-04-19 13:07                                                                     ` Ted Zlatanov
  2011-04-19 18:56                                                                       ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-19 13:07 UTC (permalink / raw)
  To: emacs-devel

On Mon, 18 Apr 2011 19:03:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
>> 
LB> Eh, yes. I was going to say that I will take your version (with my
LB> suggestion for loading) and put it in nXhtml repository. (Some
LB> troubles with my computers at the moment.)
>> 
>> Is it OK if I also propose it for the GNU ELPA?  It doesn't depend on
>> idn.el or anything else external, and I can set up a nightly job to
>> mirror it from the nXhtml repository.

LB> Could you please coordinate this with Reuben who has had similar
LB> questions and offered to help?

When?  Where?  You have to give me some idea of the questions Reuben
asked and what he offered :)

LB> I think the best would be to keep nXhtml as one package at the moment
LB> (and I think Reuben agrees to that). Later on I think it should be
LB> split and more general things like this extracted to ELPA or Emacs.

That makes it hard for people like me, who don't use nXhtml, to use
markchars.el (which as I mentioned is a standalone library).  What would
you suggest as a remedy instead of installing nXhtml?

Thanks
Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-19 13:07                                                                     ` Ted Zlatanov
@ 2011-04-19 18:56                                                                       ` Lennart Borgman
  2011-04-20 14:49                                                                         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-19 18:56 UTC (permalink / raw)
  To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel

2011/4/19 Ted Zlatanov <tzz@lifelogs.com>:
> On Mon, 18 Apr 2011 19:03:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/18 Ted Zlatanov <tzz@lifelogs.com>:
>>>
> LB> Eh, yes. I was going to say that I will take your version (with my
> LB> suggestion for loading) and put it in nXhtml repository. (Some
> LB> troubles with my computers at the moment.)
>>>
>>> Is it OK if I also propose it for the GNU ELPA?  It doesn't depend on
>>> idn.el or anything else external, and I can set up a nightly job to
>>> mirror it from the nXhtml repository.
>
> LB> Could you please coordinate this with Reuben who has had similar
> LB> questions and offered to help?
>
> When?  Where?  You have to give me some idea of the questions Reuben
> asked and what he offered :)

He contacted me privately. Could you please ask him so I do not get it wrong?

> LB> I think the best would be to keep nXhtml as one package at the moment
> LB> (and I think Reuben agrees to that). Later on I think it should be
> LB> split and more general things like this extracted to ELPA or Emacs.
>
> That makes it hard for people like me, who don't use nXhtml, to use
> markchars.el (which as I mentioned is a standalone library).  What would
> you suggest as a remedy instead of installing nXhtml?

I do not know how ELPA handles this. (But I do not think there are any
problems installing all of nXhtml. Everything is autoloaded.)



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-19 18:56                                                                       ` Lennart Borgman
@ 2011-04-20 14:49                                                                         ` Ted Zlatanov
  2011-04-20 21:38                                                                           ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-20 14:49 UTC (permalink / raw)
  To: emacs-devel

On Tue, 19 Apr 2011 20:56:41 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/19 Ted Zlatanov <tzz@lifelogs.com>:
LB> I think the best would be to keep nXhtml as one package at the moment
LB> (and I think Reuben agrees to that). Later on I think it should be
LB> split and more general things like this extracted to ELPA or Emacs.
>> 
>> That makes it hard for people like me, who don't use nXhtml, to use
>> markchars.el (which as I mentioned is a standalone library).  What would
>> you suggest as a remedy instead of installing nXhtml?

LB> I do not know how ELPA handles this. (But I do not think there are any
LB> problems installing all of nXhtml. Everything is autoloaded.)

All I want is an easy way to display suspicious characters (as I recall
from when we started this discussion over a year ago).  I have no
problem that you're hosting markchars.el inside nXhtml and I can mirror
just that one file into the GNU ELPA, but if nXhtml has to be
*installed* in order to use markchars.el, I'll have to write my own
version.  I think installing a large, unrelated package is an
unnecessary burden on those who just want to detect suspicious
characters.  Can you please confirm one way or the other so I know what
I need to do, mirror or rewrite?

Just to be clear, mirroring markchars.el does not require you to change
anything.  The GNU ELPA machine will pull the latest markchars.el down
daily, that's all.

Thanks
Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-20 14:49                                                                         ` Ted Zlatanov
@ 2011-04-20 21:38                                                                           ` Lennart Borgman
  2011-04-21 17:35                                                                             ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-20 21:38 UTC (permalink / raw)
  To: Ted Zlatanov, Reuben Thomas; +Cc: emacs-devel

2011/4/20 Ted Zlatanov <tzz@lifelogs.com>:
>
> LB> I do not know how ELPA handles this. (But I do not think there are any
> LB> problems installing all of nXhtml. Everything is autoloaded.)
>
> All I want is an easy way to display suspicious characters (as I recall
> from when we started this discussion over a year ago).  I have no
> problem that you're hosting markchars.el inside nXhtml and I can mirror
> just that one file into the GNU ELPA, but if nXhtml has to be
> *installed* in order to use markchars.el, I'll have to write my own
> version.  I think installing a large, unrelated package is an
> unnecessary burden on those who just want to detect suspicious
> characters.  Can you please confirm one way or the other so I know what
> I need to do, mirror or rewrite?
>
> Just to be clear, mirroring markchars.el does not require you to change
> anything.  The GNU ELPA machine will pull the latest markchars.el down
> daily, that's all.

nXhtml is not very much more monolithic than ELPA actually. You can use
just the basic libraries in nXhtml and get the rest downloaded
automatically when you try to access them. So it contains kind of
"ELPA" inside.

But that is not the point here, but since you took it up I mention it.

As I tried to say nXhtml will somehow be mirrored in ELPA and Reuben
has said that he wants to contribute to that process. So please try to
discuss this with him.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-20 21:38                                                                           ` Lennart Borgman
@ 2011-04-21 17:35                                                                             ` Ted Zlatanov
  2011-04-21 18:42                                                                               ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-21 17:35 UTC (permalink / raw)
  To: emacs-devel

On Wed, 20 Apr 2011 23:38:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/20 Ted Zlatanov <tzz@lifelogs.com>:
>> 
LB> I do not know how ELPA handles this. (But I do not think there are any
LB> problems installing all of nXhtml. Everything is autoloaded.)
>> 
>> All I want is an easy way to display suspicious characters (as I recall
>> from when we started this discussion over a year ago).  I have no
>> problem that you're hosting markchars.el inside nXhtml and I can mirror
>> just that one file into the GNU ELPA, but if nXhtml has to be
>> *installed* in order to use markchars.el, I'll have to write my own
>> version.  I think installing a large, unrelated package is an
>> unnecessary burden on those who just want to detect suspicious
>> characters.  Can you please confirm one way or the other so I know what
>> I need to do, mirror or rewrite?
>> 
>> Just to be clear, mirroring markchars.el does not require you to change
>> anything.  The GNU ELPA machine will pull the latest markchars.el down
>> daily, that's all.

LB> nXhtml is not very much more monolithic than ELPA actually. You can use
LB> just the basic libraries in nXhtml and get the rest downloaded
LB> automatically when you try to access them. So it contains kind of
LB> "ELPA" inside.

LB> But that is not the point here, but since you took it up I mention it.

LB> As I tried to say nXhtml will somehow be mirrored in ELPA and Reuben
LB> has said that he wants to contribute to that process. So please try to
LB> discuss this with him.

If you think about it, this is not a good user experience.

"You should beware domain names with suspicious characters that are not
in the same script.  markchars.el can do it."

"How do I install it?"

"You have to install nXhtml, a web development environment."

"???"

Compare to 'You run (package-install "markchars").'  Which one makes
more sense for the users?  And even assuming nXhtml is in the ELPA on
its own, if they are browsing the list of packages, why would they
expect a web development environment to provide what markchars.el
provides?

Sorry if this is not what you want to do, but it really doesn't make
sense to force the user to install nXhtml in order to detect suspicious
characters.  So I think mirroring or rewriting markchars.el are the only
options to provide a good user experience.  I can't think of any others.
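
For comparison, the one-step install could look like the sketch below.
It assumes a package named `markchars' is actually published in one of
the configured archives, which at this point is only a proposal, and it
uses the package.el functions as they exist in current Emacs.
`demo-ensure-package' is a made-up helper name. Note that
`package-install' expects a symbol.

```elisp
(require 'package)

(defun demo-ensure-package (pkg)
  "Install PKG (a symbol) from the configured archives unless already installed."
  (unless (package-installed-p pkg)
    ;; Fetch archive contents first so PKG can be found.
    (package-refresh-contents)
    (package-install pkg)))

;; Hypothetical usage, assuming a package named `markchars' is published:
;; (demo-ensure-package 'markchars)
```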

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-21 17:35                                                                             ` Ted Zlatanov
@ 2011-04-21 18:42                                                                               ` Lennart Borgman
  2011-04-21 19:14                                                                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-21 18:42 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
> On Wed, 20 Apr 2011 23:38:12 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/20 Ted Zlatanov <tzz@lifelogs.com>:
>>>
> LB> I do not know how ELPA handles this. (But I do not think there are any
> LB> problems installing all of nXhtml. Everything is autoloaded.)
>>>
>>> All I want is an easy way to display suspicious characters (as I recall
>>> from when we started this discussion over a year ago).  I have no
>>> problem that you're hosting markchars.el inside nXhtml and I can mirror
>>> just that one file into the GNU ELPA, but if nXhtml has to be
>>> *installed* in order to use markchars.el, I'll have to write my own
>>> version.  I think installing a large, unrelated package is an
>>> unnecessary burden on those who just want to detect suspicious
>>> characters.  Can you please confirm one way or the other so I know what
>>> I need to do, mirror or rewrite?
>>>
>>> Just to be clear, mirroring markchars.el does not require you to change
>>> anything.  The GNU ELPA machine will pull the latest markchars.el down
>>> daily, that's all.
>
> LB> nXhtml is not very much more monolithic than ELPA actually. You can use
> LB> just the basic libraries in nXhtml and get the rest downloaded
> LB> automatically when you try to access them. So it contains kind of
> LB> "ELPA" inside.
>
> LB> But that is not the point here, but since you took it up I mention it.
>
> LB> As I tried to say nXhtml will somehow be mirrored in ELPA and Reuben
> LB> has said that he wants to contribute to that process. So please try to
> LB> discuss this with him.
>
> If you think about it, this is not a good user experience.
>
> "You should beware domain names with suspicious characters that are not
> in the same script.  markchars.el can do it."
>
> "How do I install it?"
>
> "You have to install nXhtml, a web development environment."
>
> "???"

Yes, the name and description are misleading. I have been waiting for
something like ELPA (with the additional capabilities in that area that
nXhtml provides).


> Sorry if this is not what you want to do, but it really doesn't make
> sense to force the user to install nXhtml in order to detect suspicious
> characters.  So I think mirroring or rewriting markchars.el are the only
> options to provide a good user experience.  I can't think of any others.

You are maybe misunderstanding part of this. For me the transition is
the problem (and maybe the current lack of dynamic loading of
libraries from ELPA). I do agree with you where markchars.el should be
in the end. It should be in ELPA or Emacs. It is the road we have to
think about.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-21 18:42                                                                               ` Lennart Borgman
@ 2011-04-21 19:14                                                                                 ` Ted Zlatanov
  2011-04-21 20:00                                                                                   ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-21 19:14 UTC (permalink / raw)
  To: emacs-devel

On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> You are maybe misunderstanding part of this. For me the transition is
LB> the problem (and maybe the current lack of dynamic loading of
LB> libraries from ELPA).

I'm not sure what you mean about dynamic loading.  Package installation
from the GNU ELPA (and ELPA in general) is static: you install what you
want synchronously, there is no "auto-install this when I need it"
functionality.  Do you mean this is an issue?

LB> I do agree with you where markchars.el should be in the end. It
LB> should be in ELPA or Emacs. It is the road we have to think about.

OK, as I understand it you also want to be able to offer markchars.el to
users as a separate feature independent of nXhtml.  We agree.

As I said markchars.el can be simply mirrored into the GNU ELPA nightly
from the nXhtml repository.  Is that the road you're referring to?  Or
does something else about the mirroring worry you?  How would you like
to tell users to install it, if my proposal is not OK for some reason?

Thanks
Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-21 19:14                                                                                 ` Ted Zlatanov
@ 2011-04-21 20:00                                                                                   ` Lennart Borgman
  2011-04-21 20:35                                                                                     ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-21 20:00 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
> On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> You are maybe misunderstanding part of this. For me the transition is
> LB> the problem (and maybe the current lack of dynamic loading of
> LB> libraries from ELPA).
>
> I'm not sure what you mean about dynamic loading.

What I mean here is what can be used in nXhtml: If you (require
'somelib) and somelib.el is not on your computer then you can have it
automatically downloaded from nXhtml repository (with a possibility to
check the code before actually installing it).
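
The general idea can be sketched as below. This is not the nXhtml code,
only an illustration of the pattern: `demo-require-or-fetch' is a made-up
name, and FETCH-FN stands for whatever caller-supplied function would
download the file and let the user review it before installing.

```elisp
(defun demo-require-or-fetch (feature fetch-fn)
  "Try to load FEATURE; if unavailable, call FETCH-FN with FEATURE and retry.
FETCH-FN is a hypothetical caller-supplied function that is expected to
place the library somewhere on `load-path'.  Return FEATURE on success,
nil otherwise."
  (or (require feature nil t)
      (progn
        ;; The library was not found locally, so ask the fetcher for it.
        (funcall fetch-fn feature)
        ;; Retry quietly; still nil if the fetcher did not provide it.
        (require feature nil t))))
```

With a fetcher that does nothing, (demo-require-or-fetch 'cl-lib #'ignore)
simply behaves like a soft `require'.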

> Package installation
> from the GNU ELPA (and ELPA in general) is static: you install what you
> want synchronously, there is no "auto-install this when I need it"
> functionality.  Do you mean this is an issue?

Yes. It makes it more difficult to integrate elisp libraries. This is
one of the reasons I have kept so many things in nXhtml. If ELPA had
that possibility I would not hesitate to move things into ELPA.
(Unfortunately, for this to work really reliably a (simple) change in
`require' is needed.)

> LB> I do agree with you where markchars.el should be in the end. It
> LB> should be in ELPA or Emacs. It is the road we have to think about.
>
> OK, as I understand it you also want to be able to offer markchars.el to
> users as a separate feature independent of nXhtml.  We agree.
>
> As I said markchars.el can be simply mirrored into the GNU ELPA nightly
> from the nXhtml repository.  Is that the road you're referring to?  Or
> does something else about the mirroring worry you?  How would you like
> to tell users to install it, if my proposal is not OK for some reason?

At the moment just grab it from the nXhtml repository. You can
download it from the web page there.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-21 20:00                                                                                   ` Lennart Borgman
@ 2011-04-21 20:35                                                                                     ` Ted Zlatanov
  2011-04-21 20:53                                                                                       ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-21 20:35 UTC (permalink / raw)
  To: emacs-devel

On Thu, 21 Apr 2011 22:00:38 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
>> On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>> 
LB> You are maybe misunderstanding part of this. For me the transition is
LB> the problem (and maybe the current lack of dynamic loading of
LB> libraries from ELPA).
>> 
>> I'm not sure what you mean about dynamic loading.

LB> What I mean here is what can be used in nXhtml: If you (require
LB> 'somelib) and somelib.el is not on your computer then you can have it
LB> automatically downloaded from nXhtml repository (with a possibility to
LB> check the code before actually installing it).

I would be strongly opposed to opportunistic package installations in
general, although nXhtml can use it internally of course.

>> Package installation from the GNU ELPA (and ELPA in general) is
>> static: you install what you want synchronously, there is no
>> "auto-install this when I need it" functionality.  Do you mean this
>> is an issue?

LB> Yes. It makes it more difficult to integrate elisp libraries. This is
LB> one of the reasons I have kept so many things in nXhtml. If ELPA had
LB> that possibility I would not hesitate to move things into ELPA.
LB> (Unfortunately, for this to work really reliably a (simple) change in
LB> `require' is needed.)

You should make that dynamic loading proposal separately and maybe it
will be accepted, but as far as markchars.el goes, since it will reside
in nXhtml, you can still do the nXhtml opportunistic installs if the
user hasn't installed it from the GNU ELPA statically.

>> As I said markchars.el can be simply mirrored into the GNU ELPA nightly
>> from the nXhtml repository.  Is that the road you're referring to?  Or
>> does something else about the mirroring worry you?  How would you like
>> to tell users to install it, if my proposal is not OK for some reason?

LB> At the moment just grab it from the nXhtml repository. You can
LB> download it from the web page there.

Sorry, is "just grab it" advice for me to mirror into the GNU ELPA, or
advice for the end user?  I hope it's the former but just making sure...

Ted


* Re: face for non-ASCII characters
  2011-04-21 20:35                                                                                     ` Ted Zlatanov
@ 2011-04-21 20:53                                                                                       ` Lennart Borgman
  2011-04-21 21:18                                                                                         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-21 20:53 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
> On Thu, 21 Apr 2011 22:00:38 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
>>> On Thu, 21 Apr 2011 20:42:08 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>>>
> LB> You are maybe misunderstanding part of this. For me the transition is
> LB> the problem (and maybe the current lack of dynamic loading of
> LB> libraries from ELPA).
>>>
>>> I'm not sure what you mean about dynamic loading.
>
> LB> What I mean here is what can be used in nXhtml: If you (require
> LB> 'somelib) and somelib.el is not on your computer then you can have it
> LB> automatically downloaded from nXhtml repository (with a possibility to
> LB> check the code before actually installing it).
>
> I would be strongly opposed to opportunistic package installations in
> general, although nXhtml can use it internally of course.

Why is the word opportunistic used by you here? I do not have time to
discuss if you do not take it seriously. Please describe exactly what
it is you do not like instead.


>>> Package installation from the GNU ELPA (and ELPA in general) is
>>> static: you install what you want synchronously, there is no
>>> "auto-install this when I need it" functionality.  Do you mean this
>>> is an issue?
>
> LB> Yes. It makes it more difficult to integrate elisp libraries. This is
> LB> one of the reasons I have kept so many things in nXhtml. If ELPA had
> LB> that possibility I would not hesitate to move things into ELPA.
> LB> (Unfortunately, for this to work really reliably a (simple) change in
> LB> `require' is needed.)
>
> You should make that dynamic loading proposal separately and maybe it
> will be accepted, but as far as markchars.el goes, since it will reside
> in nXhtml, you can still do the nXhtml opportunistic installs if the
> user hasn't installed it from the GNU ELPA statically.

You are greatly exaggerating. The difference between ELPA and nXhtml
here is that nXhtml will propose that you can install a library to get
things working while ELPA will not do that. So I think the way nXhtml
handles this just makes things easier for the user. (And I would be
glad if ELPA did this instead of nXhtml. I really do not want the code
there. And I do not want markchars.el there either.)

>>> As I said markchars.el can be simply mirrored into the GNU ELPA nightly
>>> from the nXhtml repository.  Is that the road you're referring to?  Or
>>> does something else about the mirroring worry you?  How would you like
>>> to tell users to install it, if my proposal is not OK for some reason?
>
> LB> At the moment just grab it from the nXhtml repository. You can
> LB> download it from the web page there.
>
> Sorry, is "just grab it" advice for me to mirror into the GNU ELPA, or
> advice for the end user?  I hope it's the former but just making sure...

At first sight one might think that your proposal to mirror
markchars.el into ELPA is not troublesome. However you may end up with
two versions of markchars.el if you mirror it into ELPA now.

I would be glad to have it in ELPA - if just the automatic
installation could be fixed too.

But you are however of course free to do what you want.




* Re: face for non-ASCII characters
  2011-04-21 20:53                                                                                       ` Lennart Borgman
@ 2011-04-21 21:18                                                                                         ` Ted Zlatanov
  2011-04-22 12:20                                                                                           ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-21 21:18 UTC (permalink / raw)
  To: emacs-devel

On Thu, 21 Apr 2011 22:53:01 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
>>>> I'm not sure what you mean about dynamic loading.
>> 
LB> What I mean here is what can be used in nXhtml: If you (require
LB> 'somelib) and somelib.el is not on your computer then you can have it
LB> automatically downloaded from nXhtml repository (with a possibility to
LB> check the code before actually installing it).
>> 
>> I would be strongly opposed to opportunistic package installations in
>> general, although nXhtml can use it internally of course.

LB> Why is the word opportunistic used by you here? I do not have time to
LB> discuss if you do not take it seriously. Please describe exactly what
LB> it is you do not like instead.

"Opportunistic" means it's installed when you need it as you said.
"Dynamic loading" is a term easily confused with the Unix dynamic
libraries, that's why I avoided it.  I am not using "opportunistic"
disparagingly.

I am opposed to opportunistic installs because they destabilize the
working environment.  They may make sense in a tightly controlled
environment, but for a general audience (all Emacs users) I think they
are a bad idea.  Most package managers I've used (Perl, Python, Ruby,
Emacs, XEmacs, Unix distributions) do static installs.

This is different from autoloading, where you know the library is
available and you've scanned it for autoload cookies.

LB> You are greatly exaggerating. The difference between ELPA and nXhtml
LB> here is that nXhtml will propose that you can install a library to get
LB> things working while ELPA will not do that.

ELPA will install all the dependencies when it installs the library.  So
when the library is installed, you won't have surprises later.  If
you're talking about optional add-ons and plugins, that's a different
discussion :)

As I said, you should make the opportunistic/dynamic loading proposal
and maybe it will be accepted.  While it seems to me like a bad idea,
it's entirely possible it turns out to be good!  We won't know until
it's discussed directly.

LB> At first sight one might think that your proposal to mirror
LB> markchars.el into ELPA is not troublesome. However you may end up with
LB> two versions of markchars.el if you mirror it into ELPA now.

That would last for at most 1 day, until the nightly synchronization
catches up with the nXhtml repository.  I think that's OK.  The nXhtml
repository will still be the primary repository.

LB> I would be glad to have it in ELPA - if just the automatic
LB> installation could be fixed too.

Can you give a specific scenario where markchars.el in both the GNU ELPA
and in nXhtml would be a problem?  I want to understand what needs to be
fixed.

LB> But you are however of course free to do what you want.

Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
markchars.el in the GNU ELPA) is not the best thing for the users.

Ted


* Re: face for non-ASCII characters
  2011-04-21 21:18                                                                                         ` Ted Zlatanov
@ 2011-04-22 12:20                                                                                           ` Lennart Borgman
  2011-04-22 12:49                                                                                             ` Stephen J. Turnbull
  2011-04-22 14:20                                                                                             ` Ted Zlatanov
  0 siblings, 2 replies; 182+ messages in thread
From: Lennart Borgman @ 2011-04-22 12:20 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
> On Thu, 21 Apr 2011 22:53:01 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> 2011/4/21 Ted Zlatanov <tzz@lifelogs.com>:
>>>>> I'm not sure what you mean about dynamic loading.
>>>
> LB> What I mean here is what can be used in nXhtml: If you (require
> LB> 'somelib) and somelib.el is not on your computer then you can have it
> LB> automatically downloaded from nXhtml repository (with a possibility to
> LB> check the code before actually installing it).
>>>
>>> I would be strongly opposed to opportunistic package installations in
>>> general, although nXhtml can use it internally of course.
>
> LB> Why is the word opportunistic used by you here? I do not have time to
> LB> discuss if you do not take it seriously. Please describe exactly what
> LB> it is you do not like instead.
>
> "Opportunistic" means it's installed when you need it as you said.
> "Dynamic loading" is a term easily confused with the Unix dynamic
> libraries, that's why I avoided it.  I am not using "opportunistic"
> disparagingly.

Oh, I see. Then my answer was inappropriate. My apologies. And I am glad
I was wrong.

> I am opposed to opportunistic installs because they destabilize the
> working environment.  They may make sense in a tightly controlled
> environment, but for a general audience (all Emacs users) I think they
> are a bad idea.  Most package managers I've used (Perl, Python, Ruby,
> Emacs, XEmacs, Unix distributions) do static installs.

I can surely see the problem, but if the opportunistic installer asks
(and makes it possible to check) before each install I do not think it
is an additional problem when using Emacs.
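
As an illustration only (this is not nXhtml's actual code), an
installer that asks before each install could look roughly like the
sketch below. The function, the variable and the directory name are
hypothetical; the sketch assumes the library lives in a single file
at a known URL.

```elisp
;; Sketch only: not nXhtml's code.  All `my-' names are hypothetical.
(require 'url-handlers)                 ; provides `url-copy-file'

(defvar my-opportunistic-dir
  (expand-file-name "opportunistic/" user-emacs-directory)
  "Directory where approved downloads are kept (hypothetical).")

(defun my-require-or-offer (feature url)
  "Load FEATURE; if it is missing, offer to fetch it from URL.
The user is asked before the download and again after seeing the
file.  Return FEATURE on success, nil if the user declines."
  (or (require feature nil t)           ; already installed: nothing to do
      (when (y-or-n-p (format "%s is missing.  Download %s for review? "
                              feature url))
        (let ((tmp (make-temp-file "review-" nil ".el"))
              (dest (expand-file-name (format "%s.el" feature)
                                      my-opportunistic-dir)))
          (url-copy-file url tmp t)     ; t: overwrite the empty temp file
          (display-buffer (find-file-noselect tmp)) ; show the code first
          (if (y-or-n-p (format "Install %s as shown? " feature))
              (progn
                (make-directory my-opportunistic-dir t)
                (rename-file tmp dest t)
                (add-to-list 'load-path my-opportunistic-dir)
                (require feature))
            (delete-file tmp)           ; declined: leave nothing behind
            nil)))))
```

The download goes to a temporary file that is not on the load-path,
so nothing can be loaded before the user has approved it.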

For another comparison think about the firewalls. They effectively act
similar to such an opportunistic installer as I suggest when they ask
you if you want a program to be able to do that and that.

> This is different from autoloading, where you know the library is
> available and you've scanned it for autoload cookies.

This can be done by an opportunistic installer. In fact the
opportunistic installer in nXhtml allows you to check the file.

> LB> You are greatly exaggerating. The difference between ELPA and nXhtml
> LB> here is that nXhtml will propose that you can install a library to get
> LB> things working while ELPA will not do that.
>
> ELPA will install all the dependencies when it installs the library.  So
> when the library is installed, you won't have surprises later.  If
> you're talking about optional add-ons and plugins, that's a different
> discussion :)

It is not clear all the time what dependencies there are since that
may depend on how you are using a library. That is why I think an
opportunistic installer is good.

> As I said, you should make the opportunistic/dynamic loading proposal
> and maybe it will be accepted.  While it seems to me like a bad idea,
> it's entirely possible it turns out to be good!  We won't know until
> it's discussed directly.

I do not have time to discuss it much now, but please see my explanations above.

> LB> At first sight one might think that your proposal to mirror
> LB> markchars.el into ELPA is not troublesome. However you may end up with
> LB> two versions of markchars.el if you mirror it into ELPA now.
>
> That would last for at most 1 day, until the nightly synchronization
> catches up with the nXhtml repository.  I think that's OK.  The nXhtml
> repository will still be the primary repository.

A misunderstanding. I was referring to two versions in different
locations on the users' computers.

> LB> I would be glad to have it in ELPA - if just the automatic
> LB> installation could be fixed too.
>
> Can you give a specific scenario where markchars.el in both the GNU ELPA
> and in nXhtml would be a problem?  I want to understand what needs to be
> fixed.

Please see just above.

> LB> But you are however of course free to do what you want.
>
> Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
> markchars.el in the GNU ELPA) is not the best thing for the users.

Good. I am not sure either but want to give you my concerns. Please
feel free to handle it the way you think is best at the moment.




* Re: face for non-ASCII characters
  2011-04-22 12:20                                                                                           ` Lennart Borgman
@ 2011-04-22 12:49                                                                                             ` Stephen J. Turnbull
  2011-04-22 13:23                                                                                               ` Lennart Borgman
  2011-04-22 14:20                                                                                             ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Stephen J. Turnbull @ 2011-04-22 12:49 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel

Lennart Borgman writes:

 > > As I said, you should make the opportunistic/dynamic loading proposal
 > > and maybe it will be accepted.  While it seems to me like a bad idea,
 > > it's entirely possible it turns out to be good!  We won't know until
 > > it's discussed directly.
 > 
 > I do not have time to discuss it much now, but please see my
 > explanations above.

It was discussed at the time that ELPA became officially sanctioned by
the Emacs project, and Richard's reaction then was almost identical to
Ted's now, except Richard left less room for changing his mind on the
issue.





* Re: face for non-ASCII characters
  2011-04-22 12:49                                                                                             ` Stephen J. Turnbull
@ 2011-04-22 13:23                                                                                               ` Lennart Borgman
  2011-04-23  0:50                                                                                                 ` Richard Stallman
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-22 13:23 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Ted Zlatanov, emacs-devel

On Fri, Apr 22, 2011 at 2:49 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> Lennart Borgman writes:
>
>  > > As I said, you should make the opportunistic/dynamic loading proposal
>  > > and maybe it will be accepted.  While it seems to me like a bad idea,
>  > > it's entirely possible it turns out to be good!  We won't know until
>  > > it's discussed directly.
>  >
>  > I do not have time to discuss it much now, but please see my
>  > explanations above.
>
> It was discussed at the time that ELPA became officially sanctioned by
> the Emacs project, and Richard's reaction then was almost identical to
> Ted's now, except Richard left less room for changing his mind on the
> issue.

I believe RMS's rejection was not so much because of instability but
insecurity and that the user should have control. It was after that I
added the possibility to review and reject the opportunistic install,
just before the library is going to be installed.




* Re: face for non-ASCII characters
  2011-04-22 12:20                                                                                           ` Lennart Borgman
  2011-04-22 12:49                                                                                             ` Stephen J. Turnbull
@ 2011-04-22 14:20                                                                                             ` Ted Zlatanov
  2011-04-22 17:12                                                                                               ` Lennart Borgman
  2011-04-26  3:09                                                                                               ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov
  1 sibling, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-22 14:20 UTC (permalink / raw)
  To: emacs-devel

On Fri, 22 Apr 2011 14:20:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> I can surely see the problem, but if the opportunistic installer asks
LB> (and makes it possible to check) before each install I do not think it
LB> is an additional problem when using Emacs.

At the very least it's a burden on the user.  What programs do you know
that use this system?  If the prevailing norm is to do static installs,
that suggests that users prefer it (I can't believe no one thought
"let's do opportunistic installs!" before).

LB> For another comparison think about the firewalls. They effectively act
LB> similar to such an opportunistic installer as I suggest when they ask
LB> you if you want a program to be able to do that and that.

I think the difference here is between installing software and enabling
services.

>> ELPA will install all the dependencies when it installs the library.  So
>> when the library is installed, you won't have surprises later.  If
>> you're talking about optional add-ons and plugins, that's a different
>> discussion :)

LB> It is not clear all the time what dependencies there are since that
LB> may depend on how you are using a library. That is why I think an
LB> opportunistic installer is good.

OK, so we're talking about plugins, not package dependencies.  Those may
be useful in a limited context, e.g. within nXhtml itself.  Emacs may
even get facilities to support them generally some day.  But plugins are
not packages.

I don't think markchars.el is a plugin.  It does not depend on nXhtml
and does not enhance it in a special way; it's a general package.  So
perhaps our misunderstanding is semantic :)

LB> At first sight one might think that your proposal to mirror
LB> markchars.el into ELPA is not troublesome. However you may end up with
LB> two versions of markchars.el if you mirror it into ELPA now.
>> 
>> That would last for at most 1 day, until the nightly synchronization
>> catches up with the nXhtml repository.  I think that's OK.  The nXhtml
>> repository will still be the primary repository.

LB> A misunderstanding. I was referring to two versions in different
LB> locations on the users' computers.

Ah.  package.el installs the two versions of the library in different
locations and will activate only one.  Thus the user has control over
the versions and can upgrade.  Does nXhtml do that?

In any case, as long as nXhtml puts its plugin directory in front of
package.el on the load-path, markchars.el will be loaded from the
install location nXhtml specifies.
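
A small sketch of that load-path rule, in case it helps.  The helper
name and the temporary directories are made up for the demonstration;
`locate-library' reports the file that `require' or `load' would pick.

```elisp
;; Sketch only: `my-first-copy-on-load-path' is a hypothetical helper.
(defun my-first-copy-on-load-path (library &rest dirs)
  "Return the file for LIBRARY that would load if `load-path' were DIRS."
  (let ((load-path dirs))               ; dynamic binding, restored on exit
    (locate-library library)))

;; Two stand-in directories, each holding an (empty) markchars.el.
(let ((nxhtml-dir (make-temp-file "nxhtml-util-" t))
      (elpa-dir   (make-temp-file "elpa-markchars-" t)))
  (dolist (dir (list nxhtml-dir elpa-dir))
    (write-region "" nil (expand-file-name "markchars.el" dir)))
  ;; nXhtml's directory is listed first, so its copy is the one found.
  (my-first-copy-on-load-path "markchars" nxhtml-dir elpa-dir))
```

Swapping the two directory arguments makes the other copy win, which
is all the ordering on the load-path amounts to.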

>> Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
>> markchars.el in the GNU ELPA) is not the best thing for the users.

LB> Good. I am not sure either but want to give you my concerns. Please
LB> feel free to handle it the way you think is best at the moment.

OK, I'll mirror it.  I don't expect it to become a problem.

LB> I believe RMS's rejection was not so much because of instability but
LB> insecurity and that the user should have control. It was after that I
LB> added the possibility to review and reject the opportunistic install,
LB> just before the library is going to be installed.

As I said, you have to make a proposal and defend it.  It may turn out
to be really great, we won't know until it's up for review.  But I think
you should frame it as a "plugin facility" instead of a package manager
to give it a good chance to be accepted.

Ted


* Re: face for non-ASCII characters
  2011-04-22 14:20                                                                                             ` Ted Zlatanov
@ 2011-04-22 17:12                                                                                               ` Lennart Borgman
  2011-04-26  3:14                                                                                                 ` package management proposals for Emacs (was: face for non-ASCII characters) Ted Zlatanov
  2011-04-26  3:09                                                                                               ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-22 17:12 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/22 Ted Zlatanov <tzz@lifelogs.com>:
> On Fri, 22 Apr 2011 14:20:45 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> I can surely see the problem, but if the opportunistic installer asks
> LB> (and makes it possible to check) before each install I do not think it
> LB> is an additional problem when using Emacs.
>
> At the very least it's a burden on the user.  What programs do you know
> that use this system?  If the prevailing norm is to do static installs,
> that suggests that users prefer it (I can't believe no one thought
> "let's do opportunistic installs!" before).

Web browsers do it all the time.

> LB> For another comparison think about the firewalls. They effectively act
> LB> similar to such an opportunistic installer as I suggest when they ask
> LB> you if you want a program to be able to do that and that.
>
> I think the difference here is between installing software and enabling
> services.

There is no difference that I can see when it comes to stability
(which was what I believe you suggested as the most important).

>>> ELPA will install all the dependencies when it installs the library.  So
>>> when the library is installed, you won't have surprises later.  If
>>> you're talking about optional add-ons and plugins, that's a different
>>> discussion :)
>
> LB> It is not clear all the time what dependencies there are since that
> LB> may depend on how you are using a library. That is why I think an
> LB> opportunistic installer is good.
>
> OK, so we're talking about plugins, not package dependencies.

?? Elisp libraries work the same way AFAICS.

> Those may
> be useful in a limited context, e.g. within nXhtml itself.  Emacs may
> even get facilities to support them generally some day.  But plugins are
> not packages.
>
> I don't think markchars.el is a plugin.  It does not depend on nXhtml
> and does not enhance it in a special way; it's a general package.  So
> perhaps our misunderstanding is semantic :)

I can't find any sense in what you say here. Could you explain what
differences you see?

> LB> A misunderstanding. I was referring to two versions in different
> LB> locations on the users' computers.
>
> Ah.  package.el installs the two versions of the library in different
> locations and will activate only one.  Thus the user has control over
> the versions and can upgrade.  Does nXhtml do that?

No, it did not make sense to finish the system for opportunistic
install (since ELPA was to be used). I made it more as an example of
how it can be built. (But it works, of course.)

> In any case, as long as nXhtml puts its plugin directory in front of
> package.el on the load-path, markchars.el will be loaded from the
> install location nXhtml specifies.

Yes.

>>> Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
>>> markchars.el in the GNU ELPA) is not the best thing for the users.
>
> LB> Good. I am not sure either but want to give you my concerns. Please
> LB> feel free to handle it the way you think is best at the moment.
>
> OK, I'll mirror it.  I don't expect it to become a problem.

Why not mirror idn.el too then?

> LB> I believe RMS's rejection was not so much because of instability but
> LB> insecurity and that the user should have control. It was after that I
> LB> added the possibility to review and reject the opportunistic install,
> LB> just before the library is going to be installed.
>
> As I said, you have to make a proposal and defend it.  It may turn out
> to be really great, we won't know until it's up for review.  But I think
> you should frame it as a "plugin facility" instead of a package manager
> to give it a good chance to be accepted.

I think it would make most sense as an enhancement to ELPA since the
libraries within ELPA should be good to use so we do not have to worry
about something bad getting installed this way.




* Re: face for non-ASCII characters
  2011-04-22 13:23                                                                                               ` Lennart Borgman
@ 2011-04-23  0:50                                                                                                 ` Richard Stallman
  2011-04-23  7:13                                                                                                   ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Richard Stallman @ 2011-04-23  0:50 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: stephen, tzz, emacs-devel

I already made a decision about this question.  The users' freedom is
at stake, not just convenience.

To decide it again would be extra work that doesn't need to be done
and that I don't have time for.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.





* Re: face for non-ASCII characters
  2011-04-23  0:50                                                                                                 ` Richard Stallman
@ 2011-04-23  7:13                                                                                                   ` Lennart Borgman
  2011-04-25 17:54                                                                                                     ` Richard Stallman
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-23  7:13 UTC (permalink / raw)
  To: rms; +Cc: stephen, tzz, emacs-devel

On Sat, Apr 23, 2011 at 2:50 AM, Richard Stallman <rms@gnu.org> wrote:
> I already made a decision about this question.  The users' freedom is
> at stake, not just convenience.

This was a decision you made against a version of opportunistic
install where the user could not decide against it. After that I
changed it so the user can review and decide at each opportunistic
install.

I cannot see that the users' freedom is in any way at stake with the
new version.

> To decide it again would be extra work that doesn't need to be done
> and that I don't have time for.




* Re: face for non-ASCII characters
  2011-04-23  7:13                                                                                                   ` Lennart Borgman
@ 2011-04-25 17:54                                                                                                     ` Richard Stallman
  2011-04-26 18:26                                                                                                       ` Chong Yidong
  0 siblings, 1 reply; 182+ messages in thread
From: Richard Stallman @ 2011-04-25 17:54 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: stephen, tzz, emacs-devel

    After that I
    changed it so the user can review and decide at each opportunistic
    install.

This is not a binary question, it's a scale.  You want to take a step
towards the wrong end of the scale.  We are just introducing the
separate package archive, which is a big step.  Now is too soon to
consider another step.  The risk here is hard to calculate, so we must
be cautious.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.





* markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters)
  2011-04-22 14:20                                                                                             ` Ted Zlatanov
  2011-04-22 17:12                                                                                               ` Lennart Borgman
@ 2011-04-26  3:09                                                                                               ` Ted Zlatanov
  2011-04-26  8:13                                                                                                 ` Lennart Borgman
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-26  3:09 UTC (permalink / raw)
  To: emacs-devel

On Fri, 22 Apr 2011 09:20:00 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

>>> Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
>>> markchars.el in the GNU ELPA) is not the best thing for the users.
...
TZ> OK, I'll mirror it.  I don't expect it to become a problem.

When markchars.el is up to version 0.2 (I try to use semantic
versioning, so that became 0.2.0) in the nXhtml Launchpad repository,
let me know and I'll set up mirroring.  Until then I've checked my
version 0.2.0 into the GNU ELPA Bazaar repository.  I did not deploy it
yet.

Chong or Stefan, can you please look at markchars.el (and epoch-view.el,
which I also checked in tonight) and let me know if I need to fix
anything before deploying?

idn.el depends on nXhtml:

(require 'nxhtml-base)
(defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir))

so I don't think it can be mirrored yet.  Let me know when and if that
changes and how you want to do it.  With the GNU ELPA we can make idn.el
a package tarball that also has the confusables.txt file.
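
One possible way to drop the nXhtml dependency, purely as a sketch:
look for the data next to idn.el itself.  The "uts39/" subdirectory is
an assumed tarball layout, not something either repository defines
today.

```elisp
;; Sketch only: a hypothetical replacement for the nXhtml-relative default.
(defvar uts39-datadir
  (expand-file-name
   "uts39/"                             ; assumed layout inside the tarball
   (file-name-directory
    ;; `load-file-name' is set while the library is being loaded; the
    ;; other two are fallbacks for evaluating this form interactively.
    (or load-file-name buffer-file-name default-directory)))
  "Directory holding confusables.txt, next to this library.")
```

With something like this, package.el could unpack the tarball
anywhere and the data file would still be found.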

Thanks
Ted


* package management proposals for Emacs (was: face for non-ASCII characters)
  2011-04-22 17:12                                                                                               ` Lennart Borgman
@ 2011-04-26  3:14                                                                                                 ` Ted Zlatanov
  2011-04-26  8:10                                                                                                   ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-26  3:14 UTC (permalink / raw)
  To: emacs-devel

On Fri, 22 Apr 2011 19:12:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> I think it would make most sense as an enhancement to ELPA since the
LB> libraries within ELPA should be good to use so we do not have to worry
LB> about something bad getting installed this way.

OK, you can propose it later when the GNU ELPA (note the difference
between "ELPA" and "GNU ELPA" btw) has been around for a bit.

Meanwhile check out emacsmirror and el-get, they are interesting and go
in a different direction from nXhtml's package management and from the
ELPA model, and there's plenty of overlap.

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: package management proposals for Emacs (was: face for non-ASCII characters)
  2011-04-26  3:14                                                                                                 ` package management proposals for Emacs (was: face for non-ASCII characters) Ted Zlatanov
@ 2011-04-26  8:10                                                                                                   ` Lennart Borgman
  2011-04-26 21:46                                                                                                     ` Richard Stallman
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-26  8:10 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/26 Ted Zlatanov <tzz@lifelogs.com>:
> On Fri, 22 Apr 2011 19:12:15 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>
> LB> I think it would make most sense as an enhancement to ELPA since the
> LB> libraries within ELPA should be good to use so we do not have to worry
> LB> about something bad getting installed this way.
>
> OK, you can propose it later when the GNU ELPA (note the difference
> between "ELPA" and "GNU ELPA" btw) has been around for a bit.

Yes, I want the software used for GNU ELPA to stabilize first and then
propose it.

> Meanwhile check out emacsmirror and el-get, they are interesting and go
> in a different direction from nXhtml's package management and from the
> ELPA model, and there's plenty of overlap.
>
> Ted



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters)
  2011-04-26  3:09                                                                                               ` markchars.el 0.2.0 and idn.el (was: face for non-ASCII characters) Ted Zlatanov
@ 2011-04-26  8:13                                                                                                 ` Lennart Borgman
  2011-04-26 15:28                                                                                                   ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-04-26  8:13 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/4/26 Ted Zlatanov <tzz@lifelogs.com>:
> On Fri, 22 Apr 2011 09:20:00 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:
>
>>>> Sure, but I'd rather collaborate if I can.  The easiest thing (just keep
>>>> markchars.el in the GNU ELPA) is not the best thing for the users.
> ...
> TZ> OK, I'll mirror it.  I don't expect it to become a problem.
>
> When markchars.el is up to version 0.2 (I try to use semantic
> versioning, so that became 0.2.0) in the nXhtml Launchpad repository,
> let me know and I'll set up mirroring.  Until then I've checked my
> version 0.2.0 into the GNU ELPA Bazaar repository.  I did not deploy it
> yet.

Ok.

> Chong or Stefan, can you please look at markchars.el (and epoch-view.el,
> which I also checked in tonight) and let me know if I need to fix
> anything before deploying?
>
> idn.el depends on nXhtml:
>
> (require 'nxhtml-base)
> (defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir))

Ah, I forgot that dependency. I will have a look at it. Do you have a
better proposal for this?

> so I don't think it can be mirrored yet.  Let me know when and if that
> changes and how you want to do it.  With the GNU ELPA we can make idn.el
> a package tarball that also has the confusables.txt file.

Thanks. I can't see any copyright problem with confusables.txt, but I
am not sure. It would be good to have this checked.

> Thanks
> Ted



^ permalink raw reply	[flat|nested] 182+ messages in thread

* idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el)
  2011-04-26  8:13                                                                                                 ` Lennart Borgman
@ 2011-04-26 15:28                                                                                                   ` Ted Zlatanov
  2011-05-13 19:42                                                                                                     ` idn.el and confusables.txt Stefan Monnier
  2011-05-31 10:42                                                                                                     ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-26 15:28 UTC (permalink / raw)
  To: emacs-devel

On Tue, 26 Apr 2011 10:13:34 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/26 Ted Zlatanov <tzz@lifelogs.com>:
>> idn.el depends on nXhtml:
>> 
>> (require 'nxhtml-base)
>> (defvar uts39-datadir (expand-file-name "etc/uts39/" nxhtml-install-dir))

LB> Ah, I forgot that dependency. I will have a look at it. Do you have a
LB> better proposal for this?

I would look at the Unicode copyrights on confusables.txt and figure out
if they need to be considered.  Since Emacs already includes many
transformed Unicode files, I imagine it's OK, but don't know for sure.

Once we figure that out, I will bundle idn.el with the confusables.txt
and make it a tarball-style ELPA package.  
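
A minimal sketch of how a tarball-style package could find its bundled
data without nXhtml, assuming the data directory sits next to the
library file.  The variable name `my-idn-datadir' and the "uts39/"
layout are illustrative assumptions, not what idn.el currently does:

```elisp
;; Sketch only: `my-idn-datadir' is a hypothetical name, and the
;; "uts39/" subdirectory layout is an assumption about the tarball.
(defvar my-idn-datadir
  (expand-file-name
   "uts39/"
   ;; Directory of the file being loaded; fall back to the visited
   ;; file or the current directory when evaluated interactively.
   (file-name-directory
    (or load-file-name buffer-file-name default-directory)))
  "Directory expected to hold confusables.txt, next to this library.")
```

This removes the (require 'nxhtml-base) dependency because the path is
computed from the library's own location.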

IMHO idn.el and confusables.txt should go into the Emacs trunk so they
can generate first-class character properties for `C-u x ='.  Stefan,
Chong, what do you think?

Thanks
Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-25 17:54                                                                                                     ` Richard Stallman
@ 2011-04-26 18:26                                                                                                       ` Chong Yidong
  2011-04-26 19:05                                                                                                         ` Ted Zlatanov
  2011-04-27 12:41                                                                                                         ` Lennart Borgman
  0 siblings, 2 replies; 182+ messages in thread
From: Chong Yidong @ 2011-04-26 18:26 UTC (permalink / raw)
  To: rms; +Cc: stephen, Lennart Borgman, tzz, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     After that I changed it so the user can review and decide at each
>     opportunistic install.
>
> This is not a binary question, it's a scale.  You want to take a step
> towards the wrong end of the scale.  We are just introducing the
> separate package archive, which is a big step.  Now is too soon to
> consider another step.  The risk here is hard to calculate, so we must
> be cautious.

For what it's worth, I don't think general-case opportunistic
installation is a good idea, even from a strictly technical standpoint.

One can imagine special instances where one might wish to prompt the
user, at a very specific point during the execution of a command, to
install a package.  In fact, this can be done easily using the existing
package.el infrastructure (or even without package.el; this is Lisp,
after all).
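
A minimal sketch of such a prompt, assuming package.el is loaded and
initialized.  The helper name `my-require-or-offer-install' is
hypothetical; the package.el calls used are `package-refresh-contents'
and `package-install':

```elisp
;; Sketch only: `my-require-or-offer-install' is a hypothetical helper,
;; not part of package.el.
(require 'package)

(defun my-require-or-offer-install (feature package)
  "Load FEATURE, offering to install PACKAGE if it is missing.
Return non-nil if FEATURE ends up loaded."
  (or (require feature nil t)           ; already available: no prompt
      (and (y-or-n-p
            (format "Feature `%s' needs package `%s'; install it? "
                    feature package))
           (progn
             (package-refresh-contents) ; fetch the archive listings
             (package-install package)  ; PACKAGE is a symbol
             (require feature nil t)))))
```

A command would call this at the one point where the feature is needed
and fall back gracefully when it returns nil.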

But such program behavior should not be encouraged in general.  In the
vast majority of foreseeable Emacs-relevant situations, opportunistic
installation is a false optimization.  It introduces more moving parts
into a program and obnoxiously makes the user wait for a download in the
middle of a command---all for the purpose of saving trivial amounts of
disk space.  In practically all instances, package dependencies are the
proper way to go.

So the functionality that Lennart is promoting is not something that I'm
interested in at all.

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-26 18:26                                                                                                       ` Chong Yidong
@ 2011-04-26 19:05                                                                                                         ` Ted Zlatanov
  2011-04-26 20:29                                                                                                           ` Chong Yidong
  2011-04-27 12:41                                                                                                         ` Lennart Borgman
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-26 19:05 UTC (permalink / raw)
  To: emacs-devel

On Tue, 26 Apr 2011 14:26:13 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: 

CY> Richard Stallman <rms@gnu.org> writes:
>> After that I changed it so the user can review and decide at each
>> opportunistic install.
>> 
>> This is not a binary question, it's a scale.  You want to take a step
>> towards the wrong end of the scale.  We are just introducing the
>> separate package archive, which is a big step.  Now is too soon to
>> consider another step.  The risk here is hard to calculate, so we must
>> be cautious.

CY> For what it's worth, I don't think general-case opportunistic
CY> installation is a good idea, even from a strictly technical standpoint.

CY> One can imagine special instances where one might wish to prompt the
CY> user, at a very specific point during the execution of a command, to
CY> install a package.  In fact, this can be done easily using the existing
CY> package.el infrastructure (or even without package.el; this is Lisp,
CY> after all).

OTOH it's very nice to be told "hey, you could have X if you installed
package Y."  I think it's good to know about these opportunities at
customization time.

For example, markchars.el has optional behavior if idn.el is installed.
It would be nice if the customization for `markchars-what' connected the
'markchars-nonidn-fun option with the idn.el package, so that 1) the
user knows there's a connection, and 2) the user can choose to enable
'markchars-nonidn-fun and *at that time* must install idn.el as well.

This could be done in customization code like this:

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
  "Things to mark, a list of regular expressions or symbols."
  :type `(repeat (choice :tag "Marking choices"
                         (const
                          :required-feature idn ; NEW Custom tag
                          :tag "Non IDN chars (Unicode.org tr39 suggestions)"
                          markchars-nonidn-fun)
                         (const :tag "Confusables" markchars-confusables)
                         (const :tag "`markchars-simple-pattern'"
                                markchars-simple-pattern)
                         (regexp :tag "Arbitrary pattern")))
  :group 'markchars)

Note the awkwardness of the default, which could also be remedied by
this new tag.

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-26 19:05                                                                                                         ` Ted Zlatanov
@ 2011-04-26 20:29                                                                                                           ` Chong Yidong
  2011-04-27  3:45                                                                                                             ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Chong Yidong @ 2011-04-26 20:29 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> OTOH it's very nice to be told "hey, you could have X if you installed
> package Y."  I think it's good to know about these opportunities at
> customization time.

That's more similar to "recommended packages" in OS-level packaging.
I don't object, in principle, to adding functionality along those lines.

But the markchars.el and idn.el situation actually points to an
important danger.  IIUC, the straightforward solution would be to make
idn.el a dependency, but that's been made difficult by the tangle of
copyright and other problems in nXhtml.  So this kind of functionality
might tend to encourage end-runs around mainline Emacs ("come use my
version, and all this extra functionality is magically turned on!").

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: package management proposals for Emacs (was: face for non-ASCII characters)
  2011-04-26  8:10                                                                                                   ` Lennart Borgman
@ 2011-04-26 21:46                                                                                                     ` Richard Stallman
  2011-04-27  1:19                                                                                                       ` package management proposals for Emacs Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Richard Stallman @ 2011-04-26 21:46 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: tzz, emacs-devel

    > OK, you can propose it later when the GNU ELPA (note the difference
    > between "ELPA" and "GNU ELPA" btw) has been around for a bit.

Can't we choose some other name for our package archive
that doesn't cause confusion with another one?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: package management proposals for Emacs
  2011-04-26 21:46                                                                                                     ` Richard Stallman
@ 2011-04-27  1:19                                                                                                       ` Stefan Monnier
  2011-04-27  3:36                                                                                                         ` Ted Zlatanov
  2011-04-27 21:14                                                                                                         ` Richard Stallman
  0 siblings, 2 replies; 182+ messages in thread
From: Stefan Monnier @ 2011-04-27  1:19 UTC (permalink / raw)
  To: rms; +Cc: tzz, Lennart Borgman, emacs-devel

>> OK, you can propose it later when the GNU ELPA (note the difference
>> between "ELPA" and "GNU ELPA" btw) has been around for a bit.
> Can't we choose some other name for our package archive
> that doesn't cause confusion with another one?

Ours is an ELPA, it's just not the same one as the one originally set up
by the original author of package.el.  So what we need is to avoid
using "ELPA" without specifying which one (GNU's, Tom's, or some other one).


        Stefan



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: package management proposals for Emacs
  2011-04-27  1:19                                                                                                       ` package management proposals for Emacs Stefan Monnier
@ 2011-04-27  3:36                                                                                                         ` Ted Zlatanov
  2011-04-27 21:14                                                                                                         ` Richard Stallman
  1 sibling, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-27  3:36 UTC (permalink / raw)
  To: emacs-devel

On Tue, 26 Apr 2011 22:19:35 -0300 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>>> OK, you can propose it later when the GNU ELPA (note the difference
>>> between "ELPA" and "GNU ELPA" btw) has been around for a bit.
>> Can't we choose some other name for our package archive
>> that doesn't cause confusion with another one?

SM> Ours is an ELPA, it's just not the same one as the one originally set up
SM> by the original author of package.el.  So what we need is to avoid
SM> using "ELPA" without specifying which one (GNU's, Tom's, or some other one).

It adds to the confusion that Tom's ELPA calls itself "the ELPA."

I try to be consistent in the use of "GNU ELPA," at least, but we
already had a discussion about this.

Ted




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-26 20:29                                                                                                           ` Chong Yidong
@ 2011-04-27  3:45                                                                                                             ` Ted Zlatanov
  2011-04-27  4:42                                                                                                               ` Stephen J. Turnbull
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-04-27  3:45 UTC (permalink / raw)
  To: emacs-devel

On Tue, 26 Apr 2011 16:29:39 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: 

CY> Ted Zlatanov <tzz@lifelogs.com> writes:
>> OTOH it's very nice to be told "hey, you could have X if you installed
>> package Y."  I think it's good to know about these opportunities at
>> customization time.

CY> That's more similar to "recommended packages" in OS-level packaging.
CY> I don't object, in principle, to adding functionality along those lines.

CY> But the markchars.el and idn.el situation actually points to an
CY> important danger.  IIUC, the straightforward solution would be to make
CY> idn.el a dependency, but that's been made difficult by the tangle of
CY> copyright and other problems in nXhtml.  So this kind of functionality
CY> might tend to encourage end-runs around mainline Emacs ("come use my
CY> version, and all this extra functionality is magically turned on!").

We could enforce the rule that you're only allowed to recommend other
packages that live within the same ELPA or something even more carefully
controlled (a fsf-kevin-bacon-number equal or greater than the current
one, if you know what I mean).  So the Emacs core can recommend anything
else in the core.  The GNU ELPA can recommend things in the GNU ELPA or
the Emacs core.  Tom Tromey's ELPA can recommend things in itself or the
GNU ELPA or the Emacs core...
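
A sketch of that rule, following the three examples above: a source may
recommend anything at its own distance from the core or closer.  The
names `my-archive-distance' and `my-may-recommend-p' and the archive
symbols are hypothetical:

```elisp
;; Sketch only: all names here are hypothetical.  Distance 0 is the
;; Emacs core; larger numbers are further from it.
(defvar my-archive-distance
  '((emacs-core . 0)
    (gnu-elpa . 1)
    (tromey-elpa . 2))
  "Alist mapping an archive symbol to its distance from the core.")

(defun my-may-recommend-p (from to)
  "Return non-nil if a package in archive FROM may recommend one in TO.
Unknown archives are never allowed on either side."
  (let ((from-rank (cdr (assq from my-archive-distance)))
        (to-rank (cdr (assq to my-archive-distance))))
    (and from-rank to-rank (<= to-rank from-rank))))
```

For instance, (my-may-recommend-p 'gnu-elpa 'emacs-core) is non-nil,
while (my-may-recommend-p 'emacs-core 'gnu-elpa) is nil.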

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-27  3:45                                                                                                             ` Ted Zlatanov
@ 2011-04-27  4:42                                                                                                               ` Stephen J. Turnbull
  2011-05-02 18:18                                                                                                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Stephen J. Turnbull @ 2011-04-27  4:42 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov writes:
 > On Tue, 26 Apr 2011 16:29:39 -0400 Chong Yidong <cyd@stupidchicken.com> wrote: 

 > CY> So this kind of functionality might tend to encourage end-runs
 > CY> around mainline Emacs ("come use my version, and all this extra
 > CY> functionality is magically turned on!").

That ship sailed when Richard released Emacs under a free license.
Cf. the Mac versions, or the 15-year history of Mule before it was
included in the mainline, or Epoch.  Experience shows that most users
return to the mainline.

 > We could enforce the rule that you're only allowed to recommend other
 > packages that live within the same ELPA or something even more carefully
 > controlled (a fsf-kevin-bacon-number equal or greater than the current
 > one, if you know what I mean).

You can't (and shouldn't try to) enforce anything.  If somebody finds
the restrictions onerous, they'll fork the package library (or all of
Emacs, whichever seems more convenient at the time).  If others find
the restrictions onerous, they'll install the forked version.  That's
what free software means.

Eventually there will be a "google this package and install it for me"
functionality.  I think a better way to go is to add a way to filter
on that functionality such that the user can white- or blacklist
certain URLs, and the package manager will query for URLs not matched
by either list.  Default the whitelist to (<GNU ELPA>) and the
blacklist to t (ie, everything not in the whitelist).
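
A minimal sketch of such a filter, with the defaults described above.
The variable and function names are hypothetical, and the lists are
assumed to hold regexps matched against the archive URL:

```elisp
;; Sketch only: these names are hypothetical, not part of package.el.
(require 'cl-lib)

(defvar my-archive-whitelist '("\\`https?://elpa\\.gnu\\.org/")
  "Regexps matching archive URLs that are always allowed.")

(defvar my-archive-blacklist t
  "Regexps matching rejected URLs, or t to reject everything else.")

(defun my-archive--matches-p (url patterns)
  "Return non-nil if URL matches any regexp in PATTERNS."
  (cl-some (lambda (re) (string-match-p re url)) patterns))

(defun my-archive-url-policy (url)
  "Return `allow', `deny', or `ask' for URL."
  (cond ((my-archive--matches-p url my-archive-whitelist) 'allow)
        ((eq my-archive-blacklist t) 'deny)
        ((my-archive--matches-p url my-archive-blacklist) 'deny)
        (t 'ask)))                      ; matched by neither list: query
```

With the defaults, only GNU ELPA URLs are allowed and nothing is ever
queried; a user who sets the blacklist to a list of regexps gets the
`ask' case for URLs on neither list.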

Cf. the Debian free, contrib, non-free hierarchy, and Gentoo's
allowed-licenses facility.

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-26 18:26                                                                                                       ` Chong Yidong
  2011-04-26 19:05                                                                                                         ` Ted Zlatanov
@ 2011-04-27 12:41                                                                                                         ` Lennart Borgman
  1 sibling, 0 replies; 182+ messages in thread
From: Lennart Borgman @ 2011-04-27 12:41 UTC (permalink / raw)
  To: Chong Yidong; +Cc: tzz, stephen, rms, emacs-devel

On Tue, Apr 26, 2011 at 8:26 PM, Chong Yidong <cyd@stupidchicken.com> wrote:
> Richard Stallman <rms@gnu.org> writes:
>
>>     After that I changed it so the user can review and decide at each
>>     opportunistic install.
>>
>> This is not a binary question, it's a scale.  You want to take a step
>> towards the wrong end of the scale.  We are just introducing the
>> separate package archive, which is a big step.  Now is too soon to
>> consider another step.  The risk here is hard to calculate, so we must
>> be cautious.
...
>
> But such program behavior should not be encouraged in general.  In the
> vast majority of foreseeable Emacs-relevant situations, opportunistic
> installation is a false optimization.  It introduces more moving parts
> into a program and obnoxiously makes the user wait for a download in the
> middle of a command---all for the purpose of saving trivial amounts of
> disk space.  In practically all instances, package dependencies are the
> proper way to go.

I think this contains some misunderstandings. Opportunistic install is
not (mainly) for saving disk space. And package dependencies only is
in my opinion too rough.

The main advantage of opportunistic install is easier structure. It is
basically the same thing as (require ...), but from the network if
needed.

Using only package dependencies makes it hard to "announce" new
possibilities. In the case of nXhtml I have tried to announce
possibilities by placing them in menus. If only package dependencies
were used, that would mean installing the menus for nXhtml would
install all of nXhtml.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: package management proposals for Emacs
  2011-04-27  1:19                                                                                                       ` package management proposals for Emacs Stefan Monnier
  2011-04-27  3:36                                                                                                         ` Ted Zlatanov
@ 2011-04-27 21:14                                                                                                         ` Richard Stallman
  1 sibling, 0 replies; 182+ messages in thread
From: Richard Stallman @ 2011-04-27 21:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tzz, lennart.borgman, emacs-devel

    Ours is an ELPA, it's just not the same one as the one originally set up
    by the original author of package.el.  So what we need is to avoid
    using "ELPA" without specifying which one (GNU's, Tom's, or some other one).

Sure, but that's a mistake waiting to happen.

Why not call it the GNU Emacs Package Archive, GEPA?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org, www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-04-27  4:42                                                                                                               ` Stephen J. Turnbull
@ 2011-05-02 18:18                                                                                                                 ` Ted Zlatanov
  2011-05-03  1:50                                                                                                                   ` Stephen J. Turnbull
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-02 18:18 UTC (permalink / raw)
  To: emacs-devel

On Wed, 27 Apr 2011 13:42:28 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: 

SJT> Ted Zlatanov writes:

>> We could enforce the rule that you're only allowed to recommend other
>> packages that live within the same ELPA or something even more carefully
>> controlled (a fsf-kevin-bacon-number equal or greater than the current
>> one, if you know what I mean).

SJT> You can't (and shouldn't try to) enforce anything.

Within GNU Emacs and the GNU ELPA, many things can be (and are) enforced.
Obviously I'm not speaking about things outside that sphere.

SJT> If somebody finds the restrictions onerous, they'll fork the
SJT> package library (or all of Emacs, whichever seems more convenient
SJT> at the time).  If others find the restrictions onerous, they'll
SJT> install the forked version.  That's what free software means.

That's even more bizarre, defining free software by someone's actions.

SJT> Eventually there will be a "google this package and install it for me"
SJT> functionality.  I think a better way to go is to add a way to filter
SJT> on that functionality such that the user can white- or blacklist
SJT> certain URLs, and the package manager will query for URLs not matched
SJT> by either list.  Default the whitelist to (<GNU ELPA>) and the
SJT> blacklist to t (ie, everything not in the whitelist).

This is what package.el does, minus the Google dependency and using
symbolic package names instead of URLs.  The user just adds whatever
archive URLs he likes to the list.
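
For example, a sketch of adding one archive to `package-archives', an
alist of (ID . LOCATION) pairs.  The archive name and URL below are
placeholders, not a real archive:

```elisp
;; Sketch only: "example" and its URL are placeholders.
(require 'package)

(add-to-list 'package-archives
             '("example" . "https://elpa.example.org/packages/")
             t)                         ; append after existing entries
```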

Ted

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-05-02 18:18                                                                                                                 ` Ted Zlatanov
@ 2011-05-03  1:50                                                                                                                   ` Stephen J. Turnbull
  2011-05-03 14:45                                                                                                                     ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Stephen J. Turnbull @ 2011-05-03  1:50 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

Ted Zlatanov writes:

 > SJT> You can't (and shouldn't try to) enforce anything.
 > 
 > Within GNU Emacs and the GNU ELPA, many things can be (and are) enforced.
 > Obviously I'm not speaking about things outside that sphere.

Including a passel of copyleft licenses, whose justification for
existence is reaching outside that sphere; it is not obvious what you
are talking about until you say it.

 > That's even more bizarre, defining free software by someone's
 > actions.

*sigh*  There is no other way to define it than by what people are
permitted to do, ie, their actions.  What is permitted will be done,
by somebody somewhere sometime.




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-05-03  1:50                                                                                                                   ` Stephen J. Turnbull
@ 2011-05-03 14:45                                                                                                                     ` Ted Zlatanov
  2011-05-03 21:21                                                                                                                       ` Lennart Borgman
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-03 14:45 UTC (permalink / raw)
  To: emacs-devel

On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: 

SJT> What is permitted will be done, by somebody somewhere sometime.

I hereby permit the ignition of the Horsehead Nebula in my honor.

(It's worth a try, and would improve the Milky Way real estate values.)

Ted




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-05-03 14:45                                                                                                                     ` Ted Zlatanov
@ 2011-05-03 21:21                                                                                                                       ` Lennart Borgman
  2011-05-04 14:41                                                                                                                         ` Stephen J. Turnbull
  0 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-05-03 21:21 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

2011/5/3 Ted Zlatanov <tzz@lifelogs.com>:
> On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
>
> SJT> What is permitted will be done, by somebody somewhere sometime.
>
> I hereby permit the ignition of the Horsehead Nebula in my honor.
>
> (It's worth a try, and would improve the Milky Way real estate values.)

So now the experimental verification of the rule has started. We just
have to wait and see. If the ignition never ever happens we are done.
C U



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: face for non-ASCII characters
  2011-05-03 21:21                                                                                                                       ` Lennart Borgman
@ 2011-05-04 14:41                                                                                                                         ` Stephen J. Turnbull
  0 siblings, 0 replies; 182+ messages in thread
From: Stephen J. Turnbull @ 2011-05-04 14:41 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ted Zlatanov, emacs-devel

Lennart Borgman writes:
 > 2011/5/3 Ted Zlatanov <tzz@lifelogs.com>:
 > > On Tue, 03 May 2011 10:50:46 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
 > >
 > > SJT> What is permitted will be done, by somebody somewhere sometime.
 > >
 > > I hereby permit the ignition of the Horsehead Nebula in my honor.
 > >
 > > (It's worth a try, and would improve the Milky Way real estate values.)
 > 
 > So now the experimental verification of the rule has started.

Ted doesn't have the authority to permit that.  You wait in vain.




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-04-26 15:28                                                                                                   ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov
@ 2011-05-13 19:42                                                                                                     ` Stefan Monnier
  2011-05-13 20:19                                                                                                       ` Ted Zlatanov
  2011-05-14  8:06                                                                                                       ` Eli Zaretskii
  2011-05-31 10:42                                                                                                     ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov
  1 sibling, 2 replies; 182+ messages in thread
From: Stefan Monnier @ 2011-05-13 19:42 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> IMHO idn.el and confusables.txt should go into the Emacs trunk so they
> can generate first-class character properties for `C-u C-x ='.  Stefan,
> Chong, what do you think?

I don't know enough about the way we handle Unicode tables to know.
It does sound like confusables.txt could be turned into
a lisp/international/uni-confusables.el, but I don't know whether there
is a large benefit from having it part of Emacs as opposed to having it
in GNU ELPA.  As for idn.el, I haven't seen the file, and don't know
what uses it, so I can't judge.


        Stefan




* Re: idn.el and confusables.txt
  2011-05-13 19:42                                                                                                     ` idn.el and confusables.txt Stefan Monnier
@ 2011-05-13 20:19                                                                                                       ` Ted Zlatanov
  2011-05-14  8:13                                                                                                         ` Eli Zaretskii
  2011-05-14  8:06                                                                                                       ` Eli Zaretskii
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-13 20:19 UTC (permalink / raw)
  To: emacs-devel

On Fri, 13 May 2011 16:42:57 -0300 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> IMHO idn.el and confusables.txt should go into the Emacs trunk so they
>> can generate first-class character properties for `C-u C-x ='.  Stefan,
>> Chong, what do you think?

SM> I don't know enough about the way we handle Unicode tables to know.
SM> It does sound like confusables.txt could be turned into
SM> a lisp/international/uni-confusables.el, but I don't know whether there
SM> is a large benefit from having it part of Emacs as opposed to having it
SM> in GNU ELPA.  As for idn.el, I haven't seen the file, and don't know
SM> what uses it, so I can't judge.

Looking at lisp/international/uni-uppercase.el for example, it seems
possible to generate something similar from confusables.txt that would
live in the GNU ELPA and can be loaded by other packages.  I don't see
anything that requires the code to live in Emacs.  Ditto for idn.el.
But it would be nice if we could reuse the admin/unidata infrastructure
to generate uni-confusables.el and uni-idn.el.  Could that be connected
with the GNU ELPA somehow?

If Juanma Barranquero or Kenichi Handa (two recent committers to that
area) could comment, that would be great.

Ted


* Re: idn.el and confusables.txt
  2011-05-13 19:42                                                                                                     ` idn.el and confusables.txt Stefan Monnier
  2011-05-13 20:19                                                                                                       ` Ted Zlatanov
@ 2011-05-14  8:06                                                                                                       ` Eli Zaretskii
  2011-05-14  8:56                                                                                                         ` Lennart Borgman
                                                                                                                           ` (2 more replies)
  1 sibling, 3 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14  8:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tzz, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 13 May 2011 16:42:57 -0300
> Cc: emacs-devel@gnu.org
> 
> > IMHO idn.el and confusables.txt should go into the Emacs trunk so they
> > can generate first-class character properties for `C-u C-x ='.  Stefan,
> > Chong, what do you think?
> 
> I don't know enough about the way we handle Unicode tables to know.

We create char-tables from them.  But I'm not sure I understand the
question, so maybe my answer is not helpful.

> It does sound like confusables.txt could be turned into
> a lisp/international/uni-confusables.el, but I don't know whether there
> is a large benefit from having it part of Emacs as opposed to having it
> in GNU ELPA.  As for idn.el, I haven't seen the file, and don't know
> what uses it, so I can't judge.

What is idn.el?  Where can I see it?  And how and where would we like
to use it?  I searched the relevant threads (which were all spin-offs
of other threads, which didn't help searching for the info), but
didn't find any pointers.  Apologies if I missed something.

You see, the uni-*.el files we create out of the Unicode DB are not
used anywhere in application code, AFAIK.  We use them to display
character properties in the likes of "C-u C-x =", and that's it.  I'm
not even sure they are organized in a way that makes them useful.
E.g., when I needed to use the Unicode bidirectional properties for
bidi reordering, I eventually was forced to create my own tables (see
src/biditype.h and src/bidimirror.h, and the corresponding Awk scripts
in admin/unidata/) which lend themselves well to using them in
real-life code.

So I'd really like to avoid introducing yet another huge table whose
only effects are to show one more property in "C-u C-x =" and bloat
the ELisp manual some more.

Can we please have some preliminary ideas and design for using the
"confusables" information and the IDNA protocol in Emacs, before we
decide whether and how to include them?


* Re: idn.el and confusables.txt
  2011-05-13 20:19                                                                                                       ` Ted Zlatanov
@ 2011-05-14  8:13                                                                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14  8:13 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Fri, 13 May 2011 15:19:34 -0500
> 
> But it would be nice if we could reuse the admin/unidata infrastructure
> to generate uni-confusables.el and uni-idn.el.

AFAIK, admin/unidata/unidata-gen.el can only parse the format of the
UnicodeData.txt file.  confusables.txt is in a different format, so I
don't think you can reuse unidata-gen.el for that.




* Re: idn.el and confusables.txt
  2011-05-14  8:06                                                                                                       ` Eli Zaretskii
@ 2011-05-14  8:56                                                                                                         ` Lennart Borgman
  2011-05-14  9:36                                                                                                           ` Eli Zaretskii
  2011-05-14 13:40                                                                                                         ` Ted Zlatanov
  2011-05-15 13:06                                                                                                         ` Kenichi Handa
  2 siblings, 1 reply; 182+ messages in thread
From: Lennart Borgman @ 2011-05-14  8:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tzz, Stefan Monnier, emacs-devel

On Sat, May 14, 2011 at 10:06 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
> What is idn.el?  Where can I see it?

It is currently part of nXhtml:
http://bazaar.launchpad.net/~nxhtml/nxhtml/main/files/head:/util/

> And how and where would we like
> to use it?  I searched the relevant threads (which were all spin-offs
> of other threads, which didn't help searching for the info), but
> didn't find any pointers.  Apologies if I missed something.

It is used by markchars.el. (Which I have not yet updated I think.)




* Re: idn.el and confusables.txt
  2011-05-14  8:56                                                                                                         ` Lennart Borgman
@ 2011-05-14  9:36                                                                                                           ` Eli Zaretskii
  0 siblings, 0 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14  9:36 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: tzz, monnier, emacs-devel

> From: Lennart Borgman <lennart.borgman@gmail.com>
> Date: Sat, 14 May 2011 10:56:33 +0200
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, tzz@lifelogs.com, emacs-devel@gnu.org
> 
> On Sat, May 14, 2011 at 10:06 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > What is idn.el?  Where can I see it?
> 
> It is currently part of nXhtml:
> http://bazaar.launchpad.net/~nxhtml/nxhtml/main/files/head:/util/

Thanks.

> > And how and where would we like
> > to use it?  I searched the relevant threads (which were all spin-offs
> > of other threads, which didn't help searching for the info), but
> > didn't find any pointers.  Apologies if I missed something.
> 
> It is used by markchars.el. (Which I have not yet updated I think.)

So we are discussing addition of a feature that is only used by an
unbundled package, and then only to highlight certain characters, is
that right?





* Re: idn.el and confusables.txt
  2011-05-14  8:06                                                                                                       ` Eli Zaretskii
  2011-05-14  8:56                                                                                                         ` Lennart Borgman
@ 2011-05-14 13:40                                                                                                         ` Ted Zlatanov
  2011-05-14 14:38                                                                                                           ` Eli Zaretskii
  2011-05-15 13:06                                                                                                         ` Kenichi Handa
  2 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-14 13:40 UTC (permalink / raw)
  To: emacs-devel

On Sat, 14 May 2011 11:06:52 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> It does sound like confusables.txt could be turned into
>> a lisp/international/uni-confusables.el, but I don't know whether there
>> is a large benefit from having it part of Emacs as opposed to having it
>> in GNU ELPA.  As for idn.el, I haven't seen the file, and don't know
>> what uses it, so I can't judge.

EZ> What is idn.el?  Where can I see it?  And how and where would we like
EZ> to use it?  I searched the relevant threads (which were all spin-offs
EZ> of other threads, which didn't help searching for the info), but
EZ> didn't find any pointers.  Apologies if I missed something.

Both idn.el and confusables.txt can be used by markchars.el to show
suspicious characters, especially in URLs but also in other situations
(e.g. accidentally putting a Cyrillic о instead of the Latin o in e-mail
text).

The latest markchars.el is in the GNU ELPA, though the primary location
may continue to be nXhtml.

EZ> You see, the uni-*.el files we create out of the Unicode DB are not
EZ> used anywhere in application code, AFAIK.  We use them to display
EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
EZ> not even sure they are organized in a way that makes them useful.

markchars.el could use other Unicode properties if people ask.  But
specifically regarding the ones I'm proposing for inclusion, since we've
started using the GNU ELPA more and markchars.el lives in it, we can put
uni-confusables.el and uni-idn.el in the GNU ELPA instead of the Emacs
trunk.

EZ> So I'd really like to avoid introducing yet another huge table whose
EZ> only effects are to show one more property in "C-u C-x =" and bloat
EZ> the ELisp manual some more.

IMO it's not a huge table, and it should not bloat the manual
significantly if it were in the trunk.  There is useful extra information
for each character (the characters it can be confused with), which would
grow the char-table if it were included.  Also the char-table doesn't have
to cover the Asian confusables--I'm not sure anyone would need those.  So
there's some vagueness as far as the memory usage goes.

EZ> Can we please have some preliminary ideas and design for using the
EZ> "confusables" information and the IDNA protocol in Emacs, before we
EZ> decide whether and how to include them?

We've had literally hundreds of messages on this over the last year,
unfortunately spread over many threads, as you pointed out.  My best
attempt to propose their usage would be as supplements to markchars.el;
I can't think of other uses currently.

EZ> So we are discussing addition of a feature that is only used by an
EZ> unbundled package, and then only to highlight certain characters, is
EZ> that right?

uni-confusables.el and uni-idn.el will define Unicode properties and not
a feature per se.  Thus they are really reference tables and not
functional changes to Emacs.  I point this out because the burden of
including them in the Emacs trunk is not large: some memory usage,
keeping the Unicode data updated, and the conversion scripts.

On Sat, 14 May 2011 11:13:46 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> AFAIK, admin/unidata/unidata-gen.el can only parse the format of the
EZ> UnicodeData.txt file.  confusables.txt is in a different format, so I
EZ> don't think you can reuse unidata-gen.el for that.

OK, I'll set a converter up to live in the trunk or in the GNU ELPA when
the maintainers decide where uni-confusables.el and uni-idn.el should be.

Ted


* Re: idn.el and confusables.txt
  2011-05-14 13:40                                                                                                         ` Ted Zlatanov
@ 2011-05-14 14:38                                                                                                           ` Eli Zaretskii
  2011-05-14 15:30                                                                                                             ` Ted Zlatanov
  2011-05-14 17:25                                                                                                             ` Stefan Monnier
  0 siblings, 2 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14 14:38 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sat, 14 May 2011 08:40:48 -0500
> 
> EZ> You see, the uni-*.el files we create out of the Unicode DB are not
> EZ> used anywhere in application code, AFAIK.  We use them to display
> EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
> EZ> not even sure they are organized in a way that makes them useful.
> 
> markchars.el could use other Unicode properties if people ask.

I'm talking about the details.  The way we currently set the tables in
uni-*.el is that many of the values are symbols.  For example:

  (get-char-code-property ?1 'general-category) => Nd
  (get-char-code-property ?א 'bidi-class) => R
  (get-char-code-property ?\( 'mirrored) => Y

The `Nd', `R', and `Y' are symbols.

Now, suppose you wanted to use these values in some code that needs to
be fast -- how would you feel about having to write multi-branch
`cond' forms to compare the value against all the possibilities?

For bidi reordering, which runs in the innermost loop of the display
engine, using the `bidi-class' or `mirrored' properties that are
symbols would be prohibitively expensive.
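
To make the cost concrete, here is a minimal sketch of the kind of
dispatch described above.  Only `get-char-code-property' and the
`bidi-class' property come from this message; the function name
`my-bidi-strength' and the strong/weak/neutral grouping are
illustrative assumptions, not code from the display engine.

```elisp
;; Sketch only: `my-bidi-strength' is a hypothetical helper, not part of Emacs.
;; It shows how symbol-valued properties force a chain of comparisons.
(defun my-bidi-strength (ch)
  "Classify character CH by its Unicode bidi class (illustration only)."
  (let ((class (get-char-code-property ch 'bidi-class)))
    (cond
     ;; Each branch compares the symbol against a list of candidates.
     ((memq class '(L R AL)) 'strong)
     ((memq class '(EN ES ET AN CS NSM BN)) 'weak)
     ((memq class '(B S WS ON)) 'neutral)
     ;; Explicit embedding/override/isolate controls end up here.
     (t 'other))))
```

Every call walks the `cond' branches in order, which is harmless for
occasional queries but adds up inside an inner loop.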

For now, with markchars.el, all you need is a boolean value for each
character.  However, in other use cases, some other Lisp code will
want the paired character.  Yet another application will want to
compare characters such that confusable pairs will compare equal.  Can
a single table satisfy all these needs efficiently?  Maybe it can, but
we need to design that table carefully.

> But specifically regarding the ones I'm proposing for inclusion,
> since we've started using the GNU ELPA more and markchars.el lives
> in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA
> instead of the Emacs trunk.

I'm not arguing about where to put them.  I'm saying that for such
basic infrastructure, we should consider the possible uses before we
rush into implementation.  Otherwise, we will again repeat the same
mistake, whose result is that the only real user of bidirectional
properties cannot use uni-bidi.el!

> EZ> So I'd really like to avoid introducing yet another huge table whose
> EZ> only effects are to show one more property in "C-u C-x =" and bloat
> EZ> the ELisp manual some more.
> 
> IMO it's not a huge table

??? It's a char-table that can be indexed by any character supported
by Emacs.  Even if you count only the characters mentioned in
confusables.txt, there are 20 thousand of them.  char-tables are
memory-efficient, but their footprint is not negligible.

The bloat may be insignificant by comparison, but if the _only_ useful
effect is the bloat, why should we do that?

> Also the char-table doesn't have to
> cover the Asian confusables--I'm not sure anyone would need those.

Well, the Unicode consortium definitely thought they were needed.
Either we follow established standards, or we don't.


* Re: idn.el and confusables.txt
  2011-05-14 14:38                                                                                                           ` Eli Zaretskii
@ 2011-05-14 15:30                                                                                                             ` Ted Zlatanov
  2011-05-14 16:42                                                                                                               ` Eli Zaretskii
  2011-05-14 17:25                                                                                                             ` Stefan Monnier
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-14 15:30 UTC (permalink / raw)
  To: emacs-devel

On Sat, 14 May 2011 17:38:11 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Sat, 14 May 2011 08:40:48 -0500
>> 
EZ> You see, the uni-*.el files we create out of the Unicode DB are not
EZ> used anywhere in application code, AFAIK.  We use them to display
EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
EZ> not even sure they are organized in a way that makes them useful.
>> 
>> markchars.el could use other Unicode properties if people ask.

EZ> I'm talking about the details.  The way we currently set the tables in
EZ> uni-*.el is that many of the values are symbols.  For example:

EZ>   (get-char-code-property ?1 'general-category) => Nd
EZ>   (get-char-code-property ?א 'bidi-class) => R
EZ>   (get-char-code-property ?\( 'mirrored) => Y

EZ> The `Nd', `R', and `Y' are symbols.

EZ> Now, suppose you wanted to use these values in some code that needs to
EZ> be fast -- how would you feel about having to write multi-branch
EZ> `cond' forms to compare the value against all the possibilities?

It wouldn't be ideal, surely, but most glyphs are not confusable so the
lookup would fail.  I might write some of it in C if performance was an
issue, or try to inline the conditions with macros, or cache the
lookups.  But I don't know if markchars.el needs to be terribly fast.
It runs at the font-lock level and IIUC that's opportunistic and not
time-critical like the display code.  For instance, unmodified text is
not rechecked, right?

EZ> For now, with markchars.el, all you need is a boolean value for each
EZ> character.  However, in other use cases, some other Lisp code will
EZ> want the paired character.  Yet another application will want to
EZ> compare characters such that confusable pairs will compare equal.  Can
EZ> a single table satisfy all these needs efficiently?  Maybe it can, but
EZ> we need to design that table carefully.

Two char-tables would be enough: one small table for the confusable ->
target mapping, and one even smaller for the reverse target ->
(confusable list) mapping.  The reverse lookup table could be stored in
an extra slot of the primary lookup table.
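
A rough sketch of that layout, assuming hypothetical names
(`my-confusable-map' and its accessors are not existing Emacs
symbols) and a single sample entry, Cyrillic U+0430 mapped to Latin
`a':

```elisp
;; Sketch only: the subtype name, variable, and accessors are assumptions.
;; A char-table subtype must declare how many extra slots it has.
(put 'my-confusable-map 'char-table-extra-slots 1)

(defvar my-confusable-map
  (let ((forward (make-char-table 'my-confusable-map))
        (backward (make-char-table 'my-confusable-reverse)))
    ;; confusable -> target: U+0430 CYRILLIC SMALL LETTER A -> `a'
    (aset forward #x0430 ?a)
    ;; target -> list of confusables
    (aset backward ?a (list #x0430))
    ;; Keep the reverse table in extra slot 0 of the primary table.
    (set-char-table-extra-slot forward 0 backward)
    forward)
  "Primary confusable table; its extra slot 0 holds the reverse table.")

(defun my-confusable-target (ch)
  "Return the character that CH can be confused with, or nil."
  (aref my-confusable-map ch))

(defun my-confusable-sources (ch)
  "Return the list of characters that can be confused with CH, or nil."
  (aref (char-table-extra-slot my-confusable-map 0) ch))
```

Characters with no entry simply return nil from either accessor.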

markchars.el could use this mapping to show more information than just
underlining the characters.  A tooltip could show why the glyph is
confusable, for instance.

>> Also the char-table doesn't have to
>> cover the Asian confusables--I'm not sure anyone would need those.

EZ> Well, the Unicode consortium definitely thought they were needed.
EZ> Either we follow established standards, or we don't.

You're right.  Also there are Asian characters that could be confused
for Latin characters so it's not safe to exclude them.

Ted


* Re: idn.el and confusables.txt
  2011-05-14 15:30                                                                                                             ` Ted Zlatanov
@ 2011-05-14 16:42                                                                                                               ` Eli Zaretskii
  2011-05-14 17:06                                                                                                                 ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14 16:42 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sat, 14 May 2011 10:30:37 -0500
> 
> It wouldn't be ideal, surely, but most glyphs are not confusable so the
> lookup would fail.

For some value of "most": there are 20K entries in confusables.txt.

> I might write some of it in C if performance was an issue

C won't help, if you need to access the same char-table and compare
with half a dozen possible symbols.

> or try to inline the conditions with macros, or cache the lookups.

Isn't it better to design the table for efficient use to begin with?

> But I don't know if markchars.el needs to be terribly fast.

I hope we are not introducing another character property for a single
use.  Some use, some day might need to do it fast.

> It runs at the font-lock level and IIUC that's opportunistic and not
> time-critical like the display code.  For instance, unmodified text is
> not rechecked, right?

No, you cannot count on that.  E.g., fontification-functions are
always called with a region that starts at the beginning of a line,
even if part of that line is already fontified.

> Two char-tables would be enough: one small table for the confusable ->
> target mapping, and one even smaller for the reverse target ->
> (confusable list) mapping.  The reverse lookup table could be stored in
> an extra slot of the primary lookup table.

Doesn't confusables.txt include both mappings already?  If so, you
don't need the reverse table.




* Re: idn.el and confusables.txt
  2011-05-14 16:42                                                                                                               ` Eli Zaretskii
@ 2011-05-14 17:06                                                                                                                 ` Ted Zlatanov
  2011-05-14 20:59                                                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-14 17:06 UTC (permalink / raw)
  To: emacs-devel

On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> Isn't it better to design the table for efficient use to begin with?

Yes, and I ask you and the other experts on char-tables to help with
that design.  I am far from an expert on that topic.

>> But I don't know if markchars.el needs to be terribly fast.

EZ> I hope we are not introducing another character property for a
EZ> single use.  Some use, some day might need to do it fast.

This is premature optimization.  I only have a single use in hand.
Let's make sure markchars.el is fast and we can optimize for other uses
when they are needed.

>> Two char-tables would be enough: one small table for the confusable ->
>> target mapping, and one even smaller for the reverse target ->
>> (confusable list) mapping.  The reverse lookup table could be stored in
>> an extra slot of the primary lookup table.

EZ> Doesn't confusables.txt include both mappings already?  If so, you
EZ> don't need the reverse table.

I thought the lookups would be faster with a reverse mapping in one of
the scenarios you listed (looking up all the characters that might be
confused with a given one).  But I realized it doesn't need to be.
Let's say C1, C2, and C3 are confusables mapped to C1.  Then the mapping
is C1 -> (C2, C3); C2 -> C1; and C3 -> C1.

The algorithm is "if a character maps to an atom it's confusable with
it, if it maps to a list the whole list is confusable with this
character."  So to find all the confusables mapped to a character you
need at most two lookups.
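
As a minimal sketch of that rule, with `x', `y' and `z' standing in
for C1, C2 and C3 (placeholder characters, not real confusables.txt
data) and hypothetical names throughout:

```elisp
;; Sketch only: the table contents and names are placeholders.
(defvar my-demo-confusables
  (let ((table (make-char-table 'my-demo-confusables)))
    (aset table ?x (list ?y ?z))   ; C1 -> (C2 C3)
    (aset table ?y ?x)             ; C2 -> C1
    (aset table ?z ?x)             ; C3 -> C1
    table)
  "Demo table: a target maps to a list, a confusable maps to its target.")

(defun my-demo-confusable-set (ch)
  "Return CH's whole confusable class as a list, or nil if CH has none."
  (let ((value (aref my-demo-confusables ch)))        ; first lookup
    (cond
     ((null value) nil)
     ;; CH is the target: its value already lists the confusables.
     ((consp value) (cons ch value))
     ;; CH is a confusable: look up its target to get the full list.
     (t (cons value (aref my-demo-confusables value)))))) ; second lookup
```

This only covers single-character targets; a multi-character target
would need a different representation.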

In addition to the character mapping we also need a confusable data
type, which can be SL/SA (single-script) or ML/MA (mixed-script).  I
don't know where to store that.  Maybe we can just have two char-tables
for the two data types.  There aren't going to be more data types
AFAIK.  But markchars.el can definitely use the knowledge that the
confusable is within a single script or not.

Does all of that make sense?

Ted


* Re: idn.el and confusables.txt
  2011-05-14 14:38                                                                                                           ` Eli Zaretskii
  2011-05-14 15:30                                                                                                             ` Ted Zlatanov
@ 2011-05-14 17:25                                                                                                             ` Stefan Monnier
  1 sibling, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2011-05-14 17:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ted Zlatanov, emacs-devel

> I'm not arguing about where to put them.  I'm saying that for such
> basic infrastructure, we should consider the possible uses before we
> rush into implementation.  Otherwise, we will again repeat the same

I recommend we keep it in the GNU ELPA for now, so it can more easily be
changed in the future.


        Stefan




* Re: idn.el and confusables.txt
  2011-05-14 17:06                                                                                                                 ` Ted Zlatanov
@ 2011-05-14 20:59                                                                                                                   ` Eli Zaretskii
  2011-05-15  1:22                                                                                                                     ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-14 20:59 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sat, 14 May 2011 12:06:04 -0500
> 
> On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 
> 
> EZ> Isn't it better to design the table for efficient use to begin with?
> 
> Yes, and I ask you and the other experts on char-tables to help with
> that design.  I am far from an expert on that topic.

I don't think char-table expertise will help here.  The main issue is
what uses of this data we envision.

> >> But I don't know if markchars.el needs to be terribly fast.
> 
> EZ> I hope we are not introducing another character property for a
> EZ> single use.  Some use, some day might need to do it fast.
> 
> This is premature optimization.

I call it foresight ;-)

> I only have a single use in hand.
> Let's make sure markchars.el is fast and we can optimize for other uses
> when they are needed.

No, let's think what uses we envision, and make the data structure
convenient for those uses.

> Let's say C1, C2, and C3 are confusables mapped to C1.  Then the mapping
> is C1 -> (C2, C3); C2 -> C1; and C3 -> C1.
> 
> The algorithm is "if a character maps to an atom it's confusable with
> it, if it maps to a list the whole list is confusable with this
> character."

Should it be a list or a string?  How would you use this mapping?

> So to find all the confusables mapped to a character you
> need at most two lookups.

The RHS of a mapping can be several characters, in which case there's
no reverse mapping and no "confusables mapped to a character", I think.

> In addition to the character mapping we also need a confusable data
> type, which can be SL/SA (single-script) or ML/MA (mixed-script).

What would be a possible use of that?




* Re: idn.el and confusables.txt
  2011-05-14 20:59                                                                                                                   ` Eli Zaretskii
@ 2011-05-15  1:22                                                                                                                     ` Ted Zlatanov
  2011-05-15  5:56                                                                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-15  1:22 UTC (permalink / raw)
  To: emacs-devel

On Sat, 14 May 2011 23:59:22 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> Let's say C1, C2, and C3 are confusables mapped to C1.  Then the mapping
>> is C1 -> (C2, C3); C2 -> C1; and C3 -> C1.
>> 
>> The algorithm is "if a character maps to an atom it's confusable with
>> it, if it maps to a list the whole list is confusable with this
>> character."

EZ> Should it be a list or a string?  How would you use this mapping?

It could be any type of sequence, I guess.  Strings are more compact but
for small amounts of data (typically 1-3 characters) I'm not sure if
that matters.  For 1 character in particular I'm pretty sure it's more
efficient to store the character directly than any sequence.

markchars.el would use it as follows: look at all the characters of a
word.  If any are of a different script S2 from the majority script S1,
highlight them (we do this now with `markchars-face-confusable').

New functionality: now if any of the S2 characters are multi-script
confusables that map to a character in the majority script S1, highlight
them specially with the new variable
`markchars-face-confusable-multi-script' and give them a tooltip to say
they are confusable with a particular character.

New functionality: if any of the word characters, regardless of script,
are confusables of the single-script type, highlight them with
`markchars-face-confusable'.  But see below about normalization.
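
The majority-script check in the first step could look roughly like
this.  It is a sketch, not the markchars.el implementation: the
function name is hypothetical, and it relies only on the built-in
`char-script-table' to find each character's script.

```elisp
;; Sketch only: `my-minority-script-chars' is a hypothetical helper.
(defun my-minority-script-chars (word)
  "Return the characters of WORD whose script is not the majority script."
  (let ((counts nil)
        (majority nil))
    ;; Count how many characters of WORD belong to each script.
    (dolist (ch (string-to-list word))
      (let* ((script (aref char-script-table ch))
             (cell (assq script counts)))
        (if cell
            (setcdr cell (1+ (cdr cell)))
          (push (cons script 1) counts))))
    ;; Pick the script with the highest count as the majority (S1).
    (dolist (cell counts)
      (when (or (null majority) (> (cdr cell) (cdr majority)))
        (setq majority cell)))
    ;; Collect every character whose script differs from S1.
    (delq nil
          (mapcar (lambda (ch)
                    (unless (eq (aref char-script-table ch) (car majority))
                      ch))
                  (string-to-list word)))))
```

For example, a word written in Latin letters with one Cyrillic U+0430
in it yields a one-element list containing that character, which is
the candidate for highlighting.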

EZ> The RHS of a mapping can be several characters, in which case there's
EZ> no reverse mapping and no "confusables mapped to a character", I
EZ> think.

OK.  I was thinking of using the transitivity information but that's not
very useful so never mind.

>> In addition to the character mapping we also need a confusable data
>> type, which can be SL/SA (single-script) or ML/MA (mixed-script).

EZ> What would be a possible use of that?

Single-script confusables can be an accident and are usually due to
combining, e.g. parenthesized numbers:

2485 ;	0028 006C 0038 0029 ;	SL	#* ( ⒅ → (l8) ) PARENTHESIZED NUMBER EIGHTEEN → LEFT PARENTHESIS, LATIN SMALL LETTER L, DIGIT EIGHT, RIGHT PARENTHESIS	# →(18)→

...although there are many cases where that's not true:

0399 ;	0031 ;	SA	# ( Ι → 1 ) GREEK CAPITAL LETTER IOTA → DIGIT ONE	# →l→
0417 ;	0033 ;	SA	# ( З → 3 ) CYRILLIC CAPITAL LETTER ZE → DIGIT THREE	# 

As a general rule I'd say that if the mapping is to a single character
with the SL/SA single-script property, chances are it's a true
confusable.  Otherwise it could be legitimate and we'd need to convert
the string to a normalized form, which is probably slow (do you know?)

Mixed-script confusables are more dangerous because they look exactly
like the other character and are less likely to be an accident, e.g.
FF01 ;	0021 ;	ML	#* ( ！ → ! ) FULLWIDTH EXCLAMATION MARK → EXCLAMATION MARK	# →ǃ→
0430 ;	0061 ;	ML	# ( а → a ) CYRILLIC SMALL LETTER A → LATIN SMALL LETTER A	# 

so I would make them more noticeable and would skip any normalization.
Thus my new functionality proposals above.

There are also whole-script confusables, e.g. "scope" in Latin and
"scope" in Cyrillic (example from http://unicode.org/reports/tr39/) but
I think those are covered by the rules above already and don't merit
special treatment.

Finally, confusables.txt has transitivity mappings that explain how the
mapping was derived.  I don't think that's particularly useful for
markchars.el.  I can't think of any other uses for the confusables.txt
data beyond those listed above.

Based on all this, I think it's best to make the confusables char-table
values atoms or sequences (strings or lists) but split them into two
char-tables for the single-script and multi-script mappings.
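
As a rough sketch of that split, assuming the three-field line layout
shown in the examples above (source ; target sequence ; type, followed
by a "#" comment), the data could be read into two tables like this.
The code is Python for illustration only; the function names are made
up, plain dicts stand in for char-tables, and details of the real file
such as a leading byte-order mark are not handled.

```python
SINGLE_SCRIPT = {"SL", "SA"}
MULTI_SCRIPT = {"ML", "MA"}


def parse_confusables_line(line):
    """Return (source char, target string, type) or None for non-data lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) < 3:
        return None
    source = chr(int(fields[0], 16))
    # The target may be several code points; join them into one string.
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target, fields[2]


def build_tables(lines):
    """Split the mappings into single-script and multi-script tables."""
    single, multi = {}, {}
    for line in lines:
        parsed = parse_confusables_line(line)
        if parsed is None:
            continue
        source, target, kind = parsed
        if kind in SINGLE_SCRIPT:
            single[source] = target
        elif kind in MULTI_SCRIPT:
            multi[source] = target
    return single, multi
```

Fed the PARENTHESIZED NUMBER EIGHTEEN line above, the single-script
table would map U+2485 to the four-character string "(l8)".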

Ted


* Re: idn.el and confusables.txt
  2011-05-15  1:22                                                                                                                     ` Ted Zlatanov
@ 2011-05-15  5:56                                                                                                                       ` Eli Zaretskii
  2011-05-15 12:14                                                                                                                         ` Ted Zlatanov
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-15  5:56 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sat, 14 May 2011 20:22:44 -0500
> 
> EZ> Should it be a list or a string?  How would you use this mapping?
> 
> It could be any type of sequence, I guess.  Strings are more compact but
> for small amounts of data (typically 1-3 characters) I'm not sure if
> that matters.  For 1 character in particular I'm pretty sure it's more
> efficient to store the character directly than any sequence.
> 
> markchars.el would use it as follows: look at all the characters of a
> word.  If any are of a different script S2 from the majority script S1,
> highlight them (we do this now with `markchars-face-confusable').
> 
> New functionality: now if any of the S2 characters are multi-script
> confusables that map to a character in the majority script S1, highlight
> them specially with the new variable
> `markchars-face-confusable-multi-script' and give them a tooltip to say
> they are confusable with a particular character.
> 
> New functionality: if any of the word characters, regardless of script,
> are confusables of the single-script type, highlight them with
> `markchars-face-confusable'.  But see below about normalization.

These all examine portions of a buffer ("words") for being a match to
some string or regexp.  So I think having strings in the char-table
will be more convenient, because you could then use looking-at,
string=, string-match, etc.

> As a general rule I'd say that if the mapping is to a single character
> with the SL/SA single-script property, chances are it's a true
> confusable.  Otherwise it could be legitimate and we'd need to convert
> the string to a normalized form, which is probably slow (do you know?)

What do you mean by "normalized form"?

> Based on all this, I think it's best to make the confusables char-table
> values atoms or sequences (strings or lists) but split them into two
> char-tables for the single-script and multi-script mappings.

If we were to implement the full IDNA protocol, would the above be
enough?  Or will we need additional information?




* Re: idn.el and confusables.txt
  2011-05-15  5:56                                                                                                                       ` Eli Zaretskii
@ 2011-05-15 12:14                                                                                                                         ` Ted Zlatanov
  2011-05-16 12:38                                                                                                                           ` Eli Zaretskii
  2011-05-17 15:32                                                                                                                           ` Ted Zlatanov
  0 siblings, 2 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-15 12:14 UTC (permalink / raw)
  To: emacs-devel

On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <eliz@gnu.org> wrote: 

EZ> These all examine portions of a buffer ("words") for being a match to
EZ> some string or regexp.  So I think having strings in the char-table
EZ> will be more convenient, because you could then use looking-at,
EZ> string=, string-match, etc.

Oh, good point.  OK, strings it is.  I'll write the converter.

>> As a general rule I'd say that if the mapping is to a single character
>> with the SL/SA single-script property, chances are it's a true
>> confusable.  Otherwise it could be legitimate and we'd need to convert
>> the string to a normalized form, which is probably slow (do you know?)

EZ> What do you mean by "normalized form"?

Unicode has a normalization algorithm to see if two strings are
informationally the same regardless of the combining characters and
other sequences within.  But thinking about it, even if normalization
says they're the same, it's still a potential problem for the user, so
we can skip normalization and always mark those.
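
To make "normalized form" concrete, here is a small sketch using
Python's standard unicodedata module (not Emacs code; the helper name
`same_after_normalization' is made up for the example).  NFC composes
a base letter and a combining mark into one character, and NFKC
additionally replaces compatibility characters such as the
parenthesized numbers.

```python
import unicodedata


def same_after_normalization(a, b, form="NFC"):
    """True if the two strings are equal once both are normalized."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# LATIN CAPITAL LETTER A followed by COMBINING ACUTE ACCENT
decomposed = "A\u0301"
# LATIN CAPITAL LETTER A WITH ACUTE as a single code point
precomposed = "\u00c1"

canonical_match = same_after_normalization(decomposed, precomposed)

# PARENTHESIZED NUMBER EIGHTEEN under compatibility normalization
compat = unicodedata.normalize("NFKC", "\u2485")
```

Here the two spellings of the accented letter compare equal after
NFC, although they differ as raw strings.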

>> Based on all this, I think it's best to make the confusables char-table
>> values atoms or sequences (strings or lists) but split them into two
>> char-tables for the single-script and multi-script mappings.

EZ> If we were to implement the full IDNA protocol, would the above be
EZ> enough?  Or will we need additional information?

Oh, all this has been for confusables (TR39) only.  IDNA and uni-idn.el
will have their own needs!  IIUC, Lennart used IDNA only as a character
set in markchars.el (I didn't write that functionality and he maintains
idn.el), but there are more security issues with it we may need to
handle.

IDNA is better described in http://unicode.org/reports/tr46/ and the
links at the end of that document (a whole bunch of RFCs).  I'm not
interested in implementing the IDNA code beyond supporting the current
character set detection because I don't think IDNA is popular enough,
but maybe Lennart and others want to do it.

For further possible markchars.el functionality, take a look at
http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
It talks about the confusables issues, IDNA issues, and bidi issues
among others.  It's a really good explanation of what security-related
functionality is needed from the confusables char-table and potentially
other places in Emacs.

Ted


* Re: idn.el and confusables.txt
  2011-05-14  8:06                                                                                                       ` Eli Zaretskii
  2011-05-14  8:56                                                                                                         ` Lennart Borgman
  2011-05-14 13:40                                                                                                         ` Ted Zlatanov
@ 2011-05-15 13:06                                                                                                         ` Kenichi Handa
  2011-05-15 17:34                                                                                                           ` Eli Zaretskii
  2 siblings, 1 reply; 182+ messages in thread
From: Kenichi Handa @ 2011-05-15 13:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tzz, monnier, emacs-devel

In article <83iptdg0yr.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> You see, the uni-*.el files we create out of the Unicode DB are not
> used anywhere in application code, AFAIK.  We use them to display
> character properties in the likes of "C-u C-x =", and that's it.

composite.el uses `general-category' and `canonical-combining-class'.
ucs-normalize.el uses `decomposition' and `canonical-combining-class'.
mule-cmds.el uses `name' and `old-name' for read-char-by-name.

> I'm
> not even sure they are organized in a way that makes them useful.
> E.g., when I needed to use the Unicode bidirectional properties for
> bidi reordering, I eventually was forced to create my own tables (see
> src/biditype.h and src/bidimirror.h, and the corresponding Awk scripts
> in admin/unidata/) which lend themselves well to using them in
> real-life code.

Why did you have to create another table?  Was it because
get-char-code-property is defined by Lisp and not efficient
to call from C?

---
Kenichi Handa
handa@m17n.org




* Re: idn.el and confusables.txt
  2011-05-15 13:06                                                                                                         ` Kenichi Handa
@ 2011-05-15 17:34                                                                                                           ` Eli Zaretskii
  2011-05-18  5:23                                                                                                             ` handa
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-15 17:34 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: monnier@iro.umontreal.ca, tzz@lifelogs.com, emacs-devel@gnu.org
> Date: Sun, 15 May 2011 22:06:23 +0900
> 
> In article <83iptdg0yr.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > You see, the uni-*.el files we create out of the Unicode DB are not
> > used anywhere in application code, AFAIK.  We use them to display
> > character properties in the likes of "C-u C-x =", and that's it.
> 
> composite.el uses `general-category' and `canonical-combining-class'.
> ucs-normalize.el uses `decomposition' and `canonical-combining-class'.
> mule-cmds.el uses `name' and `old-name' for read-char-by-name.

Are functions defined by ucs-normalize.el used anywhere?

> Why did you have to create another table?  Was it because
> get-char-code-property is defined by Lisp and not efficient
> to call from C?

Yes, calling a Lisp function (one that calls `load' at that!) in the
lowest level of display engine was out of the question.  But there
were several other reasons as well:

  . get-char-code-property returns a property list in which bidi types
    are recorded as symbols, while I needed them as small numeric
    values of a C enumerated type (see bidi_type_t), to fit in a small
    number of bits in `struct glyph'.

  . The data structures manipulated by get-char-code-property include
    complications (e.g., a function in the extra slot) for which I
    could find no documentation, so I couldn't figure out whether it
    would be possible to replace get-char-code-property by a simple
    call to CHAR_TABLE_REF.

  . Even if I could use CHAR_TABLE_REF, the additional call to
    plist-get means more overhead.  bidi_get_type, the function which
    needs to look up the bidirectional type of an arbitrary character,
    runs in the innermost loop of the display engine, and is called at
    least once (sometimes more) for every character in the displayed
    portion of the buffer, so it must be very efficient.

  . For bidi-mirrored property, the data in the `mirrored' property
    recorded by uni-mirrored.el is simply inadequate: the value is a
    boolean (albeit in a form of symbols `Y' and `N').  What I needed
    was for each character its mirrored character, if there is one;
    this data was simply not available in uni-mirrored.el.  The
    corresponding function bidi_mirror_char is also called for a large
    percentage of displayed characters, and must be efficient.

It was extremely frustrating to have all that data at my fingertips
and not be able to use it for the purposes of bidi.c, which at first
seems like a first-class client of Unicode DB.  What I wanted was
something similar to C ctype macros in simplicity and efficiency, but
nothing quite like that was available.  A char-table comes close, but
it must be a simple table with numerical values -- and that is what
bidi.c currently uses, leaving uni-bidi.el unused.
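
The two shapes of table described above can be illustrated with a
short sketch.  It is Python rather than C, it is not the bidi.c code,
and the names `build_bidi_table' and `parse_mirroring' are invented
for the example.  The integer codes are assigned in order of first
appearance and are not the values of bidi_type_t.  The mirroring lines
are assumed to have the "source; target # comment" layout.

```python
import unicodedata


def build_bidi_table(limit):
    """Flat byte table: code point -> small integer bidi class code."""
    codes = {}                      # class symbol -> small integer
    table = bytearray(limit)
    for cp in range(limit):
        symbol = unicodedata.bidirectional(chr(cp))
        table[cp] = codes.setdefault(symbol, len(codes))
    return bytes(table), codes


def parse_mirroring(lines):
    """Map each character to its mirrored character."""
    mirror = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the comment
        if not data:
            continue
        source, target = [field.strip() for field in data.split(";")[:2]]
        mirror[chr(int(source, 16))] = chr(int(target, 16))
    return mirror
```

A lookup is then a plain index into the byte table, with no property
list involved.  The mirror table answers "which character", whereas a
boolean property (Python's unicodedata.mirrored is one as well) only
answers "whether".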




* Re: idn.el and confusables.txt
  2011-05-15 12:14                                                                                                                         ` Ted Zlatanov
@ 2011-05-16 12:38                                                                                                                           ` Eli Zaretskii
  2011-05-16 18:31                                                                                                                             ` Ted Zlatanov
  2011-05-17 15:32                                                                                                                           ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-16 12:38 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sun, 15 May 2011 07:14:47 -0500
> 
> EZ> If we were to implement the full IDNA protocol, would the above be
> EZ> enough?  Or will we need additional information?
> 
> Oh, all this has been for confusables (TR39) only.  IDNA and uni-idn.el
> will have their own needs!

Granted, but I was asking whether IDNA implementation will need
anything from confusables.txt beyond what we discussed.  That is, will
a mapping of a character to a string be all IDNA needs to use the
information in confusables.txt?

> IDNA is better described in http://unicode.org/reports/tr46/ and the
> links at the end of that document (a whole bunch of RFCs).  I'm not
> interested in implementing the IDNA code beyond supporting the current
> character set detection because I don't think IDNA is popular enough,
> but maybe Lennart and others want to do it.
> 
> For further possible markchars.el functionality, take a look at
> http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
> It talks about the confusables issues, IDNA issues, and bidi issues
> among others.  It's a really good explanation of what security-related
> functionality is needed from the confusables char-table and potentially
> other places in Emacs.

Yes, I'm familiar with these specs, but I didn't try to design an
implementation of IDNA, so the answers to the above questions are not
clear to me.  If someone can explain how an IDNA implementation would
use confusables.txt, we could make sure that the tables we produce
from it will be good for IDNA as well, if and when someone will want
to implement it.




* Re: idn.el and confusables.txt
  2011-05-16 12:38                                                                                                                           ` Eli Zaretskii
@ 2011-05-16 18:31                                                                                                                             ` Ted Zlatanov
  2011-05-17 17:59                                                                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-16 18:31 UTC (permalink / raw)
  To: emacs-devel

On Mon, 16 May 2011 15:38:38 +0300 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> Date: Sun, 15 May 2011 07:14:47 -0500
>> 
EZ> If we were to implement the full IDNA protocol, would the above be
EZ> enough?  Or will we need additional information?
>> 
>> Oh, all this has been for confusables (TR39) only.  IDNA and uni-idn.el
>> will have their own needs!

EZ> Granted, but I was asking whether IDNA implementation will need
EZ> anything from confusables.txt beyond what we discussed.  That is, will
EZ> a mapping of a character to a string be all IDNA needs to use the
EZ> information in confusables.txt?

I can't think of any other uses, so yes.  Lennart may want to comment.

>> IDNA is better described in http://unicode.org/reports/tr46/ and the
>> links at the end of that document (a whole bunch of RFCs).  I'm not
>> interested in implementing the IDNA code beyond supporting the current
>> character set detection because I don't think IDNA is popular enough,
>> but maybe Lennart and others want to do it.
>> 
>> For further possible markchars.el functionality, take a look at
>> http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
>> It talks about the confusables issues, IDNA issues, and bidi issues
>> among others.  It's a really good explanation of what security-related
>> functionality is needed from the confusables char-table and potentially
>> other places in Emacs.

EZ> Yes, I'm familiar with these specs, but I didn't try to design an
EZ> implementation of IDNA, so the answers to the above questions are not
EZ> clear to me.  If someone can explain how would an IDNA implementation
EZ> use confusables.txt, we could make sure that the tables we produce
EZ> from it will be good for IDNA as well, if and when someone will want
EZ> to implement it.

I may be misunderstanding, but I think you're sort of implying that IDNA
(in domain names and URLs) is the main use for confusables.txt.  I think
file names, shell interactions, and general text (especially source
code) are also important uses of that data because confusables in those
settings can introduce security issues.  Sorry if this is clear to you
and everyone else.

Ted


* Re: idn.el and confusables.txt
  2011-05-15 12:14                                                                                                                         ` Ted Zlatanov
  2011-05-16 12:38                                                                                                                           ` Eli Zaretskii
@ 2011-05-17 15:32                                                                                                                           ` Ted Zlatanov
  2011-05-18 18:15                                                                                                                             ` Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-17 15:32 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

On Sun, 15 May 2011 07:14:47 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <eliz@gnu.org> wrote: 
EZ> These all examine portions of a buffer ("words") for being a match to
EZ> some string or regexp.  So I think having strings in the char-table
EZ> will be more convenient, because you could then use looking-at,
EZ> string=, string-match, etc.

TZ> Oh, good point.  OK, strings it is.  I'll write the converter.

Here's the converter.  It reads the confusables.txt file and generates a
char-table with strings as values.  I'll package the converter and the
resulting uni-confusables.el library and put them on the GNU ELPA.

Could you tell me the best way to write uni-confusables.el?  In what
format should I provide the char-tables in the ELisp code?  The uni-*
files in Emacs use `define-char-code-property' because they deal with a
single property, so should the properties here be
'confusables-single-script and 'confusables-multiple-script?

Thanks
Ted

[-- Attachment #2: gen-confusables.el --]
[-- Type: application/emacs-lisp, Size: 2498 bytes --]


* Re: idn.el and confusables.txt
  2011-05-16 18:31                                                                                                                             ` Ted Zlatanov
@ 2011-05-17 17:59                                                                                                                               ` Eli Zaretskii
  0 siblings, 0 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-17 17:59 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Mon, 16 May 2011 13:31:44 -0500
> 
> >> For further possible markchars.el functionality, take a look at
> >> http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
> >> It talks about the confusables issues, IDNA issues, and bidi issues
> >> among others.  It's a really good explanation of what security-related
> >> functionality is needed from the confusables char-table and potentially
> >> other places in Emacs.
> 
> EZ> Yes, I'm familiar with these specs, but I didn't try to design an
> EZ> implementation of IDNA, so the answers to the above questions are not
> EZ> clear to me.  If someone can explain how would an IDNA implementation
> EZ> use confusables.txt, we could make sure that the tables we produce
> EZ> from it will be good for IDNA as well, if and when someone will want
> EZ> to implement it.
> 
> I may be misunderstanding, but I think you're sort of implying that IDNA
> (in domain names and URLs) is the main use for confusables.txt.

No, I wasn't implying that.  But it's certainly one possible use.




* Re: idn.el and confusables.txt
  2011-05-15 17:34                                                                                                           ` Eli Zaretskii
@ 2011-05-18  5:23                                                                                                             ` handa
  2011-05-18  7:38                                                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: handa @ 2011-05-18  5:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tzz, monnier, emacs-devel

In article <83wrhreukg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Are functions defined by ucs-normalize.el used anywhere?

What do you mean by "anywhere"?  That file provides a coding
system utf-8-hfs that uses functions in that file.  And I
don't know whether a specific coding system is used or not
except for what I use.  In addition, I have a plan of using
the facility of normalization to display, for instance, the
sequence 'A' '́' by 'Á' when a font doesn't support OTF GPOS
features.

> > Why did you have to create another table?  Was it because
> > get-char-code-property is defined by Lisp and not efficient
> > to call from C?

> Yes, calling a Lisp function (one that calls `load' at that!) in the
> lowest level of display engine was out of the question.

Why?  If that loading is just once, what's the problem?  For
automatic composition, we are already doing that.

> But there were several other reasons as well:

>   . get-char-code-property returns a property list in which bidi types
>     are recorded as symbols, while I needed them as small numeric
>     values of a C enumerated type (see bidi_type_t), to fit in a small
>     number of bits in `struct glyph'.

>   . The data structures manipulated by get-char-code-property include
>     complications (e.g., a function in the extra slot) for which I
>     could find no documentation, so I couldn't figure out whether it
>     would be possible to replace get-char-code-property by a simple
>     call to CHAR_TABLE_REF.

>   . Even if I could use CHAR_TABLE_REF, the additional call to
>     plist-get means more overhead.  bidi_get_type, the function which
>     needs to look up the bidirectional type of an arbitrary character,
>     runs in the innermost loop of the display engine, and is called at
>     least once (sometimes more) for every character in the displayed
>     portion of the buffer, so it must be very efficient.

>   . For bidi-mirrored property, the data in the `mirrored' property
>     recorded by uni-mirrored.el is simply inadequate: the value is a
>     boolean (albeit in a form of symbols `Y' and `N').  What I needed
>     was for each character its mirrored character, if there is one;
>     this data was simply not available in uni-mirrored.el.  The
>     corresponding function bidi_mirror_char is also called for a large
>     percentage of displayed characters, and must be efficient.

> It was extremely frustrating to have all that data at my fingertips
> and not be able to use it for the purposes of bidi.c, which at first
> seems like a first-class client of Unicode DB.  What I wanted was
> something similar to C ctype macros in simplicity and efficiency, but
> nothing quite like that was available.  A char-table comes close, but
> it must be a simple table with numerical values -- and that is what
> bidi.c currently uses, leaving uni-bidi.el unused.

I understand your situation, and it's unfortunate that you
didn't consult me to improve the situation.  I think it's
not difficult to write C code that makes the uni-* char-tables
easier to access from C without running Lisp.

As for the value type of symbol vs integer in uni-bidi.el,
it's possible to write code that exposes the integer value to
C, and the corresponding symbol to Lisp.

As for the uni-mirrored.el, I think it's ok to change the
value type to what you want (character or nil).

---
Kenichi Handa
handa@m17n.org




* Re: idn.el and confusables.txt
  2011-05-18  5:23                                                                                                             ` handa
@ 2011-05-18  7:38                                                                                                               ` Eli Zaretskii
  2011-05-18  7:59                                                                                                                 ` handa
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-18  7:38 UTC (permalink / raw)
  To: handa; +Cc: monnier, emacs-devel

> From: handa <handa@m17n.org>
> Cc: monnier@iro.umontreal.ca, tzz@lifelogs.com, emacs-devel@gnu.org
> Date: Wed, 18 May 2011 14:23:26 +0900
> 
> > > Why did you have to create another table?  Was it because
> > > get-char-code-property is defined by Lisp and not efficient
> > > to call from C?
> 
> > Yes, calling a Lisp function (one that calls `load' at that!) in the
> > lowest level of display engine was out of the question.
> 
> Why?  If that loading is just once, what's the problem?  For
> automatic composition, we are already doing that.

When I started working on bidirectional support, I didn't know that
(as well as many other intimate details of the display engine I know
now).  Loading and calling Lisp code in the middle of redisplay buffer
iteration surely sounded scary.  I also didn't know then whether we
will enable the bidi reordering by default, or just when and where
needed.  The delay of loading is significant, and I wanted to avoid
that.

Maybe these all were just shadows of a dwarf, I don't know.  I guess
the real reason is that I felt alone in this adventure (I still do),
and needed to get things done as quickly and as simply as possible,
and in a way that I understood well enough to design and write
reliable code.  The uni-* files and the corresponding properties were
not documented back then, except in short doc strings, so it was hard
for me to know whether changing them to what bidi.c needed was
feasible.  With the little time I had to work on that, and the job
being so huge, I couldn't afford spending too much time on research
and discussions.

> I understand your situation, and it's unfortunate that you
> didn't consult me to improve the situation.  I think it's
> not difficult to write a C code that makes uni-* char-tables
> more easily be accessed from C without running Lisp.

It's not too late: Emacs 24 is not yet released, and won't be for some
time.  If uni-bidi can be changed to suit the requirements of bidi.c,
I will gladly change bidi.c to use it and throw away the private table
it uses now.  TIA.

> As for the value type of symbol vs integer in uni-bidi.el,
> it's possible to write a code to expose integer value to
> C, and the corresponding symbol to Lisp.

That would be great, thanks.

> As for the uni-mirrored.el, I think it's ok to change the
> value type to what you want (character or nil).

AFAIK, this needs a change in how uni-mirrored.el is generated,
because UnicodeData.txt doesn't include the necessary information.  So
admin/unidata/unidata-gen.el's way of gleaning the info from
UnicodeData.txt is not appropriate for such a change.

Maybe we should simply expose to Lisp the table built from
BidiMirroring.txt, which bidi.c already creates and uses.


* Re: idn.el and confusables.txt
  2011-05-18  7:38                                                                                                               ` Eli Zaretskii
@ 2011-05-18  7:59                                                                                                                 ` handa
  2011-05-18  8:13                                                                                                                   ` Eli Zaretskii
  2011-06-17  8:15                                                                                                                   ` Kenichi Handa
  0 siblings, 2 replies; 182+ messages in thread
From: handa @ 2011-05-18  7:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

In article <E1QMbKx-0003tU-B3@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> AFAIK, this needs a change in how uni-mirrored.el is generated,
> because UnicodeData.txt doesn't include the necessary information.  So
> admin/unidata/unidata-gen.el's way of gleaning the info from
> UnicodeData.txt is not appropriate for such a change.

> Maybe we should simply expose to Lisp the table built from
> BidiMirroring.txt, which bidi.c already creates and uses.

Now there are several *.txt files that can't be derived from
UnicodeData.txt, and I want to provide Emacs the facility to
utilize all such Unicode data.  For that, I want to improve
unidata-gen.el to handle those data files.

But, at first, I'll work on improving C interface to
uni-*.el.

---
Kenichi Handa
handa@m17n.org




* Re: idn.el and confusables.txt
  2011-05-18  7:59                                                                                                                 ` handa
@ 2011-05-18  8:13                                                                                                                   ` Eli Zaretskii
  2011-06-17  8:15                                                                                                                   ` Kenichi Handa
  1 sibling, 0 replies; 182+ messages in thread
From: Eli Zaretskii @ 2011-05-18  8:13 UTC (permalink / raw)
  To: handa; +Cc: monnier, emacs-devel

> From: handa <handa@m17n.org>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 18 May 2011 16:59:15 +0900
> 
> Now there are several *.txt files that can't be derived from
> UnicodeData.txt, and I want to provide Emacs the facility to
> utilize all such Unicode data.  For that, I want to improve
> unidata-gen.el to handle those data files.
> 
> But, at first, I'll work on improving C interface to
> uni-*.el.

Thank you.




* Re: idn.el and confusables.txt
  2011-05-17 15:32                                                                                                                           ` Ted Zlatanov
@ 2011-05-18 18:15                                                                                                                             ` Ted Zlatanov
  0 siblings, 0 replies; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-18 18:15 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On Tue, 17 May 2011 10:32:03 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> Here's the converter.  It reads the confusables.txt file and generates a
TZ> char-table with strings as values.  I'll package the converter and the
TZ> resulting uni-confusables.el library and put them on the GNU ELPA.

TZ> Could you tell me the best way to write uni-confusables.el?  In what
TZ> format should I provide the char-tables in the ELisp code?

The shortest format turned out to be a range enumeration, because the
native char-table dump was much bigger (700K vs. 100K).  So I wrote
`gen-confusables-write' to create the "uni-confusables.el" file that
defines the two char-tables and then populates them.
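
A minimal sketch of what the range-enumeration format could look like
once loaded.  The table name `demo-confusables-table', its subtype
symbol, and the three entries are made up for illustration; the real
file is whatever `gen-confusables-write' emits from confusables.txt.

```elisp
;; Hypothetical table; the subtype symbol is only a label here.
(defvar demo-confusables-table (make-char-table 'demo-confusables))

;; RANGES is a flat list of KEY VALUE pairs.  KEY is either a single
;; character or a cons (FROM . TO), which are the two key shapes that
;; `map-char-table' passes to its callback.  Entries are illustrative.
(let ((ranges '(#x0430 "a"
                #x0435 "e"
                (#xFF10 . #xFF11) "0")))
  (while ranges
    (let ((k (pop ranges))
          (v (pop ranges)))
      (set-char-table-range demo-confusables-table k v))))

;; Look up one character; unset characters yield nil.
(char-table-range demo-confusables-table #x0430)
```

Keeping contiguous characters with the same value as one (FROM . TO)
key is what makes this form more compact than a full dump of the table.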

As a bonus, two ERT tests (one per single/multiple type) are also
generated dynamically based on the data found in the confusables.txt
file.

gen-confusables.el is a pretty unholy mix of Lisp and string
manipulations, but since I am the only real user I don't mind.  You can
test it with
http://www.unicode.org/Public/security/revision-04/confusables.txt (I'm
not including the resulting uni-confusables.el here because it's over
100K).
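
For a quick check without the full file, here is a small sketch of how
one line is parsed.  It reuses the regexp from `gen-confusables-read';
the helper name `demo-confusables-parse-line' and the sample line are
assumptions, shaped only to match that regexp.

```elisp
(defun demo-confusables-parse-line (line)
  "Parse LINE in the shape `gen-confusables-read' expects.
Return (FROM TO CLASS) or nil.  FROM is a character code, TO is a
string built from the target code points, CLASS is \"S\" or \"M\"."
  (let ((re (concat "^\\([[:xdigit:]]+\\)"         ; source code point
                    " ;\t"
                    "\\([[:space:][:xdigit:]]+\\)" ; target code points
                    " ;\t"
                    "\\([SM]\\)[LA]")))            ; SL, SA, ML, MA
    (when (string-match re line)
      (list (string-to-number (match-string 1 line) 16)
            (concat (mapcar (lambda (h) (string-to-number h 16))
                            (split-string (match-string 2 line))))
            (match-string 3 line)))))

;; Hypothetical sample line: hex source, hex target, class tag.
(demo-confusables-parse-line "0430 ;\t0061 ;\tSL")
```

The first character of the class tag picks the table: "S" goes to the
single-script table and "M" to the multiple-script one.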

Ted


[-- Attachment #2: gen-confusables.el --]
[-- Type: application/emacs-lisp, Size: 4409 bytes --]

;;; gen-confusables.el --- generate uni-confusables.el from confusables.txt

;; Copyright (C) 2011  Teodor Zlatanov

;; Author: Teodor Zlatanov <tzz@lifelogs.com>

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;;; Code:

(require 'cl)

(defvar gen-confusables-char-table-single)
(defvar gen-confusables-char-table-multiple)

(defun gen-confusables-read (file)
  (interactive "fConfusables filename: \n")
  (flet ((reader (h) (string-to-number h 16)))
    (let ((stable (make-char-table 'confusables-single-script))
          (mtable (make-char-table 'confusables-multiple-script))
          (count 0)
          (confusable-line-regexp (concat "^\\([[:xdigit:]]+\\)" ; \x+
                                          " ;\t"
                                          ;; \x+ separated by spaces
                                          "\\([[:space:][:xdigit:]]+\\)"
                                          " ;\t"
                                          "\\([SM]\\)[LA]"))) ; SL, SA, ML, MA
      (setq gen-confusables-char-table-single stable)
      (setq gen-confusables-char-table-multiple mtable)
      (with-temp-buffer
        (insert-file-contents file)
        (goto-char (point-min))
        (while (re-search-forward confusable-line-regexp nil t)
          (incf count)
          (when (and (called-interactively-p 'interactive)
                     (zerop (mod count 100)))
            (message "processed %d lines" count))
          (let* ((from (match-string 1))
                 (to (match-string 2))
                 (class (match-string 3))
                 (table (if (string-equal "S" class) stable mtable)))
            (set-char-table-range
             table
             (reader from)
             (concat (mapcar 'reader (split-string to))))))))))

(defun gen-confusables-write (file)
  (interactive "FDumped filename: \n")
  (let ((coding-system-for-write 'utf-8-emacs))
    (with-temp-file file
      (insert ";; Copyright (C) 1991-2009, 2010 Unicode, Inc.
;; This file was generated from the Unicode confusables list at
;; http://www.unicode.org/Public/security/revision-04/confusables.txt.
;; See lisp/international/README in the Emacs trunk
;; for the copyright and permission notice.\n\n")
      (dolist (type '(single multiple))
        (let* ((tablesym (intern (format "uni-confusables-char-table-%s" type)))
               (oursym (intern (format "gen-confusables-char-table-%s" type)))
               (ourtable (symbol-value oursym))
               (ourtablename (symbol-name oursym))
               (tablename (symbol-name tablesym))
               (prop (format "confusables-%s-script" type))
               props)
          (insert (format "(defvar %s (make-char-table '%s))\n\n"
                          tablename prop))
          (map-char-table
           (lambda (k v) (setq props (cons k (cons v props))))
           ourtable)

          (insert (format "(let ((k nil) (v nil) (ranges '%S))\n" props))
          (insert (format "
  (while ranges
     (setq k (pop ranges)
           v (pop ranges))
     (set-char-table-range %s k v)))\n\n" tablename))

          (insert (format "(ert-deftest uni-confusables-test-%s ()\n" type))

          (dolist (offset '(100 200 800 3000 3500))
            (insert (format "
  (should (string-equal
           (char-table-range %s %d)
           %S))\n"
                            tablename
                            (nth (* 2 offset) props)
                            (nth (1+ (* 2 offset)) props))))
          (insert ")\n\n")))
      (insert "
;; Local Variables:
;; coding: utf-8
;; no-byte-compile: t
;; End:

;; uni-confusables.el ends here"))))

(provide 'gen-confusables)
;;; gen-confusables.el ends here


* uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt)
  2011-04-26 15:28                                                                                                   ` idn.el and confusables.txt (was: markchars.el 0.2.0 and idn.el) Ted Zlatanov
  2011-05-13 19:42                                                                                                     ` idn.el and confusables.txt Stefan Monnier
@ 2011-05-31 10:42                                                                                                     ` Ted Zlatanov
  2011-06-08 10:42                                                                                                       ` uni-confusables 0.1 is on the Emacs ELPA branch Ted Zlatanov
  1 sibling, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-05-31 10:42 UTC (permalink / raw)
  To: emacs-devel

I've put uni-confusables.el on the ELPA branch.  Stefan, could you
please check it?  I tried to stick to the *-pkg.el format you specified
in archive-contents.el, but I may have missed something.  It's the first
multi-file package I've bundled.  Also I don't know how the license
should look.

If you prefer I could just put uni-confusables.el there, but I thought
the generator and the source database were important too.

Ted


* Re: uni-confusables 0.1 is on the Emacs ELPA branch
  2011-05-31 10:42                                                                                                     ` uni-confusables 0.1 is on the Emacs ELPA branch (was: idn.el and confusables.txt) Ted Zlatanov
@ 2011-06-08 10:42                                                                                                       ` Ted Zlatanov
  2011-06-08 15:22                                                                                                         ` Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Ted Zlatanov @ 2011-06-08 10:42 UTC (permalink / raw)
  To: emacs-devel

On Tue, 31 May 2011 05:42:55 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> I've put uni-confusables.el on the ELPA branch.  Stefan, could you
TZ> please check it?  I tried to stick to the *-pkg.el format you specified
TZ> in archive-contents.el, but I may have missed something.  It's the first
TZ> multi-file package I've bundled.  Also I don't know how the license
TZ> should look.

(to clarify, the license should be as part of Emacs)

TZ> If you prefer I could just put uni-confusables.el there, but I thought
TZ> the generator and the source database were important too.

Ping on the questions above?

Ted


* Re: uni-confusables 0.1 is on the Emacs ELPA branch
  2011-06-08 10:42                                                                                                       ` uni-confusables 0.1 is on the Emacs ELPA branch Ted Zlatanov
@ 2011-06-08 15:22                                                                                                         ` Stefan Monnier
  0 siblings, 0 replies; 182+ messages in thread
From: Stefan Monnier @ 2011-06-08 15:22 UTC (permalink / raw)
  To: emacs-devel

TZ> I've put uni-confusables.el on the ELPA branch.  Stefan, could you
TZ> please check it?  I tried to stick to the *-pkg.el format you specified
TZ> in archive-contents.el, but I may have missed something.  It's the first
TZ> multi-file package I've bundled.  Also I don't know how the license
TZ> should look.
> (to clarify, the license should be as part of Emacs)

I'm not sure which is best.  All the single files there seem to say
"part of Emacs" and most of the multi-file packages seem to say "not
part of Emacs".
I don't think it matters much, as long as the text is correct (i.e. it
doesn't say "not part of GNU Emacs" followed by "GNU Emacs is Free
Software blabla" since that makes the blurb legally meaningless).
So write it whichever fits best your intentions.

TZ> If you prefer I could just put uni-confusables.el there, but I thought
TZ> the generator and the source database were important too.
> Ping on the questions above?

Including the generator is important, indeed.  The source database is
not that important since it can be downloaded from other places, but
feel free to include it.

        Stefan


* Re: idn.el and confusables.txt
  2011-05-18  7:59                                                                                                                 ` handa
  2011-05-18  8:13                                                                                                                   ` Eli Zaretskii
@ 2011-06-17  8:15                                                                                                                   ` Kenichi Handa
  2011-06-17 15:12                                                                                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 182+ messages in thread
From: Kenichi Handa @ 2011-06-17  8:15 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

In article <tl7r57wo2wc.fsf@m17n.org>, handa <handa@m17n.org> writes:

> But, at first, I'll work on improving C interface to
> uni-*.el.

Attached is the first version for that.  It provides two C
functions (excerpt from chartab.c).

/* Unicode character property

   This section provides a convenient and efficient way to get a
   Unicode character property from C code (from Lisp, you can use
   get-char-code-property).

   The typical usage is to get a char-table for a specific property at
   a proper initialization time as this:

	Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));

   and get a property value for character CH as this:

	Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table);

   In this case, what you actually get is an index number to the
   vector of property values (symbols nil, L, R, etc).  See the
   comment of uniprop_table_lookup for the detail.  */

Eli, I arranged that the bidi_class obtained as above is an integer
value matching enum bidi_type_t defined in dispextern.h.  Also,
uniprop_table (intern ("mirroring")) gives a char-table for
mirroring characters.

Could you check if the attached patch gives sufficient
facility to be used in the bidi code?
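
On the Lisp side the same data is reachable through
`get-char-code-property', as the comment above notes.  A minimal
sketch, assuming an Emacs that has the `mirroring' property from this
patch (or a later equivalent) installed; the helper name
`demo-mirror-char' is hypothetical.

```elisp
(defun demo-mirror-char (char)
  "Return the bidi-mirrored counterpart of CHAR, or CHAR itself.
Relies on the `mirroring' property, whose value is nil for
characters that have no mirror image."
  (or (get-char-code-property char 'mirroring)
      char))

;; A paired bracket maps to its counterpart; a letter maps to itself.
(demo-mirror-char ?\()
(demo-mirror-char ?a)

;; Symbol-valued properties come back as symbols from Lisp, not as the
;; integer index that the C lookup returns.
(get-char-code-property ?a 'bidi-class)
```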

---
Kenichi Handa
handa@m17n.org

=== modified file 'admin/ChangeLog'
--- admin/ChangeLog	2011-04-10 16:33:22 +0000
+++ admin/ChangeLog	2011-06-17 07:27:55 +0000
@@ -1,3 +1,29 @@
+2011-06-17  Kenichi Handa  <handa@m17n.org>
+
+	* unidata/unidata-gen.el (unidata-dir): New variable.
+	(unidata-setup-list): Expand unidata-text-file in unidata-dir.
+	(unidata-prop-alist): INDEX element may be a function.  New
+	optional element VAL-LIST (for general-category and bidi-class).
+	New entry `mirroring'.
+	(unidata-get-character): Adjusted for the new compression format
+	of char-table element.
+	(unidata-gen-table-character): New arg IGNORE.  Adjusted for the
+	above changes.
+	(unidata-encode-val): Assume the first element of VAL-LIST is (nil
+	. 0).
+	(unidata-gen-table): Change argument DEFAULT-VALUE to VAL-LIST.
+	(unidata-gen-table-symbol): New arg VAL-LIST.
+	(unidata-gen-table-integer): Likewise.
+	(unidata-gen-table-numeric): Likewise.
+	(unidata-gen-table-name): New arg IGNORE.
+	(unidata-gen-table-decomposition): Likewise.
+	(unidata-gen-mirroring-list): New function.
+	(unidata-gen-files): New arg DATA-DIR.  Adjusted for the change of
+	unidata-prop-alist.
+
+	* unidata/Makefile.in (${DSTDIR}/charprop.el): New arg to
+	unidata-gen-files.
+
 2011-03-07  Chong Yidong  <cyd@stupidchicken.com>
 
 	* Version 23.3 released.

=== modified file 'admin/unidata/Makefile.in'
--- admin/unidata/Makefile.in	2011-01-14 17:18:41 +0000
+++ admin/unidata/Makefile.in	2011-06-16 03:46:42 +0000
@@ -33,9 +33,10 @@
 
 ${DSTDIR}/charprop.el: unidata-gen.elc unidata.txt
 	ELC=`/bin/pwd`/unidata-gen.elc; \
-	DATA=`/bin/pwd`/unidata.txt; \
+	DATADIR=`/bin/pwd`; \
+	DATA=unidata.txt; \
 	cd ${DSTDIR}; \
-	${RUNEMACS} -batch --load $${ELC} -f unidata-gen-files $${DATA}
+	${RUNEMACS} -batch --load $${ELC} -f unidata-gen-files $${DATADIR} $${DATA}
 
 ../../src/biditype.h: UnicodeData.txt
 	gawk -F";" -f biditype.awk $< > $@

=== modified file 'admin/unidata/unidata-gen.el'
--- admin/unidata/unidata-gen.el	2011-01-14 17:18:41 +0000
+++ admin/unidata/unidata-gen.el	2011-06-17 07:07:19 +0000
@@ -41,7 +41,7 @@
 ;;   uni-name.el, uni-category.el, uni-combining.el, uni-bidi.el,
 ;;   uni-decomposition.el, uni-decimal.el, uni-digit.el, uni-numeric.el,
 ;;   uni-mirrored.el, uni-old-name.el, uni-comment.el, uni-uppercase.el,
-;;   uni-lowercase.el, uni-titlecase.el
+;;   uni-lowercase.el, uni-titlecase.el, uni-mirroring.el
 ;;	They each contain a single form of this format:
 ;;	  (char-code-property-register PROP CHAR-TABLE)
 ;;	where PROP is the same as above, and CHAR-TABLE is a
@@ -49,8 +49,9 @@
 ;;
 ;;   When they are installed in .../lisp/international/, the file
 ;;   "charprop.el" is preloaded in loadup.el.  The other files are
-;;   automatically loaded when the functions `get-char-code-property'
-;;   and `put-char-code-property' are called.
+;;   automatically loaded when the Lisp functions
+;;   `get-char-code-property' and `put-char-code-property', and C
+;;   function uniprop_table are called.
 ;;
 ;; FORMAT OF A CHAR TABLE
 ;;
@@ -70,7 +71,8 @@
 
 ;;   The char table has four extra slots:
 ;;      1st: property symbol
-;;	2nd: function to call to get a property value
+;;	2nd: function to call to get a property value,
+;;	     or an index number of C function to uncompress the data
 ;;	3nd: function to call to put a property value
 ;;	4th: function to call to get a description of a property value
 ;;	5th: data referred by the above functions
@@ -82,6 +84,11 @@
 
 (defvar unidata-list nil)
 
+;; Name of the directory containing files of Unicode Character
+;; Database.
+
+(defvar unidata-dir nil)
+
 (defun unidata-setup-list (unidata-text-file)
   (let* ((table (list nil))
 	 (tail table)
@@ -90,6 +97,7 @@
 			("^<.*Surrogate" . nil)
 			("^<.*Private Use" . PRIVATE\ USE)))
 	 val char name)
+    (setq unidata-text-file (expand-file-name unidata-text-file unidata-dir))
     (or (file-readable-p unidata-text-file)
 	(error "File not readable: %s" unidata-text-file))
     (with-temp-buffer
@@ -136,10 +144,13 @@
 ;; Alist of this form:
 ;;   (PROP INDEX GENERATOR FILENAME)
 ;; PROP: character property
-;; INDEX: index to each element of unidata-list for PROP
+;; INDEX: index to each element of unidata-list for PROP.
+;;   It may be a function that generates an alist of character codes
+;;   vs. the corresponding property values.
 ;; GENERATOR: function to generate a char-table
 ;; FILENAME: filename to store the char-table
 ;; DESCRIBER: function to call to get a description string of property value
+;; VAL-LIST: list of specially ordered property values
 
 (defconst unidata-prop-alist
   '((name
@@ -152,7 +163,9 @@
 Property value is one of the following symbols:
   Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pc, Pd, Ps, Pe, Pi, Pf, Po,
   Sm, Sc, Sk, So, Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn"
-     unidata-describe-general-category)
+     unidata-describe-general-category
+     (Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po
+	 Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn))
     (canonical-combining-class
      3 unidata-gen-table-integer "uni-combining.el"
      "Unicode canonical combining class.
@@ -164,7 +177,8 @@
 Property value is one of the following symbols:
   L, LRE, LRO, R, AL, RLE, RLO, PDF, EN, ES, ET,
   AN, CS, NSM, BN, B, S, WS, ON"
-     unidata-describe-bidi-class)
+     unidata-describe-bidi-class
+     (L R EN AN BN B AL LRE LRO RLE RLO PDF ES ET CS NSM S WS ON))
     (decomposition
      5 unidata-gen-table-decomposition "uni-decomposition.el"
      "Unicode decomposition mapping.
@@ -188,7 +202,7 @@
     (mirrored
      9 unidata-gen-table-symbol "uni-mirrored.el"
      "Unicode bidi mirrored flag.
-Property value is a symbol `Y' or `N'.")
+Property value is a symbol `Y' or `N'.  See also the property `mirroring'.")
     (old-name
      10 unidata-gen-table-name "uni-old-name.el"
      "Unicode old names as published in Unicode 1.0.
@@ -211,7 +225,12 @@
      14 unidata-gen-table-character "uni-titlecase.el"
      "Unicode simple titlecase mapping.
 Property value is a character."
-     string)))
+     string)
+    (mirroring
+     unidata-gen-mirroring-list unidata-gen-table-character "uni-mirroring.el"
+     "Unicode bidi-mirroring characters.
+Property value is a character that has the corresponding mirroring image,
+or nil for non-mirrored character.")))
 
 ;; Functions to access the above data.
 (defsubst unidata-prop-index (prop) (nth 1 (assq prop unidata-prop-alist)))
@@ -219,6 +238,7 @@
 (defsubst unidata-prop-file (prop) (nth 3 (assq prop unidata-prop-alist)))
 (defsubst unidata-prop-docstring (prop) (nth 4 (assq prop unidata-prop-alist)))
 (defsubst unidata-prop-describer (prop) (nth 5 (assq prop unidata-prop-alist)))
+(defsubst unidata-prop-val-list (prop) (nth 6 (assq prop unidata-prop-alist)))
 
 \f
 ;; SIMPLE TABLE
@@ -227,14 +247,18 @@
 ;; values of succeeding character codes are usually different, we use
 ;; a char-table described here to store such values.
 ;;
-;; If succeeding 128 characters has no property, a char-table has the
-;; symbol t for them.  Otherwise a char-table has a string of the
-;; following format for them.
+;; A char-table divides character code space (#x0..#x3FFFFF) into
+;; #x8000 blocks (each block contains 128 characters).
+
+;; If all characters of a block have no property, a char-table has the
+;; symbol nil for that block.  Otherwise a char-table has a string of
+;; the following format for it.
 ;;
-;; The first character of the string is FIRST-INDEX.
-;; The Nth (N > 0) character of the string is a property value of the
-;; character (BLOCK-HEAD + FIRST-INDEX + N - 1), where BLOCK-HEAD is
-;; the first of the characters in the block.
+;; The first character of the string is '\0'.
+;; The second character of the string is FIRST-INDEX.
+;; The Nth (N > 1) character of the string is a property value of the
+;; character (BLOCK-HEAD + FIRST-INDEX + N - 2), where BLOCK-HEAD is
+;; the first character of the block.
 ;;
 ;; The 4th extra slot of a char-table is nil.
 
@@ -247,9 +271,9 @@
     (let* ((len (length val))
 	   (block-head (lsh (lsh char -7) 7))
 	   (vec (make-vector 128 nil))
-	   (first-index (aref val 0)))
+	   (first-index (aref val 1)))
       (dotimes (i (1- len))
-	(let ((elt (aref val (1+ i))))
+	(let ((elt (aref val (+ 2 i))))
 	  (if (> elt 0)
 	      (aset vec (+ first-index i) elt))))
       (dotimes (i 128)
@@ -266,13 +290,15 @@
 	  (funcall (char-table-extra-slot table 1) char current-val table))
       (aset table char val))))
 
-(defun unidata-gen-table-character (prop)
+(defun unidata-gen-table-character (prop ignore)
   (let ((table (make-char-table 'char-code-property-table))
 	(prop-idx (unidata-prop-index prop))
 	(vec (make-vector 128 0))
 	(tail unidata-list)
 	elt range val idx slot)
-    (set-char-table-range table (cons 0 (max-char)) t)
+    (if (functionp prop-idx)
+	(setq tail (funcall prop-idx)
+	      prop-idx 1))
     (while tail
       (setq elt (car tail) tail (cdr tail))
       (setq range (car elt)
@@ -301,7 +327,7 @@
 		  (setq first-index last-index)))
 	    (setq tail (cdr tail)))
 	  (when first-index
-	    (let ((str (string first-index))
+	    (let ((str (string 0 first-index))
 		  c)
 	      (while (<= first-index last-index)
 		(setq str (format "%s%c"  str (or (aref vec first-index) 0))
@@ -311,7 +337,7 @@
     (set-char-table-extra-slot table 0 prop)
     (byte-compile 'unidata-get-character)
     (byte-compile 'unidata-put-character)
-    (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-character))
+    (set-char-table-extra-slot table 1 0)
     (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-character))
 
     table))
@@ -463,30 +489,34 @@
 ;; Encode the character property value VAL into an integer value by
 ;; VAL-LIST.  By side effect, VAL-LIST is modified.
 ;; VAL-LIST has this form:
-;;   (t (VAL1 . VAL-CODE1) (VAL2 . VAL-CODE2) ...)
-;; If VAL is one of VALn, just return VAL-CODEn.  Otherwise,
-;; VAL-LIST is modified to this:
-;;   (t (VAL . (1+ VAL-CODE1)) (VAL1 . VAL-CODE1) (VAL2 . VAL-CODE2) ...)
+;;   ((nil . 0) (VAL1 . 1) (VAL2 . 2) ...)
+;; If VAL is one of VALn, just return n.
+;; Otherwise, VAL-LIST is modified to this:
+;;   ((nil . 0) (VAL1 . 1) (VAL2 . 2) ... (VAL . n+1))
 
 (defun unidata-encode-val (val-list val)
   (let ((slot (assoc val val-list))
 	val-code)
     (if slot
 	(cdr slot)
-      (setq val-code (if (cdr val-list) (1+ (cdr (nth 1 val-list))) 1))
-      (setcdr val-list (cons (cons val val-code) (cdr val-list)))
+      (setq val-code (length val-list))
+      (nconc val-list (list (cons val val-code)))
       val-code)))
 
 ;; Generate a char-table for the character property PROP.
 
-(defun unidata-gen-table (prop val-func default-value)
+(defun unidata-gen-table (prop val-func val-list)
   (let ((table (make-char-table 'char-code-property-table))
 	(prop-idx (unidata-prop-index prop))
-	(val-list (list t))
 	(vec (make-vector 128 0))
 	tail elt range val val-code idx slot
 	prev-range-data)
-    (set-char-table-range table (cons 0 (max-char)) default-value)
+    (setq val-list (cons nil (copy-sequence val-list)))
+    (setq tail val-list val-code 0)
+    ;; Convert (nil A B ...) to ((nil . 0) (A . 1) (B . 2) ...)
+    (while tail
+      (setcar tail (cons (car tail) val-code))
+      (setq tail (cdr tail) val-code (1+ val-code)))
     (setq tail unidata-list)
     (while tail
       (setq elt (car tail) tail (cdr tail))
@@ -534,7 +564,7 @@
 	    (if val-code
 		(aset vec (- range start) val-code))
 	    (setq tail (cdr tail)))
-	  (setq str "" val-code -1 count 0)
+	  (setq str "\000" val-code -1 count 0)
 	  (mapc #'(lambda (x)
 		    (if (= val-code x)
 			(setq count (1+ count))
@@ -559,34 +589,33 @@
 		  (setq str (concat str (string val-code)))))
 	      (set-char-table-range table (cons start limit) str))))))
 
-    (setq val-list (nreverse (cdr val-list)))
     (set-char-table-extra-slot table 0 prop)
     (set-char-table-extra-slot table 4 (vconcat (mapcar 'car val-list)))
     table))
 
-(defun unidata-gen-table-symbol (prop)
+(defun unidata-gen-table-symbol (prop val-list)
   (let ((table (unidata-gen-table prop
 				  #'(lambda (x) (and (> (length x) 0)
 						     (intern x)))
-				  0)))
+				  val-list)))
     (byte-compile 'unidata-get-symbol)
     (byte-compile 'unidata-put-symbol)
-    (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-symbol))
+    (set-char-table-extra-slot table 1 1)
     (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-symbol))
     table))
 
-(defun unidata-gen-table-integer (prop)
+(defun unidata-gen-table-integer (prop val-list)
   (let ((table (unidata-gen-table prop
 				  #'(lambda (x) (and (> (length x) 0)
 						     (string-to-number x)))
-				  t)))
+				  val-list)))
     (byte-compile 'unidata-get-integer)
     (byte-compile 'unidata-put-integer)
-    (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-integer))
+    (set-char-table-extra-slot table 1 1)
     (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-integer))
     table))
 
-(defun unidata-gen-table-numeric (prop)
+(defun unidata-gen-table-numeric (prop val-list)
   (let ((table (unidata-gen-table prop
 				  #'(lambda (x)
 				      (if (string-match "/" x)
@@ -595,10 +624,10 @@
 					      (substring x (match-end 0))))
 					(if (> (length x) 0)
 					    (string-to-number x))))
-				  t)))
+				  val-list)))
     (byte-compile 'unidata-get-numeric)
     (byte-compile 'unidata-put-numeric)
-    (set-char-table-extra-slot table 1 (symbol-function 'unidata-get-numeric))
+    (set-char-table-extra-slot table 1 1)
     (set-char-table-extra-slot table 2 (symbol-function 'unidata-put-numeric))
     table))
 
@@ -1025,7 +1054,7 @@
 		      idx (1+ i)))))
 	(nreverse (cons (intern (substring str idx)) l))))))
 
-(defun unidata-gen-table-name (prop)
+(defun unidata-gen-table-name (prop ignore)
   (let* ((table (unidata-gen-table-word-list prop 'unidata-split-name))
 	 (word-tables (char-table-extra-slot table 4)))
     (byte-compile 'unidata-get-name)
@@ -1064,7 +1093,7 @@
 	(nreverse l)))))
 
 
-(defun unidata-gen-table-decomposition (prop)
+(defun unidata-gen-table-decomposition (prop ignore)
   (let* ((table (unidata-gen-table-word-list prop 'unidata-split-decomposition))
 	 (word-tables (char-table-extra-slot table 4)))
     (byte-compile 'unidata-get-decomposition)
@@ -1171,6 +1200,19 @@
 		 (string ?'))))
    val " "))
 
+(defun unidata-gen-mirroring-list ()
+  (let ((head (list nil))
+	tail)
+    (with-temp-buffer
+      (insert-file-contents (expand-file-name "BidiMirroring.txt" unidata-dir))
+      (goto-char (point-min))
+      (setq tail head)
+      (while (re-search-forward "^\\([0-9A-F]+\\);\\s +\\([0-9A-F]+\\)" nil t)
+	(let ((char (string-to-number (match-string 1) 16))
+	      (mirror (match-string 2)))
+	  (setq tail (setcdr tail (list (list char mirror)))))))
+    (cdr head)))
+
 ;; Verify if we can retrieve correct values from the generated
 ;; char-tables.
 
@@ -1212,13 +1254,16 @@
 ;; The entry function.  It generates files described in the header
 ;; comment of this file.
 
-(defun unidata-gen-files (&optional unidata-text-file)
-  (or unidata-text-file
-      (setq unidata-text-file (car command-line-args-left)
+(defun unidata-gen-files (&optional data-dir unidata-text-file)
+  (or data-dir
+      (setq data-dir (car command-line-args-left)
+	    command-line-args-left (cdr command-line-args-left)
+	    unidata-text-file (car command-line-args-left)
 	    command-line-args-left (cdr command-line-args-left)))
-  (unidata-setup-list unidata-text-file)
   (let ((coding-system-for-write 'utf-8-unix)
-	(charprop-file "charprop.el"))
+	(charprop-file "charprop.el")
+	(unidata-dir data-dir))
+    (unidata-setup-list unidata-text-file)
     (with-temp-file charprop-file
       (insert ";; Automatically generated by unidata-gen.el.\n")
       (dolist (elt unidata-prop-alist)
@@ -1227,6 +1272,7 @@
 	       (file (unidata-prop-file prop))
 	       (docstring (unidata-prop-docstring prop))
 	       (describer (unidata-prop-describer prop))
+	       (val-list (unidata-prop-val-list prop))
 	       table)
 	  ;; Filename in this comment line is extracted by sed in
 	  ;; Makefile.
@@ -1235,15 +1281,15 @@
 			  prop file docstring))
 	  (with-temp-file file
 	    (message "Generating %s..." file)
-	    (setq table (funcall generator prop))
+	    (setq table (funcall generator prop val-list))
 	    (when describer
 	      (unless (subrp (symbol-function describer))
 		(byte-compile describer)
 		(setq describer (symbol-function describer)))
 	      (set-char-table-extra-slot table 3 describer))
 	    (insert ";; Copyright (C) 1991-2009 Unicode, Inc.
-;; This file was generated from the Unicode data file at
-;; http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
+;; This file was generated from the Unicode data files at
+;; http://www.unicode.org/Public/UNIDATA/.
 ;; See lisp/international/README for the copyright and permission notice.\n"
 		    (format "(define-char-code-property '%S %S %S)\n"
 			    prop table docstring)
@@ -1251,7 +1297,8 @@
 		    ";; coding: utf-8\n"
 		    ";; no-byte-compile: t\n"
 		    ";; End:\n\n"
-		    (format ";; %s ends here\n" file)))))
+		    (format ";; %s ends here\n" file))
+	    (message "Generating %s...done" file))))
       (message "Writing %s..." charprop-file)
       (insert ";; Local Variables:\n"
 	      ";; coding: utf-8\n"

=== modified file 'lisp/ChangeLog'
--- lisp/ChangeLog	2011-05-18 03:42:33 +0000
+++ lisp/ChangeLog	2011-06-17 07:33:43 +0000
@@ -1,3 +1,28 @@
+2011-06-17  Kenichi Handa  <handa@m17n.org>
+
+	* international/mule-cmds.el (char-code-property-alist): Moved to
+	src/chartab.c.
+	(get-char-code-property): Call get-unicode-property-internal where
+	necessary.
+
+	* international/charprop.el:
+	* international/uni-bidi.el:
+	* international/uni-category.el:
+	* international/uni-combining.el:
+	* international/uni-comment.el:
+	* international/uni-decimal.el:
+	* international/uni-decomposition.el:
+	* international/uni-digit.el:
+	* international/uni-lowercase.el:
+	* international/uni-mirrored.el:
+	* international/uni-name.el:
+	* international/uni-numeric.el:
+	* international/uni-old-name.el:
+	* international/uni-titlecase.el:
+	* international/uni-uppercase.el: Regenerate.
+
+	* international/uni-mirroring.el: New file.
+
 2011-05-18  Glenn Morris  <rgm@gnu.org>
 
 	* emacs-lisp/authors.el (authors-fixed-entries): Remove fakemail.c

=== modified file 'lisp/international/charprop.el'
--- lisp/international/charprop.el	2010-06-09 15:46:41 +0000
+++ lisp/international/charprop.el	2011-06-17 07:07:51 +0000
@@ -1,8 +1,4 @@
-;; Copyright (C) 1991-2010 Unicode, Inc.
-;; This file was generated from the Unicode data file at
-;; http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
-;; See lisp/international/README for the copyright and permission notice.
-
+;; Automatically generated by unidata-gen.el.
 ;; FILE: uni-name.el
 (define-char-code-property 'name "uni-name.el"
   "Unicode character name.
@@ -45,7 +41,7 @@
 ;; FILE: uni-mirrored.el
 (define-char-code-property 'mirrored "uni-mirrored.el"
   "Unicode bidi mirrored flag.
-Property value is a symbol `Y' or `N'.")
+Property value is a symbol `Y' or `N'.  See also the property `mirroring'.")
 ;; FILE: uni-old-name.el
 (define-char-code-property 'old-name "uni-old-name.el"
   "Unicode old names as published in Unicode 1.0.
@@ -66,6 +62,11 @@
 (define-char-code-property 'titlecase "uni-titlecase.el"
   "Unicode simple titlecase mapping.
 Property value is a character.")
+;; FILE: uni-mirroring.el
+(define-char-code-property 'mirroring "uni-mirroring.el"
+  "Unicode bidi-mirroring characters.
+Property value is a character that has the corresponding mirroring image,
+or nil for non-mirrored character.")
 ;; Local Variables:
 ;; coding: utf-8
 ;; no-byte-compile: t

=== modified file 'lisp/international/mule-cmds.el'
--- lisp/international/mule-cmds.el	2011-02-28 01:07:29 +0000
+++ lisp/international/mule-cmds.el	2011-06-04 04:33:19 +0000
@@ -2709,16 +2709,6 @@
 \f
 ;;; Character property
 
-;; Each element has the form (PROP . TABLE).
-;; PROP is a symbol representing a character property.
-;; TABLE is a char-table containing the property value for each character.
-;; TABLE may be a name of file to load to build a char-table.
-;; Don't modify this variable directly but use `define-char-code-property'.
-
-(defvar char-code-property-alist nil
-  "Alist of character property name vs char-table containing property values.
-Internal use only.")
-
 (put 'char-code-property-table 'char-table-extra-slots 5)
 
 (defun define-char-code-property (name table &optional docstring)
@@ -2776,10 +2766,11 @@
 	  (if (stringp (cdr slot))
 	      (load (cdr slot) nil t))
 	  (setq table (cdr slot)
-		value (aref table char)
 		func (char-table-extra-slot table 1))
-	  (if (functionp func)
-	      (setq value (funcall func char value table)))
+	  (cond ((integerp func)
+		 (setq value (get-unicode-property-internal char table)))
+		((functionp func)
+		 (setq value (funcall func char (aref table char) table))))
 	  value)
       (plist-get (aref char-code-property-table char) propname))))
 

=== modified file 'lisp/international/uni-bidi.el'
Binary files lisp/international/uni-bidi.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-bidi.el	2011-06-17 07:07:49 +0000 differ
=== modified file 'lisp/international/uni-category.el'
Binary files lisp/international/uni-category.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-category.el	2011-06-17 07:07:49 +0000 differ
=== modified file 'lisp/international/uni-combining.el'
Binary files lisp/international/uni-combining.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-combining.el	2011-06-17 07:07:49 +0000 differ
=== modified file 'lisp/international/uni-comment.el'
Binary files lisp/international/uni-comment.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-comment.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-decimal.el'
Binary files lisp/international/uni-decimal.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-decimal.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-decomposition.el'
Binary files lisp/international/uni-decomposition.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-decomposition.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-digit.el'
Binary files lisp/international/uni-digit.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-digit.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-lowercase.el'
Binary files lisp/international/uni-lowercase.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-lowercase.el	2011-06-17 07:07:51 +0000 differ
=== modified file 'lisp/international/uni-mirrored.el'
Binary files lisp/international/uni-mirrored.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-mirrored.el	2011-06-17 07:07:50 +0000 differ
=== added file 'lisp/international/uni-mirroring.el'
Binary files lisp/international/uni-mirroring.el	1970-01-01 00:00:00 +0000 and lisp/international/uni-mirroring.el	2011-06-17 07:07:51 +0000 differ
=== modified file 'lisp/international/uni-name.el'
Binary files lisp/international/uni-name.el	2010-09-05 02:06:39 +0000 and lisp/international/uni-name.el	2011-06-17 07:07:49 +0000 differ
=== modified file 'lisp/international/uni-numeric.el'
Binary files lisp/international/uni-numeric.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-numeric.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-old-name.el'
Binary files lisp/international/uni-old-name.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-old-name.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'lisp/international/uni-titlecase.el'
Binary files lisp/international/uni-titlecase.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-titlecase.el	2011-06-17 07:07:51 +0000 differ
=== modified file 'lisp/international/uni-uppercase.el'
Binary files lisp/international/uni-uppercase.el	2010-06-09 15:46:41 +0000 and lisp/international/uni-uppercase.el	2011-06-17 07:07:50 +0000 differ
=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-05-18 03:00:08 +0000
+++ src/ChangeLog	2011-06-17 07:32:14 +0000
@@ -1,3 +1,18 @@
+2011-06-17  Kenichi Handa  <handa@m17n.org>
+
+	* chartab.c (sub_char_table_set): Delete it.
+	(char_table_get_bottom): New function.
+	(char_table_set): Use char_table_get_bottom.
+	(UNIPROP_COMPRESSED_FORM_P): New macro.
+	(uniprop_table_uncompress_simple)
+	(uniprop_table_uncompress_run_length): New functions.
+	(uniprop_uncompress, uniprop_uncompressor_count): New variables.
+	(uniprop_table, uniprop_table_lookup)
+	(Fget_unicode_property_internal): New functions.
+	(syms_of_chartab): Defsubr
+	Sget_unicode_property_internal. Defvar_lisp
+	char-code-property-alist.
+
 2011-05-18  Christoph Scholtes  <cschol2112@googlemail.com>
 
 	* menu.c: Include limits.h (fixes the MS-Windows build broken by

=== modified file 'src/chartab.c'
--- src/chartab.c	2011-05-12 07:07:06 +0000
+++ src/chartab.c	2011-06-16 03:54:55 +0000
@@ -330,7 +330,7 @@
   return val;
 }
 
-
+#if 0
 static void
 sub_char_table_set (Lisp_Object table, int c, Lisp_Object val)
 {
@@ -354,6 +354,29 @@
       sub_char_table_set (sub, c, val);
     }
 }
+#endif
+
+static Lisp_Object
+char_table_get_bottom (struct Lisp_Char_Table *tbl, int c)
+{
+  Lisp_Object *sub = tbl->contents + CHARTAB_IDX (c, 0, 0);
+  int depth;
+
+  if (! SUB_CHAR_TABLE_P (*sub))
+    *sub = make_sub_char_table (1, c & ~(chartab_chars[0] - 1), *sub);
+  for (depth = 1; depth < 3; depth++)
+    {
+      int min_char = XINT (XSUB_CHAR_TABLE (*sub)->min_char);
+      int idx = CHARTAB_IDX (c, depth, min_char);
+
+      sub = XSUB_CHAR_TABLE (*sub)->contents + idx;
+      if (! SUB_CHAR_TABLE_P (*sub))
+	*sub = make_sub_char_table (depth + 1, c & ~(chartab_chars[depth] - 1),
+				    *sub);
+    }
+  return *sub;
+}
+
 
 Lisp_Object
 char_table_set (Lisp_Object table, int c, Lisp_Object val)
@@ -367,18 +390,11 @@
     }
   else
     {
-      int i = CHARTAB_IDX (c, 0, 0);
-      Lisp_Object sub;
+      Lisp_Object sub_table = char_table_get_bottom (tbl, c);
 
-      sub = tbl->contents[i];
-      if (! SUB_CHAR_TABLE_P (sub))
-	{
-	  sub = make_sub_char_table (1, i * chartab_chars[0], sub);
-	  tbl->contents[i] = sub;
-	}
-      sub_char_table_set (sub, c, val);
+      XSUB_CHAR_TABLE (sub_table)->contents[c & (chartab_chars[2] - 1)] = val;
       if (ASCII_CHAR_P (c))
-	tbl->ascii = char_table_ascii (table);
+	tbl->ascii = sub_table;
     }
   return val;
 }
@@ -984,6 +1000,213 @@
 }
 
 \f
+/* Unicode character property
+
+   This section provides a convenient and efficient way to get a
+   Unicode character property from C code (from Lisp, you can use
+   get-char-code-property).
+
+   The typical usage is to get a char-table for a specific property at
+   a proper initialization time as this:
+
+	Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
+
+   and get a property value for character CH as this:
+
+	Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table);
+
+   In this case, what you actually get is an index number to the
+   vector of property values (symbols nil, L, R, etc).  See the
+   comment of uniprop_table_lookup for the detail.  */
+
+/* Nonzero iff OBJ is a string representing Unicode character
+   properties of 128 succeeding characters (the bottom level of a
+   char-table) by a compressed format.  We are sure that no property
+   value has a string starting from NULL character.  */
+
+#define UNIPROP_COMPRESSED_FORM_P(OBJ) (STRINGP (OBJ) && (SREF (OBJ, 0) == 0))
+
+
+/* The first element of uniprop_uncompress (which see).
+   This decodes the compressed data of "SIMPLE TABLE" (see the comment
+   in admin/unidata/unidata-gen.el).  */
+
+static Lisp_Object
+uniprop_table_uncompress_simple (Lisp_Object str,
+				 struct Lisp_Sub_Char_Table *subtbl)
+{
+  const unsigned char *p, *pend;
+  int i, idx;
+
+  p = SDATA (str) + 1, pend = p + SBYTES (str);
+  idx = STRING_CHAR_ADVANCE (p);
+  for (i = 0; i < idx; i++)
+    subtbl->contents[i] = Qnil;
+  while (p < pend && idx < chartab_chars[2])
+    {
+      int v = STRING_CHAR_ADVANCE (p);
+      subtbl->contents[idx++] = v > 0 ? make_number (v) : Qnil;
+    }
+  while (idx < chartab_chars[2])
+    subtbl->contents[idx++] = Qnil;
+  return Qnil;
+}
+
+/* The second element of uniprop_uncompress (which see).
+   This decodes the compressed data of "RUN-LENGTH TABLE" (see the
+   comment in admin/unidata/unidata-gen.el).  */
+
+static Lisp_Object
+uniprop_table_uncompress_run_length (Lisp_Object str,
+				     struct Lisp_Sub_Char_Table *subtbl)
+{
+  const unsigned char *p, *pend;
+  int idx;
+
+  p = SDATA (str) + 1, pend = p + SBYTES (str);
+  for (idx = 0; p < pend; )
+    {
+      int v = STRING_CHAR_ADVANCE (p);
+      int count = 1;
+      int len;
+      
+      if (p < pend)
+	{
+	  count = STRING_CHAR_AND_LENGTH (p, len);
+	  if (count < 128)
+	    count = 1;
+	  else
+	    {
+	      count -= 128;
+	      p += len;
+	    }
+	}
+      while (count-- > 0)
+	subtbl->contents[idx++] = make_number (v);
+    }
+  return Qnil;
+}
+
+#if 0
+/* It seems that we don't need this function because C code won't need
+   to get a property that is compressed in this form.  */
+
+/* The third element of uniprop_uncompress (which see).
+   This decodes the compressed data of "WORD-LIST TABLE" (see the
+   comment in admin/unidata/unidata-gen.el).  */
+
+static Lisp_Object
+uniprop_table_uncompress_word_list (Lisp_Object str,
+				    struct Lisp_Sub_Char_Table *subtbl)
+{
+  return Qnil;
+}
+#endif
+
+/* Array of functions that decode the compressed property values for
+   consecutive 128 characters in STR, and store each value in the
+   sub-chartable SUBTBL.  */
+
+static Lisp_Object (*uniprop_uncompress []) (Lisp_Object,
+					     struct Lisp_Sub_Char_Table *)
+  = { uniprop_table_uncompress_simple,
+      uniprop_table_uncompress_run_length};
+
+/* How many elements uniprop_uncompress has.  */
+
+static int uniprop_uncompressor_count =
+  (sizeof uniprop_uncompress) / (sizeof uniprop_uncompress[0]);
+
+/* Return a char-table for Unicode character property PROPNAME.  This
+   function may load a Lisp file and thus may cause
+   garbage-collection.  */
+
+Lisp_Object
+uniprop_table (Lisp_Object propname)
+{
+  Lisp_Object val, table;
+
+  val = Fassq (propname, Vchar_code_property_alist);
+  if (! CONSP (val))
+    return Qnil;
+  table = XCDR (val);
+  if (STRINGP (table))
+    {
+      Lisp_Object result = Fload (AREF (table, 0), Qt, Qt, Qt, Qt);
+      if (NILP (result))
+	return Qnil;
+      table = XCDR (val);
+    }
+  if (NILP (table))
+    return Qnil;
+  val = XCHAR_TABLE (table)->extras[1];
+  if (INTEGERP (val)
+      && XINT (val) >= 0 && XINT (val) < uniprop_uncompressor_count)
+    return table;
+  return Qnil;
+}
+
+/* Return a Unicode character property of character C stored in TABLE.
+   TABLE must be what the function uniprop_table returns.
+
+   If XCHAR_TABLE (TABLE)->extras[4] is a vector, the returned value
+   is an index number to that vector which contains the actual
+   property value.  */
+
+Lisp_Object
+uniprop_table_lookup (int c, Lisp_Object table)
+{
+  struct Lisp_Char_Table *tbl = XCHAR_TABLE (table);
+  struct Lisp_Sub_Char_Table *subtbl;
+  Lisp_Object sub_table, val;
+
+  if (ASCII_CHAR_P (c))
+    {
+      if (SUB_CHAR_TABLE_P (tbl->ascii))
+	return XSUB_CHAR_TABLE (tbl->ascii)->contents[c];
+      /* Always unfold the bottom sub-table for ASCII chars.  */
+      tbl->ascii = sub_table = char_table_get_bottom (tbl, c);
+      subtbl = XSUB_CHAR_TABLE (sub_table);
+      val = subtbl->contents[c];
+      if (UNIPROP_COMPRESSED_FORM_P (val))
+	uniprop_uncompress[XINT (tbl->extras[1])] (val, subtbl);
+    }
+  else
+    {
+      val = char_table_ref (table, c);
+      if (! UNIPROP_COMPRESSED_FORM_P (val))
+	return val;
+      sub_table = char_table_get_bottom (tbl, c);
+      subtbl = XSUB_CHAR_TABLE (sub_table);
+      uniprop_uncompress[XINT (tbl->extras[1])] (val, subtbl);
+    }
+  return subtbl->contents[c - XINT (subtbl->min_char)];
+}
+
+DEFUN ("get-unicode-property-internal", Fget_unicode_property_internal,
+       Sget_unicode_property_internal, 2, 2, 0,
+       doc: /* Get Unicode character property of CH stored in TABLE.
+Internal use only.  */)
+  (Lisp_Object ch, Lisp_Object table)
+{
+  Lisp_Object val;
+
+  CHECK_CHARACTER (ch);
+  CHECK_CHAR_TABLE (table);
+  val = uniprop_table_lookup (XINT (ch), table);
+  if (INTEGERP (val)
+      && VECTORP (XCHAR_TABLE (table)->extras[4]))
+    {
+      Lisp_Object val_vec = XCHAR_TABLE (table)->extras[4];
+      if (XINT (val) > 0 && XINT (val) < ASIZE (val_vec))
+	val = AREF (val_vec, XINT (val) - 1);
+      else
+	val = Qnil;
+    }
+  return val;
+}
+
+\f
 void
 syms_of_chartab (void)
 {
@@ -998,4 +1221,17 @@
   defsubr (&Sset_char_table_default);
   defsubr (&Soptimize_char_table);
   defsubr (&Smap_char_table);
+  defsubr (&Sget_unicode_property_internal);
+
+  /* Each element has the form (PROP . TABLE).
+     PROP is a symbol representing a character property.
+     TABLE is a char-table containing the property value for each character.
+     TABLE may be a name of file to load to build a char-table.
+     This variable should be modified only through
+     `define-char-code-property'. */
+
+  DEFVAR_LISP ("char-code-property-alist", Vchar_code_property_alist,
+	       doc: /* Alist of character property name vs char-table containing property values.
+Internal use only.  */);
+  Vchar_code_property_alist = Qnil;
 }
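
As an illustration of the "RUN-LENGTH TABLE" format that
uniprop_table_uncompress_run_length decodes in the patch above, here
is a minimal standalone sketch in plain C.  It is not the Emacs code:
it assumes single-byte values below 128 instead of multibyte
characters, the name rl_decode is hypothetical, and it adds a bounds
check on the output index that the patch does not have.  As in the
patch, a value may be followed by a count of 128 or more, meaning
"repeat the value (count - 128) times"; otherwise the value occurs
once.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 128	/* characters per bottom-level sub-table */

/* Decode SRC (LEN bytes) into OUT, which must hold BLOCK_SIZE ints.
   Each value byte (below 128 in this sketch) may be followed by a
   count byte >= 128, meaning "repeat the value (count - 128) times".
   Without a count byte the value occurs once.
   Return the number of slots filled.  */
static size_t
rl_decode (const unsigned char *src, size_t len, int *out)
{
  size_t i = 0, idx = 0;

  assert (src != NULL && out != NULL);
  while (i < len && idx < BLOCK_SIZE)
    {
      int v = src[i++];
      int count = 1;

      /* Peek at the next byte; consume it only if it is a count.  */
      if (i < len && src[i] >= 128)
	count = src[i++] - 128;
      while (count-- > 0 && idx < BLOCK_SIZE)
	out[idx++] = v;
    }
  return idx;
}
```

For example, the five bytes 5, 131, 7, 9, 130 decode to the six
values 5 5 5 7 9 9.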




^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-17  8:15                                                                                                                   ` Kenichi Handa
@ 2011-06-17 15:12                                                                                                                     ` Eli Zaretskii
  2011-06-21  2:07                                                                                                                       ` Kenichi Handa
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-06-17 15:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

[Resending, as the original message didn't get through.]

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 17 Jun 2011 17:15:23 +0900
> 
> In article <tl7r57wo2wc.fsf@m17n.org>, handa <handa@m17n.org> writes:
> 
> > But, at first, I'll work on improving C interface to
> > uni-*.el.
> 
> Attached is the first version for that.  It provides two C
> functions (excerpt from chartab.c).

Thanks!

> 	Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
> 
>    and get a property value for character CH as this:
> 
> 	Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table);
> 
>    In this case, what you actually get is an index number to the
>    vector of property values (symbols nil, L, R, etc).  See the
>    comment of uniprop_table_lookup for the detail.  */
> 
> Eli, I arranged that bidi_class got as above is an integer
> value that is the same as enum bidi_type_t defined in
> dispextern.h.

This sounds good, but I think we should add a comment to both
dispextern.h and where you create the properties regarding the fact
that the order must be in sync if any changes are made in any of these
two places.
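
One possible way to back such a comment with a compile-time check is
sketched below.  This is only an illustration under assumptions: the
enum is a hypothetical four-member stand-in, not the real
bidi_type_t, and the string array stands in for the vector of
property values.  The check catches a difference in length only, not
a difference in order, so the comment would still be needed.

```c
#include <assert.h>

/* Hypothetical subset of bidi classes; NOT the real bidi_type_t.  */
enum demo_bidi_type
{
  DEMO_UNKNOWN = 0,
  DEMO_STRONG_L,
  DEMO_STRONG_R,
  DEMO_WEAK_EN,
  DEMO_TYPE_COUNT		/* must stay last */
};

/* Stand-in for the value vector, in the order the generator emits it.  */
static const char *const demo_bidi_names[] = { "nil", "L", "R", "EN" };

/* Fails to compile if one list grows without the other (C11).  */
_Static_assert (sizeof demo_bidi_names / sizeof demo_bidi_names[0]
		== DEMO_TYPE_COUNT,
		"enum and value vector are out of sync");

/* Map an index obtained from the table to its symbolic name.  */
static const char *
demo_bidi_name (int index)
{
  assert (index >= 0 && index < DEMO_TYPE_COUNT);
  return demo_bidi_names[index];
}
```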

> And, uniprop_table (intern ("mirroring")) gives a char-table for
> mirroring character.

And what is the entry in that table for characters that don't have a
mirrored character defined by BidiMirroring.txt?  Is it the character
itself or nil?

Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3
filesystems.  Can't we use the same uni-mirrored.el file for both
tables?
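
To make the clash concrete, here is a rough sketch in C of how two
names collapse under plain 8+3 truncation.  The helper names to_8_3
and clashes_8_3 are hypothetical, and the sketch deliberately ignores
long-name aliases of the NAME~1 kind; it only cuts the base name to 8
characters and the extension to 3.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Write the 8+3 form of NAME into OUT (at least 13 bytes): the base
   name cut to 8 characters, the extension cut to 3, upper-cased.  */
static void
to_8_3 (const char *name, char *out)
{
  const char *dot = strrchr (name, '.');
  size_t base_len = dot ? (size_t) (dot - name) : strlen (name);
  size_t n = 0, i;

  assert (out != NULL);
  for (i = 0; i < base_len && i < 8; i++)
    out[n++] = (char) toupper ((unsigned char) name[i]);
  if (dot)
    {
      out[n++] = '.';
      for (i = 1; dot[i] != '\0' && i <= 3; i++)
	out[n++] = (char) toupper ((unsigned char) dot[i]);
    }
  out[n] = '\0';
}

/* Nonzero if A and B map to the same 8+3 name.  */
static int
clashes_8_3 (const char *a, const char *b)
{
  char sa[13], sb[13];

  to_8_3 (a, sa);
  to_8_3 (b, sb);
  return strcmp (sa, sb) == 0;
}
```

Under this rule both uni-mirroring.el and uni-mirrored.el become
UNI-MIRR.EL, while uni-bidi.el and uni-name.el stay distinct.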

> Could you check if the attached patch gives sufficient
> facility to be used in the bidi code?

I will do that, thanks.  But please wait for a while, as I'd like to
finish the changes needed for string reordering first.

Thanks again for working on this.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-17 15:12                                                                                                                     ` Eli Zaretskii
@ 2011-06-21  2:07                                                                                                                       ` Kenichi Handa
  2011-06-21  2:53                                                                                                                         ` Eli Zaretskii
                                                                                                                                           ` (2 more replies)
  0 siblings, 3 replies; 182+ messages in thread
From: Kenichi Handa @ 2011-06-21  2:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83vcw4wkxy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > 	Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
> > 
> >    and get a property value for character CH as this:
> > 
> > 	Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table);

> > 
> >    In this case, what you actually get is an index number to the
> >    vector of property values (symbols nil, L, R, etc).  See the
> >    comment of uniprop_table_lookup for the detail.  */
> > 
> > Eli, I arranged that bidi_class got as above is an integer
> > value that is the same as enum bidi_type_t defined in
> > dispextern.h.

> This sounds good, but I think we should add a comment to both
> dispextern.h and where you create the properties regarding the fact
> that the order must be in sync if any changes are made in any of these
> two places.

Ok.  By the way, I have an idea for making the normal
CHAR_TABLE_REF macro usable for getting an element of such a
table, so please wait until I implement it.

> > And, uniprop_table (intern ("mirroring")) gives a char-table for
> > mirroring character.

> And what is the entry in that table for characters that don't have a
> mirrored character defined by BidiMirroring.txt?  Is it the character
> itself or nil?

It's nil.  We can make it the character itself, but that
makes the uni-mirroring.el file big.  Currently, the size is
just 3605 bytes (half of bidimirror.h).
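
A caller can recover "the character itself" cheaply from such a
sparse table, along these lines.  This is a standalone sketch, not
code from the patch: NO_MIRROR stands in for nil, the function names
are hypothetical, and the table holds only a few of the ASCII pairs
listed in BidiMirroring.txt.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MIRROR (-1)		/* stands in for nil */

/* A few pairs from BidiMirroring.txt; the real list is much longer.  */
static const int demo_pairs[][2] = {
  { '(', ')' }, { ')', '(' },
  { '<', '>' }, { '>', '<' },
  { '[', ']' }, { ']', '[' },
  { '{', '}' }, { '}', '{' },
};

/* Sparse lookup: NO_MIRROR for every character without an entry.  */
static int
demo_mirroring_lookup (int c)
{
  size_t i;

  for (i = 0; i < sizeof demo_pairs / sizeof demo_pairs[0]; i++)
    if (demo_pairs[i][0] == c)
      return demo_pairs[i][1];
  return NO_MIRROR;
}

/* What a caller does with the result: fall back to C itself.  */
static int
demo_mirrored_char (int c)
{
  int m = demo_mirroring_lookup (c);

  assert (m == NO_MIRROR || m >= 0);
  return m == NO_MIRROR ? c : m;
}
```

Storing nil keeps the table small, and the fallback costs the caller
one comparison per lookup.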

> Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3
> filesystems.  Can't we use the same uni-mirrored.el file for both
> tables?

If we agree to break backward compatibility, I'll do
that.  At the moment, at least in Emacs code,
uni-mirrored.el is not used.

---
Kenichi Handa
handa@m17n.org



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  2:07                                                                                                                       ` Kenichi Handa
@ 2011-06-21  2:53                                                                                                                         ` Eli Zaretskii
  2011-06-21  3:29                                                                                                                           ` Kenichi Handa
  2011-07-07  6:10                                                                                                                         ` C interface to Unicode character property char-tables Kenichi Handa
  2011-08-06 16:52                                                                                                                         ` Using uniprop_table_lookup (was: idn.el and confusables.txt) Eli Zaretskii
  2 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-06-21  2:53 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 11:07:02 +0900
> 
> By the way, I have an idea for making the normal
> CHAR_TABLE_REF macro usable for getting an element of such a
> table, so please wait until I implement it.

Thanks, will do.

> > And what is the entry in that table for characters that don't have a
> > mirrored character defined by BidiMirroring.txt?  Is it the character
> > itself or nil?
> 
> It's nil.  We can make it the character itself, but that
> makes the uni-mirroring.el file big.  Currently, the size is
> just 3605 bytes (half of bidimirror.h).

nil is fine, it's what bidi.c expects.

> > Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3
> > filesystems.  Can't we use the same uni-mirrored.el file for both
> > tables?
> 
> If we agree to break backward compatibility, I'll do
> that.  At the moment, at least in Emacs code,
> uni-mirrored.el is not used.

I'm not sure what backward compatibility this would break.  Adding
another table to a file can hardly break anything, can it?



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  2:53                                                                                                                         ` Eli Zaretskii
@ 2011-06-21  3:29                                                                                                                           ` Kenichi Handa
  2011-06-21  6:11                                                                                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Kenichi Handa @ 2011-06-21  3:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83zklbvqrd.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > Also, uni-mirroring.el will clash with uni-mirrored.el on 8+3
> > > filesystems.  Can't we use the same uni-mirrored.el file for both
> > > tables?
> > 
> > If we agree to break backward compatibility, I'll do
> > that.  At the moment, at least in Emacs code,
> > uni-mirrored.el is not used.

> I'm not sure what backward compatibility this would break.  Adding
> another table to a file can hardly break anything, can it?

Ah, I misunderstood what you meant.  Yes, it's possible to
add one more define-char-code-property call in
uni-mirrored.el.  One disadvantage is that even when you need
only the mirroring property, the mirrored property is also
loaded.

---
Kenichi Handa
handa@m17n.org



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  3:29                                                                                                                           ` Kenichi Handa
@ 2011-06-21  6:11                                                                                                                             ` Eli Zaretskii
  2011-06-21  7:22                                                                                                                               ` Kenichi Handa
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-06-21  6:11 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 12:29:21 +0900
> 
> Yes, it's possible to add one more define-char-code-property call in
> uni-mirrored.el.  One disadvantage is that even when you need only
> the mirroring property, the mirrored property is also loaded.

That should be fine, because both tables are very small.

Btw, this begs a question: should we preload uni-bidi.el and
uni-mirrored.el in Emacs 24?  If we don't, it will be loaded right
away on startup, I think, since bidi.c will need it.  Stefan, Chong,
WDYT?



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  6:11                                                                                                                             ` Eli Zaretskii
@ 2011-06-21  7:22                                                                                                                               ` Kenichi Handa
  2011-06-21  7:34                                                                                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Kenichi Handa @ 2011-06-21  7:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <E1QYuB6-0004u1-8i@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: emacs-devel@gnu.org
> > Date: Tue, 21 Jun 2011 12:29:21 +0900
> > 
> > Yes, it's possible to add one more define-char-code-property call in
> > uni-mirrored.el.  One disadvantage is that even when you need only
> > the mirroring property, the mirrored property is also loaded.

> That should be fine, because both tables are very small.

> Btw, this begs a question: should we preload uni-bidi.el and
> uni-mirrored.el in Emacs 24?  If we don't, it will be loaded right
> away on startup, I think, since bidi.c will need it.  Stefan, Chong,
> WDYT?

Isn't it possible to load it where we surely know that the
table is necessary?  For instance, in bidi_init_it.

---
Kenichi Handa
handa@m17n.org



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  7:22                                                                                                                               ` Kenichi Handa
@ 2011-06-21  7:34                                                                                                                                 ` Eli Zaretskii
  2011-06-21  8:02                                                                                                                                   ` Kenichi Handa
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-06-21  7:34 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 16:22:23 +0900
> 
> > Btw, this begs a question: should we preload uni-bidi.el and
> > uni-mirrored.el in Emacs 24?  If we don't, it will be loaded right
> > away on startup, I think, since bidi.c will need it.  Stefan, Chong,
> > WDYT?
> 
> Isn't it possible to load it where we surely know that the
> table is necessary?  For instance, in bidi_init_it.

Yes, of course.  But when bidi-display-reordering is turned on by
default, bidi_init will _always_ load these 2 files at startup,
whenever it needs to display the first thing on the Emacs frame (mode
line or menu/tool bar, I think).  AFAIK, it has been our policy since
before I remember to preload every Lisp file that is always loaded at
startup.



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: idn.el and confusables.txt
  2011-06-21  7:34                                                                                                                                 ` Eli Zaretskii
@ 2011-06-21  8:02                                                                                                                                   ` Kenichi Handa
  2011-06-21 10:30                                                                                                                                     ` bidi at startup (was: idn.el and confusables.txt) Eli Zaretskii
  0 siblings, 1 reply; 182+ messages in thread
From: Kenichi Handa @ 2011-06-21  8:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <E1QYvTe-0002I4-79@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Isn't it possible to load it where we surely know that the
> > table is necessary?  For instance, in bidi_init_it.

> Yes, of course.  But when bidi-display-reordering is turned on by
> default, bidi_init will _always_ load these 2 files at startup,
> whenever it needs to display the first thing on the Emacs frame (mode
> line or menu/tool bar, I think).  AFAIK, it has been our policy since
> before I remember to preload every Lisp file that is always loaded at
> startup.

Is it used when we run emacs with -batch?

---
Kenichi Handa
handa@m17n.org



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: bidi at startup (was: idn.el and confusables.txt)
  2011-06-21  8:02                                                                                                                                   ` Kenichi Handa
@ 2011-06-21 10:30                                                                                                                                     ` Eli Zaretskii
  2011-06-21 15:12                                                                                                                                       ` bidi at startup Stefan Monnier
  0 siblings, 1 reply; 182+ messages in thread
From: Eli Zaretskii @ 2011-06-21 10:30 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 17:02:46 +0900
> 
> > Yes, of course.  But when bidi-display-reordering is turned on by
> > default, bidi_init will _always_ load these 2 files at startup,
> > whenever it needs to display the first thing on the Emacs frame (mode
> > line or menu/tool bar, I think).  AFAIK, it has been our policy since
> > before I remember to preload every Lisp file that is always loaded at
> > startup.
> 
> Is it used when we run emacs with -batch?

That's an excellent question, one that I meant to raise a bit later,
but since we are talking...

I think in -batch mode Emacs should not reorder bidirectional text.
Mainly because if the underlying terminal is bidi-aware, it will
reorder it by itself, and if it isn't aware, most probably it doesn't
support the corresponding scripts anyway, and scripts that need
reordering will come out garbled.

So I think bidi-display-reordering should be set early during startup,
but only in interactive sessions, i.e. dynamically.

Comments?



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: bidi at startup
From: Stefan Monnier @ 2011-06-21 15:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

>> > Yes, of course.  But when bidi-display-reordering is turned on by
>> > default, bidi_init will _always_ load these 2 files at startup,
>> > whenever it needs to display the first thing on the Emacs frame (mode
>> > line or menu/tool bar, I think).  AFAIK, it has been our policy since
>> > before I remember to preload every Lisp file that is always loaded at
>> > startup.

Yes, if the file is needed when you just run "emacs" with an empty
.emacs file, then it definitely should be preloaded.

>> Is it used when we run emacs with -batch?

That doesn't matter for preloading.

> I think in -batch mode Emacs should not reorder bidirectional text.
> Mainly because if the underlying terminal is bidi-aware, it will
> reorder it by itself, and if it isn't aware, most probably it doesn't
> support the corresponding scripts anyway, and scripts that need
> reordering will come out garbled.

> So I think bidi-display-reordering should be set early during startup,
> but only in interactive sessions, i.e. dynamically.

As mentioned a few days ago in another thread, thinking in terms of
"interactive/noninteractive" session is the wrong way to look at it.
Instead, you want to think about which terminal is used.  E.g. think of
the case of an Emacs server that did not daemonize: some terminals will
be GUIs, others will be ttys, and there'll be one more which is just
stdin/stdout.  IIUC you're saying that the special initial terminal
(bound to stdin/stdout) should not do bidi processing, which sounds fine.


        Stefan




* Re: bidi at startup
From: Eli Zaretskii @ 2011-06-21 17:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, handa

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@m17n.org>,  emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 11:12:12 -0400
> 
> >> Is it used when we run emacs with -batch?
> 
> That doesn't matter for preloading.

I think it does, because temacs is normally run with -batch.

> > I think in -batch mode Emacs should not reorder bidirectional text.
> > Mainly because if the underlying terminal is bidi-aware, it will
> > reorder it by itself, and if it isn't aware, most probably it doesn't
> > support the corresponding scripts anyway, and scripts that need
> > reordering will come out garbled.
> 
> > So I think bidi-display-reordering should be set early during startup,
> > but only in interactive sessions, i.e. dynamically.
> 
> As mentioned a few days ago in another thread, thinking in terms of
> "interactive/noninteractive" session is the wrong way to look at it.

Maybe in general, but not in this case, see below.

> Instead, you want to think about which terminal is used.  E.g. think of
> the case of an Emacs server that did not daemonize: some terminals will
> be GUIs, others will be ttys

The device where Emacs displays is of no importance, as long as the
display engine supports it, because the bidi reordering works on all
of them.

> and there'll be one more which is just
> stdin/stdout.  IIUC you're saying that the special initial terminal
> (bound to stdin/stdout) should not do bidi processing, which sounds fine.

No.  stdout is not a display device in this context; a text terminal
is.  When Emacs writes to stdout from `message' and the like, the
result doesn't trigger redisplay, and thus doesn't invoke the
reordering.  Reordering happens when the result of `message' is
inserted into some buffer, like the minibuffer, because that triggers
redisplay, which reorders.

What I was saying was that "emacs -batch" does not need reordering
services, because text we write to stdout should not be reordered.
Therefore, I think bidi-display-reordering should be initialized to
nil in xdisp.c, then set to t during startup if Emacs is invoked
interactively or as a daemon.  Do you agree?
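
For illustration only, the proposal can be modeled in a few lines of
standalone C.  This is a sketch under stated assumptions, not code from
xdisp.c: the flag and the function init_bidi_for_session are
hypothetical stand-ins, and "interactive" is taken to mean "not
-batch".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Lisp variable
   bidi-display-reordering.  It starts out false, mirroring
   "initialized to nil in xdisp.c".  */
bool bidi_display_reordering = false;

/* Hypothetical startup step: turn reordering on only when the
   session is interactive (not -batch) or runs as a daemon.  */
void
init_bidi_for_session (bool batch, bool daemon)
{
  bidi_display_reordering = daemon || !batch;
}
```

With this model a plain -batch run leaves the flag off, while an
interactive session or a daemon turns it on.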


* Re: bidi at startup
From: Stefan Monnier @ 2011-06-22 15:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

>> >> Is it used when we run emacs with -batch?
>> That doesn't matter for preloading.
> I think it does, because temacs is normally run with -batch.

I don't see how that's relevant.

> No.  stdout is not a display device in this context; a text terminal
> is.  When Emacs writes to stdout from `message' and the like, the
> result doesn't trigger redisplay, and thus doesn't invoke the
> reordering.

Good.

> What I was saying was that "emacs -batch" does not need reordering
> services, because text we write to stdout should not be reordered.
> Therefore, I think bidi-display-reordering should be initialized to
> nil in xdisp.c, then set to t during startup if Emacs is invoked
> interactively or as a daemon.  Do you agree?

No.  I still don't see the harm in setting it to t all the time.
From what you're saying, bidi reordering only happens during redisplay, and redisplay
does not happen in --batch mode, so the setting of
bidi-display-reordering is irrelevant in such a situation and might as
well be the same as in other cases.


        Stefan




* C interface to Unicode character property char-tables.
From: Kenichi Handa @ 2011-07-07  6:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, emacs-devel

In article <tl7aadcj5tl.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> Ok.  By the way, as I got an idea of making the normal
> CHAR_TABLE_REF macro usable for getting an element of such
> table, please wait until I implement it.

I've just committed that change to the trunk.

---
Kenichi Handa
handa@m17n.org




* Re: Using uniprop_table_lookup (was: idn.el and confusables.txt)
From: Eli Zaretskii @ 2011-08-06 16:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 21 Jun 2011 11:07:02 +0900
> 
> In article <83vcw4wkxy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > > 	Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
> > > 
> > >    and get a property value for character CH as this:
> > > 
> > > 	Lisp_Object bidi_class = uniprop_table_lookup (CH, bidi_class_table);
> 
> > > 
> > >    In this case, what you actually get is an index number to the
> > >    vector of property values (symbols nil, L, R, etc).  See the
> > >    comment of uniprop_table_lookup for the detail.  */
> > > 
> > > Eli, I arranged that bidi_class got as above is an integer
> > > value that is the same as enum bidi_type_t defined in
> > > dispextern.h.
> 
> > This sounds good, but I think we should add a comment to both
> > dispextern.h and where you create the properties regarding the fact
> > that the order must be in sync if any changes are made in any of these
> > two places.
> 
> Ok.  By the way, as I got an idea of making the normal
> CHAR_TABLE_REF macro usable for getting an element of such
> table, please wait until I implement it.

I'm getting ready to use this feature instead of src/biditype.h.  I
understand that CHAR_TABLE_REF is usable for this job already, and the
updated way of using uniprop_table in bidi.c is this:

    Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
    ...
    Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table);

If so, do I need to staticpro the bidi_class_table variable above?  It
will be initialized only once, at startup time, and then used every
time bidi.c needs to look up the bidirectional properties of a
character.

Thanks.




* Re: Using uniprop_table_lookup (was: idn.el and confusables.txt)
From: Kenichi Handa @ 2011-08-09  0:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <838vr6wkh4.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> I'm getting ready to use this feature instead of src/biditype.h.  I
> understand that CHAR_TABLE_REF is usable for this job already, and the
> updated way of using uniprop_table in bidi.c is this:

>     Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
>     ...
>     Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table);

> If so, do I need to staticpro the bidi_class_table variable above?

No.  All tables returned by uniprop_table are stored in
Vchar_code_property_alist, which is DEFVAR_LISPed in
chartab.c.  This means that the next call to uniprop_table
with the same PROP argument just returns the already loaded
table, and is thus fast.

---
Kenichi Handa
handa@m17n.org




* Re: Using uniprop_table_lookup
From: Stefan Monnier @ 2011-08-09  1:32 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, emacs-devel

>> I'm getting ready to use this feature instead of src/biditype.h.  I
>> understand that CHAR_TABLE_REF is usable for this job already, and the
>> updated way of using uniprop_table in bidi.c is this:

>> Lisp_Object bidi_class_table = uniprop_table (intern ("bidi-class"));
>> ...
>> Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table);

>> If so, do I need to staticpro the bidi_class_table variable above?

> No.  All tables returned by uniprop_table are stored in
> Vchar_code_property_alist, which is DEFVAR_LISPed in
> chartab.c.  This means that the next call to uniprop_table
> with the same PROP argument just returns the already loaded
> table, and is thus fast.

But since Eli says that he initializes bidi_class_table only once and
never touches it again, if he doesn't staticpro that variable and
someone changes Vchar_code_property_alist, you could end up
with bidi_class_table pointing to a dead object.

There's no point wasting time trying to figure out whether a global
Lisp_Object should be staticpro'd or not: just staticpro it and
move on.  In 99% of the cases it's the only safe choice and the
remaining 1% is not worth the trouble worrying about.
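
As a rough illustration of why the registration matters, here is a toy
mark-and-sweep model in standalone C.  It is a sketch only: staticpro
is the real Emacs function, but struct object, toy_staticpro and
toy_gc are hypothetical names invented for this example, and the toy
covers just the situation described above, where no other live
reference to the table remains.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROOTS 8

/* Toy heap object: just a mark bit and a liveness flag.  */
struct object { bool marked; bool live; };

/* Addresses of the global variables the collector treats as roots.
   This plays the role of the list that staticpro fills.  */
static struct object **roots[MAX_ROOTS];
static size_t n_roots;

/* Register the global variable VAR as a GC root.  */
void
toy_staticpro (struct object **var)
{
  if (n_roots < MAX_ROOTS)
    roots[n_roots++] = var;
}

/* Mark every object referenced from a registered root, then flag
   every unmarked object in HEAP as dead.  */
void
toy_gc (struct object *heap, size_t n)
{
  for (size_t i = 0; i < n; i++)
    heap[i].marked = false;
  for (size_t i = 0; i < n_roots; i++)
    if (*roots[i])
      (*roots[i])->marked = true;
  for (size_t i = 0; i < n; i++)
    if (!heap[i].marked)
      heap[i].live = false;
}
```

In this model a global registered with toy_staticpro keeps its object
alive across toy_gc, while an unregistered global is left pointing to
an object that has been flagged dead.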


        Stefan




* Re: Using uniprop_table_lookup
From: Kenichi Handa @ 2011-08-09  4:31 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eliz, emacs-devel

In article <jwvty9rfk32.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > No.  All tables returned by uniprop_table are stored in
> > Vchar_code_property_alist, which is DEFVAR_LISPed in
> > chartab.c.  This means that the next call to uniprop_table
> > with the same PROP argument just returns the already loaded
> > table, and is thus fast.

> But since Eli says that he initializes bidi_class_table only once and
> never touches it again, if he doesn't staticpro that variable and
> someone changes Vchar_code_property_alist, you could end up
> with bidi_class_table pointing to a dead object.

Ah, yes, you are right.  I misunderstood.

> There's no point wasting time trying to figure out whether a global
> Lisp_Object should be staticpro'd or not: just staticpro it and
> move on.  In 99% of the cases it's the only safe choice and the
> remaining 1% is not worth the trouble worrying about.

---
Kenichi Handa
handa@m17n.org




* Re: Using uniprop_table_lookup
From: Eli Zaretskii @ 2011-08-15  8:57 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: eliz@gnu.org, emacs-devel@gnu.org
> Date: Tue, 09 Aug 2011 13:31:00 +0900
> 
> In article <jwvty9rfk32.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > > No.  All tables returned by uniprop_table are stored in
> > > Vchar_code_property_alist, which is DEFVAR_LISPed in
> > > chartab.c.  This means that the next call to uniprop_table
> > > with the same PROP argument just returns the already loaded
> > > table, and is thus fast.
> 
> > But since Eli says that he initializes bidi_class_table only once and
> > never touches it again, if he doesn't staticpro that variable and
> > someone changes Vchar_code_property_alist, you could end up
> > with bidi_class_table pointing to a dead object.
> 
> Ah, yes, you are right.  I misunderstood.

I have another question.  The commentary in chartab.c says:

   and get a property value for character CH as this:

       Lisp_Object bidi_class = CHAR_TABLE_REF (CH, bidi_class_table);

   In this case, what you actually get is an index number to the
   vector of property values (symbols nil, L, R, etc).

What does "in this case" mean?  Is this something specific to the
"bidi-class" property?  If yes, what will the call to CHAR_TABLE_REF
yield in the general case?  It is essential to document that, for the
C code to be able to use any Unicode character property through this
API.
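
To make the "index number" wording concrete, here is a toy model in
standalone C of a lookup that yields an index into a vector of
property values, kept in the same order as an enum.  It is only a
sketch: every name in it is hypothetical, the enum is a small made-up
subset rather than the real bidi_type_t, and the character ranges are
a tiny sample chosen for the example.

```c
/* Hypothetical subset of bidi classes.  The real enum is
   bidi_type_t in dispextern.h and has more members.  */
enum toy_bidi_type { TOY_UNKNOWN, TOY_L, TOY_R, TOY_N_TYPES };

/* Vector of property values, in the same order as the enum, so an
   index from the table can be used directly as an enum value.  */
const char *const toy_bidi_names[TOY_N_TYPES] = { "nil", "L", "R" };

/* Toy table lookup: returns an index, not the symbol itself.  */
int
toy_bidi_class_index (int ch)
{
  if (ch >= 0x05D0 && ch <= 0x05EA)   /* Hebrew letters */
    return TOY_R;
  if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
    return TOY_L;
  return TOY_UNKNOWN;                 /* everything else, in this toy */
}

/* Decode the index back to the property value's name.  */
const char *
toy_bidi_class_name (int ch)
{
  return toy_bidi_names[toy_bidi_class_index (ch)];
}
```

The point of the model is that the caller gets an integer usable as
an enum value, and needs a separate decoding step to get back to the
symbol, which is why the two orderings must stay in sync.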

The commentary also says:

   o The second extra slot is a Lisp function, an index (integer) to
   the array uniprop_decoder[], or nil.  If it is a Lisp function, we
   can't use such a table from C (at the moment).  If it is nil, it
   means that we don't have to decode values.

   o The third extra slot is a Lisp function, an index (integer) to
   the array uniprop_encoder[], or nil.  If it is a Lisp function, we
   can't use such a table from C (at the moment).  If it is nil, it
   means that we don't have to encode values.

This doesn't say anything about the indices into uniprop_decoder[] and
uniprop_encoder[] arrays, which I believe are values that need to be
handled by the C code.  Can you say something about that?  I would
like to improve this commentary with the additional information.

TIA


2011-06-08 15:22                                                                                                         ` Stefan Monnier
2011-04-16 16:00                                             ` face for non-ASCII characters Drew Adams
2010-03-23 19:40                         ` Florian Beck
2010-03-23 14:35                       ` Translation of http status code to text Miles Bader
2010-03-23 14:22                     ` highlighting non-ASCII characters (was: Translation of http status code to text) Ted Zlatanov
2010-03-23 16:50                       ` highlighting non-ASCII characters (was: Translation of http statuscode " Drew Adams
2010-03-23 21:49                         ` highlighting non-ASCII characters Stefan Monnier
2010-03-23 21:53                           ` Drew Adams
2010-03-24  0:45                             ` Stefan Monnier
2010-03-24  1:03                               ` Ted Zlatanov
2010-03-24  2:47                                 ` Stefan Monnier
2010-03-24  4:20                                   ` Eli Zaretskii
2010-03-24  5:14                                     ` Jason Rumney
2010-03-24 13:25                                       ` Stefan Monnier
2010-03-24 15:06                                         ` Jason Rumney
2010-03-24 19:47                                           ` Ted Zlatanov
2010-03-24 10:05                                   ` Ted Zlatanov
2010-03-24 16:21                                     ` Lennart Borgman
2010-03-24 19:34                                       ` Lennart Borgman
2010-03-26 17:35                                         ` Ted Zlatanov
2010-03-26 20:43                                           ` Ted Zlatanov
2010-03-26 22:50                                             ` Lennart Borgman
2010-03-29 18:38                                               ` Ted Zlatanov
2010-03-29 18:48                                                 ` Drew Adams
2010-03-29 20:20                                                   ` Stefan Monnier
2010-03-29 20:19                                                 ` Stefan Monnier
2010-03-29 20:51                                                   ` Lennart Borgman
2010-03-30 13:22                                                     ` Ted Zlatanov
2010-03-29 21:05                                                   ` Ted Zlatanov
2010-03-29 21:31                                                     ` Lennart Borgman
2010-03-29 21:32                                                     ` Drew Adams
2010-03-30 13:15                                                       ` Ted Zlatanov
2010-03-30 14:04                                                         ` Drew Adams
2010-03-30 14:17                                                           ` Lennart Borgman
2010-03-30 14:42                                                           ` Ted Zlatanov
2010-03-30 16:18                                                         ` Juri Linkov
2010-03-30  1:45                                                     ` Stefan Monnier
2010-03-25  7:11                                       ` Juri Linkov
2010-03-25 14:07                                         ` Lennart Borgman
2010-03-25 17:32                                           ` Juri Linkov
2010-03-26  0:32                                             ` Lennart Borgman
2010-03-26 13:38                                               ` Stephen Berman
2010-03-26 22:44                                                 ` Lennart Borgman
2010-03-25  7:12                                     ` Juri Linkov
2010-03-24  2:09                               ` Drew Adams
2010-03-24  5:00                               ` Stephen J. Turnbull
2010-03-24  9:28                               ` Juri Linkov
2010-03-24 13:15                                 ` Ted Zlatanov
2010-03-24  9:27                       ` Juri Linkov
2010-03-22 18:41           ` Translation of http status code to text Stefan Monnier
2010-03-22 19:15             ` Ted Zlatanov
2010-03-23  9:54               ` Juri Linkov
2010-03-23 10:54                 ` joakim
2010-03-23 15:02                 ` Ted Zlatanov
2010-03-24  3:22                   ` Stefan Monnier
2010-03-24 17:35                   ` Glenn Morris
2010-03-24 19:37                     ` Ted Zlatanov
2010-03-25  1:16                       ` Ted Zlatanov
2010-03-23 12:57               ` Stefan Monnier
