unofficial mirror of emacs-devel@gnu.org 
* Converting a string to valid XHTML id?
@ 2010-11-29  1:43 Lennart Borgman
  2010-11-29 18:08 ` Andreas Schwab
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-11-29  1:43 UTC (permalink / raw)
  To: Emacs-Devel devel

Do we have a function somewhere for converting a string to a valid XHTML id:

  id: Specifies a unique id for an element.
  Naming rules:

  Must begin with a letter (A-Z or a-z)
  Can be followed by: letters (A-Za-z), digits (0-9), hyphens ("-"),
  underscores ("_"), colons (":"), and periods (".")
  Values are case-sensitive




* Re: Converting a string to valid XHTML id?
  2010-11-29  1:43 Converting a string to valid XHTML id? Lennart Borgman
@ 2010-11-29 18:08 ` Andreas Schwab
  2010-11-29 18:18   ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Andreas Schwab @ 2010-11-29 18:08 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emacs-Devel devel

Lennart Borgman <lennart.borgman@gmail.com> writes:

> Do we have a function somewhere for converting a string to a valid XHTML id:

(replace-regexp-in-string "\\`[^A-Za-z]" "X"
   (replace-regexp-in-string "[^A-Za-z0-9_:.-]" "_" string))
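
For reference, a minimal sketch of how that expression behaves when
wrapped in a function.  The name `my-xhtml-id-from-string' is
hypothetical and not part of Emacs; the body is the expression above,
unchanged.

```elisp
;; Hypothetical wrapper; the thread only gives the bare expression.
(defun my-xhtml-id-from-string (string)
  "Return STRING with characters that are invalid in an XHTML id replaced."
  (replace-regexp-in-string
   "\\`[^A-Za-z]" "X"                   ; a leading non-letter becomes "X"
   (replace-regexp-in-string "[^A-Za-z0-9_:.-]" "_" string)))

(my-xhtml-id-from-string "fig:5")       ; => "fig:5" (already valid)
(my-xhtml-id-from-string "56")          ; => "X6"
(my-xhtml-id-from-string "hello world") ; => "hello_world"
```

Note that the first character is replaced rather than prefixed, so
"56" and "66" both become "X6": the result is syntactically valid but
not necessarily unique.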

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Converting a string to valid XHTML id?
  2010-11-29 18:08 ` Andreas Schwab
@ 2010-11-29 18:18   ` Lennart Borgman
  2010-11-29 18:33     ` Deniz Dogan
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-11-29 18:18 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emacs-Devel devel

On Mon, Nov 29, 2010 at 7:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> Do we have a function somewhere for converting a string to a valid XHTML id:
>
> (replace-regexp-in-string "\\`[^A-Za-z]" "X"
>   (replace-regexp-in-string "[^A-Za-z0-9_:.-]" "_" string))

Thanks, that seems good.

Could we please add something like that in an appropriate library in Emacs?




* Re: Converting a string to valid XHTML id?
  2010-11-29 18:18   ` Lennart Borgman
@ 2010-11-29 18:33     ` Deniz Dogan
  2010-11-29 18:39       ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Deniz Dogan @ 2010-11-29 18:33 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Andreas Schwab, Emacs-Devel devel

2010/11/29 Lennart Borgman <lennart.borgman@gmail.com>:
> On Mon, Nov 29, 2010 at 7:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>
>>> Do we have a function somewhere for converting a string to a valid XHTML id:
>>
>> (replace-regexp-in-string "\\`[^A-Za-z]" "X"
>>   (replace-regexp-in-string "[^A-Za-z0-9_:.-]" "_" string))
>
> Thanks, that seems good.
>
> Could we please add something like that in an appropriate library in Emacs?
>
>

What is this for? Just curious.

-- 
Deniz Dogan




* Re: Converting a string to valid XHTML id?
  2010-11-29 18:33     ` Deniz Dogan
@ 2010-11-29 18:39       ` Lennart Borgman
  2010-11-30 14:50         ` Ralf Mattes
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-11-29 18:39 UTC (permalink / raw)
  To: Deniz Dogan; +Cc: Andreas Schwab, Emacs-Devel devel

On Mon, Nov 29, 2010 at 7:33 PM, Deniz Dogan <deniz.a.m.dogan@gmail.com> wrote:
> 2010/11/29 Lennart Borgman <lennart.borgman@gmail.com>:
>> On Mon, Nov 29, 2010 at 7:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>>
>>>> Do we have a function somewhere for converting a string to a valid XHTML id:
>>>
>>> (replace-regexp-in-string "\\`[^A-Za-z]" "X"
>>>   (replace-regexp-in-string "[^A-Za-z0-9_:.-]" "_" string))
>>
>> Thanks, that seems good.
>>
>> Could we please add something like that in an appropriate library in Emacs?
>>
>>
>
> What is this for? Just curious.

I need something like this for exporting org-mode to html (Jambunathan
and I are rewriting the export routines to cover export to odt too).

BTW, I came up with this for the moment:


;; (org-newhtml-escape-id "fig:5")
;; (org-newhtml-escape-id "56")
(defun org-newhtml-escape-id (id)
  "Return a valid id string.
See URL http://www.w3schools.com/tags/att_standard_id.asp"
  (setq id (replace-regexp-in-string "\\`\\([^A-Za-z]\\)" "ANON-\\1" id nil))
  (setq id (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" id t)))




* Re: Converting a string to valid XHTML id?
  2010-11-29 18:39       ` Lennart Borgman
@ 2010-11-30 14:50         ` Ralf Mattes
  2010-12-01 14:53           ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Ralf Mattes @ 2010-11-30 14:50 UTC (permalink / raw)
  To: emacs-devel

On Mon, 29 Nov 2010 19:39:17 +0100, Lennart Borgman wrote:

> On Mon, Nov 29, 2010 at 7:33 PM, Deniz Dogan <deniz.a.m.dogan@gmail.com>
> wrote:
>> ...
>> What is this for? Just curious.
> 
> I need something like this for exporting org-mode to html (Jambunathan
> and I are rewriting the export routines to cover export to odt too).
> 
> BTW, I came up with this for the moment:
> 
> 
> ;; (org-newhtml-escape-id "fig:5")
> ;; (org-newhtml-escape-id "56")
> (defun org-newhtml-escape-id (id)
>   "Return a valid id string.
> See URL http://www.w3schools.com/tags/att_standard_id.asp"
>   (setq id (replace-regexp-in-string "\\`\\([^A-Za-z]\\)" "ANON-\\1" id nil))
>   (setq id (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" id t)))


But this is wrong - it'll possibly generate invalid html. 
Consider the following:

 (org-newhtml-escape-id "this is cool!")

⇒ "this-is-cool-"
 
 (org-newhtml-escape-id "this is cool?")
 
⇒ "this-is-cool-"

collapsing two different strings to the same ID, resulting in
invalid html.
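
The collision can be checked mechanically.  The sketch below assumes
two made-up helpers: `my-escape-id-v1', a copy of the first version
quoted above under a different name, and `my-escape-id-collisions',
which reports distinct inputs that end up with the same id.

```elisp
;; Copy of the first version from this thread, renamed for the sketch.
(defun my-escape-id-v1 (id)
  "Return ID with invalid id characters replaced by \"-\"."
  (setq id (replace-regexp-in-string "\\`\\([^A-Za-z]\\)" "ANON-\\1" id nil))
  (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" id t))

;; Hypothetical helper, not part of Emacs or org-mode.
(defun my-escape-id-collisions (escape-fn strings)
  "Return pairs (A . B) of distinct STRINGS that ESCAPE-FN maps to one id."
  (let ((seen (make-hash-table :test 'equal))
        collisions)
    (dolist (s strings (nreverse collisions))
      (let* ((id (funcall escape-fn s))
             (prev (gethash id seen)))
        (cond ((null prev) (puthash id s seen)) ; first use of this id
              ((not (equal prev s))             ; same id, different input
               (push (cons prev s) collisions)))))))

(my-escape-id-collisions #'my-escape-id-v1
                         '("this is cool!" "this is cool?" "fig:5"))
;; => (("this is cool!" . "this is cool?"))
```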

 Cheers, Ralf Mattes





* Re: Converting a string to valid XHTML id?
  2010-11-30 14:50         ` Ralf Mattes
@ 2010-12-01 14:53           ` Lennart Borgman
  2010-12-01 15:34             ` Davis Herring
  2010-12-01 15:51             ` Stefan Monnier
  0 siblings, 2 replies; 25+ messages in thread
From: Lennart Borgman @ 2010-12-01 14:53 UTC (permalink / raw)
  To: Ralf Mattes; +Cc: emacs-devel

On Tue, Nov 30, 2010 at 3:50 PM, Ralf Mattes <rm@seid-online.de> wrote:
>
> But this is wrong - it'll possibly generate invalid html.
> Consider the following:
>
>  (org-newhtml-escape-id "this is cool!")
>
> ⇒ "this-is-cool-"
>
>  (org-newhtml-escape-id "this is cool?")
>
> ⇒ "this-is-cool-"
>
> collapsing two different strings to the same ID, resulting in
> invalid html.

Thanks Ralf, I thought it was a bit too much to handle, but here is a
new version that tries to handle this. (You might perhaps sometimes
want to set org-newhtml-escaped-ids to nil.)


(defvar org-newhtml-escaped-ids nil)
(make-variable-buffer-local 'org-newhtml-escaped-ids)

(defun org-newhtml-escape-id (id)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'.

Try to make this unique.  Note that this cannot be done unless we
know all used ids since the resulting string might be an already
used id."
  (let ((old (assoc id org-newhtml-escaped-ids))
        new-id)
    (if old
        (cdr old)
      (setq new-id (replace-regexp-in-string "\\`\\([^A-Za-z]\\)"
                                             "ANON-\\1" id nil))
      (setq new-id (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" new-id t))
      (setq old t)
      (while old
        (setq old (rassoc new-id org-newhtml-escaped-ids))
        (when old
          (setq new-id (concat new-id "X"))))
      (push (cons id new-id) org-newhtml-escaped-ids)
      new-id)))




* Re: Converting a string to valid XHTML id?
  2010-12-01 14:53           ` Lennart Borgman
@ 2010-12-01 15:34             ` Davis Herring
  2010-12-01 15:58               ` rm
  2010-12-01 15:51             ` Stefan Monnier
  1 sibling, 1 reply; 25+ messages in thread
From: Davis Herring @ 2010-12-01 15:34 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ralf Mattes, emacs-devel

>   (let ((old (assoc id org-newhtml-escaped-ids))

Wouldn't it be easier to do something like percent encoding?  Map
everything that isn't [-.a-zA-Z0-9] onto _HH.  Multibyte characters could
be handled by writing their UTF-8 encoding, or else by escaping as _nHH...
where n is the number of hex digits needed (itself always a single digit):

;; Uses Emacs' internal encoding instead of UTF-8 proper.
(defun org-newhtml-escape-id (str)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]" (lambda (c)
                      (mapconcat (lambda (d) (format "_%02x" d))
                                 (string-as-unibyte c) "")) str))

Certainly someone could already have an id "foo_5fbar", but the
table-based implementation already makes the assumption that all IDs will
be generated by it.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-01 14:53           ` Lennart Borgman
  2010-12-01 15:34             ` Davis Herring
@ 2010-12-01 15:51             ` Stefan Monnier
  2010-12-01 19:51               ` Lennart Borgman
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2010-12-01 15:51 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Ralf Mattes, emacs-devel

>> collapsing two different strings to the same ID, resulting in
>> invalid html.

I have no idea what those ids are for, but wouldn't a cryptographic hash
work as well?


        Stefan




* Re: Converting a string to valid XHTML id?
  2010-12-01 15:34             ` Davis Herring
@ 2010-12-01 15:58               ` rm
  2010-12-01 22:32                 ` Davis Herring
  0 siblings, 1 reply; 25+ messages in thread
From: rm @ 2010-12-01 15:58 UTC (permalink / raw)
  To: Davis Herring; +Cc: Ralf Mattes, Lennart Borgman, emacs-devel

On Wed, Dec 01, 2010 at 07:34:00AM -0800, Davis Herring wrote:
> >   (let ((old (assoc id org-newhtml-escaped-ids))
> 
> Wouldn't it be easier to do something like percent encoding?  Map
> everything that isn't [-.a-zA-Z0-9] onto _HH.  Multibyte characters could
> be handled by writing their UTF-8 encoding, or else by escaping as _nHH...
> where n is the number of hex digits needed (itself always a single digit):


That sounds tempting but is wrong :-/ Percent-encoding doesn't produce
valid  ID values. From the html 4 specs:

 6.2 SGML basic types

  ....

 ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
 followed by any number of letters, digits ([0-9]), hyphens ("-"),
 underscores ("_"), colons (":"), and periods (".").


Cheers. Ralf Mattes

> 
> ;; Uses Emacs' internal encoding instead of UTF-8 proper.
> (defun org-newhtml-escape-id (str)
>   "Return a valid xhtml id attribute string.
> See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
>   (replace-regexp-in-string
>    "[^-.a-zA-Z0-9]" (lambda (c)
>                       (mapconcat (lambda (d) (format "_%02x" d))
>                                  (string-as-unibyte c) "")) str))
> 
> Certainly someone could already have an id "foo_5fbar", but the
> table-based implementation already makes the assumption that all IDs will
> be generated by it.
> 
> Davis
> 
> -- 
> This product is sold by volume, not by mass.  If it appears too dense or
> too sparse, it is because mass-energy conversion has occurred during
> shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-01 15:51             ` Stefan Monnier
@ 2010-12-01 19:51               ` Lennart Borgman
  2010-12-02  2:37                 ` Kevin Rodgers
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-01 19:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Ralf Mattes, emacs-devel

On Wed, Dec 1, 2010 at 4:51 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>> collapsing two different strings to the same ID, resulting in
>>> invalid html.
>
> I have no idea what those ids are for, but wouldn't a cryptographic hash
> work as well?

It is just the value of the id attribute, for example like this:

   <span id="...">




* Re: Converting a string to valid XHTML id?
  2010-12-01 15:58               ` rm
@ 2010-12-01 22:32                 ` Davis Herring
  2010-12-01 23:12                   ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Davis Herring @ 2010-12-01 22:32 UTC (permalink / raw)
  To: rm; +Cc: Ralf Mattes, Lennart Borgman, emacs-devel

> That sounds tempting but is wrong :-/ Percent-encoding doesn't produce
> valid  ID values. From the html 4 specs:
>
>  6.2 SGML basic types
>
>   ....
>
>  ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
>  followed by any number of letters, digits ([0-9]), hyphens ("-"),
>  underscores ("_"), colons (":"), and periods (".").

If you're referring to the leading letter, you're right -- I forgot about
it.  Easy enough to fix: also use Lennart's "ANON-" prefix when the string
begins with a non-letter or with the string "ANON-".

Or is there something more fundamental that I'm missing?
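
The leading-letter fix described above can be sketched as follows.
This is an illustration under assumptions, not code posted in the
thread: the name `my-xhtml-escape-id' is made up, and the bytes come
from `encode-coding-string' with UTF-8 rather than `string-as-unibyte',
which newer Emacs versions mark as obsolete.

```elisp
;; Sketch only; function name and UTF-8 byte escaping are assumptions.
(defun my-xhtml-escape-id (str)
  "Return STR escaped into a valid XHTML id.
Characters outside [-.a-zA-Z0-9] become _HH for each UTF-8 byte.
\"ANON-\" is prepended when the result does not start with a letter
or already starts with \"ANON-\", so the prefix stays unambiguous."
  (let* ((case-fold-search nil)         ; keep the ranges strictly ASCII
         (escaped
          (replace-regexp-in-string
           "[^-.a-zA-Z0-9]"
           (lambda (c)
             (mapconcat (lambda (byte) (format "_%02x" byte))
                        (encode-coding-string c 'utf-8)
                        ""))
           str t t)))                   ; FIXEDCASE, LITERAL: insert verbatim
    (if (or (not (string-match-p "\\`[A-Za-z]" escaped))
            (string-prefix-p "ANON-" escaped))
        (concat "ANON-" escaped)
      escaped)))

(my-xhtml-escape-id "this is cool!")    ; => "this_20is_20cool_21"
(my-xhtml-escape-id "this is cool?")    ; => "this_20is_20cool_3f"
(my-xhtml-escape-id "56")               ; => "ANON-56"
(my-xhtml-escape-id "ANON-56")          ; => "ANON-ANON-56"
```

Prefixing inputs that already start with "ANON-" is what keeps "56"
and "ANON-56" from mapping to the same id.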

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-01 22:32                 ` Davis Herring
@ 2010-12-01 23:12                   ` Lennart Borgman
  2010-12-01 23:16                     ` Davis Herring
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-01 23:12 UTC (permalink / raw)
  To: herring; +Cc: rm, Ralf Mattes, emacs-devel

On Wed, Dec 1, 2010 at 11:32 PM, Davis Herring <herring@lanl.gov> wrote:
>> That sounds tempting but is wrong :-/ Percent-encoding doesn't produce
>> valid  ID values. From the html 4 specs:
>>
>>  6.2 SGML basic types
>>
>>   ....
>>
>>  ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
>>  followed by any number of letters, digits ([0-9]), hyphens ("-"),
>>  underscores ("_"), colons (":"), and periods (".").
>
> If you're referring to the leading letter, you're right -- I forgot about
> it.  Easy enough to fix: also use Lennart's "ANON-" prefix when the string
> begins with a non-letter or with the string "ANON-".
>
> Or is there something more fundamental that I'm missing?

Yes, % is not allowed. And the names should be unique.




* Re: Converting a string to valid XHTML id?
  2010-12-01 23:12                   ` Lennart Borgman
@ 2010-12-01 23:16                     ` Davis Herring
  2010-12-01 23:31                       ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Davis Herring @ 2010-12-01 23:16 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: rm, Ralf Mattes, emacs-devel

>> Or is there something more fundamental that I'm missing?
>
> Yes, % is not allowed. And the names should be unique.

That's why I used _ instead of %.  And my function is injective.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-01 23:16                     ` Davis Herring
@ 2010-12-01 23:31                       ` Lennart Borgman
  2010-12-02  0:12                         ` Davis Herring
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-01 23:31 UTC (permalink / raw)
  To: herring; +Cc: rm, Ralf Mattes, emacs-devel

On Thu, Dec 2, 2010 at 12:16 AM, Davis Herring <herring@lanl.gov> wrote:
>>> Or is there something more fundamental that I'm missing?
>>
>> Yes, % is not allowed. And the names should be unique.
>
> That's why I used _ instead of %.

Sorry, I missed that.

> And my function is injective.

Do you mean that it always maps the same original input string to the
same unique output string? Then it could be used. However I thought it
would be easier reading the result with my simpler mapping.




* Re: Converting a string to valid XHTML id?
  2010-12-01 23:31                       ` Lennart Borgman
@ 2010-12-02  0:12                         ` Davis Herring
  2010-12-02  0:44                           ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Davis Herring @ 2010-12-02  0:12 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: rm, Ralf Mattes, emacs-devel

>> And my function is injective.
>
> Do you mean that it always maps the same original input string to the
> same unique output string?

Yes -- injective gives you the "unique".  The "same" is because it's pure.

> However I thought it would be easier reading the result with my
> simpler mapping.

Sure, unless this runs too many times.

>>>     (when old
>>>       (setq new-id (concat new-id "X"))))

I don't know what's really easier; I just think that having it be
stateless is a good thing.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-02  0:12                         ` Davis Herring
@ 2010-12-02  0:44                           ` Lennart Borgman
  2010-12-02  1:18                             ` Davis Herring
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-02  0:44 UTC (permalink / raw)
  To: herring; +Cc: rm, Ralf Mattes, emacs-devel

On Thu, Dec 2, 2010 at 1:12 AM, Davis Herring <herring@lanl.gov> wrote:
>>> And my function is injective.
>>
>> Do you mean that it always maps the same original input string to the
>> same unique output string?
>
> Yes -- injective gives you the "unique".  The "same" is because it's pure.
>
>> However I thought it would be easier reading the result with my
>> simpler mapping.
>
> Sure, unless this runs too many times.
>
>>>>     (when old
>>>>       (setq new-id (concat new-id "X"))))
>
> I don't know what's really easier; I just think that having it be
> stateless is a good thing.

I don't understand how your version is supposed to work. How can it be
unique if it does not keep track of if the id is already used? (Or did
you say that this case is not covered?)




* Re: Converting a string to valid XHTML id?
  2010-12-02  0:44                           ` Lennart Borgman
@ 2010-12-02  1:18                             ` Davis Herring
  2010-12-02  1:51                               ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Davis Herring @ 2010-12-02  1:18 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: rm, Ralf Mattes, emacs-devel

> I don't understand how your version is supposed to work. How can it be
> unique if it does not keep track of if the id is already used? (Or did
> you say that this case is not covered?)

The requirement is that unique inputs map to unique outputs, yes?  What I
wrote does that, by making the string longer when it contains characters
that can't be used directly.  It's a standard thing: map the strings in
A^n onto B^(n+e), where B is a smaller alphabet than A and e is the extra
length required because each letter conveys less information.  (In
particular, it must be that |A|^n<=|B|^(n+e) for any such injective
coding.)  Like base64 or uuencode or quoted-printable.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




* Re: Converting a string to valid XHTML id?
  2010-12-02  1:18                             ` Davis Herring
@ 2010-12-02  1:51                               ` Lennart Borgman
  0 siblings, 0 replies; 25+ messages in thread
From: Lennart Borgman @ 2010-12-02  1:51 UTC (permalink / raw)
  To: herring; +Cc: rm, Ralf Mattes, emacs-devel

On Thu, Dec 2, 2010 at 2:18 AM, Davis Herring <herring@lanl.gov> wrote:
>> I don't understand how your version is supposed to work. How can it be
>> unique if it does not keep track of if the id is already used? (Or did
>> you say that this case is not covered?)
>
> The requirement is that unique inputs map to unique outputs, yes?  What I
> wrote does that, by making the string longer when it contains characters
> that can't be used directly.  It's a standard thing: map the strings in
> A^n onto B^(n+e), where B is a smaller alphabet than A and e is the extra
> length required because each letter conveys less information.  (In
> particular, it must be that |A|^n<=|B|^(n+e) for any such injective
> coding.)  Like base64 or uuencode or quoted-printable.

Yes, but I can't see that you cover the case where the converted id
is already used. Or do you do that?




* Re: Converting a string to valid XHTML id?
  2010-12-01 19:51               ` Lennart Borgman
@ 2010-12-02  2:37                 ` Kevin Rodgers
  2010-12-02  2:54                   ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Kevin Rodgers @ 2010-12-02  2:37 UTC (permalink / raw)
  To: emacs-devel

On 12/1/10 12:51 PM, Lennart Borgman wrote:
> On Wed, Dec 1, 2010 at 4:51 PM, Stefan Monnier<monnier@iro.umontreal.ca>  wrote:
>>>> collapsing two different strings to the same ID, resulting in
>>>> invalid html.
>>
>> I have no idea what those ids are for, but wouldn't a cryptographic hash
>> work as well?
>
> It is just the value of the id attribute, for example like this:
>
>     <span id="...">

Is the point of your function to create a syntactically valid XHTML id from a
string that is assumed to be unique within the context of the current document,
or is it to generate a syntactically valid, unique XHTML id every time it is
called (even when called multiple times with the same string)?

-- 
Kevin Rodgers
Denver, Colorado, USA





* Re: Converting a string to valid XHTML id?
  2010-12-02  2:37                 ` Kevin Rodgers
@ 2010-12-02  2:54                   ` Lennart Borgman
  2010-12-02  4:42                     ` PJ Weisberg
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-02  2:54 UTC (permalink / raw)
  To: Kevin Rodgers; +Cc: emacs-devel

On Thu, Dec 2, 2010 at 3:37 AM, Kevin Rodgers <kevin.d.rodgers@gmail.com> wrote:
> On 12/1/10 12:51 PM, Lennart Borgman wrote:
>>
>> On Wed, Dec 1, 2010 at 4:51 PM, Stefan Monnier<monnier@iro.umontreal.ca>
>>  wrote:
>>>>>
>>>>> collapsing two different strings to the same ID, resulting in
>>>>> invalid html.
>>>
>>> I have no idea what those ids are for, but wouldn't a cryptographic hash
>>> work as well?
>>
>> It is just the value of the id attribute, for example like this:
>>
>>    <span id="...">
>
> Is the point of your function to create a syntactically valid XHTML id from
> a
> string that is assumed to be unique within the context of the current
> document,
> or is it to generate a syntactically valid, unique XHTML id every time it is
> called (even when called multiple times with the same string)?

Currently unique within the current document.

In the context where it is used it is for export of org-mode files to
xhtml. Obviously if there are links to anchors within other files my
approach will fail.

So, hm, maybe I should reset this variable when starting a directory
tree export or a single file export rather than making it buffer
local. (But then I have to look into the export of directory trees in
org-mode which I have not done yet.)




* Re: Converting a string to valid XHTML id?
  2010-12-02  2:54                   ` Lennart Borgman
@ 2010-12-02  4:42                     ` PJ Weisberg
  2010-12-02 12:26                       ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: PJ Weisberg @ 2010-12-02  4:42 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Kevin Rodgers, emacs-devel

On 12/1/10, Lennart Borgman <lennart.borgman@gmail.com> wrote:
> On Thu, Dec 2, 2010 at 3:37 AM, Kevin Rodgers <kevin.d.rodgers@gmail.com>
> wrote:
>> Is the point of your function to create a syntactically valid XHTML id
>> from
>> a
>> string that is assumed to be unique within the context of the current
>> document,
>> or is it to generate a syntactically valid, unique XHTML id every time it
>> is
>> called (even when called multiple times with the same string)?
>
> Currently unique within the current document.
>
> In the context where it is used it is for export of org-mode files to
> xhtml. Obviously if there are links to anchors within other files my
> approach will fail.
>
> So, hm, maybe I should reset this variable when starting a directory
> tree export or a single file export rather than making it buffer
> local. (But then I have to look into the export of directory trees in
> org-mode which I have not done yet.)
>
>

Just to be sure we're on the same page: the string MUST be unique
within the output, but it may NOT be unique within the input?
Therefore calling the function twice with the same argument must give
different results?

-PJ




* Re: Converting a string to valid XHTML id?
  2010-12-02  4:42                     ` PJ Weisberg
@ 2010-12-02 12:26                       ` Lennart Borgman
  2010-12-02 15:50                         ` Lawrence Mitchell
  0 siblings, 1 reply; 25+ messages in thread
From: Lennart Borgman @ 2010-12-02 12:26 UTC (permalink / raw)
  To: PJ Weisberg; +Cc: Kevin Rodgers, emacs-devel

On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg <pj@irregularexpressions.net> wrote:
>>
>> In the context where it is used it is for export of org-mode files to
>> xhtml. Obviously if there are links to anchors within other files my
>> approach will fail.
>>
>> So, hm, maybe I should reset this variable when starting a directory
>> tree export or a single file export rather than making it buffer
>> local. (But then I have to look into the export of directory trees in
>> org-mode which I have not done yet.)
>>
>>
>
> Just to be sure we're on the same page: the string MUST be unique
> within the output, but it may NOT be unique within the input?
> Therefore calling the function twice with the same argument must give
> different results?

No, I think they are already unique enough so to say in org-mode.
Otherwise the links within org-mode could not work.

So calling the function with the same argument must give the same
result every time. (AND that result must be unique, i.e. no other input
string should give the same result.)




* Re: Converting a string to valid XHTML id?
  2010-12-02 12:26                       ` Lennart Borgman
@ 2010-12-02 15:50                         ` Lawrence Mitchell
  2010-12-02 17:47                           ` Lennart Borgman
  0 siblings, 1 reply; 25+ messages in thread
From: Lawrence Mitchell @ 2010-12-02 15:50 UTC (permalink / raw)
  To: emacs-devel

Lennart Borgman wrote:
> On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg <pj@irregularexpressions.net> wrote:

>>> In the context where it is used it is for export of org-mode files to
>>> xhtml. Obviously if there are links to anchors within other files my
>>> approach will fail.

>>> So, hm, maybe I should reset this variable when starting a directory
>>> tree export or a single file export rather than making it buffer
>>> local. (But then I have to look into the export of directory trees in
>>> org-mode which I have not done yet.)



>> Just to be sure we're on the same page: the string MUST be unique
>> within the output, but it may NOT be unique within the input?
>> Therefore calling the function twice with the same argument must give
>> different results?

> No, I think they are already unique enough so to say in org-mode.
> Otherwise the links within org-mode could not work.

> So calling the function with the same argument must give the same
> result every time. (AND that result must be unique, i.e. no other input
> string should give the same result.)

As suggested previously, just take a crypto hash of the id.

(defun org-newhtml-escape-id (id)
   (format "ANON-%s" (sha1 id)))

As long as you do this for /all/ ids in the buffer, that'll work
fine.

If you only do it to invalid ids, then there's the possibility
that an existing ID in the buffer will have the form ANON-sha1sum
and a different invalid id will be escaped to ANON-sha1sum.
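
A variant of the hashing idea for newer Emacs versions, as a sketch:
it assumes the built-in `secure-hash' is available, and it encodes the
input as UTF-8 first so that the bytes being hashed are well defined.
The name `my-xhtml-hash-id' is made up for illustration.

```elisp
;; Sketch only; `my-xhtml-hash-id' is a hypothetical name.
(defun my-xhtml-hash-id (id)
  "Return a valid XHTML id derived from the SHA-1 hash of ID.
The \"ANON-\" prefix guarantees a leading letter; the rest is the
40-digit lowercase hex digest of the UTF-8 bytes of ID."
  (concat "ANON-"
          (secure-hash 'sha1 (encode-coding-string id 'utf-8))))

(my-xhtml-hash-id "abc")
;; => "ANON-a9993e364706816aba3e25717850c26c9cd0d89d"
```

The result is stateless and always 45 characters long, at the cost of
being unreadable and impossible to map back to the original string.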

Or use Davis' solution which works in a similar way, and as a
bonus you can map back to the original id easily.

Recall his solution:

(defun org-newhtml-escape-id (str)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]" (lambda (c)
                      (mapconcat (lambda (d) (format "_%02x" d))
                                 (string-as-unibyte c) "")) str))

Notice that the output uses "_" which is a /valid/ char in an
xhtml id.  However, it is not considered valid in an input
string.

So (org-newhtml-escape-id "foo_5fbar") => foo_5f5fbar
But (org-newhtml-escape-id "foo_bar") => foo_5fbar

So notice that valid ids /without/ an underscore in them are left
as is, but ids with an underscore are encoded under this scheme,
so you can't generate a collision.
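
The mapping back can be sketched too.  Both names below are made up,
and the encoder is a variant of Davis' function that escapes UTF-8
bytes (an assumption of this sketch, not what his code does).  Because
"_" never appears unescaped in the output and every escape is exactly
three characters long, decoding is unambiguous.

```elisp
;; Sketch only; both function names are hypothetical.
(defun my-id-encode (str)
  "Escape each character of STR outside [-.a-zA-Z0-9] as _HH per UTF-8 byte."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "[^-.a-zA-Z0-9]"
     (lambda (c)
       (mapconcat (lambda (byte) (format "_%02x" byte))
                  (encode-coding-string c 'utf-8) ""))
     str t t)))

(defun my-id-decode (id)
  "Invert `my-id-encode'.  ID must be a string produced by that function."
  (let ((bytes nil) (i 0) (len (length id)))
    (while (< i len)
      (if (eq (aref id i) ?_)
          (progn                        ; "_HH": read two hex digits as a byte
            (push (string-to-number (substring id (1+ i) (+ i 3)) 16) bytes)
            (setq i (+ i 3)))
        (push (aref id i) bytes)        ; plain ASCII character, copied as is
        (setq i (1+ i))))
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) 'utf-8)))

(my-id-encode "foo_5fbar")                ; => "foo_5f5fbar"
(my-id-decode "foo_5f5fbar")              ; => "foo_5fbar"
(my-id-decode (my-id-encode "foo_bar"))   ; => "foo_bar"
```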

Lawrence

-- 
Lawrence Mitchell <wence@gmx.li>





* Re: Converting a string to valid XHTML id?
  2010-12-02 15:50                         ` Lawrence Mitchell
@ 2010-12-02 17:47                           ` Lennart Borgman
  0 siblings, 0 replies; 25+ messages in thread
From: Lennart Borgman @ 2010-12-02 17:47 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-devel

On Thu, Dec 2, 2010 at 4:50 PM, Lawrence Mitchell <wence@gmx.li> wrote:
>
> Or use Davis' solution which works in a similar way, and as a
> bonus you can map back to the original id easily.
>
> Recall his solution:
>
> (defun org-newhtml-escape-id (str)
>  "Return a valid xhtml id attribute string.
> See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
>  (replace-regexp-in-string
>   "[^-.a-zA-Z0-9]" (lambda (c)
>                      (mapconcat (lambda (d) (format "_%02x" d))
>                                 (string-as-unibyte c) "")) str))
>
> Notice that the output uses "_" which is a /valid/ char in an
> xhtml id.  However, it is not considered valid in an input
> string.
>
> So (org-newhtml-escape-id "foo_5fbar") => foo_5f5fbar
> But (org-newhtml-escape-id "foo_bar") => foo_5fbar
>
> So notice that valid ids /without/ an underscore in them are left
> as is, but ids with an underscore are encoded under this scheme,
> so you can't generate a collision.

Ah, thanks, now I understand. I missed that detail.


