From: Lawrence Mitchell <wence@gmx.li>
To: emacs-devel@gnu.org
Subject: Re: Converting a string to valid XHTML id?
Date: Thu, 02 Dec 2010 15:50:11 +0000
Message-ID: <pvhp62vcmax8.fsf@gmx.li>
In-Reply-To: AANLkTinXF0hm8k=KJ4SE5+fqN_pCRHQ6u_ORpOxWh78h@mail.gmail.com

Lennart Borgman wrote:
> On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg <pj@irregularexpressions.net> wrote:

>>> In the context where it is used it is for export of org-mode files to
>>> xhtml. Obviously if there are links to anchors within other files my
>>> approach will fail.

>>> So, hm, maybe I should reset this variable when starting a directory
>>> tree export or a single file export rather than making it buffer
>>> local. (But then I have to look into the export of directory trees in
>>> org-mode which I have not done yet.)



>> Just to be sure we're on the same page: the string MUST be unique
>> within the output, but it may NOT be unique within the input?
>> Therefore calling the function twice with the same argument must give
>> different results?

> No, I think they are already unique enough, so to speak, in org-mode.
> Otherwise the links within org-mode could not work.

> So calling the function with the same argument must give the same
> result every time. (AND that result must be unique, i.e. no other input
> string should give the same result.)

As suggested previously, just take a crypto hash of the id.

(defun org-newhtml-escape-id (id)
  "Return an XHTML id built from the SHA-1 hash of ID."
  (format "ANON-%s" (sha1 id)))

As long as you do this for /all/ ids in the buffer, that'll work
fine.

If you only do it to invalid ids, a collision becomes possible:
an existing valid id in the buffer may already have the form
ANON-sha1sum, and a different, invalid id may be escaped to that
same ANON-sha1sum.
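
To make that collision concrete, here is a minimal sketch.  The
names my-hash-id, my-valid-id-p and my-escape-only-invalid are
hypothetical, and the validity regexp is a simplified assumption
about what counts as a valid id (a letter followed by letters,
digits, or one of - _ : .), not something taken from org-mode.

```elisp
;;; Sketch only: all function and variable names here are hypothetical.

(defun my-hash-id (id)
  "Return ANON- followed by the SHA-1 hex digest of ID."
  (format "ANON-%s" (sha1 id)))

(defun my-valid-id-p (id)
  "Return non-nil if ID looks like a valid id.
Assumption: a letter followed by letters, digits, or one of - _ : ."
  (let ((case-fold-search nil))
    (string-match-p "\\`[A-Za-z][-A-Za-z0-9_:.]*\\'" id)))

(defun my-escape-only-invalid (id)
  "Hash ID only when it is invalid.  This is the risky variant."
  (if (my-valid-id-p id) id (my-hash-id id)))

(defvar my-invalid-id "foo bar"
  "An id that is invalid because it contains a space.")

(defvar my-clashing-id (my-hash-id my-invalid-id)
  "A valid id that happens to have the form ANON-sha1sum.")

;; Escaping only the invalid id maps both inputs to the same output:
(equal (my-escape-only-invalid my-invalid-id)
       (my-escape-only-invalid my-clashing-id))   ; => t

;; Hashing every id keeps the two apart:
(equal (my-hash-id my-invalid-id)
       (my-hash-id my-clashing-id))               ; => nil
```

The second form only differs because the already-valid id is
hashed as well, which is why the hash has to be applied to every
id in the buffer.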

Or use Davis' solution which works in a similar way, and as a
bonus you can map back to the original id easily.

Recall his solution:

(defun org-newhtml-escape-id (str)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]" (lambda (c)
                      (mapconcat (lambda (d) (format "_%02x" d))
                                 (string-as-unibyte c) "")) str))

Notice that the output uses "_", which is a /valid/ char in an
xhtml id.  However, the function does not treat it as valid in
the input string: every "_" in the input is itself escaped to
"_5f".

So (org-newhtml-escape-id "foo_5fbar") => foo_5f5fbar
But (org-newhtml-escape-id "foo_bar") => foo_5fbar

So notice that valid ids /without/ an underscore in them are left
as is, but ids with an underscore are encoded under this scheme,
so you can't generate a collision.
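
Because every "_" in the output starts a two-digit hex escape, the
scheme is reversible, which is what makes mapping back to the
original id easy.  A minimal sketch follows, assuming ids are
encoded as UTF-8.  The names my-xhtml-escape-id and
my-xhtml-unescape-id are hypothetical; the escaper uses
encode-coding-string in place of string-as-unibyte, which newer
Emacs versions mark obsolete, and binds case-fold-search to nil so
the character class is matched exactly.

```elisp
;;; Sketch only: both function names are hypothetical.

(defun my-xhtml-escape-id (str)
  "Escape every char of STR outside [-.a-zA-Z0-9] as _XX hex bytes.
Assumption: characters are encoded as UTF-8 before escaping."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "[^-.a-zA-Z0-9]"
     (lambda (c)
       ;; C is the one-character match; emit _XX for each of its bytes.
       (mapconcat (lambda (byte) (format "_%02x" byte))
                  (encode-coding-string c 'utf-8) ""))
     str t t)))

(defun my-xhtml-unescape-id (escaped)
  "Invert `my-xhtml-escape-id'.
Assumes ESCAPED is well formed: ASCII only, and every \"_\" is
followed by exactly two hex digits."
  (let ((bytes '())
        (i 0)
        (n (length escaped)))
    (while (< i n)
      (if (eq (aref escaped i) ?_)
          ;; Read the two hex digits after the underscore as one byte.
          (progn
            (push (string-to-number
                   (substring escaped (1+ i) (+ i 3)) 16)
                  bytes)
            (setq i (+ i 3)))
        ;; Any other character stands for itself.
        (push (aref escaped i) bytes)
        (setq i (1+ i))))
    (decode-coding-string
     (apply #'unibyte-string (nreverse bytes)) 'utf-8)))

(my-xhtml-escape-id "foo_bar")        ; => "foo_5fbar"
(my-xhtml-unescape-id "foo_5f5fbar")  ; => "foo_5fbar"
```

The decoder walks the string by hand rather than using a regexp,
so each escape is consumed exactly once and a decoded "_" is never
mistaken for the start of another escape.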

Lawrence

-- 
Lawrence Mitchell <wence@gmx.li>



