From: Lawrence Mitchell
Newsgroups: gmane.emacs.devel
Subject: Re: Converting a string to valid XHTML id?
Date: Thu, 02 Dec 2010 15:50:11 +0000
To: emacs-devel@gnu.org

Lennart Borgman wrote:
> On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg wrote:

>>> In the context where it is used it is for export of org-mode files to
>>> xhtml. Obviously if there are links to anchors within other files my
>>> approach will fail.

>>> So, hm, maybe I should reset this variable when starting a directory
>>> tree export or a single file export rather than making it buffer
>>> local. (But then I have to look into the export of directory trees in
>>> org-mode which I have not done yet.)

>> Just to be sure we're on the same page: the string MUST be unique
>> within the output, but it may NOT be unique within the input?
>> Therefore calling the function twice with the same argument must give
>> different results?

> No, I think they are already unique enough so to say in org-mode.
> Otherwise the links within org-mode could not work.

> So calling the function with the same argument must give the same
> result all times. (AND that result must be unique, i.e. no other input
> string should give the same result.)

As suggested previously, just take a crypto hash of the id.

    (defun org-newhtml-escape-id (id)
      (format "ANON-%s" (sha1 id)))

As long as you do this for /all/ ids in the buffer, that'll work
fine.  If you only do it to invalid ids, then there's the
possibility that an existing ID in the buffer will have the form
ANON-sha1sum and a different invalid id will be escaped to
ANON-sha1sum.

Or use Davis' solution which works in a similar way, and as a
bonus you can map back to the original id easily.  Recall his
solution:

    (defun org-newhtml-escape-id (str)
      "Return a valid xhtml id attribute string.
    See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
      (replace-regexp-in-string
       "[^-.a-zA-Z0-9]"
       (lambda (c)
         (mapconcat (lambda (d) (format "_%02x" d))
                    (string-as-unibyte c) ""))
       str))

Notice that the output uses "_" which is a /valid/ char in an
xhtml id.  However, it is not considered valid in an input
string.  So

    (org-newhtml-escape-id "foo_5fbar") => foo_5f5fbar

But

    (org-newhtml-escape-id "foo_bar") => foo_5fbar

So notice that valid ids /without/ an underscore in them are left
as is, but ids with an underscore are encoded under this scheme,
so you can't generate a collision.

Lawrence
-- 
Lawrence Mitchell
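
The "map back to the original id" claim above can be made concrete
with a small round-trip sketch.  This is only an illustration under
stated assumptions, not code from the thread: the names
demo-escape-id and demo-unescape-id are hypothetical, and the
encoder swaps in encode-coding-string with utf-8 for
string-as-unibyte (which I believe newer Emacs releases mark
obsolete).  For ordinary Unicode text the two should produce the
same bytes.

```elisp
;; Hypothetical helper names; a sketch of the "_hh" escaping scheme
;; discussed above, plus its inverse.

(defun demo-escape-id (str)
  "Escape every char of STR outside [-.a-zA-Z0-9] as \"_hh\" per UTF-8 byte."
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]"
   (lambda (c)
     ;; C is the matched one-character string; emit one "_hh" per byte.
     (mapconcat (lambda (byte) (format "_%02x" byte))
                (encode-coding-string c 'utf-8)
                ""))
   ;; FIXEDCASE and LITERAL are t so the replacement is inserted verbatim.
   str t t))

(defun demo-unescape-id (escaped)
  "Invert `demo-escape-id'.
Assumes ESCAPED is well formed: ASCII only, and every \"_\" is
followed by exactly two hex digits.  No validation is attempted."
  (let ((bytes '())
        (i 0)
        (len (length escaped)))
    (while (< i len)
      (let ((ch (aref escaped i)))
        (if (eq ch ?_)
            ;; "_hh": parse the two hex digits back into one byte.
            (progn
              (push (string-to-number
                     (substring escaped (1+ i) (+ i 3)) 16)
                    bytes)
              (setq i (+ i 3)))
          ;; Any other char passed through the encoder unchanged.
          (push ch bytes)
          (setq i (1+ i)))))
    ;; Reassemble the bytes and decode them as UTF-8.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes))
                          'utf-8)))
```

Evaluating (demo-unescape-id (demo-escape-id "foo_bar")) should give
back "foo_bar", and "foo_5fbar" likewise survives the round trip via
"foo_5f5fbar".  Because the encoding can be undone, two different
inputs cannot map to the same output, which is the no-collision
property argued above.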