From: rm@tuxteam.de
Newsgroups: gmane.emacs.devel
Subject: Re: Converting a string to valid XHTML id?
Date: Wed, 1 Dec 2010 16:58:58 +0100
Message-ID: <20101201155858.GB12842@seid-online.de>
References: <40291.130.55.118.19.1291217640.squirrel@webmail.lanl.gov>
In-Reply-To: <40291.130.55.118.19.1291217640.squirrel@webmail.lanl.gov>
To: Davis Herring
Cc: Ralf Mattes, Lennart Borgman, emacs-devel@gnu.org

On Wed, Dec 01, 2010 at 07:34:00AM -0800, Davis Herring wrote:
> > (let ((old (assoc id org-newhtml-escaped-ids))
>
> Wouldn't it be easier to do something like percent encoding?  Map
> everything that isn't [-.a-zA-Z0-9] onto _HH.  Multibyte characters could
> be handled by writing their UTF-8 encoding, or else by escaping as _nHH...
> where n is the number of hex digits needed (itself always a single digit):

That sounds tempting but is wrong :-/
Percent-encoding doesn't produce valid ID values. From the HTML 4 spec:

  6.2 SGML basic types
  ....
  ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
  followed by any number of letters, digits ([0-9]), hyphens ("-"),
  underscores ("_"), colons (":"), and periods (".").

An escaped ID can still start with a digit, a hyphen, a period or (when
the first character itself gets escaped) an underscore, and none of
those is allowed as the first character.

Cheers.

Ralf Mattes
>
> ;; Uses Emacs' internal encoding instead of UTF-8 proper.
> (defun org-newhtml-escape-id (str)
>   "Return a valid xhtml id attribute string.
> See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
>   (replace-regexp-in-string
>    "[^-.a-zA-Z0-9]" (lambda (c)
>                       (mapconcat (lambda (d) (format "_%02x" d))
>                                  (string-as-unibyte c) "")) str))
>
> Certainly someone could already have an id "foo_5fbar", but the
> table-based implementation already makes the assumption that all IDs will
> be generated by it.
>
> Davis
>
> --
> This product is sold by volume, not by mass.
> If it appears too dense or too sparse, it is because mass-energy
> conversion has occurred during shipping.
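A minimal sketch, assuming the _HH escaping quoted above is otherwise kept, of one way to satisfy the leading-letter rule from the HTML 4 spec. The function name `my-html4-id` is hypothetical (it is not an Org or Emacs function), it encodes characters as UTF-8 via `encode-coding-string` instead of Emacs' internal encoding, and the fixed prefix "x" is an arbitrary choice; any ASCII letter would do.

```elisp
;; Hypothetical helper (not an Org or Emacs function): apply the _HH
;; escaping quoted above, using UTF-8 bytes, and always prepend a
;; letter so the result begins with [A-Za-z] as HTML 4 requires.
(defun my-html4-id (str)
  "Return STR converted to a string usable as an HTML 4 ID.
Each character outside [-.a-zA-Z0-9] is replaced by _HH for every
byte of its UTF-8 encoding, and \"x\" is prepended so that the
result always begins with a letter."
  (concat "x"
          (replace-regexp-in-string
           "[^-.a-zA-Z0-9]"
           (lambda (c)
             ;; C is the one-character match; escape each UTF-8 byte.
             (mapconcat (lambda (byte) (format "_%02x" byte))
                        (encode-coding-string c 'utf-8) ""))
           str t t)))

;; Usage:
;; (my-html4-id "foo bar")     ; => "xfoo_20bar"
;; (my-html4-id "1st.section") ; => "x1st.section"
```

The prefix is added unconditionally rather than only when the escaped string fails to start with a letter: a conditional prefix would map both "1a" and "x1a" to "x1a". As with the table-based version, a hand-written ID could still collide with a generated one.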