From: Lennart Borgman
Newsgroups: gmane.emacs.devel
Subject: Re: Converting a string to valid XHTML id?
Date: Wed, 1 Dec 2010 15:53:46 +0100
To: Ralf Mattes
Cc: emacs-devel@gnu.org

On Tue, Nov 30, 2010 at 3:50 PM, Ralf Mattes wrote:
>
> But this is wrong - it'll possibly generate invalid html.
> Consider the following:
>
>  (org-newhtml-escape-id "this is cool!")
>
>  ⇒ "this-is-cool-"
>
>  (org-newhtml-escape-id "this is cool?")
>
>  ⇒ "this-is-cool-"
>
> collapsing two different strings to the same ID, resulting in
> invalid html.

Thanks Ralf, I thought it was a bit too much to handle, but here is a
new version that tries to handle this. (You might sometimes want to
reset org-newhtml-escaped-ids to nil.)

(defvar org-newhtml-escaped-ids nil)
(make-variable-buffer-local 'org-newhtml-escaped-ids)

(defun org-newhtml-escape-id (id)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'.

Try to make this unique. Note that this cannot be done unless we
know all used ids since the resulting string might be an already
used id."
  (let ((old (assoc id org-newhtml-escaped-ids))
        new-id)
    (if old
        (cdr old)
      (setq new-id (replace-regexp-in-string "\\`\\([^A-Za-z]\\)" "ANON-\\1" id nil))
      (setq new-id (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" new-id t))
      (setq old t)
      (while old
        (setq old (rassoc new-id org-newhtml-escaped-ids))
        (when old
          (setq new-id (concat new-id "X"))))
      (push (cons id new-id) org-newhtml-escaped-ids)
      new-id)))
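
For anyone who wants to try the idea in isolation, here is a minimal
self-contained sketch of the same approach: remember each mapping, and
keep changing the candidate until it is unused. The names
my-xhtml-id-table and my-xhtml-escape-id are hypothetical and not part
of Org. Two choices are assumptions of mine rather than anything from
the code above: a numeric suffix instead of repeated "X", and an
"ANON-" prefix for the empty string as well. The same caveat applies:
ids that never pass through the function are unknown to it.

```elisp
;; Hypothetical names throughout; a sketch, not part of Org.
(defvar my-xhtml-id-table nil
  "Alist of (ORIGINAL . ESCAPED) pairs handed out so far.")

(defun my-xhtml-escape-id (id)
  "Return an escaped form of ID that is unique within `my-xhtml-id-table'."
  (or (cdr (assoc id my-xhtml-id-table))   ; already seen: reuse the result
      (let* ((case-fold-search nil)        ; match the character classes literally
             ;; An id must start with a letter, so prefix anything else
             ;; (assumption: this includes the empty string).
             (base (if (string-match "\\`[A-Za-z]" id)
                       id
                     (concat "ANON-" id)))
             ;; Replace every character outside the allowed set with "-".
             (base (replace-regexp-in-string "[^A-Za-z0-9_.-]" "-" base t t))
             (new-id base)
             (n 1))
        ;; Append a counter until no earlier result uses the candidate.
        (while (rassoc new-id my-xhtml-id-table)
          (setq n (1+ n)
                new-id (format "%s-%d" base n)))
        (push (cons id new-id) my-xhtml-id-table)
        new-id)))

;; Usage, with a fresh table for the duration of the `let':
(let ((my-xhtml-id-table nil))
  (list (my-xhtml-escape-id "this is cool!")
        (my-xhtml-escape-id "this is cool?")
        (my-xhtml-escape-id "this is cool!")
        (my-xhtml-escape-id "1st")))
;; ⇒ ("this-is-cool-" "this-is-cool--2" "this-is-cool-" "ANON-1st")
```

Binding the table with let makes it easy to start from a clean state
for each export, which serves the same purpose as resetting the
variable to nil by hand.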