From: MON KEY
Newsgroups: gmane.emacs.bugs
Subject: bug#6345: css-mode `css-extract-keyword-list' does not actually [PATCH]
Date: Thu, 3 Jun 2010 14:01:06 -0400
To: 6345@debbugs.gnu.org
X-GNU-PR-Message: report 6345
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=000e0cd379aa91175a048823fdf7

--000e0cd379aa91175a048823fdf7
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

`css-extract-keyword-list' does not actually [PATCH]

In function `css-extract-keyword-list' the search for "Appendix H. Index"
fails. For example, this form:

 (search-backward "Appendix H. Index")

when used to search the contents of this URL:

 "http://www.w3.org/TR/REC-CSS2/css2.txt"

which is dated "W3C Candidate Recommendation 08 September 2009", signals
this error:

 css-extract-keyword-list: Search failed: "Appendix H. Index"

It appears this function was originally supplied to scrape CSS keywords,
as per the commented-out code in lisp/textmodes/css-mode.el:

,----
| (css-extract-keyword-list
|  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
|    (at . "^ +\\* @\\([^ \n,]+\\)")
|    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
|    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
|    (property . "^ +\\* '\\([^ '\n]+\\)',")))
`----

However, W3C has gone behind Stefan's back and changed the appendix
enumeration without asking his permission first :) "Appendix H" is now
"Appendix I".
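
To make the failure concrete without fetching anything, here is a minimal
sketch. The buffer text is a made-up stand-in for the tail of css2.txt
(not the real file), and `my-css2-index-appendix-letter' is a
hypothetical helper, not part of css-mode.el. Passing NOERROR = t to
`search-backward' makes it return nil instead of signalling "Search
failed", and the helper reports which appendix letter the index heading
actually uses.

```elisp
;; Sketch only: the buffer text below is a made-up stand-in for the end
;; of css2.txt, and `my-css2-index-appendix-letter' is hypothetical.

(defun my-css2-index-appendix-letter ()
  "Return the letter of the last \"Appendix X. Index\" heading, or nil."
  (save-excursion
    (goto-char (point-max))
    ;; [A-Z] accepts any appendix letter; \\. matches a literal period.
    (when (re-search-backward "Appendix \\([A-Z]\\)\\. Index" nil t)
      (match-string 1))))

(with-temp-buffer
  (insert "...\nAppendix I. Index\n\n   * :active\n")
  (list
   ;; Literal search for the old heading: with NOERROR = t it returns
   ;; nil instead of signalling "Search failed".
   (save-excursion
     (goto-char (point-max))
     (search-backward "Appendix H. Index" nil t))
   ;; The heading that is really there.
   (my-css2-index-appendix-letter)))
;; => (nil "I")
```

The sketch writes `\\.' rather than `\.' on purpose: inside an Elisp
string literal the reader turns `\.' into a plain `.', which as a regexp
matches any character.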

Compare the version that was (presumably) scraped:

(URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/indexlist.html')
(URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt')

with the current version:

(URL `http://www.w3.org/TR/CSS2/indexlist.html')
(URL `http://www.w3.org/TR/CSS2/css2.txt')

The following regexp may be more robust. It appears to work for both the
older and the latest version, and it leaves room for W3C to continue
adding appendices J-M:

 (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")

That said, `css-extract-keyword-list' now fails on the regexps in these
conses:

 (css-extract-keyword-list
  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
    (at . "^ +\\* @\\([^ \n,]+\\)")
    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
    (property . "^ +\\* '\\([^ '\n]+\\)',")))

The failure seems to come from `url-insert-file-contents' relying on
`decode-coding-inserted-region', which frobs the asterisks `*' (char
#x2a) into bullets `•' (char #x2022), at least on my system.

If we substitute each occurrence of "\\*" with "[*•]" (i.e.
"[\x2a\x2022]"), the following regexps now seem to work correctly:

 (pp (css-extract-keyword-list
      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))
     (current-buffer))

The following patch is diffed against Bazaar revision 100231:

;;; ==============================
*** ediff3753M5g	2010-06-03 09:43:04.000000000 -0400
--- lisp/textmodes/css-mode.el	2010-06-03 09:42:43.000000000 -0400
***************
*** 41,49 ****
  
  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/REC-CSS2/css2.txt")
      (goto-char (point-max))
!     (search-backward "Appendix H. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
--- 41,49 ----
  
  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt")
      (goto-char (point-max))
!     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
***************
*** 115,125 ****
  
  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
! ;;    (at . "^ +\\* @\\([^ \n,]+\\)")
! ;;    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
! ;;    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
! ;;    (property . "^ +\\* '\\([^ '\n]+\\)',")))
  
  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
--- 115,125 ----
  
  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
! ;;        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
! ;;        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
! ;;        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
! ;;        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))
  
  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"

--000e0cd379aa91175a048823fdf7--