`css-extract-keyword-list' does not actually [PATCH]

In function `css-extract-keyword-list' the search for "Appendix H. Index"
fails. For example, this form:

  (search-backward "Appendix H. Index")

when used to search the contents of this URL:

  "http://www.w3.org/TR/REC-CSS2/css2.txt"

which is dated:

  W3C Candidate Recommendation 08 September 2009

returns this message:

  css-extract-keyword-list: Search failed: "Appendix H. Index"

It appears this function was originally supplied to scrape CSS keywords,
as per the commented-out code in lisp/textmodes/css-mode.el:

,----
| (css-extract-keyword-list
|  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
|    (at . "^ +\\* @\\([^ \n,]+\\)")
|    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
|    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
|    (property . "^ +\\* '\\([^ '\n]+\\)',")))
`----

However, W3C has gone behind Stefan's back and changed the Appendix
enumeration without asking his permission first :) "Appendix H" is now
"Appendix I".

Compare the version that was (presumably) scraped:

  (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/indexlist.html')
  (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt')

with the current version:

  (URL `http://www.w3.org/TR/CSS2/indexlist.html')
  (URL `http://www.w3.org/TR/CSS2/css2.txt')

The following regexp may be more robust. It appears to work for either
the older or the latest version, and leaves room for W3C to keep adding
appendices J-M:

  (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\\. Index")

That said, `css-extract-keyword-list' is now borking on the regexps in
these conses:

  (css-extract-keyword-list
   '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
     (at . "^ +\\* @\\([^ \n,]+\\)")
     (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
     (media . "^ +\\* '\\([^ '\n]+\\)' media group")
     (property . "^ +\\* '\\([^ '\n]+\\)',")))

It seems to fail because `url-insert-file-contents' relies on
`decode-coding-inserted-region', which frobs the asterisks `*' (char
#x2a) into bullets `•' (char #x2022), at least on my system. If we
substitute occurrences of "\\*" with "[*•]" (i.e. "[\x2a\x2022]"), the
following regexps seem to work correctly:

  (pp (css-extract-keyword-list
       '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
         (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
         (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
         (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
         (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))
      (current-buffer))

The following is diffed against Bazaar revision 100231:

;;; ==============================
*** ediff3753M5g	2010-06-03 09:43:04.000000000 -0400
--- lisp/textmodes/css-mode.el	2010-06-03 09:42:43.000000000 -0400
***************
*** 41,49 ****

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/REC-CSS2/css2.txt")
      (goto-char (point-max))
!     (search-backward "Appendix H. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
--- 41,49 ----

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt")
      (goto-char (point-max))
!     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\\. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
***************
*** 115,125 ****

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
! ;;    (at . "^ +\\* @\\([^ \n,]+\\)")
! ;;    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
! ;;    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
! ;;    (property . "^ +\\* '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
--- 115,125 ----

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;  '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
! ;;    (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
! ;;    (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
! ;;    (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
! ;;    (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
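
For a quick offline check of the two regexps discussed above, without
fetching css2.txt, something like the following may help. It is only a
rough sketch: the sample lines are invented stand-ins for lines of
css2.txt (not copied from the W3C text), the `my-css-*' names are
hypothetical helpers that are not part of css-mode.el, and it assumes
the file is read as UTF-8 so the literal `━' comes through intact.

```elisp
;; Offline sanity check for the index-heading regexp and the
;; bullet-tolerant property regexp.  The sample strings and all
;; `my-css-*' names are hypothetical, for illustration only.

(defconst my-css-appendix-index-re
  "[_━]\\{60,79\\}\n[[:space:]]+Appendix [A-M]\\. Index"
  "Essentially the index-heading regexp proposed above.")

(defconst my-css-property-re
  "^ +[\x2a\x2022] '\\([^ '\n]+\\)',"
  "Property regexp accepting `*' (#x2a) or `•' (#x2022) as the bullet.")

(defun my-css-sample-heading (rule-char letter)
  "Return 70 RULE-CHARs, a newline, and an \"Appendix LETTER. Index\" line."
  (concat (make-string 70 rule-char) "\n   Appendix " letter ". Index"))

(defun my-css-property-name (line)
  "Return the property name captured from LINE, or nil if it does not match."
  (when (string-match my-css-property-re line)
    (match-string 1 line)))

(defun my-css-check-samples ()
  "Run both regexps against the hypothetical sample lines."
  (list
   ;; Underscore rule followed by an "Appendix H" heading.
   (and (string-match my-css-appendix-index-re
                      (my-css-sample-heading ?_ "H"))
        t)
   ;; Box-drawing rule followed by an "Appendix I" heading.
   (and (string-match my-css-appendix-index-re
                      (my-css-sample-heading ?━ "I"))
        t)
   ;; Property line with a plain asterisk as the bullet.
   (my-css-property-name "   * 'azimuth', 'border-spacing'")
   ;; The same line after the asterisk has been decoded into U+2022.
   (my-css-property-name "   \x2022 'azimuth', 'border-spacing'")))

(my-css-check-samples)
```

Evaluating the last form should return (t t "azimuth" "azimuth"): the
heading regexp matches with either rule character, and the property
regexp captures the name whether the bullet is an asterisk or a
decoded bullet.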