unofficial mirror of bug-gnu-emacs@gnu.org 
From: MON KEY <monkey@sandpframing.com>
To: 6345@debbugs.gnu.org
Subject: bug#6345: css-mode `css-extract-keyword-list' does not actually [PATCH]
Date: Thu, 3 Jun 2010 14:01:06 -0400
Message-ID: <AANLkTinVTRe4mo8xXbt-qn1FmMuyecpocECLDvCTM_n-@mail.gmail.com>

`css-extract-keyword-list' does not actually [PATCH]

In the function `css-extract-keyword-list', the search for "Appendix
H. Index" fails. Specifically, this form:

   (search-backward "Appendix H. Index")

when run against the contents of this URL:

 "http://www.w3.org/TR/REC-CSS2/css2.txt"

which is dated: W3C Candidate Recommendation 08 September 2009

It signals this error:

 css-extract-keyword-list: Search failed: "Appendix H. Index"
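
For reference, a minimal sketch of how this failure surfaces, using
stand-in text rather than the real css2.txt. The function name
`my-demo-failed-search' is hypothetical. Without the optional NOERROR
argument `search-backward' signals `search-failed'; with it, the call
returns nil instead:

```elisp
(defun my-demo-failed-search ()
  "Search stand-in text for a heading that is no longer there.
Return a list of the signalled error symbol and the NOERROR result."
  (with-temp-buffer
    ;; Stand-in data (an assumption): only the renamed heading is present.
    (insert "Appendix I. Index\n  * :active, 1\n")
    (goto-char (point-max))
    (list
     ;; Without NOERROR, a failed search signals `search-failed'.
     (condition-case err
         (search-backward "Appendix H. Index")
       (search-failed (car err)))
     ;; With NOERROR set to t, the same search just returns nil.
     (search-backward "Appendix H. Index" nil t))))

(my-demo-failed-search)
;; => (search-failed nil)
```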

It appears this function was originally supplied to scrape CSS
keywords as per the commented code in: lisp/textmodes/css-mode.el

,----
| (css-extract-keyword-list
|   '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
|     (at . "^ +\\* @\\([^ \n,]+\\)")
|     (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
|     (media . "^ +\\* '\\([^ '\n]+\\)' media group")
|     (property . "^ +\\* '\\([^ '\n]+\\)',")))
`----

However, W3C has gone behind Stefan's back and changed the Appendix
enumeration without asking his permission first :)

"Appendix H" is now "Appendix I".

Compare the version scraped (presumably):

 (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/indexlist.html')
 (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt')

with the current version:

 (URL `http://www.w3.org/TR/CSS2/indexlist.html')
 (URL `http://www.w3.org/TR/CSS2/css2.txt')

The following regexp may be more robust. It appears to work for
either the older version or the latest version, and leaves room for W3C
to continue adding appendices J-M:

 (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\\. Index")
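
A rough way to try this regexp without fetching anything is to run it
over stand-in text. In the sketch below the function name
`my-find-index-heading' and the sample strings are assumptions; the
rule line is simply 70 underscores. The newline is written as "\n"
(the same character as "\xa"), and the period is written as "\\." so
that it matches a literal period:

```elisp
(defun my-find-index-heading (text)
  "Return the buffer position where the index heading matches in TEXT.
Return nil when no heading for appendices A through M is found."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    ;; NOERROR is t so that a miss yields nil instead of an error.
    (search-backward-regexp
     "[_━]\\{60,79\\}\n[[:space:]]+Appendix [A-M]\\. Index" nil t)))

;; Stand-in documents (assumptions): a rule line, then an indented heading.
(let ((rule (make-string 70 ?_)))
  (list
   (my-find-index-heading (concat "intro\n" rule "\n   Appendix H. Index\n"))
   (my-find-index-heading (concat "intro\n" rule "\n   Appendix I. Index\n"))
   (my-find-index-heading (concat "intro\n" rule "\n   Appendix Z. Index\n"))))
```

The first two elements should be buffer positions and the third nil,
since "Z" falls outside the [A-M] range.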

This said, `css-extract-keyword-list' is now borking on regexps in
these conses:

 (css-extract-keyword-list
  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
    (at . "^ +\\* @\\([^ \n,]+\\)")
    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
    (property . "^ +\\* '\\([^ '\n]+\\)',")))

and seems to be failing because `url-insert-file-contents' relies on
`decode-coding-inserted-region', which frobs the asterisk `*' (char
#x2a) into a bullet `•' (char #x2022) -- at least on my system.

If we substitute occurrences of "\\*" with "[*•]" (i.e. "[\x2a\x2022]")
the following regexps now seem to work correctly:

 (pp (css-extract-keyword-list
      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))
     (current-buffer))
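
To see the character class at work in isolation, here is a small
sketch that applies only the `property' regexp to stand-in index
lines. The function name `my-extract-properties' and the sample text
are assumptions and do not come from css-mode.el:

```elisp
(defun my-extract-properties (text)
  "Return the property names found on index-style lines of TEXT, in order."
  (let ((re "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")
        (found nil))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Collect group 1 of every match; NOERROR ends the loop cleanly.
      (while (re-search-forward re nil t)
        (push (match-string 1) found)))
    (nreverse found)))

;; One line led by `*', one by the bullet U+2022, one by an unrelated `-'.
(my-extract-properties
 "   * 'azimuth', 1\n   \u2022 'background', 2\n   - 'ignored', 3\n")
;; => ("azimuth" "background")
```

Lines led by either character are picked up, so the same regexp should
cope whether or not the asterisks were converted on a given system.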


The following diff is against Bazaar revision 100231:

;;; ==============================

*** ediff3753M5g	2010-06-03 09:43:04.000000000 -0400
--- lisp/textmodes/css-mode.el	2010-06-03 09:42:43.000000000 -0400
***************
*** 41,49 ****

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/REC-CSS2/css2.txt")
      (goto-char (point-max))
!     (search-backward "Appendix H. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
--- 41,49 ----

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt")
      (goto-char (point-max))
!     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\\. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
***************
*** 115,125 ****

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
! ;;    (at . "^ +\\* @\\([^ \n,]+\\)")
! ;;    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
! ;;    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
! ;;    (property . "^ +\\* '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
--- 115,125 ----

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
! ;;        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
! ;;        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
! ;;        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
! ;;        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"

