* Troubles in Regular Expression Paradise
@ 2014-05-14 2:18 Len Blanks
2014-05-14 12:40 ` Nicolas Richard
0 siblings, 1 reply; 5+ messages in thread
From: Len Blanks @ 2014-05-14 2:18 UTC
To: help-gnu-emacs
I'm trying to parse an XML file containing information for the song "currently playing"
in iTunes - specifically the artist, the name of the tune and the CD it appears on.
Here's an example of the file:
<?xml version="1.0" encoding="UTF-8"?>
<now_playing playing="1" timestamp="2014-05-13T16:55:31Z">
<song timestamp="2014-05-13T16:55:30Z">
<title><![CDATA[Hold On]]></title>
<artist><![CDATA[Alabama Shakes]]></artist>
<album><![CDATA[Boys & Girls]]></album>
<genre><![CDATA[Alternative Rock]]></genre>
<kind>MPEG audio file</kind>
<track>1</track>
<numTracks>12</numTracks>
<year>2012</year>
<comments><![CDATA[Amazon.com Song ID: 228676960]]></comments>
<time>226</time>
<bitrate>249</bitrate>
<disc>1</disc>
<numDiscs>1</numDiscs>
<compilation>No</compilation>
<urlAmazon>http://www.amazon.com/Boys-Girls-Alabama-Shakes/dp/B0074MZSWW%3FSubscriptionId%3D03AKJ1J6S0FY8K0WRER2%26tag%3Dnowplaplu-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB0074MZSWW</urlAmazon>
<urlApple/>
<imageSmall>http://ecx.images-amazon.com/images/I/61P1-X1QkoL._SL75_.jpg</imageSmall>
<image>http://ecx.images-amazon.com/images/I/61P1-X1QkoL._SL160_.jpg</image>
<imageLarge>http://ecx.images-amazon.com/images/I/61P1-X1QkoL.jpg</imageLarge>
<composer><![CDATA[Alabama Shakes]]></composer>
<grouping><![CDATA[]]></grouping>
<file><![CDATA[]]></file>
<artworkID>c53ba0b14763278952c3fb1f8ea1da1f</artworkID>
</song>
</now_playing>
And here is a function I had hoped would extract the relevant fields and return a string of
the form "_artist_'s _title_ from the CD _album_", to be inserted in an X-NOW-PLAYING:
header in emails and Usenet posts:
(defun now-playing (xml-file)
  ;; (interactive "FFile: ")
  (with-temp-buffer
    (insert-file-contents xml-file)
    (goto-char 1)
    (when (re-search-forward (concat "<title><!\\[CDATA\\[\\([^\\]+\\)\\]></title>"
                                     "[\0-\377[:nonascii:]]*"
                                     "<artist><!\\[CDATA\\[\\([^\\]+\\)\\]></artist>"
                                     "[\0-\377[:nonascii:]]*"
                                     "<album><!\\[CDATA\\[\\([^\\]+\\)\\]></album>") nil t)
      (concat "\\2"
              (if (string= (downcase (substring "\\2" -2 -1)) "s") "'" "'s")
              " \\1 from the CD \\3"))))

(message (now-playing "/tmp/now_playing.xml")) ;; test now-playing
The regular expression was built and tested with re-builder, and it matches well,
including the groupings \\( ... \\), which re-builder colours quite nicely. But I seem to have done
something really foolish, since the references to \\1, \\2 and \\3 fail; they don't seem to be
set properly by the groupings in the regexp.
The function returns "\2's \1 from the CD \3".
I'm sure the problem is something foolish, but I would really like to know what I did.
--
Len
The two most common elements in the universe are hydrogen and
stupidity. -- Harlan Ellison
* Re: Troubles in Regular Expression Paradise
From: Nicolas Richard @ 2014-05-14 12:40 UTC
To: help-gnu-emacs
Len Blanks <ltb@haruspex.net> writes:
> I'm trying to parse an XML file containing information for the song "currently playing"
> in iTunes - specifically the artist, the name of the tune and the CD it appears on.
> And here is a function I had hoped would extract the relevant fields and return a string of
> the form "_artist_'s _title_ from the CD _album_", to be inserted in an X-NOW-PLAYING:
> header in emails and Usenet posts:
The regexp question has been answered, so allow me to suggest using
libxml-parse-xml-region instead of regexps for parsing XML.
First eval:
(setq yftest (libxml-parse-html-region (point-min) (point-max)))
in a buffer which holds your file.
Then you can get away with:
(caddr (assoc 'title (caddr yftest)))
(caddr (assoc 'artist (caddr yftest)))
(caddr (assoc 'album (caddr yftest)))
to get the title, artist and album respectively.
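A self-contained sketch of the same idea, for illustration only. It assumes an Emacs built with libxml2 support and uses libxml-parse-xml-region (the function named above, not the HTML variant). The names my/xml-first-text and my/now-playing-fields are hypothetical, and the helper walks the whole tree rather than relying on the position of the song element:

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my/xml-first-text (node tag)
  "Return the first text child of the first TAG element at or below NODE.
NODE is a parse tree of the form (TAG ATTRIBUTES CHILDREN...).
The name and behaviour of this helper are illustrative assumptions."
  (when (consp node)
    (if (eq (car node) tag)
        ;; Children follow the tag and the attribute alist.
        (cl-find-if #'stringp (cddr node))
      ;; Otherwise descend into each child until one yields a value.
      (cl-some (lambda (child) (my/xml-first-text child tag))
               (cddr node)))))

(defun my/now-playing-fields (xml)
  "Parse the string XML and return a plist with :artist, :title and :album."
  (unless (fboundp 'libxml-parse-xml-region)
    (error "This Emacs appears to lack libxml2 support"))
  (with-temp-buffer
    (insert xml)
    (let ((tree (libxml-parse-xml-region (point-min) (point-max))))
      (list :artist (my/xml-first-text tree 'artist)
            :title  (my/xml-first-text tree 'title)
            :album  (my/xml-first-text tree 'album)))))
```

The result can be fed to format in the same way as the values obtained with caddr above.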
As a side note, I wrote some elisp for using the kind of data that
libxml-parse-html-region spits out, so here's my way of solving your
problem with my code:
(require 'tree-html)
;; https://github.com/YoungFrog/tree-html/blob/master/tree-html.el
(defun yf/get-value-for-sole-subtree-with-given-tag (tree tag)
  "Assume there's exactly one XML element with given TAG in TREE, and return its
associated value."
  (yf/tree-html-get-value
   (yf/tree-html-get-sole-element
    (yf/tree-html-select
     tree
     (lambda (tree)
       (eq tag (yf/tree-html-get-tag tree)))))))

(format "%s's %s from the CD %s"
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'artist)
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'title)
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'album))
(Yes, I'm *that* bad at naming things.)
HTH,
--
Nico.
* Re: Troubles in Regular Expression Paradise
From: Len Blanks @ 2014-05-14 22:07 UTC
To: help-gnu-emacs
I need to do a better job researching what is available, rather than
reinventing the wheel. Thanks very much; I'll redo what I have using
libxml and your code to compare and learn.
Regards,
--
Len
Je suis Marxiste - tendance Groucho
-- Slogan used at Nanterre in Paris, 1968
* Re: Troubles in Regular Expression Paradise
From: Joost Kremers @ 2014-05-14 2:56 UTC
To: help-gnu-emacs
Len Blanks wrote:
> (defun now-playing (xml-file)
>   ;; (interactive "FFile: ")
>   (with-temp-buffer
>     (insert-file-contents xml-file)
>     (goto-char 1)
>     (when (re-search-forward (concat "<title><!\\[CDATA\\[\\([^\\]+\\)\\]></title>"
>                                      "[\0-\377[:nonascii:]]*"
>                                      "<artist><!\\[CDATA\\[\\([^\\]+\\)\\]></artist>"
>                                      "[\0-\377[:nonascii:]]*"
>                                      "<album><!\\[CDATA\\[\\([^\\]+\\)\\]></album>") nil t)
>       (concat "\\2"
>               (if (string= (downcase (substring "\\2" -2 -1)) "s") "'" "'s")
>               " \\1 from the CD \\3"))))
>
> (message (now-playing "/tmp/now_playing.xml")) ;; test now-playing
>
>
> The regular expression was built and tested with re-builder, and it matches well,
> including the groupings \\( ... \\), which re-builder colours quite nicely. But I seem to have done
> something really foolish, since the references to \\1, \\2 and \\3 fail; they don't seem to be
> set properly by the groupings in the regexp.
>
> The function returns "\2's \1 from the CD \3".
>
> I'm sure the problem is something foolish, but I would really like to know what I did.
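You can only use such substitution operators (\\1, \\2, ...) in functions that are
aware of them, such as replace-match. Normal string-handling functions like concat
and substring are not; to get the text of a group after a search, you'll need
something like match-string.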
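A minimal sketch of the match-string approach, for illustration only. The names my/cdata-field and my/now-playing-string are hypothetical. Unlike the original, it searches for each field separately instead of using one long regexp, and it takes the XML as a string rather than a file name. It also tests the last character of the artist name for the possessive, whereas (substring STRING -2 -1) picks out the second-to-last one:

```elisp
;;; -*- lexical-binding: t -*-

(defun my/cdata-field (tag)
  "Return the CDATA text of element TAG in the current buffer, or nil.
Assumes the element sits on one line, as in the posted sample."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward
           (format "<%s><!\\[CDATA\\[\\(.*?\\)]]></%s>"
                   (regexp-quote tag) (regexp-quote tag))
           nil t)
      ;; Group 1 is only meaningful right after a successful search.
      (match-string 1))))

(defun my/now-playing-string (xml)
  "Return \"ARTIST's TITLE from the CD ALBUM\" built from the string XML.
Return nil if any of the three fields is missing."
  (with-temp-buffer
    (insert xml)
    (let ((title  (my/cdata-field "title"))
          (artist (my/cdata-field "artist"))
          (album  (my/cdata-field "album")))
      (when (and title artist album)
        (format "%s%s %s from the CD %s"
                artist
                ;; Names ending in s take a bare apostrophe.
                (if (string-match-p "[sS]\\'" artist) "'" "'s")
                title
                album)))))
```

For the sample file this should give "Alabama Shakes' Hold On from the CD Boys & Girls". To read from a file instead, insert-file-contents can replace the insert call.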
--
Joost Kremers joostkremers@fastmail.fm
Selbst in die Unterwelt dringt durch Spalten Licht
EN:SiS(9)
* Re: Troubles in Regular Expression Paradise
From: Len Blanks @ 2014-05-14 3:13 UTC
To: help-gnu-emacs
Many thanks. I'll try that.
--
Len
Science is supposedly the method by which we stand on the shoulders of those who
came before us. In computer science we are all standing on each others' feet.
-- G Popek