unofficial mirror of help-gnu-emacs@gnu.org
* Troubles in Regular Expression Paradise
@ 2014-05-14  2:18 Len Blanks
From: Len Blanks @ 2014-05-14  2:18 UTC
  To: help-gnu-emacs

I'm trying to parse an XML file containing information for a song "currently playing"
on my iTunes, specifically the artist, name of the tune and the CD it appears on.
Here's an example of the file:

<?xml version="1.0" encoding="UTF-8"?>

<now_playing playing="1" timestamp="2014-05-13T16:55:31Z">
	<song timestamp="2014-05-13T16:55:30Z">
		<title><![CDATA[Hold On]]></title>
		<artist><![CDATA[Alabama Shakes]]></artist>
		<album><![CDATA[Boys & Girls]]></album>
		<genre><![CDATA[Alternative Rock]]></genre>
		<kind>MPEG audio file</kind>
		<track>1</track>
		<numTracks>12</numTracks>
		<year>2012</year>
		<comments><![CDATA[Amazon.com Song ID: 228676960]]></comments>
		<time>226</time>
		<bitrate>249</bitrate>
		<disc>1</disc>
		<numDiscs>1</numDiscs>
		<compilation>No</compilation>
		<urlAmazon>http://www.amazon.com/Boys-Girls-Alabama-Shakes/dp/B0074MZSWW%3FSubscriptionId%3D03AKJ1J6S0FY8K0WRER2%26tag%3Dnowplaplu-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB0074MZSWW</urlAmazon>
		<urlApple/>
		<imageSmall>http://ecx.images-amazon.com/images/I/61P1-X1QkoL._SL75_.jpg</imageSmall>
		<image>http://ecx.images-amazon.com/images/I/61P1-X1QkoL._SL160_.jpg</image>
		<imageLarge>http://ecx.images-amazon.com/images/I/61P1-X1QkoL.jpg</imageLarge>
		<composer><![CDATA[Alabama Shakes]]></composer>
		<grouping><![CDATA[]]></grouping>
		<file><![CDATA[]]></file>
		<artworkID>c53ba0b14763278952c3fb1f8ea1da1f</artworkID>
	</song>
</now_playing>


And here is a function I had hoped would extract the relevant fields and return a string in
the form "_artist_'s _title_ from the CD _album_", to be inserted in an X-NOW-PLAYING:
header in emails and Usenet posts:

(defun now-playing (xml-file)
  ;;  (interactive "FFile: ")
  (with-temp-buffer
    (insert-file-contents xml-file)
    (goto-char 1)
    (when (re-search-forward (concat "<title><!\\[CDATA\\[\\([^\\]+\\)\\]></title>"
				     "[\0-\377[:nonascii:]]*"
				     "<artist><!\\[CDATA\\[\\([^\\]+\\)\\]></artist>"
				     "[\0-\377[:nonascii:]]*"
				     "<album><!\\[CDATA\\[\\([^\\]+\\)\\]></album>") nil t)
      (concat "\\2"
	      (if (string= (downcase (substring "\\2" -2 -1)) "s") "'" "'s")
	      " \\1 from the CD \\3"))))

(message (now-playing "/tmp/now_playing.xml")) ;; test now-playing


I built and tested the regular expression with re-build, and it matches correctly,
including the groups \\( ... \\), which re-build colours quite nicely.  But I seem to have done
something really foolish: the references \\1, \\2 and \\3 fail, so they don't seem to be
set by the groups in the regexp.

The function returns "\2's \1 from the CD \3".

I'm sure the problem is something foolish, but I would really like to know what I did.
       
-- 
Len

The two most common elements in the universe are hydrogen and
stupidity. -- Harlan Ellison




* Re: Troubles in Regular Expression Paradise
@ 2014-05-14  2:56 ` Joost Kremers
From: Joost Kremers @ 2014-05-14  2:56 UTC
  To: help-gnu-emacs

Len Blanks wrote:
> (defun now-playing (xml-file)
> [...]
>       (concat "\\2"
> 	      (if (string= (downcase (substring "\\2" -2 -1)) "s") "'" "'s")
> 	      " \\1 from the CD \\3"))))
> [...]
> The function returns "\2's \1 from the CD \3".

You can only use such substitution operators in functions that are aware
of them, such as replace-match. Normal string-handling functions like
concat are not; you'll need something like match-string.
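A minimal sketch of that suggestion, assuming (as in the sample file) that each field is a single-line CDATA section. Each field gets its own search and is read back with match-string-no-properties, instead of being written as \\N inside a string literal, where it is just a backslash and a digit. It also tests the last character of the artist; the original (substring "\\2" -2 -1) looks at the second-to-last one. The helper lambda `field` is an illustrative choice, not something from the thread.

```elisp
(defun now-playing (xml-file)
  "Return \"ARTIST's TITLE from the CD ALBUM\" read from XML-FILE.
Return nil if any of the three fields is missing."
  (with-temp-buffer
    (insert-file-contents xml-file)
    (let* ((field
            ;; Find <TAG><![CDATA[...]]></TAG> anywhere in the buffer and
            ;; return the text captured by group 1, or nil if there is none.
            ;; `.' does not match newlines, so the text must be on one line.
            (lambda (tag)
              (goto-char (point-min))
              (when (re-search-forward
                     (concat "<" tag "><!\\[CDATA\\[\\(.*?\\)]]></" tag ">")
                     nil t)
                (match-string-no-properties 1))))
           (title (funcall field "title"))
           (artist (funcall field "artist"))
           (album (funcall field "album")))
      (when (and title artist album)
        (format "%s%s %s from the CD %s"
                artist
                ;; "Shakes" -> "Shakes'", anything else -> "...'s".
                (if (string-match-p "[sS]\\'" artist) "'" "'s")
                title
                album)))))
```

If you prefer to keep the single regexp, calling (match-substitute-replacement "\\2's \\1 from the CD \\3" t) right after the successful re-search-forward should expand the references the same way replace-match would.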



-- 
Joost Kremers                                   joostkremers@fastmail.fm
Even into the underworld, light penetrates through the cracks
EN:SiS(9)



* Re: Troubles in Regular Expression Paradise
@ 2014-05-14  3:13   ` Len Blanks
From: Len Blanks @ 2014-05-14  3:13 UTC
  To: help-gnu-emacs

Joost Kremers <joost.m.kremers@gmail.com> writes:

> You can only use such substitution operators in functions that are aware
> of them. Normal string-handling functions are not, you'll need something
> like match-string.

Many thanks.  I'll try that.
-- 
Len

Science is supposedly the method by which we stand on the shoulders of those who
came before us.  In computer science we are all standing on each other's feet.
                                                                      -- G Popek




* Re: Troubles in Regular Expression Paradise
@ 2014-05-14 12:40 ` Nicolas Richard
From: Nicolas Richard @ 2014-05-14 12:40 UTC
  To: help-gnu-emacs

Len Blanks <ltb@haruspex.net> writes:
> I'm trying to parse an XML file containing information for a song "currently playing"
> on my iTunes, specifically the artist, name of the tune and the CD it appears on.

> And here is a function I had hoped would extract the relevant fields and return a string in
> the form "_artist_'s _title_ from the CD _album_", to be inserted in an X-NOW-PLAYING:
> header in emails and Usenet posts:

The regexp question has been answered, so allow me to suggest
libxml-parse-xml-region instead of regexps for parsing XML.

First eval:
(setq yftest (libxml-parse-html-region (point-min) (point-max)))
in a buffer which holds your file.

Then you can get away with:
(caddr (assoc 'title (caddr yftest)))
(caddr (assoc 'artist (caddr yftest)))
(caddr (assoc 'album (caddr yftest)))
to get the title, artist and album respectively.
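If relying on an Emacs built with libxml2 is undesirable, the pure-Lisp xml.el library that ships with Emacs can do the same job. This is a swapped-in technique, not what Nico posted: a minimal sketch using xml.el's xml-parse-file, xml-get-children and xml-node-children, assuming the file is well-formed and has the now_playing > song layout shown above. The name now-playing-xml is hypothetical.

```elisp
(require 'xml)

(defun now-playing-xml (xml-file)
  "Return \"ARTIST's TITLE from the CD ALBUM\" by parsing XML-FILE with xml.el."
  (let* ((root (car (xml-parse-file xml-file)))      ; the <now_playing> element
         (song (car (xml-get-children root 'song)))  ; its first <song> child
         ;; Text of the first child element of SONG named TAG.  xml.el
         ;; returns a CDATA section as a plain string among the children.
         (text (lambda (tag)
                 (car (xml-node-children
                       (car (xml-get-children song tag))))))
         (artist (funcall text 'artist)))
    (format "%s%s %s from the CD %s"
            artist
            ;; "Shakes" -> "Shakes'", anything else -> "...'s".
            (if (string-match-p "[sS]\\'" artist) "'" "'s")
            (funcall text 'title)
            (funcall text 'album))))
```

Selecting children by element name, rather than by position as a caddr path does, keeps working if whitespace nodes or extra elements shift the order of children.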

As a side note, I wrote some elisp for handling the kind of data that
libxml-parse-html-region spits out, so here's my way of solving your
problem with my code:

(require 'tree-html)
;; https://github.com/YoungFrog/tree-html/blob/master/tree-html.el

(defun yf/get-value-for-sole-subtree-with-given-tag (tree tag)
  "Assume there's exactly one XML element with given TAG in TREE, and return its
associated value."
  (yf/tree-html-get-value
   (yf/tree-html-get-sole-element
    (yf/tree-html-select
     tree
     (lambda (tree)
       (eq tag (yf/tree-html-get-tag tree)))))))

(format "%s's %s from the CD %s"
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'artist)
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'title)
        (yf/get-value-for-sole-subtree-with-given-tag yftest 'album))

(Yes, I'm *that* bad at naming things.)

HTH,

-- 
Nico.




* Re: Troubles in Regular Expression Paradise
@ 2014-05-14 22:07   ` Len Blanks
From: Len Blanks @ 2014-05-14 22:07 UTC
  To: help-gnu-emacs

Nicolas Richard <theonewiththeevillook@yahoo.fr> writes:

> The regexp question was answered, so I allow myself to mention
> libxml-parse-xml-region instead of regexps for parsing xml.
> [...]
> (Yes, I'm *that* bad at naming things.)

I need to do a better job researching what is available, rather than
reinventing the wheel.  Thanks very much; I'll redo what I have using
libxml and your code to compare and learn.

Regards,
-- 
Len

I am a Marxist, of the Groucho tendency
            -- Slogan used at Nanterre in Paris, 1968



