* [bug#29902] [PATCH] gnu: Add html-xml-utils.
@ 2017-12-29 21:00 Stefan Reichör
2017-12-31 6:30 ` Catonano
0 siblings, 1 reply; 7+ messages in thread
From: Stefan Reichör @ 2017-12-29 21:00 UTC (permalink / raw)
To: 29902
* gnu/packages/xml.scm (html-xml-utils): New variable.
---
gnu/packages/xml.scm | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..dde1964 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -1116,6 +1116,60 @@ match and extract data, and elements can be added, deleted or modified using
XSLT and EXSLT.")
(license license:x11)))
+(define-public html-xml-utils
+ (package
+ (name "html-xml-utils")
+ (version "7.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append
+ "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+ version ".tar.gz"))
+ (sha256
+ (base32
+ "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+ (build-system gnu-build-system)
+ (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+ (synopsis "Command line utilities to manipulate HTML and XML files")
+ (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways. The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities by @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+ (license license:expat)))
+
(define-public xlsx2csv
(package
(name "xlsx2csv")
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Catonano @ 2017-12-31 6:30 UTC (permalink / raw)
To: Stefan Reichör; +Cc: 29902
Hi Stefan !
Thanks for contributing !
I linted your patch and I get
gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too long
(96 characters)
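For anyone reproducing this report, here is a minimal sketch of the lint step. It assumes the command is run from the top of a built Guix source checkout with the patch applied, since the ./pre-inst-env wrapper only exists there; the lint_ran variable is purely illustrative.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is a built Guix checkout
# with the html-xml-utils patch applied.
if [ -x ./pre-inst-env ]; then
    # Run the lint checkers on the new package; the formatting checker
    # is the one that reports overly long lines.
    ./pre-inst-env guix lint html-xml-utils
    lint_ran=yes
else
    echo "no ./pre-inst-env here; skipping guix lint" >&2
    lint_ran=no
fi
```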
Also, I couldn't run
./pre-inst-env guix build --rounds=2 html-xml-utils
it just returns the store item as I had already built it without thinking
:-/
Apart from this, I'd say it's ok
It builds. I didn't try to run any of these commands.
Can you suggest a command line and a set of HTML files to test them with?
Well, this is just to be super scrupulous, anyway. If you say this works, I
believe you.
So, as far as I'm concerned: LGTM!
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Stefan Reichör @ 2017-12-31 8:22 UTC (permalink / raw)
To: 29902
Hi Catonano!
Thanks for your review.
> Hi Stefan !
>
> Thanks for contributing !
>
> I linted your patch and I get
>
> gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too long
> (96 characters)
I fixed this.
> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without thinking
> :-/
>
> Apart from this, I'd say it's ok
>
> It builds. I didn't try to run any of these commands.
>
> Can you suggest a command line and a set of HTML files to test them with?
I am not aware of a lot of documentation with examples for these tools.
Here is some stuff I found on the web:
http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
https://superuser.com/questions/528709/command-line-css-selector-tool
https://www.joyofdata.de/blog/using-linux-shell-web-scraping/
This is a command line that I use to extract links from h2 elements:
cat ~/tmp/document.html | hxnormalize -x | hxselect -i h2 | hxwls
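To try that pipeline without a document of your own, a small self-contained sketch may help. The sample file and its contents are made up for illustration: it holds three links, two of them inside h2 elements. The script skips the pipeline when the tools are not installed.

```shell
#!/bin/sh
# Hypothetical sample document, created only for this demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<h2><a href="https://example.org/a">First</a></h2>
<p><a href="https://example.org/skip">not in a heading</a></p>
<h2><a href="https://example.org/b">Second</a></h2>
</body></html>
EOF

if command -v hxnormalize >/dev/null 2>&1 &&
   command -v hxselect >/dev/null 2>&1 &&
   command -v hxwls >/dev/null 2>&1; then
    # Same pipeline as above, reading the file directly instead of via cat:
    # normalize to XML, keep the h2 elements, then list the links in them.
    hxnormalize -x "$sample" | hxselect -i h2 | hxwls
else
    echo "html-xml-utils not installed; skipping" >&2
fi

# Clean up the temporary sample file.
rm -f "$sample"
```

Passing the file name to hxnormalize avoids the extra cat process; the rest of the pipeline is unchanged.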
> Well, this is just to be super scrupulous, anyway. If you say this works, I
> believe you.
>
> So, as far as I'm concerned: LGTM!
Below is the corrected patch (I added the missing copyright line as well)
* gnu/packages/xml.scm (html-xml-utils): New variable.
---
gnu/packages/xml.scm | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..548cd1a 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -18,6 +18,7 @@
;;; Copyright © 2017 Gregor Giesen <giesen@zaehlwerk.net>
;;; Copyright © 2017 Alex Vong <alexvong1995@gmail.com>
;;; Copyright © 2017 Petter <petter@mykolab.ch>
+;;; Copyright © 2017 Stefan Reichör <stefan@xsteve.at>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -1116,6 +1117,61 @@ match and extract data, and elements can be added, deleted or modified using
XSLT and EXSLT.")
(license license:x11)))
+(define-public html-xml-utils
+ (package
+ (name "html-xml-utils")
+ (version "7.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append
+ "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+ version ".tar.gz"))
+ (sha256
+ (base32
+ "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+ (build-system gnu-build-system)
+ (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+ (synopsis "Command line utilities to manipulate HTML and XML files")
+ (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways. The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements
+ to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities by @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+ (license license:expat)))
+
(define-public xlsx2csv
(package
(name "xlsx2csv")
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Tobias Geerinckx-Rice @ 2017-12-31 13:18 UTC (permalink / raw)
To: catonano; +Cc: 29902
Catonano,
Catonano wrote on 31/12/17 at 07:30:
> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without
> thinking :-/
Been there. ‘guix build’ has a handy ‘--check’ option that solves just
this problem.
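A hedged sketch of what that looks like in practice, assuming a built Guix checkout and a package that is already in the store. As far as the Guix manual describes it, --check rebuilds the existing item and compares the result bit-for-bit, whereas --rounds=2 only matters when a build actually takes place. The check_ran variable is purely illustrative.

```shell
#!/bin/sh
# Sketch only: assumes a built Guix checkout and that html-xml-utils
# has already been built once.
if [ -x ./pre-inst-env ]; then
    # Force a rebuild and compare it with the existing store item.
    ./pre-inst-env guix build --check html-xml-utils
    check_ran=yes
else
    echo "no ./pre-inst-env here; skipping guix build --check" >&2
    check_ran=no
fi
```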
Happy times,
T G-R
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Catonano @ 2018-01-01 14:31 UTC (permalink / raw)
To: Stefan Reichör; +Cc: 29902
Hi Stefan !
2017-12-31 9:22 GMT+01:00 Stefan Reichör <stefan@xsteve.at>:
> > gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too long
> > (96 characters)
>
> I fixed this.
>
Yes, the linter doesn't report that anymore
> I am not aware of a lot of documentation with examples for these tools.
>
> Here is some stuff I found on the web:
>
> http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
> https://superuser.com/questions/528709/command-line-css-selector-tool
I tried this one and got the expected result.
So, not only does html-xml-utils build, it also runs correctly!
LGTM ! 😊
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Catonano @ 2018-01-01 14:33 UTC (permalink / raw)
To: Tobias Geerinckx-Rice; +Cc: 29902
2017-12-31 14:18 GMT+01:00 Tobias Geerinckx-Rice <me@tobias.gr>:
> Catonano,
>
> Catonano wrote on 31/12/17 at 07:30:
> > Also, I couldn't run
> >
> > ./pre-inst-env guix build --rounds=2 html-xml-utils
> >
> > it just returns the store item as I had already built it without
> > thinking :-/
>
> Been there. ‘guix build’ has a handy ‘--check’ option that solves just
> this problem.
>
> Happy times,
>
> T G-R
>
Thank you Tobias !
I tried that! I didn't completely understand the output, but I'll keep this
option in mind!
Ciao
* bug#29902: [PATCH] gnu: Add html-xml-utils.
From: Ludovic Courtès @ 2018-01-08 9:30 UTC (permalink / raw)
To: Stefan Reichör; +Cc: 29902-done
Stefan Reichör <stefan@xsteve.at> writes:
> Below is the corrected patch (I added the missing copyright line as well)
Applied, thanks!
Ludo’.