unofficial mirror of guix-patches@gnu.org 
* [bug#29902] [PATCH] gnu: Add html-xml-utils.
@ 2017-12-29 21:00 Stefan Reichör
  2017-12-31  6:30 ` Catonano
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Reichör @ 2017-12-29 21:00 UTC (permalink / raw)
  To: 29902

* gnu/packages/xml.scm (html-xml-utils): New variable.
---
 gnu/packages/xml.scm |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..dde1964 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -1116,6 +1116,60 @@ match and extract data, and elements can be added, deleted or modified using
 XSLT and EXSLT.")
    (license license:x11)))
 
+(define-public html-xml-utils
+ (package
+   (name "html-xml-utils")
+   (version "7.4")
+   (source
+    (origin
+      (method url-fetch)
+      (uri (string-append
+            "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+            version ".tar.gz"))
+      (sha256
+       (base32
+        "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+   (build-system gnu-build-system)
+   (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+   (synopsis "Command line utilities to manipulate HTML and XML files")
+   (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways.  The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities to @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+   (license license:expat)))
+
 (define-public xlsx2csv
   (package
     (name "xlsx2csv")

* [bug#29902] [PATCH] gnu: Add html-xml-utils.
  2017-12-29 21:00 [bug#29902] [PATCH] gnu: Add html-xml-utils Stefan Reichör
@ 2017-12-31  6:30 ` Catonano
  2017-12-31  8:22   ` Stefan Reichör
  2017-12-31 13:18   ` [bug#29902] " Tobias Geerinckx-Rice
  0 siblings, 2 replies; 7+ messages in thread
From: Catonano @ 2017-12-31  6:30 UTC (permalink / raw)
  To: Stefan Reichör; +Cc: 29902

[-- Attachment #1: Type: text/plain, Size: 4313 bytes --]

Hi Stefan !

Thanks for contributing !

I linted your patch and got:

gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too long (96 characters)
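
A minimal sketch for spotting such lines before sending a patch, assuming a 90-character threshold (my understanding of the limit `guix lint` applies; treat that number as an assumption). The helper name `long_lines` is hypothetical and not part of Guix; it only mirrors the shape of the warning above.

```python
# Hypothetical helper (not part of Guix): list lines longer than a limit,
# mimicking the "line N is way too long (M characters)" lint warning.
# The 90-character default is an assumption about the linter's threshold.
import sys

DEFAULT_LIMIT = 90


def long_lines(text: str, limit: int = DEFAULT_LIMIT) -> list[tuple[int, int]]:
    """Return (line_number, length) pairs, 1-based, for lines over `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 long_lines.py gnu/packages/xml.scm
    file_name = sys.argv[1]
    with open(file_name, encoding="utf-8") as source:
        for number, length in long_lines(source.read()):
            print(f"{file_name}: line {number} is way too long ({length} characters)")
```

Running it on a package file prints one warning per over-long line; the linter itself remains the authoritative check.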

Also, I couldn't run

./pre-inst-env guix build --rounds=2 html-xml-utils

It just returns the store item, because I had already built it without
thinking :-/

Apart from this, I'd say it's OK.

It builds. I didn't try to run any of these commands.

Can you suggest a command line and a set of HTML files to test them?

Well, this is just to be super scrupulous, anyway. If you say this works, I
believe you.

So, as far as I'm concerned: LGTM!

[-- Attachment #2: Type: text/html, Size: 5327 bytes --]

* [bug#29902] [PATCH] gnu: Add html-xml-utils.
  2017-12-31  6:30 ` Catonano
@ 2017-12-31  8:22   ` Stefan Reichör
  2018-01-01 14:31     ` Catonano
  2018-01-08  9:30     ` bug#29902: " Ludovic Courtès
  2017-12-31 13:18   ` [bug#29902] " Tobias Geerinckx-Rice
  1 sibling, 2 replies; 7+ messages in thread
From: Stefan Reichör @ 2017-12-31  8:22 UTC (permalink / raw)
  To: 29902

Hi Catonano!

Thanks for your review.

> Hi Stefan !
>
> Thanks for contributing !
>
> I linted your patch and I get
>
> gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too long
> (96 characters)

I fixed this.

> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without thinking
> :-/
>
> Apart from this, I'd say it's ok
>
> It builds. I didn't try to run any of these commands.
>
> Can you suggest me a command line and a set of html files to test them ?

I am not aware of much documentation with examples for these tools.

Here is some stuff I found on the web:

http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
https://superuser.com/questions/528709/command-line-css-selector-tool
https://www.joyofdata.de/blog/using-linux-shell-web-scraping/

This is a command line that I use to extract links from h2 elements:

    cat ~/tmp/document.html | hxnormalize -x | hxselect -i h2 | hxwls
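
The pipeline converts the document to XML (`hxnormalize -x`), selects the `h2` elements (`hxselect -i h2`, where `-i` makes matching case-insensitive) and lists the links they contain (`hxwls`). For checking the expected output without html-xml-utils installed, here is a rough Python approximation using only the standard library's `html.parser`. The names `H2LinkCollector` and `h2_links` are hypothetical, and the sketch only reports `href` values of `<a>` elements as they appear in the source, whereas `hxwls` may also list other kinds of links.

```python
# Rough approximation (not html-xml-utils itself) of:
#   hxnormalize -x | hxselect -i h2 | hxwls
# Only href attributes of <a> elements nested inside <h2> are collected.
from html.parser import HTMLParser


class H2LinkCollector(HTMLParser):
    """Collect href values of <a> elements that appear inside <h2>."""

    def __init__(self) -> None:
        super().__init__()
        self.h2_depth = 0  # greater than 0 while inside an <h2> element
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, similar to hxselect -i.
        if tag == "h2":
            self.h2_depth += 1
        elif tag == "a" and self.h2_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2" and self.h2_depth > 0:
            self.h2_depth -= 1


def h2_links(html: str) -> list[str]:
    """Return the href values of links inside <h2> elements, in order."""
    parser = H2LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `h2_links(open("document.html").read())` gives a list that should roughly match the URLs printed by the pipeline for simple documents.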

> Well, this is just to be super scrupulous, anyway. If you say this works, I
> believe you
>
> So, as far as I'm concerned: lgtm !

Below is the corrected patch (I added the missing copyright line as well).

* gnu/packages/xml.scm (html-xml-utils): New variable.
---
 gnu/packages/xml.scm |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..548cd1a 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -18,6 +18,7 @@
 ;;; Copyright © 2017 Gregor Giesen <giesen@zaehlwerk.net>
 ;;; Copyright © 2017 Alex Vong <alexvong1995@gmail.com>
 ;;; Copyright © 2017 Petter <petter@mykolab.ch>
+;;; Copyright © 2017 Stefan Reichör <stefan@xsteve.at>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -1116,6 +1117,61 @@ match and extract data, and elements can be added, deleted or modified using
 XSLT and EXSLT.")
    (license license:x11)))
 
+(define-public html-xml-utils
+ (package
+   (name "html-xml-utils")
+   (version "7.4")
+   (source
+    (origin
+      (method url-fetch)
+      (uri (string-append
+            "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+            version ".tar.gz"))
+      (sha256
+       (base32
+        "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+   (build-system gnu-build-system)
+   (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+   (synopsis "Command line utilities to manipulate HTML and XML files")
+   (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways.  The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements
+       to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities to @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+   (license license:expat)))
+
 (define-public xlsx2csv
   (package
     (name "xlsx2csv")

* [bug#29902] [PATCH] gnu: Add html-xml-utils.
  2017-12-31  6:30 ` Catonano
  2017-12-31  8:22   ` Stefan Reichör
@ 2017-12-31 13:18   ` Tobias Geerinckx-Rice
  2018-01-01 14:33     ` Catonano
  1 sibling, 1 reply; 7+ messages in thread
From: Tobias Geerinckx-Rice @ 2017-12-31 13:18 UTC (permalink / raw)
  To: catonano; +Cc: 29902

Catonano,

Catonano wrote on 31/12/17 at 07:30:
> Also, I couldn't run
> 
> ./pre-inst-env guix build --rounds=2 html-xml-utils
> 
> it just returns the store item as I had already built it without
> thinking :-/

Been there. ‘guix build’ has a handy ‘--check’ option that solves just
this problem.
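
As described in the Guix manual, `guix build --check` rebuilds store items that are already available and compares the results bit for bit (combined with `--keep-failed`, a differing result is kept in the store for inspection). The Python sketch below only illustrates the idea of comparing two build outputs; it is not how Guix performs the check, the names `tree_digests` and `differing_paths` are hypothetical, and it ignores metadata such as the executable bit.

```python
# Illustration only, not Guix's implementation of --check: compare two
# build output trees file by file and report what differs.
import hashlib
import os


def tree_digests(root: str) -> dict[str, str]:
    """Map each file or symlink under `root` (relative path) to a digest.

    Regular files are hashed with SHA-256; symlinks are recorded by their
    target. Directories themselves and permission bits are not compared.
    """
    digests: dict[str, str] = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                digests[rel] = "symlink:" + os.readlink(path)
            elif os.path.isfile(path):
                with open(path, "rb") as handle:
                    digests[rel] = "sha256:" + hashlib.sha256(handle.read()).hexdigest()
    return digests


def differing_paths(first: str, second: str) -> list[str]:
    """Return relative paths missing from one tree or differing in content."""
    a, b = tree_digests(first), tree_digests(second)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

An empty list means the two trees match under these simplified rules; any entry would point at a file worth inspecting for non-determinism.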

Happy times,

T G-R

* [bug#29902] [PATCH] gnu: Add html-xml-utils.
  2017-12-31  8:22   ` Stefan Reichör
@ 2018-01-01 14:31     ` Catonano
  2018-01-08  9:30     ` bug#29902: " Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Catonano @ 2018-01-01 14:31 UTC (permalink / raw)
  To: Stefan Reichör; +Cc: 29902

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

Hi Stefan !

2017-12-31 9:22 GMT+01:00 Stefan Reichör <stefan@xsteve.at>:

> > gnu/packages/xml.scm:1120:1: html-xml-utils@7.4: line 1153 is way too
> long
> > (96 characters)
>
> I fixed this.
>

Yes, the linter doesn't report that anymore.


> I am not aware of a lot of documentation with examples for these tools.
>
> Here is some stuff I found on the web:
>
> http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
> https://superuser.com/questions/528709/command-line-css-selector-tool


I tried this one and got the expected result.

So, not only html-xml-utils builds, it also runs correctly !



LGTM ! 😊

[-- Attachment #2: Type: text/html, Size: 1519 bytes --]

* [bug#29902] [PATCH] gnu: Add html-xml-utils.
  2017-12-31 13:18   ` [bug#29902] " Tobias Geerinckx-Rice
@ 2018-01-01 14:33     ` Catonano
  0 siblings, 0 replies; 7+ messages in thread
From: Catonano @ 2018-01-01 14:33 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice; +Cc: 29902

[-- Attachment #1: Type: text/plain, Size: 568 bytes --]

2017-12-31 14:18 GMT+01:00 Tobias Geerinckx-Rice <me@tobias.gr>:

> Catonano,
>
> Catonano wrote on 31/12/17 at 07:30:
> > Also, I couldn't run
> >
> > ./pre-inst-env guix build --rounds=2 html-xml-utils
> >
> > it just returns the store item as I had already built it without
> > thinking :-/
>
> Been there. ‘guix build’ has a handy ‘--check’ option that solves just
> this problem.
>
> Happy times,
>
> T G-R
>

Thank you, Tobias!

I tried that! I didn't completely understand the output, but I'll keep this
option in mind!

Ciao

[-- Attachment #2: Type: text/html, Size: 1036 bytes --]

* bug#29902: [PATCH] gnu: Add html-xml-utils.
  2017-12-31  8:22   ` Stefan Reichör
  2018-01-01 14:31     ` Catonano
@ 2018-01-08  9:30     ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2018-01-08  9:30 UTC (permalink / raw)
  To: Stefan Reichör; +Cc: 29902-done

Stefan Reichör <stefan@xsteve.at> wrote:

> Below is the corrected patch (I added the missing copyright line as well)

Applied, thanks!

Ludo’.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
