From mboxrd@z Thu Jan 1 00:00:00 1970
From: Katherine Cox-Buday
Subject: Re: Help with sxml simple parser for the quicklisp importer
Date: Wed, 23 Jan 2019 10:55:21 -0600
Message-ID: <87h8dzco06.fsf@gmail.com>
References: <1b161633-c285-1401-d771-c965dae58149@riseup.net> <874l9z78sc.fsf@elephly.net> <87womv5psn.fsf@elephly.net>
In-Reply-To: <87womv5psn.fsf@elephly.net> (Ricardo Wurmus's message of "Wed, 23 Jan 2019 16:58:32 +0100")
To: Ricardo Wurmus
Cc: guix-devel

Ricardo Wurmus writes:

> swedebugia writes:
>
>>> The second “link” tag opens but is never closed.  This may be valid
>>> HTML, but it is not valid XML, which is what xml->sxml expects.
>>
>> Thanks for the quick answer!
>> I will try to remove this line before handling over to the parser.
>
> I would recommend looking for a better source of package information.
> Parsing HTML is not fun and is often brittle.

The package information in quickdocs is accessed[1] via the API of
whatever is hosting the source code.  We could try doing the same.

Alternatively, it is good practice for CL systems defined in .asd files
to contain a `:description`, and even a `:long-description`, field.  We
could take the stance that package information simply comes from there,
as technically this is the actual package's (i.e. system's) description.
And as CL is a Lisp, it should be relatively easy to parse this out.  The
only caveat is that I think it's possible for these fields to contain
sexps which read in other files, in which case we should do the same.

I hope this helps.

[1] - https://github.com/quickdocs/quickdocs-updater/blob/a64a41df9e5f1a3721ab68f9f02189ecbb54513b/src/repos.lisp

-- 
Katherine
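
To make the .asd idea a little more concrete, here is a rough sketch of
pulling `:description` and `:long-description` out of a `defsystem` form.
It is written in Python purely for illustration (an actual importer would
presumably use Guile's own reader), and every name in it (`read_forms`,
`system_descriptions`, `SAMPLE`) is made up for this example.  It assumes
a simplified s-expression syntax: strings, symbols, lists and `;`
comments only, with no `#| ... |#` block comments.  When a field is not a
plain string (for example a `#.` form that reads another file) it
reports None rather than trying to evaluate anything, which is exactly
the caveat mentioned above.

```python
import re

# Tokens of a simplified s-expression syntax (an assumption for this sketch).
TOKEN = re.compile(r'''
    \s+                               # whitespace, skipped
  | ;[^\n]*                           # line comment, skipped
  | (?P<string>"(?:\\.|[^"\\])*")     # double-quoted string with backslash escapes
  | (?P<open>\()
  | (?P<close>\))
  | (?P<atom>[^\s()";]+)              # symbols, keywords, numbers, reader prefixes
''', re.VERBOSE)


class Symbol(str):
    """Marks an atom so it can be told apart from a string literal."""


def read_forms(text):
    """Return the top-level forms in TEXT as nested Python lists."""
    stack = [[]]
    for match in TOKEN.finditer(text):
        kind = match.lastgroup  # None for whitespace and comments
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif kind == "string":
            body = match.group()[1:-1]
            stack[-1].append(re.sub(r"\\(.)", r"\1", body, flags=re.S))
        elif kind == "atom":
            stack[-1].append(Symbol(match.group()))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


WANTED = (":description", ":long-description")


def system_descriptions(asd_text):
    """Map each system name to the description fields found in its defsystem."""
    found = {}
    for form in read_forms(asd_text):
        if not (isinstance(form, list) and len(form) >= 2):
            continue
        head, name = form[0], form[1]
        is_defsystem = (isinstance(head, Symbol)
                        and head.lower().split(":")[-1] == "defsystem")
        if not is_defsystem or isinstance(name, list):
            continue
        options = form[2:]
        fields = {}
        # Look at each option together with the item that follows it.
        for key, value in zip(options, options[1:]):
            if isinstance(key, Symbol) and key.lower() in WANTED:
                # Only literal strings are taken; anything else needs evaluation.
                fields[key.lower().lstrip(":")] = value if type(value) is str else None
        found[name.lstrip("#:").lower()] = fields
    return found


# Hypothetical .asd contents, invented for this example.
SAMPLE = r'''
;; example system
(asdf:defsystem #:demo
  :description "A \"demo\" system"
  :long-description #.(uiop:read-file-string "README.md")
  :license "MIT"
  :depends-on (#:alexandria))
'''

if __name__ == "__main__":
    print(system_descriptions(SAMPLE))
```

A field that comes back as None is the signal that the importer would
have to follow the form and read the referenced file itself, or fall
back to another source of package information.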