From mboxrd@z Thu Jan 1 00:00:00 1970
From: =?UTF-8?Q?=E7=A9=8D=E4=B8=B9=E5=B0=BC?= Dan Jacobson
Newsgroups: gmane.emacs.bugs
Subject: bug#31665: libxml-parse-html-region' doesn't extract text in tables
Date: Thu, 07 Jun 2018 04:50:15 +0800
Message-ID: <87d0x3od14.fsf@jidanni.org>
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org>
In-Reply-To: <87tvqof8uf.fsf_-_@jidanni.org>
To: Lars Ingebrigtsen
Cc: Katsumi Yamaoka, 31665@debbugs.gnu.org
X-GNU-PR-Message: followup 31665
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: moreinfo
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

>>>>> "LI" == Lars Ingebrigtsen writes:

LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?

OK, here is a mail with my personal phone bill cleaned out:

--=-=-=
Content-Type: application/gzip
Content-Disposition: attachment; filename=gg.gz
Content-Transfer-Encoding: base64

H4sICFRIGFsAA2dnAO1abXPjthH+XPwKlJ1kmmklvumFlAX1qBc3zsQXx3Z6008eioQl3lEkQ1In q78+u+CLSEryyfY5c57pnW2RSxDYfXax+wDQeRyuaBDOQ3dLb5dr+tM6oLRPlc6gow10nWqKapCp nfIBPv4nVfv00t4KMVW1AfwoPfoPxVAUcg5dDejVMgw4nYRtOozeOcu07YSrdroZkWse+dvWbTig DxU5QYH00XPtIPDe5Z/tMF5IdHhAOiKXPEnsBW9dTAd0qBs9TTEfHvSOoXTNd3w+t1NnqY7IzXr+ kTtpoc7Y831yeXE5a/2Hx4kXBgOqthUyCYOUB2nrdhuBfau1n3qRHafyynvg7hklf5mH68C14y2T WvCP3V3B0zvNNHu6dqf2+qqq9eG6rXa1Xkc1My0kQlqt05s3lEj5Qyov05V/5iztOOEp++32vGXs WsV2kNzzuDULnND1gsWA/r4OU+62otgLUnvuc0KGf53+Mrn979WM/nh7+TO9+m3888WESi1Z/qBP ZHl6O80edNqKSkWPXgqg2L4sz95LIzJEBUbDJbfdEaF0mHqpz0ds1mFjg1lTNjOYcc6sczYzmTlm 47F4BBK1IuliS1PBC3hrrOMFYWaPWbOhnPWIfa94atNlmkYt/vva+8z0qVRFRKJOdocPduDQAh19 KvBBpWWh7xBDGe5c7zO1fW8R4IsOdMBjbCQQokm69Tk+2HhuuhxQQ+9GD2d0yb3FEoJGVRQF7iVK He77ke0i0NhckYQkiWzHY6SUfS4HSsNIyiDL1KAUr+PsAi/dA42zJ0KxeRi7PG4OVQ6UKYg3ugK3 Qnu86ymGRIuuitHpvrlZ94Mk9D2XqtED/Zs+75qmcZbbOFDBbjC77BiQ+C4blpFsYJBIu5Fq1pU2 NgTRaOitFjSJHewAfT2Q5c1m096lATlc3234XAZJIPM5TFfZW8E0T+Q5IzD5eay1o2BRRaDTqwBg
oFZDOWqMLNd1gft4xBSTMk0R4uKzonvFPyvPdX0uNc07xVF1xWod1IOjLo/3hZlSzUjeDaDr4KD9 5/T6lw+M3FxZ70GqjwhYXf0R06MMC5pPA62nVKaB2u/gJBgNYX7G/L70HSOyrGp61Xuz8RXmOlmR J8t1sFhubHnie86nGx5/5vG/UnT87HtHfEwvVaWvdL+38e4CdFQlmtrxQkxm6W7u28En6VDENMa8 gnITYtrCwWXuruQZI3nn8sWNKuIFYsKGX7D2ILIiPo5AXkN4NEzSOAwWo+E9pCN6bzsCOJHcZpDR MCVC0oP8Bkmxh5nP0jEXgswUSdGCi45EE+9/4kVNGuErFjS18AWjg2lz1memhdciVUKvliYeQt89 Njtn4wkzJkKiMcsQKRbGn+aPTGsoF1pmhlU9/oI4ElFEsjjaxwo6fl4wYUY55NhnBRMjIpxutK8R TIyIcNoFk3ZyMNVEzfTy+CT/QmwdiyxyMLb6GCNjKy/M5lQUZouZ1fARVdnoowTKsoldVSMQRjZM vDC6onkZfwaOaIxFBxMs7nmbrnjrnBk97Idg9MJzHLgnmneZ2RVmmDhYNohhMUsRhoFyHXGhYsCj AQqyjGyWEBHmpeY9NMSa5BfYZUXvkm0IrlHRu1BOmJkrV+Kki/EysC00EJUzhC2C8yAUBuqdjZa/ rkNnQs+/C1IEzEfN4URcMwokAMaRhRPAWiObuh1m9XOlrPEPqCcpNO3j67nDRSfoRkvo2d3LC4KV ZZ0DGKBN5hdy1DO57l18KUMqZ3N1zBHxDBWNkeH4GopIZiN210ewLWEsdAF+faqbSeHo/bASOmcx YGXOMfA9Q0SAaQhVu0gtQUn0N8baWCuAKOLSMJuFqyAdXsAT3krbthvOOWYAOYpDd+2kiWw7MSMh rCLkGMhkkvJIayPhPJRVLCfGlvQaWgIV2M3SPNtjtsBYyzE8kB9oTklIswaBdJ8igBCpx2kUp2A5 5Us5F0ROuEtAu6xTJI9Wt8l3wPFDGdvs3hJjFiOVQ+BwJzDOOcf/NcZJz6QG5RQmkCpVO5I9DzDq p5BNIPF1uhnwjdpeePeHWV9pVgiV6N4PN4MlcEQenInHLd/ehmtYP9yL9eMGTG9tYjsaUEbmEE+f WigCW/fIY2WJoe7RSbUBDiNflVH6/D6t1PkjVb5e4/MS3+9+d1aU+25frS6htJ6a1Xug781Z+Px6 z0idPlrTF5BHmOh1+ljUex64qqK2P0anlfxHWFaMYOx56yieWgVP1awtSU0zp0+MfD04d4S5gPMl 9ImRY3Dea8+EM8uONUlT8NYR118FcV9X8jT2dRD/Qg55c6h3Xgf1zktQx1q61/AkGnCQBLyN0n+8 8NOi+hfzvND7aDg+iwfAWu/QttMyTHNXHlrYlRiXHiS7bBQ3thT2N8QO8LhdDT918j2ylldNAHJv y+cIG35kC84NHVlF/isoR6M/7PK523qM5AiLZfWepjB5mjn+oA9eCYgEMojnQBDYawgke5utAF4D AO0RAA4Ye4RuvBCABLTmyTKMqnp/AKVfxWT9WzG5ntpfxdbOt2mrPLmxoHh5QRqHr+Pk7ksNP4mG vHCVXFavbKPuDVfJP3uR7HvBp2KV/PTDkWOFrxb2xSZGXl2f6AfhgBz6AwX0sT3Y+eFHx+sPEIR8 153u9t+f3MuXJnnDeFj2in1i8pSd4myfeO88p/A88Nrjs/vc+hXLIG4tP3rCK/Zcf8X9+FV0ZmU0 WPjkebZr/T/d9iiMU9uXr+2Uuzxx7pL1KjusGWvC6j7uEuPurtiUzI5vTAOEpbHQ2dsx9zDr6xTO 3t9u7uFfw9ghQionCQZuzxvfvN+xJi49oHaMtHE7OG3DL6wjUyvYXtlb+XKL3xjh8UXg8oe2nUAG 
FmcqY3HMkG80W0JiisAXWBAY9ouW/z8jPStMS06OXwexnXSdtBwbXASliOf8PD8NwmOCLh4DmOLw aqzkRwkk8963Hpqnz9FuMUeziShOdMbnzNDFZD0XmdnEIx9TGzLyNszmuCu74jgdpVHlaKs8oiol 4nwMT6IMcZA0KZwujrrwbMrKTpkADkv51p2e5OaD08tstAw3DkKRBfeP3nueVk78u+I8sF8ztmkq I6enoJdyafbGjpweP3B6Sn4+tum59+WzEyg3I8e+BoWtfrr6t/Sk7x+U3mk4cte0CIXS/XVks7Hg Uzwciu/ejcgfnIYTJzQqAAA= --=-=-=--