unofficial mirror of bug-gnu-emacs@gnu.org 
From: mah@everybody.org (Mark A. Hershberger)
To: 17613@debbugs.gnu.org
Subject: bug#17613: 24.3; html2text can't handle weird formatting
Date: Tue, 27 May 2014 19:14:04 -0400
Message-ID: <87mwe2zuf7.fsf@flynn.nichework.com>

I got an email today whose source includes:

                                <a class=3D"mcnButton " title=3D"Profile L=
ink" href=3D"http://secret................................................=
...................................................." target=3D"_blank" st=
yle=3D"font-weight: bold;letter-spacing: normal;line-height: 100%;text-ali=
gn: center;text-decoration: none;color: #FFFFFF;word-wrap: break-word;-ms-=
text-size-adjust: 100%;-webkit-text-size-adjust: 100%;">Create Your Profil=
e!</a>

The line breaks are exactly as they appear in the email's source, which
is quoted-printable encoded.
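
In quoted-printable text, =3D stands for a literal "=", and a "=" at the
end of a line is a soft line break that joins the line to the next one.
As a rough illustration, the sketch below decodes an abbreviated copy of
the tag with quoted-printable-decode-string from qp.el.  The names
my-qp-sample and my-qp-decoded are made up for this example, and the href
and style attributes are left out for brevity.

```elisp
(require 'qp)

;; Abbreviated copy of the tag from the report; href and style omitted.
(defvar my-qp-sample
  (concat "<a class=3D\"mcnButton \" title=3D\"Profile L=\n"
          "ink\" target=3D\"_blank\">Create Your Profil=\n"
          "e!</a>")
  "Hypothetical sample, shortened from the email source.")

;; Decoding removes the soft line breaks and turns =3D into "=".
(defvar my-qp-decoded (quoted-printable-decode-string my-qp-sample)
  "Decoded form of `my-qp-sample'.")
```

After decoding, the class value reads "mcnButton " with a trailing
space, and the title value "Profile Link" contains a space.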

mu4e is being used to parse and display the email.  In
html2text-get-attr, execution stops at the following code:

      (cond
       ;; size=3
       ((string-match "[^ ]=[^ ]" this)
	(let ((attr  (nth 0 (split-string this "=")))
	      (value (substring prev (1+ (string-match "=" this)))))

with the message:

    Args out of range: "\"", 6, 1

describe-variable for the variable `this' says:

    this's value is "title=\"Profile"
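
The error data hint at what went wrong.  The string "\"" in the message
is the first argument to substring, that is, the value of prev, while
the index 6 is one more than the position of "=" in this.  A plausible
reading, assuming the attribute text is split on whitespace before this
code runs (the report does not show that part of html2text-get-attr), is
that the trailing space in class="mcnButton " leaves a lone quote as its
own token.  The sketch below reproduces the error under that assumption;
my-h2t-repro is a hypothetical name.

```elisp
(defun my-h2t-repro ()
  "Reproduce the reported error outside html2text (illustrative only).
Return a list (TOKENS ERROR-SYMBOL)."
  (let* ((attrs "class=\"mcnButton \" title=\"Profile Link\"")
         ;; Assumption: the attribute text is split on whitespace.
         (tokens (split-string attrs))
         (prev (nth 1 tokens))          ; a lone double quote
         (this (nth 2 tokens)))         ; title="Profile
    (list tokens
          (condition-case err
              ;; Same expression as in the report: the index is
              ;; computed from THIS but applied to PREV.
              (substring prev (1+ (string-match "=" this)))
            (args-out-of-range (car err))))))
```

Taking the substring of this instead of prev would avoid the range error
in this case, but the value would still be cut off at the first space
("Profile rather than "Profile Link").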

The HTML posted above is the only place in the email where
'title="Profile' occurs.
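
One way to avoid this kind of failure is to scan the attribute text with
a regular expression that treats a quoted value as a unit, instead of
splitting on whitespace first.  The following is a minimal sketch of
that idea.  It is not the code in html2text.el and not necessarily how
the problem would be fixed there; my-h2t-parse-attrs is a hypothetical
helper, and attributes without a value are simply skipped.

```elisp
(defun my-h2t-parse-attrs (attr-string)
  "Return an alist of (NAME . VALUE) pairs found in ATTR-STRING.
Hypothetical helper, not part of html2text.el.  Quoted values may
contain spaces; unquoted values end at whitespace."
  (let ((case-fold-search t)
        (pos 0)
        (result nil))
    ;; Group 1: name.  Groups 2-4: double-quoted, single-quoted,
    ;; or unquoted value.
    (while (string-match
            (concat "\\([-a-z0-9._:]+\\)[ \t\n]*=[ \t\n]*"
                    "\\(?:\"\\([^\"]*\\)\""
                    "\\|'\\([^']*\\)'"
                    "\\|\\([^ \t\n\"'>]+\\)\\)")
            attr-string pos)
      (push (cons (match-string 1 attr-string)
                  (or (match-string 2 attr-string)
                      (match-string 3 attr-string)
                      (match-string 4 attr-string)))
            result)
      ;; Continue scanning after the value just consumed.
      (setq pos (match-end 0)))
    (nreverse result)))
```

Called on the decoded attributes class="mcnButton " title="Profile Link"
target="_blank", this returns the three pairs with the spaces inside the
class and title values preserved.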

In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.12.1)
 of 2014-05-05 on trouble, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11501000
System Description:	Debian GNU/Linux testing (jessie)

Configured using:
 `configure '--build' 'x86_64-linux-gnu' '--build' 'x86_64-linux-gnu'
 '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib'
 '--localstatedir=/var/lib' '--infodir=/usr/share/info'
 '--mandir=/usr/share/man' '--with-pop=yes'
 '--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.3/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.3/site-lisp:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/x86_64-linux-gnu' '--with-x=yes'
 '--with-x-toolkit=gtk3' '--with-toolkit-scroll-bars'
 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fstack-protector
 --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall'
 'LDFLAGS=-Wl,-z,relro' 'CPPFLAGS=-D_FORTIFY_SOURCE=2''

Important settings:
  value of $LC_COLLATE: en_US.UTF-8
  value of $LC_CTYPE: en_US.UTF-8
  value of $LC_MESSAGES: en_US.UTF-8
  value of $LANG: en_US.utf8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Fundamental

Minor modes in effect:
  erc-ring-mode: t
  erc-networks-mode: t
  erc-netsplit-mode: t
  erc-menu-mode: t
  erc-list-mode: t
  erc-pcomplete-mode: t
  erc-button-mode: t
  erc-fill-mode: t
  erc-stamp-mode: t
  erc-autojoin-mode: t
  erc-autoaway-mode: t
  erc-log-mode: t
  erc-track-mode: t
  erc-track-minor-mode: t
  erc-match-mode: t
  erc-smiley-mode: t
  erc-irccontrols-mode: t
  erc-noncommands-mode: t
  erc-move-to-prompt-mode: t
  erc-readonly-mode: t
  diff-auto-refine-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  global-auto-revert-mode: t
  ido-everywhere: t
  shell-dirtrack-mode: t
  display-time-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Recent input:
C-n C-n C-n C-n C-n M-f M-f M-f M-f M-b C-SPC C-e C-b 
C-b C-b M-w <help-echo> <help-echo> <down-mouse-1> 
<mouse-movement> <mouse-1> q U <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <down-mouse-3> <mouse-3> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <down-mouse-3> 
<mouse-3> <down-mouse-3> <mouse-3> <down-mouse-3> <mouse-3> 
<down-mouse-3> <mouse-3> <down-mouse-3> <mouse-3> <down-mouse-3> 
<mouse-3> <down-mouse-3> <mouse-3> <down-mouse-3> <mouse-3> 
<down-mouse-3> <mouse-3> <down-mouse-3> <mouse-3> <down-mouse-3> 
<mouse-3> <down-mouse-3> <mouse-3> <double-down-mouse-3> 
<double-mouse-3> <triple-down-mouse-3> <triple-mouse-3> 
<triple-down-mouse-3> <triple-mouse-3> <triple-down-mouse-3> 
<triple-mouse-3> <triple-down-mouse-3> <help-echo> 
<down-mouse-1> <mouse-1> C-x b d e <backspace> <backspace> 
<backspace> b a c <return> q <backspace> <backspace> 
C-x k <return> M-x C-g <S-up> b u <return> M-x M-p 
M-p M-p M-p M-p C-k t o g g l e SPC <tab> d <tab> e 
<tab> o <backspace> e <tab> <return> q C-x k <return> 
<return> C-x b b a C-g C-g M-x M-p C-e <M-backspace> 
q u <tab> <return> C-g C-g C-g q C-x k C-g C-x b b 
a l <backspace> c k g C-g M-x b a c k <tab> t <tab> 
<return> C-x o M-x d e <backspace> <backspace> e <backspace> 
m u C-g C-x b . e l <return> q <S-up> <return> q C-x 
1 <return> c c c c c n <backspace> X o <backspace> 
<backspace> X o _ _ _ <M-backspace> C-x o C-n <return> 
q C-x 1 <return> q q <help-echo> b u q U <help-echo> 
<help-echo> <help-echo> <down-mouse-1> <mouse-1> b 
u <return> c c c c c C-b C-b C-b C-h v t h <backspace> 
<backspace> <return> <down-mouse-1> <mouse-1> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <help-echo> <mouse-movement> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<down-mouse-1> <mouse-1> <help-echo> <down-mouse-1> 
<mouse-1> <help-echo> <down-mouse-1> <mouse-1> M-x 
r e p <backspace> p o <tab> r t q <backspace> <tab> 
u <backspace> b u <tab> <return>

Recent messages:
[mu4e] Indexing completed; processed 176499, updated 1, cleaned-up 0
[mu4e] Found 1 matching message
Continue...
Break
Continue...
Break
Continue...
Args out of range: "\"", 6, 1
Type "q" to delete help window.
Making completion list...

Load-path shadows:
/home/mah/.emacs.d/elpa/markdown-mode-1.9/markdown-mode hides ~/.emacs.d/lisp/markdown-mode
/home/mah/.emacs.d/elpa/htmlize-1.39/htmlize hides ~/.emacs.d/lisp/htmlize
/home/mah/.emacs.d/elpa/yaml-mode-0.0.9/yaml-mode hides ~/.emacs.d/lisp/yaml-mode
/home/mah/.emacs.d/elpa/javascript-2.2.1/javascript hides ~/.emacs.d/lisp/javascript
/home/mah/.emacs.d/elpa/lisppaste-1.8/lisppaste hides ~/.emacs.d/lisp/lisppaste
/home/mah/.emacs.d/elpa/php-mode-1.5.0/php-mode hides ~/.emacs.d/lisp/php-mode
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-wip hides /usr/share/emacs/24.3/site-lisp/magit/magit-wip
/home/mah/.emacs.d/elpa/magit-1.2.0/magit hides /usr/share/emacs/24.3/site-lisp/magit/magit
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-blame hides /usr/share/emacs/24.3/site-lisp/magit/magit-blame
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-svn hides /usr/share/emacs/24.3/site-lisp/magit/magit-svn
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-stgit hides /usr/share/emacs/24.3/site-lisp/magit/magit-stgit
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-topgit hides /usr/share/emacs/24.3/site-lisp/magit/magit-topgit
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-bisect hides /usr/share/emacs/24.3/site-lisp/magit/magit-bisect
/home/mah/.emacs.d/elpa/magit-1.2.0/magit-key-mode hides /usr/share/emacs/24.3/site-lisp/magit/magit-key-mode
/home/mah/.emacs.d/elpa/magit-1.2.0/rebase-mode hides /usr/share/emacs/24.3/site-lisp/magit/rebase-mode
/usr/share/emacs24/site-lisp/cmake-data/cmake-mode hides /usr/share/emacs/site-lisp/cmake-mode
/usr/share/emacs/24.3/site-lisp/debian-startup hides /usr/share/emacs/site-lisp/debian-startup
/usr/share/emacs24/site-lisp/flim/md4 hides /usr/share/emacs/24.3/lisp/md4
~/.emacs.d/lisp/iimage hides /usr/share/emacs/24.3/lisp/iimage
/usr/share/emacs24/site-lisp/flim/hex-util hides /usr/share/emacs/24.3/lisp/hex-util
/home/mah/.emacs.d/elpa/json-1.2/json hides /usr/share/emacs/24.3/lisp/json
/usr/share/emacs24/site-lisp/dictionaries-common/ispell hides /usr/share/emacs/24.3/lisp/textmodes/ispell
/usr/share/emacs24/site-lisp/dictionaries-common/flyspell hides /usr/share/emacs/24.3/lisp/textmodes/flyspell
/home/mah/.emacs.d/elpa/css-mode-1.0/css-mode hides /usr/share/emacs/24.3/lisp/textmodes/css-mode
/usr/share/emacs24/site-lisp/flim/sasl hides /usr/share/emacs/24.3/lisp/net/sasl
/usr/share/emacs24/site-lisp/flim/ntlm hides /usr/share/emacs/24.3/lisp/net/ntlm
/usr/share/emacs24/site-lisp/flim/hmac-def hides /usr/share/emacs/24.3/lisp/net/hmac-def
/usr/share/emacs24/site-lisp/flim/hmac-md5 hides /usr/share/emacs/24.3/lisp/net/hmac-md5
/usr/share/emacs24/site-lisp/flim/sasl-cram hides /usr/share/emacs/24.3/lisp/net/sasl-cram
/usr/share/emacs24/site-lisp/flim/sasl-digest hides /usr/share/emacs/24.3/lisp/net/sasl-digest
/usr/share/emacs24/site-lisp/flim/sasl-ntlm hides /usr/share/emacs/24.3/lisp/net/sasl-ntlm
/home/mah/.emacs.d/elpa/ert-0.1/ert hides /usr/share/emacs/24.3/lisp/emacs-lisp/ert

Features:
(shadow emacsbug url-http url-auth url-gw mm-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util org-table org-element sql tramp-sh notify dbus xml erc-ring
erc-networks erc-netsplit erc-menu erc-pcomplete erc-button erc-fill
erc-stamp erc-sasl erc-join erc-autoaway erc-log erc-track erc-match
erc-goodies erc erc-backend erc-compat vc-sccs vc-svn vc-cvs vc-rcs
vc-dir ewoc canlock sh-script smie executable log-edit pcvs-util add-log
eudc eudc-vars pp dired-aux diff-mode vc vc-dispatcher vc-git subword
flymake php-extras eldoc php-mode etags cc-langs cc-mode cc-fonts
cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
tramp-cache finder lisp-mnt edebug apropos debug image-file org-remember
org-datetree w3m-search rect conf-mode w3m-form w3m-cookie w3m-bookmark
w3m-tabmenu w3m-session w3m timezone w3m-hist w3m-fb bookmark-w3m
w3m-ems w3m-ccl ccl w3m-favicon w3m-image w3m-proc w3m-util loccur
misearch multi-isearch mailalias ispell qp sort smiley gnus-cite
flow-fill mm-archive mail-extr gnus-async gnus-bcklg gnus-ml gnus-topic
nndraft nnmh nnfolder parse-time epa-file netrc gnutls network-stream
tls gnus-agent gnus-srvr gnus-score score-mode nnvirtual nntp gnus-cache
gnus-pers spam spam-stat gnus-uu yenc gnus-msg gnus-art mm-uu mml2015
gnus-sum nnoo gnus-group gnus-undo nnmail mail-source starttls
gnus-start gnus-spec gnus-int gnus-range gnus-win time-stamp crm
thingatpt cus-edit multi-term term disp-table ehelp ffap url-parse
url-vars face-remap org-colview view mule-util cal-china lunar solar
cal-dst cal-bahai cal-islam cal-hebrew holidays hol-loaddefs cal-iso
vc-bzr test-case-mode fringe-helper cc-defs project-local-variables
org-indent org-wl org-w3m org-vm org-rmail org-mhe org-mew org-irc
org-jsinfo org-infojs org-html org-info org-gnus org-docview org-bibtex
bibtex org-bbdb ob-dot paren help-at-pt grep gnus gnus-ems nnheader
wid-edit chess-autoloads clojure-test-mode-autoloads company-autoloads
css-mode-autoloads dictionary-autoloads elk-test-autoloads
epresent-autoloads eproject-autoloads eproject ert-autoloads
etags-select-autoloads findr-autoloads flycheck-autoloads f-autoloads
fringe-helper-autoloads guess-style-autoloads highlight-80+-autoloads
jimb-patch-autoloads js2-mode-autoloads lisppaste-autoloads
magit-find-file-autoloads magit-gh-pulls-autoloads gh-autoloads
logito-autoloads magit-tramp-autoloads magit-autoloads
markdown-mode-autoloads marmalade-autoloads furl-autoloads
multi-web-mode-autoloads nrepl-autoloads pcache-autoloads
php-extras-autoloads info php-mode-autoloads pkg-info-autoloads
epl-autoloads dash-autoloads project-local-variables-autoloads
relax-autoloads javascript-autoloads json-autoloads s-autoloads
sass-mode-autoloads haml-mode-autoloads finder-inf scpaste-autoloads
htmlize-autoloads shellfm-autoloads ssh-config-mode-autoloads
subatomic-enhanced-theme-autoloads sudoku-autoloads
swank-clojure-autoloads clojure-mode-autoloads slime-repl-autoloads
slime-autoloads test-case-mode-autoloads weblogger-autoloads
xml-rpc-autoloads yaml-mode-autoloads zen-and-art-theme-autoloads
package mu4e mu4e-speedbar speedbar sb-image ezimage dframe mu4e-main
mu4e-view epa derived browse-url mu4e-headers mu4e-compose mu4e-draft
mu4e-actions rfc2368 smtpmail sendmail mu4e-mark mu4e-message html2text
mu4e-proc mu4e-utils doc-view jka-compr image-mode dired mu4e-lists
mu4e-about mu4e-vars message cl-macs gv rfc822 mml mailabbrev mail-utils
gmm-utils mailheader hl-line cl mu4e-meta electric autorevert edmacro
kmacro windmove ido cus-start cus-load appt diary-lib diary-loaddefs
org-clock org-exp ob-exp org-exp-blocks org-agenda org warnings
ob-tangle ob-ref ob-lob ob-table org-footnote org-src ob-comint ob-keys
org-pcomplete org-list org-faces org-entities noutline outline
easy-mmode org-version ob-emacs-lisp ob org-compat org-macs ob-eval
org-loaddefs find-func cal-menu calendar cal-loaddefs remember tramp
tramp-compat auth-source eieio byte-opt bytecomp byte-compile cconv
gnus-util tramp-loaddefs shell pcomplete format-spec advice help-fns
cl-lib advice-preload tabify mm-view mml-smime mml-sec smime
password-cache dig mm-decode mm-bodies mm-encode mailcap mail-parse
rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr server time allout
quail help-mode easymenu unichars epg epg-config w3m-load pylint compile
comint ansi-color ring 50magit time-date tooltip ediff-hook vc-hooks
lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment lisp-mode register page menu-bar
rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax
facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak
czech european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)


-- 
http://hexmode.com/

Despite honor and wealth, we don't last.  We die like any other animal.
        -- Psalm 49:12





