unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#7410: Impossible multibyte->unibyte conversion
From: Stefan Monnier @ 2010-11-15 21:46 UTC
  To: 7410

Package: Emacs
Version: 24.0.50

I get incorrect treatment of accents in gnus-article-wash-html in
the trunk.  More specifically, accents from latin-1 HTML email get
turned into \NNN byte chars.

With extra checks, I can see that the accented chars are properly decoded into
the *mm*<4> buffer, and then in mm-shr, we do

       (mm-with-part handle
	 (when (and charset
		    (setq charset (mm-charset-to-coding-system charset))
		    (not (eq charset 'ascii)))
	   (insert (prog1
		       (mm-decode-coding-string (buffer-string) charset)
		     (erase-buffer)
		     (mm-enable-multibyte))))
	 (libxml-parse-html-region (point-min) (point-max)))

where mm-with-part inserts the `handle' part into a unibyte temp buffer, thus
turning those latin-1 accents back into bytes (well, in my own branch
of Emacs this signals an error instead, which is how I caught it).
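
The effect can be reproduced outside Gnus.  What follows is a minimal
sketch, not the actual code path in mm-shr: it decodes one Latin-1 byte
into a character, pushes that character through a unibyte temp buffer
like the one described above, and checks what comes back.  The function
name `my-unibyte-round-trip' is made up for this illustration.

```elisp
;; Illustration only; `my-unibyte-round-trip' is a hypothetical helper.
(defun my-unibyte-round-trip (string)
  "Insert STRING into a unibyte temp buffer and return the buffer contents."
  (with-temp-buffer
    (set-buffer-multibyte nil)   ; make the temp buffer hold bytes
    (insert string)              ; characters are squeezed into bytes here
    (buffer-string)))

(let ((decoded (decode-coding-string (unibyte-string #xE9) 'latin-1)))
  ;; First element: is the decoded text made of characters?
  ;; Second element: is it still made of characters after the round trip?
  (list (multibyte-string-p decoded)
        (multibyte-string-p (my-unibyte-round-trip decoded))))
```

The decoded string is multibyte, while the string read back from the
unibyte buffer is not, so anything downstream that expects characters
receives bytes instead.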

It looks like mm-handle-buffer does not consistently return bytes (as
it usually does) but also occasionally returns chars.
Such inconsistencies will hurt until we get rid of them.
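
To make the bytes-versus-chars distinction concrete, here is a sketch of
a guard that accepts either kind of input.  It is an assumption-laden
illustration, not something from Gnus: `my-ensure-decoded' is a
hypothetical name, and it assumes that a multibyte string has already
been decoded.

```elisp
;; Hypothetical helper, not part of Gnus or mm-decode.
(defun my-ensure-decoded (text coding-system)
  "Return TEXT as characters.
Unibyte TEXT is treated as bytes and decoded with CODING-SYSTEM.
Multibyte TEXT is assumed to be decoded already and is returned as is."
  (if (multibyte-string-p text)
      text
    (decode-coding-string text coding-system)))

;; Both calls yield the same one-character string.
(my-ensure-decoded (unibyte-string #xE9) 'latin-1)   ; bytes in
(my-ensure-decoded (string #xE9) 'latin-1)           ; chars in
```

A guard like this only papers over the inconsistency at one call site;
it does not replace making the producer return bytes every time.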


        Stefan

         


In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2010-11-04 on ceviche
Windowing system distributor `The X.Org Foundation', version 11.0.10707000
configured using `configure  'CFLAGS=-Wall -Wno-pointer-sign -DUSE_LISP_UNION_TYPE -DSYNC_INPUT -DENABLE_CHECKING -DXASSERTS -DFONTSET_DEBUG -g -O1 -I/usr/include/GNUstep' '--enable-maintainer-mode' '--with-x-toolkit=lucid''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: fr_CH.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Article

Minor modes in effect:
  diff-auto-refine-mode: t
  electric-pair-mode: t
  electric-indent-mode: t
  url-handler-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<select-window> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <switch-frame> 
<select-window> <switch-frame> <switch-frame> <switch-frame> 
<select-window> <switch-frame> e ( p o p t - o - b 
u f f e r <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> - t o - b u f f e r SPC " SPC * m m * < 
4 > > C-e <left> <left> <backspace> <return> M-< <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<switch-frame> <help-echo> <switch-frame> <select-window> 
<switch-frame> <help-echo> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <switch-frame> <select-window> <switch-frame> 
<switch-frame> <select-window> <switch-frame> <help-echo> 
<switch-frame> <select-window> <down-mouse-1> <mouse-1> 
<C-tab> C-s C-w C-w C-a <switch-frame> <help-echo> 
<down-mouse-2> <mouse-2> <switch-frame> <select-window> 
<switch-frame> <select-window> C-e C-c @ C-a <right> 
<down> <left> <right> <down> <left> <right> <down> 
<left> <right> <down> <left> <right> <up> <left> <right> 
<up> <left> <right> <down> <left> <right> <down> <down> 
<left> <right> <down> <left> <left> <left> <left> <right> 
<right> <right> <right> <left> <right> <up> <left> 
<right> <up> <left> <right> <down> <left> <right> <down> 
<left> <right> <down> <left> <right> <down> <left> 
<right> <down> <left> <right> <up> <left> <right> <up> 
<left> <right> <up> <left> <right> <up> <left> <right> 
<up> <left> <right> <switch-frame> <select-window> 
<switch-frame> <switch-frame> <help-echo> <switch-frame> 
<switch-frame> <switch-frame> <switch-frame> <help-echo> 
<switch-frame> <switch-frame> <select-window> <switch-frame> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <switch-frame> 
<select-window> <switch-frame> <switch-frame> <select-window> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<switch-frame> <switch-frame> <help-echo> <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <switch-frame> <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <switch-frame> <help-echo> 
<switch-frame> <switch-frame> <select-window> <switch-frame> 
<select-window> <switch-frame> <select-window> <switch-frame> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<help-echo> <switch-frame> <select-window> <switch-frame> 
<switch-frame> <select-window> <switch-frame> <select-window> 
<help-echo> <switch-frame> <select-window> <select-window> 
M-x r e p o <tab> r <tab> <return>

Recent messages:
Mark saved where search started
mm-shr
Mark saved where search started [3 times]
Mark set
mm-shr
Entering debugger...
#<buffer  *mm*<4>>
Mark set
Mark saved where search started
Making completion list...

Load-path shadows:
/usr/share/emacs23/site-lisp/bbdb/bbdb-migrate hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-migrate
/usr/share/emacs23/site-lisp/bbdb/bbdb hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb
/usr/share/emacs23/site-lisp/bbdb/bbdb-rmail hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-rmail
/usr/share/emacs23/site-lisp/bbdb/bbdb-gnus hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gnus
/usr/share/emacs23/site-lisp/bbdb/bbdb-w3 hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-w3
/usr/share/emacs23/site-lisp/bbdb/bbdb-com hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-com
/usr/share/emacs23/site-lisp/bbdb/bbdb-merge hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-merge
/usr/share/emacs23/site-lisp/bbdb/bbdb-ftp hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-ftp
/usr/share/emacs23/site-lisp/bbdb/bbdb-sc hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-sc
/usr/share/emacs23/site-lisp/bbdb/bbdb-vm hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-vm
/usr/share/emacs23/site-lisp/bbdb/bbdb-gui hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gui
/usr/share/emacs23/site-lisp/bbdb/bbdb-print hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-print
/usr/share/emacs23/site-lisp/bbdb/bbdb-hooks hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-hooks
/usr/share/emacs23/site-lisp/bbdb/bbdb-mhe hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-mhe
/usr/share/emacs23/site-lisp/bbdb/bbdb-whois hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-whois
/usr/share/emacs23/site-lisp/bbdb/bbdb-snarf hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-snarf

Features:
(emacsbug gnus-topic cl-specs shr url-http url-auth url-gw footnote
xscheme warnings trace testcover scheme unsafep re-builder shadow
inf-lisp ielm comint ring elp edebug cust-print vc-bzr filecache
find-func dabbrev multi-isearch diff-mode jka-compr rect pp descr-text
gnus-fun skeleton canlock sha1 hex-util novice woman tutorial help-macro
man assoc info-look info help-at-pt ehelp apropos cus-edit cus-start
cus-load gnus-html browse-url xml url-cache mm-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-util
supercite regi flow-fill executable copyright debug gnus-draft gnus-dup
mule-util sort smiley ansi-color gnus-cite mail-extr gnus-async
gnus-bcklg qp byte-opt bytecomp byte-compile gnus-ml disp-table nnfolder
utf-7 nnimap parse-time tls utf7 nndraft nnmh nnagent nnml gnus-agent
gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu
mml2015 epg-config mm-view smime password-cache dig mailcap nntp
gnus-cache gnus-sum nnoo gnus-group time-date gnus-undo nnmail
mail-source format-spec server gnus-start gnus-spec gnus-int gnus-range
message sendmail rfc822 mml mml-sec mm-decode mm-bodies mm-encode
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils
mailheader gnus-win gnus gnus-ems nnheader mail-utils wid-edit noutline
outline easy-mmode flyspell ispell eldoc checkdoc regexp-opt thingatpt
help-mode easymenu view prog-mode electric url-handlers url-parse
auth-source netrc gnus-util url-vars mm-util mail-prsvr reveal
autoinsert uniquify advice help-fns advice-preload savehist
minibuf-eldef cl cl-loaddefs proof-site proof-autoloads pg-vars
bbdb-autoloads agda2 tooltip ediff-hook vc-hooks lisp-float-type mwheel
x-win x-dnd tool-bar dnd fontset image fringe lisp-mode register page
newcomment menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face files text-properties overlay md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting x-toolkit x multi-tty emacs)






* bug#7410: Impossible multibyte->unibyte conversion
From: Lars Magne Ingebrigtsen @ 2012-04-10 20:53 UTC
  To: Stefan Monnier; +Cc: 7410

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I get incorrect treatment of accents in gnus-article-wash-html in
> the trunk.  More specifically, accents from latin-1 HTML email get
> turned into \NNN byte chars.

I was able to reproduce this bug, but the real problem seemed to be that
the article buffer was in unibyte mode after `C-u g', and that made the
actual insertion go wrong.  I've now fixed that.
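
For anyone wanting to check for this state, a small diagnostic along
these lines reports whether a buffer holds characters or bytes.  This is
only a sketch; `my-buffer-multibyte-p' is a hypothetical name and is not
how the fix itself was made.

```elisp
;; Hypothetical diagnostic helper, for illustration only.
(defun my-buffer-multibyte-p (&optional buffer)
  "Return non-nil if BUFFER holds characters rather than bytes.
BUFFER must be a buffer object and defaults to the current buffer."
  (buffer-local-value 'enable-multibyte-characters
                      (or buffer (current-buffer))))

;; Example: a temp buffer switched to unibyte mode reports nil.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (my-buffer-multibyte-p))
```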

...

Oh, and now I tested the non `C-u g' case.  That still doesn't work.
Back to the drawing board...

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/






* bug#7410: Impossible multibyte->unibyte conversion
From: Lars Magne Ingebrigtsen @ 2012-04-10 20:59 UTC
  To: Stefan Monnier; +Cc: 7410

Found the real bug.  `gnus-article-wash-html' had parsed the displayed
article rather than the original one, so it was missing the charset
info (and other things).  Now fixed in No Gnus, so I expect it to show
up in Emacs 24.1 soon.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




