From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.bugs Subject: bug#7410: Impossible multibyte->unibyte conversion Date: Mon, 15 Nov 2010 16:46:53 -0500 Message-ID: NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: dough.gmane.org 1289859275 23858 80.91.229.12 (15 Nov 2010 22:14:35 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Mon, 15 Nov 2010 22:14:35 +0000 (UTC) To: 7410@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Mon Nov 15 23:14:30 2010 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PI7Jk-0002cS-MD for geb-bug-gnu-emacs@m.gmane.org; Mon, 15 Nov 2010 23:14:29 +0100 Original-Received: from localhost ([127.0.0.1]:60179 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PI7Jj-0000qW-HJ for geb-bug-gnu-emacs@m.gmane.org; Mon, 15 Nov 2010 17:14:27 -0500 Original-Received: from [140.186.70.92] (port=40392 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PI7JW-0000lK-4F for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 17:14:19 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PI7JS-0002OE-6i for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 17:14:14 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:47561) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PI7JS-0002O8-4h for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 17:14:10 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.69) (envelope-from ) id 1PI6pJ-0003zn-Su; Mon, 15 Nov 2010 16:43:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Monnier Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 15 Nov 2010 
21:43:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 7410 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: X-Debbugs-Original-To: bug-gnu-emacs@gnu.org Original-Received: via spool by submit@debbugs.gnu.org id=B.128985733215351 (code B ref -1); Mon, 15 Nov 2010 21:43:01 +0000 Original-Received: (at submit) by debbugs.gnu.org; 15 Nov 2010 21:42:12 +0000 Original-Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PI6oV-0003zY-6m for submit@debbugs.gnu.org; Mon, 15 Nov 2010 16:42:11 -0500 Original-Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PI6oP-0003z9-J5 for submit@debbugs.gnu.org; Mon, 15 Nov 2010 16:42:09 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PI6tC-0005K2-4c for submit@debbugs.gnu.org; Mon, 15 Nov 2010 16:47:03 -0500 Original-Received: from lists.gnu.org ([199.232.76.165]:51014) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PI6tB-0005Jx-US for submit@debbugs.gnu.org; Mon, 15 Nov 2010 16:47:02 -0500 Original-Received: from [140.186.70.92] (port=43616 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PI6t9-0005yW-Pd for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 16:47:01 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PI6t5-0005JN-KN for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 16:46:59 -0500 Original-Received: from ironport2-out.teksavvy.com ([206.248.154.181]:12230 helo=ironport2-out.pppoe.ca) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PI6t5-0005JI-Bt for bug-gnu-emacs@gnu.org; Mon, 15 Nov 2010 16:46:55 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArcHAG844UxFpY76/2dsb2JhbACUVwGNBX1ywHSDEYI5BIRajV8 X-IronPort-AV: E=Sophos;i="4.59,202,1288584000"; d="scan'208";a="82625482" Original-Received: from 
69-165-142-250.dsl.teksavvy.com (HELO ceviche.home) ([69.165.142.250]) by ironport2-out.pppoe.ca with ESMTP/TLS/ADH-AES256-SHA; 15 Nov 2010 16:46:53 -0500 Original-Received: by ceviche.home (Postfix, from userid 20848) id 481C56611E; Mon, 15 Nov 2010 16:46:53 -0500 (EST) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list Resent-Date: Mon, 15 Nov 2010 16:43:01 -0500 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:41653 Archived-At:

Package: Emacs
Version: 24.0.50

I get incorrect treatment of accents in gnus-article-wash-html in the
trunk. More specifically, accents from latin-1 HTML email get turned
into \NNN byte chars.

With extra checks, I can see that the accented chars are properly
decoded into the *mm*<4> buffer, and then in mm-shr, we do

      (mm-with-part handle
        (when (and charset
                   (setq charset (mm-charset-to-coding-system charset))
                   (not (eq charset 'ascii)))
          (insert (prog1
                      (mm-decode-coding-string (buffer-string) charset)
                    (erase-buffer)
                    (mm-enable-multibyte))))
        (libxml-parse-html-region (point-min) (point-max)))

where mm-with-part inserts the `handle' part into a unibyte temp buffer,
thus turning those latin-1 accents back into bytes (well, in my own
branch of Emacs this signals an error instead, which is how I caught
it).

It looks like mm-handle-buffer does not consistently return bytes (as it
usually does) but also occasionally returns chars. Such inconsistencies
will hurt until we get rid of them.

        Stefan


In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2010-11-04 on ceviche
Windowing system distributor `The X.Org Foundation', version 11.0.10707000
configured using `configure 'CFLAGS=-Wall -Wno-pointer-sign -DUSE_LISP_UNION_TYPE -DSYNC_INPUT -DENABLE_CHECKING -DXASSERTS -DFONTSET_DEBUG -g -O1 -I/usr/include/GNUstep' '--enable-maintainer-mode' '--with-x-toolkit=lucid''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: fr_CH.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Article

Minor modes in effect:
  diff-auto-refine-mode: t
  electric-pair-mode: t
  electric-indent-mode: t
  url-handler-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
e ( p o p t - o - b u f f e r - t o - b u f f e r SPC " SPC * m m * < 4 > > C-e M-< C-s C-w C-w C-a C-e C-c @ C-a M-x r e p o r

Recent messages:
Mark saved where search started
mm-shr
Mark saved where search started [3 times]
Mark set
mm-shr
Entering debugger...
#>
Mark set
Mark saved where search started
Making completion list...

Load-path shadows:
/usr/share/emacs23/site-lisp/bbdb/bbdb-migrate hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-migrate
/usr/share/emacs23/site-lisp/bbdb/bbdb hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb
/usr/share/emacs23/site-lisp/bbdb/bbdb-rmail hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-rmail
/usr/share/emacs23/site-lisp/bbdb/bbdb-gnus hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gnus
/usr/share/emacs23/site-lisp/bbdb/bbdb-w3 hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-w3
/usr/share/emacs23/site-lisp/bbdb/bbdb-com hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-com
/usr/share/emacs23/site-lisp/bbdb/bbdb-merge hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-merge
/usr/share/emacs23/site-lisp/bbdb/bbdb-ftp hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-ftp
/usr/share/emacs23/site-lisp/bbdb/bbdb-sc hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-sc
/usr/share/emacs23/site-lisp/bbdb/bbdb-vm hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-vm
/usr/share/emacs23/site-lisp/bbdb/bbdb-gui hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gui
/usr/share/emacs23/site-lisp/bbdb/bbdb-print hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-print
/usr/share/emacs23/site-lisp/bbdb/bbdb-hooks hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-hooks
/usr/share/emacs23/site-lisp/bbdb/bbdb-mhe hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-mhe
/usr/share/emacs23/site-lisp/bbdb/bbdb-whois hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-whois
/usr/share/emacs23/site-lisp/bbdb/bbdb-snarf hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-snarf

Features:
(emacsbug gnus-topic cl-specs shr url-http url-auth url-gw footnote xscheme warnings trace testcover scheme unsafep re-builder shadow inf-lisp ielm comint ring elp edebug cust-print vc-bzr filecache find-func dabbrev multi-isearch diff-mode jka-compr rect pp descr-text gnus-fun skeleton canlock sha1 hex-util novice woman tutorial help-macro man assoc info-look info help-at-pt ehelp apropos cus-edit cus-start cus-load gnus-html
browse-url xml url-cache mm-url url url-proxy url-privacy url-expand url-methods url-history url-cookie url-util supercite regi flow-fill executable copyright debug gnus-draft gnus-dup mule-util sort smiley ansi-color gnus-cite mail-extr gnus-async gnus-bcklg qp byte-opt bytecomp byte-compile gnus-ml disp-table nnfolder utf-7 nnimap parse-time tls utf7 nndraft nnmh nnagent nnml gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 epg-config mm-view smime password-cache dig mailcap nntp gnus-cache gnus-sum nnoo gnus-group time-date gnus-undo nnmail mail-source format-spec server gnus-start gnus-spec gnus-int gnus-range message sendmail rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus gnus-ems nnheader mail-utils wid-edit noutline outline easy-mmode flyspell ispell eldoc checkdoc regexp-opt thingatpt help-mode easymenu view prog-mode electric url-handlers url-parse auth-source netrc gnus-util url-vars mm-util mail-prsvr reveal autoinsert uniquify advice help-fns advice-preload savehist minibuf-eldef cl cl-loaddefs proof-site proof-autoloads pg-vars bbdb-autoloads agda2 tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image fringe lisp-mode register page newcomment menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces cus-face files text-properties overlay md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dbusbind dynamic-setting system-font-setting font-render-setting x-toolkit x multi-tty emacs)
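
The bytes-versus-chars distinction that the report turns on can be sketched in a few lines of Emacs Lisp. This is only an illustration, not code from Gnus: the `demo-` function names are hypothetical, and the sketch relies solely on the standard primitives encode-coding-string, decode-coding-string and multibyte-string-p. It shows that the same latin-1 text can exist either as a unibyte string of raw bytes or as a multibyte string of decoded chars, so a caller has to know which of the two it was handed before deciding whether to decode.

```elisp
;; Sketch only: the `demo-' names are hypothetical, not part of Gnus or Emacs.

(defun demo-latin-1-bytes ()
  "Return the latin-1 encoding of \"café\" as a unibyte string of raw bytes."
  (encode-coding-string "caf\u00e9" 'latin-1))

(defun demo-describe (string)
  "Return `chars' if STRING is multibyte, `bytes' if it is unibyte."
  (if (multibyte-string-p string) 'chars 'bytes))

(defun demo-decode-in-multibyte-buffer (bytes coding-system)
  "Decode BYTES with CODING-SYSTEM and return the text as a multibyte string.
The decoded text passes through a temporary buffer, which is multibyte
under the default setting of `enable-multibyte-characters'."
  (with-temp-buffer
    ;; Decode first, then insert: the buffer only ever receives chars.
    (insert (decode-coding-string bytes coding-system))
    (buffer-string)))
```

Evaluating (demo-describe (demo-latin-1-bytes)) reports bytes, while applying demo-describe to the result of demo-decode-in-multibyte-buffer reports chars. Decoding should happen exactly once, on the bytes form; a producer that sometimes returns one form and sometimes the other makes that impossible to guarantee, which is the inconsistency the report attributes to mm-handle-buffer.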