From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Bug with UTF-8 string and dbus
Date: Wed, 09 Jun 2010 22:42:55 +0200
Organization: Organization?!?
Message-ID: <87fx0wuecw.fsf@lola.goethe.zz>
References: <87typdnr08.fsf@keller.adm.naquadah.org> <87r5kgg1ee.fsf@keller.adm.naquadah.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:125687

Andreas Schwab writes:

> Stefan Monnier writes:
>
>> AFAIK, Emacs's internal encoding is valid utf-8.  It uses private
>> characters for some things, but I don't think that makes it invalid.
>
> The eight-bit characters are encoded outside of the Unicode range, and a
> good utf-8 decoder must treat them as invalid.

Yes, that's the whole point.  Indeed, Emacs' own utf-8 decoder treats
them as invalid too: when Emacs considers the data to be in utf-8
instead of emacs-internal encoding, it will decode the respective codes
into its "raw byte" representation.
Which again is not legal utf-8 (but a rather obvious "extension" of the
utf-8 encoding scheme, which quite artificially stops at 2^20+2^16 or
something similar that I don't accurately remember; that limit is a
consequence of the range encodable with utf-16 with surrogate codes).

-- 
David Kastrup
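
For illustration, here is a minimal Python sketch of the point about the
range limit.  It is not Emacs code and says nothing about how Emacs lays
out its raw bytes internally; encode_extended_utf8 and is_strict_utf8
are hypothetical helper names made up for this example.  The encoder
applies the ordinary utf-8 bit pattern without the U+10FFFF cap (and
without special-casing surrogates), and Python's built-in strict decoder
stands in for a "good utf-8 decoder".

```python
# utf-16 reaches 2^16 code units directly plus 2^10 * 2^10 surrogate pairs,
# which is where the 2^20 + 2^16 limit of the Unicode range comes from.
UNICODE_LIMIT = 2**20 + 2**16  # first value beyond the Unicode range


def encode_extended_utf8(cp: int) -> bytes:
    """Encode cp with the utf-8 bit pattern, ignoring the U+10FFFF cap.

    Hypothetical helper: covers only the 1- to 4-byte forms (up to 0x1FFFFF)
    and does not reject surrogates.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x200000:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("needs more than 4 bytes; not covered by this sketch")


def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict utf-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for cp in (0x10FFFF, UNICODE_LIMIT):
        encoded = encode_extended_utf8(cp)
        print(hex(cp), encoded.hex(), is_strict_utf8(encoded))
```

The last code point inside the range encodes to bytes the strict decoder
accepts; the very next one has the same shape but is rejected, which is
the behaviour Andreas describes for anything encoded outside of the
Unicode range.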