From: Thien-Thi Nguyen
Newsgroups: gmane.emacs.devel
Subject: Re: Creating a coding system
Date: Sat, 20 Dec 2014 16:56:00 +0100
Message-ID: <87r3vu1fjj.fsf@zigzag.favinet>
References: <87ppbeitcs.fsf@fencepost.gnu.org> <87sigasa3n.fsf@igel.home> <878ui2ieu1.fsf@fencepost.gnu.org>
In-Reply-To: <878ui2ieu1.fsf@fencepost.gnu.org> (David Kastrup's message of "Sat, 20 Dec 2014 15:19:18 +0100")
To: emacs-devel@gnu.org

() David Kastrup
() Sat, 20 Dec 2014 15:19:18 +0100

   I am missing the big picture here in some manner.  Does decoding
   not start from a byte stream but rather from an emacs-utf-8 encoded
   version of a byte stream?  That does not seem to make sense to me.

Have you tried specifying ‘:coding-type raw-text’?  I see in
src/coding.c line 5339 (from a mid-October pre-Git tree) the
nice number 1:

    static void
    decode_coding_raw_text (struct coding_system *coding)
    {
      bool eol_dos = [...]
      coding->chars_at_source = 1;
      ...
    }

It is the only instance in src/*.c of that literal value being
assigned to that struct member.  A truly singular hint?
:-D

I imagine that aside from ‘raw-text’, decoding is multi-layered
due to the presence of ‘undecided’ (which requires guesswork,
heuristics, and (maybe) backtracking) and that the design tries
to move data from bytes to characters as soon as possible, to
reduce downstream complexity and for cohesion w/ the rest of
Emacs.  But, that's merely ignorant speculation...

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil