From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master c4782ea: Improve and extend filepos-to-bufferpos
Date: Fri, 19 Jun 2015 09:59:38 +0300
Message-ID: <83y4jgji91.fsf@gnu.org>
References: <20150618120808.22624.13860@vcs.savannah.gnu.org>
To: Stefan Monnier
Cc: emacs-devel@gnu.org

> From: Stefan Monnier
> Cc: Eli Zaretskii
> Date: Thu, 18 Jun 2015 23:17:28 -0400
>
> > +      (if (<= byte eol-offset)
> > +          (setq pos (point-min))
> > +        (setq pos (point-max))))
>
> Aka (setq pos (if (<= byte eol-offset) (point-min) (point-max)))

Yes, but my code is clearer, IMO.

> >      (let ((eol (coding-system-eol-type coding-system))
> >            (type (coding-system-type coding-system))
> > +          (base (coding-system-base coding-system))
> >            (pm (save-restriction (widen) (point-min))))
> > +      (and (eq type 'utf-8-emacs)
> > +           (setq type 'utf-8))
>
> (coding-system-type 'utf-8-emacs) returns `utf-8', so how/when can
> `type' be `utf-8-emacs'?

Never.  I guess I got confused with coding-system-base.

> > +      (and (eq type 'utf-8)
> > +           ;; Any post-read/pre-write conversions mean it's not really UTF-8.
> > +           (not (null (coding-system-get coding-system :post-read-conversion)))
> > +           (setq type 'not-utf-8))
>
> I guess this also applies for latin-N and utf-16, IOW for any value of
> `type', right?

Not really, no.  UTF-8 is special here, in that we believe we know how
to compute the byte position exactly, which is not true when there are
conversions.  Some profoundly non-UTF-8 encodings have the `utf-8' type,
but then apply conversions that make them something very different.
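For readers following the UTF-8 point, the classification can be sketched as a standalone function.  This is a minimal sketch, not the patch itself: `my-effective-coding-type' is a hypothetical name that is not part of Emacs, while `coding-system-type' and `coding-system-get' are the existing functions the patch already calls.  It assumes, as the patch's comment says, that a non-nil `:post-read-conversion' property means a `utf-8'-type coding system cannot be treated as plain UTF-8.

```elisp
;; Hypothetical helper, not part of Emacs.  Returns the coding type of
;; CODING-SYSTEM, except that a `utf-8'-type coding system with a
;; post-read conversion is reported as `not-utf-8', since the conversion
;; can change how file bytes map to buffer characters.
(defun my-effective-coding-type (coding-system)
  "Return the type of CODING-SYSTEM, or `not-utf-8' for converted UTF-8."
  (let ((type (coding-system-type coding-system)))
    (if (and (eq type 'utf-8)
             ;; Assumption: a non-nil post-read conversion marks it as
             ;; "not really UTF-8" for byte-offset purposes.
             (coding-system-get coding-system :post-read-conversion))
        'not-utf-8
      type)))
```

Other types pass through unchanged, which matches the reply above: only UTF-8 gets this special treatment.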
> > +      (and (not (eq type 'utf-8))
> > +           (eq quality 'exact)
> > +           (setq type 'use-exact))
>
> IIUC this makes us use the slow exact code for latin-N.

Only if they ask for 'exact'.

> Why is it needed?

They asked for it, didn't they?

A more important problem is that we handle type before accuracy, so
when exact is requested, the type checks should be bypassed, except
with UTF-8.

> > +        (`utf-16
> > +         ;; Account for BOM, which is always 2 bytes in UTF-16.
> > +         (setq byte (- byte 2))
>
> Should that only be done for utf-16be-with-signature?

Do we have a UTF-16 encoding without a signature?

> > +         ;; In approximate mode, assume all characters are within the
> > +         ;; BMP, i.e. take up 2 bytes.
> > +         (setq byte (/ byte 2))
> > +         (if (= eol 1)
> > +             (filepos-to-bufferpos--dos (+ pm byte) #'byte-to-position)
> > +           (byte-to-position (+ pm byte))))
>
> Shouldn't this use `identity' rather than `byte-to-position'?

This code tested OK for me; feel free to change it if you have a test
that fails.
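The approximate UTF-16 arithmetic in the patch (drop the 2-byte BOM, then assume 2 bytes per character) can be sketched on its own.  `my-utf-16-approx-char-offset' is a hypothetical helper, not an Emacs function; it ignores DOS EOL handling, and clamping offsets that fall inside the BOM to 0 is an assumption added here.  It stops at a character count; whether that count should then be turned into a buffer position via `identity' or `byte-to-position' is the question discussed above.

```elisp
;; Hypothetical helper, not part of Emacs.  Approximates the 0-based
;; character offset for file offset BYTE in a UTF-16 file, assuming a
;; 2-byte BOM at the start and only BMP characters (2 bytes each).
;; DOS (CRLF) end-of-line handling is deliberately left out.
(defun my-utf-16-approx-char-offset (byte)
  "Return the approximate character offset for file offset BYTE in UTF-16."
  ;; Skip the BOM; clamping to 0 is an assumption, not in the patch.
  (let ((payload (max 0 (- byte 2))))
    ;; Each BMP character takes 2 bytes.
    (/ payload 2)))
```

With a BOM, byte offset 10 in the file covers 8 bytes of text, i.e. 4 BMP characters, so `(my-utf-16-approx-char-offset 10)' returns 4.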