From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings and buffers
Date: Fri, 28 Mar 2014 12:21:04 -0700
Organization: UCLA Computer Science Department
Message-ID: <5335CBA0.5050007@cs.ucla.edu>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
>> Code that blithely passes bytes in the range 128-255 to char-equal is
>> *already* buggy.

> There's nothing wrong with those bytes, certainly not when they stand
> for Latin-1 characters.

Sure, and if they stand for Latin-1 characters, the proposed change will
do the right thing.

> How is it a win, when it actually _adds_ bugs? E.g., under your
> proposal, (char-equal 192 224) will yield non-nil when
> case-fold-search is non-nil.

That's not a bug, since À and à (characters 192 and 224) are the same
character, ignoring case.

As I understand it, the scenario you're worried about is that someone is
visiting a unibyte buffer, is doing a case-folded search involving
non-ASCII bytes, and doesn't want these bytes to match their Latin-1
case-folded counterparts.
This scenario is not common enough to worry about. Changing the
behavior for this rare case is a cost, I suppose, but it's outweighed by
the benefit of simplifying char-equal and fixing its semantics to be a
bit saner.

>> Plus, the change is simpler and easier to explain than what we have now,
>> and that is a long-term win.

> I don't see how it is simpler or easier to explain. It replaces one
> lopsided interpretation of 128-255 values with another.

It's simpler because it decouples the rules for char-equal from the
question of whether the current buffer is multibyte. Separation of
concerns is a win.

> I suggested a solution: ignore case-fold-search in unibyte buffers.

Sorry, I didn't see that suggestion. It would be better than what we
have now for char-equal, but it would have undesirable side effects
elsewhere. When I use find-file-literally to visit a file in raw-text
form, it's more convenient if I can type C-s h t m l (or whatever) and
find "HTML". I'd rather not lose that capability.
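The three options argued over above can be compared with a small toy model. It is written in Python, not Emacs Lisp, and it is not Emacs's C implementation. All function names are hypothetical. The "current" rule is an assumed simplification inferred from the scenario above: in a unibyte buffer only ASCII is case-folded. Python's Unicode lowercasing stands in for Emacs's case tables.

```python
"""Toy model of three candidate rules for Emacs's char-equal.

Characters are integer code points; ``unibyte`` stands for a buffer whose
enable-multibyte-characters is nil.  Illustrative only: this is neither
Emacs Lisp nor Emacs's C implementation, and all names are hypothetical.
"""


def fold(c: int) -> int:
    """Case-fold one code point using Python's Unicode lowercasing.

    For 0-255 this agrees with Latin-1, e.g. 192 ('À') -> 224 ('à').
    """
    s = chr(c).lower()
    return ord(s) if len(s) == 1 else c


def char_equal_current(c1: int, c2: int, case_fold: bool, unibyte: bool) -> bool:
    # Assumed existing behaviour: in a unibyte buffer, bytes 128-255 are
    # raw bytes with no case, so only ASCII letters are folded.
    if c1 == c2:
        return True
    if not case_fold:
        return False
    if unibyte and (c1 > 127 or c2 > 127):
        return False
    return fold(c1) == fold(c2)


def char_equal_no_fold_in_unibyte(c1: int, c2: int, case_fold: bool, unibyte: bool) -> bool:
    # Alternative raised in the thread: ignore case-fold-search in unibyte buffers.
    if c1 == c2:
        return True
    if not case_fold or unibyte:
        return False
    return fold(c1) == fold(c2)


def char_equal_proposed(c1: int, c2: int, case_fold: bool, unibyte: bool) -> bool:
    # Proposed behaviour: the answer never depends on ``unibyte``; values
    # 128-255 are always treated as Latin-1 characters.
    del unibyte  # deliberately ignored
    if c1 == c2:
        return True
    if not case_fold:
        return False
    return fold(c1) == fold(c2)


def search_matches(needle: str, haystack: str, char_equal, case_fold: bool, unibyte: bool) -> bool:
    # Naive substring search driven by a char-equal rule, standing in for C-s.
    n = [ord(ch) for ch in needle]
    h = [ord(ch) for ch in haystack]
    for start in range(len(h) - len(n) + 1):
        window = h[start:start + len(n)]
        if all(char_equal(a, b, case_fold, unibyte) for a, b in zip(n, window)):
            return True
    return False


if __name__ == "__main__":
    rules = (char_equal_current, char_equal_no_fold_in_unibyte, char_equal_proposed)
    for rule in rules:
        # Columns: (char-equal 192 224), then C-s "html" in "<HTML>", both in
        # a case-folding unibyte buffer.
        print(rule.__name__, rule(192, 224, True, True),
              search_matches("html", "<HTML>", rule, True, True))
```

Under these assumptions, the three rules print False/True, False/False and True/True. The current rule keeps the ASCII search working but refuses to fold À and à. Ignoring case-fold-search in unibyte buffers loses the "HTML" search. The proposal folds both cases, and its answer does not depend on the buffer type.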