From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Paul Eggert <eggert@cs.ucla.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings and buffers
Date: Fri, 28 Mar 2014 12:21:04 -0700
Organization: UCLA Computer Science Department
Message-ID: <5335CBA0.5050007@cs.ucla.edu>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <jwvd2h7xspc.fsf-monnier+emacs@gnu.org>
	<83d2h6yezx.fsf@gnu.org> <533528B9.9040200@cs.ucla.edu>
	<837g7eybwl.fsf@gnu.org> <5335C288.4090306@cs.ucla.edu>
	<8338i2f95d.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1396034491 13399 80.91.229.3 (28 Mar 2014 19:21:31 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 28 Mar 2014 19:21:31 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Mar 28 20:21:40 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTcLT-0007nT-O0
	for ged-emacs-devel@m.gmane.org; Fri, 28 Mar 2014 20:21:39 +0100
Original-Received: from localhost ([::1]:35633 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTcLT-0008RW-BK
	for ged-emacs-devel@m.gmane.org; Fri, 28 Mar 2014 15:21:39 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33075)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1WTcLI-0008RL-I6
	for emacs-devel@gnu.org; Fri, 28 Mar 2014 15:21:36 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1WTcLB-0006D7-20
	for emacs-devel@gnu.org; Fri, 28 Mar 2014 15:21:28 -0400
Original-Received: from smtp.cs.ucla.edu ([131.179.128.62]:57464)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>)
	id 1WTcL3-0006Av-0T; Fri, 28 Mar 2014 15:21:13 -0400
Original-Received: from localhost (localhost.localdomain [127.0.0.1])
	by smtp.cs.ucla.edu (Postfix) with ESMTP id 37C3E39E801A;
	Fri, 28 Mar 2014 12:21:05 -0700 (PDT)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Original-Received: from smtp.cs.ucla.edu ([127.0.0.1])
	by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id EQ57rfBfWbIW; Fri, 28 Mar 2014 12:21:04 -0700 (PDT)
Original-Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200])
	by smtp.cs.ucla.edu (Postfix) with ESMTPSA id A4ACC39E8013;
	Fri, 28 Mar 2014 12:21:04 -0700 (PDT)
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:24.0) Gecko/20100101 Thunderbird/24.4.0
In-Reply-To: <8338i2f95d.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 131.179.128.62
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171094
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171094>


>> Code that blithely passes bytes in the range 128-255 to char-equal is
>> *already* buggy.
> There's nothing wrong with those bytes, certainly not when they stand
> for Latin-1 characters.

Sure, and if they stand for Latin-1 characters the proposed change will
do the right thing.

> How is it a win, when it actually _adds_ bugs? E.g., under your
> proposal, (char-equal 192 224) will yield non-nil when
> case-fold-search is non-nil.

That's not a bug, since À and à are the same character, ignoring case.
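
For illustration only, here is a small Python sketch of the two readings of
the values 192 and 224. It is not Emacs code and does not model char-equal
itself; the helper names are made up for this example. It only shows that
the two values are equal under case folding when read as Latin-1 characters,
and distinct when read as raw bytes.

```python
# Illustration only: Python, not Emacs Lisp. The helper names are hypothetical.
A_GRAVE_UPPER = 192  # 0xC0, LATIN CAPITAL LETTER A WITH GRAVE in Latin-1
A_GRAVE_LOWER = 224  # 0xE0, LATIN SMALL LETTER A WITH GRAVE in Latin-1


def equal_as_latin1(a: int, b: int, fold_case: bool) -> bool:
    """Compare two byte values interpreted as Latin-1 characters."""
    ca = bytes([a]).decode("latin-1")
    cb = bytes([b]).decode("latin-1")
    if fold_case:
        return ca.lower() == cb.lower()
    return ca == cb


def equal_as_raw_bytes(a: int, b: int) -> bool:
    """Compare two byte values as opaque bytes; case has no meaning here."""
    return a == b


if __name__ == "__main__":
    print(equal_as_latin1(A_GRAVE_UPPER, A_GRAVE_LOWER, fold_case=True))
    print(equal_as_latin1(A_GRAVE_UPPER, A_GRAVE_LOWER, fold_case=False))
    print(equal_as_raw_bytes(A_GRAVE_UPPER, A_GRAVE_LOWER))
```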

As I understand it, the scenario you're worried about is that someone is
visiting a unibyte buffer and is doing a case-folded search involving
non-ASCII bytes and doesn't want these bytes to match their Latin-1
case-folded counterparts.  This scenario is not common enough to worry
about.  Changing the behavior for this rare case is a cost, I suppose,
but it's outweighed by the benefit of simplifying char-equal and fixing
its semantics to be a bit saner.

>> Plus, the change is simpler and easier to explain than what we have now,
>> and that is a long-term win.
> I don't see how it is simpler or easier to explain.  It replaces one
> lopsided interpretation of 128-255 values with another.
>

It's simpler because it decouples the rules for char-equal from the
question of whether the current buffer is multibyte.  Separation of
concerns is a win.

> I suggested a solution: ignore case-fold-search in unibyte buffers.

Sorry, I didn't see that suggestion.  It would be better than what we
have now for char-equal, but it would have undesirable side effects
elsewhere.  When I type find-file-literally to visit a buffer in
raw-text form, it's more convenient if I can type C-s h t m l (or
whatever) and find "HTML".  I'd rather not lose that capability.