From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 14:07:48 +0300
Message-ID: <83mwg9dzzv.fsf@gnu.org>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp>
	<83eh1mfd09.fsf@gnu.org> <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87y4ztp9p8.fsf@fencepost.gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1396091277 1449 80.91.229.3 (29 Mar 2014 11:07:57 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 29 Mar 2014 11:07:57 +0000 (UTC)
Cc: stephen@xemacs.org, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
To: David Kastrup <dak@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Mar 29 12:08:05 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTr7M-00082g-FP
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 12:08:04 +0100
Original-Received: from localhost ([::1]:38427 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTr7M-0006TZ-0v
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 07:08:04 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:47211)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WTr7F-0006Ro-2Q
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 07:08:02 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WTr79-00064g-Ti
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 07:07:57 -0400
Original-Received: from mtaout20.012.net.il ([80.179.55.166]:58218)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1WTr74-00064B-2N; Sat, 29 Mar 2014 07:07:46 -0400
Original-Received: from conversion-daemon.a-mtaout20.012.net.il by
	a-mtaout20.012.net.il (HyperSendmail v2007.08) id
	<0N3700F002L79I00@a-mtaout20.012.net.il>;
	Sat, 29 Mar 2014 14:07:44 +0300 (IDT)
Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0N3700F2F2WWAL00@a-mtaout20.012.net.il>;
	Sat, 29 Mar 2014 14:07:44 +0300 (IDT)
In-reply-to: <87y4ztp9p8.fsf@fencepost.gnu.org>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: Solaris 10
X-Received-From: 80.179.55.166
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171133
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171133>

> From: David Kastrup <dak@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  monnier@IRO.UMontreal.CA,  emacs-devel@gnu.org
> Date: Sat, 29 Mar 2014 11:42:43 +0100
> 
> The current point of contention is about changing the way of
> codepoint-based character operations depending on the unibyte state of
> the current buffer.

This discussion was started to figure out how to get rid of this
dependency in those few places where we have it in Emacs.

> I am not necessarily of the same opinion as Stephen regarding whether or
> not abolishing unibyte buffers is a worthwhile goal.  But I am pretty
> sure that "unibyte" should not be bleeding over into character and
> string operations.

Indeed, and Emacs tries very hard to contain that distinction, so that
it doesn't leak out of the internals.  Mostly, it succeeds, but
sometimes it doesn't.

> A unibyte buffer or unibyte string might error out when trying to insert
> characters out of the range 0..255.

We currently don't do that.  Try (insert "xyz") in a unibyte buffer,
where "xyz" is some non-ASCII string, and watch the fun.
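
For anyone who wants to try that experiment, here is a minimal
sketch.  The function name and the Greek test string are made up for
illustration; the resulting bytes are deliberately not shown, since
they depend on how the running Emacs converts the characters.

```elisp
;; Hypothetical helper, not part of Emacs: insert a non-ASCII
;; (multibyte) string into a unibyte buffer and report what happened.
(defun my-unibyte-insert-demo ()
  "Return a plist describing an insertion into a unibyte buffer.
:error is the signaled error, or nil if none was signaled.
:unibyte is non-nil if the buffer is still unibyte afterwards.
:contents is the list of byte values now in the buffer."
  (with-temp-buffer
    ;; Make the temporary buffer unibyte before inserting anything.
    (set-buffer-multibyte nil)
    (let ((err (condition-case e
                   ;; "\u03b1\u03b2\u03b3" is three Greek letters,
                   ;; all outside the range 0..255.
                   (progn (insert "\u03b1\u03b2\u03b3") nil)
                 (error e))))
      (list :error err
            :unibyte (not enable-multibyte-characters)
            :contents (append (buffer-string) nil)))))
```

Evaluating (my-unibyte-insert-demo) and looking at :contents shows
what actually ended up in the buffer.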

> If we want different semantics for case-fold-search in binary buffers,
> then the solution is setting a buffer-local setting of case-fold-search
> when opening a buffer intended to be manipulated in a binary way.
> 
> But the unibyte setting of the buffer should not affect normal character
> and string operation semantics.  It is a buffer implementation detail
> that should not really have a visible effect apart from making some
> buffer operations impossible.

But if case-fold-search is set to nil in unibyte buffers, and (as we
know) the buffer-local value of case-fold-search does affect functions
that compare text, either because they consult case-fold-search
directly or because they consult the buffer-local case table, then the
unibyte setting does affect the semantics, albeit indirectly.
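
The effect of the buffer-local value on text comparison can be seen
in a small sketch like the following.  The function name is
hypothetical, and it only exercises search-forward, as one example of
a function that consults case-fold-search.

```elisp
;; Hypothetical helper, not part of Emacs: search for "hello" in a
;; buffer containing "Hello", with case-fold-search set to FOLD
;; locally in that buffer.
(defun my-search-with-fold (fold)
  "Return t if \"hello\" is found in a buffer holding \"Hello\".
FOLD is the buffer-local value given to `case-fold-search'."
  (with-temp-buffer
    (insert "Hello")
    (goto-char (point-min))
    ;; Set the value in this buffer only; other buffers are unaffected.
    (setq-local case-fold-search fold)
    ;; A non-nil third argument makes a failed search return nil
    ;; instead of signaling an error.
    (and (search-forward "hello" nil t) t)))

(my-search-with-fold t)    ; => t, case is ignored
(my-search-with-fold nil)  ; => nil, case is significant
```

The same search code gives different answers depending only on a
setting that belongs to the buffer.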

> If something chooses a unibyte buffer representation for some reason, it
> is the responsibility of the same something to switch character
> operations and case-fold-search etc to something making sense in the
> context of its operation.  That may well be through some buffer-local
> setting of case-fold-search etc, but it is not tied to the internal
> representation of the buffer contents.

Not that I disagree with you, but why does it matter whether some code
makes a buffer unibyte or sets its case-fold-search to achieve that
goal?  In both cases, that something tells Emacs to ignore case
conversion; it just uses two different ways of saying so.  If we are
not going to abolish unibyte buffers, how is the difference important?