From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 14:07:48 +0300
Message-ID: <83mwg9dzzv.fsf@gnu.org>
In-reply-to: <87y4ztp9p8.fsf@fencepost.gnu.org>
To: David Kastrup
Cc: stephen@xemacs.org, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org

> From: David Kastrup
> Cc: Eli Zaretskii, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
> Date: Sat, 29 Mar 2014 11:42:43 +0100
>
> The current point of contention is about changing the way of
> codepoint-based character operations depending on the unibyte state of
> the current buffer.

The point for which this discussion was started was how to get rid of
this dependency, in those few places where we have it in Emacs.

> I am not necessarily of the same opinion as Stephen regarding whether or
> not abolishing unibyte buffers is a worthwhile goal.  But I am pretty
> sure that "unibyte" should not be bleeding over into character and
> string operations.

Indeed, and Emacs tries very hard to contain that distinction, so that
it doesn't leak out of the internals.  Mostly, it succeeds, but
sometimes it doesn't.

> A unibyte buffer or unibyte string might error out when trying to insert
> characters out of the range 0..255.

We currently don't do that.  Try (insert "xyz") in a unibyte buffer,
where "xyz" is some non-ASCII string, and watch the fun.

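For concreteness, here is a minimal Emacs Lisp sketch of that
experiment.  The function name my-unibyte-insert-demo and the sample
string are made up for illustration; the sketch only assumes the
standard functions set-buffer-multibyte, insert, and buffer-string.
It makes a temporary buffer unibyte, inserts a string containing a
character above 255, and returns what the buffer reports afterwards,
so the stored values can be inspected.

```elisp
;; Illustrative sketch only; `my-unibyte-insert-demo' is a hypothetical name.
(defun my-unibyte-insert-demo ()
  "Insert a non-ASCII string into a unibyte buffer and report what is stored."
  (with-temp-buffer
    (set-buffer-multibyte nil)           ; make the temporary buffer unibyte
    (insert "a\u20acb")                  ; U+20AC lies outside the range 0..255
    (list enable-multibyte-characters    ; the buffer's unibyte/multibyte flag
          (buffer-size)                  ; number of positions in the buffer
          (multibyte-string-p (buffer-string))
          (append (buffer-string) nil)))) ; stored values as a list of integers
```

Evaluating (my-unibyte-insert-demo) and comparing the last element of
the result with the code point of the inserted character shows what
happened to it.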
> If we want different semantics for case-fold-search in binary buffers,
> then the solution is setting a buffer-local setting of case-fold-search
> when opening a buffer intended to be manipulated in a binary way.
>
> But the unibyte setting of the buffer should not affect normal character
> and string operation semantics.  It is a buffer implementation detail
> that should not really have a visible effect apart from making some
> buffer operations impossible.

But if case-fold-search is set to nil in unibyte buffers, and (as we
know) the buffer-local value of case-fold-search does affect functions
that compare text, either because they consult case-fold-search
directly or because they consult the buffer-local case table, then the
unibyte setting does affect the semantics, albeit indirectly.

> If something chooses a unibyte buffer representation for some reason, it
> is the responsibility of the same something to switch character
> operations and case-fold-search etc to something making sense in the
> context of its operation.  That may well be through some buffer-local
> setting of case-fold-search etc, but it is not tied to the internal
> representation of the buffer contents.

Not that I disagree with you, but why does it matter whether some code
makes a buffer unibyte or sets its case-fold-search, to achieve that
goal?  In both cases, that something tells Emacs to ignore case
conversion; it just uses two different ways of saying that.  If we are
not going to abolish unibyte buffers, how is the difference important?
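
The buffer-local alternative discussed above can be sketched as
follows.  This is only an illustration: the function name
my-case-fold-demo is hypothetical, and the sketch assumes nothing
beyond setq-local, search-forward, and the case-fold-search variable.
It leaves the buffer's representation alone and changes how text is
compared purely through the buffer-local value of case-fold-search.

```elisp
;; Illustrative sketch only; `my-case-fold-demo' is a hypothetical name.
(defun my-case-fold-demo ()
  "Search the same text with `case-fold-search' buffer-locally t, then nil."
  (with-temp-buffer
    (insert "Hello")
    (let (folded exact)
      (setq-local case-fold-search t)    ; searches ignore case
      (goto-char (point-min))
      (setq folded (search-forward "hello" nil t))
      (setq-local case-fold-search nil)  ; searches are now case-sensitive
      (goto-char (point-min))
      (setq exact (search-forward "hello" nil t))
      ;; The buffer's representation never changed between the two searches.
      (list enable-multibyte-characters folded exact))))
```

The second element of the result is the position after the match
found while ignoring case; the third is nil, because the
case-sensitive search finds nothing.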