From: David Kastrup
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 09:40:03 +0100
Message-ID: <8761mxqty4.fsf@fencepost.gnu.org>
In-Reply-To: <83siq1e7kq.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 29 Mar 2014 11:24:05 +0300")

Eli Zaretskii writes:

>> From: David Kastrup
>> Cc: emacs-devel@gnu.org
>> Date: Sat, 29 Mar 2014 08:23:33 +0100
>>
>> Eli Zaretskii writes:
>>
>> >> From: David Kastrup
>> >> Cc: emacs-devel@gnu.org
>> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
>> >>
>> >> >> > Then what do you call a buffer whose "text" is encoded?
>> >> >>
>> >> >> I can't speak for Stephen, of course, but my impression was he would
>> >> >> call it "a bad idea".
>> >> >
>> >> > Then what other ideas to use when Lisp code needs to encode or decode
>> >> > text manually?
>> >>
>> >> Redecode right to a "binary" coding system would be my guess.
>> >
>> > Sorry, I don't follow. Can you tell more what that means?
>>
>> It means a buffer where each _character_ has the same value that the
>> no-longer-available unibyte buffer would have in its bytes/characters.
>
> This doesn't seem to be a complete description of what is suggested.
> E.g., just by looking at the values of characters, it is impossible to
> distinguish between Latin characters below 256 and raw bytes. In a
> unibyte buffer, we know how to make that distinction,

Uh, what? The point of a unibyte buffer is that it does not make the
distinction.

> but if there are no unibyte buffers, something else is needed for
> doing that.

>> You can do that whether or not the conceptual array of 0..255 characters
>> is internally encoded in unibyte or multibyte encodings.
>
> What do you mean by "multibyte encodings" in this context? Are you
> suggesting to store the bytes 128..255 as Latin-1 characters,
> i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> characters?

That would make the most sense, yes.

> Or are you suggesting something else?

You could also use the "raw byte" character encodings we use for not
losing information when reading not properly formed utf-8 files into a
multibyte buffer, but that seems less practical when working with the
character codes.

-- 
David Kastrup
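
For readers who want to see the two options side by side, here is a rough
sketch in Python. It is an analogy only and assumes nothing about how Emacs
implements either scheme: Python's `latin-1` codec stands in for "store
bytes 128..255 as Latin-1 characters", and Python's `surrogateescape` error
handler stands in for the "raw byte" characters mentioned above. All variable
names are made up for the illustration.

```python
# Illustration only: Python codecs standing in for the two representations
# discussed in the thread. This is not how Emacs implements either of them.

raw = bytes(range(256))  # every possible byte value, 0..255

# Option 1: treat each byte as the Latin-1 character with the same code.
as_latin1 = raw.decode("latin-1")
codes = [ord(ch) for ch in as_latin1]

# Stored as UTF-8, each character in 128..255 takes two bytes.
utf8_sizes = {len(ch.encode("utf-8")) for ch in as_latin1[128:]}

# The mapping is reversible, but a decoded U+00E9 and a raw byte 0xE9 are
# now the same character, so the two can no longer be told apart.
round_trip_latin1 = as_latin1.encode("latin-1")

# Option 2: keep undecodable bytes as distinct stand-in characters.
# Python's surrogateescape handler maps a bad byte 0xNN to U+DCNN.
malformed = b"caf\xe9"  # Latin-1 bytes, not well-formed UTF-8
escaped = malformed.decode("utf-8", errors="surrogateescape")
proper = "caf\u00e9"    # the same word with a real U+00E9

escaped_code = ord(escaped[-1])  # differs from 0xE9
round_trip_escaped = escaped.encode("utf-8", errors="surrogateescape")
```

With the first option the character codes are exactly the byte values, which
is what makes it convenient for code that works with the codes directly. With
the second, raw bytes stay distinguishable from Latin characters, at the cost
of codes that no longer equal the byte values.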