From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 11:24:05 +0300
Message-ID: <83siq1e7kq.fsf@gnu.org>
To: David Kastrup
Cc: emacs-devel@gnu.org
> From: David Kastrup
> Cc: emacs-devel@gnu.org
> Date: Sat, 29 Mar 2014 08:23:33 +0100
>
> Eli Zaretskii writes:
>
> >> From: David Kastrup
> >> Cc: emacs-devel@gnu.org
> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
> >>
> >> >> > Then what do you call a buffer whose "text" is encoded?
> >> >>
> >> >> I can't speak for Stephen, of course, but my impression was he would
> >> >> call it "a bad idea".
> >> >
> >> > Then what other ideas to use when Lisp code needs to encode or decode
> >> > text manually?
> >>
> >> Redecode right to a "binary" coding system would be my guess.
> >
> > Sorry, I don't follow. Can you tell more what that means?
>
> It means a buffer where each _character_ has the same value that the
> no-longer-available unibyte buffer would have in its bytes/characters.

This doesn't seem to be a complete description of what is suggested.
E.g., just by looking at the values of characters, it is impossible to
distinguish between Latin-1 characters in the range 128..255 and raw
bytes with the same values.
In a unibyte buffer, we know how to make that distinction, but if there
are no unibyte buffers, something else is needed to do that.

> > The situation I was describing is that I need to do something with
> > undecoded bytes before decoding them, or after encoding them.
>
> You can do that whether or not the conceptual array of 0..255 characters
> is internally encoded in unibyte or multibyte encodings.

What do you mean by "multibyte encodings" in this context? Are you
suggesting that the bytes 128..255 be stored as Latin-1 characters,
i.e. using the 2-byte UTF-8 sequences of the corresponding Latin-1
characters? Or are you suggesting something else?
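For illustration only: a minimal C sketch of the two representations being contrasted, assuming the layout used by Emacs's multibyte text, where a raw byte 0x80..0xFF is the "eight-bit" character 0x3FFF80..0x3FFFFF rather than the Latin-1 character with the same value. The helper names below are hypothetical; their arithmetic is meant to mirror the BYTE8_TO_CHAR, CHAR_TO_BYTE8 and BYTE8_STRING macros in src/character.h, but check the real source before relying on it.

```c
#include <assert.h>

/* Hypothetical helper names.  The arithmetic is meant to mirror Emacs's
   BYTE8_TO_CHAR, CHAR_TO_BYTE8 and BYTE8_STRING macros (src/character.h),
   assuming the eight-bit range 0x3FFF80..0x3FFFFF for raw bytes. */

/* Map raw byte B (0x80..0xFF) to its eight-bit character, which lies
   far away from Latin-1's U+0080..U+00FF. */
static int byte8_to_char(unsigned char b)
{
  assert(b >= 0x80);
  return b + 0x3FFF00;
}

/* Nonzero if C is one of the eight-bit (raw byte) characters. */
static int char_byte8_p(int c)
{
  return c >= 0x3FFF80 && c <= 0x3FFFFF;
}

/* Recover the raw byte from an eight-bit character. */
static unsigned char char_to_byte8(int c)
{
  assert(char_byte8_p(c));
  return (unsigned char) (c - 0x3FFF00);
}

/* Internal 2-byte form of raw byte B (0x80..0xFF): the lead byte is
   0xC0 or 0xC1, which never starts a valid UTF-8 sequence. */
static int byte8_string(unsigned char b, unsigned char out[2])
{
  assert(b >= 0x80);
  out[0] = 0xC0 | ((b >> 6) & 0x01);
  out[1] = 0x80 | (b & 0x3F);
  return 2;
}

/* UTF-8 form of the Latin-1 character whose code is B (0x80..0xFF):
   the lead byte is 0xC2 or 0xC3.  This is the "store the bytes as
   Latin-1 characters" option. */
static int latin1_utf8(unsigned char b, unsigned char out[2])
{
  assert(b >= 0x80);
  out[0] = 0xC0 | (b >> 6);
  out[1] = 0x80 | (b & 0x3F);
  return 2;
}
```

For byte 0xE9, byte8_to_char gives #x3FFFE9 and byte8_string gives C1 A9, while latin1_utf8 gives C3 A9 (the UTF-8 for é). Keep only the values 0..255 and both cases collapse to 0xE9; under the assumption above, it is the distinct lead bytes (C0/C1 versus C2/C3) that keep raw bytes and Latin-1 text apart in multibyte text.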