From: Eli Zaretskii
To: Paul Eggert
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings and buffers
Date: Fri, 28 Mar 2014 11:18:02 +0300
Message-ID: <837g7eybwl.fsf@gnu.org>
In-reply-to: <533528B9.9040200@cs.ucla.edu>

(I retitled the subject, because the unibyte issue is sufficiently
different from what I originally raised.)

> Date: Fri, 28 Mar 2014 00:46:01 -0700
> From: Paul Eggert
> CC: emacs-devel@gnu.org
>
> Eli Zaretskii wrote:
> > How to compare bytes, then?
>
> It depends on what kind of comparison one wants. Simplest is to use
> '='. To ignore case and treat bytes 128-255 as Latin-1 characters, use
> 'downcase' first. To ignore case and treat bytes 128-255 as
> uninterpreted bit patterns, use 'unibyte-char-to-multibyte' before
> downcasing. Etc.
>
> > we don't have a way of distinguishing between characters and
> > bytes, unless we look on something besides the arguments themselves.
>
> Yes, that's right.

Which is why your suggestions above will not necessarily DTRT.
Arbitrary interpretation of bytes 128-255 as Latin-1 is not guaranteed
to be correct, and therefore 'downcase' will sometimes produce
unexpected results, unless we can make sure, somehow, that raw bytes
will never be exposed to Lisp as having these values.
Unless you show a practical way towards the latter goal, what you suggest will just replace one set of subtly buggy behaviors with another (in which case I vote for what we already have, because that one is at least well known and has passed some test of time).
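
The Latin-1 concern can be illustrated outside Emacs. What follows is a minimal Python sketch, not Emacs Lisp, and it says nothing about how 'downcase' is implemented; the helper names fold_as_latin1 and fold_ascii_only are hypothetical and exist only for this example. It assumes the bytes in question might be UTF-8 rather than Latin-1, and shows that case-folding them as Latin-1 is right for one input and corrupts the other.

```python
def fold_as_latin1(raw: bytes) -> bytes:
    """Lowercase raw bytes by pretending every byte is a Latin-1 character."""
    return raw.decode("latin-1").lower().encode("latin-1")


def fold_ascii_only(raw: bytes) -> bytes:
    """Lowercase ASCII letters only; bytes 128-255 pass through untouched."""
    return raw.lower()


# Case 1: the bytes really are Latin-1 text. Byte 0xC0 is 'À', and folding
# it to 0xE0 ('à') is the intended result.
latin1_text = "À".encode("latin-1")
latin1_folded = fold_as_latin1(latin1_text)

# Case 2: the bytes are UTF-8 text. 'Я' (U+042F) is the two bytes D0 AF.
# Read as Latin-1, 0xD0 looks like 'Ð' and is folded to 0xF0, so the
# result is neither the original letter nor its lowercase form.
utf8_text = "Я".encode("utf-8")
utf8_folded = fold_as_latin1(utf8_text)
utf8_expected = "я".encode("utf-8")

print(latin1_folded.hex())               # e0
print(utf8_folded.hex())                 # f0af
print(utf8_expected.hex())               # d18f
print(fold_ascii_only(utf8_text).hex())  # d0af
```

The ASCII-only fold leaves the high bytes alone, which is the safe choice when nothing besides the byte values says how they should be interpreted.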