From: Stefan Monnier via Users list for the GNU Emacs text editor
Newsgroups: gmane.emacs.help
Subject: Re: More confusion about multibyte vs unibyte strings
Date: Fri, 06 May 2022 13:39:08 -0400
To: help-gnu-emacs@gnu.org

>>> If the search string is multibyte (in my mind this means "multiple bytes
>>> per character", I guess that's where I went wrong), you have to encode
>>
>> In ELisp, "multibyte" means "a sequence of characters", whereas
>> "unibyte" means "a sequence of bytes".
>
> Okay, thanks. I'd thought that distinction was covered by "encoded" vs
> "decoded" strings. Maybe the lesson will stick this time.

There's no reliable way to determine whether a string is decoded (other
than to trace its origin and figure out what the code intended it to
mean).  This said, multibyte/unibyte can be used as an approximation of
decoded/encoded (my own local hacks include signaling errors when trying
to decode a multibyte string or to encode a unibyte string, but it trips
over various places where we do that for legitimate reasons :-( )


        Stefan
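
For illustration, the multibyte/unibyte approximation of decoded/encoded
could be sketched roughly as follows.  This is only a sketch, not the local
hacks mentioned in the message: the `my-strict-encode' and `my-strict-decode'
names are hypothetical, and only `multibyte-string-p', `encode-coding-string'
and `decode-coding-string' are actual Emacs functions.

```elisp
;; Hypothetical helpers; the `my-' names are NOT part of Emacs.
;; They treat "multibyte" as "decoded" and "unibyte" as "encoded",
;; which is only an approximation.

(defun my-strict-encode (string coding-system)
  "Encode STRING with CODING-SYSTEM, signaling an error on unibyte input."
  (unless (multibyte-string-p string)
    (error "Refusing to encode a unibyte string: %S" string))
  (encode-coding-string string coding-system))

(defun my-strict-decode (bytes coding-system)
  "Decode BYTES with CODING-SYSTEM, signaling an error on multibyte input."
  (when (multibyte-string-p bytes)
    (error "Refusing to decode a multibyte string: %S" bytes))
  (decode-coding-string bytes coding-system))

;; "\u00e9" forces the literal to be a multibyte string (characters).
(let* ((text  "caf\u00e9")
       (bytes (my-strict-encode text 'utf-8)))
  (list (multibyte-string-p text)    ; t:   a sequence of 4 characters
        (length text)                ; 4
        (multibyte-string-p bytes)   ; nil: a sequence of 5 bytes
        (length bytes)               ; 5
        (string= (my-strict-decode bytes 'utf-8) text))) ; t
```

Note that a pure-ASCII literal such as "abc" is a unibyte string, so
(my-strict-encode "abc" 'utf-8) signals an error even though encoding it
is perfectly reasonable.  That is one example of where a check like this
trips over legitimate uses.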