From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
Subject: Re: More confusion about multibyte vs unibyte strings
Date: Thu, 05 May 2022 17:45:36 -0700
Message-ID: <87levfmqtr.fsf@ericabrahamsen.net>
References: <874k23or0c.fsf@ericabrahamsen.net> <83zgjv288x.fsf@gnu.org> <87v8ujn7ja.fsf@ericabrahamsen.net> <83tua3237r.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

Eli Zaretskii writes:

>> From: Eric Abrahamsen
>> Date: Thu, 05 May 2022 11:44:41 -0700
>>
>> > Why does it "mess things up", and what exactly is the nature of the
>> > mess-up? A pure-ASCII string can be either unibyte or multibyte, and
>> > that shouldn't change a thing.
>>
>> If the string is not ASCII, we need to encode it before sending to the
>> server, and tell the server what encoding we used. Microsoft Exchange
>> servers can't handle any encoding other than ascii.
>
> What do you mean by "ascii encoding" in this context?
>
> When you say that Microsoft Exchange can't handle any encoding other
> than ascii, does it mean it cannot handle _any_ non-ASCII addressee
> names? That'd be hard to believe, because such addressee names are
> nowadays in wide use. So I guess you mean something else, but what?

The IMAP search command can look like "UID SEARCH", or "UID SEARCH
CHARSET XXX". Specifying no charset is (I think) the same as specifying
US-ASCII, which is the only charset that Exchange accepts for the search
command.

If the search string is multibyte (in my mind this means "multiple bytes
per character", I guess that's where I went wrong), you have to encode
it as something, tell the server what charset you used to encode it,
then send both the encoded string and the number of bytes it represents.
The gnus-search code encodes it as emacs-utf-8, and then sends UID
SEARCH CHARSET UTF-8, which Exchange won't accept.

>> So if our code thinks a string isn't ascii, it sends the encoding
>> message to the IMAP server, and Exchange blows up.
>
> Encoding ascii yields a string that is identical to the original (IIUC
> what you mean by "encoding"), so I don't follow you here.
>
>> If the string is ascii, we don't try to encode it, and everything's
>> fine. So I need to know whether the string is actually ascii or not.
>
> You can do that using the regexp class [:ascii:], I guess.

That's how I'll solve it, then.
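
For illustration only, the decision described above can be sketched
roughly as follows. This is Python rather than Emacs Lisp, and it is not
the gnus-search code: the helper name build_uid_search is hypothetical,
the TEXT search key is just one possible key, the IMAP command tag is
left out, and the non-ASCII branch assumes the server advertises
LITERAL+ so the literal can be sent without waiting for a continuation
response.

```python
def build_uid_search(term: str) -> bytes:
    """Return the bytes of a UID SEARCH for TERM (hypothetical helper).

    The command tag is omitted, and TERM is assumed to contain no CR or LF.
    """
    if term.isascii():
        # Pure ASCII: send no CHARSET argument at all, which is the case
        # the thread says Exchange accepts. Escape backslash and double
        # quote so the term is a valid quoted string.
        quoted = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'UID SEARCH TEXT "{quoted}"\r\n'.encode("ascii")

    # Non-ASCII: encode the term, name the charset, and send the encoded
    # bytes as a literal whose prefix is the length in bytes, not in
    # characters. The "+" marks a non-synchronizing literal (LITERAL+).
    payload = term.encode("utf-8")
    header = f"UID SEARCH CHARSET UTF-8 TEXT {{{len(payload)}+}}\r\n"
    return header.encode("ascii") + payload + b"\r\n"
```

The point of the sketch is that the branch is taken on whether the text
is actually ASCII, not on how the string happens to be represented. In
Emacs Lisp the corresponding test would use the [:ascii:] regexp class
suggested above, since a pure-ASCII string can be either unibyte or
multibyte.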