From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: help-gnu-emacs@gnu.org
Subject: Re: More confusion about multibyte vs unibyte strings
Date: Thu, 05 May 2022 17:45:36 -0700
Message-ID: <87levfmqtr.fsf@ericabrahamsen.net>
In-Reply-To: 83tua3237r.fsf@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>> Date: Thu, 05 May 2022 11:44:41 -0700
>>
>> > Why does it "mess things up", and what exactly is the nature of the
>> > mess-up? A pure-ASCII string can be either unibyte or multibyte, and
>> > that shouldn't change a thing.
>>
>> If the string is not ASCII, we need to encode it before sending to the
>> server, and tell the server what encoding we used. Microsoft Exchange
>> servers can't handle any encoding other than ascii.
>
> What do you mean by "ascii encoding" in this context?
>
> When you say that Microsoft Exchange can't handle any encoding other
> than ascii, does it mean it cannot handle _any_ non-ASCII addressee
> names? That'd be hard to believe, because such addressee names are
> nowadays in wide use. So I guess you mean something else, but what?

The IMAP search command can look like "UID SEARCH", or "UID SEARCH
CHARSET XXX". Specifying no charset is (I think) the same as specifying
US-ASCII, which is the only charset that Exchange accepts for the search
command.

If the search string is multibyte (in my mind this means "multiple bytes
per character"; I guess that's where I went wrong), you have to encode
it as something, tell the server which charset you used, and then send
both the encoded string and the number of bytes it occupies.
The gnus-search code encodes it as emacs-utf-8, and then sends UID
SEARCH CHARSET UTF-8, which Exchange won't accept.

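
As a rough illustration of the encode-and-count step described above, here
is a small Python sketch. It is not the gnus-search code: the function name
and the choice of the TEXT search key are assumptions made for the example.
It relies on the IMAP "literal" form, where the byte count in braces is
followed by CRLF and then exactly that many octets.

```python
def charset_search_command(term: str) -> bytes:
    """Build an untagged UID SEARCH command that declares CHARSET UTF-8.

    Hypothetical helper for illustration only; TEXT is an assumed search key.
    """
    payload = term.encode("utf-8")
    # The count is in bytes, not characters: "é" is 1 character but 2 bytes.
    header = "UID SEARCH CHARSET UTF-8 TEXT {%d}\r\n" % len(payload)
    # A real client would normally wait for the server's continuation
    # response after the header before sending the payload.
    return header.encode("ascii") + payload
```

For a term such as "héllo" the count sent is 6, although the string has
only 5 characters. A server that rejects the CHARSET argument, as Exchange
is described as doing here, would refuse this form regardless of the term.
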
>> So if our code thinks a string isn't ascii, it sends the encoding
>> message to the IMAP server, and Exchange blows up.
>
> Encoding ascii yields a string that is identical to the original (IIUC
> what you mean by "encoding"), so I don't follow you here.
>
>> If the string is ascii, we don't try to encode it, and everything's
>> fine. So I need to know whether the string is actually ascii or not.
>
> You can do that using the regexp class [:ascii:], I guess.

That's how I'll solve it, then.
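
The idea of testing the content of the string, rather than how it happens
to be represented, can be sketched in Python as follows. This is only an
analogue of matching against the [:ascii:] regexp class; the function name
is hypothetical and the character range stands in for that class.

```python
import re

# Matches only strings made up entirely of characters in the ASCII range.
_ASCII_ONLY = re.compile(r"[\x00-\x7f]*")


def needs_charset(term: str) -> bool:
    """Return True when `term` contains at least one non-ASCII character."""
    return _ASCII_ONLY.fullmatch(term) is None
```

With a check like this, a pure-ASCII search term can go out as a plain
"UID SEARCH" with no CHARSET argument, and only terms that really contain
non-ASCII characters take the encoded path.
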