On Wed, Dec 4, 2013 at 9:29 AM, Eli Zaretskii wrote:
> > From: Josh
> > Date: Wed, 4 Dec 2013 06:00:46 -0800
> > Cc: Michael Albinus, 16048@debbugs.gnu.org
> >
> > On Wed, Dec 4, 2013 at 5:07 AM, Andreas Schwab wrote:
> > > michael.albinus@gmx.de writes:
> > >
> > > > The following form evals to nil:
> > > >
> > > > (string-equal "\377" "ÿ")
> > >
> > > "\377" is a unibyte string. When converted to multibyte it yields
> > > "\x3fffff".
> >
> > At least as of 24.3, the manual[0] suggests that such a conversion
> > should not occur in this case:
>
> And it doesn't occur, indeed:
>
>   (multibyte-string-p "\377")
>     => nil
>
> > You can also use hexadecimal escape sequences (`\xN') and octal
> > escape sequences (`\N') in string constants. *But beware:* If a
> > string constant contains hexadecimal or octal escape sequences,
> > and these escape sequences all specify unibyte characters (i.e.,
> > less than 256), and there are no other literal non-ASCII
> > characters or Unicode-style escape sequences in the string, then
> > Emacs automatically assumes that it is a unibyte string. That is
> > to say, it assumes that all non-ASCII characters occurring in the
> > string are 8-bit raw bytes.
> >
> > [0] (info "(elisp) Non-ASCII in Strings")
>
> Best citation contest? you're on!

No, thanks. I haven't entered such contests in many years.

> -- Function: string= string1 string2
> This function returns `t' if the characters of the two strings
> match exactly. Symbols are also allowed as arguments, in which
> case the symbol names are used. Case is always significant,
> regardless of `case-fold-search'.
>
> [...]
>
> For technical reasons, a unibyte and a multibyte string are
> `equal' if and only if they contain the same sequence of character
> codes and all these codes are either in the range 0 through 127
> (ASCII) or 160 through 255 (`eight-bit-graphic'). However, when a
> unibyte string is converted to a multibyte string, all characters
> with codes in the range 160 through 255 are converted to
> characters with higher codes, whereas ASCII characters remain
> unchanged. Thus, a unibyte string and its conversion to multibyte
> are only `equal' if the string is all ASCII.
>
> Note the last sentence.

Yes, I must have misunderstood Andreas' meaning; I believed he was
suggesting that the two strings compared differently due to "\377"
having been converted to a multibyte string and therefore miscomparing
with the unibyte (or so I thought) string "ÿ". I see now that I had it
exactly backwards. Thanks for setting me straight.
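
For anyone else who trips over this, the distinction can be checked
directly. What follows is only a minimal sketch, assuming the Emacs 24.x
behavior discussed above. The function name `my-unibyte-demo' is made up
for illustration, and "\u00ff" stands in for the literal "ÿ" (a
Unicode-style escape forces a multibyte string, which is what the
literal becomes when read from a multibyte buffer).

```elisp
;; Hypothetical helper, not part of Emacs: collects the observations
;; from this thread into one list.
(defun my-unibyte-demo ()
  (let* ((raw    "\377")                    ; octal escape below 256: unibyte
         (y      "\u00ff")                  ; Unicode escape: multibyte U+00FF
         (raw-mb (string-to-multibyte raw))) ; explicit conversion of RAW
    (list (multibyte-string-p raw)          ; nil: RAW is unibyte
          (multibyte-string-p y)            ; t: Y is multibyte
          (aref raw 0)                      ; 255, a raw byte
          (aref y 0)                        ; 255, the character U+00FF
          (aref raw-mb 0)                   ; #x3fffff, raw byte 255 as a character
          (string-equal raw y)              ; nil, the result reported above
          (string-equal raw-mb y))))        ; nil: conversion does not yield U+00FF

(my-unibyte-demo)
;; => (nil t 255 255 4194303 nil nil)
```

So it is "ÿ" that is multibyte and "\377" that stays unibyte, and even
an explicit conversion of "\377" gives the raw-byte character #x3fffff
rather than U+00FF.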