From: Philipp Stephani
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 23 Dec 2017 14:29:56 +0000
> Cc: emacs-devel@gnu.org, phst@google.com
>
>  OK, but why do we need external functions for doing that?  What is
>  missing in our own code to detect such a situation?
>
> Not much I think, it's just easiest to use Gnulib functions because they are well-documented, have a clean
> interface, and are probably bug-free.
> coding.c has check_utf_8, which is quite similar, but has an incompatible interface (it takes struct
> coding_system objects) and also checks for embedded newlines, which isn't necessary here.

So let's use check_utf_8, as its downsides don't sound serious to me,
and OTOH using unistring functions will bloat Emacs for the benefit of
a single use case, not to mention create two different methods for
doing the same job, which IMO is even more confusing to any newcomer
to the Emacs internals.

Btw, doesn't find_charsets_in_text do the same job cleaner and
quicker?  AFAIU, all you need is make sure there are no characters
from the 2 eight-bit-* charsets in the text, or did I miss something?

What I need to check is one of the following:

- Is the initial string either a well-formed UTF-8 unibyte string, or a
  multibyte string that represents a Unicode scalar value sequence?
- Is the encoded string a well-formed UTF-8 unibyte string?

Given my understanding of the implementation of coding.c, these two
criteria should be equivalent. (Unfortunately that doesn't seem to be
documented.) So I choose to implement the second check, which is easier
and allows delaying the check until we know we have to signal an error.
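
To make the second criterion concrete, here is a minimal standalone sketch of a well-formedness check on a byte string. It is an assumption-laden illustration, not the code in coding.c and not the Gnulib implementation: the name is_well_formed_utf8 is hypothetical, and the function takes a plain byte pointer and length rather than a Lisp string or a struct coding_system. It accepts exactly the byte sequences that encode Unicode scalar values, so it rejects overlong forms, surrogates, and anything above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Emacs or Gnulib: return true if the
   N bytes at S are well-formed UTF-8, i.e. encode only Unicode scalar
   values (no overlong forms, no surrogates, nothing above U+10FFFF).  */
bool
is_well_formed_utf8 (const uint8_t *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      uint8_t b = s[i];
      size_t len;                    /* total bytes in this sequence */
      uint8_t lo = 0x80, hi = 0xBF;  /* allowed range of the 2nd byte */

      if (b <= 0x7F)
        {
          i++;                       /* ASCII */
          continue;
        }
      else if (b >= 0xC2 && b <= 0xDF)
        len = 2;                     /* 0xC0 and 0xC1 would be overlong */
      else if (b >= 0xE0 && b <= 0xEF)
        {
          len = 3;
          if (b == 0xE0)
            lo = 0xA0;               /* exclude overlong 3-byte forms */
          else if (b == 0xED)
            hi = 0x9F;               /* exclude surrogates U+D800..U+DFFF */
        }
      else if (b >= 0xF0 && b <= 0xF4)
        {
          len = 4;
          if (b == 0xF0)
            lo = 0x90;               /* exclude overlong 4-byte forms */
          else if (b == 0xF4)
            hi = 0x8F;               /* exclude values above U+10FFFF */
        }
      else
        return false;                /* stray continuation or invalid lead */

      if (n - i < len)
        return false;                /* truncated sequence */
      if (s[i + 1] < lo || s[i + 1] > hi)
        return false;
      for (size_t k = 2; k < len; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;              /* not a continuation byte */

      i += len;
    }

  return true;
}
```

A check of this shape only needs the encoded bytes, which is why it can be deferred until an error has to be signaled. It deliberately does not look for embedded newlines.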