From: Eli Zaretskii <eliz@gnu.org>
To: Po Lu <luangruo@yahoo.com>
Cc: emacs-devel@gnu.org
Subject: Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
Date: Sat, 23 Mar 2024 12:24:03 +0200
Message-ID: <86a5mpz1oc.fsf@gnu.org>
> diff --git a/lisp/term/android-win.el b/lisp/term/android-win.el
> index 8d262e5da98..6512ef81ff7 100644
> --- a/lisp/term/android-win.el
> +++ b/lisp/term/android-win.el
> @@ -529,5 +529,94 @@ accessible to other programs."
> (android-browse-url-internal url send))
>
> \f
> +;; Coding systems used by androidvfs.c.
> +
> +(define-ccl-program android-encode-jni
> + `(2 ((loop
> + (read r0)
> + (if (r0 < #x1) ; 0x0 is encoded specially in JNI environments.
> + ((write #xc0)
> + (write #x80))
> + ((if (r0 < #x80) ; ASCII
> + ((write r0))
> + (if (r0 < #x800) ; \u0080 - \u07ff
> + ((write ((r0 >> 6) | #xC0))
> + (write ((r0 & #x3F) | #x80)))
> + ;; \u0800 - \uFFFF
> + (if (r0 < #x10000)
> + ((write ((r0 >> 12) | #xE0))
> + (write (((r0 >> 6) & #x3F) | #x80))
> + (write ((r0 & #x3F) | #x80)))
> + ;; Supplementary characters must be converted into
> + ;; surrogate pairs before encoding.
> + (;; High surrogate
> + (r1 = ((((r0 - #x10000) >> 10) & #x3ff) + #xD800))
> + ;; Low surrogate.
> + (r2 = (((r0 - #x10000) & #x3ff) + #xDC00))
> + ;; Write both surrogate characters.
> + (write ((r1 >> 12) | #xE0))
> + (write (((r1 >> 6) & #x3F) | #x80))
> + (write ((r1 & #x3F) | #x80))
> + (write ((r2 >> 12) | #xE0))
> + (write (((r2 >> 6) & #x3F) | #x80))
> + (write ((r2 & #x3F) | #x80))))))))
> + (repeat))))
> + "Encode characters from the input buffer for Java virtual machines.")
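For readers less familiar with CCL, here is a rough Python transcription of the quoted android-encode-jni program. It is an illustration only, not part of the patch, and the function name encode_jni is hypothetical. It follows the same branches as the CCL code:

- NUL becomes the pair C0 80.
- BMP characters use the usual 1-to-3-byte UTF-8 forms.
- Supplementary characters are split into a UTF-16 surrogate pair, and each half is written as 3 bytes.

```python
def encode_jni(text: str) -> bytes:
    """Hypothetical Python transcription of the CCL program android-encode-jni."""
    out = bytearray()
    for ch in text:
        c = ord(ch)
        if c < 0x1:
            # NUL is written as C0 80, so the output never contains a 0 byte.
            out += b"\xc0\x80"
        elif c < 0x80:
            # ASCII: one byte, unchanged.
            out.append(c)
        elif c < 0x800:
            # U+0080..U+07FF: two bytes, as in standard UTF-8.
            out.append((c >> 6) | 0xC0)
            out.append((c & 0x3F) | 0x80)
        elif c < 0x10000:
            # U+0800..U+FFFF: three bytes, as in standard UTF-8.
            out.append((c >> 12) | 0xE0)
            out.append(((c >> 6) & 0x3F) | 0x80)
            out.append((c & 0x3F) | 0x80)
        else:
            # Supplementary character: compute the UTF-16 surrogate pair,
            # then write each surrogate as its own three-byte sequence.
            high = (((c - 0x10000) >> 10) & 0x3FF) + 0xD800
            low = ((c - 0x10000) & 0x3FF) + 0xDC00
            for unit in (high, low):
                out.append((unit >> 12) | 0xE0)
                out.append(((unit >> 6) & 0x3F) | 0x80)
                out.append((unit & 0x3F) | 0x80)
    return bytes(out)
```

For BMP text without NUL, the result should coincide with ordinary UTF-8. Only NUL and non-BMP characters come out differently. For example, U+1F600 becomes ED A0 BD ED B8 80 rather than the standard 4-byte F0 9F 98 80.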
AFAIU, this is because Java uses UTF-16 encoded strings to support
Unicode, is that right? If so, why not use encode-coding and
decode-coding to en/decode between UTF-16 and the internal
representation? AFAIR, we want to deprecate CCL, and thus using it in
new code should be avoided.
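The question above turns on how Java's UTF-16 strings relate to the bytes the CCL program emits. The Python sketch below suggests one way to view that output: take the UTF-16 code units and write each one in UTF-8 form, mapping NUL to C0 80. This corresponds to the "modified UTF-8" described in the JNI specification, which is presumably what androidvfs.c passes to JNI string functions; that usage is an assumption, not confirmed by the quoted patch. The function name encode_jni_via_utf16 is hypothetical, and this is not Emacs code.

```python
def encode_jni_via_utf16(text: str) -> bytes:
    """Hypothetical sketch: JNI "modified UTF-8" derived from UTF-16 code units."""
    # 1. Encode to UTF-16 (big-endian, no BOM) and recover the 16-bit code units.
    raw = text.encode("utf-16-be")
    units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    # 2. Encode every code unit on its own; "surrogatepass" lets each surrogate
    #    half through as a three-byte sequence instead of raising an error.
    out = "".join(chr(u) for u in units).encode("utf-8", "surrogatepass")
    # 3. Only U+0000 produces a 0 byte in UTF-8; JNI expects C0 80 instead.
    return out.replace(b"\x00", b"\xc0\x80")
```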
Thread overview: 2 messages
2024-03-23 10:24 Eli Zaretskii [this message]
2024-03-23 12:11 Po Lu, reply: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names