* Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
@ 2024-03-23 10:24 Eli Zaretskii
2024-03-23 12:11 ` Po Lu
From: Eli Zaretskii @ 2024-03-23 10:24 UTC (permalink / raw)
To: Po Lu; +Cc: emacs-devel
> diff --git a/lisp/term/android-win.el b/lisp/term/android-win.el
> index 8d262e5da98..6512ef81ff7 100644
> --- a/lisp/term/android-win.el
> +++ b/lisp/term/android-win.el
> @@ -529,5 +529,94 @@ accessible to other programs."
> (android-browse-url-internal url send))
>
> \f
> +;; Coding systems used by androidvfs.c.
> +
> +(define-ccl-program android-encode-jni
> + `(2 ((loop
> + (read r0)
> + (if (r0 < #x1) ; 0x0 is encoded specially in JNI environments.
> + ((write #xc0)
> + (write #x80))
> + ((if (r0 < #x80) ; ASCII
> + ((write r0))
> + (if (r0 < #x800) ; \u0080 - \u07ff
> + ((write ((r0 >> 6) | #xC0))
> + (write ((r0 & #x3F) | #x80)))
> + ;; \u0800 - \uFFFF
> + (if (r0 < #x10000)
> + ((write ((r0 >> 12) | #xE0))
> + (write (((r0 >> 6) & #x3F) | #x80))
> + (write ((r0 & #x3F) | #x80)))
> + ;; Supplementary characters must be converted into
> + ;; surrogate pairs before encoding.
> + (;; High surrogate
> + (r1 = ((((r0 - #x10000) >> 10) & #x3ff) + #xD800))
> + ;; Low surrogate.
> + (r2 = (((r0 - #x10000) & #x3ff) + #xDC00))
> + ;; Write both surrogate characters.
> + (write ((r1 >> 12) | #xE0))
> + (write (((r1 >> 6) & #x3F) | #x80))
> + (write ((r1 & #x3F) | #x80))
> + (write ((r2 >> 12) | #xE0))
> + (write (((r2 >> 6) & #x3F) | #x80))
> + (write ((r2 & #x3F) | #x80))))))))
> + (repeat))))
> + "Encode characters from the input buffer for Java virtual machines.")
AFAIU, this is because Java uses UTF-16 encoded strings to support
Unicode, is that right? If so, why not use encode-coding and
decode-coding to en/decode between UTF-16 and the internal
representation? AFAIR, we want to deprecate CCL, and thus using it in
new code should be avoided.
* Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
From: Po Lu @ 2024-03-23 12:11 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> AFAIU, this is because Java uses UTF-16 encoded strings to support
> Unicode, is that right? If so, why not use encode-coding and
> decode-coding to en/decode between UTF-16 and the internal
> representation? AFAIR, we want to deprecate CCL, and thus using it in
> new code should be avoided.
I think you've misunderstood that code.  Java communicates with C using
a custom character encoding that resembles UTF-8 but differs from it in
two respects: characters that UTF-8 represents with 4-byte sequences
are instead encoded as surrogate pairs, each surrogate as its own
3-byte sequence, and the NUL character is encoded as a special two-byte
sequence.  It is this unique (i.e. underivable) coding system that is
being defined here.  It wouldn't be wise to remove CCL until some
better means of defining custom coding systems comes into existence.
When it does, I'll be as glad as you to see it replace CCL, but until
then there is no real alternative short of implementing the coding
system in coding.c.
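
For readers who do not read CCL, the encoding described above can be sketched in Python. This is only an illustration of the rules stated in this message and in the quoted CCL program, not code from the Emacs tree; the names `encode_jni` and `_three_bytes` are hypothetical and chosen here for the example.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode a 16-bit value (0x0800-0xFFFF) as a 3-byte UTF-8-style sequence."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def encode_jni(text: str) -> bytes:
    """Encode TEXT following the rules described above (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL is written as the two-byte sequence C0 80.
            out += b"\xc0\x80"
        elif cp < 0x80:
            # ASCII is passed through unchanged.
            out.append(cp)
        elif cp < 0x800:
            # U+0080 - U+07FF: ordinary 2-byte form.
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # U+0800 - U+FFFF: ordinary 3-byte form.
            out += _three_bytes(cp)
        else:
            # Supplementary characters: split into a surrogate pair,
            # then encode each surrogate as its own 3-byte sequence.
            v = cp - 0x10000
            high = 0xD800 + ((v >> 10) & 0x3FF)
            low = 0xDC00 + (v & 0x3FF)
            out += _three_bytes(high) + _three_bytes(low)
    return bytes(out)
```

Under these rules a non-BMP character such as U+1F600 comes out as six bytes (two 3-byte sequences) rather than the four bytes standard UTF-8 would use, which is why an ordinary UTF-8 or UTF-16 coding system cannot simply be substituted.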