unofficial mirror of emacs-devel@gnu.org 
* Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
From: Eli Zaretskii @ 2024-03-23 10:24 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> diff --git a/lisp/term/android-win.el b/lisp/term/android-win.el
> index 8d262e5da98..6512ef81ff7 100644
> --- a/lisp/term/android-win.el
> +++ b/lisp/term/android-win.el
> @@ -529,5 +529,94 @@ accessible to other programs."
>    (android-browse-url-internal url send))
>  
>  \f
> +;; Coding systems used by androidvfs.c.
> +
> +(define-ccl-program android-encode-jni
> +  `(2 ((loop
> +	(read r0)
> +	(if (r0 < #x1) ; 0x0 is encoded specially in JNI environments.
> +	    ((write #xc0)
> +	     (write #x80))
> +	  ((if (r0 < #x80) ; ASCII
> +	       ((write r0))
> +	     (if (r0 < #x800) ; \u0080 - \u07ff
> +		 ((write ((r0 >> 6) | #xC0))
> +		  (write ((r0 & #x3F) | #x80)))
> +	       ;; \u0800 - \uFFFF
> +	       (if (r0 < #x10000)
> +		   ((write ((r0 >> 12) | #xE0))
> +		    (write (((r0 >> 6) & #x3F) | #x80))
> +		    (write ((r0 & #x3F) | #x80)))
> +		 ;; Supplementary characters must be converted into
> +		 ;; surrogate pairs before encoding.
> +		 (;; High surrogate
> +		  (r1 = ((((r0 - #x10000) >> 10) & #x3ff) + #xD800))
> +		  ;; Low surrogate.
> +		  (r2 = (((r0 - #x10000) & #x3ff) + #xDC00))
> +		  ;; Write both surrogate characters.
> +		  (write ((r1 >> 12) | #xE0))
> +		  (write (((r1 >> 6) & #x3F) | #x80))
> +		  (write ((r1 & #x3F) | #x80))
> +		  (write ((r2 >> 12) | #xE0))
> +		  (write (((r2 >> 6) & #x3F) | #x80))
> +		  (write ((r2 & #x3F) | #x80))))))))
> +	(repeat))))
> +  "Encode characters from the input buffer for Java virtual machines.")

AFAIU, this is because Java uses UTF-16 encoded strings to support
Unicode, is that right?  If so, why not use encode-coding and
decode-coding to en/decode between UTF-16 and the internal
representation?  AFAIR, we want to deprecate CCL, and thus using it in
new code should be avoided.
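For reference, the surrogate-pair arithmetic in the CCL program above is the same one UTF-16 uses, and it can be checked against Emacs's built-in `utf-16be' coding system.  This is an illustration only: `my-surrogate-pair' is a hypothetical helper, not part of Emacs or of the patch, and U+1F600 is an arbitrary example character.

```elisp
;; Hypothetical helper (not in Emacs or in the patch): the same
;; surrogate-pair computation the CCL program performs for r1 and r2.
(defun my-surrogate-pair (char)
  "Return (HIGH . LOW), the UTF-16 surrogate pair for supplementary CHAR."
  (let ((v (- char #x10000)))
    (cons (+ #xD800 (logand (ash v -10) #x3FF))   ; high surrogate
          (+ #xDC00 (logand v #x3FF)))))          ; low surrogate

;; U+1F600 maps to the pair D83D DE00.
(my-surrogate-pair #x1F600)          ; => (55357 . 56832)

;; Emacs's UTF-16 encoder (big endian, no BOM) emits the same two code
;; units, but packs each into two bytes, whereas the CCL program writes
;; each surrogate as a three-byte sequence.
(append (encode-coding-string "\U0001F600" 'utf-16be) nil)
;; => (216 61 222 0), i.e. D8 3D DE 00
```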




* Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
From: Po Lu @ 2024-03-23 12:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> AFAIU, this is because Java uses UTF-16 encoded strings to support
> Unicode, is that right?  If so, why not use encode-coding and
> decode-coding to en/decode between UTF-16 and the internal
> representation?  AFAIR, we want to deprecate CCL, and thus using it in
> new code should be avoided.

I think you've misunderstood that code.  Java communicates with C
using a custom character encoding, JNI's "modified UTF-8", which
resembles UTF-8 but differs from it in two respects: a character that
UTF-8 represents with a 4-byte sequence is instead split into a UTF-16
surrogate pair, each half of which is written as a 3-byte sequence
(six bytes in all), and the NUL character is written as the special
two-byte sequence C0 80 rather than as a zero byte.  It is this unique
coding system, which cannot be derived from any existing one, that is
being defined here.  It wouldn't be wise to remove CCL until some
better means of defining custom coding systems comes into existence.
When one does, I'll be as glad as you to see it replace CCL, but until
then there's no real alternative short of implementing it in coding.c.
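To make the difference from standard UTF-8 concrete, here is a minimal pure-Lisp sketch of the byte layout the CCL program `android-encode-jni' produces.  It is an illustration only, assuming the input holds only Unicode characters (Emacs's raw-byte characters above #x10FFFF are not handled, just as in the CCL version); the `my-jni-' names are hypothetical and not part of Emacs or the patch.

```elisp
;; Hypothetical helpers, for illustration only (not part of Emacs).
;; They mirror the CCL program `android-encode-jni' in plain Lisp.

(defun my-jni--push-unit (unit bytes)
  "Push the bytes for the 16-bit code UNIT onto BYTES (a reversed list)."
  (cond
   ;; U+0000 becomes the two-byte sequence C0 80, never a zero byte.
   ((= unit 0) (cons #x80 (cons #xC0 bytes)))
   ;; Other ASCII characters: one byte, as in UTF-8.
   ((< unit #x80) (cons unit bytes))
   ;; U+0080..U+07FF: two bytes, as in UTF-8.
   ((< unit #x800)
    (cons (logior #x80 (logand unit #x3F))
          (cons (logior #xC0 (ash unit -6)) bytes)))
   ;; U+0800..U+FFFF, including surrogates: three bytes.
   (t
    (cons (logior #x80 (logand unit #x3F))
          (cons (logior #x80 (logand (ash unit -6) #x3F))
                (cons (logior #xE0 (ash unit -12)) bytes))))))

(defun my-jni-encode (string)
  "Return a unibyte string holding STRING in JNI's modified UTF-8."
  (let ((bytes nil))
    (dolist (char (string-to-list string))
      (if (< char #x10000)
          (setq bytes (my-jni--push-unit char bytes))
        ;; Supplementary character: split it into a surrogate pair and
        ;; encode each surrogate as its own three-byte sequence.
        (let ((v (- char #x10000)))
          (setq bytes (my-jni--push-unit
                       (+ #xD800 (logand (ash v -10) #x3FF)) bytes))
          (setq bytes (my-jni--push-unit
                       (+ #xDC00 (logand v #x3FF)) bytes)))))
    (apply #'unibyte-string (nreverse bytes))))

;; Modified UTF-8: NUL is C0 80, U+1F600 takes six bytes.
(append (my-jni-encode "a\0\U0001F600") nil)
;; => (97 192 128 237 160 189 237 184 128)

;; Standard UTF-8 for the same string: NUL is 00, U+1F600 takes four.
(append (encode-coding-string "a\0\U0001F600" 'utf-8) nil)
;; => (97 0 240 159 152 128)
```

This per-string function only makes the byte layout visible; a CCL program attached to a coding system is what lets the same conversion run as an ordinary Emacs coding system.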


