From: Eli Zaretskii <eliz@gnu.org>
To: Po Lu <luangruo@yahoo.com>
Cc: emacs-devel@gnu.org
Subject: Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
Date: Sat, 23 Mar 2024 12:24:03 +0200
Message-ID: <86a5mpz1oc.fsf@gnu.org>

> diff --git a/lisp/term/android-win.el b/lisp/term/android-win.el
> index 8d262e5da98..6512ef81ff7 100644
> --- a/lisp/term/android-win.el
> +++ b/lisp/term/android-win.el
> @@ -529,5 +529,94 @@ accessible to other programs."
>    (android-browse-url-internal url send))
>  
>  \f
> +;; Coding systems used by androidvfs.c.
> +
> +(define-ccl-program android-encode-jni
> +  `(2 ((loop
> +	(read r0)
> +	(if (r0 < #x1) ; 0x0 is encoded specially in JNI environments.
> +	    ((write #xc0)
> +	     (write #x80))
> +	  ((if (r0 < #x80) ; ASCII
> +	       ((write r0))
> +	     (if (r0 < #x800) ; \u0080 - \u07ff
> +		 ((write ((r0 >> 6) | #xC0))
> +		  (write ((r0 & #x3F) | #x80)))
> +	       ;; \u0800 - \uFFFF
> +	       (if (r0 < #x10000)
> +		   ((write ((r0 >> 12) | #xE0))
> +		    (write (((r0 >> 6) & #x3F) | #x80))
> +		    (write ((r0 & #x3F) | #x80)))
> +		 ;; Supplementary characters must be converted into
> +		 ;; surrogate pairs before encoding.
> +		 (;; High surrogate
> +		  (r1 = ((((r0 - #x10000) >> 10) & #x3ff) + #xD800))
> +		  ;; Low surrogate.
> +		  (r2 = (((r0 - #x10000) & #x3ff) + #xDC00))
> +		  ;; Write both surrogate characters.
> +		  (write ((r1 >> 12) | #xE0))
> +		  (write (((r1 >> 6) & #x3F) | #x80))
> +		  (write ((r1 & #x3F) | #x80))
> +		  (write ((r2 >> 12) | #xE0))
> +		  (write (((r2 >> 6) & #x3F) | #x80))
> +		  (write ((r2 & #x3F) | #x80))))))))
> +	(repeat))))
> +  "Encode characters from the input buffer for Java virtual machines.")

AFAIU, this is because Java uses UTF-16 encoded strings to support
Unicode, is that right?  If so, why not use encode-coding and
decode-coding to en/decode between UTF-16 and the internal
representation?  AFAIR, we want to deprecate CCL, and thus using it in
new code should be avoided.
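
For readers following the byte layout in the quoted CCL program, here is a
minimal sketch of the same arithmetic in Python.  It is not the Emacs
implementation and not a proposed replacement; the function names
`encode_jni` and `encode_jni_via_utf16` are hypothetical.  The second
function assumes that going through UTF-16 first yields the same bytes,
since each 16-bit unit (including each surrogate) is then written out
independently.

```python
def _put3(out: bytearray, u: int) -> None:
    """Append a 16-bit value as a three-byte UTF-8-style sequence."""
    out.extend((0xE0 | (u >> 12),
                0x80 | ((u >> 6) & 0x3F),
                0x80 | (u & 0x3F)))


def _put_unit(out: bytearray, u: int) -> None:
    """Append one code unit below 0x10000 the way the quoted CCL does."""
    if u == 0:
        out.extend((0xC0, 0x80))        # NUL is encoded specially for JNI
    elif u < 0x80:
        out.append(u)                   # ASCII
    elif u < 0x800:
        out.extend((0xC0 | (u >> 6), 0x80 | (u & 0x3F)))
    else:
        _put3(out, u)


def encode_jni(text: str) -> bytes:
    """Hypothetical helper: mirror the branches of the quoted CCL program."""
    out = bytearray()
    for ch in text:
        c = ord(ch)
        if c < 0x10000:
            _put_unit(out, c)
        else:
            v = c - 0x10000
            _put3(out, 0xD800 + (v >> 10))    # high surrogate
            _put3(out, 0xDC00 + (v & 0x3FF))  # low surrogate
    return bytes(out)


def encode_jni_via_utf16(text: str) -> bytes:
    """Hypothetical helper: encode to UTF-16 first, then write each unit."""
    utf16 = text.encode("utf-16-be")    # big-endian, no byte order mark
    out = bytearray()
    for i in range(0, len(utf16), 2):
        _put_unit(out, (utf16[i] << 8) | utf16[i + 1])
    return bytes(out)
```

Under these assumptions both functions give identical output; for U+1F600
the surrogates are U+D83D and U+DE00, written as ED A0 BD ED B8 80.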


