From: Eli Zaretskii <eliz@gnu.org>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: hmelman@gmail.com, 51292@debbugs.gnu.org
Subject: bug#51292: 27.2; Reversing strings with unicode combining characters
Date: Wed, 20 Oct 2021 14:45:46 +0300
Message-ID: <83pmrzc42d.fsf@gnu.org>
In-Reply-To: <87fsswbyu0.fsf@gnus.org> (message from Lars Ingebrigtsen on Tue,  19 Oct 2021 21:26:31 +0200)

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 19 Oct 2021 21:26:31 +0200
> Cc: 51292@debbugs.gnu.org
> 
> Howard Melman <hmelman@gmail.com> writes:
> 
> > Reversing a string fails to account for unicode combining characters
> >
> >     (reverse "nai\u0308ve")
> >     "ev̈ian"
> >
> > Note the diaeresis is now on the v and not the i.  s-reverse gets it right:
> >
> >     (s-reverse "nai\u0308ve")
> >     "evïan"
> 
> So I wondered what s-reverse did, and indeed:
> 
> (defun s-reverse (s)
>   "Return the reverse of S."
>   (declare (pure t) (side-effect-free t))
>   (save-match-data
>     (if (multibyte-string-p s)
>         (let ((input (string-to-list s))
>               output)
>           (require 'ucs-normalize)
>           (while input
>             ;; Handle entire grapheme cluster as a single unit
>             (let ((grapheme (list (pop input))))
>               (while (memql (car input) ucs-normalize-combining-chars)
>                 (push (pop input) grapheme))
>               (setq output (nconc (nreverse grapheme) output))))
>           (concat output))
>       (concat (nreverse (string-to-list s))))))
> 
> Emacs has string-reverse, obsolete since 25.1.  Perhaps we should
> reintroduce it and use the definition from s?

I don't understand the use case(s) where this could be useful.  If
this is for display, then displaying text needs much more than just
combining accents with the base characters.  E.g., what if the accent
should not combine when the order is reversed, i.e. the composition
rules depend on the following characters as well?  And what if
character composition is not due to normalization rules?  Or what if
the text includes bidirectional scripts, whose reversal rules are
either very complex or simply undefined?
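
To make the second question concrete, here is a minimal sketch, not
the s.el code itself.  It assumes a hypothetical helper named
my-reverse-by-combining-class, and instead of the
ucs-normalize-combining-chars list it tests each character's
canonical-combining-class property, which is a swapped-in way of
recognizing combining marks.  It keeps a base character and its
combining marks together, but it knows nothing about sequences that
are composed for other reasons:

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs or s.el.
(defun my-reverse-by-combining-class (s)
  "Return S reversed, keeping combining marks after their base character.
A character counts as a combining mark here only if its Unicode
canonical combining class is non-zero (an assumption of this sketch)."
  (let ((chars (string-to-list s))
        cluster output)
    (while chars
      ;; Start a new cluster with the next character.
      (setq cluster (list (pop chars)))
      ;; Attach every following character whose combining class is non-zero.
      (while (and chars
                  (> (or (get-char-code-property
                          (car chars) 'canonical-combining-class)
                         0)
                     0))
        (push (pop chars) cluster))
      ;; CLUSTER was built back to front; restore its order and
      ;; prepend it, so clusters end up in reverse order.
      (setq output (nconc (nreverse cluster) output)))
    (concat output)))

;; The diaeresis stays on the "i":
(my-reverse-by-combining-class "nai\u0308ve")   ; => "evi\u0308an"

;; A pair of regional indicator symbols (U+1F1FA U+1F1F8) is displayed
;; as one flag, but neither character has a non-zero combining class,
;; so the pair is swapped:
(my-reverse-by-combining-class "\U0001F1FA\U0001F1F8")
;; => "\U0001F1F8\U0001F1FA"
```

In the second call the two code points are treated as separate units,
so the reversed string no longer contains the original sequence, even
though on display it was a single glyph.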

If this is not for display, then where is this useful and why?

If someone can describe real-life use cases, we could reason whether
doing something like that could be useful enough.  Without that, the
code in s-reverse seems like an incomplete semi-feature which supports
some limited use cases that someone needed in some specific situation,
not a useful general feature that handles the issue anywhere close to
completeness.




