From: divoplade <d@divoplade.fr>
To: "Jérémy Korwin-Zmijowski" <jeremy@korwin-zmijowski.fr>,
	"Mailing list Guile User" <guile-user@gnu.org>
Subject: Re: Guile Hacker Handbook - Character sets
Date: Fri, 19 Feb 2021 00:15:15 +0100
Message-ID: <de03be9bcbae69eeb38a1ff6d745babe7b15bfbe.camel@divoplade.fr>
In-Reply-To: <B9B5B608-F033-4FF6-BFDA-A04EDF8FB87B@korwin-zmijowski.fr>

Hello,

On Thursday, 18 February 2021 at 20:54 +0100, Jérémy Korwin-Zmijowski
wrote:
> I happily managed to find some time to write a new chapter for the
> Guile Hacker Handbook !
> 
> https://jeko.frama.io/en/char-sets.html
> 
> It deals with char-sets, something new to me. The exercise was fun; I
> liked how convenient it is to play with these data types.

Unicode makes it tempting to think that each element you can index in
a string is a character. This works most of the time, but fails in some
cases with other languages and scripts: what a reader sees as a single
character may be made of several code points (for instance, 'é' can be
stored as 'e' followed by a combining acute accent). This remark is
general and applies in many situations, including the previous chapter
about characters. I suggest reading:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
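A minimal Guile sketch of the problem, assuming Guile 2.0 or later
(where a string is a sequence of Unicode code points). The letter 'é'
can be encoded as the single code point U+00E9 or as 'e' followed by
U+0301 COMBINING ACUTE ACCENT; both render the same, but Guile sees two
different strings. The variable names are only illustrative.

```scheme
;; Two encodings of the single visible letter "é".
;; composed:   U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
;; decomposed: U+0065 "e" followed by U+0301 COMBINING ACUTE ACCENT
(define composed (string (integer->char #x00E9)))
(define decomposed (string #\e (integer->char #x0301)))

;; string-length counts code points, not what a reader sees.
(display (string-length composed))     ; 1
(newline)
(display (string-length decomposed))   ; 2
(newline)

;; Indexing the decomposed form yields a bare #\e: the accent is a
;; separate element of the string.
(write (string-ref decomposed 0))      ; #\e
(newline)

;; The two strings look identical on screen but are not string=?.
(write (string=? composed decomposed)) ; #f
(newline)
```

So "the character at index 0" of something a reader calls "é" may well
be a plain 'e'.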

Fortunately, very few problems involving international text require
looking at the individual characters of a string. Your password rules
example is arguably one of them, although it may make non-Latin users
angry (as far as I know, the upper case / lower case distinction does
not exist in Chinese). The other example that I'm aware of is limiting
the length of a message so that the reader does not get bored (so, not
for storage reasons). One website famously limits the number of Unicode
code points in a message, although its actual counting rules are much
more complex and opinionated than one would expect
(https://developer.twitter.com/en/docs/counting-characters).
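Both points can be seen in a few lines of Guile (2.0 or later assumed).
The rule `has-upper-and-lower?` is a hypothetical password rule of the
kind discussed above, not the handbook's actual code: it accepts a Latin
password but can never be satisfied by a password made only of CJK
ideographs, which have no case. The second part shows that
`string-length` counts code points, which is neither what the reader
perceives nor the storage size: a flag emoji is two regional-indicator
code points and eight bytes in UTF-8.

```scheme
(use-modules (rnrs bytevectors)) ; string->utf8, bytevector-length

;; Hypothetical password rule: require at least one upper-case and one
;; lower-case letter.  string-any takes the predicate as first argument.
(define (has-upper-and-lower? password)
  (and (string-any char-upper-case? password)
       (string-any char-lower-case? password)))

(write (has-upper-and-lower? "Passw0rd")) ; #t
(newline)

;; Two CJK ideographs (U+5BC6, U+7801): neither upper- nor lower-case,
;; so this rule rejects every password written only with them.
(define chinese-password
  (string (integer->char #x5BC6) (integer->char #x7801)))
(write (has-upper-and-lower? chinese-password)) ; #f
(newline)

;; The French flag is two regional-indicator code points, U+1F1EB and
;; U+1F1F7, displayed as a single symbol.
(define flag (string (integer->char #x1F1EB) (integer->char #x1F1F7)))
(display (string-length flag))                    ; 2 (code points)
(newline)
(display (bytevector-length (string->utf8 flag))) ; 8 (UTF-8 bytes)
(newline)
```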

I think that demonstrating general code that works for Latin text
except for "special characters" is rude to the rest of the world, and
such code should not be put in a place as strategic as the Guile Hacker
Handbook.

For your example, I suggest switching to something that has more
structure and is deliberately restricted to Latin characters, for
instance checking the validity of IBAN account numbers, car license
plates in a given country, or maybe your grocery store's customer
IDs... You can also invent your own.
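As an illustration of such a deliberately Latin, structured format,
here is a minimal IBAN check using char-sets. It is only a sketch,
assuming Guile 2.0 or later (for the SRFI-13 argument order of
`string-delete` and `string-every`): it checks the allowed characters,
the generic 15 to 34 length bounds, the two-letter country code and two
check digits, and the ISO 7064 mod-97 checksum, but not the per-country
lengths and layouts. The names `iban-valid?` and `iban-char->digits`
are hypothetical.

```scheme
;; Only ASCII upper-case letters and digits may appear in an IBAN.
(define ascii-upper (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
(define ascii-digit (string->char-set "0123456789"))
(define iban-chars (char-set-union ascii-upper ascii-digit))

;; Map one IBAN character to digits: "0".."9" stay as they are,
;; "A".."Z" become "10".."35".
(define (iban-char->digits c)
  (if (char-set-contains? ascii-digit c)
      (string c)
      (number->string (+ 10 (- (char->integer c) (char->integer #\A))))))

(define (iban-valid? str)
  ;; Drop the spaces used for grouping and accept lower-case input.
  (let* ((s (string-upcase (string-delete #\space str)))
         (len (string-length s)))
    (and (<= 15 len 34)
         (string-every iban-chars s)
         (string-every ascii-upper s 0 2)   ; country code
         (string-every ascii-digit s 2 4)   ; check digits
         ;; Move the first four characters to the end, turn letters
         ;; into numbers, and check that the result is 1 modulo 97.
         (let* ((rearranged (string-append (substring s 4)
                                           (substring s 0 4)))
                (digits (string-concatenate
                         (map iban-char->digits
                              (string->list rearranged)))))
           (= 1 (modulo (string->number digits) 97))))))

(write (iban-valid? "GB82 WEST 1234 5698 7654 32")) ; #t
(newline)
(write (iban-valid? "GB82 WEST 1234 5698 7654 33")) ; #f (bad checksum)
(newline)
```

Here the restriction to Latin letters is part of the specification, so
the char-sets express a real rule instead of an accidental limitation.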

The previous chapter about characters gives a lot of importance to
letter intervals, which is even more problematic: the locale's
collation order would put 'é' after 'e' and before 'f', but the char>=?
predicate, which compares code points, puts it after 'z' (and after
every other ASCII character). So this does not even work for all Latin
text. And if you use the locale order instead, character ranges are no
longer meaningful.
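A small Guile sketch of the interval problem (Guile 2.0 or later
assumed; `naive-lower?` is a hypothetical helper written in the style
of an "a to z" range test, and `e-acute` is just U+00E9 built with
`integer->char` to keep the source ASCII-only):

```scheme
;; U+00E9 LATIN SMALL LETTER E WITH ACUTE
(define e-acute (integer->char #x00E9))

;; Interval test in the style of "between #\a and #\z".
(define (naive-lower? c)
  (and (char>=? c #\a) (char<=? c #\z)))

(write (naive-lower? #\e))     ; #t
(newline)
(write (naive-lower? e-acute)) ; #f: U+00E9 lies outside a..z
(newline)

;; char<? and char>? compare code points, so 'é' sorts after 'z'.
(write (char>? e-acute #\z))   ; #t
(newline)
(write (map char->integer (sort (list #\f e-acute #\e) char<?)))
;; prints (101 102 233), i.e. e, f, é
(newline)

;; The Unicode-aware predicate does not rely on any interval.
(write (char-lower-case? e-acute)) ; #t
(newline)
```

Locale-aware comparisons are available in the (ice-9 i18n) module (for
example `string-locale<?`), but their results depend on the locales
installed on the machine, which is exactly why they do not give
portable character ranges.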

Unicode is a very complex beast, and very few general use cases need
its character-level details. Don't let that discourage you: most
everyday computing tasks can be solved without going down to Unicode
character semantics. As a general rule, I would suggest staying away
from characters and starting with strings.
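For instance, working at the string level in Guile could look like this
minimal sketch (Guile 2.0 or later assumed; the sample text is made
up): normalize both strings to the same Unicode normal form, here NFC
with `string-normalize-nfc`, then use whole-string procedures such as
`string-contains` instead of looping over characters.

```scheme
;; "café" with a precomposed é (U+00E9)...
(define needle (string-append "caf" (string (integer->char #x00E9))))
;; ...and a sentence where "é" is "e" + U+0301 COMBINING ACUTE ACCENT.
(define haystack
  (string-append "un caf" (string #\e (integer->char #x0301)) " noir"))

;; A raw substring search misses the match: the code points differ.
(write (string-contains haystack needle)) ; #f
(newline)

;; After normalizing both sides to NFC, "café" is found at index 3.
(write (string-contains (string-normalize-nfc haystack)
                        (string-normalize-nfc needle))) ; 3
(newline)
```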



