all messages for Guix-related lists mirrored at yhetil.org
* Relaxing the restrictions for store item names
@ 2023-08-22  6:49 Eidvilas Markevičius
  2023-08-23 19:04 ` jbranso
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-08-22  6:49 UTC (permalink / raw)
  To: guix-devel

Hello Guix,

Not long ago, somebody raised an issue regarding an error that
occurs whenever an unconventional character is used in the name of
a store item [0]. Tobias Geerinckx-Rice pointed out that this
restriction was inherited directly from the Nix source code [1] and
that, as such, it isn't really a bug. Regardless, I believe that the
limitation may be undesirable in some situations. One that I
can think of off the top of my head is packaging a piece of software
whose name contains non-Latin characters (e.g.,
"Naršytuvas" by Raštija [2]). Of course, there are very few
such programs in practice, but one may still encounter
them from time to time, especially if they're oriented
towards non-English-speaking users, and personally, I don't feel that
resorting to transliteration is a good solution. After all,
it's 2023; why would such a restriction need to be there in the first
place when most filesystems can handle Unicode characters just
fine?
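
For readers who have not opened [1], the restriction can be sketched
roughly as follows. This is a Python approximation, not the daemon's
C++ code: the allowed character set (ASCII letters, digits and
"+-._?=") and the leading-dot rule are assumptions modeled on
checkStoreName and should be verified against the file itself. The
function names are hypothetical.

```python
import string

# Assumption: modeled on checkStoreName in nix/libstore/store-api.cc [1];
# the exact character set should be checked against that file.
ALLOWED = set(string.ascii_letters + string.digits + "+-._?=")


def check_store_name(name: str) -> None:
    """Raise ValueError if `name` is rejected by the sketched rule."""
    # Names starting with a dot are refused (think "." and "..").
    if name.startswith("."):
        raise ValueError(f"invalid name: {name!r}")
    for ch in name:
        if ch not in ALLOWED:
            raise ValueError(f"invalid character {ch!r} in name {name!r}")


def is_valid_store_name(name: str) -> bool:
    """Boolean wrapper around check_store_name."""
    try:
        check_store_name(name)
    except ValueError:
        return False
    return True
```

Under this rule a name like "hello-2.12.1" passes, while "naršytuvas"
is rejected at the "š".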

Another scenario where these artificial restrictions could
cause trouble arises when we consider the possibility that Guix
might be used for packaging and distributing not only software, but
all kinds of non-executable data such as films, books, music,
databases, historical documents, website archives, etc. [3]. In the
case of website archives: say I wanted to package the contents of the
whole raštija.lt website. When choosing the package name for it,
should I go with "rastija.lt", "rashtija.lt", or "raštija.lt"? The
last would be a clear winner in my mind, since it is the canonical
domain name for that particular site. And for all other types of data
and media packages, using the official/original titles as their names
would likewise be much preferable to using any kind of
transcription or transliteration, IMO.

Therefore, my proposal is to relax these limitations as much as
possible (or at least somewhat) and to allow some more freedom when it
comes to naming packages and other kinds of items in the store. We
could, of course, still disallow all the main problematic characters,
such as NUL, /, $, ~, space, newline and a few others, but other than
that, I don't see any reason to forbid any of the remaining ones from
being used.
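
A relaxed rule along these lines might look like the sketch below
(Python, for illustration only). The explicitly forbidden characters
come from the list above; treating every control character and every
kind of whitespace as the "few others", and still rejecting empty
names and a leading dot, are assumptions of this sketch, as is the
name is_valid_relaxed_name.

```python
import unicodedata

# Characters named in the proposal that are neither whitespace nor
# control characters (those are handled separately below).
FORBIDDEN = set("/$~")


def is_valid_relaxed_name(name: str) -> bool:
    """Accept any name that avoids a small set of problematic characters."""
    # Assumption: empty names and names starting with "." stay invalid.
    if not name or name.startswith("."):
        return False
    for ch in name:
        # Covers "/", "$" and "~" plus space, newline, tab, etc.
        if ch in FORBIDDEN or ch.isspace():
            return False
        # Category "Cc" covers NUL and the other control characters.
        if unicodedata.category(ch) == "Cc":
            return False
    return True
```

With this check both "naršytuvas" and "raštija.lt" are accepted, while
names containing a space, a slash or a NUL byte are still refused.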

I'd like to hear your opinions on this and get to know whether this
idea is feasible to implement at all or not, and if not – why?

[0] https://issues.guix.gnu.org/64976
[1] https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/store-api.cc#n58
[2] https://raštija.lt/liepa/paslaugos-vartotojams/narsytuvas
[3] https://gitlab.com/guix-media-channels



* Re: Relaxing the restrictions for store item names
  2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
@ 2023-08-23 19:04 ` jbranso
  2023-08-24  6:56 ` (
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: jbranso @ 2023-08-23 19:04 UTC (permalink / raw)
  To: Eidvilas Markevičius, guix-devel

August 23, 2023 6:27 AM, "Eidvilas Markevičius" <markeviciuseidvilas@gmail.com> wrote:

> Hello Guix,
> 
> Another scenario where these artificial restrictions could be a
> potential cause of trouble is when we consider a possibility that Guix
> might be used for packaging and distributing not only software, but
> all kinds of non-executable data such as films, books, music,
> databases, historical documents, website archives, etc. [3]. 

+1 on distributing films with Guix.  I personally want to package the video lectures
for Structure and Interpretation of Computer Programs for Guix.
 
> [0] https://issues.guix.gnu.org/64976
> [1] https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/store-api.cc#n58
> [2] https://raštija.lt/liepa/paslaugos-vartotojams/narsytuvas
> [3] https://gitlab.com/guix-media-channels



* Re: Relaxing the restrictions for store item names
  2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
  2023-08-23 19:04 ` jbranso
@ 2023-08-24  6:56 ` (
  2023-08-24  7:16   ` MSavoritias
  2023-08-24 10:33 ` Simon Tournier
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: ( @ 2023-08-24  6:56 UTC (permalink / raw)
  To: Eidvilas Markevičius; +Cc: guix-devel

Eidvilas Markevičius <markeviciuseidvilas@gmail.com> writes:
> with a name that contains non-Latin characters in it (e.g.,
> "Naršytuvas" by Raštija [2]). 

I think we should stick to ASCII characters in package names, since it's
a bit difficult to type `guix install naršytuvas` for those who don't
have keyboards with the 'š' character.  You can do it in Emacs with
insert-char or Evil digraphs, but not everyone uses the terminal in
Emacs :)

(In fact, controversial studies show that some people may not even use
Emacs at all. This observation may well overturn the entirety of known
physics if proven.)

  -- (



* Re: Relaxing the restrictions for store item names
  2023-08-24  6:56 ` (
@ 2023-08-24  7:16   ` MSavoritias
  2023-08-24  7:31     ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
  0 siblings, 1 reply; 24+ messages in thread
From: MSavoritias @ 2023-08-24  7:16 UTC (permalink / raw)
  To: (; +Cc: guix-devel


And some people don't have an English keyboard, so it's harder for them
to type English characters. That's not a reason to exclude people in
either direction :)

I was not aware that it's not possible to have Unicode characters in
store names, but that is a bug to me at the very least (and exclusionary,
of course). We should open a bug report and work on fixing it.

MSavoritias



* Re: Relaxing the restrictions for store item names
  2023-08-24  7:16   ` MSavoritias
@ 2023-08-24  7:31     ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
  2023-08-24  7:35       ` Fannys
  2023-08-24  7:41       ` MSavoritias
  0 siblings, 2 replies; 24+ messages in thread
From: Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution. @ 2023-08-24  7:31 UTC (permalink / raw)
  To: MSavoritias, (; +Cc: guix-devel

On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
> "(" <paren@disroot.org> writes:
> > Eidvilas Markevičius <markeviciuseidvilas@gmail.com> writes:
> > > with a name that contains non-Latin characters in it
> > > (e.g., "Naršytuvas" by Raštija [2]).
> >
> > I think we should stick to ASCII characters in package names,
> > since it's a bit difficult to type `guix install naršytuvas`
> > for those who don't have keyboards with the 'š' character.
>
> And some people don't have an english keyboard so its harder to type
> english characters. Thats not a reason to exclude people in either
> direction :)

I think the distinction must be made here between Guix and GuixSD.

On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
> I was not aware that its not possible to have Unicode characters
> in store names but that is a bug to me at the very least
> (and exclusionary of course). We should open a bug report
> and work on fixing the bug.

The packaging software should support full localization,
but the distro should target the least common denominator.
AFAIK most keyboard layouts cover ASCII alphanumerics
but the reverse is not true.

Inclusion goes both ways.  Imagine trying to type my name
to fix a broken install that only boots to run level 3.



* Re: Relaxing the restrictions for store item names
  2023-08-24  7:31     ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
@ 2023-08-24  7:35       ` Fannys
  2023-08-24  7:41       ` MSavoritias
  1 sibling, 0 replies; 24+ messages in thread
From: Fannys @ 2023-08-24  7:35 UTC (permalink / raw)
  To: Nguyễn Gia Phong; +Cc: guix-devel


Nguyễn Gia Phong <cnx@loang.net> writes:

> On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
>> "(" <paren@disroot.org> writes:
>> > Eidvilas Markevičius <markeviciuseidvilas@gmail.com> writes:
>> > > with a name that contains non-Latin characters in it
>> > > (e.g., "Naršytuvas" by Raštija [2]).
>> >
>> > I think we should stick to ASCII characters in package names,
>> > since it's a bit difficult to type `guix install naršytuvas`
>> > for those who don't have keyboards with the 'š' character.
>>
>> And some people don't have an english keyboard so its harder to type
>> english characters. Thats not a reason to exclude people in either
>> direction :)
>
> I think the distinction must be made here between Guix and GuixSD.
>
> On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
>> I was not aware that its not possible to have Unicode characters
>> in store names but that is a bug to me at the very least
>> (and exclusionary of course). We should open a bug report
>> and work on fixing the bug.
>
> The packaging software should support full localization,
> but the distro should target the least common denominator.
> AFAIK most keyboard layouts cover ASCII alphanumerics
> but the reverse is not true.
>
> Inclusion goes both ways.  Imagine trying to type my name
> to fix a broken install that only boots to run level 3.

Depends on what we mean by "distro" here.
If I can pick Arabic or Chinese as the display language in the
installer and can also use an Arabic/Chinese keyboard, that sounds
good to me.

As for the initial question, it was about package names, to my
understanding: specifically, allowing package names in the store to use
Unicode characters. That makes perfect sense, because some packages don't
have ASCII names. Regarding the broken install example, most (all?) base
packages use ASCII names due to Unix historical baggage, so you shouldn't
need to type anything non-ASCII to fix an install with only basic packages.

Not that I would personally care if that changed. Non-ASCII people have
been accommodating for a long time already. Some balance would be nice :)



* Re: Relaxing the restrictions for store item names
  2023-08-24  7:31     ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
  2023-08-24  7:35       ` Fannys
@ 2023-08-24  7:41       ` MSavoritias
  2023-08-24  8:10         ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
  1 sibling, 1 reply; 24+ messages in thread
From: MSavoritias @ 2023-08-24  7:41 UTC (permalink / raw)
  To: Nguyễn Gia Phong; +Cc: (, guix-devel


Nguyễn Gia Phong <cnx@loang.net> writes:

> On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
>> "(" <paren@disroot.org> writes:
>> > Eidvilas Markevičius <markeviciuseidvilas@gmail.com> writes:
>> > > with a name that contains non-Latin characters in it
>> > > (e.g., "Naršytuvas" by Raštija [2]).
>> >
>> > I think we should stick to ASCII characters in package names,
>> > since it's a bit difficult to type `guix install naršytuvas`
>> > for those who don't have keyboards with the 'š' character.
>>
>> And some people don't have an english keyboard so its harder to type
>> english characters. Thats not a reason to exclude people in either
>> direction :)
>
> I think the distinction must be made here between Guix and GuixSD.
>
> On 2023-08-24 at 10:16+03:00, MSavoritias wrote:
>> I was not aware that its not possible to have Unicode characters
>> in store names but that is a bug to me at the very least
>> (and exclusionary of course). We should open a bug report
>> and work on fixing the bug.
>
> The packaging software should support full localization,
> but the distro should target the least common denominator.
> AFAIK most keyboard layouts cover ASCII alphanumerics
> but the reverse is not true.
>
> Inclusion goes both ways.  Imagine trying to type my name
> to fix a broken install that only boots to run level 3.

Depends on what we mean by "distro" here.
If I can pick Arabic or Chinese as the display language in the
installer and can also use an Arabic/Chinese keyboard, that sounds
good to me.

As for the initial question, it was about package names, to my
understanding: specifically, allowing package names in the store to use
Unicode characters. That makes perfect sense, because some packages don't
have ASCII names. Regarding the broken install example, most (all?) base
packages use ASCII names due to Unix historical baggage, so you shouldn't
need to type anything non-ASCII to fix an install with only basic packages.

Not that I would personally care if that changed. Non-ASCII people have
been accommodating for a long time already. Some balance would be nice :)

MSavoritias



* Re: Relaxing the restrictions for store item names
  2023-08-24  7:41       ` MSavoritias
@ 2023-08-24  8:10         ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
  2023-08-24  8:18           ` MSavoritias
  0 siblings, 1 reply; 24+ messages in thread
From: Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution. @ 2023-08-24  8:10 UTC (permalink / raw)
  To: MSavoritias; +Cc: (, guix-devel

On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> Nguyễn Gia Phong <cnx@loang.net> writes:
> > I think the distinction must be made here between Guix and GuixSD.
> >
> > The packaging software should support full localization,
> > but the distro should target the least common denominator.
>
> Depends what do we mean the "distro" here.
> If I can pick arabic or chinese in the installation as a display
> language and also I am able to use an arabic/chinese keyboard sounds
> good to me.

I meant GuixSD.  I agree that a distribution based on Guix System
shouldn't meet any obstacle declaring packages with non-ASCII names.
That you can type Arabic and Chinese and I can type Hangul
and most Latin characters doesn't mean that names using all of the above
will be accessible to either of us or to a third person.

On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> Regarding the initial question it was about package names to my
> understanding. Specifically package names in the store to use unicode
> characters. Which makes perfect sense there because some packages dont
> use ascii names.

It does, but as said before, whether this is desirable depends
on the target audience.  The purpose of an API is to be used,
i.e. it would be useless if even just one user couldn't type it.


On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> Regarding the broken install example, most (all?) base
> packages use ASCII due to unix historical baggage.
> So you shouldn't need to type anything non ASCII
> to fix an install with only basic packages.

Due to historical baggage, most (all?) keyboard layouts can fall
back to ASCII alphanumerics.  A broken install was given
as the worst case; there's no reason any other packages
should be less accessible based on the users' culture.

I suggest that, in an international context such as GuixSD,
every package have an ASCII name.  It'd of course
be better if a correctly written name is also available.



* Re: Relaxing the restrictions for store item names
  2023-08-24  8:10         ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
@ 2023-08-24  8:18           ` MSavoritias
  2023-08-24  8:41             ` Msavoritias
  0 siblings, 1 reply; 24+ messages in thread
From: MSavoritias @ 2023-08-24  8:18 UTC (permalink / raw)
  To: Nguyễn Gia Phong; +Cc: guix-devel


Nguyễn Gia Phong <cnx@loang.net> writes:

> [[PGP Signed Part:Undecided]]
> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>> Nguyễn Gia Phong <cnx@loang.net> writes:
>> > I think the distinction must be made here between Guix and GuixSD.
>> >
>> > The packaging software should support full localization,
>> > but the distro should target the least common denominator.
>>
>> Depends what do we mean the "distro" here.
>> If I can pick arabic or chinese in the installation as a display
>> language and also I am able to use an arabic/chinese keyboard sounds
>> good to me.
>
> I meant GuixSD.  I agree a distribution based on Guix Systems
> shouldn't meet any obstacle declaring packages with non-ASCII names.
> That you can type arabic and chinese and I can type hangul
> and most latin characters doesn't mean names having all of the above
> will be accessible to either of us or a third person.
>
> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>> Regarding the initial question it was about package names to my
>> understanding. Specifically package names in the store to use unicode
>> characters. Which makes perfect sense there because some packages dont
>> use ascii names.
>
> It does, but as said before, whether this is desireable depends
> on the target audience.  The purpose of API is to be used,
> i.e. it would be useless if even just one user can't type it.
>
Well, we already have that, don't we? What I mean is that ASCII names can't
be typed easily on all keyboard layouts. So what you are describing already
happens. That's why I always have an ASCII layout available as a
secondary one, next to my non-ASCII layout. I bet every person who uses
packages with non-English names can add a separate layout.

> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>> Regarding the broken install example, most (all?) base
>> packages use ASCII due to unix historical baggage.
>> So you shouldn't need to type anything non ASCII
>> to fix an install with only basic packages.
>
> Due to historical baggage, most (all?) keyboard layouts can fall
> back to ASCII alphanumerics.  A broken install was given
> as the worst case; there's no reason any other packages
> should be less accessible based on the users' culture.
>

But they already are, aren't they? If I want to add a package
named in the Greek alphabet or in Japanese, I have to transliterate it
into ASCII, which is always going to be worse, and people won't be able to
find the package because they won't know we changed the name. Plus, they
will have to change the layout, the same as an ASCII user would have to do.

> I suggest, in an international context such as GuixSD,
> for every package to have a ASCII name.  It'd of course
> be better if a correctly written name is also available.
>

So you propose two names? Sure, if that can be done I don't see why not.
Either way, not having Unicode names is a bug. Also to note: most of the
world speaks Unicode. So it's more for compatibility purposes, I guess (?),
rather than to be "international".

MSavoritias




* Re: Relaxing the restrictions for store item names
  2023-08-24  8:18           ` MSavoritias
@ 2023-08-24  8:41             ` Msavoritias
  2023-08-24 10:21               ` Julien Lepiller
  0 siblings, 1 reply; 24+ messages in thread
From: Msavoritias @ 2023-08-24  8:41 UTC (permalink / raw)
  To: MSavoritias; +Cc: Nguyễn Gia Phong, guix-devel


What I am saying here is this:
It's easy to see how our very US-centric tech culture leads to the view
that everybody should just use ASCII because "this is how it is". But
there is very little reason why we shouldn't strive to be more inclusive
of all cultures,
especially nowadays, when we have tools like Unicode that make our
lives easier compared to the "US or nothing" of 30-40 years ago.
Just imagine how many good programmers we are missing because they don't
want to or can't learn English, or don't have an ASCII keyboard.

MSavoritias



* Re: Relaxing the restrictions for store item names
  2023-08-24  8:41             ` Msavoritias
@ 2023-08-24 10:21               ` Julien Lepiller
  2023-08-24 10:36                 ` MSavoritias
  2023-08-27 15:27                 ` wolf
  0 siblings, 2 replies; 24+ messages in thread
From: Julien Lepiller @ 2023-08-24 10:21 UTC (permalink / raw)
  To: guix-devel, Msavoritias, MSavoritias; +Cc: Nguyễn Gia Phong


There are two things discussed here:

1. A restriction in the daemon prevents using Unicode in store item names.

I think this is an issue worth fixing, as it would allow users to define their own store items more easily. For instance, I might want to turn a file with a non-ASCII name into a file-like item, e.g.:

(local-file "fond d'écran.jpg")

2. Naming policy for packages in the Guix channel

I don't think we should distribute packages that have non-ASCII characters in their names. Of course, I don't know every keyboard that exists out there, but I don't think you can find a programmer who can't type an ASCII character, or a Guix user who can't at least type "guix" in their terminal.

For discoverability, we could add the real non-ASCII name to the package description.
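
The discoverability idea could work roughly like the following Python sketch: each package keeps an ASCII-only name for the command line, and a search also matches the free-text description, which may carry the original spelling. The Package class, the sample entries and the search function are all hypothetical and are not Guix's actual data model or search code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str         # ASCII-only name, typed on the command line
    description: str  # free text; may carry the original-script name


# Hypothetical sample data, for illustration only.
PACKAGES = [
    Package("narsytuvas", "Naršytuvas by Raštija."),
    Package("hello", "An example package."),
]


def search(query: str, packages=PACKAGES) -> list[str]:
    """Return names of packages whose name or description contains query."""
    q = query.casefold()
    return [
        p.name
        for p in packages
        if q in p.name.casefold() or q in p.description.casefold()
    ]
```

A user who only knows the original title can then search for "Naršytuvas" and be pointed to the ASCII name "narsytuvas", which is what they would type to install it.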



* Re: Relaxing the restrictions for store item names
  2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
  2023-08-23 19:04 ` jbranso
  2023-08-24  6:56 ` (
@ 2023-08-24 10:33 ` Simon Tournier
  2023-08-24 16:30 ` Kaelyn
  2023-09-02 20:02 ` Eidvilas Markevičius
  4 siblings, 0 replies; 24+ messages in thread
From: Simon Tournier @ 2023-08-24 10:33 UTC (permalink / raw)
  To: Eidvilas Markevičius, guix-devel

Hi,

On Tue, 22 Aug 2023 at 09:49, Eidvilas Markevičius <markeviciuseidvilas@gmail.com> wrote:

> Therefore, my proposal is to relax these limitations as much as
> possible (or at least somewhat) and to allow some more freedom when it
> comes to naming packages and other kinds of items in the store. We
> could, of course, still disallow all the main problematic characters,
> such as NUL, /, $, ~, space, newline and a few others, but other than
> that, I don't see any reason to forbid any of the remaining ones from
> being used.

Well, we could imagine decoupling the package name from the store path.
In other words, we could have a map from fancy characters to regular
letters that are already accepted in store paths.
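
A minimal sketch of such a map, in Python and for illustration only:
it decomposes each character (Unicode NFKD), drops the combining
accents, and replaces whatever is still not accepted with "-". The
accepted set (ASCII letters, digits and "+-._?=") is an assumption
about the current store rule, and to_store_name is a hypothetical
name. The sketch does not handle a leading dot.

```python
import unicodedata

# Assumption: punctuation currently accepted in store names.
ACCEPTED_PUNCTUATION = "+-._?="


def to_store_name(package_name: str) -> str:
    """Map a Unicode package name to an ASCII-only store name."""
    decomposed = unicodedata.normalize("NFKD", package_name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents such as the caron of "š"
        if ch.isascii() and (ch.isalnum() or ch in ACCEPTED_PUNCTUATION):
            out.append(ch)
        else:
            out.append("-")  # anything else becomes a placeholder
    return "".join(out)
```

Note that the map is not injective: "raštija.lt" and "rastija.lt" both
end up as "rastija.lt", so a real implementation would need some way
to tell such names apart.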

Hum, I have mixed feelings about fancy characters because they are often
painful to type.  For instance, I am French-speaking but use a UK
QWERTY layout, so the cedilla (used in words such as ça or façon) is not
part of the layout, and it's painful to type because I have to rely on
a method other than the usual typing.  Usually I type the
word using a regular c and then apply a spellchecker, or I use Emacs
with 'C-x 8 RET cedilla'.  And when I am using XTerm or am connected to
a remote machine over a plain TTY, I do not know how I could type a
package name with a cedilla.

Another example is about Julia.  Some Julia packages support
mathematical Unicode notation or even emojis.  Typing them is painful
depending on your editor.  And rendering them can also be painful.  For
instance, how do you type ℝ?  Even when knowing LaTeX.

Well, I guess the proposal is about support for non-Latin alphabets.
My point is that some non-Latin alphabets appear to me as exotic as
mathematical symbols when I have to type them.


Cheers,
simon




* Re: Relaxing the restrictions for store item names
  2023-08-24 10:21               ` Julien Lepiller
@ 2023-08-24 10:36                 ` MSavoritias
  2023-08-24 19:38                   ` (
  2023-08-27 15:27                 ` wolf
  1 sibling, 1 reply; 24+ messages in thread
From: MSavoritias @ 2023-08-24 10:36 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: guix-devel, Nguyễn Gia Phong


Julien Lepiller <julien@lepiller.eu> writes:

> Le 24 août 2023 10:41:23 GMT+02:00, Msavoritias <email@msavoritias.me> a écrit :
>>
>>What I am saying here is that:
>>Its easy to see from our very US centric tech culture why everybody
>>should just use ASCII because "This is how it is". But there is very
>>little reasons why we shouldn't strive to be more inclusive of all
>>cultures.
>>Especially since nowadays where we have tools like Unicode that make our
>>lives easier compared to US or nothing of 30-40 years ago.
>>Just imagine how many good programmers we are missing because they don't
>>want/can't learn English or don't have an ASCII keyboard.
>>
>>MSavoritias
>>
>>MSavoritias <email@msavoritias.me> writes:
>>
>>> Nguyễn Gia Phong <cnx@loang.net> writes:
>>>
>>>> [[PGP Signed Part:Undecided]]
>>>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>>>>> Nguyễn Gia Phong <cnx@loang.net> writes:
>>>>> > I think the distinction must be made here between Guix and GuixSD.
>>>>> >
>>>>> > The packaging software should support full localization,
>>>>> > but the distro should target the least common denominator.
>>>>>
>>>>> Depends what do we mean the "distro" here.
>>>>> If I can pick arabic or chinese in the installation as a display
>>>>> language and also I am able to use an arabic/chinese keyboard sounds
>>>>> good to me.
>>>>
>>>> I meant GuixSD.  I agree a distribution based on Guix Systems
>>>> shouldn't meet any obstacle declaring packages with non-ASCII names.
>>>> That you can type arabic and chinese and I can type hangul
>>>> and most latin characters doesn't mean names having all of the above
>>>> will be accessible to either of us or a third person.
>>>>
>>>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>>>>> Regarding the initial question it was about package names to my
>>>>> understanding. Specifically package names in the store to use unicode
>>>>> characters. Which makes perfect sense there because some packages dont
>>>>> use ascii names.
>>>>
>>>> It does, but as said before, whether this is desireable depends
>>>> on the target audience.  The purpose of API is to be used,
>>>> i.e. it would be useless if even just one user can't type it.
>>>>
>>> Well we already have that don't we? What I mean is that ASCII names cant
>>> be typed by all keyboards layouts easily. So what you are saying already
>>> happens. Thats why I always have an ASCII layout available as a
>>> secondary, next to my non ASCII. I bet every person that uses packages
>>> with names other than english can add a seperate layout.
>>>
>>>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
>>>>> Regarding the broken install example, most (all?) base
>>>>> packages use ASCII due to unix historical baggage.
>>>>> So you shouldn't need to type anything non ASCII
>>>>> to fix an install with only basic packages.
>>>>
>>>> Due to historical baggage, most (all?) keyboard layouts can fall
>>>> back to ASCII alphanumerics.  A broken install was given
>>>> as the worst case; there's no reason any other packages
>>>> should be less accessible based on the users' culture.
>>>>
>>>
>>> But they are already aren't they? Because if I want to add a package
>>> with the Greek alphabet or the Japanese one I have to transliterate it
>>> into ASCII which is always going to be worse and people won't be able to
>>> find the package. Because they won't know we changed the name. Plus they
>>> will have to change the layout. Same as an ASCII user would have to do.
>>>
>>>> I suggest, in an international context such as GuixSD,
>>>> for every package to have a ASCII name.  It'd of course
>>>> be better if a correctly written name is also available.
>>>>
>>>
>>> So you propose two names? Sure if that can be done I don't see why not. Either way not
>>> having unicode names is a bug. Also to note: Most of the world speaks
>>> Unicode. So its more for compatibility purposes i guess (?) rather than
>>> to be "international".
>>>
>>> MSavoritias
>>
>>
>
> There are two things discussed here:
>
> 1. A restriction in the daemon prevents using unicode in store item names.
>
> I think this is an issue worth fixing, as it would allow users to define their own store items more easily. For instance, I might want to make a file with non-ascii name a file-like item, eg.
>
> (local-file "fond d'écran.jpg")
>
> 2. Naming policy for packages in the Guix channel
>
> I don't think we should distribute packages that have non-ascii
> characters in their names. Of course I don't know all keyboards that
> exist out there, but I don't think you can find a programmer that
> can't type an ascii character, or a guix user that can't at least type
> "guix" in their terminal.
>
> For discoverability, we could add the real non-ascii name in the package description.

Seems like a good solution for both cases.
I agree that having the non-ASCII name in the description would
especially help with searching.  Or maybe it could be an alternative
translation of the name (?).  The description is probably the easiest
option, though.

MSavoritias



* Re: Relaxing the restrictions for store item names
  2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
                   ` (2 preceding siblings ...)
  2023-08-24 10:33 ` Simon Tournier
@ 2023-08-24 16:30 ` Kaelyn
  2023-08-24 18:26   ` Eidvilas Markevičius
  2023-09-02 20:02 ` Eidvilas Markevičius
  4 siblings, 1 reply; 24+ messages in thread
From: Kaelyn @ 2023-08-24 16:30 UTC (permalink / raw)
  To: Eidvilas Markevičius; +Cc: guix-devel

Hi,

On Tuesday, August 22nd, 2023 at 6:49 AM, Eidvilas Markevičius
<markeviciuseidvilas@gmail.com> wrote:

> Therefore, my proposal is to relax these limitations as much as
> possible (or at least somewhat) and to allow some more freedom when it
> comes to naming packages and other kinds of items in the store. We
> could, of course, still disallow all the main problematic characters,
> such as NUL, /, $, ~, space, newline and a few others, but other than
> that, I don't see any reason to forbid any of the remaining ones from
> being used.

While I don't really have an opinion on the matter aside from the biases
of growing up in the US, one non-trivial issue with Unicode store paths
and package names which hasn't been mentioned is that of Unicode
equivalence[1], particularly homographs[2]. For example U+0061 and U+0430
(the Latin and Cyrillic small letter "a", respectively) are often visually
identical but programmatically distinct. If not handled well, it could
lead to untypable package or store names by virtue of the user having to
guess which Unicode code point(s) is/are the correct one(s) for a certain
visual glyph.
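
A small Python sketch of the two issues, using only the standard
unicodedata module.  The variable names are just for illustration, and
the "š" example is my own addition to show canonical equivalence.

```python
import unicodedata

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# Often visually identical, but different code points and different bytes.
print(latin_a == cyrillic_a)
print(unicodedata.name(latin_a), "|", unicodedata.name(cyrillic_a))

# Canonical equivalence is a separate problem: "š" can be a single code
# point, or "s" followed by a combining caron.  NFC normalization makes
# the two spellings compare equal.
composed = "\u0161"
decomposed = "s\u030c"
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)

# Normalization does not merge homographs from different scripts.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)
```

So normalizing names (e.g. to NFC) before hashing or comparing them
would address equivalence, while homographs would need some other
policy on top of that.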

Cheers,
Kaelyn

[1] https://en.wikipedia.org/wiki/Unicode_equivalence
[2] https://en.wikipedia.org/wiki/IDN_homograph_attack



* Re: Relaxing the restrictions for store item names
  2023-08-24 16:30 ` Kaelyn
@ 2023-08-24 18:26   ` Eidvilas Markevičius
  0 siblings, 0 replies; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-08-24 18:26 UTC (permalink / raw)
  To: Kaelyn; +Cc: guix-devel

I guess that's true, but I very much doubt errors like this would come
up very often. Out of precaution, we could make guix lint issue us a
warning whenever a non-ASCII character is detected in a package name
or elsewhere. This would lower the chances of such oversights
occurring even more.
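
To illustrate the kind of check I have in mind, here is a rough sketch
in Python.  It is not an actual guix lint checker (those are written in
Scheme), and the function names are hypothetical.

```python
from typing import List, Optional


def non_ascii_characters(name: str) -> List[str]:
    """Return the non-ASCII characters of NAME, in order of appearance."""
    return [ch for ch in name if not ch.isascii()]


def lint_package_name(name: str) -> Optional[str]:
    """Return a warning if NAME contains non-ASCII characters, else None."""
    offenders = non_ascii_characters(name)
    if not offenders:
        return None
    # Show the code point too, so look-alike characters can be told apart.
    listing = ", ".join(f"{ch!r} (U+{ord(ch):04X})" for ch in offenders)
    return f"{name}: package name contains non-ASCII characters: {listing}"


print(lint_package_name("git"))
print(lint_package_name("p\u0430ckage"))  # Cyrillic "а" instead of Latin "a"
```

Printing the code point next to each character is what would make a
stray Cyrillic letter in an otherwise Latin name stand out.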




* Re: Relaxing the restrictions for store item names
  2023-08-24 10:36                 ` MSavoritias
@ 2023-08-24 19:38                   ` (
  0 siblings, 0 replies; 24+ messages in thread
From: ( @ 2023-08-24 19:38 UTC (permalink / raw)
  To: MSavoritias; +Cc: Julien Lepiller, Nguyễn Gia Phong, guix-devel

MSavoritias <email@msavoritias.me> writes:
>> I don't think we should distribute packages that have non-ascii
>> characters in their names. Of course I don't know all keyboards that
>> exist out there, but I don't think you can find a programmer that
>> can't type an ascii character, or a guix user that can't at least type
>> "guix" in their terminal.

This is what I meant :)

> Seems like a good solution for both cases.
> I agree that it would help with searching especially to have the
> non-ascii name in the description. or maybe as alternative translation
> in the name (?). Probably description is the easiest one though.

I believe you can already do this:

  (package (name "foo") ... (properties '((upstream-name . "fôó"))))

  -- (



* Relaxing the restrictions for store item names
@ 2023-08-25  8:37 Nathan Dehnel
  2023-08-25  9:14 ` Eidvilas Markevičius
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Dehnel @ 2023-08-25  8:37 UTC (permalink / raw)
  To: markeviciuseidvilas, guix-devel

What you could do is implement percent encoding:
https://en.wikipedia.org/wiki/Percent-encoding
-Allows you to store package titles in any language in an encoded form
-Allows the titles to be typed on latin keyboards
-Allows the packages to be accessed through URIs in the future without
causing problems



* Re: Relaxing the restrictions for store item names
  2023-08-25  8:37 Nathan Dehnel
@ 2023-08-25  9:14 ` Eidvilas Markevičius
  2023-08-25 14:01   ` Eidvilas Markevičius
  0 siblings, 1 reply; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-08-25  9:14 UTC (permalink / raw)
  To: Nathan Dehnel; +Cc: guix-devel

On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel <ncdehnel@gmail.com> wrote:
>
> What you could do is implement percent encoding:
> https://en.wikipedia.org/wiki/Percent-encoding
> -Allows you to store package titles in any language in an encoded form
> -Allows the titles to be typed on latin keyboards
> -Allows the packages to be accessed through URIs in the future without
> causing problems

Now that's an idea. I hadn't really thought of that. Although it'd
probably be trickier to implement in order to make all the tooling
compatible. I think that might be a good solution nonetheless.



* Re: Relaxing the restrictions for store item names
  2023-08-25  9:14 ` Eidvilas Markevičius
@ 2023-08-25 14:01   ` Eidvilas Markevičius
  2023-08-25 16:32     ` Kaelyn
  0 siblings, 1 reply; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-08-25 14:01 UTC (permalink / raw)
  To: Nathan Dehnel; +Cc: guix-devel

Although now, just a few hours later, I'm having second thoughts on
this. When you really think about it, it's very unlikely that some
user would prefer typing something like

guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC

over

guix install имагинари-програм

even if they don't have the Russian (or whatever other language)
keyboard layout set up on their system, so just for accessibility
purposes, the solution wouldn't be all that great. It would also make
store name unnecessarily long (they're already long as is), and
there's a 255 char limit for filenames that we have to keep in mind as
well. Searching the store using standard utilities such as find and
grep would too, as a consequence, break... There's just too many
problems with this.
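
For reference, the size difference is easy to measure.  A quick Python
sketch using the standard urllib.parse.quote, with the imaginary
package name from above:

```python
from urllib.parse import quote

name = "имагинари-програм"
encoded = quote(name)  # percent-encodes the UTF-8 bytes; "-" is kept as-is

print(encoded)
print(len(name), "characters")
print(len(name.encode("utf-8")), "bytes as UTF-8")
print(len(encoded), "bytes percent-encoded")
```

Each Cyrillic letter takes two bytes in UTF-8 and six once
percent-encoded, so the encoded name is roughly three times as long.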

I believe what Julien proposed is the most reasonable solution:
unrestrict unicode characters in the store and (maybe) make it a
project policy to not put unicode characters inside package names
(however, personally I wouldn't be against that either).

Now ensuring that URIs don't break, especially for substitute
provision, should also be taken into consideration, but this can be
handled separately.




* Re: Relaxing the restrictions for store item names
  2023-08-25 14:01   ` Eidvilas Markevičius
@ 2023-08-25 16:32     ` Kaelyn
  2023-08-25 17:44       ` Eidvilas Markevičius
  2023-08-25 18:14       ` Saku Laesvuori
  0 siblings, 2 replies; 24+ messages in thread
From: Kaelyn @ 2023-08-25 16:32 UTC (permalink / raw)
  To: Eidvilas Markevičius; +Cc: Nathan Dehnel, guix-devel

Hi,

A couple of small early-morning (for me) comments below... not for or against the idea of percent encoding, but as a little bit of food for thought while pondering how to handle Unicode in package names and/or store paths.

On Friday, August 25th, 2023 at 2:01 PM, Eidvilas Markevičius <markeviciuseidvilas@gmail.com> wrote:

> Although now, just a few hours later, I'm having second thoughts on
> this. When you really think about it, it's very unlikely that some
> user would prefer typing something like
> 
> guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
> 
> over
> 
> guix install имагинари-програм

I imagine that, for usability, the percent encoding (or other encoding or transliteration) of non-ASCII characters could be handled transparently, i.e. for "guix install имагинари-програм", guix would translate "имагинари-програм" to the encoded form for operations. And if the escape character (e.g. the "%" in percent encoding) isn't also a valid character for store or package names then the values can be handled transparently. For example, both "guix install git" and "guix install %67%69%74" and "guix install g%69t" would all install git.
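
A minimal sketch of that idea in Python, assuming "%" is never a valid
character in a package name so that decoding is unambiguous.  The
function name canonical_package_name is hypothetical, and
urllib.parse.unquote is simply the standard-library decoder, not
anything guix uses.

```python
from urllib.parse import unquote


def canonical_package_name(argument: str) -> str:
    """Decode any percent-escapes in a command-line ARGUMENT."""
    # unquote() decodes the escaped bytes as UTF-8 by default.
    return unquote(argument)


for argument in ("git", "%67%69%74", "g%69t"):
    print(argument, "->", canonical_package_name(argument))
```

All three spellings decode to the same name, so whichever form the user
types, the tooling would only ever have to look up one of them.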

> even if they don't have the Russian (or whatever other language)
> keyboard layout set up on their system, so just for accessibility
> purposes, the solution wouldn't be all that great.

> It would also make
> store name unnecessarily long (they're already long as is), and
> there's a 255 char limit for filenames that we have to keep in mind as
> well. Searching the store using standard utilities such as find and
> grep would too, as a consequence,

I split out the quote above as a bit of reference. While I agree that we have to keep in mind the 255 char limit for filenames, with percent encoding causing a single byte in ASCII or UTF-8 to become ~3 bytes (with iirc most non-latin characters having multi-byte encodings in UTF-8) and the store hashes being a 33 byte prefix (counting the dash), 255 chars is still quite a bit. Specifically, the extracted quote above--without the "> " prefixes and with line breaks treated as single characters--is exactly 255 characters. (I find a bit of readable text to be helpful for wrapping my brain around a value like "255 characters".)
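
To make the arithmetic concrete, here is a rough sketch in Python. It assumes the 255 limit counts bytes (as it does on ext4, for example), uses the 33-byte prefix mentioned above, and only considers names made entirely of non-ASCII characters; the names NAME_MAX, HASH_PREFIX and max_characters are just for illustration.

```python
NAME_MAX = 255        # filename limit discussed in the thread (assumed bytes)
HASH_PREFIX = 32 + 1  # store hash plus the dash

budget = NAME_MAX - HASH_PREFIX  # bytes left for the rest of the name


def max_characters(utf8_width: int, percent_encoded: bool) -> int:
    """How many non-ASCII characters of the given UTF-8 width fit in the budget."""
    # Percent encoding turns every byte into three ("%XX").
    cost = utf8_width * 3 if percent_encoded else utf8_width
    return budget // cost


for width in (2, 3, 4):
    print(width, "byte chars:",
          max_characters(width, False), "raw,",
          max_characters(width, True), "percent-encoded")
```

So for two-byte characters such as Cyrillic letters, the remaining 222 bytes hold 111 characters as raw UTF-8 but only 37 when percent-encoded.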

Cheers,
Kaelyn

> break... There's just too many
> problems with this.
> 
> I believe what Julien proposed is the most reasonable solution:
> unrestrict unicode characters in the store and (maybe) make it a
> project policy to not put unicode characters inside package names
> (however, personally I wouldn't be against that either).
> 
> Now ensuring that URIs don't break, especially for substitute
> provision, should also be taken into consideration, but this can be
> handled separately.
> 
> On Fri, Aug 25, 2023 at 12:14 PM Eidvilas Markevičius
> markeviciuseidvilas@gmail.com wrote:
> 
> > On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel ncdehnel@gmail.com wrote:
> > 
> > > What you could do is implement percent encoding:
> > > https://en.wikipedia.org/wiki/Percent-encoding
> > > -Allows you to store package titles in any language in an encoded form
> > > -Allows the titles to be typed on latin keyboards
> > > -Allows the packages to be accessed through URIs in the future without
> > > causing problems
> > 
> > Now that's an idea. I hadn't really thought of that. Although it'd
> > probably be trickier to implement in order to make all the tooling
> > compatible. I think that might be a good solution nonetheless.



* Re: Relaxing the restrictions for store item names
  2023-08-25 16:32     ` Kaelyn
@ 2023-08-25 17:44       ` Eidvilas Markevičius
  2023-08-25 18:14       ` Saku Laesvuori
  1 sibling, 0 replies; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-08-25 17:44 UTC (permalink / raw)
  To: Kaelyn; +Cc: Nathan Dehnel, guix-devel

Well, what I realized right now is that this sort of "transparency"
may not even have to be handled by guix at all. If we remember the
fact that we're on a unix-based system, a user who really wants to
install some piece of software with a unicode name, but doesn't know
how to type the requisite characters could always use the help of an
external program to do transliteration to another alphabet for him
(e.g., translit from the perl-lingua-translit package):

guix install $(echo imaginari-program | translit -t "ISO 9" -r)




* Re: Relaxing the restrictions for store item names
  2023-08-25 16:32     ` Kaelyn
  2023-08-25 17:44       ` Eidvilas Markevičius
@ 2023-08-25 18:14       ` Saku Laesvuori
  1 sibling, 0 replies; 24+ messages in thread
From: Saku Laesvuori @ 2023-08-25 18:14 UTC (permalink / raw)
  To: Kaelyn; +Cc: Eidvilas Markevičius, Nathan Dehnel, guix-devel


> > Although now, just a few hours later, I'm having second thoughts on
> > this. When you really think about it, it's very unlikely that some
> > user would prefer typing something like
> > 
> > guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
> > 
> > over
> > 
> > guix install имагинари-програм
> 
> I imagine that, for usability, the percent encoding (or other encoding
> or transliteration) of non-ASCII characters could be handled
> transparently, i.e. for "guix install имагинари-програм", guix would
> translate "имагинари-програм" to the encoded form for operations. And
> if the escape character (e.g. the "%" in percent encoding) isn't also
> a valid character for store or package names then the values can be
> handled transparently. For example, both "guix install git" and "guix
> install %67%69%74" and "guix install g%69t" would all install git.
>
> > [...]
>
> > It would also make
> > store name unnecessarily long (they're already long as is), and
> > there's a 255 char limit for filenames that we have to keep in mind as
> > well. Searching the store using standard utilities such as find and
> > grep would too, as a consequence,
> 
> I split out the quote above as a bit of reference. While I agree that
> we have to keep in mind the 255 char limit for filenames, with percent
> encoding causing a single byte in ASCII or UTF-8 to become ~3 bytes
> (with iirc most non-latin characters having multi-byte encodings in
> UTF-8) and the store hashes being a 33 byte prefix (counting the
> dash), 255 chars is still quite a bit. Specifically, the extracted
> quote above--without the "> " prefixes and with line breaks treated as
> single characters--is exactly 255 characters. (I find a bit of
> readable text to be helpful for wrapping my brain around a value like
> "255 characters".)
>
> > break... There's just too many problems with this.

The encoding could also be transparent in the other direction so the
percent encoded form would be usable on the command line (in addition to
the UTF-8 one, of course), but guix would translate it to UTF-8 for
operations. This would allow typing all package names with only ascii
characters but still keep the store readable and grepable. There are
most likely simple utility programs that can decode percent encoding, so
the store is also grepable with only ascii characters. 

There is no reason (that I can see) not to allow UTF-8 in the store
paths, other than it being hard to type with a keyboard for a different
locale. But how often do people actually want to type store paths by
hand? I at least avoid it at all times possible by using $(guix build ...), 
$(herd configuration ...), $(realpath /var/guix/profiles/...) etc.
Even when recovering a broken system the only store path you really need to
type is that of a working guix (and /var/guix/profiles/... probably also
works in a broken system).

> > even if they don't have the Russian (or whatever other language)
> > keyboard layout set up on their system, so just for accessibility
> > purposes, the solution wouldn't be all that great.

I agree. It is really annoying and hard to write percent encoding by
hand, so this doesn't really solve the issue of UTF-8 being hard to
write with an ASCII keyboard.

Maybe some sort of fuzzy character matching could be used in guix search
instead of percent encoding. That way people could find the packages
even if they can't type the entire name and then use the name from guix
search (by copy-pasting or shell piping) to install it (or do whatever
operation they want to it).
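
One possible way to do such matching, sketched in Python under the
assumption that "fuzzy" means ignoring case and diacritics.  The
functions fold and fuzzy_search and the package list are made up for
the example; this is not how guix search works.

```python
import unicodedata
from typing import List


def fold(text: str) -> str:
    """Case-fold TEXT and strip combining marks (accents, carons, ...)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_search(query: str, package_names: List[str]) -> List[str]:
    """Return the names whose folded form contains the folded QUERY."""
    needle = fold(query)
    return [name for name in package_names if needle in fold(name)]


names = ["naršytuvas", "git", "façon"]  # made-up package list
print(fuzzy_search("narsytuvas", names))
print(fuzzy_search("facon", names))
```

This only helps within one script: a Latin query would still not find
a Cyrillic name, which would need transliteration on top.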



* Re: Relaxing the restrictions for store item names
  2023-08-24 10:21               ` Julien Lepiller
  2023-08-24 10:36                 ` MSavoritias
@ 2023-08-27 15:27                 ` wolf
  1 sibling, 0 replies; 24+ messages in thread
From: wolf @ 2023-08-27 15:27 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: guix-devel, Msavoritias, Nguyễn Gia Phong


On 2023-08-24 12:21:24 +0200, Julien Lepiller wrote:
> Le 24 août 2023 10:41:23 GMT+02:00, Msavoritias <email@msavoritias.me> a écrit :
> >
> >What I am saying here is that:
> >Its easy to see from our very US centric tech culture why everybody
> >should just use ASCII because "This is how it is". But there is very
> >little reasons why we shouldn't strive to be more inclusive of all
> >cultures.
> >Especially since nowadays where we have tools like Unicode that make our
> >lives easier compared to US or nothing of 30-40 years ago.
> >Just imagine how many good programmers we are missing because they don't
> >want/can't learn English or don't have an ASCII keyboard.
> >
> >MSavoritias
> >
> >MSavoritias <email@msavoritias.me> writes:
> >
> >> Nguyễn Gia Phong <cnx@loang.net> writes:
> >>
> >>> [[PGP Signed Part:Undecided]]
> >>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> >>>> Nguyễn Gia Phong <cnx@loang.net> writes:
> >>>> > I think the distinction must be made here between Guix and GuixSD.
> >>>> >
> >>>> > The packaging software should support full localization,
> >>>> > but the distro should target the least common denominator.
> >>>>
> >>>> Depends what do we mean the "distro" here.
> >>>> If I can pick arabic or chinese in the installation as a display
> >>>> language and also I am able to use an arabic/chinese keyboard sounds
> >>>> good to me.
> >>>
> >>> I meant GuixSD.  I agree a distribution based on Guix Systems
> >>> shouldn't meet any obstacle declaring packages with non-ASCII names.
> >>> That you can type arabic and chinese and I can type hangul
> >>> and most latin characters doesn't mean names having all of the above
> >>> will be accessible to either of us or a third person.
> >>>
> >>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> >>>> Regarding the initial question it was about package names to my
> >>>> understanding. Specifically package names in the store to use unicode
> >>>> characters. Which makes perfect sense there because some packages dont
> >>>> use ascii names.
> >>>
> >>> It does, but as said before, whether this is desireable depends
> >>> on the target audience.  The purpose of API is to be used,
> >>> i.e. it would be useless if even just one user can't type it.
> >>>
> >> Well we already have that don't we? What I mean is that ASCII names cant
> >> be typed by all keyboards layouts easily. So what you are saying already
> >> happens. Thats why I always have an ASCII layout available as a
> >> secondary, next to my non ASCII. I bet every person that uses packages
> >> with names other than english can add a seperate layout.
> >>
> >>> On 2023-08-24 at 10:41+03:00, MSavoritias wrote:
> >>>> Regarding the broken install example, most (all?) base
> >>>> packages use ASCII due to unix historical baggage.
> >>>> So you shouldn't need to type anything non ASCII
> >>>> to fix an install with only basic packages.
> >>>
> >>> Due to historical baggage, most (all?) keyboard layouts can fall
> >>> back to ASCII alphanumerics.  A broken install was given
> >>> as the worst case; there's no reason any other packages
> >>> should be less accessible based on the users' culture.
> >>>
> >>
> >> But they are already aren't they? Because if I want to add a package
> >> with the Greek alphabet or the Japanese one I have to transliterate it
> >> into ASCII which is always going to be worse and people won't be able to
> >> find the package. Because they won't know we changed the name. Plus they
> >> will have to change the layout. Same as an ASCII user would have to do.
> >>
> >>> I suggest, in an international context such as GuixSD,
> >>> for every package to have a ASCII name.  It'd of course
> >>> be better if a correctly written name is also available.
> >>>
> >>
> >> So you propose two names? Sure if that can be done I don't see why not. Either way not
> >> having unicode names is a bug. Also to note: Most of the world speaks
> >> Unicode. So its more for compatibility purposes i guess (?) rather than
> >> to be "international".
> >>
> >> MSavoritias
> >
> >
> 
> There are two things discussed here:
> 
> 1. A restriction in the daemon prevents using unicode in store item names.
> 
> I think this is an issue worth fixing, as it would allow users to define their own store items more easily. For instance, I might want to make a file with non-ascii name a file-like item, eg.
> 
> (local-file "fond d'écran.jpg")

Out of curiosity, do you have an idea of what the list of allowed characters
would look like?  Anything except / and \0?  Or something more restrictive?

> 
> 2. Naming policy for packages in the Guix channel
> 
> I don't think we should distribute packages that have non-ascii characters in their names. Of course I don't know all keyboards that exist out there, but I don't think you can find a programmer that can't type an ascii character, or a guix user that can't at least type "guix" in their terminal.
> 
> For discoverability, we could add the real non-ascii name in the package description.
> 

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.



* Re: Relaxing the restrictions for store item names
  2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
                   ` (3 preceding siblings ...)
  2023-08-24 16:30 ` Kaelyn
@ 2023-09-02 20:02 ` Eidvilas Markevičius
  4 siblings, 0 replies; 24+ messages in thread
From: Eidvilas Markevičius @ 2023-09-02 20:02 UTC (permalink / raw)
  To: guix-devel

So I've been trying to tackle this issue myself [1], but very quickly
ran into a problem: even with the restrictions in store-api.cc
removed, a strange error that I don't know how to deal with appears
whenever a non-ASCII character is encountered:


$ guix build -L . test-ąčęėįšųū
Backtrace:
In srfi/srfi-1.scm:
   673:15 19 (append-map _ _ . _)
   586:17 18 (map1 ("x86_64-linux"))
In guix/scripts/build.scm:
   713:21 17 (_ _)
In guix/store.scm:
  1380:11 16 (map/accumulate-builds #<store-connection 256.99
7f00e9f62870> #<procedure 7f00d651dd70 at
guix/scripts/build.scm:714:43 (t-658ec5b154a5af8-181d)> _ #:cutoff _)
   1298:8 15 (call-with-build-handler #<procedure 7f00cd02ed80 at
guix/store.scm:1333:2 (continue store things mode)> _)
In guix/scripts/build.scm:
   672:18 14 (_ _)
In guix/store.scm:
  2168:25 13 (run-with-store #<store-connection 256.99 7f00e9f62870> _
#:guile-for-build _ #:system _ #:target _)
   1996:8 12 (_ _)
In guix/packages.scm:
  1970:11 11 (_ _)
In guix/store.scm:
  2040:38 10 (_ #<store-connection 256.99 7f00cd2de8c0>)
In guix/derivations.scm:
   833:24  9 (derivation #<store-connection 256.99 7f00cd2de8c0>
"test-ąčęėįšųū-0"
"/gnu/store/g8p09w6r78hhkl2rv1747pcp9zbk6fxv-guile-3.0.9/bin/guile"
("--no-auto-compile" "-L" "/gnu/s…" …) …)
   690:10  8 (derivation-hash _)
    677:5  7 (write-derivation _ #<output: string 7f00d5a3b930>)
    630:4  6 (write-string-list _)
In srfi/srfi-1.scm:
    634:9  5 (for-each #<procedure 7f00d66f4bc0 at
guix/derivations.scm:630:4 (item)>
("/gnu/store/14i2qkvlx6gi98akhwwk7bh26s710s35-test-ąčęėįšųū-0-builder"))
In guix/derivations.scm:
    626:4  4 (_
"/gnu/store/14i2qkvlx6gi98akhwwk7bh26s710s35-test-ąčęėįšųū-0-builder")
In unknown file:
           3 (put-string #<output: string 7f00d5a3b930>
"/gnu/store/14i2qkvlx6gi98akhwwk7bh26s710s35-test-ąčęėįšųū-0-builder"
#<undefined> #<undefined>)
In ice-9/boot-9.scm:
  1685:16  2 (raise-exception _ #:continuable? _)
  1685:16  1 (raise-exception _ #:continuable? _)
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `encoding-error' with args `("put-char" "conversion to
port encoding failed" 84 #<output: string 7f00d5a3b930> #\ą)'.


If anyone is knowledgeable enough to tell me why this happens and/or
how to fix it, I'd be grateful. Thanks.
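
For readers skimming the backtrace: the final exception reports that
put-char could not convert #\ą to the encoding of the output string
port. The backtrace does not say which encoding that port uses, so the
snippet below is only a Python analogy of this class of failure,
assuming an ISO-8859-1 target for illustration. It is not Guile code
and does not reproduce the Guix code path; the helper name is
hypothetical.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of TEXT can be written in ENCODING."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


name = "test-ąčęėįšųū-0"

# UTF-8 can represent every Unicode character.
print(encodable(name, "utf-8"))

# ISO-8859-1 has no code point for "ą", so strict conversion fails.
print(encodable(name, "iso-8859-1"))
```

If the port in question really is limited to a single-byte encoding,
the fix would be on the Scheme side rather than in store-api.cc, but
that remains to be confirmed.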

Also, sorry for not opening a separate issue on debbugs for now; I
will if it turns out that the problem is more than I can chew by
myself (which, from the looks of it, may actually be the case, but
I'm not entirely sure yet... maybe it's an easy fix).

[1] https://gitlab.com/markeviciuseidvilas/guix

On Tue, Aug 22, 2023 at 9:49 AM Eidvilas Markevičius
<markeviciuseidvilas@gmail.com> wrote:
>
> Hello Guix,
>
> Not long ago, somebody has raised an issue regarding an error that
> occurs whenever some unconventional character is used as the name for
> a store item [0]. Tobias Geerinckx-Rice pointed out that this
> restriction was directly inherited from the Nix source code [1] and
> that, as such, it isn't really a bug. Regardless, I believe that the
> imposed limitation may be undesirable in some situations. One that I
> can think of off the top of my head is packaging a piece of software
> with a name that contains non-Latin characters in it (e.g.,
> "Naršytuvas" by Raštija [2]). Of course, there are very few examples
> of such programs in actual practice, but there's a small chance of
> encountering them from time to time, especially if they're oriented
> towards non-English speaking users, and personally, I don't feel like
> resorting to transliteration is a good solution to this. After all,
> it's 2023, why would such a restriction need to be there in the first
> place when most filesystems are able to handle unicode characters just
> fine?
>
> Another scenario where these artificial restrictions could be a
> potential cause of trouble is when we consider a possibility that Guix
> might be used for packaging and distributing not only software, but
> all kinds of non-executable data such as films, books, music,
> databases, historical documents, website archives, etc. [3]. In the
> case of website archives: say I wanted to package the contents of the
> whole raštija.lt website. When choosing the package name for it,
> should I go with "rastija.lt", "rashtija.lt", or "raštija.lt". The
> latter would be a clear winner in my mind, since it is the canonical
> domain name for that particular site. And for all other types of data
> and media packages, using the official/original titles for their names
> would, too, be much more preferable over making use of any kind of
> transcription or transliteration method, IMO.
>
> Therefore, my proposal is to relax these limitations as much as
> possible (or at least somewhat) and to allow some more freedom when it
> comes to naming packages and other kinds of items in the store. We
> could, of course, still disallow all the main problematic characters,
> such as NUL, /, $, ~, space, newline and a few others, but other than
> that, I don't see any reason to forbid any of the remaining ones from
> being used.
>
> I'd like to hear your opinions on this and get to know whether this
> idea is feasible to implement at all or not, and if not – why?
>
> [0] https://issues.guix.gnu.org/64976
> [1] https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/store-api.cc#n58
> [2] https://raštija.lt/liepa/paslaugos-vartotojams/narsytuvas
> [3] https://gitlab.com/guix-media-channels



end of thread, other threads:[~2023-09-02 20:03 UTC | newest]

Thread overview: 24+ messages
2023-08-22  6:49 Relaxing the restrictions for store item names Eidvilas Markevičius
2023-08-23 19:04 ` jbranso
2023-08-24  6:56 ` (
2023-08-24  7:16   ` MSavoritias
2023-08-24  7:31     ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
2023-08-24  7:35       ` Fannys
2023-08-24  7:41       ` MSavoritias
2023-08-24  8:10         ` Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution.
2023-08-24  8:18           ` MSavoritias
2023-08-24  8:41             ` Msavoritias
2023-08-24 10:21               ` Julien Lepiller
2023-08-24 10:36                 ` MSavoritias
2023-08-24 19:38                   ` (
2023-08-27 15:27                 ` wolf
2023-08-24 10:33 ` Simon Tournier
2023-08-24 16:30 ` Kaelyn
2023-08-24 18:26   ` Eidvilas Markevičius
2023-09-02 20:02 ` Eidvilas Markevičius
  -- strict thread matches above, loose matches on Subject: below --
2023-08-25  8:37 Nathan Dehnel
2023-08-25  9:14 ` Eidvilas Markevičius
2023-08-25 14:01   ` Eidvilas Markevičius
2023-08-25 16:32     ` Kaelyn
2023-08-25 17:44       ` Eidvilas Markevičius
2023-08-25 18:14       ` Saku Laesvuori
