From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eidvilas Markevičius
Date: Fri, 25 Aug 2023 20:44:31 +0300
Subject: Re: Relaxing the restrictions for store item names
To: Kaelyn
Cc: Nathan Dehnel, guix-devel@gnu.org
Content-Type: text/plain; charset="UTF-8"
List-Id: "Development of GNU Guix and the GNU System distribution."

Well, what I realized just now is that this sort of "transparency" may
not even have to be handled by guix at all. If we remember that we're
on a Unix-based system, a user who really wants to install some piece
of software with a Unicode name, but doesn't know how to type the
requisite characters, could always use an external program to do the
transliteration to another alphabet for them (e.g., translit from the
perl-lingua-translit package):

    guix install $(echo imaginari-program | translit -t "ISO 9" -r)

On Fri, Aug 25, 2023 at 7:32 PM Kaelyn wrote:
>
> Hi,
>
> A couple of small early-morning (for me) comments below... not for or against the idea of percent encoding, but as a little bit of food for thought while pondering how to handle Unicode in package names and/or store paths.
>
> On Friday, August 25th, 2023 at 2:01 PM, Eidvilas Markevičius wrote:
>
> > Although now, just a few hours later, I'm having second thoughts on
> > this.
> > When you really think about it, it's very unlikely that some
> > user would prefer typing something like
> >
> > guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
> >
> > over
> >
> > guix install имагинари-програм
>
> I imagine that, for usability, the percent encoding (or other encoding or transliteration) of non-ASCII characters could be handled transparently, i.e. for "guix install имагинари-програм", guix would translate "имагинари-програм" to the encoded form for operations. And if the escape character (e.g. the "%" in percent encoding) isn't also a valid character for store or package names, then the values can be handled transparently. For example, "guix install git", "guix install %67%69%74", and "guix install g%69t" would all install git.
>
> > even if they don't have the Russian (or whatever other language)
> > keyboard layout set up on their system, so just for accessibility
> > purposes, the solution wouldn't be all that great.
>
> > It would also make
> > store name unnecessarily long (they're already long as is), and
> > there's a 255 char limit for filenames that we have to keep in mind as
> > well. Searching the store using standard utilities such as find and
> > grep would too, as a consequence,
>
> I split out the quote above as a bit of reference. While I agree that we have to keep in mind the 255 char limit for filenames, with percent encoding causing a single byte in ASCII or UTF-8 to become ~3 bytes (with iirc most non-Latin characters having multi-byte encodings in UTF-8) and the store hashes being a 33 byte prefix (counting the dash), 255 chars is still quite a bit.
> Specifically, the extracted quote above--without the "> " prefixes and with line breaks treated as single characters--is exactly 255 characters. (I find a bit of readable text to be helpful for wrapping my brain around a value like "255 characters".)
>
> Cheers,
> Kaelyn
>
> > break... There are just too many
> > problems with this.
> >
> > I believe what Julien proposed is the most reasonable solution:
> > unrestrict Unicode characters in the store and (maybe) make it a
> > project policy to not put Unicode characters inside package names
> > (however, personally I wouldn't be against that either).
> >
> > Now ensuring that URIs don't break, especially for substitute
> > provision, should also be taken into consideration, but this can be
> > handled separately.
> >
> > On Fri, Aug 25, 2023 at 12:14 PM Eidvilas Markevičius
> > markeviciuseidvilas@gmail.com wrote:
> >
> > > On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel ncdehnel@gmail.com wrote:
> > >
> > > > What you could do is implement percent encoding:
> > > > https://en.wikipedia.org/wiki/Percent-encoding
> > > > -Allows you to store package titles in any language in an encoded form
> > > > -Allows the titles to be typed on Latin keyboards
> > > > -Allows the packages to be accessed through URIs in the future without
> > > > causing problems
> > >
> > > Now that's an idea. I didn't really think of that. Although it'd
> > > probably be trickier to implement in order to make all the tooling
> > > compatible. I think that might be a good solution nonetheless.
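
For anyone who wants to check the numbers discussed in this thread, here is a rough Python sketch. It is only an illustration, not anything guix does: it uses the standard library's urllib.parse.quote as a stand-in for "percent encoding", takes the 33-byte hash prefix and the 255 limit from the messages above as given, and the helper name encoded_store_name_length is made up for this example.

```python
from urllib.parse import quote, unquote

# Figures taken from the thread; treat them as the thread's assumptions.
STORE_HASH_PREFIX_LEN = 33  # store hash plus the dash, per Kaelyn's message
NAME_MAX = 255              # per-file-name limit mentioned in the thread


def encoded_store_name_length(package_name: str) -> int:
    """Hypothetical helper: hash prefix length plus percent-encoded name length."""
    return STORE_HASH_PREFIX_LEN + len(quote(package_name, safe=""))


name = "имагинари-програм"
encoded = quote(name, safe="")  # encodes the UTF-8 bytes; "-" is left as-is

print(encoded)
print(len(name.encode("utf-8")), "UTF-8 bytes ->", len(encoded), "encoded")
print(encoded_store_name_length(name), "of", NAME_MAX)

# Decoding is lossless, and partially encoded names decode to the same string.
print(unquote(encoded) == name)
print(unquote("%67%69%74"), unquote("g%69t"))

# The quote Kaelyn measured, without "> " prefixes, newlines counted as one.
measured_quote = "\n".join([
    "It would also make",
    "store name unnecessarily long (they're already long as is), and",
    "there's a 255 char limit for filenames that we have to keep in mind as",
    "well. Searching the store using standard utilities such as find and",
    "grep would too, as a consequence,",
])
print(len(measured_quote))
```

In this sketch every Cyrillic letter takes 2 bytes in UTF-8 and 6 characters once percent-encoded, so the example name grows from 33 bytes to 97, which is the tripling discussed above. A real store file name would also carry a version string, which is not counted here.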