From: Boruch Baum
Newsgroups: gmane.emacs.devel
Subject: Re: fixing url-unhex-string for unicode/multi-byte charsets
Date: Sun, 8 Nov 2020 04:12:16 -0500
Message-ID: <20201108091216.xug4neeem7iuayhq@E15-2016.optimum.net>
In-Reply-To: <83eel68r2y.fsf@gnu.org>
To: Eli Zaretskii
Cc: Stefan Monnier, emacs-devel@gnu.org

On 2020-11-06 17:04, Eli Zaretskii wrote:
> > From: Stefan Monnier
> > Cc: Boruch Baum, emacs-devel@gnu.org
> > Date: Fri, 06 Nov 2020 09:59:02 -0500
> >
> > My guess is that his `file-name-coding-system` is set to something
> > different from utf-8.

That's correct, kind of. The setting isn't "mine"; it's the Emacs
default. In both Emacs 26.1 (Debian) and emacs-snapshot (v28),
file-name-coding-system defaults to nil and
default-file-name-coding-system defaults to utf-8-unix, so we have:

  file-name-coding-system                   => nil
  (default-value 'file-name-coding-system)  => nil
  default-file-name-coding-system           => utf-8-unix

Does anybody besides me find that amusing? It reminds me of bug report
#43294, in that both seem intentionally designed to confuse and trip up
developers.

> > [ BTW, I wouldn't be surprised to hear that the Freedesktop spec
> > documents that the file names in the Trash should use utf-8, in which
> > case the code should hard-code utf-8 rather than use
> > `file-name-coding-system` ;-) ]
>
> If the trash spec says it must be UTF-8, then yes, TRT is to use that
> unconditionally.
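The three values listed above only look contradictory. According to the Emacs Lisp manual, a nil `file-name-coding-system` means Emacs falls back to `default-file-name-coding-system`. Below is a minimal Python model of that fallback rule. `effective_file_name_coding` and `decode_file_name` are hypothetical helpers, and `None` stands in for nil. Stripping the `-unix` suffix (which in Emacs only selects end-of-line conversion) to get a Python codec name is a simplification that happens to work for `utf-8` and `latin-1`.

```python
from typing import Optional


def effective_file_name_coding(file_name_coding_system: Optional[str],
                               default_file_name_coding_system: Optional[str]) -> Optional[str]:
    """Model of the fallback (None stands for nil): a non-nil
    file-name-coding-system takes precedence; otherwise the value of
    default-file-name-coding-system is used."""
    if file_name_coding_system is not None:
        return file_name_coding_system
    return default_file_name_coding_system


def decode_file_name(raw: bytes, coding: str) -> str:
    """Decode raw file-name bytes. The '-unix' suffix of an Emacs coding
    system only selects end-of-line handling, so it is dropped; the rest is
    assumed to also be a valid Python codec name (true for utf-8, latin-1)."""
    codec = coding[:-len("-unix")] if coding.endswith("-unix") else coding
    return raw.decode(codec)


# The defaults reported above for Emacs 26.1 and emacs-snapshot (v28):
coding = effective_file_name_coding(None, "utf-8-unix")
print(coding)                                       # utf-8-unix
print(decode_file_name(b"caf\xc3\xa9", coding))     # café

# The same bytes decoded with a different coding system come out garbled:
print(decode_file_name(b"caf\xc3\xa9", "latin-1"))  # cafÃ©
```

So a nil `file-name-coding-system` is "different from utf-8" only nominally. In this model the effective coding is still UTF-8 unless a user sets the variable explicitly.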
The FreeDesktop.org Trash specification[1] says this about the Path key
(the original location of a trashed file, used to restore it):

  "The value type for this key is “string”; it SHOULD store the file
  name as the sequence of bytes produced by the file system, with
  characters escaped as in URLs (as defined by RFC 2396, section 2)."

RFC 2396[2] says (section 2.1):

  "... there is currently no provision within the generic URI syntax
  to accomplish this identification ... It is expected that a
  systematic treatment of character encoding within URI will be
  developed as a future modification of this specification."

> But if the spec says nothing, I'd expect the file names to be in
> whatever encoding they were on disk, which usually should coincide
> with file-name-coding-system.

[1] https://specifications.freedesktop.org/trash-spec/trashspec-latest.html
[2] http://www.faqs.org/rfcs/rfc2396.html

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
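Read literally, the quoted spec text implies a two-step decode for a Path value:

1. Undo the URL-style %XX escaping to recover the raw file-system bytes.
2. Decode those bytes as a whole, using the on-disk encoding.

Below is a sketch using Python's standard `urllib.parse.unquote_to_bytes`. `decode_trash_path` and `naive_unhex` are hypothetical names. Defaulting to UTF-8 is an assumption, since the spec as quoted only speaks of bytes. For comparison, `naive_unhex` shows what goes wrong when each escape is turned into its own character.

```python
from urllib.parse import unquote_to_bytes


def decode_trash_path(value: str, coding: str = "utf-8") -> str:
    """Decode a .trashinfo Path value: first undo the %XX escaping to get
    the raw file-system bytes, then decode those bytes as one sequence.
    The default coding is an assumption; pass the on-disk encoding."""
    raw = unquote_to_bytes(value)
    return raw.decode(coding)


def naive_unhex(value: str) -> str:
    """Anti-pattern for comparison: map each %XX escape to the character
    with that code point, so a multi-byte UTF-8 sequence turns into several
    unrelated characters. Assumes every escape is well-formed hex."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "%" and i + 2 < len(value):
            out.append(chr(int(value[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


path = "/home/user/caf%C3%A9.txt"
print(decode_trash_path(path))  # /home/user/café.txt
print(naive_unhex(path))        # /home/user/cafÃ©.txt

# A file name stored on disk in Latin-1 needs that coding instead:
print(decode_trash_path("/home/user/caf%E9.txt", "latin-1"))  # /home/user/café.txt
```

Taking the coding as a parameter, rather than hard-coding UTF-8, matches the reading that the bytes are in "whatever encoding they were on disk".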