From: Jean Louis
Newsgroups: gmane.emacs.help
Subject: Re: Decoding URLs input
Date: Sat, 3 Jul 2021 22:17:39 +0300
To: Yuri Khan
Cc: Help GNU Emacs

* Yuri Khan [2021-07-03 14:12]:
> What you are dealing with is a URL, specifically, its query string
> part. These are described in RFC 3986, and its percent-encoding scheme
> in sections 2.1 and 2.5.
>
> (url-unhex-string …) will do half the work for you: It will decode
> percent-encoded sequences into bytes. By convention, in URLs,
> characters are UTF-8-encoded before percent-encoding (see RFC 3986 §
> 2.5), so you’ll need to use:
>
>     (decode-coding-string (url-unhex-string s) 'utf-8)
>
> to get a fully decoded text string.

That is correct, and I have implemented it now. Until now it worked
without `decode-coding-string' and I totally forgot about UTF-8. When I
faced the fact that spaces are replaced with a plus sign `+', I started
digging more.

It is not the first time I have dealt with this, who knows how many
times, and each time I stumble upon UTF-8 handling. This time you were
one step ahead of me: I had not stumbled upon it yet and could not
discover what was missing.

From the docstring of `url-unhex-string' I did not expect that it would
give back just bytes. IMHO that should be described there, though I am
not really sure; maybe the programmer is assumed to know that. The
docstring is poor. It says: "Remove %XX embedded spaces, etc in a URL."
With "remove" I don't expect percent-encoded UTF-8 to be converted into
raw bytes.
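
The two-step decoding quoted above, combined with the plus-sign handling, could be sketched roughly like this. It is only a minimal sketch: the function name `my-url-decode-form-value' is hypothetical and not part of Emacs, and it assumes the input is a single form-encoded query value in which `+' stands for a space. It uses `subst-char-in-string' for the replacement, so it does not depend on `string-replace'.

```elisp
(require 'url-util)  ; defines `url-unhex-string'

(defun my-url-decode-form-value (s)
  "Decode S, one form-encoded query string value, into text.
Hypothetical helper for illustration; not part of Emacs."
  ;; 1. Turn `+' into a space first.  A literal plus sign arrives
  ;;    as %2B, so it is still percent-encoded at this point and
  ;;    is not touched by the replacement.
  ;; 2. `url-unhex-string' turns each %XX sequence into one byte.
  ;; 3. `decode-coding-string' interprets those bytes as UTF-8.
  (decode-coding-string
   (url-unhex-string (subst-char-in-string ?+ ?\s s))
   'utf-8))

(my-url-decode-form-value "Hello+There")  ; => "Hello There"
(my-url-decode-form-value "1%2B1")        ; => "1+1"
(my-url-decode-form-value "caf%C3%A9")    ; => "café"
```

Doing the `+' replacement after `url-unhex-string' would also turn a decoded %2B into a space, which is why the sketch does it before.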

I guess it is clear now. I am now solving the issue that spaces are
converted to plus signs, so maybe I have to convert the `+' signs
before calling:

    (decode-coding-string (url-unhex-string "Hello+There") 'utf-8)

but maybe not before; maybe I leave it and convert later.

The problem I have encountered is that the library subr.el does not
provide the feature 'subr. I think I filed a report, but I got no
acknowledgment and have not seen it filed under my email, so I wait.
So in a CGI script I cannot use the function `string-replace': it asks
for that file, but I cannot `require' it, as it is not "provided", so I
have to add that line. I would not really like fiddling with the main
Emacs files on the server.

--
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/