From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: bug#23425: master branch: `message' wrongly corrupts ' to curly quote.
Date: Mon, 5 Jun 2017 16:27:37 +0000
Message-ID: <20170605162737.GA30946@acm.fritz.box>
References: <83zis4h59w.fsf@gnu.org>
 <51a2ae75-71f7-10f6-ae2a-7c830bdf0a30@cs.ucla.edu>
 <17c1c00d-a275-5e61-0c47-6872a64a9347@cs.ucla.edu>
 <20170531212452.GA3789@acm.fritz.box>
 <07bf5f9d-e8cd-a4d9-1843-b488bfe0b92c@cs.ucla.edu>
 <20170602210209.GA3570@acm.fritz.box>
 <11c0adfb-7fdd-8d28-1a47-869e3e7043ea@cs.ucla.edu>
 <20170603205331.GA2130@acm.fritz.box>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.24 (2015-08-30)
Cc: Glenn Morris, 23425@debbugs.gnu.org, emacs-devel@gnu.org
To: Paul Eggert
X-Primary-Address: acm@muc.de
Xref: news.gmane.org gmane.emacs.devel:215454

Hello, Paul.

On Sun, Jun 04, 2017 at 14:01:42 -0700, Paul Eggert wrote:
> Alan Mackenzie wrote:
> > We have moved from a state where everybody knew what
> > `message' did (in Emacs 24), to one with wild special characters which
> > only apply sometimes, and necessitate crazy prolix formulations to work
> > around unwanted translations of quote characters.

> This exaggerates somewhat. We moved from Emacs 24 where only % is special, to
> Emacs 25 where %, ` and ' are special.

Yes. We moved from regularity (where %x, for varying x, and nothing else
was special) to a ragbag (where there are 3 special characters, with the
two "new" ones being syntactically totally different from %).

> Although some people don't know that ` and ' are special, that's also
> true for %.

No. _Anybody_ who's used `message' knows that %s is how you print out an
arbitrary sexp.
Anybody who's used printf in C knows this, too. It is very easy not to
know that ` and ' are special, and horribly easy to get caught out by it,
as happened to me.

> And although it can be annoying to write (message "%s" STR) to avoid
> unwanted translation of STR, that annoyance was already present for %.

It is not merely annoying, it is hideously irregular. Having to write

    (message "%s" (format "...." arg1 arg2 ....))

screams out "we didn't think this through properly". A call to message
should only need one format string. The change I am proposing would
achieve this.

Nothing of the sort was ever needed for %. It is and always was trivially
easy to cause a literal % sign to be output by message, and there was
never danger of confusion in this.

We also have `format' and `format-message' which handle format strings
inconsistently. (Yes, I know, `format-message' was introduced
deliberately to create this inconsistency, because `format' was no longer
able to cope on its own.)

> > it makes sense to shift this burden over to the use cases where the
> > programmers need quote translation, and hence will be aware of it.

> When text-quoting-style specifies translation, most instances of ` and ' in
> Emacs messages are better off translated. So it also makes sense to translate by
> default in this situation, with a way to avoid translation in the rare cases
> where translation isn't wanted.

I disagree with this, of course. Translating behind people's backs is not
a friendly thing to do. Translation should only be done where it is
explicitly specified.

> The question is about which approach makes more sense, not whether one
> approach is sensible and the other nonsense.

OK.

> >> although it simplifies ‘message’ (obviously), this is at the price of
> >> complicating everything else.

> > What is the "everything else" that gets thus complicated?

> I was referring to the hassle of going through hundreds or thousands of message
> strings or calls, deciding which instances of ` and ' should be replaced with %`
> and %', and replacing the instances accordingly.

Yes. There are quite a lot, but not an unmanageable number.

> It's also possible that at times we'll need two format strings instead
> of one, complicating the code.

We need two strings instead of one at the moment, with (message "%s"
(format "..." .....)). With %` and %' we'd only need one string in each
message invocation. This is a simplification. Can you give an example of
something which might need two strings?

> > There are around 17,000 occurrences of "message" in our Lisp
> > sources, and probably a few in our C sources. Only (some of) those
> > containing the quote characters in the format string would need
> > amendment. These will comprise a tiny portion of these ~17,000

> How many lines do you think will be in that "tiny portion"? No matter how you
> count them, it'll be quite a few changes.

By searching for

    "(\\(message\\|error\\)\\s +\\([^\"]\\|\"\\(\\\\.\\|[^\"\\]\\)*[`']\\)"

, i.e. an invocation of message or error followed by either something
which isn't a literal string, or a literal string containing ` or ', I get
2745 matches in our Lisp sources. There'll be a smaller number also in our
C sources. I would have to enhance that regexp to recognise comments, and
maybe a few other things, but 2745 is a good first approximation.

A very great number of these are "(error ..." handlers in condition-case
forms. A great number of those remaining could be simply and mechanically
translated, for example "don't", "can't", "couldn't", etc., and a lot of
"`%s'"s and "`%S'"s. I estimate there will be a few hundred forms
remaining which need decision making to adapt them. For example, where
message is used in macros, and the format string is a macro parameter.
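
For anyone wanting to repeat a count of this kind, the search described
above can be sketched roughly as follows. This is only an illustration
under stated assumptions: the names my-quote-call-regexp,
my-count-quote-calls and my-count-quote-calls-in-tree are hypothetical
and not part of Emacs, the choice of the Emacs Lisp syntax table for
"\s " is an assumption, and nothing here claims to reproduce the figure
of 2745. Like the regexp itself, it does not skip comments.

```elisp
;; Sketch only: all `my-' names below are hypothetical, not part of Emacs.

(defconst my-quote-call-regexp
  "(\\(message\\|error\\)\\s +\\([^\"]\\|\"\\(\\\\.\\|[^\"\\]\\)*[`']\\)"
  "Regexp quoted in the message above.
It matches a call to message or error whose first argument is either
not a literal string, or a literal string containing ` or '.")

(defun my-count-quote-calls (file)
  "Return the number of matches of `my-quote-call-regexp' in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    ;; Assumption: use the Emacs Lisp syntax table, so that "\s "
    ;; (whitespace syntax) behaves as it would in a Lisp buffer.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (goto-char (point-min))
      (let ((case-fold-search nil)
            (count 0))
        ;; Each match is non-empty, so the loop always advances.
        (while (re-search-forward my-quote-call-regexp nil t)
          (setq count (1+ count)))
        count))))

(defun my-count-quote-calls-in-tree (dir)
  "Sum `my-count-quote-calls' over every .el file under DIR."
  (apply #'+ (mapcar #'my-count-quote-calls
                     (directory-files-recursively dir "\\.el\\'"))))
```

Evaluating (my-count-quote-calls-in-tree "lisp/") from the top of a
source checkout would give one total for the tree; the path is an
assumption about where the Lisp sources live.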

> > and can be found easily enough with a script

> I'm afraid not, because in many cases the string is not a simple literal
> constant argument to the message function. For starters, there's also the error
> function; that's another 14,000 text matches in the Elisp source -- many of them
> false alarms of course, but not all of them.

See above.

> I'm not saying this sort of change is impossible. It's just that it'd be quite a
> bit of work, work that someone would need to volunteer to do. Is this really the
> best use of our limited resources?

Clearly, that someone would have to be me. The consequences of
surreptitious unwanted translation are so severe that I think this would
indeed be a good use of resources.

-- 
Alan Mackenzie (Nuremberg, Germany).