* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Reuben Thomas @ 2017-08-23 10:59 UTC
To: Eli Zaretskii, 28179

On 22/08/17 18:23, Eli Zaretskii wrote:
>> Cc: 28179@debbugs.gnu.org
>> From: Reuben Thomas <rrt@sc3d.org>
>> Date: Tue, 22 Aug 2017 18:04:11 +0100
>>
>>   Are you sure we don't need to ensure ispell-get-decoded-string always
>>   returns a multibyte string?  What if decode-coding-string returns a
>>   pure ASCII string, which is therefore unibyte?
>>
>> This is multibyte too, no? The Emacs manual says:
>>
>>   Rather, Emacs uses a variable-length internal representation of
>>   characters, that stores each character as a sequence of 1 to 5 8-bit
>>   bytes, depending on the magnitude of its codepoint(1).  For example, any
>>   ASCII character takes up only 1 byte, a Latin-1 character takes up 2
>>   bytes, etc.  We call this representation of text “multibyte”.
> This is a misunderstanding, caused by the overloaded meaning of
> "multibyte string".  The way I meant it, it has to do with the
> internal flag marking a string either unibyte or multibyte.  Observe:
>
>   (multibyte-string-p "abcd") => nil
>
> but
>
>   (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

So here, running decode-coding-string on a plain ASCII string returns a
multibyte string.

> ispell-decode-string, which you replaced with its body.  The call to
> string-to-multibyte worked on the result of decoding, not instead of
> the decoding.  So actually the call to string-to-multibyte was not
> replaced, it was removed.

Yes, that call seemed to be unnecessary.

> Is the issue more clear now?

I now understand the two meanings of "multibyte", but I don't understand
how my patch is deficient.

I tried even:

  (multibyte-string-p (decode-coding-string "abcde" 'utf-8 t)) ; returns t; also if I use 'us-ascii

So in fact even when the string isn't copied (as in my patch, where I
also use a third argument of t to decode-coding-string) it appears to be
changed to a multibyte string.

-- 
https://rrt.sc3d.org
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Eli Zaretskii @ 2017-08-24 16:59 UTC
To: Reuben Thomas; +Cc: 28179

> From: Reuben Thomas <rrt@sc3d.org>
> Date: Wed, 23 Aug 2017 11:59:41 +0100
>
> I now understand the two meanings of "multibyte", but I don't understand
> how my patch is deficient.

I didn't say it was deficient, I asked whether you verified that
either (a) the result is always multibyte, or (b) that we don't need
to worry about it being multibyte if it is pure-ASCII.

> So in fact even when the string isn't copied (as in my patch, where I
> also use a third argument of t to decode-coding-string) it appears to be
> changed to a multibyte string.

Fine, if you are sure, go ahead and push.

Thanks.
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Noam Postavsky @ 2017-08-24 17:32 UTC
To: Eli Zaretskii; +Cc: 28179, Reuben Thomas

On Thu, Aug 24, 2017 at 12:59 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Reuben Thomas <rrt@sc3d.org>

>> So in fact even when the string isn't copied (as in my patch, where I
>> also use a third argument of t to decode-coding-string) it appears to be
>> changed to a multibyte string.
>
> Fine, if you are sure, go ahead and push.

But please, think of the children^H^H^H^H^H^H^H^H readers (of your
patch)! Put this information in the commit message.
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Reuben Thomas @ 2017-08-24 17:45 UTC
To: Eli Zaretskii; +Cc: 28179

On 24/08/17 17:59, Eli Zaretskii wrote:
>> From: Reuben Thomas <rrt@sc3d.org>
>> Date: Wed, 23 Aug 2017 11:59:41 +0100
>>
>> I now understand the two meanings of "multibyte", but I don't understand
>> how my patch is deficient.
> I didn't say it was deficient,

Sorry, I was unclear. I meant, precisely, I don't see why you think my
patch's code returns a string that is not multibyte.

> I asked whether you verified that
> either (a) the result is always multibyte

I believe I showed this is the case.

>
>> So in fact even when the string isn't copied (as in my patch, where I
>> also use a third argument of t to decode-coding-string) it appears to be
>> changed to a multibyte string.
> Fine, if you are sure, go ahead and push.
>

The reason I am asking again is because you first said:

> What if decode-coding-string returns a pure ASCII string, which is
> therefore unibyte?

and then later you said:

> The way I meant it, it has to do with the internal flag marking a
> string either unibyte or multibyte.  Observe:
>   (multibyte-string-p "abcd") => nil
>
> but
>
>   (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

In other words:

1. As far as I can tell from the above (and my own confirmatory
experiments and reading of the documentation), a pure ASCII string can
be multibyte (it's a matter of the multibyte flag, not the number of
bytes used to store each character).

2. decode-coding-string always returns a multibyte string.

Since these two observations seemed to mean that you contradicted
yourself, I was checking whether in fact I had misunderstood (so that
for example one of my two observations above is wrong), or if your
original understanding was incomplete (so that in fact your question
about decode-coding-string is therefore misguided, because it can return
a pure ASCII unibyte string (in the coding sense) which is nonetheless a
multibyte string (in the sense that multibyte-string-p on it returns t).

Sorry about the miscommunication. In any case, I think the code is
correct, your original question was misguided, and I shall push, with,
as Noam requested in another message, an explanation of my assumptions.
No need to reply further unless you think there really is a problem!

-- 
https://rrt.sc3d.org
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Eli Zaretskii @ 2017-08-24 18:20 UTC
To: Reuben Thomas; +Cc: 28179

> Cc: 28179@debbugs.gnu.org
> From: Reuben Thomas <rrt@sc3d.org>
> Date: Thu, 24 Aug 2017 18:45:33 +0100
>
> The reason I am asking again is because you first said:
>
> > What if decode-coding-string returns a pure ASCII string, which is
> > therefore unibyte?
>
> and then later you said:
>
> > The way I meant it, it has to do with the internal flag marking a
> > string either unibyte or multibyte.  Observe:
> >   (multibyte-string-p "abcd") => nil
> >
> > but
> >
> >   (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

That example may be conclusive for UTF-8, but is it conclusive for
_any_ encoding?  I don't know.  E.g., what about the ISO-2022 based
encodings, where all the bytes are (AFAIR) pure ASCII?

> 1. As far as I can tell from the above (and my own confirmatory
> experiments and reading of the documentation), a pure ASCII string can
> be multibyte (it's a matter of the multibyte flag, not the number of
> bytes used to store each character).
>
> 2. decode-coding-string always returns a multibyte string.

Can you show me why 2 is always correct?  It might be, I simply don't
know.  All I know is that in general relying on plain-ASCII strings to
be always multibyte in any given situation is risky, we were bitten by
that a few times.  But maybe it's not an issue in this case.  Which is
why I was asking you whether you have sufficient basis to believe this
to be so in this case.

> Since these two observations seemed to mean that you contradicted
> yourself, I was checking whether in fact I had misunderstood (so that
> for example one of my two observations above is wrong), or if your
> original understanding was incomplete (so that in fact your question
> about decode-coding-string is therefore misguided, because it can return
> a pure ASCII unibyte string (in the coding sense) which is nonetheless a
> multibyte string (in the sense that multibyte-string-p on it returns t).

I only used decode-coding-string because I remembered it as an easy
way of creating a multibyte ASCII string, when the coding-system is
UTF-8, that's all.  There was no contradiction in what I said, at
least not an intended one.
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Reuben Thomas @ 2017-08-24 18:50 UTC
To: Eli Zaretskii; +Cc: 28179

On 24 August 2017 at 19:20, Eli Zaretskii <eliz@gnu.org> wrote:
>> Cc: 28179@debbugs.gnu.org
>> From: Reuben Thomas <rrt@sc3d.org>
>> Date: Thu, 24 Aug 2017 18:45:33 +0100
>>
>> The reason I am asking again is because you first said:
>>
>> > What if decode-coding-string returns a pure ASCII string, which is
>> > therefore unibyte?
>>
>> and then later you said:
>>
>> > The way I meant it, it has to do with the internal flag marking a
>> > string either unibyte or multibyte.  Observe:
>> >   (multibyte-string-p "abcd") => nil
>> >
>> > but
>> >
>> >   (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t
>
> That example may be conclusive for UTF-8, but is it conclusive for
> _any_ encoding?  I don't know.  E.g., what about the ISO-2022 based
> encodings, where all the bytes are (AFAIR) pure ASCII?

  (multibyte-string-p (decode-coding-string "abcd" 'iso-2022-jp)) => t

I still don't understand what you're getting at: the bytes in "abcd" are
pure ASCII, whatever coding system one is decoding from.

> Can you show me why 2 is always correct?  It might be, I simply don't
> know.  All I know is that in general relying on plain-ASCII strings to
> be always multibyte in any given situation is risky, we were bitten by
> that a few times.  But maybe it's not an issue in this case.  Which is
> why I was asking you whether you have sufficient basis to believe this
> to be so in this case.

I don't know. As I said before, the make-obsolete notice for
string-to-multibyte says "use `decode-coding-string'". If it is as tricky
as you suggest it might be, then the notice should be updated to point to
more detailed guidance.

The relevant commit is:

commit f74d496478cd57f252817bd7437fe1b7972ce01f
Author: Stefan Monnier <monnier@iro.umontreal.ca>
Date:   Mon Jan 30 13:02:18 2017 -0500

    * lisp/subr.el (string-make-unibyte, string-make-multibyte): Obsolete.

diff --git a/lisp/subr.el b/lisp/subr.el
index a6ba05c..a204577 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -1417,8 +1417,10 @@ posn-object-width-height
 ;; bug#23850
 (make-obsolete 'string-to-unibyte   "use `encode-coding-string'." "26.1")
 (make-obsolete 'string-as-unibyte   "use `encode-coding-string'." "26.1")
+(make-obsolete 'string-make-unibyte   "use `encode-coding-string'." "26.1")
 (make-obsolete 'string-to-multibyte "use `decode-coding-string'." "26.1")
 (make-obsolete 'string-as-multibyte "use `decode-coding-string'." "26.1")
+(make-obsolete 'string-make-multibyte "use `decode-coding-string'." "26.1")

I'm going to close this bug; if better documentation is needed, both for
the obsolescence of string-to-multibyte and for multibyte strings in
general, that's a new bug.

-- 
https://rrt.sc3d.org
* bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
From: Eli Zaretskii @ 2017-08-24 19:02 UTC
To: Reuben Thomas; +Cc: 28179

> From: Reuben Thomas <rrt@sc3d.org>
> Date: Thu, 24 Aug 2017 19:50:17 +0100
> Cc: 28179@debbugs.gnu.org
>
> >> >   (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t
> >
> > That example may be conclusive for UTF-8, but is it conclusive for
> > _any_ encoding?  I don't know.  E.g., what about the ISO-2022 based
> > encodings, where all the bytes are (AFAIR) pure ASCII?
>
>   (multibyte-string-p (decode-coding-string "abcd" 'iso-2022-jp)) => t

That's not what I meant, but never mind.  I only replied to tell there
was no contradiction in my previous messages, and no confusion on my
part, that's all.

Thanks.