unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#28179: Fix use of string-to-multibyte in ispell.el
@ 2017-08-22  0:51 Reuben Thomas
  2017-08-22 16:38 ` Eli Zaretskii
  2017-08-24 18:52 ` bug#28179: Closing bug: patch is installed Reuben Thomas
  0 siblings, 2 replies; 5+ messages in thread
From: Reuben Thomas @ 2017-08-22  0:51 UTC (permalink / raw)
  To: 28179

[-- Attachment #1: Type: text/plain, Size: 176 bytes --]

The attached patch removes a use of string-to-multibyte in ispell.el.

It turned out to be possible to simplify the surrounding code quite a
bit too.

-- 
https://rrt.sc3d.org

[-- Attachment #2: 0001-Avoid-using-string-to-multibyte-in-ispell.el.patch --]
[-- Type: text/x-patch, Size: 1915 bytes --]

From 077584030546f97c06d1f8c907475a04115e6c58 Mon Sep 17 00:00:00 2001
From: Reuben Thomas <rrt@sc3d.org>
Date: Tue, 22 Aug 2017 01:46:27 +0100
Subject: [PATCH] Avoid using string-to-multibyte in ispell.el

* lisp/textmodes/ispell.el (ispell-get-decoded-string): Use
decode-coding-string instead.
---
 lisp/textmodes/ispell.el | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el
index e67e603..87a3b7a 100644
--- a/lisp/textmodes/ispell.el
+++ b/lisp/textmodes/ispell.el
@@ -1485,25 +1485,15 @@ ispell-current-personal-dictionary
   "The name of the current personal dictionary, or nil for the default.
 This is passed to the Ispell process using the `-p' switch.")
 
-(defun ispell-decode-string (str)
-  "Decodes multibyte character strings."
-  (decode-coding-string str (ispell-get-coding-system)))
-
 ;; Return a string decoded from Nth element of the current dictionary.
 (defun ispell-get-decoded-string (n)
   "Get the decoded string in slot N of the descriptor of the current dict."
   (let* ((slot (or
 		(assoc ispell-current-dictionary ispell-local-dictionary-alist)
 		(assoc ispell-current-dictionary ispell-dictionary-alist)
-		(error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'"
-		       ispell-current-dictionary)))
-	 (str (nth n slot)))
-    (when (and (> (length str) 0)
-	       (not (multibyte-string-p str)))
-      (setq str (ispell-decode-string str))
-      (or (multibyte-string-p str)
-	  (setq str (string-to-multibyte str))))
-    str))
+		(error "No data for dictionary \"%s\" in `ispell-local-dictionary-alist' or `ispell-dictionary-alist'"
+		       ispell-current-dictionary))))
+    (decode-coding-string (nth n slot) (ispell-get-coding-system) t)))
 
 (defun ispell-get-casechars ()
   (ispell-get-decoded-string 1))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#28179: Fix use of string-to-multibyte in ispell.el
  2017-08-22  0:51 bug#28179: Fix use of string-to-multibyte in ispell.el Reuben Thomas
@ 2017-08-22 16:38 ` Eli Zaretskii
  2017-08-22 17:04   ` Reuben Thomas
  2017-08-24 18:52 ` bug#28179: Closing bug: patch is installed Reuben Thomas
  1 sibling, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2017-08-22 16:38 UTC (permalink / raw)
  To: Reuben Thomas; +Cc: 28179

> From: Reuben Thomas <rrt@sc3d.org>
> Date: Tue, 22 Aug 2017 01:51:53 +0100
> 
>  ;; Return a string decoded from Nth element of the current dictionary.
>  (defun ispell-get-decoded-string (n)
>    "Get the decoded string in slot N of the descriptor of the current dict."
>    (let* ((slot (or
>  		(assoc ispell-current-dictionary ispell-local-dictionary-alist)
>  		(assoc ispell-current-dictionary ispell-dictionary-alist)
> -		(error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'"
> -		       ispell-current-dictionary)))
> -	 (str (nth n slot)))
> -    (when (and (> (length str) 0)
> -	       (not (multibyte-string-p str)))
> -      (setq str (ispell-decode-string str))
> -      (or (multibyte-string-p str)
> -	  (setq str (string-to-multibyte str))))

Are you sure we don't need to ensure ispell-get-decoded-string always
returns a multibyte string?  What if decode-coding-string returns a
pure ASCII string, which is therefore unibyte?

IOW, it looks like you didn't replace the call to string-to-multibyte
with something equivalent, you simply removed that call.  If you
analyzed the code and concluded that this call is redundant, please
tell the details of your analysis.

Thanks.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28179: Fix use of string-to-multibyte in ispell.el
  2017-08-22 16:38 ` Eli Zaretskii
@ 2017-08-22 17:04   ` Reuben Thomas
  2017-08-22 17:23     ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Reuben Thomas @ 2017-08-22 17:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 28179

[-- Attachment #1: Type: text/plain, Size: 1828 bytes --]

On 22/08/17 17:38, Eli Zaretskii wrote:
>> From: Reuben Thomas <rrt@sc3d.org>
>> Date: Tue, 22 Aug 2017 01:51:53 +0100
>>
>>  ;; Return a string decoded from Nth element of the current dictionary.
>>  (defun ispell-get-decoded-string (n)
>>    "Get the decoded string in slot N of the descriptor of the current dict."
>>    (let* ((slot (or
>>  		(assoc ispell-current-dictionary ispell-local-dictionary-alist)
>>  		(assoc ispell-current-dictionary ispell-dictionary-alist)
>> -		(error "No data for dictionary \"%s\", neither in `ispell-local-dictionary-alist' nor in `ispell-dictionary-alist'"
>> -		       ispell-current-dictionary)))
>> -	 (str (nth n slot)))
>> -    (when (and (> (length str) 0)
>> -	       (not (multibyte-string-p str)))
>> -      (setq str (ispell-decode-string str))
>> -      (or (multibyte-string-p str)
>> -	  (setq str (string-to-multibyte str))))
> Are you sure we don't need to ensure ispell-get-decoded-string always
> returns a multibyte string?  What if decode-coding-string returns a
> pure ASCII string, which is therefore unibyte?

This is multibyte too, no? The Emacs manual says:

> Rather, Emacs uses a variable-length internal representation of
> characters, that stores each character as a sequence of 1 to 5 8-bit
> bytes, depending on the magnitude of its codepoint(1).  For example,
> *any**
> **ASCII character takes up only 1 byte*, a Latin-1 character takes up 2
> bytes, etc.  We call this representation of text “multibyte”.

(My bold.)

The reason I am using decode-coding-string is because that is what the
obsolescence message in subr.el says to use.

If I've overlooked something, then it would be nice to know what I've
missed in the documentation, or whether the documentation could be improved.

-- 
https://rrt.sc3d.org


[-- Attachment #2: Type: text/html, Size: 2486 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28179: Fix use of string-to-multibyte in ispell.el
  2017-08-22 17:04   ` Reuben Thomas
@ 2017-08-22 17:23     ` Eli Zaretskii
  0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2017-08-22 17:23 UTC (permalink / raw)
  To: Reuben Thomas; +Cc: 28179

> Cc: 28179@debbugs.gnu.org
> From: Reuben Thomas <rrt@sc3d.org>
> Date: Tue, 22 Aug 2017 18:04:11 +0100
> 
> Are you sure we don't need to ensure ispell-get-decoded-string always
> returns a multibyte string?  What if decode-coding-string returns a
> pure ASCII string, which is therefore unibyte?
> 
> This is multibyte too, no? The Emacs manual says:
> 
>  Rather, Emacs uses a variable-length internal representation of
>  characters, that stores each character as a sequence of 1 to 5 8-bit
>  bytes, depending on the magnitude of its codepoint(1). For example, any
>  ASCII character takes up only 1 byte, a Latin-1 character takes up 2
>  bytes, etc. We call this representation of text “multibyte”.

This is a misunderstanding, caused by the overloaded meaning of
"multibyte string".  The way I meant it, it has to do with the
internal flag marking a string either unibyte or multibyte.  Observe:

  (multibyte-string-p "abcd") => nil

but

  (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

although

  (string= "abcd" (decode-coding-string "abcd" 'utf-8)) => t

> The reason I am using decode-coding-string is because that is what the obsolescence message in subr.el
> says to use.

Yes, but the code already used decode-coding-string, in the function
ispell-decode-string, which you replaced with its body.  The call to
string-to-multibyte worked on the result of decoding, not instead of
the decoding.  So actually the call to string-to-multibyte was not
replaced, it was removed.

> If I've overlooked something, then it would be nice to know what I've missed in the documentation, or whether
> the documentation could be improved.

Is the issue more clear now?





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28179: Closing bug: patch is installed
  2017-08-22  0:51 bug#28179: Fix use of string-to-multibyte in ispell.el Reuben Thomas
  2017-08-22 16:38 ` Eli Zaretskii
@ 2017-08-24 18:52 ` Reuben Thomas
  1 sibling, 0 replies; 5+ messages in thread
From: Reuben Thomas @ 2017-08-24 18:52 UTC (permalink / raw)
  To: 28179-done

I've installed the patch, so closing the bug. There are some lingering
doubts over the correctness of the use of decode-coding-string, but
they're outside the scope of this bug.

-- 
https://rrt.sc3d.org






^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-08-24 18:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-22  0:51 bug#28179: Fix use of string-to-multibyte in ispell.el Reuben Thomas
2017-08-22 16:38 ` Eli Zaretskii
2017-08-22 17:04   ` Reuben Thomas
2017-08-22 17:23     ` Eli Zaretskii
2017-08-24 18:52 ` bug#28179: Closing bug: patch is installed Reuben Thomas

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).