unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#66825: last-coding-system-used in basic-save-buffer
@ 2023-10-29 18:40 Juri Linkov
  2023-10-29 19:27 ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2023-10-29 18:40 UTC (permalink / raw)
  To: 66825

[Creating a separate bug report from bug#66317 since
sometimes the bug occurs even when project-mode-line is nil]

>> Ok, here is 100% reproducible minimal test case:
>> 0. emacs-30 -Q
>> 1. eval:
>> (progn
>>    (require 'project)
>>    (setq project-mode-line t)
>>    (setq set-message-functions '(set-multi-message)))
>> 2. in a temporary directory: M-! git init RET
>> 3. C-x C-f .dir-locals.el RET
>> 4. insert: ((fundamental-mode . ((mode . flyspell))))
>> 5. C-x C-s
>> At this point even buffer-file-coding-system of .dir-locals.el
>> is changed to 't' (raw-text-unix).  The same happens when saving
>> any file in that project.
>> The problem doesn't occur when flyspell-mode is enabled from
>> file-local variables, only from dir-locals.
>
> I can repro. But it's as weird a bug as they come.
>
> I guess it's a combination of using flyspell-mode and editing
> .dir-locals.el? Or have you seen other buffers' b-f-c-s changed this way
> too?
>
> If not, it might have something to do with flyspell-mode's use of
> sit-for?.. But I'm only saying that because it's the only feature of this
> mode that I'm regularly reminded of.
>
> I tried using a variable watcher:
>
>   (add-variable-watcher
>    'buffer-file-coding-system
>    (lambda (_sym value op where)
>     (message "%s %s %s" value op where)
>     (if (eq 'raw-text-unix value) (backtrace))
>    ))
>
> but it just prints
>
> prefer-utf-8-unix set  *temp*-925453 [2 times]
> raw-text-unix set .dir-locals.el
>   backtrace()
>   (if (eq 'raw-text-unix value) (backtrace))
>   (closure (t) (_sym value op where) (message "%s %s %s" value op where)
>   (if (eq 'raw-text-unix value) (backtrace)))(buffer-file-coding-system
>   raw-text-unix set #<buffer .dir-locals.el>)
>   basic-save-buffer(t)
>   save-buffer(1)
>   funcall-interactively(save-buffer 1)
>   call-interactively(save-buffer nil nil)
>   command-execute(save-buffer)
>
> OTOH, the bug is very reliable to reproduce: add the aforementioned line to
> dir-locals and save -> the coding system changes to raw-text. Delete the
> line and save -> and it's prefer-utf-8 again.

Indeed, the problem is in basic-save-buffer on the following line:

    (setq buffer-file-coding-system last-coding-system-used)

It's hard to guess why this code relies on a value that
can be changed by other functions while the buffer is being saved.

For example,

  (progn
    (setq last-coding-system-used 'prefer-utf-8-unix)
    (project-name (project-current))
    (message "%S" last-coding-system-used))

prints "raw-text-unix" because 'project-name' applies the dir-locals,
which enable 'flyspell-mode', which in turn calls:

  (defun ispell-buffer-local-parsing ()
    (ispell-send-string "!\n")

where 'process-send-string' changes 'last-coding-system-used'
to "raw-text-unix" in:

  send_process (Lisp_Object proc, const char *buf, ptrdiff_t len,
                Lisp_Object object)
  {
    ...
    Vlast_coding_system_used = CODING_ID_NAME (coding->id);

A possible workaround would be to protect the value of
last-coding-system-used in 'project-mode-line-format':

  (defun project-mode-line-format ()
    "Compose the project mode-line."
    (when-let ((project (project-current)))
      (let (last-coding-system-used)
        (concat
         " "
         (propertize
          (project-name project)
          'face project-mode-line-face
          'mouse-face 'mode-line-highlight
          'help-echo "mouse-1: Project menu"
          'local-map project-mode-line-map)))))
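
The effect of such a binding can be checked in isolation.  A minimal
sketch, assuming a hypothetical helper 'my/clobber-coding-system' (not
part of Emacs) that stands in for the ispell code path and overwrites
the variable as a side effect:

```elisp
;; Hypothetical stand-in for code that sets the variable as a side
;; effect, the way `process-send-string' does.
(defun my/clobber-coding-system ()
  (setq last-coding-system-used 'raw-text-unix))

(setq last-coding-system-used 'prefer-utf-8-unix)

;; `last-coding-system-used' is a special variable, so this `let'
;; binds it dynamically: the `setq' inside the helper only changes
;; the temporary binding, which is discarded when the `let' exits.
(let ((last-coding-system-used nil))
  (my/clobber-coding-system))

;; The value set before the `let' is still in place here.
last-coding-system-used
```

The binding does not stop the inner code from setting the variable; it
only keeps that change from leaking out to the caller.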

However, I noticed that occasionally this bug occurs even
when this function is not used.  So the proper fix is needed
in 'basic-save-buffer', but I don't know if it's intended
that some function should change 'last-coding-system-used'
during saving the buffer.






* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-29 18:40 bug#66825: last-coding-system-used in basic-save-buffer Juri Linkov
@ 2023-10-29 19:27 ` Eli Zaretskii
  2023-10-30  7:56   ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-10-29 19:27 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 66825

> From: Juri Linkov <juri@linkov.net>
> Date: Sun, 29 Oct 2023 20:40:32 +0200
> 
> Indeed, the problem is in basic-save-buffer on the following line:
> 
>     (setq buffer-file-coding-system last-coding-system-used)
> 
> It's hard to guess why this code relies on the value that
> can be changed by other functions during saving the buffer.

Because this is the protocol: functions that determine the encoding of
buffer text dynamically set this variable, so it could later be used
to reflect the detection in buffer-file-coding-system.
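
A minimal sketch of that protocol, assuming a throwaway temporary file
and an explicitly requested coding system (both chosen only for
illustration; 'my/demo-save-coding-system' is a hypothetical name):
'write-region' records the coding system it encoded with, and the
caller copies it right away, before anything else can overwrite it.

```elisp
(defun my/demo-save-coding-system ()
  "Write a temporary buffer to a file and return the recorded coding system."
  (let ((file (make-temp-file "lcsu-demo-")))
    (unwind-protect
        (with-temp-buffer
          (insert "some text\n")
          ;; Request an encoding explicitly so the outcome does not
          ;; depend on the user's defaults.
          (let ((coding-system-for-write 'utf-8-unix))
            ;; A VISIT argument other than t, nil or a string
            ;; suppresses the "Wrote ..." message.
            (write-region nil nil file nil 'silent))
          ;; `write-region' has recorded the coding system it used;
          ;; this is the step `basic-save-buffer' performs.
          (setq buffer-file-coding-system last-coding-system-used)
          buffer-file-coding-system)
      (delete-file file))))

(my/demo-save-coding-system)
```

Anything that runs between the 'write-region' call and the 'setq' and
also sets the variable will be picked up instead, which is the failure
reported here.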

> For example,
> 
>   (progn
>     (setq last-coding-system-used 'prefer-utf-8-unix)
>     (project-name (project-current))
>     (message "%S" last-coding-system-used))
> 
> prints "raw-text-unix" because it enables 'flyspell-mode'
> that calls:
> 
>   (defun ispell-buffer-local-parsing ()
>     (ispell-send-string "!\n")
> 
> where 'process-send-string' changes 'last-coding-system-used'
> to "raw-text-unix" in:
> 
>   send_process (Lisp_Object proc, const char *buf, ptrdiff_t len,
>                 Lisp_Object object)
>   {
>     ...
>     Vlast_coding_system_used = CODING_ID_NAME (coding->id);
> 
> A possible workaround would be to protect the value of
> last-coding-system-used in 'project-mode-line-format':

That's not a workaround, that's what we should do in such cases.
Alternatively, the "important" setting of last-coding-system-used
should be done later, after the inner functions already returned.

> However, I noticed that occasionally this bug occurs even
> when this function is not used.  So the proper fix is needed
> in 'basic-save-buffer', but I don't know if it's intended
> that some function should change 'last-coding-system-used'
> during saving the buffer.

Yes, it's intended: saving the buffer can change
buffer-file-coding-system if it detects characters which cannot be
encoded by the original buffer-file-coding-system.






* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-29 19:27 ` Eli Zaretskii
@ 2023-10-30  7:56   ` Juri Linkov
  2023-10-30 12:15     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2023-10-30  7:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66825

>> Indeed, the problem is in basic-save-buffer on the following line:
>>
>>     (setq buffer-file-coding-system last-coding-system-used)
>>
>> It's hard to guess why this code relies on the value that
>> can be changed by other functions during saving the buffer.
>
> Because this is the protocol: functions that determine the encoding of
> buffer text dynamically set this variable, so it could later be used
> to reflect the detection in buffer-file-coding-system.

Still, there seems to be a flaw in the design: some functions
change 'last-coding-system-used' intentionally, while others
can change it unintentionally, which is what causes this bug.

>> A possible workaround would be to protect the value of
>> last-coding-system-used in 'project-mode-line-format':
>
> That's not a workaround, that's what we should do in such cases.
> Alternatively, the "important" setting of last-coding-system-used
> should be done later, after the inner functions already returned.

I don't understand this alternative.  The mode-line update
that evaluates 'project-mode-line-format', which unintentionally
changes 'last-coding-system-used', is triggered from this line
in 'basic-save-buffer-2':

  (write-region nil nil
                buffer-file-name nil t buffer-file-truename)

because this call in 'write_region' updates the mode line:

    message_with_string ((NUMBERP (append)
			  ? "Updated %s"
			  : ! NILP (append)
			  ? "Added to %s"
			  : "Wrote %s"),
			 visit_file, 1);

>> However, I noticed that occasionally this bug occurs even
>> when this function is not used.  So the proper fix is needed
>> in 'basic-save-buffer', but I don't know if it's intended
>> that some function should change 'last-coding-system-used'
>> during saving the buffer.
>
> Yes, it's intended: saving the buffer can change
> buffer-file-coding-system if it detects characters which cannot be
> encoded by the original buffer-file-coding-system.

I see this detection happens in the same 'write_region':

  /* Decide the coding-system to encode the data with.
     We used to make this choice before calling build_annotations, but that
     leads to problems when a write-annotate-function takes care of
     unsavable chars (as was the case with X-Symbol).  */
  Vlast_coding_system_used
    = choose_write_coding_system (start, end, filename,
                                 append, visit, lockname, &coding);

This means the value of 'last-coding-system-used' must be preserved
after that point.  Therefore I don't see how this could be fixed in
'basic-save-buffer' or 'write-region'.

Ok, then let's fix this in 'project-mode-line-format':

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index bb44cfefa54..da89036160a 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -2074,14 +2121,17 @@ project-mode-line-format
 (defun project-mode-line-format ()
   "Compose the project mode-line."
   (when-let ((project (project-current)))
-    (concat
-     " "
-     (propertize
-      (project-name project)
-      'face project-mode-line-face
-      'mouse-face 'mode-line-highlight
-      'help-echo "mouse-1: Project menu"
-      'local-map project-mode-line-map))))
+    ;; Preserve the original value of 'last-coding-system-used'
+    ;; that can break 'basic-save-buffer' (bug#66825)
+    (let ((last-coding-system-used nil))
+      (concat
+       " "
+       (propertize
+        (project-name project)
+        'face project-mode-line-face
+        'mouse-face 'mode-line-highlight
+        'help-echo "mouse-1: Project menu"
+        'local-map project-mode-line-map)))))
 
 (provide 'project)
 ;;; project.el ends here







* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-30  7:56   ` Juri Linkov
@ 2023-10-30 12:15     ` Eli Zaretskii
  2023-10-30 17:20       ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-10-30 12:15 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 66825

> From: Juri Linkov <juri@linkov.net>
> Cc: 66825@debbugs.gnu.org
> Date: Mon, 30 Oct 2023 09:56:27 +0200
> 
> I don't understand this alternative.  The mode line updating
> that uses 'project-mode-line-format' that unintentionally
> changes 'last-coding-system-used' is called from this line
> in 'basic-save-buffer-2':
> 
>   (write-region nil nil
>                 buffer-file-name nil t buffer-file-truename)
> 
> because this call in 'write_region' updates the mode line:
> 
>     message_with_string ((NUMBERP (append)
> 			  ? "Updated %s"
> 			  : ! NILP (append)
> 			  ? "Added to %s"
> 			  : "Wrote %s"),
> 			 visit_file, 1);

How does message_with_string update the mode line?

And why does last-coding-system-used get set to raw-text-unix in this
scenario anyway?

> +    ;; Preserve the original value of 'last-coding-system-used'
> +    ;; that can break 'basic-save-buffer' (bug#66825)
> +    (let ((last-coding-system-used nil))
> +      (concat
> +       " "
> +       (propertize
> +        (project-name project)
> +        'face project-mode-line-face
> +        'mouse-face 'mode-line-highlight
> +        'help-echo "mouse-1: Project menu"
> +        'local-map project-mode-line-map)))))

I'm confused how this avoids the problem, probably because I don't
understand the answers to the two questions above.






* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-30 12:15     ` Eli Zaretskii
@ 2023-10-30 17:20       ` Juri Linkov
  2023-10-30 17:45         ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2023-10-30 17:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66825

>> I don't understand this alternative.  The mode line updating
>> that uses 'project-mode-line-format' that unintentionally
>> changes 'last-coding-system-used' is called from this line
>> in 'basic-save-buffer-2':
>>
>>   (write-region nil nil
>>                 buffer-file-name nil t buffer-file-truename)
>>
>> because this call in 'write_region' updates the mode line:
>>
>>     message_with_string ((NUMBERP (append)
>> 			  ? "Updated %s"
>> 			  : ! NILP (append)
>> 			  ? "Added to %s"
>> 			  : "Wrote %s"),
>> 			 visit_file, 1);
>
> How does message_with_string update the mode line?

I don't know; presumably some deeper function needs to update the
mode line when the multi-line message resizes the echo area.

> And why does last-coding-system-used get set to raw-text-unix in this
> scenario anyway?

Because send_process needs to set it to raw-text-unix for ispell:

  send_process (Lisp_Object proc, const char *buf, ptrdiff_t len, Lisp_Object object)
  {
    Vlast_coding_system_used = CODING_ID_NAME (coding->id);

Here is a backtrace:

  send_process
  process-send-string
  ispell-send-string
  ispell-buffer-local-parsing
  ispell-accept-buffer-local-defs
  flyspell-accept-buffer-local-defs
  flyspell--mode-on
  flyspell-mode
  hack-one-local-variable(mode flyspell)
  hack-local-variables-apply
  hack-dir-local-variables-non-file-buffer
  project--value-in-dir
  project-name
  project-mode-line-format
  eval((project-mode-line-format))
  write-region
  basic-save-buffer-2
  basic-save-buffer-1
  basic-save-buffer
  save-buffer






* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-30 17:20       ` Juri Linkov
@ 2023-10-30 17:45         ` Eli Zaretskii
  2023-10-31  7:23           ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-10-30 17:45 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 66825

> From: Juri Linkov <juri@linkov.net>
> Cc: 66825@debbugs.gnu.org
> Date: Mon, 30 Oct 2023 19:20:19 +0200
> 
> >>     message_with_string ((NUMBERP (append)
> >> 			  ? "Updated %s"
> >> 			  : ! NILP (append)
> >> 			  ? "Added to %s"
> >> 			  : "Wrote %s"),
> >> 			 visit_file, 1);
> >
> > How does message_with_string update the mode line?
> 
> I don't know; presumably some deeper function needs to update the
> mode line when the multi-line message resizes the echo area.

I think it's because message_with_string eventually calls redisplay,
and redisplay updates the mode line as part of its job.

> > And why does last-coding-system-used get set to raw-text-unix in this
> > scenario anyway?
> 
> Because send_process needs to set it to raw-text-unix for ispell:
> 
>   send_process (Lisp_Object proc, const char *buf, ptrdiff_t len, Lisp_Object object)
>   {
>     Vlast_coding_system_used = CODING_ID_NAME (coding->id);

I think this happens because the string sent to the speller is a
plain-ASCII string, and those are almost always unibyte strings.
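
The unibyte part is easy to check.  A small sketch using only built-in
predicates; it illustrates the string representation and does not by
itself show how the process coding system gets chosen:

```elisp
;; A pure-ASCII string literal, like the one ispell sends, is read
;; as a unibyte string.
(multibyte-string-p "!\n")                        ; nil

;; The same text converted to the multibyte representation.
(multibyte-string-p (string-to-multibyte "!\n"))  ; t
```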

So I think the local binding of last-coding-system-used around the
call to project-mode-line is TRT, it just needs a better comment to
explain why it's needed.

I think this should also teach us a lesson: calling arbitrary complex
code from mode-line's :eval forms is in general risky business and
should be avoided as much as possible.






* bug#66825: last-coding-system-used in basic-save-buffer
  2023-10-30 17:45         ` Eli Zaretskii
@ 2023-10-31  7:23           ` Juri Linkov
  0 siblings, 0 replies; 7+ messages in thread
From: Juri Linkov @ 2023-10-31  7:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66825

close 66825 30.0.50
quit

> So I think the local binding of last-coding-system-used around the
> call to project-mode-line is TRT, it just needs a better comment to
> explain why it's needed.

Ok, pushed the fix with a better comment.




