* bug#25365: 25.1; Coding system for bookmarks and desktop
@ 2017-01-05 12:37 Dmitri Paduchikh
2017-01-06 14:56 ` Dmitri Paduchikh
2017-01-07 12:46 ` Eli Zaretskii
0 siblings, 2 replies; 7+ messages in thread
From: Dmitri Paduchikh @ 2017-01-05 12:37 UTC (permalink / raw)
To: 25365
Hello,
It appears that all Cyrillic text in my bookmarks file has been
corrupted. I wasn't able to reproduce such a corruption using emacs -Q,
so probably this is due to interference with my settings which I will
have to investigate. But in my opinion there is a problem with
bookmark.el as well since it ignores completely coding system when
saving bookmarks. Thus I have written the following two advices to fix
its behavior. It seems that they work as needed.
(advice-add 'bookmark-write-file :around
#'(lambda (f &rest args)
(let ((coding-system-for-write (or coding-system-for-write 'utf-8-emacs)))
(apply f args)))
'((name . "coding")))
(advice-add 'bookmark-insert-file-format-version-stamp :before
#'(lambda (&rest args)
(when coding-system-for-write
(insert (format "\
;;;; Emacs Bookmark Format Version %d ;;;;
;;; -*- coding: %S -*-\n"
bookmark-file-format-version
coding-system-for-write))))
'((name . "coding")))
Besides, although desktop.el specifies coding system in its file, it is
old one - emacs-mule. Shouldn't this be utf-8-emacs these days instead?
With best regards
Dmitri Paduchikh
In GNU Emacs 25.1.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.22.5)
of 2016-12-23 built on juergen
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
2017-01-05 12:37 bug#25365: 25.1; Coding system for bookmarks and desktop Dmitri Paduchikh
@ 2017-01-06 14:56 ` Dmitri Paduchikh
2017-01-07 12:46 ` Eli Zaretskii
1 sibling, 0 replies; 7+ messages in thread
From: Dmitri Paduchikh @ 2017-01-06 14:56 UTC (permalink / raw)
To: 25365
Indeed, the corruption was due to bit rot in my configuration. Still I
think that it is safer to specify coding system explicitly when saving
bookmarks.
With best regards
Dmitri Paduchikh
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
2017-01-05 12:37 bug#25365: 25.1; Coding system for bookmarks and desktop Dmitri Paduchikh
2017-01-06 14:56 ` Dmitri Paduchikh
@ 2017-01-07 12:46 ` Eli Zaretskii
[not found] ` <87d1fxzy9d.fsf@gmail.com>
1 sibling, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2017-01-07 12:46 UTC (permalink / raw)
To: Dmitri Paduchikh; +Cc: 25365-done
> From: Dmitri Paduchikh <dpaduchikh@gmail.com>
> Date: Thu, 05 Jan 2017 17:37:30 +0500
>
> It appears that all Cyrillic text in my bookmarks file has been
> corrupted. I wasn't able to reproduce such a corruption using emacs -Q,
> so probably this is due to interference with my settings which I will
> have to investigate. But in my opinion there is a problem with
> bookmark.el as well since it ignores completely coding system when
> saving bookmarks. Thus I have written the following two advices to fix
> its behavior. It seems that they work as needed.
>
> (advice-add 'bookmark-write-file :around
> #'(lambda (f &rest args)
> (let ((coding-system-for-write (or coding-system-for-write 'utf-8-emacs)))
> (apply f args)))
> '((name . "coding")))
Thanks, I used something similar in the patch below, except that I
think we should honor the existing coding cookie in the bookmark file,
instead of forcing the user to override the default each time the file
is saved. So we should record the encoding when we load the file, and
apply it when saving. This way, the user could specify the
non-default encoding only once, using "C-x RET c", and have that value
honored by Emacs thereafter.
> (advice-add 'bookmark-insert-file-format-version-stamp :before
> #'(lambda (&rest args)
> (when coding-system-for-write
> (insert (format "\
> ;;;; Emacs Bookmark Format Version %d ;;;;
> ;;; -*- coding: %S -*-\n"
> bookmark-file-format-version
> coding-system-for-write))))
> '((name . "coding")))
This is not quite right, as coding-system-for-write could be something
inappropriate, like undecided or prefer-utf-8. And again, we should
honor the existing encoding. So I modified this part to support that.
I installed the patch below on the master branch. Please try it
(after removing your advices) and see if it gives good results. If
you see problems, please reopen this bug with the details.
Thanks.
> Besides, although desktop.el specifies coding system in its file, it is
> old one - emacs-mule. Shouldn't this be utf-8-emacs these days instead?
I'm not sure how many users share the desktop files with older Emacs
versions that didn't support utf-8-emacs. There's nothing wrong with
using emacs-mule in recent Emacs versions, for files that aren't
supposed to be read by programs that are not Emacs.
commit e272032769178768cf970839a9c22aba1f5b572e
Author: Eli Zaretskii <eliz@gnu.org>
Date: Sat Jan 7 14:33:41 2017 +0200
Specify encoding of the bookmark file
* lisp/bookmark.el (bookmark-insert-file-format-version-stamp):
Accept an argument CODING and include a 'coding:' cookie in the
bookmark file preamble.
(bookmark-upgrade-file-format-from-0): Call
'bookmark-insert-file-format-version-stamp' with the file buffer's
encoding, as detected when it was read.
(bookmark-file-coding-system): New variable.
(bookmark-load): Set bookmark-file-coding-system to the encoding
of the loaded file.
(bookmark-write-file): Bind coding-system-for-write to either the
user setting via "C-x RET c" or to the existing file encoding,
defaulting to 'utf-8-emacs'. Update the value of
bookmark-file-coding-system. (Bug#25365)
diff --git a/lisp/bookmark.el b/lisp/bookmark.el
index 3440a52..e18362b 100644
--- a/lisp/bookmark.el
+++ b/lisp/bookmark.el
@@ -267,6 +267,8 @@ bookmark-alist
(defvar bookmarks-already-loaded nil
"Non-nil if and only if bookmarks have been loaded from `bookmark-default-file'.")
+(defvar bookmark-file-coding-system nil
+ "The coding-system of the last loaded or saved bookmark file.")
;; more stuff added by db.
@@ -689,7 +691,7 @@ bookmark-upgrade-file-format-from-0
(let* ((old-list (bookmark-alist-from-buffer))
(new-list (bookmark-upgrade-version-0-alist old-list)))
(delete-region (point-min) (point-max))
- (bookmark-insert-file-format-version-stamp)
+ (bookmark-insert-file-format-version-stamp buffer-file-coding-system)
(pp new-list (current-buffer))
(save-buffer))
(goto-char (point-min))
@@ -726,11 +728,14 @@ bookmark-maybe-upgrade-file-format
(error "Bookmark file format version strangeness")))))
-(defun bookmark-insert-file-format-version-stamp ()
- "Insert text indicating current version of bookmark file format."
+(defun bookmark-insert-file-format-version-stamp (coding)
+ "Insert text indicating current version of bookmark file format.
+CODING is the symbol of the coding-system in which the file is encoded."
+ (if (memq (coding-system-base coding) '(undecided prefer-utf-8))
+ (setq coding 'utf-8-emacs))
(insert
- (format ";;;; Emacs Bookmark Format Version %d ;;;;\n"
- bookmark-file-format-version))
+ (format ";;;; Emacs Bookmark Format Version %d ;;;; -*- coding: %S -*- \n"
+ bookmark-file-format-version (coding-system-base coding)))
(insert ";;; This format is meant to be slightly human-readable;\n"
";;; nevertheless, you probably don't want to edit it.\n"
";;; "
@@ -1417,14 +1422,17 @@ bookmark-write-file
(with-current-buffer (get-buffer-create " *Bookmarks*")
(goto-char (point-min))
(delete-region (point-min) (point-max))
- (let ((print-length nil)
+ (let ((coding-system-for-write
+ (or coding-system-for-write
+ bookmark-file-coding-system 'utf-8-emacs))
+ (print-length nil)
(print-level nil)
;; See bug #12503 for why we bind `print-circle'. Users
;; can define their own bookmark types, which can result in
;; arbitrary Lisp objects being stored in bookmark records,
;; and some users create objects containing circularities.
(print-circle t))
- (bookmark-insert-file-format-version-stamp)
+ (bookmark-insert-file-format-version-stamp coding-system-for-write)
(insert "(")
;; Rather than a single call to `pp' we make one per bookmark.
;; Apparently `pp' has a poor algorithmic complexity, so this
@@ -1440,6 +1448,7 @@ bookmark-write-file
(condition-case nil
(write-region (point-min) (point-max) file)
(file-error (message "Can't write %s" file)))
+ (setq bookmark-file-coding-system coding-system-for-write)
(kill-buffer (current-buffer))
(bookmark-maybe-message
"Saving bookmarks to file %s...done" file)))))
@@ -1521,7 +1530,8 @@ bookmark-load
(expand-file-name bookmark-default-file))
file)
(setq bookmarks-already-loaded t))
- (bookmark-bmenu-surreptitiously-rebuild-list))
+ (bookmark-bmenu-surreptitiously-rebuild-list)
+ (setq bookmark-file-coding-system buffer-file-coding-system))
(error "Invalid bookmark list in %s" file)))
(kill-buffer (current-buffer)))
(if (null no-msg)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
[not found] ` <83k2a5blo9.fsf@gnu.org>
@ 2017-01-09 17:07 ` Dmitri Paduchikh
2017-01-09 18:49 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: Dmitri Paduchikh @ 2017-01-09 17:07 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 25365
Eli Zaretskii <eliz@gnu.org> wrote:
[...]
>> Also setting coding system through coding-system-for-write will make
>> non-encodable characters replaced by spaces.
EZ> This option exists for users who really know what they are doing; the
EZ> default utf-8-emacs encoding is safe for any and all characters Emacs
EZ> supports.
It may be hard to know what is happening if it is designed to happen
automatically. So, if you insist on supporting user choice here, please
consider the following patch. It adds the check of workability of the chosen
codec unless it is given directly by user in which case the usual procedure
applies, or the codec is omnipotent. In the case of encodability problems it
will fallback to utf-8-emacs.
I hope it is trivial enough as I did not sign papers.
diff --git a/lisp/bookmark.el b/lisp/bookmark.el
index e18362bca9..87638cdf1e 100644
--- a/lisp/bookmark.el
+++ b/lisp/bookmark.el
@@ -1432,13 +1432,21 @@ for a file, defaulting to the file defined by variable
;; arbitrary Lisp objects being stored in bookmark records,
;; and some users create objects containing circularities.
(print-circle t))
- (bookmark-insert-file-format-version-stamp coding-system-for-write)
(insert "(")
;; Rather than a single call to `pp' we make one per bookmark.
;; Apparently `pp' has a poor algorithmic complexity, so this
;; scales a lot better. bug#4485.
(dolist (i bookmark-alist) (pp i (current-buffer)))
(insert ")")
+ ;; Check that coding system is safe unless it is given directly by
+ ;; user or is omnipotent. Silently fallback to utf-8-emacs in a case.
+ (unless (or coding-system-require-warning ; coding is given by user
+ (memq 'emacs (coding-system-charset-list coding-system-for-write)))
+ (setq coding-system-for-write
+ (select-safe-coding-system (point-min) (point-max)
+ (list coding-system-for-write 'utf-8-emacs) t)))
+ (goto-char (point-min))
+ (bookmark-insert-file-format-version-stamp coding-system-for-write)
(let ((version-control
(cond
((null bookmark-version-control) nil)
With best regards
Dmitri Paduchikh
^ permalink raw reply related [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
2017-01-09 17:07 ` Dmitri Paduchikh
@ 2017-01-09 18:49 ` Eli Zaretskii
2017-01-09 20:34 ` Dmitri Paduchikh
0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2017-01-09 18:49 UTC (permalink / raw)
To: Dmitri Paduchikh; +Cc: 25365
> From: Dmitri Paduchikh <dpaduchikh@gmail.com>
> Cc: 25365@debbugs.gnu.org
> Date: Mon, 09 Jan 2017 22:07:49 +0500
>
> It may be hard to know what is happening if it is designed to happen
> automatically. So, if you insist on supporting user choice here, please
> consider the following patch. It adds the check of workability of the chosen
> codec unless it is given directly by user in which case the usual procedure
> applies, or the codec is omnipotent. In the case of encodability problems it
> will fallback to utf-8-emacs.
With the proposed patch, the fallback utf-8-emacs will happen
silently, so the user will have no idea that her selected encoding was
ignored, isn't it so? If so, I don't think it's a good idea to do
that, when the user explicitly asks for a particular encoding.
Come to think of that, I actually don't understand why anything is
needed in addition to the current code: if the user-chosen encoding
cannot safely encode the bookmark file, Emacs will refuse to save the
file, and will insist on the user providing a safe encoding. E.g.,
set a bookmark in TUTORIAL.ru, and then try saving the bookmark file
using the Latin-1 encoding. I just tried something like that, and the
effect was exactly as I expected: Emacs requested me to provide
another encoding.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
2017-01-09 18:49 ` Eli Zaretskii
@ 2017-01-09 20:34 ` Dmitri Paduchikh
2017-01-10 15:49 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: Dmitri Paduchikh @ 2017-01-09 20:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 25365
Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Dmitri Paduchikh <dpaduchikh@gmail.com>
>> Cc: 25365@debbugs.gnu.org
>> Date: Mon, 09 Jan 2017 22:07:49 +0500
>>
>> It may be hard to know what is happening if it is designed to happen
>> automatically. So, if you insist on supporting user choice here, please
>> consider the following patch. It adds the check of workability of the chosen
>> codec unless it is given directly by user in which case the usual procedure
>> applies, or the codec is omnipotent. In the case of encodability problems it
>> will fallback to utf-8-emacs.
EZ> With the proposed patch, the fallback utf-8-emacs will happen
EZ> silently, so the user will have no idea that her selected encoding was
EZ> ignored, isn't it so? If so, I don't think it's a good idea to do
EZ> that, when the user explicitly asks for a particular encoding.
It is a choice between data loss (see below) and using another encoding.
The former seems worse to me.
EZ> Come to think of that, I actually don't understand why anything is
EZ> needed in addition to the current code: if the user-chosen encoding
EZ> cannot safely encode the bookmark file, Emacs will refuse to save
the EZ> file, and will insist on the user providing a safe encoding.
Only when user is specifying encoding through C-x RET c right now. Next
time, when the encoding is chosen automatically this check won't be
done. Try the following experiment: mark some non-ASCII character and
evaluate this form by M-:.
(let ((coding-system-for-write 'us-ascii))
(shell-command-on-region (region-beginning) (region-end) "od -A n -t x1"))
The output is " 3f". I guess that exact choice of a replacement
character depends on the coding system.
EZ> E.g., set a bookmark in TUTORIAL.ru, and then try saving the
EZ> bookmark file using the Latin-1 encoding. I just tried something
EZ> like that, and the effect was exactly as I expected: Emacs requested
EZ> me to provide another encoding.
Try this in different order. Assuming your patch applied, save bookmarks
using latin-1. Suppose this operation passes successfully and bookmarks
have been saved using latin-1. Then bookmark TUTORIAL.ru and save again
not specifying encoding.
With best regards
Dmitri Paduchikh
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#25365: 25.1; Coding system for bookmarks and desktop
2017-01-09 20:34 ` Dmitri Paduchikh
@ 2017-01-10 15:49 ` Eli Zaretskii
0 siblings, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2017-01-10 15:49 UTC (permalink / raw)
To: Dmitri Paduchikh; +Cc: 25365
> From: Dmitri Paduchikh <dpaduchikh@gmail.com>
> Cc: 25365@debbugs.gnu.org
> Date: Tue, 10 Jan 2017 01:34:18 +0500
>
> EZ> E.g., set a bookmark in TUTORIAL.ru, and then try saving the
> EZ> bookmark file using the Latin-1 encoding. I just tried something
> EZ> like that, and the effect was exactly as I expected: Emacs requested
> EZ> me to provide another encoding.
>
> Try this in different order. Assuming your patch applied, save bookmarks
> using latin-1. Suppose this operation passes successfully and bookmarks
> have been saved using latin-1. Then bookmark TUTORIAL.ru and save again
> not specifying encoding.
Ah, OK. Then the attached additional change should handle that use
case.
Thanks.
commit 560a384038845e37228226313eccfc8d70132553
Author: Eli Zaretskii <eliz@gnu.org>
AuthorDate: Tue Jan 10 17:47:10 2017 +0200
Don't use unsafe encoding for the bookmark file
* lisp/bookmark.el (bookmark-write-file): Handle the case when the
explicitly specified encoding of the bookmark file cannot encode the
additional bookmarks just added. (Bug#25365)
diff --git a/lisp/bookmark.el b/lisp/bookmark.el
index e18362b..02dd8a9 100644
--- a/lisp/bookmark.el
+++ b/lisp/bookmark.el
@@ -1432,13 +1432,20 @@ bookmark-write-file
;; arbitrary Lisp objects being stored in bookmark records,
;; and some users create objects containing circularities.
(print-circle t))
- (bookmark-insert-file-format-version-stamp coding-system-for-write)
(insert "(")
;; Rather than a single call to `pp' we make one per bookmark.
;; Apparently `pp' has a poor algorithmic complexity, so this
;; scales a lot better. bug#4485.
(dolist (i bookmark-alist) (pp i (current-buffer)))
(insert ")")
+ ;; Make sure the specified encoding can safely encode the
+ ;; bookmarks. If it cannot, suggest utf-8-emacs as default.
+ (with-coding-priority '(utf-8-emacs)
+ (setq coding-system-for-write
+ (select-safe-coding-system (point-min) (point-max)
+ (list t coding-system-for-write))))
+ (goto-char (point-min))
+ (bookmark-insert-file-format-version-stamp coding-system-for-write)
(let ((version-control
(cond
((null bookmark-version-control) nil)
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-01-10 15:49 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-05 12:37 bug#25365: 25.1; Coding system for bookmarks and desktop Dmitri Paduchikh
2017-01-06 14:56 ` Dmitri Paduchikh
2017-01-07 12:46 ` Eli Zaretskii
[not found] ` <87d1fxzy9d.fsf@gmail.com>
[not found] ` <83k2a5blo9.fsf@gnu.org>
2017-01-09 17:07 ` Dmitri Paduchikh
2017-01-09 18:49 ` Eli Zaretskii
2017-01-09 20:34 ` Dmitri Paduchikh
2017-01-10 15:49 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).