* bug#73846: [PATCH] Make djvused emit UTF-8 encoded text
@ 2024-10-17 4:12 Visuwesh
2024-10-17 5:26 ` Eli Zaretskii
0 siblings, 1 reply; 4+ messages in thread
From: Visuwesh @ 2024-10-17 4:12 UTC (permalink / raw)
To: 73846; +Cc: Tassilo Horn
[-- Attachment #1: Type: text/plain, Size: 863 bytes --]
Tags: patch
Hi Tassilo,
This is a small patch to make djvused emit UTF-8 encoded text. In the
djvu test file that I sent you, the outline entries in the appendix
contain non-ASCII characters, which djvused writes as octal escapes.
Rather than unescaping them on the Emacs side, we can ask djvused to
emit UTF-8 directly, which is what the attached patch does.
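For comparison, here is a minimal sketch of what unescaping on the Emacs side could look like. It assumes the Lisp reader turns an octal-escaped title such as "Caf\303\251" into a unibyte string of raw bytes, which then has to be decoded explicitly. The helper name `my/djvu-decode-title' is hypothetical and is not part of doc-view.el.

```elisp
;; Hypothetical helper, for illustration only; not part of doc-view.el.
(defun my/djvu-decode-title (title)
  "Return TITLE decoded as UTF-8 when it is a unibyte string of raw bytes.
Multibyte strings are assumed to be decoded already and are returned as is."
  (if (multibyte-string-p title)
      title
    (decode-coding-string title 'utf-8)))

;; `read' turns the octal escapes \303\251 into the two raw bytes that
;; encode U+00E9 in UTF-8; the helper then decodes them.
(my/djvu-decode-title (read "\"Caf\\303\\251\""))
```

Passing -u to djvused makes this extra decoding step unnecessary, because the titles arrive as UTF-8 text in the first place.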
In GNU Emacs 31.0.50 (build 13, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.18.0, Xaw scroll bars) of 2024-10-06 built on astatine
Repository revision: 500f5da5fb62cd0bbded8df754d93e3147d1d847
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101011
System Description: Debian GNU/Linux trixie/sid
Configured using:
'configure --with-sound=alsa --with-x-toolkit=lucid --without-xaw3d
--without-gconf --without-libsystemd --with-cairo CFLAGS=-g3'
[-- Attachment #2: 0001-Make-djvused-emit-UTF-8-encoded-text.patch --]
[-- Type: text/patch, Size: 968 bytes --]
From 8e21167c6e01ab76b76e15fa84bd198bc8df59b4 Mon Sep 17 00:00:00 2001
From: Visuwesh <visuweshm@gmail.com>
Date: Thu, 17 Oct 2024 09:40:34 +0530
Subject: [PATCH] Make djvused emit UTF-8 encoded text
* lisp/doc-view.el (doc-view--djvu-outline): Pass -u to djvused
to make it emit UTF-8 encoded text rather than using octal
escapes for non-ASCII string.
---
lisp/doc-view.el | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lisp/doc-view.el b/lisp/doc-view.el
index bbfbbdec925..018c4eddd34 100644
--- a/lisp/doc-view.el
+++ b/lisp/doc-view.el
@@ -2027,7 +2027,7 @@ doc-view--djvu-outline
   (unless file-name (setq file-name (buffer-file-name)))
   (with-temp-buffer
     (call-process doc-view-djvused-program nil (current-buffer) nil
-                  "-e" "print-outline" file-name)
+                  "-u" "-e" "print-outline" file-name)
     (goto-char (point-min))
     (when (eobp)
       (setq doc-view--outline 'unavailable)
--
2.45.2
* bug#73846: [PATCH] Make djvused emit UTF-8 encoded text
2024-10-17 4:12 bug#73846: [PATCH] Make djvused emit UTF-8 encoded text Visuwesh
@ 2024-10-17 5:26 ` Eli Zaretskii
2024-10-17 8:31 ` Visuwesh
0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2024-10-17 5:26 UTC (permalink / raw)
To: Visuwesh; +Cc: tsdh, 73846
> Cc: "Tassilo Horn" <tsdh@gnu.org>
> From: Visuwesh <visuweshm@gmail.com>
> Date: Thu, 17 Oct 2024 09:42:30 +0530
>
> This is a small patch to make djvused emit UTF-8 encoded text. In the
> djvu test file that I sent you, the outline entries in the appendix
> contain non-ASCII characters, which djvused writes as octal escapes.
> Rather than unescaping them on the Emacs side, we can ask djvused to
> emit UTF-8 directly, which is what the attached patch does.
If you force djvused to emit UTF-8 encoded text, you need to bind
coding-system-for-read to 'utf-8, to make sure Emacs decodes that
correctly. I'm guessing your locale uses UTF-8 by default, which is
why it worked for you.
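A small sketch of why the coding system matters: the same two bytes decode to different text depending on the coding system Emacs applies to the process output. The byte values below are simply the UTF-8 encoding of "é", chosen for illustration.

```elisp
;; The UTF-8 encoding of "é" (U+00E9) is the byte pair #xC3 #xA9.
(let ((bytes (unibyte-string #xC3 #xA9)))
  (list
   ;; Decoded as UTF-8, the pair becomes one character.
   (length (decode-coding-string bytes 'utf-8))
   ;; Decoded as Latin-1, each byte becomes its own character,
   ;; so the title would be displayed as mojibake.
   (length (decode-coding-string bytes 'latin-1))))
```

In a non-UTF-8 locale, the default decoding of process output may resemble the second case, which is why binding coding-system-for-read to 'utf-8 around the call-process call is needed once -u is passed.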
Please also add a comment there explaining what the -u switch does and
why we use it there.
Thanks.
* bug#73846: [PATCH] Make djvused emit UTF-8 encoded text
2024-10-17 5:26 ` Eli Zaretskii
@ 2024-10-17 8:31 ` Visuwesh
2024-10-18 6:07 ` Tassilo Horn
0 siblings, 1 reply; 4+ messages in thread
From: Visuwesh @ 2024-10-17 8:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: tsdh, 73846
[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]
[Thursday October 17, 2024] Eli Zaretskii wrote:
>> Cc: "Tassilo Horn" <tsdh@gnu.org>
>> From: Visuwesh <visuweshm@gmail.com>
>> Date: Thu, 17 Oct 2024 09:42:30 +0530
>>
>> This is a small patch to make djvused emit UTF-8 encoded text. In the
>> djvu test file that I sent you, the outline entries in the appendix
>> contain non-ASCII characters, which djvused writes as octal escapes.
>> Rather than unescaping them on the Emacs side, we can ask djvused to
>> emit UTF-8 directly, which is what the attached patch does.
>
> If you force djvused to emit UTF-8 encoded text, you need to bind
> coding-system-for-read to 'utf-8, to make sure Emacs decodes that
> correctly. I'm guessing your locale uses UTF-8 by default, which is
> why it worked for you.
My locale is indeed a UTF-8 one. I've now let-bound
coding-system-for-read around everything inside with-temp-buffer.
> Please also add a comment there explaining what the -u switch does and
> why we use it there.
Done in attached patch, I hope it is clear.
> Thanks.
[-- Attachment #2: 0001-Make-djvused-emit-UTF-8-encoded-text.patch --]
[-- Type: text/x-diff, Size: 1830 bytes --]
From a39e50a504c9c24f51c7c646f3cfffcec2f34b85 Mon Sep 17 00:00:00 2001
From: Visuwesh <visuweshm@gmail.com>
Date: Thu, 17 Oct 2024 09:40:34 +0530
Subject: [PATCH] Make djvused emit UTF-8 encoded text
* lisp/doc-view.el (doc-view--djvu-outline): Pass -u to djvused
to make it emit UTF-8 encoded text rather than using octal
escapes for non-ASCII string. (bug#73846)
---
lisp/doc-view.el | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/lisp/doc-view.el b/lisp/doc-view.el
index bbfbbdec925..4d7d36c8a16 100644
--- a/lisp/doc-view.el
+++ b/lisp/doc-view.el
@@ -2026,13 +2026,16 @@ doc-view--djvu-outline
 For the format, see `doc-view--pdf-outline'."
   (unless file-name (setq file-name (buffer-file-name)))
   (with-temp-buffer
-    (call-process doc-view-djvused-program nil (current-buffer) nil
-                  "-e" "print-outline" file-name)
-    (goto-char (point-min))
-    (when (eobp)
-      (setq doc-view--outline 'unavailable)
-      (imenu-unavailable-error "Unable to create imenu index using `djvused'"))
-    (nreverse (doc-view--parse-djvu-outline (read (current-buffer))))))
+    (let ((coding-system-for-read 'utf-8))
+      ;; Pass "-u" to make `djvused' emit UTF-8 encoded text to avoid
+      ;; unescaping octal escapes for non-ASCII text.
+      (call-process doc-view-djvused-program nil (current-buffer) nil
+                    "-u" "-e" "print-outline" file-name)
+      (goto-char (point-min))
+      (when (eobp)
+        (setq doc-view--outline 'unavailable)
+        (imenu-unavailable-error "Unable to create imenu index using `djvused'"))
+      (nreverse (doc-view--parse-djvu-outline (read (current-buffer)))))))
 
 (defun doc-view--parse-djvu-outline (bookmark &optional level)
   "Return a list describing the djvu outline from BOOKMARK.
--
2.45.2
* bug#73846: [PATCH] Make djvused emit UTF-8 encoded text
2024-10-17 8:31 ` Visuwesh
@ 2024-10-18 6:07 ` Tassilo Horn
0 siblings, 0 replies; 4+ messages in thread
From: Tassilo Horn @ 2024-10-18 6:07 UTC (permalink / raw)
To: Visuwesh; +Cc: 73846-done, Eli Zaretskii
Visuwesh <visuweshm@gmail.com> writes:
>> Please also add a comment there explaining what the -u switch does
>> and why we use it there.
>
> Done in attached patch, I hope it is clear.
It is. Applied and pushed to master.
Thanks again,
Tassilo