* call-process -> insert -> iso-latin-1-dos problem on Windows
@ 2024-01-14 23:09 Eduardo Ochs
2024-01-15 0:17 ` Michael Heerdegen via Users list for the GNU Emacs text editor
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Eduardo Ochs @ 2024-01-14 23:09 UTC (permalink / raw)
To: help-gnu-emacs
Hi list,
I have a function called `find-wget' that works well on *NIX-like
systems: it calls wget, puts the output in a temporary buffer, and
on unices Emacs always chooses the right encoding. But when I run it
on Windows and call wget like this,
wget -q -O - http://anggtwu.net/LUA/Dang1.lua
where Dang1.lua is a file in UTF-8, Emacs switches the encoding
of the output buffer to iso-latin-1-dos...
I probably wrote my code relying on undefined behavior... any
suggestions on how to fix it? I'm attaching the file with the test and
the comments below, and it's also here:
http://anggtwu.net/elisp/find-wget-jan-2024.el.html
http://anggtwu.net/elisp/find-wget-jan-2024.el
Thanks in advance =/,
Eduardo Ochs
http://anggtwu.net/eepitch.html
http://anggtwu.net/#eev
--snip--snip--
;; This is a simplified version of the `find-wget' from eev:
;;
;; http://anggtwu.net/eev-current/eev-plinks.el.html#find-wget
;; (find-eev "eev-plinks.el" "find-wget")
;;
;; Most functions were copied from the source code of eev without
;; changes; only the ones that are marked as "dummified" were replaced
;; by trivial versions.
(defvar ee-wget-program "wget")
(defvar ee-find-callprocess00-exit-status nil)
;; Dummified versions
(defun ee-expand (fname) fname)
(defun ee-goto-rest (list) ())
(defun ee-goto-position (&optional pos-spec &rest rest) ())
(defun find-ebuffer (buffer &rest pos-spec-list)
  "Hyperlink to an Emacs buffer (existing or not)."
  (interactive "bBuffer: ")
  (switch-to-buffer buffer)
  (apply 'ee-goto-position pos-spec-list))
(defun ee-split (str)
  "If STR is a string, split it on whitespace and return the resulting list.
If STR is a list, return it unchanged."
  (if (stringp str)
      (split-string str "[ \t\n]+")
    str))
(defun find-callprocess00-ne (program-and-args)
  (let ((argv (ee-split program-and-args)))
    (with-output-to-string
      (with-current-buffer standard-output
        (setq ee-find-callprocess00-exit-status
              (apply 'call-process (car argv) nil t nil (cdr argv)))))))
(defun find-wget (url &rest pos-spec-list)
  "Download URL with \"wget -q -O - URL\" and display the output.
If a buffer named \"*wget: URL*\" already exists then this
function visits it instead of running wget again.
If wget can't download URL then this function runs `error'."
  (let* ((eurl (ee-expand url))
         (wgetprogandargs (list ee-wget-program "-q" "-O" "-" eurl))
         (wgetbufname (format "*wget: %s*" eurl)))
    (if (get-buffer wgetbufname)
        (apply 'find-ebuffer wgetbufname pos-spec-list)
      ;;
      ;; If the buffer wgetbufname doesn't exist, then:
      (let* ((wgetoutput (find-callprocess00-ne wgetprogandargs))
             (wgetstatus ee-find-callprocess00-exit-status))
        ;;
        (if (not (equal wgetstatus 0))
            ;; See: (find-node "(wget)Exit Status")
            (error "wget can't download: %s" eurl))
        ;;
        (find-ebuffer wgetbufname) ; create buffer
        (insert wgetoutput)
        (goto-char (point-min))
        (apply 'ee-goto-position pos-spec-list)))))
;; Test: (eval-buffer)
;; (find-wget "http://anggtwu.net/LUA/Dang1.lua")
;;
;; When we run the test above on Debian the double angle brackets in
;; line 12 of Dang1.lua are displayed correctly as single
;; characters - and when we run `M-x hexlify-buffer' we see that they
;; are encoded in two bytes each - c2ab and c2bb. From
;; /usr/share/unicode/UnicodeData.txt:
;;
;; 00AB;LEFT-POINTING DOUBLE ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING GUILLEMET;;;;
;; 00BB;RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK;Pf;0;ON;;;;;Y;RIGHT POINTING GUILLEMET;;;;
;;
;; When we run the `find-wget' above in Emacs 29 for Windows the
;; resulting buffer is put in the encoding "iso-latin-1-dos". `M-x
;; hexlify-buffer' shows that the guillemets are still two bytes each -
;; c2ab and c2bb - but each one is displayed as two characters, with
;; the byte c2 shown as a separate character in front of it:
;;
;; 00C2;LATIN CAPITAL LETTER A WITH CIRCUMFLEX;Lu;0;L;0041 0302;;;;N;LATIN CAPITAL LETTER A CIRCUMFLEX;;;00E2;
;;
;; The wget that I am using on Windows was extracted from this zip:
;;
;; https://eternallybored.org/misc/wget/releases/wget-1.21.2-win64.zip
* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Michael Heerdegen via Users list for the GNU Emacs text editor @ 2024-01-15 0:17 UTC (permalink / raw)
To: help-gnu-emacs
Hi Eduardo,
I would start by looking at the variables whose names contain "process"
and "coding-system".
Michael.
* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2024-01-15 1:25 UTC (permalink / raw)
To: help-gnu-emacs
> I have a function called `find-wget' that works well in *NIX-like
> systems - it calls wget, puts the output in the temporary buffer, and
> on unices Emacs always chooses the right encoding... but when I run it
> on Windows, and I call wget like this,
>
> wget -q -O - http://anggtwu.net/LUA/Dang1.lua
>
> where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
> of output buffer to iso-latin-1-dos...
Most current POSIX systems use UTF-8 encoding by default.
AFAIK this is not the case under Windows.
> I probably wrote my code relying in undefined behaviors...
You generally need to tell Emacs the (expected) encoding of
a process's output. If you don't, Emacs assumes it's "the system's
default", but that's bound to be wrong every once in a while.
You can do that by let-binding `coding-system-for-read`,
`process-coding-system-alist`, or `default-process-coding-system` around
the call to `call-process`.
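A minimal sketch of the first option, assuming the program's output really is UTF-8. The name `find-callprocess00-ne-utf8` is hypothetical (it is not part of eev), and unlike the original function this version takes only a list and does not record the exit status:

```elisp
;; Hypothetical variant of `find-callprocess00-ne'; the "-utf8" name is
;; made up for this sketch and is not part of eev.
(defun find-callprocess00-ne-utf8 (program-and-args)
  "Run PROGRAM-AND-ARGS (a list of strings) and return its output.
The output is decoded as UTF-8 instead of the system default."
  (let ((coding-system-for-read 'utf-8))
    (with-output-to-string
      (with-current-buffer standard-output
        (apply #'call-process (car program-and-args)
               nil t nil (cdr program-and-args))))))

;; The other two variables take a different shape.  Either of these
;; bindings could replace the `coding-system-for-read' binding above:
;;
;;   (process-coding-system-alist
;;    (cons '("wget" . (utf-8 . utf-8)) process-coding-system-alist))
;;
;;   (default-process-coding-system '(utf-8 . utf-8))
```

With this in place, something like (find-callprocess00-ne-utf8 (list ee-wget-program "-q" "-O" "-" "http://anggtwu.net/LUA/Dang1.lua")) should return the file's contents with the guillemets as single characters, regardless of the locale.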
For async processes you can do it more directly by passing a `:coding`
arg to `make-process`.
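A rough sketch of the async case, again assuming UTF-8 output. The function name, process name, and buffer name are hypothetical and only serve the illustration:

```elisp
;; Hypothetical helper, not part of eev.
(defun ee-demo-run-async-utf8 (command buffer-name)
  "Run COMMAND (a list of strings) asynchronously, decoding output as UTF-8.
The output is collected in the buffer BUFFER-NAME.
Return the process object."
  (make-process
   :name "ee-demo-async"
   :buffer (get-buffer-create buffer-name)
   :command command
   ;; (DECODING . ENCODING): decode the output and encode the input as UTF-8.
   :coding '(utf-8 . utf-8)
   :connection-type 'pipe
   ;; Replace the default sentinel, which would append a status
   ;; message to the buffer when the process exits.
   :sentinel #'ignore))
```

It could be called as (ee-demo-run-async-utf8 (list ee-wget-program "-q" "-O" "-" "http://anggtwu.net/LUA/Dang1.lua") "*wget demo*"); the buffer fills in as the output arrives, so code that needs the complete text has to wait for the process to exit first.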
Stefan
* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Eli Zaretskii @ 2024-01-15 12:34 UTC (permalink / raw)
To: help-gnu-emacs
> From: Eduardo Ochs <eduardoochs@gmail.com>
> Date: Sun, 14 Jan 2024 20:09:41 -0300
>
> I have a function called `find-wget' that works well in *NIX-like
> systems - it calls wget, puts the output in the temporary buffer, and
> on unices Emacs always chooses the right encoding... but when I run it
> on Windows, and I call wget like this,
>
> wget -q -O - http://anggtwu.net/LUA/Dang1.lua
>
> where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
> of output buffer to iso-latin-1-dos...
Emacs cannot reliably distinguish between UTF-8 and Latin-N encodings,
and errs in favor of the latter when the locale's encoding prefers
that. Your Lisp programs should not assume Emacs will auto-detect
UTF-8 every time; instead, if you know for sure that a program's
output is encoded in UTF-8, bind coding-system-for-read to 'utf-8
around the call to call-process.
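
The ambiguity can be reproduced without running any external program. The sketch below decodes the two bytes c2 ab from the original report under both coding systems; both decodings succeed, which is why the bytes alone cannot settle the question. The names `ee-demo-guillemet-bytes` and `ee-demo-decode-chars` are hypothetical:

```elisp
;; The two bytes that UTF-8 uses for U+00AB, as a unibyte string.
(defvar ee-demo-guillemet-bytes (unibyte-string #xc2 #xab)
  "Hypothetical name, used only in this sketch.")

(defun ee-demo-decode-chars (coding-system)
  "Decode `ee-demo-guillemet-bytes' with CODING-SYSTEM.
Return the resulting characters as a list of code points."
  (append (decode-coding-string ee-demo-guillemet-bytes coding-system) nil))

(ee-demo-decode-chars 'utf-8)        ; => (171), i.e. U+00AB alone
(ee-demo-decode-chars 'iso-latin-1)  ; => (194 171), i.e. U+00C2 then U+00AB
```

The second result matches what the Windows buffer shows: the guillemet preceded by LATIN CAPITAL LETTER A WITH CIRCUMFLEX.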