unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
* call-process -> insert -> iso-latin-1-dos problem on Windows
@ 2024-01-14 23:09 Eduardo Ochs
  2024-01-15  0:17 ` Michael Heerdegen via Users list for the GNU Emacs text editor
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eduardo Ochs @ 2024-01-14 23:09 UTC
  To: help-gnu-emacs

Hi list,

I have a function called `find-wget' that works well in *NIX-like
systems - it calls wget, puts the output in the temporary buffer, and
on unices Emacs always chooses the right encoding... but when I run it
on Windows, and I call wget like this,

  wget -q -O - http://anggtwu.net/LUA/Dang1.lua

where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
of the output buffer to iso-latin-1-dos...

I probably wrote my code relying on undefined behavior... any
suggestions on how to fix it? I'm attaching the file with the test and
the comments below, and it's also here:

  http://anggtwu.net/elisp/find-wget-jan-2024.el.html
  http://anggtwu.net/elisp/find-wget-jan-2024.el

Thanks in advance =/,
  Eduardo Ochs
  http://anggtwu.net/eepitch.html
  http://anggtwu.net/#eev


--snip--snip--

;; This is a simplified version of the `find-wget' from eev:
;;
;;   http://anggtwu.net/eev-current/eev-plinks.el.html#find-wget
;;                       (find-eev "eev-plinks.el" "find-wget")
;;
;; Most functions were copied from the source code of eev without
;; changes; only the ones that are marked as "dummified" were replaced
;; by trivial versions.

(defvar ee-wget-program "wget")
(defvar ee-find-callprocess00-exit-status nil)

;; Dummified versions
(defun ee-expand (fname) fname)
(defun ee-goto-rest (list) ())
(defun ee-goto-position (&optional pos-spec &rest rest) ())

(defun find-ebuffer (buffer &rest pos-spec-list)
  "Hyperlink to an Emacs buffer (existing or not)."
  (interactive "bBuffer: ")
  (switch-to-buffer buffer)
  (apply 'ee-goto-position pos-spec-list))

(defun ee-split (str)
  "If STR is a string, split it on whitespace and return the resulting list.
If STR is a list, return it unchanged."
  (if (stringp str)
      (split-string str "[ \t\n]+")
    str))

(defun find-callprocess00-ne (program-and-args)
  (let ((argv (ee-split program-and-args)))
    (with-output-to-string
      (with-current-buffer standard-output
        (setq ee-find-callprocess00-exit-status
              (apply 'call-process (car argv) nil t nil (cdr argv)))))))

(defun find-wget (url &rest pos-spec-list)
  "Download URL with \"wget -q -O - URL\" and display the output.
If a buffer named \"*wget: URL*\" already exists then this
function visits it instead of running wget again.
If wget can't download URL then this function runs `error'."
  (let* ((eurl (ee-expand url))
         (wgetprogandargs (list ee-wget-program "-q" "-O" "-" eurl))
         (wgetbufname (format "*wget: %s*" eurl)))
    (if (get-buffer wgetbufname)
        (apply 'find-ebuffer wgetbufname pos-spec-list)
      ;;
      ;; If the buffer wgetbufname doesn't exist, then:
      (let* ((wgetoutput (find-callprocess00-ne wgetprogandargs))
             (wgetstatus ee-find-callprocess00-exit-status))
        ;;
        (if (not (equal wgetstatus 0))
            ;; See: (find-node "(wget)Exit Status")
            (error "wget can't download: %s" eurl))
        ;;
        (find-ebuffer wgetbufname) ; create buffer
        (insert wgetoutput)
        (goto-char (point-min))
        (apply 'ee-goto-position pos-spec-list)))))


;; Test: (eval-buffer)
;;       (find-wget "http://anggtwu.net/LUA/Dang1.lua")
;;
;; When we run the test above on Debian the double angle brackets in
;; line 12 of Dang1.lua are displayed correctly as single
;; characters - and when we run `M-x hexlify-buffer' we see that they
;; are encoded in two bytes each - c2ab and c2bb. From
;; /usr/share/unicode/UnicodeData.txt:
;;
;; 00AB;LEFT-POINTING DOUBLE ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING GUILLEMET;;;;
;; 00BB;RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK;Pf;0;ON;;;;;Y;RIGHT POINTING GUILLEMET;;;;
;;
;; When we run the `find-wget' above in Emacs 29 for Windows the
;; resulting buffer is put in the encoding "iso-latin-1-dos".
;; `M-x hexlify-buffer' shows that they are still two bytes each -
;; c2ab and c2bb - but each one is displayed as two characters: the
;; character for the byte c2, followed by the guillemet:
;;
;; 00C2;LATIN CAPITAL LETTER A WITH CIRCUMFLEX;Lu;0;L;0041 0302;;;;N;LATIN CAPITAL LETTER A CIRCUMFLEX;;;00E2;
;;
;; The wget that I am using on Windows was extracted from this zip:
;;
;;   https://eternallybored.org/misc/wget/releases/wget-1.21.2-win64.zip
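
The byte-level behavior described in the comments above can be reproduced without running wget. A minimal sketch, not part of the original file: it decodes the two bytes c2 ab once as UTF-8 and once as Latin-1 with `decode-coding-string'. The `ee-demo-...' names are hypothetical.

```elisp
;; Hypothetical demo: the same two bytes decoded two ways.
;; "\302\253" is a unibyte string holding the bytes c2 ab, the
;; UTF-8 encoding of U+00AB (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK).
(defconst ee-demo-bytes "\302\253")

;; Decoded as UTF-8: a single character, U+00AB.
(defconst ee-demo-as-utf-8
  (decode-coding-string ee-demo-bytes 'utf-8))

;; Decoded as Latin-1: two characters, U+00C2 then U+00AB, i.e. the
;; extra "Â" displayed in front of each guillemet in the
;; iso-latin-1-dos buffer.
(defconst ee-demo-as-latin-1
  (decode-coding-string ee-demo-bytes 'iso-latin-1))
```

Evaluating `(append ee-demo-as-utf-8 nil)' gives (171), one character, while `(append ee-demo-as-latin-1 nil)' gives (194 171), two characters from the same two bytes.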




* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Michael Heerdegen via Users list for the GNU Emacs text editor @ 2024-01-15  0:17 UTC
  To: help-gnu-emacs

Hi Eduardo,

I would start by looking at the variables whose names contain "process"
and "coding-system".
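
One way to follow this hint from inside Emacs is to scan `obarray' for bound variables whose names contain both strings. A minimal sketch; the function name `ee-process-coding-variables' is hypothetical:

```elisp
;; Hypothetical helper: list every bound variable whose name
;; contains both "process" and "coding-system".
(defun ee-process-coding-variables ()
  "Return a sorted list of bound variables matching the hint.
Only variables whose names contain both \"process\" and
\"coding-system\" are included."
  (let (vars)
    (mapatoms
     (lambda (sym)
       (let ((name (symbol-name sym)))
         (when (and (boundp sym)
                    (string-match-p "process" name)
                    (string-match-p "coding-system" name))
           (push sym vars)))))
    ;; `string<' accepts symbols and compares their names.
    (sort vars #'string<)))
```

On a stock Emacs the result should include `process-coding-system-alist' and `default-process-coding-system'. Note that `coding-system-for-read' does not show up, because its name does not contain "process".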

Michael.





* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2024-01-15  1:25 UTC
  To: help-gnu-emacs

> I have a function called `find-wget' that works well in *NIX-like
> systems - it calls wget, puts the output in the temporary buffer, and
> on unices Emacs always chooses the right encoding... but when I run it
> on Windows, and I call wget like this,
>
>   wget -q -O - http://anggtwu.net/LUA/Dang1.lua
>
> where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
> of the output buffer to iso-latin-1-dos...

Most current POSIX systems use UTF-8 encoding by default.
AFAIK this is not the case under Windows.

> I probably wrote my code relying on undefined behavior...

You generally need to tell Emacs what's the (expected) encoding of
a process's output.  Emacs generally assumes it's "the system's default"
if not, but that's bound to be wrong every once in a while.
You can do that by let-binding `coding-system-for-read`,
`process-coding-system-alist`, or `default-process-coding-system` around
the call to `call-process`.
For async processes you can do it more directly by passing a `:coding`
arg to `make-process`.
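
A minimal sketch of the `process-coding-system-alist` option, assuming wget's output is always UTF-8: prepend an entry whose PATTERN, a regexp matched against the program name, maps wget to `utf-8`. The helper name `ee-wget-utf-8-process-alist` is hypothetical.

```elisp
;; Hypothetical helper: return `process-coding-system-alist' with a
;; UTF-8 entry for wget prepended, so it takes precedence.
(defun ee-wget-utf-8-process-alist ()
  "Return `process-coding-system-alist' with a UTF-8 entry for wget.
The regexp matches \"wget\", \"wget.exe\", or a full path ending
in either, so the entry applies on both *NIX and Windows."
  (cons '("wget\\(\\.exe\\)?\\'" . utf-8)
        process-coding-system-alist))

;; Usage: let-bind the variable around the code that calls
;; `call-process', e.g. around `find-wget' from the first message:
;;
;;   (let ((process-coding-system-alist (ee-wget-utf-8-process-alist)))
;;     (find-wget "http://anggtwu.net/LUA/Dang1.lua"))
```

A coding system given as a single symbol, as here, is used for both decoding wget's output and encoding its input.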


        Stefan





* Re: call-process -> insert -> iso-latin-1-dos problem on Windows
From: Eli Zaretskii @ 2024-01-15 12:34 UTC
  To: help-gnu-emacs

> From: Eduardo Ochs <eduardoochs@gmail.com>
> Date: Sun, 14 Jan 2024 20:09:41 -0300
> 
> I have a function called `find-wget' that works well in *NIX-like
> systems - it calls wget, puts the output in the temporary buffer, and
> on unices Emacs always chooses the right encoding... but when I run it
> on Windows, and I call wget like this,
> 
>   wget -q -O - http://anggtwu.net/LUA/Dang1.lua
> 
> where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
> of the output buffer to iso-latin-1-dos...

Emacs cannot reliably distinguish between UTF-8 and Latin-N encodings,
and errs in favor of the latter when the locale's encoding prefers
that.  Your Lisp programs should not assume Emacs will auto-detect
UTF-8 every time; instead, if you know for sure that a program's
output is encoded in UTF-8, bind coding-system-for-read to 'utf-8
around the call to call-process.
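
A minimal sketch of that advice applied to the `find-callprocess00-ne' from the original message. The name `ee-callprocess-decoded' and its optional CODING argument are hypothetical, and unlike the original it expects ARGV to be a list rather than a whitespace-separated string:

```elisp
;; Same exit-status variable as in the original file.
(defvar ee-find-callprocess00-exit-status nil)

(defun ee-callprocess-decoded (argv &optional coding)
  "Run the program in ARGV (a list) and return its output as a string.
Decode the output with CODING, which defaults to `utf-8', instead
of letting Emacs guess.  Store the exit status in
`ee-find-callprocess00-exit-status'."
  ;; The let-binding only affects the `call-process' below.
  (let ((coding-system-for-read (or coding 'utf-8)))
    (with-output-to-string
      (with-current-buffer standard-output
        ;; DESTINATION t: insert the output into the current buffer.
        (setq ee-find-callprocess00-exit-status
              (apply #'call-process (car argv) nil t nil (cdr argv)))))))

;; In `find-wget', the call to `find-callprocess00-ne' could become:
;;   (ee-callprocess-decoded wgetprogandargs 'utf-8)
```

Because `utf-8' has no -unix or -dos suffix, Emacs still detects the end-of-line convention of the output on its own.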


