unofficial mirror of help-gnu-emacs@gnu.org
From: Eduardo Ochs <eduardoochs@gmail.com>
To: help-gnu-emacs <help-gnu-emacs@gnu.org>
Subject: call-process -> insert -> iso-latin-1-dos problem on Windows
Date: Sun, 14 Jan 2024 20:09:41 -0300
Message-ID: <CADs++6jc9BXPKqd8Ry5BA53zSvE4GvGeLjjJSomeng=OB-eMiw@mail.gmail.com>

Hi list,

I have a function called `find-wget' that works well on *NIX-like
systems - it calls wget, puts the output in a temporary buffer, and
on unices Emacs always chooses the right encoding... but when I run it
on Windows, and I call wget like this,

  wget -q -O - http://anggtwu.net/LUA/Dang1.lua

where Dang1.lua is a file in UTF-8, then Emacs switches the encoding
of the output buffer to iso-latin-1-dos...

I probably wrote my code relying on undefined behavior... any
suggestions on how to fix it? I'm attaching the file with the test and
the comments below, and it's also here:

  http://anggtwu.net/elisp/find-wget-jan-2024.el.html
  http://anggtwu.net/elisp/find-wget-jan-2024.el

Thanks in advance =/,
  Eduardo Ochs
  http://anggtwu.net/eepitch.html
  http://anggtwu.net/#eev


--snip--snip--

;; This is a simplified version of the `find-wget' from eev:
;;
;;   http://anggtwu.net/eev-current/eev-plinks.el.html#find-wget
;;                       (find-eev "eev-plinks.el" "find-wget")
;;
;; Most functions were copied from the source code of eev without
;; changes; only the ones that are marked as "dummified" were replaced
;; by trivial versions.

(defvar ee-wget-program "wget")
(defvar ee-find-callprocess00-exit-status nil)

;; Dummified versions
(defun ee-expand (fname) fname)
(defun ee-goto-rest (list) ())
(defun ee-goto-position (&optional pos-spec &rest rest) ())

(defun find-ebuffer (buffer &rest pos-spec-list)
  "Hyperlink to an Emacs buffer (existing or not)."
  (interactive "bBuffer: ")
  (switch-to-buffer buffer)
  (apply 'ee-goto-position pos-spec-list))

(defun ee-split (str)
  "If STR is a string, split it on whitespace and return the resulting list.
If STR is a list, return it unchanged."
  (if (stringp str)
      (split-string str "[ \t\n]+")
    str))

(defun find-callprocess00-ne (program-and-args)
  (let ((argv (ee-split program-and-args)))
    (with-output-to-string
      (with-current-buffer standard-output
        (setq ee-find-callprocess00-exit-status
              (apply 'call-process (car argv) nil t nil (cdr argv)))))))

(defun find-wget (url &rest pos-spec-list)
  "Download URL with \"wget -q -O - URL\" and display the output.
If a buffer named \"*wget: URL*\" already exists then this
function visits it instead of running wget again.
If wget can't download URL then this function runs `error'."
  (let* ((eurl (ee-expand url))
         (wgetprogandargs (list ee-wget-program "-q" "-O" "-" eurl))
         (wgetbufname (format "*wget: %s*" eurl)))
    (if (get-buffer wgetbufname)
        (apply 'find-ebuffer wgetbufname pos-spec-list)
      ;;
      ;; If the buffer wgetbufname doesn't exist, then:
      (let* ((wgetoutput (find-callprocess00-ne wgetprogandargs))
             (wgetstatus ee-find-callprocess00-exit-status))
        ;;
        (if (not (equal wgetstatus 0))
            ;; See: (find-node "(wget)Exit Status")
            (error "wget can't download: %s" eurl))
        ;;
        (find-ebuffer wgetbufname) ; create buffer
        (insert wgetoutput)
        (goto-char (point-min))
        (apply 'ee-goto-position pos-spec-list)))))


;; Test: (eval-buffer)
;;       (find-wget "http://anggtwu.net/LUA/Dang1.lua")
;;
;; When we run the test above on Debian the double angle brackets on
;; line 12 of Dang1.lua are displayed correctly as single
;; characters - and when we run `M-x hexlify-buffer' we see that they
;; are encoded in two bytes each - c2ab and c2bb. From
;; /usr/share/unicode/UnicodeData.txt:
;;
;; 00AB;LEFT-POINTING DOUBLE ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING GUILLEMET;;;;
;; 00BB;RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK;Pf;0;ON;;;;;Y;RIGHT POINTING GUILLEMET;;;;
;;
;; When we run the `find-wget' above in Emacs 29 for Windows the
;; resulting buffer is put in the encoding "iso-latin-1-dos". `M-x
;; hexlify-buffer' shows that they are still two bytes each - c2ab and
;; c2bb - but they are displayed as two characters each, preceded by
;; "c2"s:
;;
;; 00C2;LATIN CAPITAL LETTER A WITH CIRCUMFLEX;Lu;0;L;0041 0302;;;;N;LATIN CAPITAL LETTER A CIRCUMFLEX;;;00E2;
;;
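;; A minimal sketch of what seems to be happening, assuming that the
;; only difference between the two systems is the coding system that
;; Emacs uses to decode the bytes coming from wget. The function
;; `ee-demo-decode-both' is a hypothetical name used only for this
;; illustration; it is not part of eev.

(defun ee-demo-decode-both (str)
  "Encode STR as UTF-8, then decode the bytes as UTF-8 and as Latin-1.
Return a list with the two decoded strings."
  (let ((bytes (encode-coding-string str 'utf-8)))
    (list (decode-coding-string bytes 'utf-8)
          (decode-coding-string bytes 'iso-latin-1))))

;; Test: (ee-demo-decode-both "«»")
;;       --> ("«»" "Â«Â»")
;;
;; The second string is what the Windows buffer looks like: each "c2"
;; byte becomes a separate LATIN CAPITAL LETTER A WITH CIRCUMFLEX.
;;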
;; The wget that I am using on Windows was extracted from this zip:
;;
;;   https://eternallybored.org/misc/wget/releases/wget-1.21.2-win64.zip
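;;
;; A possible workaround, as an untested sketch: bind
;; `coding-system-for-read' to `utf-8' around the `call-process', so
;; that Emacs decodes the output of wget as UTF-8 instead of guessing.
;; This assumes that the downloaded file is always in UTF-8, which is
;; not true in general. The name `find-callprocess00-ne-utf8' is
;; hypothetical; it is not part of eev.

(defun find-callprocess00-ne-utf8 (program-and-args)
  "Like `find-callprocess00-ne', but decode the process output as UTF-8."
  ;; `coding-system-for-read' is a special variable, so this `let'
  ;; also affects the `call-process' inside `find-callprocess00-ne'.
  (let ((coding-system-for-read 'utf-8))
    (find-callprocess00-ne program-and-args)))

;; Test: (find-callprocess00-ne-utf8
;;        (list ee-wget-program "-q" "-O" "-"
;;              "http://anggtwu.net/LUA/Dang1.lua"))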


