From: Jude DaShiell
Newsgroups: gmane.emacs.help
Subject: Re: viewing docx files
Date: Mon, 30 Jan 2017 02:21:39 -0500 (EST)
To: Joost Kremers, Tomas Nordin
Cc: Devin Prater, help-gnu-emacs@gnu.org

I wonder whether the file utility can tell a UTF-8 docx file from a
non-UTF-8 one. If it can, a little docx inspection could decide when the
unzip->iconv->zip process is needed, so that it only runs when necessary.

On Sun, 29 Jan 2017, Joost Kremers wrote:

> Date: Sun, 29 Jan 2017 17:51:00
> From: Joost Kremers
> To: Tomas Nordin
> Cc: Devin Prater, help-gnu-emacs@gnu.org
> Subject: Re: viewing docx files
>
> On Sun, Jan 29 2017, Tomas Nordin wrote:
>> Devin Prater writes:
>>
>>> Hi all. I'm running GNU Emacs (the latest version from brew install
>>> emacs) on macOS Sierra. I run Emacs in the terminal and use the
>>> Emacspeak package for access, since I am blind. I received an email
>>> (gnews) with two docx files attached for reading. I was able to
>>> download the attachments to my ~/ directory.
>>> I opened the file (C-x C-f, then tab completion), but it opened
>>
>> I wonder if you would like to eval and try this:
>>
>> (defun docx2html (file)
>>   "Convert FILE to HTML in a buffer and display it."
>>   (interactive "f")
>>   (let ((html-buffer (format "*%s --> html*" file)))
>>     (call-process "pandoc" file html-buffer nil "--to=html")
>>     (switch-to-buffer html-buffer)))
>>
>> After evaluating it, type M-x docx2html and select the docx file. See if
>> it works. It did not work for me, but that seems to have to do with the
>> character encoding of the test files I have. That is, it runs, but I get
>> the following message from pandoc in the new buffer:
>>
>> pandoc: Cannot decode byte '\xb1':
>> Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
>
> Pandoc only reads and writes UTF-8 and does no conversion. So if the files
> you want to convert and view are in another encoding, you'll need to
> re-encode them first. I'm not sure there's a tool that does that for docx
> files, though. iconv can convert text files from one encoding to another,
> but for that to work on docx files, you'll need to unzip them first (and
> zip them up again afterwards).
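A minimal sketch of the docx inspection suggested above, in Emacs Lisp to match docx2html. I'm not sure the file utility looks inside the zip container, so this swaps in a more direct check: it reads the encoding named in the XML declaration at the top of word/document.xml, which is usually where Word keeps a docx's main text. Assumptions: the unzip program is on PATH, the main part really is word/document.xml, and only an ASCII-compatible declaration is recognized (a UTF-16 part gives nil, meaning "unknown"). All function names are hypothetical. Separately, docx2html as quoted hands FILE to pandoc on standard input, where pandoc cannot use the .docx extension to pick its reader, so passing the file name as an argument together with --from=docx may be worth trying before any re-encoding.

```elisp
;; Sketch only: decide whether a .docx needs the unzip->iconv->zip step
;; by reading the encoding declared in its main XML part.
;; Assumptions: `unzip' is on PATH, and the main part is
;; word/document.xml (usual for Word output, but not guaranteed).
;; All function names here are hypothetical.

(defun docx-xml-declared-encoding (xml-head)
  "Return the encoding named in the XML declaration in XML-HEAD, downcased.
Return nil if no declaration naming an encoding is found.  Only an
ASCII-compatible declaration is recognized, so a UTF-16 part gives nil."
  ;; [^>]*? keeps the search inside the <?xml ... ?> declaration.
  (when (string-match
         "<[?]xml[^>]*?encoding=[\"']\\([^\"']+\\)[\"']" xml-head)
    (downcase (match-string 1 xml-head))))

(defun docx-encoding-needs-conversion-p (encoding)
  "Non-nil if ENCODING is known and is not UTF-8."
  (and encoding
       (not (member encoding '("utf-8" "utf8")))))

(defun docx-document-xml-head (file)
  "Return the first 200 bytes of word/document.xml inside the docx FILE."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (let ((coding-system-for-read 'binary))
      ;; Insert unzip's stdout into this buffer and discard its stderr.
      (unless (eq 0 (call-process "unzip" nil '(t nil) nil "-p"
                                  (expand-file-name file)
                                  "word/document.xml"))
        (error "unzip could not extract word/document.xml from %s" file)))
    ;; The buffer is unibyte, so positions count bytes.
    (buffer-substring-no-properties (point-min)
                                    (min (point-max) 201))))

(defun docx-needs-reencoding-p (file)
  "Non-nil if the docx FILE declares a non-UTF-8 encoding for its main part."
  (docx-encoding-needs-conversion-p
   (docx-xml-declared-encoding (docx-document-xml-head file))))
```

For example, M-: (docx-needs-reencoding-p "~/report.docx") RET, where report.docx is a made-up name. A nil result means the declaration says UTF-8, or no usable declaration was found; this sketch treats "unknown" as "skip the re-encoding", which is a design choice you may want to reverse.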