From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: jka-compr.el doesn't recognise gzipped files from their magic bytes
Date: Fri, 21 Sep 2007 08:24:41 +0900
Message-ID: <878x70yl6u.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87vea8wsqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87tzpswmjm.fsf@uwakimon.sk.tsukuba.ac.jp> <877imnia0p.fsf@uwakimon.sk.tsukuba.ac.jp> <87sl592moj.fsf@uwakimon.sk.tsukuba.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: Stefan Monnier
Cc: emacs-pretest-bug@gnu.org, Eli Zaretskii, christopher.ian.moore@gmail.com
X-Mailer: VM 7.17 under 21.5 (beta28) "fuki" (+CVS-20070621) XEmacs Lucid
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:79412 gmane.emacs.pretest.bugs:19954

Stefan Monnier writes:

 > Let me try to answer: I don't know how it really works, but IIUC as long as
 > we only get ASCII bytes without end of line, the coding system is left as
 > `undecided' and on the first packet we receive with an LF or CR or a byte
 > larger than 128, the coding system is decided based on this packet and this
 > packet only.  Tho, I guess the decision on EOL is orthogonal, so we may go
 > from `undecided' to `undecided-unix' on one packet (or to `latin-undecided')
 > and only get to `latin1-unix' on a later packet.

Could be, although I really wouldn't want to make a decision based on
a very few non-ASCII bytes.

Point is, there has to be a buffer holding that packet that the coding
system has access to.  Perhaps even an Emacs buffer in binary coding
system or buffer-as-unibyte mode.  It is analyzed and then the process
seeks back to where the non-ASCII stuff started (or, more likely, the
beginning of the buffer), and decodes it.  Then further input is read.

The same thing can surely be done with magic numbers to identify
images, zipfiles, and the like.  There is no need to open, close, and
reopen the stream, and none of the inefficiency that Eli was claiming.

The exception is if you're going to process it through an external
process (such as /bin/gzip) anyway, in which case the detection phase
is pretty small overhead compared to the convenience of doing the
detection in Emacs.
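
To make the buffering argument concrete, here is a rough sketch of the idea in Python (not Emacs Lisp, and not how jka-compr.el or the Emacs coding-system machinery is actually implemented). The function name `read_with_detection` and the `peek_size` parameter are made up for illustration. The stream is opened once, its first bytes are held in a buffer and inspected for the gzip magic number, and decoding then starts from that same buffer.

```python
import gzip
import io

# First two bytes of every gzip member (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def read_with_detection(stream, peek_size=512):
    """Read `stream` exactly once, sniffing its head to choose a decoder.

    Hypothetical helper for illustration only. It assumes `stream` is a
    buffered binary stream, so that read(n) returns n bytes unless EOF
    is reached first.
    """
    head = stream.read(peek_size)   # the buffered "first packet"
    rest = stream.read()            # further input, same open stream
    data = head + rest              # "seek back": reuse the buffered head
    if head.startswith(GZIP_MAGIC):
        return "gzip", gzip.decompress(data)
    return "plain", data


if __name__ == "__main__":
    payload = b"hello, magic bytes\n"
    print(read_with_detection(io.BytesIO(gzip.compress(payload))))
    print(read_with_detection(io.BytesIO(payload)))
```

The detection step costs one comparison against bytes that were going to be read anyway, which is the sense in which no reopen of the stream is needed.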