From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: What does Emacs on w32 know that grep can't figure out?
Date: Fri, 01 Oct 2010 13:00:02 +0900
Message-ID: <874od6bm0t.fsf@uwakimon.sk.tsukuba.ac.jp>
To: Lennart Borgman
Cc: Emacs-Devel devel

Lennart Borgman writes:

> However trying to search this file from a cmd prompt with (gnuwin32)
> grep does not work.

No, it almost certainly won't. grep is byte-oriented and doesn't know
anything about coding systems. On Unix with a UTF-8-capable terminal
you would do something like

    iconv --from=UTF-16 --to=UTF-8 $FILE | grep some-string

I would think that either Cygwin or Windows provides a version of
iconv. If not, changing the file to UTF-8 (instead of UTF-16) using
Emacs should make it grep'able. In some cases grep may think this is
a binary file anyway; if so, use the --text switch to force grep to
treat the file as text.

> And it does not work with cygwin grep either. They think it is a
> binary file (even though I changed the line delimiter to unix
> style).

The EOL delimiter is not a problem. grep should ignore the presence
or absence of CR when checking for binary files.
The only time it is likely to matter is if you are searching for a
word at the end of the line, in which case instead of "word$" you can
use "word\015?$" or something like that (if it matters, grep may be
EOL-agnostic these days).

Now, of course they think a UTF-16-encoded file is a binary file. It
almost certainly contains NUL bytes (because an ASCII or Latin-1
character will always have a trailing NUL in UTF-16LE).

> What is going on? Is grep sometimes useless on w32 now, or? (How do we
> handle that in Emacs?)

Emacs tries to guess what the encoding is if you don't specify it. It
may guess wrong in certain cases, but it should be extremely accurate
in case of any Unicode format.
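
To make the byte-level point concrete, here is a rough Python sketch of
the "decode first, then search" idea. It is only an illustration, not
how grep or Emacs actually work: the helper names sniff_encoding and
grep_bytes are made up for this example, and the detection rule (look
for a byte-order mark, else treat NUL bytes as a hint of UTF-16LE) is a
deliberately crude assumption, far simpler than Emacs's real coding
system detection.

```python
import codecs
import re


def sniff_encoding(raw: bytes) -> str:
    """Crude guess at the encoding of raw bytes (hypothetical helper).

    Assumption for this sketch: only UTF-8 and UTF-16 are considered.
    UTF-32 is ignored, even though its little-endian BOM starts with
    the same two bytes as the UTF-16LE BOM.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM and picks the byte order.
        return "utf-16"
    if b"\x00" in raw:
        # No BOM, but NUL bytes are present: guess UTF-16LE, since
        # ASCII text in UTF-16LE has a NUL after every character.
        return "utf-16-le"
    return "utf-8"


def grep_bytes(raw: bytes, pattern: str) -> list[str]:
    """Decode raw bytes, then return the lines matching the regex."""
    text = raw.decode(sniff_encoding(raw))
    rx = re.compile(pattern)
    # splitlines() strips both LF and CRLF endings, so "$" in the
    # pattern behaves the same for Unix and DOS style files.
    return [line for line in text.splitlines() if rx.search(line)]


# A small UTF-16 sample with a BOM and DOS line endings.
sample = "first line\r\nsome-string here\r\n".encode("utf-16")

print(b"\x00" in sample)            # NUL bytes: why grep says "binary"
print(b"some-string" in sample)     # a raw byte search cannot match
print(grep_bytes(sample, r"here$"))
```

The raw byte search fails because every ASCII character in the sample
is interleaved with a NUL byte, which is the same reason a
byte-oriented grep cannot find the string until the file has been
converted.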