From: "riccardo.murri@gmail.com"
Newsgroups: gmane.emacs.help
Subject: Re: how to find encoding violations in Emacs buffer?
Date: 13 Dec 2006 04:34:09 -0800
Message-ID: <1166013248.957606.258900@f1g2000cwa.googlegroups.com>
References: <1165947493.201071.294760@l12g2000cwl.googlegroups.com> <1165999197.876027.315040@73g2000cwn.googlegroups.com>

On Dec 13, 11:45 am, Peter Dyballa wrote:
> On 13.12.2006 at 09:39, riccardo.mu.. -at- gmail.com wrote:
>
> > Yes, but it may be hard to spot one single problematic character in a
> > large buffer. In the case at hand, I had one Latin-1 "ù" in a 20k
> > UTF-8 text,
>
> This character is an UTF-8 entity:
>
>     [ù] 00F9 LATIN SMALL LETTER U WITH GRAVE
>
> It cannot be the cause. In UTF-8 it's encoded as C3 B9.

Yes, but the file contained the single byte 0xF9 (the Latin-1 encoding
of "ù") instead of the two-byte sequence 0xC3 0xB9, and that is what
caused UTF-8 auto-detection to fail.

> > Isn't there a way to implement a "goto-next-problematic-char" elisp
> > function? UTF-8 has a rather simple algorithm to detect encoding
> > violations, which can point at the precise point where a byte sequence
> > violates UTF-8 rules, but I wondered if Emacs had a more general
> > interface: if it knows where in the buffer the encoding violations
> > are located, one would assume that this information would be available
> > at elisp level.
>
> There is something like this already implemented in PostScript
> printing: when the buffer contains characters outside a specific ISO
> Latin encoding up to a dozen of them is presented in a warning buffer.

Thank you for the pointer! I'll have a look at that code.

Greetings,
Riccardo
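
P.S. For illustration, the byte-level UTF-8 check discussed above can be sketched outside Emacs. This is a minimal sketch in Python, not elisp, and not how Emacs does it: it leans on Python's own strict UTF-8 decoder and reports the byte offset of each sequence the decoder rejects. The function name `find_utf8_violations` and the `limit` parameter are hypothetical; the default of 12 only mirrors the "up to a dozen" in the PostScript warning mentioned above.

```python
def find_utf8_violations(data: bytes, limit: int = 12):
    """Return a list of (byte_offset, offending_bytes) for invalid UTF-8.

    Hypothetical helper: it repeatedly decodes the remaining bytes and
    uses the position reported by UnicodeDecodeError to locate each
    violation. At most `limit` violations are collected.
    """
    violations = []
    pos = 0
    while pos < len(data) and len(violations) < limit:
        try:
            # Strict decoding raises at the first invalid sequence.
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            violations.append((start, data[start:end]))
            pos = end  # resume right after the offending bytes
    return violations


if __name__ == "__main__":
    # A valid "ù" (C3 B9) followed by a stray Latin-1 "ù" (F9).
    sample = "où".encode("utf-8") + b" d\xf9 "
    for offset, raw in find_utf8_violations(sample):
        print(f"invalid UTF-8 at byte offset {offset}: {raw!r}")
```

Slicing the input on every retry copies the remaining bytes, which is acceptable for a 20k file but would be worth replacing with an incremental decoder for large inputs.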