From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "riccardo.murri@gmail.com" <riccardo.murri@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: how to find encoding violations in Emacs buffer?
Date: 13 Dec 2006 04:34:09 -0800
Organization: http://groups.google.com
Message-ID: <1166013248.957606.258900@f1g2000cwa.googlegroups.com>
References: <1165947493.201071.294760@l12g2000cwl.googlegroups.com>
	<mailman.1814.1165984004.2155.help-gnu-emacs@gnu.org>
	<1165999197.876027.315040@73g2000cwn.googlegroups.com>
	<mailman.1823.1166006732.2155.help-gnu-emacs@gnu.org>
NNTP-Posting-Host: dough.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1166013790 14952 80.91.229.10 (13 Dec 2006 12:43:10 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 13 Dec 2006 12:43:10 +0000 (UTC)
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Wed Dec 13 13:43:09 2006
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by dough.gmane.org with esmtp (Exim 4.50)
	id 1GuTRx-0005Ze-3U
	for geh-help-gnu-emacs@m.gmane.org; Wed, 13 Dec 2006 13:43:05 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1GuTRw-0002hC-8U
	for geh-help-gnu-emacs@m.gmane.org; Wed, 13 Dec 2006 07:43:04 -0500
Original-Path: shelby.stanford.edu!newsfeed.stanford.edu!postnews.google.com!f1g2000cwa.googlegroups.com!not-for-mail
Original-Newsgroups: gnu.emacs.help
Original-Lines: 34
Original-NNTP-Posting-Host: 140.105.46.64
Original-X-Trace: posting.google.com 1166013254 18905 127.0.0.1 (13 Dec 2006 12:34:14
	GMT)
Original-X-Complaints-To: groups-abuse@google.com
Original-NNTP-Posting-Date: Wed, 13 Dec 2006 12:34:14 +0000 (UTC)
In-Reply-To: <mailman.1823.1166006732.2155.help-gnu-emacs@gnu.org>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.8.0.7) Gecko/20060830 Firefox/1.5.0.7
	(Debian-1.5.dfsg+1.5.0.7-2),gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: f1g2000cwa.googlegroups.com; posting-host=140.105.46.64;
	posting-account=Ko2Q_w0AAACs81J8LtxGb3EnI389882f
Original-Xref: shelby.stanford.edu gnu.emacs.help:143902
Original-To: help-gnu-emacs@gnu.org
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:39506
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/39506>


On Dec 13, 11:45 am, Peter Dyballa <Peter_Dyba... -at- Web.DE> wrote:
> On 13.12.2006 at 09:39, riccardo.mu.. -at- gmail.com wrote:
>
> > Yes, but it may be hard to spot one single problematic character in a
> > large buffer.  In the case at hand, I had one Latin-1 "ù" in a 20k
> > UTF-8 text,
>
> This character is a UTF-8 entity:
>
>         [ù]  00F9  LATIN SMALL LETTER U WITH GRAVE
>
> It cannot be the cause. In UTF-8 it's encoded as C3 B9.

Yes, but the file contained the single Latin-1 byte 0xF9 instead of the
UTF-8 sequence 0xC3 0xB9, which caused UTF-8 auto-detection to fail.
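
To illustrate the mismatch, here is a small sketch in Python (only a
convenient stand-in, nothing from Emacs itself; the helper name
is_valid_utf8 is made up for this example).  It shows that U+00F9 is one
byte in Latin-1 and two bytes in UTF-8, and that the lone Latin-1 byte
does not decode as UTF-8:

```python
# U+00F9 LATIN SMALL LETTER U WITH GRAVE in both encodings.
u_grave = "\u00f9"

latin1_bytes = u_grave.encode("latin-1")  # single byte F9
utf8_bytes = u_grave.encode("utf-8")      # two bytes C3 B9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes.hex(), is_valid_utf8(latin1_bytes))  # f9 False
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))      # c3b9 True
```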

> > Isn't there a way to implement a "goto-next-problematic-char" elisp
> > function?  UTF-8 has a rather simple algorithm to detect encoding
> > violations, which can point at the precise point where a byte sequence
> > violates UTF-8 rules, but I wondered if Emacs had a more general
> > interface: if it knows where in the buffer the encoding violations
> > are located, one would assume that this information would be available
> > at elisp level.
>
> There is something like this already implemented in PostScript
> printing: when the buffer contains characters outside a specific ISO
> Latin encoding up to a dozen of them are presented in a warning buffer.
>

Thank you for the pointer!  I'll have a look at that code.
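
For reference, the UTF-8 check I had in mind can be sketched outside
Emacs.  This is Python rather than elisp, and the function name
utf8_violations is an assumption of mine, not an existing API; it relies
on the start/end offsets that Python's UnicodeDecodeError reports, and
it re-slices the data after each error, so it is meant for illustration
rather than for large files:

```python
from typing import Iterator, Tuple


def utf8_violations(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, offending_bytes) for each UTF-8 violation in data."""
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            yield start, data[start:end]
            pos = end  # resume right after the offending bytes


if __name__ == "__main__":
    # Valid UTF-8 text with one stray Latin-1 byte in the middle.
    sample = "caf\u00e9 ".encode("utf-8") + b"\xf9" + " d\u00e9j\u00e0".encode("utf-8")
    for offset, bad in utf8_violations(sample):
        print(f"byte offset {offset}: {bad.hex()}")
```

Each reported offset is a byte position in the file, so a
"goto-next-problematic-char" built on this idea would still have to map
byte offsets to buffer positions.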

Greetings,
Riccardo