From: riccardo.murri@gmail.com
Newsgroups: gmane.emacs.help
Subject: Re: how to find encoding violations in Emacs buffer?
Date: 13 Dec 2006 00:39:57 -0800
Message-ID: <1165999197.876027.315040@73g2000cwn.googlegroups.com>
References: <1165947493.201071.294760@l12g2000cwl.googlegroups.com>

On Dec 13, 5:26 am, Eli Zaretskii wrote:
> > From: riccardo.mu... -at- gmail.com
> > Date: 12 Dec 2006 10:18:13 -0800
>
> > from time to time, a buffer gets some spurious character in and Emacs
> > refuses to save it in the correct encoding. So I am presented with the
> > choice of other different encodings.
>
> > However, in most of the cases, I know that the file *should* be UTF-8
> > encoded. So I would rather like to find out where the offending
> > character is and correct it, instead of choosing a different encoding.
>
> > Is there any function/package/elisp hack to find/highlight characters
> > in a buffer that Emacs could not encode as UTF-8?
>
> Emacs 22 already shows the problematic characters. Please look closer
> at the text of the buffer where Emacs tells you why it needs your
> decision about the encoding.

Yes, but it may be hard to spot a single problematic character in a
large buffer. In the case at hand, I had one Latin-1 "ù" in a 20k
UTF-8 text. Since the encoding was therefore invalid and could not be
autodetected, Emacs displayed all non-ASCII characters as \xxx escape
sequences...

Isn't there a way to implement a "goto-next-problematic-char" elisp
function?
UTF-8 has a rather simple algorithm for detecting encoding violations,
one that can pinpoint the exact position where a byte sequence breaks
the UTF-8 rules. But I wondered whether Emacs has a more general
interface: if it knows where in the buffer the encoding violations are
located, one would assume that this information is available at the
elisp level.

Riccardo
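
To illustrate the kind of UTF-8 check described above, here is a minimal
sketch in Python rather than elisp. It works on raw bytes outside Emacs
and relies on Python's strict UTF-8 decoder to report where decoding
fails. The function name utf8_violations is hypothetical, chosen only
for this example; it is not an Emacs or library API.

```python
def utf8_violations(data: bytes) -> list[int]:
    """Return the byte offsets at which `data` violates UTF-8 rules.

    Hypothetical helper for illustration: it repeatedly runs the strict
    UTF-8 decoder and records where each decoding error starts.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises on the first invalid sequence.
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            offsets.append(pos + err.start)
            pos += err.end  # resume right after the offending bytes
    return offsets


# A lone Latin-1 byte 0xF9 ("ù") inside otherwise plain ASCII text.
print(utf8_violations(b"abc\xf9def"))  # → [3]
```

The reported offsets are byte positions in the file, not character
positions in an Emacs buffer, so they would still need to be mapped
back to buffer positions by hand.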