From: "Alexandre Oberlin"
Newsgroups: gmane.emacs.help
Subject: Re: Single unrecognized character wrecks entire display
Date: Fri, 24 Aug 2012 15:46:47 +0200
To: help-gnu-emacs@gnu.org

Thank you Peter for your detailed reply.

On Wed, 22 Aug 2012 17:18:36 +0200, Peter Dyballa wrote:

> When you set read-quoted-char-radix to 8 you can search for these
> "characters" in the text by:
>
> C-s C-q 3 5 1 RET

Nice command for finding any other occurrences once you’ve found the
culprit!

> Hopefully! I think the problem is that your convertor (can't you use
> something reliable like iconv or recode?) makes mistakes.

iconv acts just the same. It tells me the 13th character is faulty
(\351), while only the 40th is (\234).

> \240 or A0 in hex exists as partner of another byte (with C2 it
> constructs NO-BREAK SPACE, with C3 it's LATIN SMALL LETTER A WITH GRAVE,
> …), \234 or 9C builds with C3 LATIN CAPITAL LETTER U WITH DIAERESIS etc.
> I think what GNU Emacs wants to tell you and what I did not understand
> the first time is, that some characters obviously are not encoded
> correctly so that these "isolated" *bytes* are left over, they don't fit
> into regular 2- or 3- or even 4-byte codes of the UTF-8 encoding – and
> of course none of them is an ASCII character encoded by one byte (i.e.,
> itself).

Clear.

> Can you give us some more details of the original source and the
> convertor, and its working principle (command line options)?

It is the target document output by Trados Studio 2009 (well-known
translation software running on Windows only). The original is in English
and has no such problems, but opens in Emacs as raw text Mac.

> How do you open it in GNU Emacs?

C-x C-f

> How does it behave when you had launched GNU Emacs
>
> env LC_CTYPE=UTF-8 LANG=fr_FR.UTF-8 emacs -Q &

Same.

I’m using GNU Emacs on Linux, and on Cygwin when I use Trados. The
behavior is the same in that regard.

Cheers,

Alexandre
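
Peter's remark about "isolated" bytes that do not fit into a UTF-8 sequence can also be checked outside Emacs. What follows is only a minimal Python sketch, assuming the file is meant to be UTF-8. The names stray_bytes and format_octal are illustrative and do not belong to any tool mentioned in this thread. It relies on Python's standard "surrogateescape" error handler, which maps each undecodable byte to a code point in U+DC80..U+DCFF, and reports the character index of each such byte together with the octal notation Emacs displays (for example \351).

```python
import sys


def stray_bytes(data):
    """Return (character_index, byte_value) for every byte that is not
    part of a valid UTF-8 sequence. Indices are 0-based and count
    decoded characters, not raw bytes."""
    # Each undecodable byte b becomes the lone surrogate U+DC00 + b.
    text = data.decode("utf-8", errors="surrogateescape")
    return [
        (index, ord(char) - 0xDC00)
        for index, char in enumerate(text)
        if 0xDC80 <= ord(char) <= 0xDCFF
    ]


def format_octal(byte_value):
    """Format a byte the way Emacs shows raw bytes, e.g. 0xE9 -> \\351."""
    return "\\{:03o}".format(byte_value)


if __name__ == "__main__":
    # The path is supplied by the user; no file name is assumed here.
    with open(sys.argv[1], "rb") as handle:
        for index, value in stray_bytes(handle.read()):
            print(index, format_octal(value))
```

Run on the Trados output, this would list every stray byte in one pass, which may help tell whether the \351 that iconv complains about is a separate problem from the \234.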