From: Marc Wilhelm Küster
Newsgroups: gmane.emacs.bugs
Subject: Re: UTF-8 related display problem
Date: Mon, 07 Oct 2002 09:28:32 +0200
Message-ID: <5.1.0.14.2.20021007091714.03334510@pop.puretec.de>
References: <5.1.0.14.2.20021005223224.00ac8620@pop.puretec.de>
In-Reply-To: <7263-Sun06Oct2002212601+0200-eliz@is.elta.co.il>
To: Eli Zaretskii
Cc: bug-gnu-emacs@gnu.org

> > Opening a largish UTF-8-encoded text file (ca. 800 kb) with Latin, Greek
> > and Hebrew passages in it causes emacs to stop displaying the text about
> > halfway through the text. It is impossible to navigate beyond that break.
> > Shortening or lengthening the text only slightly moves the point where
> > the text display stops.
> >
> > The break seems always to be in non-Latin text.
> >
> > The file displays without problem in other UTF-8-aware applications, so
> > the UTF-8 itself should be correct.
>
> Are you sure you have the necessary fonts installed? The list of
> places where you can download Unicode fonts can be found in the file
> INSTALL in the Emacs distribution.

Thanks for the reply!

Yes, the necessary fonts are installed, and the text even displays
correctly when extracted into another buffer. Furthermore, saving the
file actually truncates it at the point where the display ended,
something that should never happen with a pure display problem. It looks
to me more like an input stream problem of some sort (though a strange
one, since splitting the file into parts and working with those parts is
a way to get around the problem).

I have checked the UTF-8 by parsing it with Java's InputStreamReader in
UTF-8 mode, and it showed no problems whatsoever.

However, I cannot reproduce the problem with any other file.
For this purpose I generated a list of all existing Unicode characters,
each combined with a combining acute, and, except for the documented
issue with characters above U+33FF and below U+E200, I could not spot
anything wrong.

The file in question contains data that should not be widely circulated.
Would it be possible for you to have a look at the problem and then
delete the file afterwards?

Best regards,

Marc Küster

*************************
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114
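
A note on the UTF-8 check mentioned in the message: InputStreamReader
replaces malformed byte sequences with U+FFFD rather than raising an
error, so a read that completes without exceptions does not by itself
prove the file is well-formed. A stricter check can be sketched with a
CharsetDecoder set to report errors. The class name StrictUtf8Check and
the method isStrictUtf8 are hypothetical, chosen only for illustration,
and the sketch assumes the file fits in memory (reasonable for a file of
about 800 kB).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original report.
public class StrictUtf8Check {

    /** Returns true only if the whole byte array decodes as well-formed UTF-8. */
    public static boolean isStrictUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // Report errors instead of substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown on the first malformed or truncated sequence.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StrictUtf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isStrictUtf8(data) ? "well-formed UTF-8" : "malformed UTF-8");
    }
}
```

If a check of this kind also passes, the encoding of the file itself is
unlikely to be the cause, which would point back at how the editor reads
the file.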