unofficial mirror of bug-gnu-emacs@gnu.org 
* UTF-8 related display problem
@ 2002-10-05 20:36 Marc Wilhelm Küster
  2002-10-06 18:26 ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Wilhelm Küster @ 2002-10-05 20:36 UTC (permalink / raw)


This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-msvc-nt5.0.2195)
  of 2002-10-01 on LAPTOP1
configured using `configure --with-msvc (12.00)'
Important settings:
   value of $LC_ALL: nil
   value of $LC_COLLATE: nil
   value of $LC_CTYPE: nil
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: DEU
   locale-coding-system: iso-latin-1
   default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

Opening a largish UTF-8-encoded text file (ca. 800 kb) with Latin, Greek 
and Hebrew passages in it causes emacs to stop displaying the text about 
halfway through the text. It is impossible to navigate beyond that break.
Shortening or lengthening the text only slightly moves the point where 
the text display stops.

The break seems always to be in non-Latin text.

The file displays without problem in other UTF-8-aware applications, so the 
UTF-8 itself should be correct.

Choice of font does not have any effect on this behaviour.

Best regards,

Marc Küster


Recent input:
<mouse-wheel> <mouse-wheel> <mouse-wheel> <mouse-wheel>
<mouse-wheel> <mouse-wheel> <mouse-wheel> <mouse-wheel>
<mouse-wheel> <mouse-wheel> <mouse-wheel> <mouse-wheel>
<mouse-wheel> <mouse-wheel> <mouse-wheel> <mouse-wheel>
<mouse-wheel> <down-mouse-1> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<mouse-movement> <mouse-movement> <mouse-movement>
<drag-mouse-1> C-w C-x C-s C-x k <return> C-x <return>
c u t f - 8 <return> C-x C-f <backspace> <backspace>
<backspace> <backspace> <backspace> <backspace> <backspace>
o r i <tab> . t <tab> <return> M-> C-x <escape> <escape>
C-g <menu-bar> <help-menu> <report-emacs-bug>

Recent messages:

Mark set
Saving file c:/sandbox/sort/orient.txt...
Wrote c:/sandbox/sort/orient.txt

Mark set
repeat-complex-command: Quit
(iconify-frame (#<frame emacs@LAPTOP1 0x1329800\ >))
(make-frame-visible (#<frame emacs@LAPTOP1 0x1329800\ >))
Loading emacsbug...done


*************************
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114


* Re: UTF-8 related display problem
  2002-10-05 20:36 UTF-8 related display problem Marc Wilhelm Küster
@ 2002-10-06 18:26 ` Eli Zaretskii
  2002-10-07  7:28   ` Marc Wilhelm Küster
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2002-10-06 18:26 UTC (permalink / raw)
  Cc: bug-gnu-emacs

> From: Marc Wilhelm Küster <kuester@saphor.net>
> Date: Sat, 05 Oct 2002 22:36:04 +0200
> 
> Opening a largish UTF-8-encoded text file (ca. 800 kb) with Latin, Greek 
> and Hebrew passages in it causes emacs to stop displaying the text about 
> halfway through the text. It is impossible to navigate beyond that break.
> Shortening or lengthening the text only slightly moves the point where 
> the text display stops.
> 
> The break seems always to be in non-Latin text.
> 
> The file displays without problem in other UTF-8-aware applications, so the 
> UTF-8 itself should be correct.

Are you sure you have the necessary fonts installed?  The list of
places where you can download Unicode fonts can be found in the file
INSTALL in the Emacs distribution.


* Re: UTF-8 related display problem
  2002-10-06 18:26 ` Eli Zaretskii
@ 2002-10-07  7:28   ` Marc Wilhelm Küster
  2002-10-07 14:32     ` Eli Zaretskii
       [not found]     ` <5.1.0.14.2.20021007210058.031a7818@pop.puretec.de>
  0 siblings, 2 replies; 6+ messages in thread
From: Marc Wilhelm Küster @ 2002-10-07  7:28 UTC (permalink / raw)
  Cc: bug-gnu-emacs


> > Opening a largish UTF-8-encoded text file (ca. 800 kb) with Latin, Greek
> > and Hebrew passages in it causes emacs to stop displaying the text about
> > halfway through the text. It is impossible to navigate beyond that break.
> > Shortening or lengthening the text only slightly moves the point where
> > the text display stops.
> >
> > The break seems always to be in non-Latin text.
> >
> > The file displays without problem in other UTF-8-aware applications, so
> > the UTF-8 itself should be correct.
>
>Are you sure you have the necessary fonts installed?  The list of
>places where you can download Unicode fonts can be found in the file
>INSTALL in the Emacs distribution.

Thanks for the reply!

Yes, the necessary fonts are installed and the text, when extracted into 
another buffer, even displays correctly. Furthermore, saving the file 
actually shortens it to the point where the display ended, something that 
should never happen with pure display problems. It looks to me rather like 
an input stream problem of sorts (though a strange one, since splitting the 
file into parts and working with those parts is a way to get around the problem).

I have checked the UTF-8 by parsing it with Java's InputStreamReader in 
UTF-8 mode, but no problems whatsoever.
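
A comparable well-formedness check can be sketched in C. This is only an illustration of what such a check involves, not the Java code used above and not anything from Emacs; the function name `utf8_is_valid` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if BUF[0..LEN) is well-formed UTF-8, else 0.
   Rejects overlong forms, surrogates (U+D800..U+DFFF) and
   code points above U+10FFFF.  Hypothetical helper for illustration.  */
int
utf8_is_valid (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;              /* number of continuation bytes expected */
      unsigned long cp, min;    /* decoded code point, smallest legal value */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;               /* stray continuation byte or invalid lead */

      if (len - i - 1 < need)
        return 0;               /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char t = buf[i + k];
          if ((t & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
          cp = (cp << 6) | (t & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */

      i += need + 1;
    }
  return 1;
}
```

A file that passes such a check, as this one did, points away from malformed input and toward the decoder.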

However, I cannot reproduce the problem with any other file. For this 
purpose I generated a list of all existing Unicode characters, each 
combined with a combining acute accent, and, except for the documented issue 
with characters above U+33FF and below U+E200, I could not spot anything wrong.

The file in question contains data that should not be widely circulated. Is 
it possible that you can have a look at the problem and then delete the 
file afterwards?

Best regards,

Marc Küster




* Re: UTF-8 related display problem
  2002-10-07  7:28   ` Marc Wilhelm Küster
@ 2002-10-07 14:32     ` Eli Zaretskii
       [not found]     ` <5.1.0.14.2.20021007210058.031a7818@pop.puretec.de>
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2002-10-07 14:32 UTC (permalink / raw)
  Cc: bug-gnu-emacs


On Mon, 7 Oct 2002, Marc Wilhelm Küster wrote:

> The file in question contains data that should not be widely circulated. Is 
> it possible that you can have a look at the problem and then delete the 
> file afterwards?

It's better if you could remove all the sensitive parts from the file, 
but without causing the problem you see to disappear.  Can you do that?  
If you can, then please post the file here as a binary attachment, and 
someone will look at the problem.


* Re: UTF-8 related display problem
       [not found]     ` <5.1.0.14.2.20021007210058.031a7818@pop.puretec.de>
@ 2002-10-08  1:09       ` Kenichi Handa
  2002-10-10  9:12         ` Marc Wilhelm Küster
  0 siblings, 1 reply; 6+ messages in thread
From: Kenichi Handa @ 2002-10-08  1:09 UTC (permalink / raw)
  Cc: eliz, bug-gnu-emacs

In article <5.1.0.14.2.20021007210058.031a7818@pop.puretec.de>, Marc Wilhelm Küster <kuester@saphor.net> writes:
> Please find attached a version of the file that has all ASCII letters (a-z, 
> A-Z) transformed into a's. All non-ASCII letters are left intact. The bug 
> still occurs.

> Just open the file as a UTF-8 text file. Note that the display ends right 
> in a Hebrew passage on line 2883 (the line begins with <a><?aaaa ^@-0?>
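
The masking described in the quoted message (ASCII letters replaced, everything else kept) could be done with a few lines of C. This is only a sketch: the thread does not say which tool was actually used, and the function name `mask_ascii_letters` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every ASCII letter (a-z, A-Z) in BUF with 'a'.
   Bytes >= 0x80, i.e. all bytes of multi-byte UTF-8 sequences, are
   left unchanged, as are digits, punctuation and line endings, so
   the byte length and the line structure of the file stay the same.
   Hypothetical helper for illustration.  */
void
mask_ascii_letters (unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = buf[i];
      if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        buf[i] = 'a';
    }
}
```

Because no byte is added or removed, a test file masked this way keeps the same size and the same CRLF positions as the confidential original, which is presumably why the bug still occurred with it.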

I found a bug that is typically revealed by decoding a large
utf-8-dos file (your case).

I've just installed the attached fix in HEAD and RC.  Could
you please try it?
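
As background on the "-dos" part: a utf-8-dos file uses CRLF line endings, and decoding turns each CRLF into a single LF, so one byte per line disappears. The sketch below shows only that end-of-line step in isolation. It is not the Emacs decoder, and the function name `crlf_to_lf` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert every CRLF pair in BUF to a single LF, in place, and
   return the new length.  A CR that is not followed by LF is kept.
   Hypothetical helper for illustration.  */
size_t
crlf_to_lf (unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  size_t out = 0;

  for (size_t in = 0; in < len; in++)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;               /* drop the CR; the LF is copied next */
      buf[out++] = buf[in];
    }
  return out;
}
```

For a stretch of mostly ASCII text the returned length is smaller than the input length, which is one way the number of bytes produced can end up below the number consumed.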

---
Ken'ichi HANDA
handa@m17n.org

2002-10-08  Kenichi Handa  <handa@m17n.org>

	* coding.c (code_convert_region): When we need more GAP for
	conversion, pay attention to the case that coding->produced is not
	greater than coding->consumed.

Index: coding.c
===================================================================
RCS file: /cvs/emacs/src/coding.c,v
retrieving revision 1.259
retrieving revision 1.260
diff -u -c -r1.259 -r1.260
cvs server: conflicting specifications of output style
*** coding.c	30 Sep 2002 06:28:31 -0000	1.259
--- coding.c	8 Oct 2002 00:57:59 -0000	1.260
***************
*** 5696,5704 ****
  		REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
  		REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
  	     Here, we are sure that NEW >= ORIG.  */
! 	  float ratio = coding->produced - coding->consumed;
! 	  ratio /= coding->consumed;
! 	  require = len_byte * ratio;
  	  first = 0;
  	}
        if ((src - dst) < (require + 2000))
--- 5696,5714 ----
  		REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
  		REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
  	     Here, we are sure that NEW >= ORIG.  */
! 	  float ratio;
! 
! 	  if (coding->produced <= coding->consumed)
! 	    {
! 	      /* This happens because of CCL-based coding system with
! 		 eol-type CRLF.  */
! 	      require = 0;
! 	    }
! 	  else
! 	    {
! 	      ratio = (coding->produced - coding->consumed) / coding->consumed;
! 	      require = len_byte * ratio;
! 	    }
  	  first = 0;
  	}
        if ((src - dst) < (require + 2000))
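
The effect of the guard can be seen in isolation with a small sketch. It assumes that `produced`, `consumed` and `len_byte` are plain ints (the patch does not show their declarations), and the two function names are hypothetical, not part of coding.c. The cast to float in the second function is an addition of this sketch: if both fields are integers, the division as written in the patch would truncate before the result reaches `ratio`.

```c
#include <assert.h>

/* Estimate of extra gap bytes as computed before revision 1.260.
   When fewer bytes were produced than consumed, the ratio is
   negative and so is the result.  Hypothetical helper.  */
int
require_before_fix (int produced, int consumed, int len_byte)
{
  assert (consumed > 0);
  float ratio = produced - consumed;
  ratio /= consumed;
  return (int) (len_byte * ratio);
}

/* Estimate with the guard from revision 1.260: no extra gap is
   requested when produced <= consumed.  The float cast is this
   sketch's assumption, see the note above.  Hypothetical helper.  */
int
require_after_fix (int produced, int consumed, int len_byte)
{
  assert (consumed > 0);
  if (produced <= consumed)
    return 0;
  float ratio = (float) (produced - consumed) / consumed;
  return (int) (len_byte * ratio);
}
```

With 1000 bytes consumed and 900 produced over a 400000-byte region, the first function returns a negative value, which lowers the threshold in the following `(src - dst) < (require + 2000)` test; the second function returns 0 for the same input.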


* Re: UTF-8 related display problem
  2002-10-08  1:09       ` Kenichi Handa
@ 2002-10-10  9:12         ` Marc Wilhelm Küster
  0 siblings, 0 replies; 6+ messages in thread
From: Marc Wilhelm Küster @ 2002-10-10  9:12 UTC (permalink / raw)
  Cc: eliz, bug-gnu-emacs



>In article <5.1.0.14.2.20021007210058.031a7818@pop.puretec.de>, Marc 
>Wilhelm Küster <kuester@saphor.net> writes:
> > Please find attached a version of the file that has all ASCII letters
> > (a-z, A-Z) transformed into a's. All non-ASCII letters are left intact.
> > The bug still occurs.
>
> > Just open the file as a UTF-8 text file. Note that the display ends right
> > in a Hebrew passage on line 2883 (the line begins with <a><?aaaa ^@-0?>
>
>I found a bug that is typically revealed by decoding a large
>utf-8-dos file (your case).
>
>I've just installed the attached fix in HEAD and RC.  Could
>you please try it?

Thanks a lot! That is fabulous. I downloaded the head and tried it out 
(took me a bit of time because I didn't notice INSTALL-CVS at first...). 
The file now displays as it should.

Best regards,

Marc Küster





