From: Ralf Angeli
Newsgroups: gmane.emacs.bugs
Subject: Re: command fill-paragraph deletes leading Umlauts if line begins with space
Date: Thu, 23 Dec 2004 11:19:11 +0100
To: Ulrich Scholz
Cc: bug-gnu-emacs@gnu.org

* Ulrich Scholz (2004-12-22) writes:

> value of $LANG: en_US.ISO-8859-15
> locale-coding-system: nil
> default-enable-multibyte-characters: nil
>
> Please describe exactly what actions triggered the bug
> and the precise symptoms of the bug:
>
> The command changes the following paragraph
>
> Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
> Probleme:
>
> to the paragraph
>
> bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
> auch Probleme:
>
> Note that the Ü of Übersetzung is missing in the second version.  The
> bug eats any number of Umlauts, but only as first characters of the line after
> some spaces.
> Umlauts after the first non-Umlaut or in lines that begin with a
> non-space remain.
>
> I don't know how to get a list of all active modes.  The bug occurs while
> editing an LaTeX-file.  I use auc-tex and reftex.  iso-accents-mode does not
> seem to cause the bug.

I can reproduce the behavior with CVS AUCTeX, but only if I force
Emacs (21.3 or CVS) to open the file in unibyte mode by using
`find-file-literally'.

The problem is that in unibyte mode umlauts are considered to have
whitespace syntax.  For example, typing `C-u C-x =' on the first
umlaut in your example gives

  character: Ü (0334, 220, 0xdc)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 220
     syntax:   which means: whitespace
buffer code: 0xDC
  file code: 0xDC (encoded by coding system no-conversion)
    display: by display table entry [?Ü] (see below)

(In this mail the raw byte may show up as a control char; in Emacs
one actually sees a "Ü".)

A function in AUCTeX for doing indentation looks at whitespace syntax
for finding the first non-whitespace character (and so does
`back-to-indentation' in CVS Emacs).  That means it will skip the "Ü"
and delete everything from the beginning of the line up to and
including the "Ü".

I removed this code in CVS AUCTeX, which now only uses
`back-to-indentation'.  In Emacs 21.3 this function does not look at
character syntax but simply skips spaces and tab characters at the
beginning of a line.  So unless you are using CVS Emacs (i.e. the
upcoming Emacs 21.4) your umlauts should be safe.

Anyway, do you really need the unibyte stuff?  If you want to use
latin-1, latin-9 and other non-ASCII encodings it will be better to
use Emacs in multibyte mode.  That means you should get rid of the
`--unibyte' command line option, a nil value for
`default-enable-multibyte-characters' or stuff like
`(standard-display-european t)'.  For example, this will make `M-f'
work correctly, i.e. it will not stop at every umlaut.

-- 
Ralf
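
The difference between the two ways of finding the end of the
indentation can be illustrated with a small toy model.  This is a
minimal Python sketch, not AUCTeX or Emacs code: the function names
are hypothetical, and the rule that every byte in 0xA0..0xFF has
whitespace syntax is an assumption taken from the charset range shown
in the `C-u C-x =' output above.

```python
# Toy model only; all names here are hypothetical, not Emacs or AUCTeX API.
SPACE_TAB = {0x20, 0x09}


def has_whitespace_syntax(byte: int) -> bool:
    """Assumed unibyte syntax table: 0xA0..0xFF count as whitespace."""
    return byte in SPACE_TAB or 0xA0 <= byte <= 0xFF


def indentation_end_by_syntax(line: bytes) -> int:
    """Skip every leading byte that has whitespace syntax."""
    i = 0
    while i < len(line) and has_whitespace_syntax(line[i]):
        i += 1
    return i


def indentation_end_by_chars(line: bytes) -> int:
    """Skip only literal spaces and tabs."""
    i = 0
    while i < len(line) and line[i] in SPACE_TAB:
        i += 1
    return i


def strip_indentation(line: bytes, find_end) -> bytes:
    """Delete everything from line start to the detected indentation end."""
    return line[find_end(line):]


# Latin-1 bytes: two spaces, then 0xDC (U umlaut), then the rest.
line = b"  \xdcbersetzung L\xf6sungsverfahren"

print(strip_indentation(line, indentation_end_by_syntax))
# b'bersetzung L\xf6sungsverfahren'    (leading umlaut eaten)
print(strip_indentation(line, indentation_end_by_chars))
# b'\xdcbersetzung L\xf6sungsverfahren' (leading umlaut kept)
```

In this model the 0xF6 inside the second word survives in both cases,
because the scan stops at the first byte without whitespace syntax.
This matches the report that only umlauts directly after the leading
spaces are lost.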