* Ulrich Scholz (2004-12-22) writes:
> value of $LANG: en_US.ISO-8859-15
> locale-coding-system: nil
> default-enable-multibyte-characters: nil
>
> Please describe exactly what actions triggered the bug
> and the precise symptoms of the bug:
>
> The command changes the following paragraph
>
> Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
> Probleme:
>
> to the paragraph
>
> bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
> auch Probleme:
>
> Note that the Ü of Übersetzung is missing in the second version. The
> bug eats any number of Umlauts, but only as first characters of the line after
> some spaces. Umlauts after the first non-Umlaut or in lines that begin with a
> non-space remain.
>
> I don't know how to get a list of all active modes. The bug occurs while
> editing an LaTeX-file. I use auc-tex and reftex. iso-accents-mode does not
> seem to cause the bug.

I can reproduce the behavior with CVS AUCTeX, but only if I force Emacs
(21.3 or CVS) to open the file in unibyte mode by using
`find-file-literally'. The problem is that in unibyte mode umlauts are
considered to have whitespace syntax. For example, typing `C-u C-x =' on
the first umlaut in your example gives

      character: Ü (0334, 220, 0xdc)
        charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
     code point: 220
         syntax:    which means: whitespace
    buffer code: 0xDC
      file code: 0xDC (encoded by coding system no-conversion)
        display: by display table entry [?Ü] (see below)

(Instead of the control char one actually sees a "Ü".)

An AUCTeX indentation function relies on whitespace syntax to find the
first non-whitespace character of a line (and so does
`back-to-indentation' in CVS Emacs). It therefore skips over the "Ü" as
if it were whitespace and, when reindenting, deletes everything from the
beginning of the line up to and including the "Ü". I removed this code
from CVS AUCTeX, which now uses only `back-to-indentation'.
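To make the failure mode concrete, here is a small simulation in Python.
It is a simplified model, not AUCTeX or Emacs source: it assumes, as the
`C-u C-x =' output above shows for 0xDC, that every byte in the
eight-bit-graphic range 0xA0..0xFF has whitespace syntax in a unibyte
buffer. The names `indentation_end_by_syntax', `indentation_end_by_chars'
and `reindent' are hypothetical. It contrasts a syntax-based skip with a
skip over spaces and tabs only.

```python
# Simplified model (hypothetical names, not Emacs/AUCTeX code) of why
# reindenting a unibyte line can eat a leading umlaut.

def has_whitespace_syntax(byte: int) -> bool:
    # Assumption: space, tab and all eight-bit-graphic bytes (0xA0..0xFF)
    # have whitespace syntax in a unibyte buffer.
    return byte in (0x20, 0x09) or 0xA0 <= byte <= 0xFF

def indentation_end_by_syntax(line: bytes) -> int:
    """Skip every leading byte with whitespace syntax (syntax-based skip)."""
    pos = 0
    while pos < len(line) and has_whitespace_syntax(line[pos]):
        pos += 1
    return pos

def indentation_end_by_chars(line: bytes) -> int:
    """Skip only leading spaces and tabs."""
    pos = 0
    while pos < len(line) and line[pos] in b" \t":
        pos += 1
    return pos

def reindent(line: bytes, indent: int, find_end) -> bytes:
    """Replace whatever find_end considers indentation with `indent` spaces."""
    return b" " * indent + line[find_end(line):]

# Latin-9 bytes as they sit in a buffer opened with find-file-literally.
line = b"   \xdcbersetzung L\xf6sungsverfahren"
print(reindent(line, 2, indentation_end_by_syntax))  # leading 0xDC is lost
print(reindent(line, 2, indentation_end_by_chars))   # 0xDC survives
```

The syntax-based skip stops only at the "b", so the Ü is deleted along
with the spaces; the later ö is untouched because skipping stops at the
first byte without whitespace syntax, matching the symptoms in the report.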
In Emacs 21.3, `back-to-indentation' does not look at character syntax;
it simply skips spaces and tab characters at the beginning of a line. So
unless you are using CVS Emacs (i.e. the upcoming Emacs 21.4), your
umlauts should be safe.

Anyway, do you really need unibyte mode? If you want to use latin-1,
latin-9 or other non-ASCII encodings, you are better off running Emacs in
multibyte mode. That means getting rid of a --unibyte command-line
option, a nil value for `default-enable-multibyte-characters', or
settings like `(standard-display-european t)'. Among other things, this
makes `M-f' work correctly, i.e. it will no longer stop at every umlaut.

-- 
Ralf