unofficial mirror of bug-gnu-emacs@gnu.org 
* command fill-paragraph deletes leading Umlauts if line begins with space
@ 2004-12-22 18:29 Ulrich Scholz
  2004-12-23 10:19 ` Ralf Angeli
From: Ulrich Scholz @ 2004-12-22 18:29 UTC (permalink / raw)



This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.3.1 (i386-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2004-10-16 on raven, modified by Debian
configured using `configure '--build=i386-linux' '--host=i386-linux' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var/lib' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--with-x=yes' '--with-x-toolkit=athena' 'CFLAGS=-DDEBIAN -g -O2' 'build_alias=i386-linux' 'host_alias=i386-linux''
Important settings:
  value of $LC_ALL: C
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.ISO-8859-15
  locale-coding-system: nil
  default-enable-multibyte-characters: nil

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

The command changes the following paragraph

 Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
Probleme:

to the paragraph

bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
auch Probleme:

Note that the Ü of Übersetzung is missing in the second version.  The
bug eats any number of Umlauts, but only as first characters of the line after
some spaces.  Umlauts after the first non-Umlaut or in lines that begin with a
non-space remain.

I don't know how to get a list of all active modes.  The bug occurs while
editing a LaTeX file.  I use AUCTeX and RefTeX.  iso-accents-mode does not
seem to cause the bug.



Recent input:
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <mouse-movement> <mouse-movement> 
<mouse-movement> <drag-mouse-1> <help-echo> <down-mouse-1> 
<mouse-1> <down-mouse-2> <mouse-2> <help-echo> e <return> 
<help-echo> <down-mouse-1> <mouse-1> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> C-x 3 C-x 4 4 C-x 4 <down-mouse-1> 
<mouse-1> <menu-bar> <buffer> "projekt_leanplan.tex" 
<help-echo> <down> <down> <down> <down> <down> <down> 
<down> <up> <return> <return> <return> <up> <up> SPC 
" U b e r f a l l M-q <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <menu-bar> <help-menu> 
<report-emacs-bug>

Recent messages:
Making completion list... [4 times]
isearch-abort: Quit
call-interactively: Quit
Making completion list...
call-interactively: Command attempted to use minibuffer while in minibuffer
Wrote /home/scholzuh/bug.txt
Type C-c C-x C-z to exit MIME mode, and type C-c C-x ? to get help.
Fontifying *mail*... (regexps........)
Type C-c C-x C-z to exit MIME mode, and type C-c C-x ? to get help.
Auto-saving...done


* Re: command fill-paragraph deletes leading Umlauts if line begins with space
  2004-12-22 18:29 command fill-paragraph deletes leading Umlauts if line begins with space Ulrich Scholz
@ 2004-12-23 10:19 ` Ralf Angeli
From: Ralf Angeli @ 2004-12-23 10:19 UTC (permalink / raw)
  Cc: bug-gnu-emacs


* Ulrich Scholz (2004-12-22) writes:

>   value of $LANG: en_US.ISO-8859-15
>   locale-coding-system: nil
>   default-enable-multibyte-characters: nil
>
> Please describe exactly what actions triggered the bug
> and the precise symptoms of the bug:
>
> The command changes the following paragraph
>
>  Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
> Probleme:
>
> to the paragraph
>
> bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
> auch Probleme:
>
> Note that the Ü of Übersetzung is missing in the second version.  The
> bug eats any number of Umlauts, but only as first characters of the line after
> some spaces.  Umlauts after the first non-Umlaut or in lines that begin with a
> non-space remain.
>
> I don't know how to get a list of all active modes.  The bug occurs while
> editing a LaTeX file.  I use AUCTeX and RefTeX.  iso-accents-mode does not
> seem to cause the bug.

I can reproduce the behavior with CVS AUCTeX, but only if I force
Emacs (21.3 or CVS) to open the file in unibyte mode by using
`find-file-literally'.  The problem is that in unibyte mode umlauts
are considered to have whitespace syntax.  For example, typing
`C-u C-x =' on the first umlaut in your example gives

  character: Ü (0334, 220, 0xdc)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 220
     syntax:   	which means: whitespace
buffer code: 0xDC
  file code: 0xDC (encoded by coding system no-conversion)
    display: by display table entry [?Ü] (see below)

(Instead of the control char one actually sees a "Ü".)

A function in AUCTeX for doing indentation looks at whitespace syntax
to find the first non-whitespace character (and so does
`back-to-indentation' in CVS Emacs).  That means it will skip the "Ü"
and delete everything from the beginning of the line up to and
including the "Ü".
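
To illustrate the difference between the two ways of finding the end
of the indentation, here is a minimal sketch.  It is not the AUCTeX
code.  The function name `my-demo-strip-indentation' is made up for
this example, and the whitespace syntax of "Ü" is simulated with a
private syntax table instead of a real unibyte buffer.

```elisp
;; Sketch only: simulate the unibyte situation by giving "Ü" whitespace
;; syntax in a private syntax table (an assumption for illustration).
(defun my-demo-strip-indentation (use-syntax)
  "Return \" Übersetzung\" with its leading indentation deleted.
If USE-SYNTAX is non-nil, the end of the indentation is found by
whitespace syntax; otherwise only literal spaces and tabs are skipped."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      ;; Pretend that "Ü" is whitespace, as reported by `C-u C-x ='.
      (modify-syntax-entry ?Ü " " table)
      (set-syntax-table table))
    (insert " Übersetzung")
    (goto-char (point-min))
    (if use-syntax
        (skip-syntax-forward " ")     ; skips the space AND the "Ü"
      (skip-chars-forward " \t"))     ; skips only the space
    ;; Delete what was taken to be indentation.
    (delete-region (point-min) (point))
    (buffer-string)))

(my-demo-strip-indentation t)    ; => "bersetzung"
(my-demo-strip-indentation nil)  ; => "Übersetzung"
```

The syntax-based variant loses the "Ü" because it trusts the syntax
table, while the character-based variant only looks for literal spaces
and tabs.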

I removed this code in CVS AUCTeX which now only uses
`back-to-indentation'.  In Emacs 21.3 this function does not look at
character syntax but simply skips spaces and tab characters at the
beginning of a line.  So unless you are using CVS Emacs (i.e. the
upcoming Emacs 21.4) your umlauts should be safe.

Anyway, do you really need the unibyte stuff?  If you want to use
latin-1, latin-9 and other non-ASCII encodings, it is better to
use Emacs in multibyte mode.  That means you should get rid of any
--unibyte command line option, any nil value for
`default-enable-multibyte-characters' and settings like
`(standard-display-european t)'.  For example, multibyte mode makes
`M-f' work correctly, i.e. it will not stop at every umlaut.
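
If you are unsure which mode a buffer is in, a small helper along the
following lines may be useful.  This is only a sketch; the function
name `my-byte-mode-report' is invented for this example.  It relies on
the buffer-local variable `enable-multibyte-characters' and on
`char-syntax', both of which are standard.

```elisp
;; Sketch only: report the byte mode of the current buffer and the
;; syntax class of the character at point.
(defun my-byte-mode-report ()
  "Return a string naming the buffer's byte mode and the syntax at point."
  (let ((ch (char-after)))
    (format "%s buffer; syntax class at point: %s"
            (if enable-multibyte-characters "multibyte" "unibyte")
            (if ch
                ;; `char-syntax' returns the class designator, e.g. ?w
                ;; for a word constituent or a space for whitespace.
                (string (char-syntax ch))
              "none (end of buffer)"))))
```

Evaluating `(my-byte-mode-report)' with point on an umlaut should name
a word-constituent class in a multibyte buffer; a whitespace class
there points to the problem described above.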

-- 
Ralf
