unofficial mirror of emacs-devel@gnu.org 
* [cs-usenet@arcor.de: tex-mode: too many _  (underscores) interpreted as subscripts]
@ 2004-10-03  1:19 Richard Stallman
  2004-10-03 13:52 ` Ralf Angeli
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2004-10-03  1:19 UTC (permalink / raw)


Would someone please investigate this bug report and DTRT?
Stefan made these changes, but he hasn't responded to
my mail about this, so I think it is time to look
for someone else.

------- Start of forwarded message -------
Date: Fri, 30 Jul 2004 15:21:23 +0200
From: Christian Schlauer <cs-usenet@arcor.de>
X-Accept-Language: en-us, en
To: emacs-pretest-bug@gnu.org
Subject: tex-mode: too many _  (underscores) interpreted as subscripts
Sender: emacs-pretest-bug-bounces+rms=gnu.org@gnu.org
X-Spam-Status: No, hits=0.4 required=5.0
	tests=RCVD_IN_ORBS,USER_AGENT,X_ACCEPT_LANG
	version=2.55
X-Spam-Level: 
X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp)

This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the emacs-pretest-bug@gnu.org mailing 
list.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

The News file of CVS Emacs announces the following for TeX mode:

*** verbatim environments are now highlighted in courier by font-lock
and super/sub-scripts are made into super/sub-scripts.

At the moment, it treats too many underscores as subscript
commands. While it leaves \cite{blah99:_long_title} and
\verb+file_name+ alone, it displays the commands
\nolinkurl{file_name_with_underscore.txt} and
\url{file_name_with_underscore.txt} (both available with the hyperref
package, the latter also with url.sty) incorrectly: the first
letter after an underscore in these commands is rendered as a
subscript.

In GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600)
  of 2004-05-19 on BERATUNG4
configured using `configure --with-gcc (3.3) --cflags
-I../../jpeg-6b-1/include -I../../libpng-1.2.4-1/include
-I../../tiff-3.5.7/include -I../../xpm-nox-4.2.0/include
-I../../zlib-1.1.4-1/include'

Important settings:
   value of $LC_ALL: nil
   value of $LC_COLLATE: nil
   value of $LC_CTYPE: nil
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: DEU
   locale-coding-system: cp1252
   default-enable-multibyte-characters: t

Major mode: LaTeX

Minor modes in effect:
   tool-bar-mode: t
   encoded-kbd-mode: t
   mouse-wheel-mode: t
   menu-bar-mode: t
   font-lock-mode: t
   unify-8859-on-encoding-mode: t
   line-number-mode: t

Recent input:
- - e m a c s - f o n t - l o c k . t e x <return> M-x
f o n t - l o c k - m o d e <return> \ n o l i n k
u r l { } <left> f i l e _ n a m e _ w i t h _ u n
d e r s c r o <backspace> <backspace> o r e . t x t
C-e M-x r e p o r t - e m a SPC b u g <return>

Recent messages:
Loading tex-mode...done
Making completion list...
Loading font-lock...done
Loading jit-lock...done
mwheel-scroll: Beginning of buffer [188 times]
Making completion list...
(New file)
Font-Lock mode enabled
Loading skeleton...done
Loading emacsbug...done



_______________________________________________
Emacs-pretest-bug mailing list
Emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
------- End of forwarded message -------

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-03  1:19 [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts] Richard Stallman
@ 2004-10-03 13:52 ` Ralf Angeli
  2004-10-03 19:21   ` Stefan
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ralf Angeli @ 2004-10-03 13:52 UTC (permalink / raw)


* Richard Stallman (2004-10-03) writes:

> Would someone please investigate this bug report and DTRT?
> Stefan made these changes, but he hasn't responded to
> my mail about this, so I think it is time to look
> for someone else.
>
> From: Christian Schlauer <cs-usenet@arcor.de>
> Subject: tex-mode: too many _  (underscores) interpreted as subscripts
> To: emacs-pretest-bug@gnu.org
> Date: Fri, 30 Jul 2004 15:21:23 +0200
[...]
> The News file of CVS Emacs announces the following for TeX mode:
>
> *** verbatim environments are now highlighted in courier by font-lock
> and super/sub-scripts are made into super/sub-scripts.
>
> At the moment, it treats too many underscores as subscript
> commands. While it leaves \cite{blah99:_long_title} and
> \verb+file_name+ alone, it displays the commands
> \nolinkurl{file_name_with_underscore.txt} and
> \url{file_name_with_underscore.txt} (both available with the hyperref
> package, the latter also with url.sty) incorrectly: the first
> letter after an underscore in these commands is rendered as a
> subscript.

Inhibiting the subscript and superscript fontification is achieved by
checking if certain faces are present.  So a quick fix, only covering
\nolinkurl and \url, would be to add these commands e.g. to the
`citations' keywords in `tex-font-lock-keywords-2'.

If this should be fixed for arbitrary LaTeX commands which are not
fontified, one would have to check whether the underscores in question
are located inside math environments.  In AUCTeX we use
texmathp.el (which is distributed with AUCTeX) for this purpose.
But calls to `texmathp' can get very expensive, especially in larger
LaTeX files.

-- 
Ralf

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-03 13:52 ` Ralf Angeli
@ 2004-10-03 19:21   ` Stefan
  2004-10-03 19:53   ` Stefan
  2004-10-04 15:19   ` Richard Stallman
  2 siblings, 0 replies; 13+ messages in thread
From: Stefan @ 2004-10-03 19:21 UTC (permalink / raw)


> If this should be fixed for arbitrary LaTeX commands which are not
> fontified, one would have to check whether the underscores in question
> are located inside math environments.  In AUCTeX we use
> texmathp.el (which is distributed with AUCTeX) for this purpose.
> But calls to `texmathp' can get very expensive, especially in larger
> LaTeX files.

And texmathp is also using heuristics which don't cover all cases.


        Stefan

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-03 13:52 ` Ralf Angeli
  2004-10-03 19:21   ` Stefan
@ 2004-10-03 19:53   ` Stefan
  2004-10-04 15:19   ` Richard Stallman
  2 siblings, 0 replies; 13+ messages in thread
From: Stefan @ 2004-10-03 19:53 UTC (permalink / raw)


> Inhibiting the subscript and superscript fontification is achieved by
> checking if certain faces are present.  So a quick fix, only covering
> \nolinkurl and \url, would be to add these commands e.g. to the
> `citations' keywords in `tex-font-lock-keywords-2'.

I've just installed such a fix.


        Stefan

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-03 13:52 ` Ralf Angeli
  2004-10-03 19:21   ` Stefan
  2004-10-03 19:53   ` Stefan
@ 2004-10-04 15:19   ` Richard Stallman
  2004-10-05  9:56     ` Ralf Angeli
  2004-10-05 10:16     ` Ralf Angeli
  2 siblings, 2 replies; 13+ messages in thread
From: Richard Stallman @ 2004-10-04 15:19 UTC (permalink / raw)
  Cc: emacs-devel

    If this should be fixed for arbitrary LaTeX commands which are not
    fontified, one would have to check whether the underscores in question
    are located inside math environments.

\nolinkurl and \url are not related to math.  Is there an intermediate
possibility, one that doesn't check specifically for math
environments, but handles all the constructs that are not related to
math?

Meanwhile, why is texmathp so slow?

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-04 15:19   ` Richard Stallman
@ 2004-10-05  9:56     ` Ralf Angeli
  2004-10-05 10:16     ` Ralf Angeli
  1 sibling, 0 replies; 13+ messages in thread
From: Ralf Angeli @ 2004-10-05  9:56 UTC (permalink / raw)


* Richard Stallman (2004-10-04) writes:

>     If this should be fixed for arbitrary LaTeX commands which are not
>     fontified, one would have to check whether the underscores in question
>     are located inside math environments.
>
> \nolinkurl and \url are not related to math.  Is there an intermediate
> possibility, one that doesn't check specifically for math
> environments, but handles all the constructs that are not related to
> math?

None I know of.

Unescaped underscores can be used in a variety of places: in labels,
references, verbatim commands/environments, math commands/environments,
etc.  The current implementation searches for occurrences of unescaped
underscores in the buffer and checks for the presence of certain
faces.  If it finds e.g. `tex-verbatim-face', it knows that there is
no math-related content and skips this occurrence.  If there is no such
face, it assumes that the underscore is inside a math command or
environment and fontifies the stuff after the underscore.  This may
cause false fontifications.
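
To make the heuristic concrete, here is a minimal Python sketch of the
face check described above.  It is an illustration only, not the code in
tex-mode.el: faces are modeled as a plain list of (start, end, face)
spans, and the names `INHIBITING_FACES', `face_at' and
`subscript_positions' are hypothetical, as is the choice of faces in the
set.

```python
import re

# Hypothetical set of faces whose presence suppresses subscript display.
INHIBITING_FACES = {"tex-verbatim-face", "font-lock-constant-face"}

# An underscore not preceded by a backslash (a simplification of "unescaped").
UNESCAPED_UNDERSCORE = re.compile(r"(?<!\\)_")


def face_at(pos, spans):
    """Return the face covering position pos, or None if there is none."""
    for start, end, face in spans:
        if start <= pos < end:
            return face
    return None


def subscript_positions(text, spans):
    """Positions of underscores that the heuristic would treat as subscripts.

    An underscore is skipped only if it carries one of the inhibiting
    faces; every other underscore is assumed to be math content.
    """
    return [m.start() for m in UNESCAPED_UNDERSCORE.finditer(text)
            if face_at(m.start(), spans) not in INHIBITING_FACES]
```

With only the \verb span fontified, the underscore inside \url{...} is
still reported, which is the false fontification from the bug report.
Giving the \url argument an inhibiting face as well makes it disappear,
which corresponds to the quick fix of adding the command to the keywords.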

If the code wanted to check if the (unfontified) underscore at hand is
part of math content it would have to search for constructs starting
or ending math (that's what `texmathp' does).  So for each underscore
you could end up scanning a large part of the buffer.  A way out of
this could be to scan the buffer linearly from start to end and stop
only at math-related content for the fontification of subscripts and
superscripts.  (Of course, for JIT fontification one would have to
look upwards in the buffer when fontification starts.)
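
A rough Python sketch of such a linear scan, under simplifying
assumptions: it only knows the delimiters $...$, $$...$$, \(...\) and
\[...\], ignores math environments, comments and nesting, and always
starts at the beginning of the text rather than at a JIT chunk.  The
name `math_underscores' is hypothetical.

```python
import re

# Tokens of interest, tried in this order: explicit math delimiters,
# any other backslash-escaped character (skipped), dollar signs, underscores.
TOKEN = re.compile(r"\\\(|\\\)|\\\[|\\\]|\\.|\$\$|\$|_", re.DOTALL)


def math_underscores(text):
    """Return positions of unescaped underscores that lie inside math.

    The text is scanned once from start to end while a single flag
    tracks whether the scan is currently inside math.
    """
    in_math = False
    found = []
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok in (r"\(", r"\["):
            in_math = True
        elif tok in (r"\)", r"\]"):
            in_math = False
        elif tok in ("$", "$$"):
            in_math = not in_math
        elif tok == "_" and in_math:
            found.append(m.start())
        # Anything else is an escaped character such as \_ or \$: skip it.
    return found
```

The cost is one pass over the text regardless of how many underscores it
contains, instead of one backward search per underscore.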

> Meanwhile, why is texmathp so slow?

-- 
Ralf

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-04 15:19   ` Richard Stallman
  2004-10-05  9:56     ` Ralf Angeli
@ 2004-10-05 10:16     ` Ralf Angeli
  2004-10-06 17:10       ` Richard Stallman
  1 sibling, 1 reply; 13+ messages in thread
From: Ralf Angeli @ 2004-10-05 10:16 UTC (permalink / raw)


[My last mail was sent prematurely.  Here is the rest of the answer.]

* Richard Stallman (2004-10-04) writes:

> Meanwhile, why is texmathp so slow?

It can happen that `texmathp' has to `re-search-backward' through the
whole buffer.  We had a bug report in AUCTeX that supplied a file
demonstrating the problem in a highly exaggerated way, which led to
fontification eating up a large amount of CPU time.  The file
was about 10,000 lines long and repeatedly contained

--8<---------------cut here---------------start------------->8---
\newcommand{\mycontent}{\ensuremath{\lambda_{1}}}%
\begin{itemize}%
\item
  \begin{align*}%
    \left\{
      \begin{array}{l}
        b_{c} \text{some long line with $\mycontent$}
      \end{array}
    \right\}
  \end{align*}%
\end{itemize}%
--8<---------------cut here---------------end--------------->8---

`texmathp' tries to limit the backward search by finding empty lines
which may denote a paragraph separation.  In the file there were no
empty lines, so at the end, it had to search all the way up those
10,000 lines for math environments.

The search involves two passes: One for commands and one for
environments.  The closer match is selected.  Currently both passes
can be limited only by the empty-line heuristic.  One could probably
speed the process up by limiting one search by the result of the
other.
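
The idea of bounding the search can be sketched in Python as follows.
This is not how texmathp.el is written: the function names and the two
patterns are hypothetical stand-ins, closing delimiters are ignored, and
Python's forward `re.finditer' over a slice stands in for
`re-search-backward'.  The point is only the bounds: the empty-line
heuristic gives a floor, and the second pass starts no earlier than the
hit of the first.

```python
import re


def search_floor(text, pos):
    """Position just past the last empty line before pos, or 0 if none."""
    matches = list(re.finditer(r"\n[ \t]*\n", text[:pos]))
    return matches[-1].end() if matches else 0


def last_match(pattern, text, lo, hi):
    """Start of the last match of pattern within text[lo:hi], or None."""
    last = None
    for m in re.finditer(pattern, text[lo:hi]):
        last = lo + m.start()
    return last


def closest_math_start(text, pos):
    """Closest math-starting construct before pos, searched in two passes."""
    floor = search_floor(text, pos)
    # Pass 1: environments (a deliberately tiny, illustrative list).
    env = last_match(r"\\begin\{(?:align\*?|equation\*?)\}", text, floor, pos)
    # Pass 2: commands.  A command before the environment hit would lose
    # the "closer match" comparison anyway, so start from that hit.
    cmd_floor = env if env is not None else floor
    cmd = last_match(r"\\ensuremath\b", text, cmd_floor, pos)
    candidates = [p for p in (env, cmd) if p is not None]
    return max(candidates) if candidates else None
```

In a file without empty lines the floor stays at 0, so the first pass
still covers everything before point; only the second pass gets cheaper.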

-- 
Ralf

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-05 10:16     ` Ralf Angeli
@ 2004-10-06 17:10       ` Richard Stallman
  2004-10-07  9:08         ` Ralf Angeli
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2004-10-06 17:10 UTC (permalink / raw)
  Cc: emacs-devel

    `texmathp' tries to limit the backward search by finding empty lines
    which may denote a paragraph separation.

Are there any other constructs that are generally not used in math
mode, so finding one of them shows the answer at that point is no?

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-06 17:10       ` Richard Stallman
@ 2004-10-07  9:08         ` Ralf Angeli
  2004-10-08 16:05           ` Richard Stallman
  0 siblings, 1 reply; 13+ messages in thread
From: Ralf Angeli @ 2004-10-07  9:08 UTC (permalink / raw)
  Cc: emacs-devel

* Richard Stallman (2004-10-06) writes:

>     `texmathp' tries to limit the backward search by finding empty lines
>     which may denote a paragraph separation.
>
> Are there any other constructs that are generally not used in math
> mode, so finding one of them shows the answer at that point is no?

There is an infinite number of constructs which are not supposed to be
used in math, e.g. itemize environments or sectioning macros.  But I
haven't found a way to generalize those constructs.  And checking for
an arbitrary pick of such commands doesn't seem to be the best
solution.

-- 
Ralf

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-07  9:08         ` Ralf Angeli
@ 2004-10-08 16:05           ` Richard Stallman
  2004-10-08 16:33             ` Ralf Angeli
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2004-10-08 16:05 UTC (permalink / raw)
  Cc: emacs-devel

    There is an infinite number of constructs which are not supposed to be
    used in math, e.g. itemize environments or sectioning macros.  But I
    haven't found a way to generalize those constructs.  And checking for
    an arbitrary pick of such commands doesn't seem to be the best
    solution.

Since these are used only to save time, it is not a problem
if the list is incomplete.  Let's add some additional common ones,
and then the function texmathp will usually run faster.
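
As a sketch of what such a list could look like, here is a small Python
illustration.  The pattern `STOP' and the function
`bounded_search_start' are hypothetical, and the constructs listed are
an arbitrary pick: empty lines, a few sectioning macros, list
environments and \item.  Leaving a construct out only costs time, but
listing one that can legitimately appear inside math would give wrong
answers.

```python
import re

# Constructs assumed never to occur inside math.  Finding one while
# looking backward means the search need not go past it.
STOP = re.compile(
    r"\n[ \t]*\n"                                    # empty line
    r"|\\(?:chapter|section|subsection|paragraph)\b"  # sectioning macros
    r"|\\(?:begin|end)\{(?:itemize|enumerate|description)\}"
    r"|\\item\b"
)


def bounded_search_start(text, pos):
    """Earliest position a backward math search from pos has to visit.

    Returns the end of the last stop construct before pos, or 0 if the
    text before pos contains none.
    """
    start = 0
    for m in STOP.finditer(text, 0, pos):
        start = m.end()
    return start
```

Applied to a file built from the snippet quoted earlier in the thread,
the search from b_{c} would stop at the preceding \item instead of
running to the top of the buffer.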

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-08 16:05           ` Richard Stallman
@ 2004-10-08 16:33             ` Ralf Angeli
  2004-10-08 18:49               ` David Kastrup
  0 siblings, 1 reply; 13+ messages in thread
From: Ralf Angeli @ 2004-10-08 16:33 UTC (permalink / raw)
  Cc: emacs-devel

* Richard Stallman (2004-10-08) writes:

>     There is an infinite number of constructs which are not supposed to be
>     used in math, e.g. itemize environments or sectioning macros.  But I
>     haven't found a way to generalize those constructs.  And checking for
>     an arbitrary pick of such commands doesn't seem to be the best
>     solution.
>
> Since these are used only to save time, it is not a problem
> if the list is incomplete.  Let's add some additional common ones,
> and then the function texmathp will usually run faster.

I'll see what I can do.  Thank you for your help.

-- 
Ralf

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-08 16:33             ` Ralf Angeli
@ 2004-10-08 18:49               ` David Kastrup
  2004-10-09 15:43                 ` Richard Stallman
  0 siblings, 1 reply; 13+ messages in thread
From: David Kastrup @ 2004-10-08 18:49 UTC (permalink / raw)
  Cc: rms, emacs-devel

Ralf Angeli <angeli@iwi.uni-sb.de> writes:

> * Richard Stallman (2004-10-08) writes:
>
>>     There is an infinite number of constructs which are not supposed to be
>>     used in math, e.g. itemize environments or sectioning macros.  But I
>>     haven't found a way to generalize those constructs.  And checking for
>>     an arbitrary pick of such commands doesn't seem to be the best
>>     solution.
>>
>> Since these are used only to save time, it is not a problem
>> if the list is incomplete.  Let's add some additional common ones,
>> and then the function texmathp will usually run faster.
>
> I'll see what I can do.  Thank you for your help.

texmathp.el is in AUCTeX, and we have in there

;; texmathp.el -- Code to check if point is inside LaTeX math environment
;; Copyright (c) 1998 Carsten Dominik
;; Copyright (C) 2004 Free Software Foundation, Inc.
;; texmathp.el,v 1.28 1998/11/23 15:19:44 dominik Exp

Ok, here is the crazy thing.

In 2003, Carsten Dominik assigned "all right and title" in texmathp.el
to the FSF (for program Emacs), after having previously assigned all
past and future changes to it (also for program Emacs).  But
texmathp.el is not even part of Emacs, whether released or in CVS.  As
far as I can see, it is only distributed with AUCTeX at the moment.
Does that mean that we can change the copyright notice in AUCTeX to be
FSF only without Carsten having to sign for AUCTeX specifically?

Should tex-mode.el be made to rely on texmathp.el if that improves
things?  AUCTeX is slated for inclusion into Emacs a few years from
now, anyway.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: [cs-usenet@arcor.de: tex-mode: too many _ (underscores) interpreted as subscripts]
  2004-10-08 18:49               ` David Kastrup
@ 2004-10-09 15:43                 ` Richard Stallman
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Stallman @ 2004-10-09 15:43 UTC (permalink / raw)
  Cc: angeli, emacs-devel

    In 2003, Carsten Dominik assigned "all right and title" in texmathp.el
    to the FSF (for program Emacs), after having previously assigned all
    past and future changes to it (also for program Emacs).  But
    texmathp.el is not even part of Emacs, whether released or in CVS.  As
    far as I can see, it is only distributed with AUCTeX at the moment.
    Does that mean that we can change the copyright notice in AUCTeX to be
    FSF only without Carsten having to sign for AUCTeX specifically?

If he signed an assignment that clearly applies to texmathp.el
then we should change the copyright notice.

We could also put the file into Emacs now if that makes things
clearer.

    Should tex-mode.el be made to rely on texmathp.el if that improves
    things?

If we put it into Emacs, then yes, tex-mode.el can use it.
