bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
@ 2016-02-22  2:42 Keith David Bershatsky
  2016-02-22 16:06 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Keith David Bershatsky @ 2016-02-22  2:42 UTC (permalink / raw)
  To: 22763

A while back, I posed a question on emacs.stackexchange.com for a faster method to obtain `line-number-at-pos`.  A user by the name of Constantine came up with `(format-mode-line "%l")`, which is indeed much faster.  I think the draft code below may be just a tad faster, with the added feature of the POS argument.

http://emacs.stackexchange.com/questions/3821/a-faster-method-to-obtain-line-number-at-pos-in-large-buffers

A user named wasamasa posted a comment: "Be aware that this method will give you "??" for lines exceeding line-number-display-limit-width which is set to a value of 200 per default as I found out here."

And Stefan posted a comment:  "IIRC the result may also be unreliable if there have been modifications in the buffer since the last redisplay."

The thread has received 555 hits in the past year, and several have star-ed it and up-voted the question and answer.

The following is a draft in C based on the existing function `decode_mode_spec` that lets the user input the POS as an argument without the necessity to use `goto-char`.  I'm not sure if it resolves either of the comments above.  The draft is not meant to be a patch per se, because I'm not a programmer and am just beginning my tinkering quest into the language of C.  I have been using it in my own setup for about a week and I haven't seen any ill effects.  It can of course use some TLC by someone more knowledgeable than myself.

static const char *
internal_line_number_at_position (struct window *w, register int c, int field_width, Lisp_Object *string)
{
  Lisp_Object obj;
  struct frame *f = XFRAME (WINDOW_FRAME (w));
  char *decode_mode_spec_buf = f->decode_mode_spec_buffer;
  /* We are going to use f->decode_mode_spec_buffer as the buffer to
     produce strings from numerical values, so limit preposterously
     large values of FIELD_WIDTH to avoid overrunning the buffer's
     end.  The size of the buffer is enough for FRAME_MESSAGE_BUF_SIZE
     bytes plus the terminating null.  */
  int width = min (field_width, FRAME_MESSAGE_BUF_SIZE (f));
  struct buffer *b = current_buffer;
  obj = Qnil;
  *string = Qnil;
  switch (c)
    {
    case 'l':
      {
  ptrdiff_t startpos, startpos_byte, line, linepos, linepos_byte;
  ptrdiff_t topline, nlines, height;
  ptrdiff_t junk;
  /* %c and %l are ignored in `frame-title-format'.  */
  if (mode_line_target == MODE_LINE_TITLE)
    return "";
  startpos = marker_position (w->start);
  startpos_byte = marker_byte_position (w->start);
  height = WINDOW_TOTAL_LINES (w);
  /* If we decided that this buffer isn't suitable for line numbers,
     don't forget that too fast.  */
  if (w->base_line_pos == -1)
    goto no_value;
  /* If the buffer is very big, don't waste time.  */
  if (INTEGERP (Vline_number_display_limit)
      && BUF_ZV (b) - BUF_BEGV (b) > XINT (Vline_number_display_limit))
    {
      w->base_line_pos = 0;
      w->base_line_number = 0;
      goto no_value;
    }
  if (w->base_line_number > 0
      && w->base_line_pos > 0
      && w->base_line_pos <= startpos)
    {
      line = w->base_line_number;
      linepos = w->base_line_pos;
      linepos_byte = buf_charpos_to_bytepos (b, linepos);
    }
  else
    {
      line = 1;
      linepos = BUF_BEGV (b);
      linepos_byte = BUF_BEGV_BYTE (b);
    }
  /* Count lines from base line to window start position.  */
  nlines = display_count_lines (linepos_byte,
              startpos_byte,
              startpos, &junk);
  topline = nlines + line;
  /* Determine a new base line, if the old one is too close
     or too far away, or if we did not have one.
     "Too close" means it's plausible a scroll-down would
     go back past it.  */
  if (startpos == BUF_BEGV (b))
    {
      w->base_line_number = topline;
      w->base_line_pos = BUF_BEGV (b);
    }
  else if (nlines < height + 25 || nlines > height * 3 + 50
     || linepos == BUF_BEGV (b))
    {
      ptrdiff_t limit = BUF_BEGV (b);
      ptrdiff_t limit_byte = BUF_BEGV_BYTE (b);
      ptrdiff_t position;
      ptrdiff_t distance =
        (height * 2 + 30) * line_number_display_limit_width;
      if (startpos - distance > limit)
        {
    limit = startpos - distance;
    limit_byte = CHAR_TO_BYTE (limit);
        }
      nlines = display_count_lines (startpos_byte,
            limit_byte,
            - (height * 2 + 30),
            &position);
      /* If we couldn't find the lines we wanted within
         line_number_display_limit_width chars per line,
         give up on line numbers for this window.  */
      if (position == limit_byte && limit == startpos - distance)
        {
    w->base_line_pos = -1;
    w->base_line_number = 0;
    goto no_value;
        }
      w->base_line_number = topline - nlines;
      w->base_line_pos = BYTE_TO_CHAR (position);
    }
  /* Now count lines from the start pos to point.  */
  nlines = display_count_lines (startpos_byte,
              PT_BYTE, PT, &junk);
  /* Record that we did display the line number.  */
  line_number_displayed = true;
  /* Make the string to show.  */
  pint2str (decode_mode_spec_buf, width, topline + nlines);
  return decode_mode_spec_buf;
    no_value:
        {
    char *p = decode_mode_spec_buf;
    int pad = width - 2;
    while (pad-- > 0)
      *p++ = ' ';
    *p++ = '?';
    *p++ = '?';
    *p = '\0';
    return decode_mode_spec_buf;
  }
      }
      break;
    }
  if (STRINGP (obj))
    {
      *string = obj;
      return SSDATA (obj);
    }
  else
    return "";
}

DEFUN ("line-number-at-position", Fline_number_at_position, Sline_number_at_position, 1, 2, 0,
       doc: /* Return line number at position.  */)
  (Lisp_Object window, Lisp_Object pos)
{
  struct window *w = decode_live_window (window);
  Lisp_Object buf;
  struct buffer *b;
  buf = w->contents;
  CHECK_BUFFER (buf);
  b = XBUFFER (buf);
  struct buffer *old_buffer = NULL;
  Lisp_Object foo_string;
  const char *spec;
  EMACS_INT opoint;
  EMACS_INT posint;
  Lisp_Object value;
  if (b != current_buffer)
    {
      old_buffer = current_buffer;
      set_buffer_internal (b);
    }
  if (!NILP (pos))
    {
      opoint = PT;
      posint = XINT (pos);
      SET_PT (posint);
    }
  spec = internal_line_number_at_position (w, 'l', 0, &foo_string);
  if (!NILP (pos))
    SET_PT (opoint);
  if (old_buffer)
    set_buffer_internal (old_buffer);
  value = build_string (spec);
  value = Fstring_to_number (value, make_number (10));
  return value;
}


DEFSYM (Qline_number_at_position, "line-number-at-position");





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2016-02-22  2:42 bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position Keith David Bershatsky
@ 2016-02-22 16:06 ` Eli Zaretskii
  2021-02-07 15:07   ` Lars Ingebrigtsen
  2021-05-19 23:55 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-02-22 16:06 UTC (permalink / raw)
  To: Keith David Bershatsky; +Cc: 22763

> Date: Sun, 21 Feb 2016 18:42:52 -0800
> From: Keith David Bershatsky <esq@lawlist.com>
> 
> A while back, I posed a question on emacs.stackexchange.com for a faster method to obtain `line-number-at-pos`.  A user by the name of Constantine came up with `(format-mode-line "%l")`, which is indeed much faster.  I think the draft code below may be just a tad faster, with the added feature of the POS argument.
> 
> http://emacs.stackexchange.com/questions/3821/a-faster-method-to-obtain-line-number-at-pos-in-large-buffers
> 
> A user named wasamasa posted a comment: "Be aware that this method will give you "??" for lines exceeding line-number-display-limit-width which is set to a value of 200 per default as I found out here."
> 
> And Stefan posted a comment:  "IIRC the result may also be unreliable if there have been modifications in the buffer since the last redisplay."
> 
> The thread has received 555 hits in the past year, and several have star-ed it and up-voted the question and answer.
> 
> The following is a draft in C based on the existing function `decode_mode_spec` that lets the user input the POS as an argument without the necessity to use `goto-char`.  I'm not sure if it resolves either of the comments above.  The draft is not meant to be a patch per se, because I'm not a programmer and am just beginning my tinkering quest into the language of C.  I have been using it in my own setup for about a week and I haven't seen any ill effects.  It can of course use some TLC by someone more knowledgeable than myself.

This would overwrite the line number and position cached by each
window for redisplay purposes.  I think this is undesirable.  More
importantly, the gotchas pointed out above will still be true, AFAICT.

My suggestion is instead to expose display_count_lines to Lisp via a
simple wrapper (that would need to take care of narrowing and such
likes, something that your proposed API doesn't do, btw).  In my
limited testing I saw a 5- to 10-fold speedup as compared to
line-number-at-pos, when doing that.  I think this is good enough;
applications that need faster massive counting, if there are such
applications, can implement a cache in Lisp and speed this up even
more.

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2016-02-22 16:06 ` Eli Zaretskii
@ 2021-02-07 15:07   ` Lars Ingebrigtsen
  2021-02-07 15:44     ` Lars Ingebrigtsen
  2021-02-07 16:07     ` Lars Ingebrigtsen
  0 siblings, 2 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 15:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, Keith David Bershatsky

Eli Zaretskii <eliz@gnu.org> writes:

> My suggestion is instead to expose display_count_lines to Lisp via a
> simple wrapper (that would need to take care of narrowing and such
> likes, something that your proposed API doesn't do, btw). 

I've now added this to Emacs 28 -- line-number-at-position.  In my
tests, it makes `count-lines' about 2x faster than the previous
implementation.

Now, there's a function called `line-number-at-pos' in simple.el (that
I've amended to use `line-number-at-position', so it's even faster), and
it has almost exactly the signature that the new function has.  But it
has the POS parameter optional, and it has an ABSOLUTE parameter.

I didn't actually discover it before I had finished writing the new
function, so I think I'll just extend the new function to have the same
signature, and then remove the simple.el function.

Unless somebody else has a different opinion in the next ten minutes...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 15:07   ` Lars Ingebrigtsen
@ 2021-02-07 15:44     ` Lars Ingebrigtsen
  2021-02-07 16:07     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 15:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, Keith David Bershatsky

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Unless somebody else has a different opinion in the next ten minutes...

Nope.  And now all the tests actually pass, too, which I should have
checked before I pushed...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 15:07   ` Lars Ingebrigtsen
  2021-02-07 15:44     ` Lars Ingebrigtsen
@ 2021-02-07 16:07     ` Lars Ingebrigtsen
  2021-02-07 17:40       ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 16:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, Keith David Bershatsky

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I didn't actually discover it before I had finished writing the new
> function, so I think I'll just extend the new function to have the same
> signature, and then remove the simple.el function.

This takes 2.8s in Emacs 27 and 0.3s after the change:

(with-temp-buffer
  (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
  (benchmark-run 1000
    (line-number-at-pos (point-max))))

So that's nice.  `line-number-at-pos' is used all over Emacs, but
probably not in a loop a lot, so it probably doesn't make much
difference, though.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 16:07     ` Lars Ingebrigtsen
@ 2021-02-07 17:40       ` Stefan Monnier
  2021-02-07 17:45         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2021-02-07 17:40 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, Keith David Bershatsky

> This takes 2.8s in Emacs 27 and 0.3s after the change:
>
> (with-temp-buffer
>   (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>   (benchmark-run 1000
>     (line-number-at-pos (point-max))))
>
> So that's nice.  `line-number-at-pos' is used all over Emacs, but
> probably not in a loop a lot, so it probably doesn't make much
> difference, though.

Why is it faster?

Is it still always Θ(N) just with a smaller constant (if so, what makes
the constant smaller), or does it benefit from some kind of caching
(which I fail to see in the code) such that it's O(N) sometimes but much
faster other times (and if so, what are the cases that are sped up)?


        Stefan






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 17:40       ` Stefan Monnier
@ 2021-02-07 17:45         ` Lars Ingebrigtsen
  2021-02-07 18:07           ` Lars Ingebrigtsen
  2021-02-07 18:09           ` Eli Zaretskii
  0 siblings, 2 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 17:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 22763, Keith David Bershatsky

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Why is it faster?
>
> Is it still always Θ(N) just with a smaller constant (if so, what makes
> the constant smaller), or does it benefit from some kind of caching
> (which I fail to see in the code) such that it's O(N) sometimes but much
> faster other times (and if so, what are the cases that are sped up)?

There's no caching.  I guess find_newline is just slow compared to
display_count_lines?  (How many of these functions do we have in the C
layer, anyway?)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 17:45         ` Lars Ingebrigtsen
@ 2021-02-07 18:07           ` Lars Ingebrigtsen
  2021-02-07 18:09           ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 18:07 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 22763, Keith David Bershatsky

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> Why is it faster?
>>
>> Is it still always Θ(N) just with a smaller constant (if so, what makes
>> the constant smaller), or does it benefit from some kind of caching
>> (which I fail to see in the code) such that it's O(N) sometimes but much
>> faster other times (and if so, what are the cases that are sped up)?
>
> There's no caching.  I guess find_newline is just slow compared to
> display_count_lines?  (How many of these functions do we have in the C
> layer, anyway?)

Oh, I see that find_newline has a lot of caching going on, and the logic
isn't...  quite...  obvious.  I guess it's possible there may be cases
where the new implementation is slower, then?  I'm not sure how to
measure that, though -- it takes 10x more time to call the old
`line-number-at-pos' ten times, so if the function is supposed to cache
something, it's not doing that.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 17:45         ` Lars Ingebrigtsen
  2021-02-07 18:07           ` Lars Ingebrigtsen
@ 2021-02-07 18:09           ` Eli Zaretskii
  2021-02-07 18:14             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 18:09 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  22763@debbugs.gnu.org,  Keith David
>  Bershatsky <esq@lawlist.com>
> Date: Sun, 07 Feb 2021 18:45:51 +0100
> 
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
> > Why is it faster?
> >
> > Is it still always Θ(N) just with a smaller constant (if so, what makes
> > the constant smaller), or does it benefit from some kind of caching
> > (which I fail to see in the code) such that it's O(N) sometimes but much
> > faster other times (and if so, what are the cases that are sped up)?
> 
> There's no caching.  I guess find_newline is just slow compared to
> display_count_lines?

find_newline does the same as display_count_lines: it calls memchr.
But it also maintains a newline cache.  If you disable that cache (by
turning of cache-long-scans), you might see a different speedup.

Also, calling forward-line would loop in Lisp, not in C.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 18:09           ` Eli Zaretskii
@ 2021-02-07 18:14             ` Lars Ingebrigtsen
  2021-02-07 18:23               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 18:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> There's no caching.  I guess find_newline is just slow compared to
>> display_count_lines?
>
> find_newline does the same as display_count_lines: it calls memchr.
> But it also maintains a newline cache.  If you disable that cache (by
> turning of cache-long-scans), you might see a different speedup.

Shouldn't the cache speed things up?

Hm...  perhaps I'm not getting a newline cache in my tests because I'm
using a temp buffer or something?

> Also, calling forward-line would loop in Lisp, not in C.

No, it cleverly loops in C by calling `forward-line' this way to count
lines:

(- (buffer-size) (forward-line (buffer-size)))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 18:14             ` Lars Ingebrigtsen
@ 2021-02-07 18:23               ` Lars Ingebrigtsen
  2021-02-07 19:02                 ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 18:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Lars Ingebrigtsen <larsi@gnus.org> writes:

>> find_newline does the same as display_count_lines: it calls memchr.
>> But it also maintains a newline cache.  If you disable that cache (by
>> turning of cache-long-scans), you might see a different speedup.
>
> Shouldn't the cache speed things up?

No, you're right -- if I set `cache-long-scans' to nil and
`line-number-at-pos' in Emacs 27 becomes ~8x faster in my test, which is
currently

(with-temp-buffer
  (setq cache-long-scans nil)
  (dotimes (_ 1000)
    (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
    (goto-char (point-max)))
  (benchmark-run 1
    (dotimes (i 1000)
      (goto-char (/ (buffer-size) 1000))
      (line-number-at-pos (point)))))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 18:23               ` Lars Ingebrigtsen
@ 2021-02-07 19:02                 ` Eli Zaretskii
  2021-02-07 19:06                   ` Eli Zaretskii
  2021-02-07 19:25                   ` Lars Ingebrigtsen
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 19:02 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22763@debbugs.gnu.org,  esq@lawlist.com,  monnier@iro.umontreal.ca
> Date: Sun, 07 Feb 2021 19:23:53 +0100
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> >> find_newline does the same as display_count_lines: it calls memchr.
> >> But it also maintains a newline cache.  If you disable that cache (by
> >> turning of cache-long-scans), you might see a different speedup.
> >
> > Shouldn't the cache speed things up?
> 
> No, you're right -- if I set `cache-long-scans' to nil and
> `line-number-at-pos' in Emacs 27 becomes ~8x faster in my test, which is
> currently
> 
> (with-temp-buffer
>   (setq cache-long-scans nil)
>   (dotimes (_ 1000)
>     (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>     (goto-char (point-max)))
>   (benchmark-run 1
>     (dotimes (i 1000)
>       (goto-char (/ (buffer-size) 1000))
>       (line-number-at-pos (point)))))

This benchmark is not very fair.  For starters, 1 thousands of
ChangeLog.11's size is on line 31 of the file, so you have just 31
lines to count.  And those lines are quite short.

Try counting a much larger number of lines, and make the lines
longer.  Then you may see different results.

It is also interesting to compare the first iterations with all the
rest, when the newlines are already cached.

But in general, the raw speed of memchr is very hard to beat,
especially given that using the cache requires calls to CHAR_TO_BYTE
and BYTE_TO_CHAR, which can be expensive.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:02                 ` Eli Zaretskii
@ 2021-02-07 19:06                   ` Eli Zaretskii
  2021-02-07 19:25                   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 19:06 UTC (permalink / raw)
  To: larsi; +Cc: 22763, esq, monnier

> Date: Sun, 07 Feb 2021 21:02:25 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 22763@debbugs.gnu.org, esq@lawlist.com, monnier@iro.umontreal.ca
> 
> This benchmark is not very fair.  For starters, 1 thousands of
> ChangeLog.11's size is on line 31 of the file, so you have just 31
> lines to count.  And those lines are quite short.

Oops, I now see that you insert the file 1000 times, so it's 31
thousand lines.  Still, not a very large number, and the lines are
very short.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:02                 ` Eli Zaretskii
  2021-02-07 19:06                   ` Eli Zaretskii
@ 2021-02-07 19:25                   ` Lars Ingebrigtsen
  2021-02-07 19:34                     ` Lars Ingebrigtsen
                                       ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 19:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> (with-temp-buffer
>>   (setq cache-long-scans nil)
>>   (dotimes (_ 1000)
>>     (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>>     (goto-char (point-max)))
>>   (benchmark-run 1
>>     (dotimes (i 1000)
>>       (goto-char (/ (buffer-size) 1000))
>>       (line-number-at-pos (point)))))
>
> This benchmark is not very fair.  For starters, 1 thousands of
> ChangeLog.11's size is on line 31 of the file, so you have just 31
> lines to count.  And those lines are quite short.

Oops, that's not what I meant to test...  I meant to skip to 1000 places
in the buffer, so that's:

(with-temp-buffer
  (dotimes (_ 1000)
    (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
    (goto-char (point-max)))
  (benchmark-run 1
    (dotimes (i 100)
      (goto-char (* (/ (buffer-size) 100) i))
      (line-number-at-pos (point)))))

(Adjusted down to 100, because it takes too long.)  Let's see...

Yup, still 10x faster.

> Try counting a much larger number of lines, and make the lines
> longer.  Then you may see different results.
>
> It is also interesting to compare the first iterations with all the
> rest, when the newlines are already cached.

OK, I've now bumped the benchmark-run to 10 (and decreased the buffer
size by a factor of 10)...  let's see...  The new version takes exactly
the same amount of time, of course...

And so does the old one.  Well, it's 10% faster in this?

(with-temp-buffer
  (dotimes (_ 100)
    (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
    (goto-char (point-max)))
  (benchmark-run 10
    (dotimes (i 100)
      (goto-char (* (/ (buffer-size) 100) i))
      (line-number-at-pos (point)))))

Hm.  I guess this doesn't update the newline cache in any useful way?
Is that a bug?  I haven't actually read the code in the function
closely.

> But in general, the raw speed of memchr is very hard to beat,
> especially given that using the cache requires calls to CHAR_TO_BYTE
> and BYTE_TO_CHAR, which can be expensive.

So ... it's using the cache is only faster when we have monumentally
long lines, since memchr is so fast?  And in buffers with lines with
normal line lengths, it's 10x slower?  That sounds like a rather drastic
trade-off.

I'll try to whip up a benchmark with really long lines and see what
happens.

Eli Zaretskii <eliz@gnu.org> writes:

> Oops, I now see that you insert the file 1000 times, so it's 31
> thousand lines.  Still, not a very large number, and the lines are
> very short.

Yes, short lines, but the ChangeLog file is pretty typical...  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:25                   ` Lars Ingebrigtsen
@ 2021-02-07 19:34                     ` Lars Ingebrigtsen
  2021-02-07 19:43                       ` Eli Zaretskii
  2021-02-07 19:42                     ` Eli Zaretskii
  2021-02-07 20:37                     ` Stefan Monnier
  2 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 19:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I'll try to whip up a benchmark with really long lines and see what
> happens.

(with-temp-buffer
  (dotimes (_ 10000)
    (insert (make-string (random 4000) ?a) "\n"))
  (benchmark-run 10
    (goto-char (point-min))
    (while (not (eobp))
      (line-number-at-pos (point))
      (forward-line 1))))

With this, the new version still wins...  but only slightly.  So the
cache does indeed help when we've got long lines (which was the point of
the cache, I guess?)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:25                   ` Lars Ingebrigtsen
  2021-02-07 19:34                     ` Lars Ingebrigtsen
@ 2021-02-07 19:42                     ` Eli Zaretskii
  2021-02-07 19:46                       ` Lars Ingebrigtsen
  2021-02-07 20:37                     ` Stefan Monnier
  2 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 19:42 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22763@debbugs.gnu.org,  esq@lawlist.com,  monnier@iro.umontreal.ca
> Date: Sun, 07 Feb 2021 20:25:45 +0100
> 
> (with-temp-buffer
>   (dotimes (_ 1000)
>     (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>     (goto-char (point-max)))
>   (benchmark-run 1
>     (dotimes (i 100)
>       (goto-char (* (/ (buffer-size) 100) i))
>       (line-number-at-pos (point)))))
> 
> (Adjusted down to 100, because it takes too long.)  Let's see...
> 
> Yup, still 10x faster.

This one traverses each 1/100th region of the file just once, no?

> OK, I've now bumped the benchmark-run to 10 (and decreased the buffer
> size by a factor of 10)...  let's see...  The new version takes exactly
> the same amount of time, of course...
> 
> And so does the old one.  Well, it's 10% faster in this?

10% or 10-fold?

> (with-temp-buffer
>   (dotimes (_ 100)
>     (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>     (goto-char (point-max)))
>   (benchmark-run 10
>     (dotimes (i 100)
>       (goto-char (* (/ (buffer-size) 100) i))
>       (line-number-at-pos (point)))))
> 
> Hm.  I guess this doesn't update the newline cache in any useful way?

Why not?  It should.

> > But in general, the raw speed of memchr is very hard to beat,
> > especially given that using the cache requires calls to CHAR_TO_BYTE
> > and BYTE_TO_CHAR, which can be expensive.
> 
> So ... it's using the cache is only faster when we have monumentally
> long lines, since memchr is so fast?

Yes.

> And in buffers with lines with normal line lengths, it's 10x slower?

In my benchmarks some years ago it was about twice slower, not 10
times.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:34                     ` Lars Ingebrigtsen
@ 2021-02-07 19:43                       ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 19:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22763@debbugs.gnu.org,  esq@lawlist.com,  monnier@iro.umontreal.ca
> Date: Sun, 07 Feb 2021 20:34:56 +0100
> 
> (with-temp-buffer
>   (dotimes (_ 10000)
>     (insert (make-string (random 4000) ?a) "\n"))
>   (benchmark-run 10
>     (goto-char (point-min))
>     (while (not (eobp))
>       (line-number-at-pos (point))
>       (forward-line 1))))
> 
> With this, the new version still wins...  but only slightly.  So the
> cache does indeed help when we've got long lines (which was the point of
> the cache, I guess?)

Yes.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:42                     ` Eli Zaretskii
@ 2021-02-07 19:46                       ` Lars Ingebrigtsen
  2021-02-07 19:52                         ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 19:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> (with-temp-buffer
>>   (dotimes (_ 1000)
>>     (insert-file-contents "~/src/emacs/trunk/src/ChangeLog.11")
>>     (goto-char (point-max)))
>>   (benchmark-run 1
>>     (dotimes (i 100)
>>       (goto-char (* (/ (buffer-size) 100) i))
>>       (line-number-at-pos (point)))))
>> 
>> (Adjusted down to 100, because it takes too long.)  Let's see...
>> 
>> Yup, still 10x faster.
>
> This one traverses each 1/100th region of the file just once, no?

Did I write it wrong again?

    (dotimes (i 100)
      (goto-char (* (/ (buffer-size) 100) i))
      (line-number-at-pos (point)))))

No, that should be the entire buffer, spread out evenly?

>> OK, I've now bumped the benchmark-run to 10 (and decreased the buffer
>> size by a factor of 10)...  let's see...  The new version takes exactly
>> the same amount of time, of course...
>> 
>> And so does the old one.  Well, it's 10% faster in this?
>
> 10% or 10-fold?

10%.

>> And in buffers with lines with normal line lengths, it's 10x slower?
>
> In my benchmarks some years ago it was about twice slower, not 10
> times.

Perhaps memchr has gotten faster over the years?  Using larger memory
fetches and stuff?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:46                       ` Lars Ingebrigtsen
@ 2021-02-07 19:52                         ` Eli Zaretskii
  2021-02-07 21:52                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 19:52 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22763@debbugs.gnu.org,  esq@lawlist.com,  monnier@iro.umontreal.ca
> Date: Sun, 07 Feb 2021 20:46:37 +0100
> 
> > In my benchmarks some years ago it was about twice slower, not 10
> > times.
> 
> Perhaps memchr has gotten faster over the years?

No, it was fast then as well (due to inlining, AFAIR).

I think the main factor is the file used to benchmark this stuff.  I
don't remember what I used.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:25                   ` Lars Ingebrigtsen
  2021-02-07 19:34                     ` Lars Ingebrigtsen
  2021-02-07 19:42                     ` Eli Zaretskii
@ 2021-02-07 20:37                     ` Stefan Monnier
  2021-02-07 20:42                       ` Lars Ingebrigtsen
  2 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2021-02-07 20:37 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq

> So ... it's using the cache is only faster when we have monumentally
> long lines, since memchr is so fast?  And in buffers with lines with
> normal line lengths, it's 10x slower?  That sounds like a rather drastic
> trade-off.

Yes, the problem is when you have pathologically-long lines (typically,
when the whole file has virtually no \n in it, so lines can reach
multi-MB lengths).


        Stefan






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 20:37                     ` Stefan Monnier
@ 2021-02-07 20:42                       ` Lars Ingebrigtsen
  2021-02-07 20:50                         ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 20:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 22763, esq

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> So ... it's using the cache is only faster when we have monumentally
>> long lines, since memchr is so fast?  And in buffers with lines with
>> normal line lengths, it's 10x slower?  That sounds like a rather drastic
>> trade-off.
>
> Yes, the problem is when you have pathologically-long lines (typically,
> when the whole file has virtually no \n in it, so lines can reach
> multi-MB lengths).

Right, and in that case a cached version would definitely be faster on
repeated executions.  Why does display_count_lines (with no cache)
exist, then?  Historical reasons?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 20:42                       ` Lars Ingebrigtsen
@ 2021-02-07 20:50                         ` Eli Zaretskii
  2021-02-07 21:36                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-07 20:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  22763@debbugs.gnu.org,  esq@lawlist.com
> Date: Sun, 07 Feb 2021 21:42:30 +0100
> 
> Why does display_count_lines (with no cache) exist, then?

It exists to support the line-number display on the mode line.  That
display has its own cache, as part of the window object, so
display_count_lines very rarely needs to count from the beginning of
the buffer, it usually counts from the last place it stopped the
previous time for the same window.  This is why it has the signature
that it has.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 20:50                         ` Eli Zaretskii
@ 2021-02-07 21:36                           ` Lars Ingebrigtsen
  2021-02-08 15:04                             ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 21:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  22763@debbugs.gnu.org,  esq@lawlist.com
>> Date: Sun, 07 Feb 2021 21:42:30 +0100
>> 
>> Why does display_count_lines (with no cache) exist, then?
>
> It exists to support the line-number display on the mode line.  That
> display has its own cache, as part of the window object, so
> display_count_lines very rarely needs to count from the beginning of
> the buffer, it usually counts from the last place it stopped the
> previous time for the same window.  This is why it has the signature
> that it has.

But these are the signatures:

ptrdiff_t
find_newline (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
	      ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *counted,
	      ptrdiff_t *bytepos, bool allow_quit)

static ptrdiff_t
display_count_lines (ptrdiff_t start_byte,
		     ptrdiff_t limit_byte, ptrdiff_t count,
		     ptrdiff_t *byte_pos_ptr)

So they seem very similar...

Anyway, here's something that just occurred to me: It's still the plan
to have so-long-mode on by default, right?  Which means that
'buffer-line-statistics' will be called when opening a file, which means
that we know whether there are any long lines in the buffer.

Could we use this info to switch between cached and non-cached action
for find_newline?  (I.e., just set 'cache-long-scans'.)

Conversely, could we use find_newline to trigger so-long-mode?  Today,
so-long-mode isn't able to step into the fray when something plops a
long line into the buffer (in shell-mode, for instance).  If
find_newline finds a long line, it could switch 'cache-long-scans' on,
and also (on perhaps a different threshold) notify so-long-mode?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 19:52                         ` Eli Zaretskii
@ 2021-02-07 21:52                           ` Lars Ingebrigtsen
  2021-02-07 21:58                             ` Lars Ingebrigtsen
  2021-02-07 22:09                             ` Philipp
  0 siblings, 2 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 21:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> Perhaps memchr has gotten faster over the years?
>
> No, it was fast then as well (due to inlining, AFAIR).
>
> I think the main factor is the file used to benchmark this stuff.  I
> don't remember what I used.

I was curious as to how memchr is implemented these days:

https://code.woboq.org/userspace/glibc/string/memchr.c.html

It seems it tests one longword_ptr at a time?  So that's 8 bytes on
64-bit CPUs, I think.  Doesn't look like any SSE/AVX support, though.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 21:52                           ` Lars Ingebrigtsen
@ 2021-02-07 21:58                             ` Lars Ingebrigtsen
  2021-02-08  3:34                               ` Eli Zaretskii
  2021-02-07 22:09                             ` Philipp
  1 sibling, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-07 21:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22763, esq, monnier

Lars Ingebrigtsen <larsi@gnus.org> writes:

> It seems it tests one longword_ptr at a time?  So that's 8 bytes on
> 64-bit CPUs, I think.  Doesn't look like any SSE/AVX support, though.

Never mind -- there are even more optimised versions using SSE2 etc,
too:

https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/memchr.S

I've done some further benchmarking -- with a perhaps more typical
file than the ChangeLog:

(with-temp-buffer
  (dotimes (_ 100)
    (insert-file-contents "~/src/emacs/trunk/lisp/printing.el")
    (goto-char (point-max)))
  (benchmark-run 10
    (dotimes (i 100)
      (goto-char (* (/ (buffer-size) 100) i))
      (line-number-at-pos (point)))))

The cached version is only 5x slower than the non-cached one.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 21:52                           ` Lars Ingebrigtsen
  2021-02-07 21:58                             ` Lars Ingebrigtsen
@ 2021-02-07 22:09                             ` Philipp
  1 sibling, 0 replies; 40+ messages in thread
From: Philipp @ 2021-02-07 22:09 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier



> Am 07.02.2021 um 22:52 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> Perhaps memchr has gotten faster over the years?
>> 
>> No, it was fast then as well (due to inlining, AFAIR).
>> 
>> I think the main factor is the file used to benchmark this stuff.  I
>> don't remember what I used.
> 
> I was curious as to how memchr is implemented these days:
> 
> https://code.woboq.org/userspace/glibc/string/memchr.c.html
> 
> It seems it tests one longword_ptr at a time?  So that's 8 bytes on
> 64-bit CPUs, I think.  Doesn't look like any SSE/AVX support, though.

I think that’s only the unoptimized generic version, there are architecture-specific versions like https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/memchr.S;h=beff2708de6a1e40de4141f94ff6fe763f041164;hb=HEAD.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 21:58                             ` Lars Ingebrigtsen
@ 2021-02-08  3:34                               ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-08  3:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22763@debbugs.gnu.org,  esq@lawlist.com,  monnier@iro.umontreal.ca
> Date: Sun, 07 Feb 2021 22:58:12 +0100
> 
> Never mind -- there are even more optimised versions using SSE2 etc,
> too:
> 
> https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/memchr.S

You need to look at the code emitted by GCC under -O2.  glibc is not
necessarily relevant here.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-07 21:36                           ` Lars Ingebrigtsen
@ 2021-02-08 15:04                             ` Eli Zaretskii
  2021-02-09  2:17                               ` Katsumi Yamaoka
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-02-08 15:04 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 22763, esq, monnier

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: monnier@iro.umontreal.ca,  22763@debbugs.gnu.org,  esq@lawlist.com
> Date: Sun, 07 Feb 2021 22:36:12 +0100
> 
> > It exists to support the line-number display on the mode line.  That
> > display has its own cache, as part of the window object, so
> > display_count_lines very rarely needs to count from the beginning of
> > the buffer, it usually counts from the last place it stopped the
> > previous time for the same window.  This is why it has the signature
> > that it has.
> 
> But these are the signatures:
> 
> ptrdiff_t
> find_newline (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
> 	      ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *counted,
> 	      ptrdiff_t *bytepos, bool allow_quit)
> 
> 
> static ptrdiff_t
> display_count_lines (ptrdiff_t start_byte,
> 		     ptrdiff_t limit_byte, ptrdiff_t count,
> 		     ptrdiff_t *byte_pos_ptr)
> 
> So they seem very similar...

I was trying to explain why display_count_lines accepts arguments that
may seem unnecessary at first sight if the job is to just count the
lines.

> Anyway, here's something that just occurred to me: It's still the plan
> to have so-long-mode on by default, right?  Which means that
> 'buffer-line-statistics' will be called when opening a file, which means
> that we know whether there are any long lines in the buffer.
> 
> Could we use this info to switch between cached and non-cached action
> for find_newline?  (I.e., just set 'cache-long-scans'.)

It isn't as easy as it may sound, because cache-long-scans also
affects another cache, the one used by bidi.c.  We need to be able to
turn them on and off separately to be able to support what you
suggest.

> Conversely, could we use find_newline to trigger so-long-mode?  Today,
> so-long-mode isn't able to step into the fray when something plops a
> long line into the buffer (in shell-mode, for instance).  If
> find_newline finds a long line, it could switch 'cache-long-scans' on,
> and also (on perhaps a different threshold) notify so-long-mode?

We already detect overlong lines: that's when the line number on the
mode line becomes "??".  We just need to use that as trigger for
so-long-mode, if we want it to turn on automatically.  Also, we need
to consider whether the same threshold as the one we use for the
mode-line display of line numbers is suitable for so-long-mode, or we
need two separate thresholds.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-08 15:04                             ` Eli Zaretskii
@ 2021-02-09  2:17                               ` Katsumi Yamaoka
  2021-02-09  7:13                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2021-02-09  2:17 UTC (permalink / raw)
  To: 22763; +Cc: larsi, esq, monnier

Hi,

`gnus-cite-parse' got to fail on Gnus articles containing non-ASCII
text.  It is because `count-lines', that uses `line-number-at-pos'
now, fails to count text lines correctly as follows:

(with-temp-buffer
  (insert "あ\nい\nう\nえ\nお\n")
  (count-lines (point) (point)))
 => 3

(save-window-excursion
  (view-hello-file)
  (goto-char (point-max))
  (count-lines (point) (point)))
 => 41

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-02-09  2:17                               ` Katsumi Yamaoka
@ 2021-02-09  7:13                                 ` Lars Ingebrigtsen
  0 siblings, 0 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-09  7:13 UTC (permalink / raw)
  To: Katsumi Yamaoka; +Cc: 22763, esq, monnier

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> `gnus-cite-parse' got to fail on Gnus articles containing non-ASCII
> text.  It is because `count-lines', that uses `line-number-at-pos'
> now, fails to count text lines correctly as follows:
>
> (with-temp-buffer
>   (insert "あ\nい\nう\nえ\nお\n")
>   (count-lines (point) (point)))
>  => 3

Ooops.  Looks like I fumbled the char/byte thing in 56e76f0eb.  This
should now be fixed.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2016-02-22  2:42 bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position Keith David Bershatsky
  2016-02-22 16:06 ` Eli Zaretskii
@ 2021-05-19 23:55 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-20  6:45   ` Eli Zaretskii
  2021-05-20 19:53 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-20 20:40 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  3 siblings, 1 reply; 40+ messages in thread
From: Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-19 23:55 UTC (permalink / raw)
  To: 22763@debbugs.gnu.org

For this section in the line-number-at-pos code:

/* Check that POSITION is n the visible range of the buffer. */
if (pos < BEGV || pos > ZV)
  args_out_of_range (make_int (start), make_int (ZV));

Shouldn't the lower bound condition be pos < start instead of pos < BEGV?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-19 23:55 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20  6:45   ` Eli Zaretskii
  2021-05-20  7:27     ` Andreas Schwab
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2021-05-20  6:45 UTC (permalink / raw)
  To: Ben Levy; +Cc: 22763

> Date: Wed, 19 May 2021 23:55:27 +0000
> From:  Ben Levy via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> For this section in the line-number-at-pos code:
> 
> /* Check that POSITION is n the visible range of the buffer. */
> if (pos < BEGV || pos > ZV)
>   args_out_of_range (make_int (start), make_int (ZV));
> 
> Shouldn't the lower bound condition be pos < start instead of pos < BEGV?

No, because 'start' is the _byte_ position, whereas 'pos' is the
character position.  So if 'start's value is BEGV_BYTE, the test above
already made sure 'pos' is correct, because it makes the equivalent
text of the character position.  And if 'start' is BEG_BYTE, then
'pos' cannot possibly be less than the beginning of the buffer,
because 'pos >= BEGV_BYTE' is a more strict condition.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20  6:45   ` Eli Zaretskii
@ 2021-05-20  7:27     ` Andreas Schwab
  2021-05-20  7:35       ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-20  9:03       ` Eli Zaretskii
  0 siblings, 2 replies; 40+ messages in thread
From: Andreas Schwab @ 2021-05-20  7:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ben Levy, 22763

On Mai 20 2021, Eli Zaretskii wrote:

>> Date: Wed, 19 May 2021 23:55:27 +0000
>> From:  Ben Levy via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> For this section in the line-number-at-pos code:
>> 
>> /* Check that POSITION is n the visible range of the buffer. */
>> if (pos < BEGV || pos > ZV)
>>   args_out_of_range (make_int (start), make_int (ZV));
>> 
>> Shouldn't the lower bound condition be pos < start instead of pos < BEGV?
>
> No, because 'start' is the _byte_ position,

But it should not be passed to args_out_of_range then.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20  7:27     ` Andreas Schwab
@ 2021-05-20  7:35       ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-20  9:08         ` Eli Zaretskii
  2021-05-20  9:03       ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-20  7:35 UTC (permalink / raw)
  To: 22763@debbugs.gnu.org

> No, because 'start' is the byte position.

Oh, I was confused, sorry. In that case, I think the issue I'm having
is that it still compares to BEGV even when absolute is non-nil.
I think it should compare to BEG instead.

> But it should not be passed to args_out_of_range then.

Also this, and I think the comment has a typo ("n" should be "in"?).

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20  7:27     ` Andreas Schwab
  2021-05-20  7:35       ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20  9:03       ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-05-20  9:03 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: blevy, 22763

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Ben Levy <blevy@protonmail.com>,  22763@debbugs.gnu.org
> Date: Thu, 20 May 2021 09:27:34 +0200
> 
> >> if (pos < BEGV || pos > ZV)
> >>   args_out_of_range (make_int (start), make_int (ZV));
> >> 
> >> Shouldn't the lower bound condition be pos < start instead of pos < BEGV?
> >
> > No, because 'start' is the _byte_ position,
> 
> But it should not be passed to args_out_of_range then.

Right you are; fixed.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20  7:35       ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20  9:08         ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-05-20  9:08 UTC (permalink / raw)
  To: Ben Levy; +Cc: 22763

> Date: Thu, 20 May 2021 07:35:21 +0000
> From:  Ben Levy via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> > No, because 'start' is the byte position.
> 
> Oh, I was confused, sorry. In that case, I think the issue I'm having
> is that it still compares to BEGV even when absolute is non-nil.
> I think it should compare to BEG instead.

No, because POSITION should still be in the accessible portion of the
buffer.  The function only counts the lines since from BOB, but that
doesn't imply POSITION can be outside BEGV..ZV.

> Also this, and I think the comment has a typo ("n" should be "in"?).

Yes, already fixed.

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2016-02-22  2:42 bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position Keith David Bershatsky
  2016-02-22 16:06 ` Eli Zaretskii
  2021-05-19 23:55 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20 19:53 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-20 20:09   ` Eli Zaretskii
  2021-05-20 20:40 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  3 siblings, 1 reply; 40+ messages in thread
From: Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-20 19:53 UTC (permalink / raw)
  To: 22763@debbugs.gnu.org

I'm not 100% sure about this, but it looks like when
line-number-at-pos was written in elisp, it didn't do
any bounds check, and instead called (widen) before
counting lines.  Did the elisp version have unintended
behavior, or are the differences with the C version
a regression?

I think it's causing this bug:
https://github.com/io12/good-scroll.el/issues/16

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20 19:53 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20 20:09   ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-05-20 20:09 UTC (permalink / raw)
  To: Ben Levy; +Cc: 22763

> Date: Thu, 20 May 2021 19:53:24 +0000
> From:  Ben Levy via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I'm not 100% sure about this, but it looks like when
> line-number-at-pos was written in elisp, it didn't do
> any bounds check, and instead called (widen) before
> counting lines.  Did the elisp version have unintended
> behavior, or are the differences with the C version
> a regression?

I cannot answer that, because I don't understand what regression are
you talking about.

There's no need to widen the buffer in order to scan buffer text in C,
AFAIK.

> I think it's causing this bug:
> https://github.com/io12/good-scroll.el/issues/16

Please explain the connection.  I've read that discussion, but didn't
see any data that would indicate this function is the culprit.  At the
very least, please tell which arguments are passed to
line-number-at-pos in the problematic case, and what is the narrowed
region.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2016-02-22  2:42 bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position Keith David Bershatsky
                   ` (2 preceding siblings ...)
  2021-05-20 19:53 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-20 20:40 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-21  5:46   ` Eli Zaretskii
  3 siblings, 1 reply; 40+ messages in thread
From: Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-20 20:40 UTC (permalink / raw)
  To: 22763@debbugs.gnu.org

> Please explain the connection.

Here's a minimal example (sorry I didn't try to do this earlier):

(with-temp-buffer
  (insert "foo\nbar\nbaz")
  (narrow-to-region 4 5)
  (message "%s" (line-number-at-pos 1 t)))

On the latest master (ef7a6eec20), this errors with

Args out of range: 1, 4, 5

But on emacs-27.2, it prints "1".

From what I can tell, this is because the lisp version of line-number-at-pos
allowed the argument to be outside the visible range, and it widened before
counting lines to allow this.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position.
  2021-05-20 20:40 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-21  5:46   ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2021-05-21  5:46 UTC (permalink / raw)
  To: Ben Levy; +Cc: 22763

> Date: Thu, 20 May 2021 20:40:31 +0000
> From:  Ben Levy via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> (with-temp-buffer
>   (insert "foo\nbar\nbaz")
>   (narrow-to-region 4 5)
>   (message "%s" (line-number-at-pos 1 t)))
> 
> On the latest master (ef7a6eec20), this errors with
> 
> Args out of range: 1, 4, 5
> 
> But on emacs-27.2, it prints "1".

This is the expected behavior: the POSITION argument must be in the
accessible portion of the buffer, like every position argument in
Emacs.  A Lisp program which calls this function like above has a bug
that needs to be fixed.

This is not a bug.

> >From what I can tell, this is because the lisp version of line-number-at-pos
> allowed the argument to be outside the visible range, and it widened before
> counting lines to allow this.

Right, the Lisp implementation was incorrect, and failed to do the
test before widening.





^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-05-21  5:46 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22  2:42 bug#22763: 25.1.50; Feature Request -- A faster method to obtain line number at position Keith David Bershatsky
2016-02-22 16:06 ` Eli Zaretskii
2021-02-07 15:07   ` Lars Ingebrigtsen
2021-02-07 15:44     ` Lars Ingebrigtsen
2021-02-07 16:07     ` Lars Ingebrigtsen
2021-02-07 17:40       ` Stefan Monnier
2021-02-07 17:45         ` Lars Ingebrigtsen
2021-02-07 18:07           ` Lars Ingebrigtsen
2021-02-07 18:09           ` Eli Zaretskii
2021-02-07 18:14             ` Lars Ingebrigtsen
2021-02-07 18:23               ` Lars Ingebrigtsen
2021-02-07 19:02                 ` Eli Zaretskii
2021-02-07 19:06                   ` Eli Zaretskii
2021-02-07 19:25                   ` Lars Ingebrigtsen
2021-02-07 19:34                     ` Lars Ingebrigtsen
2021-02-07 19:43                       ` Eli Zaretskii
2021-02-07 19:42                     ` Eli Zaretskii
2021-02-07 19:46                       ` Lars Ingebrigtsen
2021-02-07 19:52                         ` Eli Zaretskii
2021-02-07 21:52                           ` Lars Ingebrigtsen
2021-02-07 21:58                             ` Lars Ingebrigtsen
2021-02-08  3:34                               ` Eli Zaretskii
2021-02-07 22:09                             ` Philipp
2021-02-07 20:37                     ` Stefan Monnier
2021-02-07 20:42                       ` Lars Ingebrigtsen
2021-02-07 20:50                         ` Eli Zaretskii
2021-02-07 21:36                           ` Lars Ingebrigtsen
2021-02-08 15:04                             ` Eli Zaretskii
2021-02-09  2:17                               ` Katsumi Yamaoka
2021-02-09  7:13                                 ` Lars Ingebrigtsen
2021-05-19 23:55 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-20  6:45   ` Eli Zaretskii
2021-05-20  7:27     ` Andreas Schwab
2021-05-20  7:35       ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-20  9:08         ` Eli Zaretskii
2021-05-20  9:03       ` Eli Zaretskii
2021-05-20 19:53 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-20 20:09   ` Eli Zaretskii
2021-05-20 20:40 ` Ben Levy via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-21  5:46   ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).