unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
       [not found] ` <20210112174750.5928220B2C@vcs0.savannah.gnu.org>
@ 2021-01-12 18:15   ` Stefan Monnier
  2021-01-12 18:29     ` Philipp Stephani
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 18:15 UTC (permalink / raw)
  To: emacs-devel; +Cc: Lars Ingebrigtsen

> +  double mean = 0;
[...]
> +	  /* Blame Knuth. */
> +	  mean = mean + (this_line - mean) / lines;

I must admit to not being fully cognizant of Knuth, but doesn't this
compute the average rather than the mean?

I thought computing the mean was necessarily O(N) in space (like
keeping the full sequence of line lengths so you can sort it and then
pick the middle point).


        Stefan




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
@ 2021-01-12 18:29     ` Philipp Stephani
  2021-01-12 19:06       ` Stefan Monnier
  2021-01-12 18:37     ` Eli Zaretskii
  2021-01-12 18:39     ` Lars Ingebrigtsen
  2 siblings, 1 reply; 8+ messages in thread
From: Philipp Stephani @ 2021-01-12 18:29 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Lars Ingebrigtsen, Emacs developers

Am Di., 12. Jan. 2021 um 19:22 Uhr schrieb Stefan Monnier
<monnier@iro.umontreal.ca>:
>
> > +  double mean = 0;
> [...]
> > +       /* Blame Knuth. */
> > +       mean = mean + (this_line - mean) / lines;
>
> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
>

That's the median. "Mean" typically means "arithmetic mean."



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
  2021-01-12 18:29     ` Philipp Stephani
@ 2021-01-12 18:37     ` Eli Zaretskii
  2021-01-12 18:43       ` Eli Zaretskii
  2021-01-12 18:39     ` Lars Ingebrigtsen
  2 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 18:37 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: larsi, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 12 Jan 2021 13:15:25 -0500
> Cc: Lars Ingebrigtsen <larsi@gnus.org>
> 
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).

That the median, not the mean.  "Mean" and "average" are identical.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
  2021-01-12 18:29     ` Philipp Stephani
  2021-01-12 18:37     ` Eli Zaretskii
@ 2021-01-12 18:39     ` Lars Ingebrigtsen
  2021-01-12 19:18       ` Eli Zaretskii
  2 siblings, 1 reply; 8+ messages in thread
From: Lars Ingebrigtsen @ 2021-01-12 18:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?

Arithmetic mean, which is the meanest of the mean.

> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).

That's the median, and that's what I also thought, but it turns out that
you can approximate the median to the desired accuracy with much less
than O(N) space:

  https://www.stat.cmu.edu/~ryantibs/papers/median.pdf

Which we can consider if somebody wants it.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:37     ` Eli Zaretskii
@ 2021-01-12 18:43       ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 18:43 UTC (permalink / raw)
  To: monnier; +Cc: larsi, emacs-devel

> Date: Tue, 12 Jan 2021 20:37:25 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
> 
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Tue, 12 Jan 2021 13:15:25 -0500
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>
> > 
> > I thought computing the mean was necessarily O(N) in space (like
> > keeping the full sequence of line lengths so you can sort it and then
> > pick the middle point).
> 
> That the median, not the mean.  "Mean" and "average" are identical.

Btw, there are algorithms for true sequential estimation of the
median, without keeping all the samples in memory.  But they are
somewhat complex, and I'm not sure this case justifies such
complexity, as we shouldn't care in this case about the lack of
robustness of the average as a statistic.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:29     ` Philipp Stephani
@ 2021-01-12 19:06       ` Stefan Monnier
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:06 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: Lars Ingebrigtsen, Emacs developers

> That's the median. "Mean" typically means "arithmetic mean."

Duh!  So that's what we call "experience", eh?


        Stefan


PS: "experience: when you make a mistake and notice that it's not the
     first time"




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 18:39     ` Lars Ingebrigtsen
@ 2021-01-12 19:18       ` Eli Zaretskii
  2021-01-12 19:54         ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 19:18 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: monnier, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 12 Jan 2021 19:39:54 +0100
> Cc: emacs-devel@gnu.org
> 
> That's the median, and that's what I also thought, but it turns out that
> you can approximate the median to the desired accuracy with much less
> than O(N) space:
> 
>   https://www.stat.cmu.edu/~ryantibs/papers/median.pdf

I used this one:

  https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: master 792ba71: Add a new function 'buffer-line-statistics'
  2021-01-12 19:18       ` Eli Zaretskii
@ 2021-01-12 19:54         ` Stefan Monnier
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

>>   https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
> I used this one:
>   https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

Thanks!

FWIW, last time I needed a fast approximation of the median was in the
`src/profiler.c` code, and I just used the approach you can see in
`approximate_median` (basically, the approximate median I used is the
median of 3 (recursive) approximate medians).

So, it's O(N) worst case time complexity and O(log N) worst case
(stack) space complexity.


        Stefan




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-01-12 19:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20210112174748.18372.29339@vcs0.savannah.gnu.org>
     [not found] ` <20210112174750.5928220B2C@vcs0.savannah.gnu.org>
2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29     ` Philipp Stephani
2021-01-12 19:06       ` Stefan Monnier
2021-01-12 18:37     ` Eli Zaretskii
2021-01-12 18:43       ` Eli Zaretskii
2021-01-12 18:39     ` Lars Ingebrigtsen
2021-01-12 19:18       ` Eli Zaretskii
2021-01-12 19:54         ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).