* Re: master 792ba71: Add a new function 'buffer-line-statistics'
[not found] ` <20210112174750.5928220B2C@vcs0.savannah.gnu.org>
@ 2021-01-12 18:15 ` Stefan Monnier
2021-01-12 18:29 ` Philipp Stephani
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 18:15 UTC (permalink / raw)
To: emacs-devel; +Cc: Lars Ingebrigtsen
> + double mean = 0;
[...]
> + /* Blame Knuth. */
> + mean = mean + (this_line - mean) / lines;
I must admit to not being fully cognizant of Knuth, but doesn't this
compute the average rather than the mean?
I thought computing the mean was necessarily O(N) in space (like
keeping the full sequence of line lengths so you can sort it and then
pick the middle point).
Stefan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15 ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
@ 2021-01-12 18:29 ` Philipp Stephani
2021-01-12 19:06 ` Stefan Monnier
2021-01-12 18:37 ` Eli Zaretskii
2021-01-12 18:39 ` Lars Ingebrigtsen
2 siblings, 1 reply; 8+ messages in thread
From: Philipp Stephani @ 2021-01-12 18:29 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Lars Ingebrigtsen, Emacs developers
Am Di., 12. Jan. 2021 um 19:22 Uhr schrieb Stefan Monnier
<monnier@iro.umontreal.ca>:
>
> > + double mean = 0;
> [...]
> > + /* Blame Knuth. */
> > + mean = mean + (this_line - mean) / lines;
>
> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
>
That's the median. "Mean" typically means "arithmetic mean."
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15 ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29 ` Philipp Stephani
@ 2021-01-12 18:37 ` Eli Zaretskii
2021-01-12 18:43 ` Eli Zaretskii
2021-01-12 18:39 ` Lars Ingebrigtsen
2 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 18:37 UTC (permalink / raw)
To: Stefan Monnier; +Cc: larsi, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 12 Jan 2021 13:15:25 -0500
> Cc: Lars Ingebrigtsen <larsi@gnus.org>
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
That the median, not the mean. "Mean" and "average" are identical.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15 ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29 ` Philipp Stephani
2021-01-12 18:37 ` Eli Zaretskii
@ 2021-01-12 18:39 ` Lars Ingebrigtsen
2021-01-12 19:18 ` Eli Zaretskii
2 siblings, 1 reply; 8+ messages in thread
From: Lars Ingebrigtsen @ 2021-01-12 18:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?
Arithmetic mean, which is the meanest of the mean.
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
That's the median, and that's what I also thought, but it turns out that
you can approximate the median to the desired accuracy with much less
than O(N) space:
https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
Which we can consider if somebody wants it. :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:37 ` Eli Zaretskii
@ 2021-01-12 18:43 ` Eli Zaretskii
0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 18:43 UTC (permalink / raw)
To: monnier; +Cc: larsi, emacs-devel
> Date: Tue, 12 Jan 2021 20:37:25 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
>
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Tue, 12 Jan 2021 13:15:25 -0500
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>
> >
> > I thought computing the mean was necessarily O(N) in space (like
> > keeping the full sequence of line lengths so you can sort it and then
> > pick the middle point).
>
> That the median, not the mean. "Mean" and "average" are identical.
Btw, there are algorithms for true sequential estimation of the
median, without keeping all the samples in memory. But they are
somewhat complex, and I'm not sure this case justifies such
complexity, as we shouldn't care in this case about the lack of
robustness of the average as a statistic.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:29 ` Philipp Stephani
@ 2021-01-12 19:06 ` Stefan Monnier
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:06 UTC (permalink / raw)
To: Philipp Stephani; +Cc: Lars Ingebrigtsen, Emacs developers
> That's the median. "Mean" typically means "arithmetic mean."
Duh! So that's what we call "experience", eh?
Stefan
PS: "experience: when you make a mistake and notice that it's not the
first time"
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:39 ` Lars Ingebrigtsen
@ 2021-01-12 19:18 ` Eli Zaretskii
2021-01-12 19:54 ` Stefan Monnier
0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 19:18 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: monnier, emacs-devel
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 12 Jan 2021 19:39:54 +0100
> Cc: emacs-devel@gnu.org
>
> That's the median, and that's what I also thought, but it turns out that
> you can approximate the median to the desired accuracy with much less
> than O(N) space:
>
> https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
I used this one:
https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 19:18 ` Eli Zaretskii
@ 2021-01-12 19:54 ` Stefan Monnier
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:54 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel
>> https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
> I used this one:
> https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
Thanks!
FWIW, last time I needed a fast approximation of the median was in the
`src/profiler.c` code, and I just used the approach you can see in
`approximate_median` (basically, the approximate median I used is the
median of 3 (recursive) approximate medians).
So, it's O(N) worst case time complexity and O(log N) worst case
(stack) space complexity.
Stefan
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-01-12 19:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20210112174748.18372.29339@vcs0.savannah.gnu.org>
[not found] ` <20210112174750.5928220B2C@vcs0.savannah.gnu.org>
2021-01-12 18:15 ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29 ` Philipp Stephani
2021-01-12 19:06 ` Stefan Monnier
2021-01-12 18:37 ` Eli Zaretskii
2021-01-12 18:43 ` Eli Zaretskii
2021-01-12 18:39 ` Lars Ingebrigtsen
2021-01-12 19:18 ` Eli Zaretskii
2021-01-12 19:54 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).