> + double mean = 0; [...] > + /* Blame Knuth. */ > + mean = mean + (this_line - mean) / lines; I must admit to not being fully cognizant of Knuth, but doesn't this compute the average rather than the mean? I thought computing the mean was necessarily O(N) in space (like keeping the full sequence of line lengths so you can sort it and then pick the middle point). Stefan

```
Am Di., 12. Jan. 2021 um 19:22 Uhr schrieb Stefan Monnier
<monnier@iro.umontreal.ca>:
>
> > + double mean = 0;
> [...]
> > + /* Blame Knuth. */
> > + mean = mean + (this_line - mean) / lines;
>
> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
>
That's the median. "Mean" typically means "arithmetic mean."
```

```
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 12 Jan 2021 13:15:25 -0500
> Cc: Lars Ingebrigtsen <larsi@gnus.org>
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
That the median, not the mean. "Mean" and "average" are identical.
```

Stefan Monnier <monnier@iro.umontreal.ca> writes: > I must admit to not being fully cognizant of Knuth, but doesn't this > compute the average rather than the mean? Arithmetic mean, which is the meanest of the mean. > I thought computing the mean was necessarily O(N) in space (like > keeping the full sequence of line lengths so you can sort it and then > pick the middle point). That's the median, and that's what I also thought, but it turns out that you can approximate the median to the desired accuracy with much less than O(N) space: https://www.stat.cmu.edu/~ryantibs/papers/median.pdf Which we can consider if somebody wants it. :-) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no

```
> Date: Tue, 12 Jan 2021 20:37:25 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
>
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Tue, 12 Jan 2021 13:15:25 -0500
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>
> >
> > I thought computing the mean was necessarily O(N) in space (like
> > keeping the full sequence of line lengths so you can sort it and then
> > pick the middle point).
>
> That the median, not the mean. "Mean" and "average" are identical.
Btw, there are algorithms for true sequential estimation of the
median, without keeping all the samples in memory. But they are
somewhat complex, and I'm not sure this case justifies such
complexity, as we shouldn't care in this case about the lack of
robustness of the average as a statistic.
```

```
> That's the median. "Mean" typically means "arithmetic mean."
Duh! So that's what we call "experience", eh?
Stefan
PS: "experience: when you make a mistake and notice that it's not the
first time"
```

> From: Lars Ingebrigtsen <larsi@gnus.org> > Date: Tue, 12 Jan 2021 19:39:54 +0100 > Cc: emacs-devel@gnu.org > > That's the median, and that's what I also thought, but it turns out that > you can approximate the median to the desired accuracy with much less > than O(N) space: > > https://www.stat.cmu.edu/~ryantibs/papers/median.pdf I used this one: https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

```
>> https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
> I used this one:
> https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
Thanks!
FWIW, last time I needed a fast approximation of the median was in the
`src/profiler.c` code, and I just used the approach you can see in
`approximate_median` (basically, the approximate median I used is the
median of 3 (recursive) approximate medians).
So, it's O(N) worst case time complexity and O(log N) worst case
(stack) space complexity.
Stefan
```