```unofficial mirror of emacs-devel@gnu.org
help / color / mirror / Atom feed```
```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
@ 2021-01-12 18:15   ` Stefan Monnier
2021-01-12 18:29     ` Philipp Stephani
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 18:15 UTC (permalink / raw)
To: emacs-devel; +Cc: Lars Ingebrigtsen

> +  double mean = 0;
[...]
> +	  /* Blame Knuth. */
> +	  mean = mean + (this_line - mean) / lines;

I must admit to not being fully cognizant of Knuth, but doesn't this
compute the average rather than the mean?

I thought computing the mean was necessarily O(N) in space (like
keeping the full sequence of line lengths so you can sort it and then
pick the middle point).

Stefan

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
@ 2021-01-12 18:29     ` Philipp Stephani
2021-01-12 19:06       ` Stefan Monnier
2021-01-12 18:37     ` Eli Zaretskii
2021-01-12 18:39     ` Lars Ingebrigtsen
From: Philipp Stephani @ 2021-01-12 18:29 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Lars Ingebrigtsen, Emacs developers

Am Di., 12. Jan. 2021 um 19:22 Uhr schrieb Stefan Monnier
<monnier@iro.umontreal.ca>:
>
> > +  double mean = 0;
> [...]
> > +       /* Blame Knuth. */
> > +       mean = mean + (this_line - mean) / lines;
>
> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).
>

That's the median. "Mean" typically means "arithmetic mean."

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29     ` Philipp Stephani
@ 2021-01-12 18:37     ` Eli Zaretskii
2021-01-12 18:43       ` Eli Zaretskii
2021-01-12 18:39     ` Lars Ingebrigtsen
From: Eli Zaretskii @ 2021-01-12 18:37 UTC (permalink / raw)
To: Stefan Monnier; +Cc: larsi, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 12 Jan 2021 13:15:25 -0500
> Cc: Lars Ingebrigtsen <larsi@gnus.org>
>
> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).

That the median, not the mean.  "Mean" and "average" are identical.

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29     ` Philipp Stephani
2021-01-12 18:37     ` Eli Zaretskii
@ 2021-01-12 18:39     ` Lars Ingebrigtsen
2021-01-12 19:18       ` Eli Zaretskii
From: Lars Ingebrigtsen @ 2021-01-12 18:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I must admit to not being fully cognizant of Knuth, but doesn't this
> compute the average rather than the mean?

Arithmetic mean, which is the meanest of the mean.

> I thought computing the mean was necessarily O(N) in space (like
> keeping the full sequence of line lengths so you can sort it and then
> pick the middle point).

That's the median, and that's what I also thought, but it turns out that
you can approximate the median to the desired accuracy with much less
than O(N) space:

https://www.stat.cmu.edu/~ryantibs/papers/median.pdf

Which we can consider if somebody wants it.  :-)

--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:37     ` Eli Zaretskii
@ 2021-01-12 18:43       ` Eli Zaretskii
0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2021-01-12 18:43 UTC (permalink / raw)
To: monnier; +Cc: larsi, emacs-devel

> Date: Tue, 12 Jan 2021 20:37:25 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, emacs-devel@gnu.org
>
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Tue, 12 Jan 2021 13:15:25 -0500
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>
> >
> > I thought computing the mean was necessarily O(N) in space (like
> > keeping the full sequence of line lengths so you can sort it and then
> > pick the middle point).
>
> That the median, not the mean.  "Mean" and "average" are identical.

Btw, there are algorithms for true sequential estimation of the
median, without keeping all the samples in memory.  But they are
somewhat complex, and I'm not sure this case justifies such
complexity, as we shouldn't care in this case about the lack of
robustness of the average as a statistic.

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:29     ` Philipp Stephani
@ 2021-01-12 19:06       ` Stefan Monnier
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:06 UTC (permalink / raw)
To: Philipp Stephani; +Cc: Lars Ingebrigtsen, Emacs developers

> That's the median. "Mean" typically means "arithmetic mean."

Duh!  So that's what we call "experience", eh?

Stefan

PS: "experience: when you make a mistake and notice that it's not the
first time"

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 18:39     ` Lars Ingebrigtsen
@ 2021-01-12 19:18       ` Eli Zaretskii
2021-01-12 19:54         ` Stefan Monnier
From: Eli Zaretskii @ 2021-01-12 19:18 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: monnier, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 12 Jan 2021 19:39:54 +0100
> Cc: emacs-devel@gnu.org
>
> That's the median, and that's what I also thought, but it turns out that
> you can approximate the median to the desired accuracy with much less
> than O(N) space:
>
>   https://www.stat.cmu.edu/~ryantibs/papers/median.pdf

I used this one:

https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

```* Re: master 792ba71: Add a new function 'buffer-line-statistics'
2021-01-12 19:18       ` Eli Zaretskii
@ 2021-01-12 19:54         ` Stefan Monnier
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2021-01-12 19:54 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

>>   https://www.stat.cmu.edu/~ryantibs/papers/median.pdf
> I used this one:
>   https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

Thanks!

FWIW, last time I needed a fast approximation of the median was in the
`src/profiler.c` code, and I just used the approach you can see in
`approximate_median` (basically, the approximate median I used is the
median of 3 (recursive) approximate medians).

So, it's O(N) worst case time complexity and O(log N) worst case
(stack) space complexity.

Stefan

```end of thread, other threads:[~2021-01-12 19:54 UTC | newest]

2021-01-12 18:15   ` master 792ba71: Add a new function 'buffer-line-statistics' Stefan Monnier
2021-01-12 18:29     ` Philipp Stephani
2021-01-12 19:06       ` Stefan Monnier
2021-01-12 18:37     ` Eli Zaretskii
2021-01-12 18:43       ` Eli Zaretskii
2021-01-12 18:39     ` Lars Ingebrigtsen
2021-01-12 19:18       ` Eli Zaretskii
2021-01-12 19:54         ` Stefan Monnier
```

```unofficial mirror of emacs-devel@gnu.org

This inbox may be cloned and mirrored by anyone:

git clone --mirror https://yhetil.org/emacs-devel/0 emacs-devel/git/0.git

# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 emacs-devel emacs-devel/ https://yhetil.org/emacs-devel \
emacs-devel@gnu.org
public-inbox-index emacs-devel

Example config snippet for mirrors.
Newsgroups are available over NNTP:
nntp://news.yhetil.org/yhetil.emacs.devel
nntp://news.gmane.io/gmane.emacs.devel

AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git```