unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
@ 2020-08-19 13:50 Phil Sainty
  2020-08-19 14:15 ` Lars Ingebrigtsen
  2020-08-24 23:46 ` Paul Eggert
  0 siblings, 2 replies; 9+ messages in thread
From: Phil Sainty @ 2020-08-19 13:50 UTC
  To: 42931

(I presume this is to do with the native JSON support, as Emacs 26.3
copes fine with the same command on the example files.)

Using the example JSON file from
https://emacs.stackexchange.com/questions/598/how-do-i-prevent-extremely-long-lines-making-emacs-slow

which you can fetch with:

wget 'https://github.com/Wilfred/ReVo-utilities/blob/a4bdc40dd2656c496defc461fc19c403c8306d9f/revo-export/dictionary.json?raw=true' -O one_line.json

and then safely open in Emacs 27 with:

emacs -Q -f global-so-long-mode one_line.json

C-x C-q to make the buffer writeable.

M-x json-pretty-print-buffer

On my system, Emacs hangs for quite a while and then core dumps.

That's an 18MB line.  If I trim it down to ~2MB I still see the same
thing.  You can do that with (write-region 1 2000151 "two_mb.json")
and then append a single '}' to the end of the new file to make
it valid JSON.

If I trim it back to ~1MB, the command succeeds: use
(write-region 1 1000088 "one_mb.json") and then append '}]}}'.

The smaller files are handier for comparisons with Emacs 26.3,
which *does* cope with the 18MB file but processes the smaller ones
much faster (and much faster than Emacs 27.1 takes to fail).


I also note that, if I forget to toggle the read-only buffer
state first, Emacs 26.3 immediately signals the "json-pretty-print:
Buffer is read-only" error, whereas Emacs 27.1 evidently tries
to do all the work and (for a file small enough not to make it
crash in the process) only notices that the buffer is read-only
when it tries to replace the contents: "replace-region-contents:
Buffer is read-only".


-Phil

p.s. If you're unable to replicate this and wish me to use gdb,
please give step by step instructions for the entire process.

In GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2020-08-12 built on shodan
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Ubuntu 18.04.5 LTS

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit [2 times]
Loading json...done
delete-backward-char: Text is read-only [2 times]
Quit [2 times]
Mark activated

Configured using:
 'configure --prefix=/home/phil/emacs/27.1/usr/local
 --with-x-toolkit=lucid --without-sound'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG DBUS GSETTINGS GLIB NOTIFY INOTIFY
GNUTLS LIBXML2 FREETYPE HARFBUZZ XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11
XDBE XIM MODULES THREADS JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: en_NZ.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by name

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr json map emacsbug message rmc puny format-spec
rfc822 mml easymenu mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs text-property-search time-date subr-x seq
byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils dired
dired-loaddefs advice tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
tab-bar menu-bar rfn-eshadow isearch timer select scroll-bar mouse
jit-lock font-lock syntax facemenu font-core term/tty-colors frame
minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite charscript charprop case-table epa-hook jka-cmpr-hook help
simple abbrev obarray cl-preloaded nadvice loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
threads dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 50018 10992)
 (symbols 48 6273 1)
 (strings 32 17137 1060)
 (string-bytes 1 545762)
 (vectors 16 9965)
 (vector-slots 8 132814 16180)
 (floats 8 26 42)
 (intervals 56 300 0)
 (buffers 1000 14))


* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-19 13:50 bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump Phil Sainty
@ 2020-08-19 14:15 ` Lars Ingebrigtsen
  2020-08-19 15:18   ` Eli Zaretskii
  2020-08-24 23:46 ` Paul Eggert
  1 sibling, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-19 14:15 UTC
  To: Phil Sainty; +Cc: 42931

Phil Sainty <psainty@orcon.net.nz> writes:

> On my system, Emacs hangs for quite a while and then core dumps.

I can confirm that this leads to a segmentation fault (on Debian).

[Current thread is 1 (Thread 0x7fbbb1c04000 (LWP 2154403))]
(gdb) bt
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000055d08b0a0ac9 in terminate_due_to_signal
    (sig=sig@entry=11, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
#2  0x000055d08b0a0f5f in handle_fatal_signal (sig=sig@entry=11)
    at sysdep.c:1786
#3  0x000055d08b19bf9d in deliver_thread_signal
    (sig=sig@entry=11, handler=0x55d08b0a0f54 <handle_fatal_signal>)
    at sysdep.c:1760
#4  0x000055d08b19c019 in deliver_fatal_thread_signal (sig=11) at sysdep.c:1883
#5  handle_sigsegv (sig=11, siginfo=<optimized out>, arg=<optimized out>)
    at sysdep.c:1883
#6  0x00007fbbb530d140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x000055d08b1f7a43 in compareseq
    (xoff=xoff@entry=897, xlim=xlim@entry=17383858, yoff=yoff@entry=1353, ylim=ylim@entry=25500750, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:472
#8  0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
    xoff@entry=897, xlim=xlim@entry=17383882, yoff=yoff@entry=1353, ylim=ylim@entry=25500806, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:510
#9  0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
    xoff@entry=897, xlim=xlim@entry=17383917, yoff=yoff@entry=1353, ylim=ylim@entry=25500849, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:510
#10 0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
    xoff@entry=897, xlim=xlim@entry=17383963, yoff=yoff@entry=1353, ylim=ylim@entry=25500881, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:510
#11 0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
    xoff@entry=897, xlim=xlim@entry=17384016, yoff=yoff@entry=1353, ylim=ylim@entry=25500898, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:510
#12 0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
    xoff@entry=897, xlim=xlim@entry=17384024, yoff=yoff@entry=1353, ylim=ylim@entry=25500964, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
    at ../lib/diffseq.h:510

down to...

Wow, that's a long backtrace.

Hm.  Is gdb inflooping?  Is that possible?

No, it finished:

#36798 0x000055d08b1f7db9 in compareseq (xoff=<optimized out>, xoff@entry=146, xlim=xlim@entry=18922266, yoff=yoff@entry=186, ylim=ylim@entry=27160236, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610) at ../lib/diffseq.h:512
#36799 0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, xoff@entry=146, xlim=xlim@entry=18922364, yoff=yoff@entry=186, ylim=ylim@entry=27160398, find_minimal=find_minimal@entry=false, ctxt=ctxt@entry=0x7fff5bfa5610) at ../lib/diffseq.h:510
#36800 0x000055d08b1f7db9 in compareseq (xoff=<optimized out>, xoff@entry=0, xlim=18922364, xlim@entry=18922365, yoff=1, yoff@entry=0, ylim=27160398, ylim@entry=27160399, find_minimal=find_minimal@entry=false, ctxt=ctxt@entry=0x7fff5bfa5610) at ../lib/diffseq.h:512
#36801 0x000055d08b1f8973 in Freplace_buffer_contents (source=0x55d08c598035, max_secs=<optimized out>, max_costs=<optimized out>) at editfns.c:2038
#36802 0x000055d08b1fd493 in Ffuncall (nargs=4, args=args@entry=0x7fff5bfa5758) at lisp.h:2091
#36803 0x000055d08b237a58 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:632
#36804 0x000055d08b1fd3f7 in Ffuncall (nargs=6, args=args@entry=0x7fff5bfa5ac8) at eval.c:2809
#36805 0x000055d08b237a58 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:632
#36806 0x000055d08b1fd3f7 in Ffuncall (nargs=4, args=args@entry=0x7fff5bfa5e10) at eval.c:2809
#36807 0x000055d08b237a58 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:632
#36808 0x000055d08b1fd3f7 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fff5bfa6148) at eval.c:2809
#36809 0x000055d08b1f9f91 in Ffuncall_interactively (nargs=2, args=0x7fff5bfa6148) at callint.c:253
#36810 0x000055d08b1fd493 in Ffuncall (nargs=nargs@entry=3, args=args@entry=0x7fff5bfa6140) at lisp.h:2091
#36811 0x000055d08b1fb216 in Fcall_interactively (function=0xde4130, record_flag=0xb3d0, keys=0x55d08c597c05) at callint.c:779

OK, so it's not a jansson-related thing, but bugging out in
replace-buffer-contents.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-19 14:15 ` Lars Ingebrigtsen
@ 2020-08-19 15:18   ` Eli Zaretskii
  2020-08-20 13:22     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-08-19 15:18 UTC
  To: Lars Ingebrigtsen; +Cc: psainty, 42931

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 19 Aug 2020 16:15:38 +0200
> Cc: 42931@debbugs.gnu.org
> 
> Phil Sainty <psainty@orcon.net.nz> writes:
> 
> > On my system, Emacs hangs for quite a while and then core dumps.
> 
> I can confirm that this leads to a segmentation fault (on Debian).
> 
> [Current thread is 1 (Thread 0x7fbbb1c04000 (LWP 2154403))]
> (gdb) bt
> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x000055d08b0a0ac9 in terminate_due_to_signal
>     (sig=sig@entry=11, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
> #2  0x000055d08b0a0f5f in handle_fatal_signal (sig=sig@entry=11)
>     at sysdep.c:1786
> #3  0x000055d08b19bf9d in deliver_thread_signal
>     (sig=sig@entry=11, handler=0x55d08b0a0f54 <handle_fatal_signal>)
>     at sysdep.c:1760
> #4  0x000055d08b19c019 in deliver_fatal_thread_signal (sig=11) at sysdep.c:1883
> #5  handle_sigsegv (sig=11, siginfo=<optimized out>, arg=<optimized out>)
>     at sysdep.c:1883
> #6  0x00007fbbb530d140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #7  0x000055d08b1f7a43 in compareseq
>     (xoff=xoff@entry=897, xlim=xlim@entry=17383858, yoff=yoff@entry=1353, ylim=ylim@entry=25500750, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
>     at ../lib/diffseq.h:472
> #8  0x000055d08b1f7d94 in compareseq (xoff=<optimized out>, 
>     xoff@entry=897, xlim=xlim@entry=17383882, yoff=yoff@entry=1353, ylim=ylim@entry=25500806, find_minimal=false, ctxt=ctxt@entry=0x7fff5bfa5610)
>     at ../lib/diffseq.h:510

Looks like a stack overflow?  I guess the recursive nature of compareseq
is bound to cause this at some point?
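
To see why recursion depth, not input size as such, is the problem, here
is a minimal C sketch of a divide-and-conquer routine that recurses on
both halves of its range.  It is not Gnulib's compareseq: the function
naive_depth and its split rule are hypothetical.  The split is assumed
to be badly lopsided, loosely modelled on Lars's backtrace, where each
deeper frame keeps xoff=897 and trims only a few dozen elements off
xlim.  Under that assumption the depth grows linearly with the input.

```c
#include <assert.h>

/* Hypothetical model of a recursive diff over the index range [lo, hi).
   Each call splits the range at MID and recurses on both halves.  The
   split is deliberately lopsided (only the last element is peeled off),
   which is the worst case for stack depth.  Returns the number of
   recursion levels used below this call.  */
static long naive_depth(long lo, long hi)
{
  assert(lo <= hi);
  if (hi - lo <= 1)
    return 0;

  long mid = hi - 1;                  /* lopsided split */
  long left = naive_depth(lo, mid);   /* nearly the whole range */
  long right = naive_depth(mid, hi);  /* a single element */
  return 1 + (left > right ? left : right);
}
```

With this split, naive_depth(0, n) returns n - 1, i.e. one stack frame
per element: harmless for n = 10000, but a multi-megabyte line needs
tens of thousands of frames or more, as in Lars's backtrace of more
than 36,000 compareseq frames.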

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-19 15:18   ` Eli Zaretskii
@ 2020-08-20 13:22     ` Lars Ingebrigtsen
  2020-08-20 13:26       ` Philipp Stephani
  2020-08-20 13:39       ` Eli Zaretskii
  0 siblings, 2 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-20 13:22 UTC
  To: Eli Zaretskii; +Cc: psainty, 42931

Eli Zaretskii <eliz@gnu.org> writes:

> Looks like a stack overflow?  I guess the recursive nature of compareseq
> is bound to cause this at some point?

Yup.

I'm not sure what to do about it, though.  One easy way to "fix this"
would be to not use replace-region-contents in json-pretty-print if the
region is very large...  but that's kinda just wallpapering over the
problem.

replace-region-contents itself could decide to not do all its fancy
stuff if the region is very large, and just replace the contents in the
normal way instead?
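
Lars's second suggestion amounts to a size check in front of the diff.
A minimal C sketch, assuming a hypothetical limit (FANCY_DIFF_LIMIT, set
arbitrarily to 1 MiB here) and hypothetical names throughout; neither
the constant nor the function exists in Emacs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-off, chosen arbitrarily for illustration.  */
#define FANCY_DIFF_LIMIT ((size_t) 1 << 20)

enum replace_strategy
{
  REPLACE_BY_DIFF,   /* minimal-change replacement driven by a diff */
  REPLACE_WHOLESALE  /* plain delete-and-insert */
};

/* Decide how to replace OLD_LEN characters of buffer text with NEW_LEN
   characters of source text.  The diff's work grows with both inputs,
   so the larger of the two is compared against the limit.  */
static enum replace_strategy
choose_replace_strategy(size_t old_len, size_t new_len)
{
  size_t larger = old_len > new_len ? old_len : new_len;
  return larger > FANCY_DIFF_LIMIT ? REPLACE_WHOLESALE : REPLACE_BY_DIFF;
}
```

For the ~2MB test case this would pick REPLACE_WHOLESALE.  The price is
that markers, properties and overlays are no longer preserved, which is
what the diff-based replacement is for.  Freplace_buffer_contents
already takes max_secs and max_costs arguments (visible in the
backtrace above); a size check like this would only be a cheaper,
up-front guard.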

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-20 13:22     ` Lars Ingebrigtsen
@ 2020-08-20 13:26       ` Philipp Stephani
  2020-08-20 13:39       ` Eli Zaretskii
  1 sibling, 0 replies; 9+ messages in thread
From: Philipp Stephani @ 2020-08-20 13:26 UTC
  To: Lars Ingebrigtsen; +Cc: Phil Sainty, 42931

On Thu, 20 Aug 2020 at 15:23, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Looks like a stack overflow?  I guess the recursive nature of compareseq
> > is bound to cause this at some point?
>
> Yup.
>
> I'm not sure what to do about it, though.  One easy way to "fix this"
> would be to not use replace-region-contents in json-pretty-print if the
> region is very large...  but that's kinda just wallpapering over the
> problem.
>
> replace-region-contents itself could decide to not do all its fancy
> stuff if the region is very large, and just replace the contents in the
> normal way instead?


I guess the underlying function (compareseq) should protect against
unbounded recursion and fall back to a coarser diff if necessary.
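
One way to read this suggestion, sketched in C under stated
assumptions: the recursion carries its depth, and once a hypothetical
cap is reached (MAX_DIFF_DEPTH, an arbitrary 1000 here; a real cap
would have to be derived from the available stack), the whole remaining
range is reported as a single coarse change instead of being split
further.  diff_chunks and its lopsided split are illustrative only,
not Gnulib's API.

```c
#include <assert.h>

/* Hypothetical depth cap, chosen arbitrarily; a real implementation
   would have to derive it from the stack actually available.  */
#define MAX_DIFF_DEPTH 1000

/* Hypothetical model of a recursive diff over the index range [lo, hi),
   using a lopsided split that peels off the last element.  Below the
   cap it recurses normally and reports one fine-grained chunk per
   element; at the cap it stops splitting and reports everything left
   as a single coarse chunk.  Returns the number of chunks reported.  */
static long diff_chunks(long lo, long hi, int depth)
{
  assert(lo <= hi);
  if (hi - lo <= 1)
    return hi - lo;            /* one element: one chunk; empty: none */
  if (depth >= MAX_DIFF_DEPTH)
    return 1;                  /* coarse fallback for the whole range */

  long mid = hi - 1;
  return diff_chunks(lo, mid, depth + 1) + diff_chunks(mid, hi, depth + 1);
}
```

With 10 elements the cap is never reached and every element gets its
own chunk.  With 100000 elements the recursion stops at depth 1000 and
reports 1000 fine chunks plus one coarse chunk covering the remaining
99000 elements: a less precise diff, but a bounded stack.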

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-20 13:22     ` Lars Ingebrigtsen
  2020-08-20 13:26       ` Philipp Stephani
@ 2020-08-20 13:39       ` Eli Zaretskii
  1 sibling, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2020-08-20 13:39 UTC
  To: Lars Ingebrigtsen; +Cc: psainty, 42931

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: psainty@orcon.net.nz,  42931@debbugs.gnu.org
> Date: Thu, 20 Aug 2020 15:22:31 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Looks like a stack overflow?  I guess the recursive nature of compareseq
> > is bound to cause this at some point?
> 
> Yup.
> 
> I'm not sure what to do about it, though.  One easy way to "fix this"
> would be to not use replace-region-contents in json-pretty-print if the
> region is very large...  but that's kinda just wallpapering over the
> problem.
> 
> replace-region-contents itself could decide to not do all its fancy
> stuff if the region is very large, and just replace the contents in the
> normal way instead?

compareseq comes from a Gnulib module (diffseq), so maybe its
implementation could be fixed to bail out when the recursion becomes
too deep?

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-19 13:50 bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump Phil Sainty
  2020-08-19 14:15 ` Lars Ingebrigtsen
@ 2020-08-24 23:46 ` Paul Eggert
  2020-08-25  6:12   ` Eli Zaretskii
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2020-08-24 23:46 UTC
  To: Phil Sainty; +Cc: 42931, Philipp Stephani, Bruno Haible, Lars Ingebrigtsen

The patch I installed into Emacs master for Bug#43016 also fixes Bug#42931's 
test case, at least for me. However, Bug#42931 prompted me to change the way 
that the Gnulib diffseq module recurses so that the stack size is O(log N) 
rather than O(N). I installed this change into Gnulib, here:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=7aadb23803a8fb71d07e6e87ffb1ca510d86f8ef

and propagated this into Emacs master, here:

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d494f9e81a6d11dcf6c22333cd950989b2051dff

I doubt whether this patch needs to be backported into the emacs-27 branch.

In theory even O(log N) might not be good enough if Emacs has a tiny stack and a 
huge buffer, but I doubt whether this is of practical concern.
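
A standard way to get an O(log N) stack bound is to recurse only on the
smaller of the two subranges and handle the larger one by looping, so
that every recursive call gets at most half the elements.  This thread
does not spell out the details of the Gnulib commit, so the C sketch
below illustrates the general technique rather than reproducing
diffseq.h; smaller_first_depth and the two split rules are hypothetical.

```c
#include <assert.h>

/* A split rule picks MID with lo < mid < hi.  */
typedef long split_fn(long lo, long hi);

/* Worst case for naive two-sided recursion: peel off the last element.  */
static long split_lopsided(long lo, long hi) { (void) lo; return hi - 1; }

/* Best case: split down the middle.  */
static long split_middle(long lo, long hi) { return lo + (hi - lo) / 2; }

/* Process [lo, hi), recursing only on the smaller subrange and looping
   on the larger one.  Returns the deepest recursion level reached.
   Each recursive call receives at most half of the current range, so
   the result is at most floor(log2(hi - lo)).  */
static long smaller_first_depth(long lo, long hi, split_fn *split)
{
  assert(lo <= hi);
  long deepest = 0;
  while (hi - lo > 1)
    {
      long mid = split(lo, hi);
      assert(lo < mid && mid < hi);
      long d;
      if (mid - lo < hi - mid)
        {
          d = 1 + smaller_first_depth(lo, mid, split); /* left is smaller */
          lo = mid;                                    /* loop on the right */
        }
      else
        {
          d = 1 + smaller_first_depth(mid, hi, split); /* right is smaller */
          hi = mid;                                    /* loop on the left */
        }
      if (d > deepest)
        deepest = d;
    }
  return deepest;
}
```

With the lopsided split, which makes naive two-sided recursion use one
frame per element, the depth stays at 1 even for a million elements;
with an even split on 1024 elements it is 10, i.e. log2 1024.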

I'll cc this to Bruno Haible to give him a heads-up, since he created the 
diffseq module. Bruno, the bug report is here:

https://bugs.gnu.org/42931

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-24 23:46 ` Paul Eggert
@ 2020-08-25  6:12   ` Eli Zaretskii
  2020-08-25 18:19     ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-08-25  6:12 UTC
  To: Paul Eggert; +Cc: 42931, larsi, bruno, p.stephani2, psainty

> Cc: 42931@debbugs.gnu.org, Lars Ingebrigtsen <larsi@gnus.org>,
>  Eli Zaretskii <eliz@gnu.org>, Philipp Stephani <p.stephani2@gmail.com>,
>  Bruno Haible <bruno@clisp.org>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 24 Aug 2020 16:46:01 -0700
> 
> In theory even O(log N) might not be good enough if Emacs has a tiny stack and a 
> huge buffer, but I doubt whether this is of practical concern.

What about "normal" Emacs builds?  They usually have between 2MB and
8MB of stack.  Should we worry about stack overflow in these cases?
Maybe it is worth adding stack-overflow protection to diffseq.h
anyway?  Almost anything is better than a segfault.

Thanks.

* bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump
  2020-08-25  6:12   ` Eli Zaretskii
@ 2020-08-25 18:19     ` Paul Eggert
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Eggert @ 2020-08-25 18:19 UTC
  To: Eli Zaretskii; +Cc: 42931-done, larsi, bruno, p.stephani2, psainty

On 8/24/20 11:12 PM, Eli Zaretskii wrote:
> What about "normal" Emacs builds?  They usually have between 2MB and
> 8MB of stack.  Should we worry about stack overflow in these cases?

No. On x86-64 Ubuntu 18.04.5 each recursion level consumes 304 bytes. Dividing
2 MB (2,000,000 bytes) by 304 gives you 6578 stack frames, which means the
algorithm could handle a vector of 2**6578 entries, which can't exist anywhere
in the known physical universe.
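
This arithmetic can be checked with a few lines of C.  The frame size of
304 bytes is the measurement quoted above; the sketch also assumes, as
the O(log N) bound implies, that each recursive call handles at most
half the elements, so N elements need at most floor(log2 N) levels.
max_frames and halving_depth are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole frames of FRAME_BYTES that fit in STACK_BYTES.  */
static uint64_t max_frames(uint64_t stack_bytes, uint64_t frame_bytes)
{
  assert(frame_bytes > 0);
  return stack_bytes / frame_bytes;
}

/* Recursion depth needed for N elements when every recursive call gets
   at most half of its caller's elements: floor(log2 N) for N >= 1.  */
static unsigned halving_depth(uint64_t n)
{
  unsigned depth = 0;
  while (n > 1)
    {
      n /= 2;
      depth++;
    }
  return depth;
}
```

max_frames(2000000, 304) is 6578, matching the figure above (with
2 MiB, i.e. 2,097,152 bytes, it is 6898).  Conversely,
halving_depth(UINT64_MAX) is 63, so even a range indexed by a full
64-bit integer needs at most 63 levels, about 19 KB of stack at 304
bytes per frame.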

On real machines it'd have to be a reeeeally tiny stack for this recursion to be a
significant problem now, so tiny that Emacs would crash for countless other
reasons. I'll take the liberty of closing the bug report.

Thread overview: 9+ messages
2020-08-19 13:50 bug#42931: 27.1; json-pretty-print-buffer on ~2MB line causes core dump Phil Sainty
2020-08-19 14:15 ` Lars Ingebrigtsen
2020-08-19 15:18   ` Eli Zaretskii
2020-08-20 13:22     ` Lars Ingebrigtsen
2020-08-20 13:26       ` Philipp Stephani
2020-08-20 13:39       ` Eli Zaretskii
2020-08-24 23:46 ` Paul Eggert
2020-08-25  6:12   ` Eli Zaretskii
2020-08-25 18:19     ` Paul Eggert
