unofficial mirror of emacs-devel@gnu.org 
* very slow archive-mode
@ 2008-03-12 20:47 Paul Pogonyshev
  2008-03-12 21:30 ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Pogonyshev @ 2008-03-12 20:47 UTC (permalink / raw)
  To: emacs-devel

Hi,

I regularly open a Java source archive (a JAR of several tens of
megabytes) in Emacs.  Recently I recompiled Emacs from CVS and noticed
that the process of opening and parsing the archive had become several
times slower (I didn't measure precisely, but I guess the slowdown is
about 5--10x).  Can anyone investigate the problem or just guess what
changes caused it?  I don't remember when I last compiled Emacs; I
guess that build was a couple of months old.

Paul





* Re: very slow archive-mode
  2008-03-12 20:47 very slow archive-mode Paul Pogonyshev
@ 2008-03-12 21:30 ` Stefan Monnier
  2008-03-12 22:41   ` Juri Linkov
  2008-03-13  7:51   ` Kenichi Handa
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Monnier @ 2008-03-12 21:30 UTC (permalink / raw)
  To: Paul Pogonyshev; +Cc: emacs-devel

> I regularly open a Java source archive (a JAR of several tens of
> megabytes) in Emacs.  Recently I recompiled Emacs from CVS and noticed
> that the process of opening and parsing the archive had become several
> times slower (I didn't measure precisely, but I guess the slowdown is
> about 5--10x).  Can anyone investigate the problem or just guess what
> changes caused it?  I don't remember when I last compiled Emacs; I
> guess that build was a couple of months old.

I reported a similar problem.  I believe set-buffer-multibyte is *a lot*
slower now, and may even have a time complexity of O(N^2).


        Stefan





* Re: very slow archive-mode
  2008-03-12 21:30 ` Stefan Monnier
@ 2008-03-12 22:41   ` Juri Linkov
  2008-03-13  7:51   ` Kenichi Handa
  1 sibling, 0 replies; 12+ messages in thread
From: Juri Linkov @ 2008-03-12 22:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, Paul Pogonyshev

>> I regularly open a Java source archive (a JAR of several tens of
>> megabytes) in Emacs.  Recently I recompiled Emacs from CVS and noticed
>> that the process of opening and parsing the archive had become several
>> times slower (I didn't measure precisely, but I guess the slowdown is
>> about 5--10x).  Can anyone investigate the problem or just guess what
>> changes caused it?  I don't remember when I last compiled Emacs; I
>> guess that build was a couple of months old.
>
> I reported a similar problem.  I believe set-buffer-multibyte is *a lot*
> slower now, and may even have a time complexity of O(N^2).

Unfortunately, this problem destroyed one of my important archives today :-(

A 20-megabyte archive that opens within 1 sec in Emacs 22 froze Emacs 23
for several minutes, so I was forced to interrupt visiting it
with C-g.  After that, I ran a compilation process in another buffer,
which started to automatically save the archive's buffer, since it was
marked as modified.  As a result, the archive file was corrupted
by this automatic saving.

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: very slow archive-mode
  2008-03-12 21:30 ` Stefan Monnier
  2008-03-12 22:41   ` Juri Linkov
@ 2008-03-13  7:51   ` Kenichi Handa
  2008-03-13 15:04     ` Stefan Monnier
                       ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Kenichi Handa @ 2008-03-13  7:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, pogonyshev

In article <jwvejafpr5m.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I regularly open a Java source archive (a JAR of several tens of
> > megabytes) in Emacs.  Recently I recompiled Emacs from CVS and noticed
> > that the process of opening and parsing the archive had become several
> > times slower (I didn't measure precisely, but I guess the slowdown is
> > about 5--10x).  Can anyone investigate the problem or just guess what
> > changes caused it?  I don't remember when I last compiled Emacs; I
> > guess that build was a couple of months old.

> I reported a similar problem.  I believe set-buffer-multibyte is *a lot*
> slower now, and may even have a time complexity of O(N^2).

I suspect so too.  Now set-buffer-multibyte must convert
more 8-bit bytes to multibyte forms, and that results in more
movement and enlargement of the gap.  This code:

  ;; Copy the raw bytes out, make the now-empty buffer multibyte
  ;; (cheap, since there is nothing to convert), then decode back in.
  (let ((str (buffer-string)))
    (erase-buffer)
    (set-buffer-multibyte t)
    (decode-coding-string str 'no-conversion nil (current-buffer)))

runs much faster than set-buffer-multibyte.  But then, I
think it is better to read archive files into a
multibyte buffer from the start using no-conversion-multibyte.
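
A rough way to compare the two approaches is sketched below.  The
function name sketch-compare-conversions is made up for illustration,
and the timings depend entirely on the Emacs build in use, so this
only shows how one might measure, not what the result will be.

```elisp
(require 'benchmark)

(defun sketch-compare-conversions (size)
  "Return (TOGGLE-SECONDS DECODE-SECONDS) for SIZE bytes of 8-bit data.
TOGGLE-SECONDS times a plain `set-buffer-multibyte' call;
DECODE-SECONDS times the erase-and-decode variant from this message."
  ;; Build a unibyte string of SIZE non-ASCII bytes (0xE9 in latin-1).
  (let ((data (encode-coding-string (make-string size #xe9) 'latin-1)))
    (list
     (with-temp-buffer
       (set-buffer-multibyte nil)
       (insert data)
       (car (benchmark-run 1 (set-buffer-multibyte t))))
     (with-temp-buffer
       (set-buffer-multibyte nil)
       (insert data)
       (car (benchmark-run 1
              (let ((str (buffer-string)))
                (erase-buffer)
                (set-buffer-multibyte t)
                (decode-coding-string str 'no-conversion
                                      nil (current-buffer)))))))))
```

Calling (sketch-compare-conversions (* 20 1024 1024)) would
approximate the 20-megabyte case mentioned earlier in the thread.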

As I've just found a bug in handling
no-conversion-multibyte, I'll fix it soon.  After that, I am
going to change auto-coding-alist to use
no-conversion-multibyte for archive files, and adjust
arc-mode and tar-mode.

What do you think?

---
Kenichi Handa
handa@ni.aist.go.jp

PS. Another idea is to keep archive files in a unibyte buffer
and have the file listing part in another multibyte buffer.

A more radical idea is to allow changing multibyteness only in
the narrowed region.





* Re: very slow archive-mode
  2008-03-13  7:51   ` Kenichi Handa
@ 2008-03-13 15:04     ` Stefan Monnier
  2008-03-13 15:08     ` Stefan Monnier
  2008-03-14  1:03     ` Juri Linkov
  2 siblings, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2008-03-13 15:04 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel, pogonyshev

> As I've just found a bug in handling
> no-conversion-multibyte, I'll fix it soon.  After that, I am
> going to change auto-coding-alist to use
> no-conversion-multibyte for archive files, and adjust
> arc-mode and tar-mode.

> What do you think?

I think that the `binary' (better not call it "no-conversion" because
it's a misnomer if one of the ends is multibyte) coding-system should
default as much as possible to unibyte buffers and unibyte strings (so
it indeed does "no conversion" in that case).

> PS. Another idea is to keep archive files in a unibyte buffer
> and have the file listing part in another multibyte buffer.

Yes.  I've introduced the new buffer-swap-text primitive specifically to
make that possible.
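
As a minimal sketch of that idea (not the actual arc-mode or tar-mode
code), two buffers can exchange their text, including their
multibyteness, with buffer-swap-text.  The function name
sketch-swap-demo and the sample strings are made up for illustration.

```elisp
(defun sketch-swap-demo ()
  "Swap text between a unibyte data buffer and a multibyte listing buffer.
Return (VISIBLE-TEXT HIDDEN-TEXT VISIBLE-MULTIBYTE-P) after the swap."
  (with-temp-buffer                     ; stands in for the visited buffer
    (let ((visible (current-buffer)))
      (set-buffer-multibyte nil)
      (insert "raw archive bytes")      ; file contents, kept unibyte
      (with-temp-buffer                 ; hidden companion buffer
        (insert "file listing")         ; multibyte summary text
        ;; Exchange the texts: the visible buffer now holds the listing
        ;; and the raw bytes live in the hidden buffer.
        (buffer-swap-text visible)
        (list (with-current-buffer visible (buffer-string))
              (buffer-string)
              (with-current-buffer visible
                (and enable-multibyte-characters t)))))))
```

No conversion of the archive bytes is needed with this arrangement,
because the raw data never leaves a unibyte buffer.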


        Stefan





* Re: very slow archive-mode
  2008-03-13  7:51   ` Kenichi Handa
  2008-03-13 15:04     ` Stefan Monnier
@ 2008-03-13 15:08     ` Stefan Monnier
  2008-03-14  1:56       ` YAMAMOTO Mitsuharu
  2008-03-14  1:03     ` Juri Linkov
  2 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2008-03-13 15:08 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel, pogonyshev

>> > I regularly open a Java source archive (a JAR of several tens of
>> > megabytes) in Emacs.  Recently I recompiled Emacs from CVS and noticed
>> > that the process of opening and parsing the archive had become several
>> > times slower (I didn't measure precisely, but I guess the slowdown is
>> > about 5--10x).  Can anyone investigate the problem or just guess what
>> > changes caused it?  I don't remember when I last compiled Emacs; I
>> > guess that build was a couple of months old.

>> I reported a similar problem.  I believe set-buffer-multibyte is *a lot*
>> slower now, and may even have a time complexity of O(N^2).

> I suspect so too.  Now set-buffer-multibyte must convert
> more 8-bit bytes to multibyte forms, and that results in more
> movement and enlargement of the gap.  This code:

Also, IIUC the 8-bit bytes that are not represented as a single byte are
not only more numerous, but they also take up more space (they used to
take up just 2 bytes but now they take up what? 3? 4? 5 bytes?).

For this reason, we should rewrite set-buffer-multibyte to do things
in two passes: either a first pass that computes the final size and
allocates the destination and a second that does no re-allocation, or a
first that converts into a too-large destination and a second that
shrinks it back to a reasonable size.
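
The first variant (size first, then fill) can be illustrated in Lisp
on a plain vector.  This is only a toy model, not the C code of
set-buffer-multibyte: the name sketch-two-pass-expand is hypothetical,
and it assumes each 8-bit byte expands to a two-byte form with a
0xC0/0xC1 lead byte, which is my reading of character.h rather than
something stated in this message.

```elisp
(defun sketch-two-pass-expand (bytes)
  "Expand unibyte string BYTES into a vector of internal bytes.
Pass 1 computes the final size; pass 2 fills a vector of exactly that
size, so the destination is allocated once and never grown."
  (let ((size 0)
        (i 0))
    ;; Pass 1: ASCII stays one byte, 8-bit bytes are assumed to need two.
    (dotimes (k (length bytes))
      (setq size (+ size (if (< (aref bytes k) #x80) 1 2))))
    ;; Pass 2: write into the preallocated destination.
    (let ((out (make-vector size 0)))
      (dotimes (k (length bytes))
        (let ((b (aref bytes k)))
          (if (< b #x80)
              (progn (aset out i b)
                     (setq i (1+ i)))
            ;; Assumed two-byte form: lead byte 0xC0 or 0xC1, then a
            ;; trailing byte carrying the low six bits.
            (aset out i (logior #xC0 (logand (ash b -6) 1)))
            (aset out (1+ i) (logior #x80 (logand b #x3F)))
            (setq i (+ i 2)))))
      out)))
```

The cost is two linear scans instead of repeated gap movement, which
is the trade the proposal above is aiming for.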


        Stefan





* Re: very slow archive-mode
  2008-03-13  7:51   ` Kenichi Handa
  2008-03-13 15:04     ` Stefan Monnier
  2008-03-13 15:08     ` Stefan Monnier
@ 2008-03-14  1:03     ` Juri Linkov
  2008-03-16  2:23       ` Kenichi Handa
  2 siblings, 1 reply; 12+ messages in thread
From: Juri Linkov @ 2008-03-14  1:03 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: pogonyshev, Stefan Monnier, emacs-devel

> As I've just found a bug in handling
> no-conversion-multibyte, I'll fix it soon.  After that, I am
> going to change auto-coding-alist to use
> no-conversion-multibyte for archive files, and adjust
> arc-mode and tar-mode.
>
> What do you think?

Thanks in advance for starting to fix this problem.  I think it is
important not to let a buffer associated with the archive file
stay in the modified state after its loading is interrupted.

Using a separate unibyte buffer may be a good solution, but there is one
possible problem: it would be difficult to find this hidden separate
buffer in order to kill it and free up the memory occupied by the
file buffer after its loading is interrupted (perhaps this buffer can be
killed using unwind-protect during loading and kill-buffer-hook for C-x k).

So it seems a separate unibyte buffer would be necessary only if it
is impossible to get fast reading without leaving the file buffer
in the modified state.
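
The cleanup suggested in the parenthesis could look roughly like the
sketch below.  All names starting with sketch- are hypothetical, and
this is not how arc-mode is actually written; it only shows
unwind-protect covering an interrupted load and a buffer-local
kill-buffer-hook covering C-x k.

```elisp
(defvar-local sketch-archive-data-buffer nil
  "Hidden unibyte buffer holding the raw archive (hypothetical name).")

(defun sketch-kill-archive-data ()
  "Kill the hidden data buffer that belongs to the current buffer."
  (when (buffer-live-p sketch-archive-data-buffer)
    (kill-buffer sketch-archive-data-buffer)))

(defun sketch-read-archive-data (file)
  "Read FILE literally into a hidden unibyte buffer and return it.
If reading is interrupted (for example with C-g), kill that buffer."
  (let ((data (generate-new-buffer " *sketch-archive-data*"))
        (done nil))
    (unwind-protect
        (with-current-buffer data
          (set-buffer-multibyte nil)
          (insert-file-contents-literally file)
          ;; Never leave the data buffer marked as modified.
          (set-buffer-modified-p nil)
          (setq done t)
          data)
      ;; Runs on normal exit and on quit; only clean up on failure.
      (unless done
        (kill-buffer data)))))

(defun sketch-attach-archive-data (data)
  "Tie hidden buffer DATA to the lifetime of the current buffer."
  (setq sketch-archive-data-buffer data)
  (add-hook 'kill-buffer-hook #'sketch-kill-archive-data nil t))
```

With this arrangement the visible buffer is never the one being
converted, so an interrupted load cannot leave it modified.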

PS. Fortunately, I had a copy of the corrupted archive on a DVD,
so nothing was lost, but nevertheless this is a damaging problem.

-- 
Juri Linkov
http://www.jurta.org/emacs/





* Re: very slow archive-mode
  2008-03-13 15:08     ` Stefan Monnier
@ 2008-03-14  1:56       ` YAMAMOTO Mitsuharu
  2008-03-14  3:43         ` Kenichi Handa
  0 siblings, 1 reply; 12+ messages in thread
From: YAMAMOTO Mitsuharu @ 2008-03-14  1:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: pogonyshev, emacs-devel, Kenichi Handa

>>>>> On Thu, 13 Mar 2008 11:08:52 -0400, Stefan Monnier <monnier@iro.umontreal.ca> said:

>>> I reported a similar problem.  I believe set-buffer-multibyte is
>>> *a lot* slower now, and may even have a time complexity of O(N^2).

>> I suspect so too.  Now set-buffer-multibyte must convert more 8-bit
>> bytes to multibyte forms, and that results in more movement and
>> enlargement of the gap.  This code:

> Also, IIUC the 8-bit bytes that are not represented as a single byte
> are not only more numerous, but they also take up more space (they
> used to take up just 2 bytes but now they take up what? 3? 4? 5
> bytes?).

IIUC, it still takes up just 2 bytes by cleverly using vacant UTF-8
area (see character.h).
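
This can be checked from Lisp with a small sketch; the function name
sketch-raw-byte-width is made up for illustration.

```elisp
(defun sketch-raw-byte-width (byte)
  "Return how many bytes BYTE occupies in Emacs's multibyte representation."
  ;; `unibyte-string' builds a one-byte unibyte string; converting it
  ;; with `string-to-multibyte' keeps BYTE as a raw 8-bit character,
  ;; and `string-bytes' reports the size of the internal form.
  (string-bytes (string-to-multibyte (unibyte-string byte))))
```

Evaluating it for an ASCII byte such as ?a and for an 8-bit byte such
as #xff shows the difference in internal size.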

I think code_convert_region in Emacs 22 tried to avoid the repeated
reallocations and copies by estimating the necessary buffer size from
the intermediate result of conversion in progress. (And there was a
cast problem that affected its performance:
http://lists.gnu.org/archive/html/emacs-pretest-bug/2004-12/msg00197.html)

Does the code conversion in Emacs 23 include such estimation?

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp





* Re: very slow archive-mode
  2008-03-14  1:56       ` YAMAMOTO Mitsuharu
@ 2008-03-14  3:43         ` Kenichi Handa
  2008-03-14  4:17           ` YAMAMOTO Mitsuharu
  0 siblings, 1 reply; 12+ messages in thread
From: Kenichi Handa @ 2008-03-14  3:43 UTC (permalink / raw)
  To: YAMAMOTO Mitsuharu; +Cc: emacs-devel, monnier, pogonyshev

In article <wl63vqhxvl.wl%mituharu@math.s.chiba-u.ac.jp>, YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes:

> > Also, IIUC the 8-bit bytes that are not represented as a single byte
> > are not only more numerous, but they also take up more space (they
> > used to take up just 2 bytes but now they take up what? 3? 4? 5
> > bytes?).

> IIUC, it still takes up just 2 bytes by cleverly using vacant UTF-8
> area (see character.h).

Yes.

> I think code_convert_region in Emacs 22 tried to avoid the repeated
> reallocations and copies by estimating the necessary buffer size from
> the intermediate result of conversion in progress. (And there was a
> cast problem that affected its performance:
> http://lists.gnu.org/archive/html/emacs-pretest-bug/2004-12/msg00197.html)

> Does the code conversion in Emacs 23 include such estimation?

Not yet.  But it's not related to the problem of slow
set-buffer-multibyte.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: very slow archive-mode
  2008-03-14  3:43         ` Kenichi Handa
@ 2008-03-14  4:17           ` YAMAMOTO Mitsuharu
  0 siblings, 0 replies; 12+ messages in thread
From: YAMAMOTO Mitsuharu @ 2008-03-14  4:17 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel, monnier, pogonyshev

>>>>> On Fri, 14 Mar 2008 12:43:15 +0900, Kenichi Handa <handa@m17n.org> said:

>> Does the code conversion in Emacs 23 include such estimation?

> Not yet.  But it's not related to the problem of slow
> set-buffer-multibyte.

Sorry, I confused this issue with slowness of visiting binary files.

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp





* Re: very slow archive-mode
  2008-03-14  1:03     ` Juri Linkov
@ 2008-03-16  2:23       ` Kenichi Handa
  2008-03-16 16:31         ` Juri Linkov
  0 siblings, 1 reply; 12+ messages in thread
From: Kenichi Handa @ 2008-03-16  2:23 UTC (permalink / raw)
  To: Juri Linkov; +Cc: pogonyshev, monnier, emacs-devel

In article <87k5k6qjp8.fsf@jurta.org>, Juri Linkov <juri@jurta.org> writes:

> Using a separate unibyte buffer may be a good solution, but there is one
> possible problem: it would be difficult to find this hidden separate
> buffer in order to kill it and free up the memory occupied by the
> file buffer after its loading is interrupted (perhaps this buffer can be
> killed using unwind-protect during loading and kill-buffer-hook for C-x k).

> So it seems a separate unibyte buffer would be necessary only if it
> is impossible to get fast reading without leaving the file buffer
> in the modified state.

I installed a change to use no-conversion-multibyte for
archive files.  Please try with the latest code.

> PS. Fortunately, I had a copy of the corrupted archive on a DVD,
> so nothing was lost, but nevertheless this is a damaging problem.

Hearing that, I feel greatly relieved.

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: very slow archive-mode
  2008-03-16  2:23       ` Kenichi Handa
@ 2008-03-16 16:31         ` Juri Linkov
  0 siblings, 0 replies; 12+ messages in thread
From: Juri Linkov @ 2008-03-16 16:31 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: pogonyshev, monnier, emacs-devel

>> Using a separate unibyte buffer may be a good solution, but there is one
>> possible problem: it would be difficult to find this hidden separate
>> buffer in order to kill it and free up the memory occupied by the
>> file buffer after its loading is interrupted (perhaps this buffer can be
>> killed using unwind-protect during loading and kill-buffer-hook for C-x k).
>
>> So it seems a separate unibyte buffer would be necessary only if it
>> is impossible to get fast reading without leaving the file buffer
>> in the modified state.
>
> I installed a change to use no-conversion-multibyte for
> archive files.  Please try with the latest code.

Thank you very much.  Now large archives are opened instantly.

-- 
Juri Linkov
http://www.jurta.org/emacs/



