unofficial mirror of emacs-devel@gnu.org 
From: grischka <grishka@gmx.de>
To: Kenichi Handa <handa@m17n.org>
Cc: ulm@gentoo.org, schwab@linux-m68k.org, monnier@iro.umontreal.ca,
	emacs-devel@gnu.org
Subject: Re: Case mapping of sharp s
Date: Thu, 26 Nov 2009 14:07:51 +0100
Message-ID: <4B0E7DA7.3010402@gmx.de>
In-Reply-To: <tl7r5rn9wsc.fsf@m17n.org>

Kenichi Handa wrote:
> In article <4B0C32BF.2020708@gmx.de>, grischka <grishka@gmx.de> writes:
> 
>> DEC_BOTH is maybe not slower than INC_BOTH, but two DEC_BOTH
>> are (as with Andy's patch).  Moderately slower, still ;)
> 
> So, changing the current backward matching to forward
> matching should be effective.
> 

No, there is no such condition.  There are several ways to avoid
the duplicate DEC_POS, one being to handle the "pattern_len == 0"
case right at the top of the function, for all its branches.
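
A minimal sketch of that idea, assuming a plain byte buffer rather
than an Emacs buffer.  The function name search_backward and its
signature are hypothetical, and this is not the code in search.c; it
only shows the empty-pattern case being settled once, before any
matching branch runs:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backward search over a plain byte buffer (not Emacs code).
   Returns the start offset of the last occurrence of PAT that ends at or
   before LIMIT, or -1 if there is none.  */
static ptrdiff_t
search_backward (const char *buf, size_t limit,
                 const char *pat, size_t pat_len)
{
  /* Empty pattern: decided once here, so no later branch needs
     extra position adjustments for it.  It matches at LIMIT.  */
  if (pat_len == 0)
    return (ptrdiff_t) limit;

  /* A pattern longer than the searchable region cannot match.  */
  if (pat_len > limit)
    return -1;

  /* Try candidate start offsets from the highest down to 0.  */
  for (size_t start = limit - pat_len + 1; start-- > 0; )
    if (memcmp (buf + start, pat, pat_len) == 0)
      return (ptrdiff_t) start;

  return -1;
}
```

With the special case out of the way, the loop can assume a non-empty
pattern and needs no second adjustment of the position.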

>> The originally observed slowness was not because of the usage of
>> CHAR_TO_BYTE, but because of the flaws in CHAR_TO_BYTE, such as
>> using unrelated "best_below" and "best_above" in the same expression.
> 
>> For the numbers, with my 100MB file test case:
> 
>> backward search previously:
>> 	14 .. 90 s (random)
>> backward search with fixed CHAR_TO_BYTE:
>> 	5.6 s
> 
> I don't see any fix of CHAR_TO_BYTE in the current CVS
> code.  Where is it?

Those tests were made with ad hoc modifications as needed. There
was also some code to measure the times, of course.
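
For illustration only, here is a sketch of a character-to-byte
conversion over UTF-8 text that keeps each known position as one
(charpos, bytepos) pair, so the two anchors cannot get mixed up in a
single expression.  This is not the CHAR_TO_BYTE macro and not the ad
hoc modification used for the tests; struct anchor and char_to_byte
are made-up names, and the caller is assumed to supply valid anchors
with below.charpos <= target <= above.charpos:

```c
#include <stddef.h>

/* A known (character position, byte position) pair; hypothetical type.  */
struct anchor { size_t charpos; size_t bytepos; };

/* True if C is a UTF-8 continuation byte (10xxxxxx).  */
static int
is_continuation (unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Convert character position TARGET to a byte position in BUF,
   walking from whichever anchor is nearer in characters.  */
static size_t
char_to_byte (const unsigned char *buf, struct anchor below,
              struct anchor above, size_t target)
{
  size_t c, b;

  if (target - below.charpos <= above.charpos - target)
    {
      /* Walk forward from the lower anchor, one character at a time.  */
      c = below.charpos;
      b = below.bytepos;
      while (c < target)
        {
          b++;
          while (b < above.bytepos && is_continuation (buf[b]))
            b++;
          c++;
        }
    }
  else
    {
      /* Walk backward from the upper anchor.  */
      c = above.charpos;
      b = above.bytepos;
      while (c > target)
        {
          b--;
          while (is_continuation (buf[b]))
            b--;
          c--;
        }
    }
  return b;
}
```

The cost is proportional to the distance from the nearer anchor, which
is why the choice of anchors matters so much on a 100MB buffer.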

>> In any case, with some tweaking it is possible to improve both
>> directions by ~70% (that is down to about 1 sec for the test
>> case).  I still don't know why boyer_moore with a one-char
>> pattern takes only 0.5 seconds though.  It's amazingly fast.
> 
> Are you comparing both methods with the same value of
> case-fold-search?

Same value, but not same search patterns.  One with "sharp s",
one without.

Actually I just wanted to check the facts about the "sharp s" patch
originally proposed in this thread, because some people wrote it
would be too slow.  FWIW I don't think it would be any problem.

>> Btw it seems that the long loading time for the big file has much to
>> do with inefficient counting of newlines.  Apparently it takes
>> ~2 sec to load the file and then another ~6 sec to scan newlines.
>> It should be (far) under 0.5 sec.
> 
> Why is the code of counting newlines called when we just
> visit a file?

I have no idea why.  Opening the 100MB file would call scan_buffer
(for \n) 67637 times.  The file has 3142771 lines though, so I take
it back: it's probably not "counting newlines" in that sense.  Maybe
it comes from  "Loading cc-langs ..." which happens after the first
2 seconds.
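
For comparison, a single pass that counts every newline in a
contiguous buffer can be sketched with memchr as below.  This is not
scan_buffer and makes no claim about what Emacs does on file load;
count_newlines is a hypothetical helper over a plain char array:

```c
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in BUF[0..LEN) with one memchr call per newline.  */
static size_t
count_newlines (const char *buf, size_t len)
{
  size_t count = 0;
  const char *p = buf;
  const char *end = buf + len;

  while (p < end)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      if (!nl)
        break;            /* No further newline in the remainder.  */
      count++;
      p = nl + 1;         /* Resume just past the newline found.  */
    }
  return count;
}
```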

--- grischka

> 
> ---
> Kenichi Handa
> handa@m17n.org
> 





