unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#33708: 26.1.90; nhexl-mode performance
@ 2018-12-11 17:06 Guido Kraemer
  2018-12-14 18:40 ` Stefan Monnier
  0 siblings, 1 reply; 3+ messages in thread
From: Guido Kraemer @ 2018-12-11 17:06 UTC (permalink / raw)
  To: 33708

Filing a bug report because of the discussion on

https://emacs.stackexchange.com/questions/46492/how-to-search-for-a-sequence-of-bytes-in-hexl-mode/ 


nhexl-mode performance is really bad in large files.

Occurred in the files of the Bitcoin blockchain. The beginning of every
block is marked with the byte sequence `f9beb4d9`. Searching for this
byte sequence in nhexl-mode is really slow.

In case you do not have the bitcoin client installed, I uploaded the
first file of the blockchain here (134MB, link will be valid for 30
days):

https://ufile.io/z08bl

Thanks for looking into this.

Guido.






In GNU Emacs 26.1.90 (build 1, x86_64-pc-linux-gnu, X toolkit, Xaw 
scroll bars)
of 2018-12-01 built on uranus
Repository revision: 7851ae8b443c62a41ea4f4440512aa56cc87b9b7
Windowing system distributor 'The X.Org Foundation





^ permalink raw reply	[flat|nested] 3+ messages in thread

* bug#33708: 26.1.90; nhexl-mode performance
  2018-12-11 17:06 bug#33708: 26.1.90; nhexl-mode performance Guido Kraemer
@ 2018-12-14 18:40 ` Stefan Monnier
       [not found]   ` <e64cb69b-c63c-0f8e-dcc5-b8d56006b67d@bgc-jena.mpg.de>
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Monnier @ 2018-12-14 18:40 UTC (permalink / raw)
  To: Guido Kraemer; +Cc: 33708

> Occurred in the files of the Bitcoin blockchain. The beginning of every
> block is marked with the byte sequence `f9beb4d9`. Searching for this
> byte sequence in nhexl-mode is really slow.

Duh, indeed it's painful, and it's a plain performance bug in nhexl-mode.
I just installed the patch below which seems to fix it (along with
another less severe bug).
The corresponding 1.2 package should appear soon on GNU ELPA.

Thanks for the test case (tho it was painful to get due to having to go
through my browser and coerce it to run non-free Javascript code).


        Stefan


diff --git a/packages/nhexl-mode/nhexl-mode.el b/packages/nhexl-mode/nhexl-mode.el
index 89d91182f..a52a90081 100644
--- a/packages/nhexl-mode/nhexl-mode.el
+++ b/packages/nhexl-mode/nhexl-mode.el
@@ -807,22 +807,31 @@ Return the corresponding nibble, if applicable."
       (push (string-to-number (substring string i (+ i 2)) 16)
             chars)
       (setq i (+ i 2)))
-    (let* ((base (regexp-quote (apply #'string (nreverse chars))))
-           (newstr
-            (if (>= i (length string))
-                base
-              (cl-assert (= (1+ i) (length string)))
-              (let ((nibble (string-to-number (substring string i) 16)))
-                ;; FIXME: if one of the two bounds is a special char
-                ;; like `]` or `^' we can get into trouble!
-                (format "%s[%c-%c]" base
-                        (* 16 nibble)
-                        (+ 15 (* 16 nibble)))))))
+    (let* ((base (regexp-quote (apply #'unibyte-string (nreverse chars))))
+           (re
+            (concat (if (>= i (length string))
+                        base
+                      (cl-assert (= (1+ i) (length string)))
+                      (let ((nibble (string-to-number (substring string i) 16)))
+                        ;; FIXME: if one of the two bounds is a special char
+                        ;; like `]` or `^' we can get into trouble!
+                        (concat base
+                                (unibyte-string ?\[ (* 16 nibble) ?-
+                                                   (+ 15 (* 16 nibble)) ?\]))))
+                    ;; We also search for the literal hex string here, so the
+                    ;; search stops as soon as one is found, otherwise we too
+                    ;; easily fall into the trap of bug#33708 where at every
+                    ;; cycle we first search unsuccessfully through the whole
+                    ;; buffer with one kind of search before trying the
+                    ;; other search.
+                    ;; Don't bother regexp-quoting the string since we know
+                    ;; it's only made of hex chars!
+                    "\\|" string)))
       (let ((case-fold-search nil))
         (funcall (if isearch-forward
                      #'re-search-forward
                    #'re-search-backward)
-                 newstr bound noerror)))))
+                 re bound noerror)))))
 
 (defun nhexl--isearch-search-fun (orig-fun)
   (let ((def-fun (funcall orig-fun)))
@@ -830,9 +839,18 @@ Return the corresponding nibble, if applicable."
       (unless bound
         (setq bound (if isearch-forward (point-max) (point-min))))
       (let ((startpos (point))
-            (def (funcall def-fun string bound noerror)))
-        ;; Don't search further than what `def-fun' found.
-        (if def (setq bound (match-beginning 0)))
+            def)
+        ;; Hex address search.
+        (when (and nhexl-isearch-hex-addresses
+                   (> (length string) 1)
+                   (string-match-p "\\`[[:xdigit:]]+:?\\'" string))
+          ;; Could be a hexadecimal address.
+          (goto-char startpos)
+          (let ((newdef (nhexl--isearch-match-hex-address string bound noerror)))
+            (when newdef
+              (setq def newdef)
+              (setq bound (match-beginning 0)))))
+        ;; Hex bytes search
         (when (and nhexl-isearch-hex-bytes
                    (> (length string) 1)
                    (string-match-p "\\`[[:xdigit:]]+\\'" string))
@@ -842,12 +860,10 @@ Return the corresponding nibble, if applicable."
             (when newdef
               (setq def newdef)
               (setq bound (match-beginning 0)))))
-        (when (and nhexl-isearch-hex-addresses
-                   (> (length string) 1)
-                   (string-match-p "\\`[[:xdigit:]]+:?\\'" string))
-          ;; Could be a hexadecimal address.
+        ;; Normal search.
+        (progn
           (goto-char startpos)
-          (let ((newdef (nhexl--isearch-match-hex-address string bound noerror)))
+          (let ((newdef (funcall def-fun string bound noerror)))
             (when newdef
               (setq def newdef)
               (setq bound (match-beginning 0)))))
@@ -909,17 +925,19 @@ Return the corresponding nibble, if applicable."
             #'nhexl--isearch-highlight-cleanup)
 (defun nhexl--isearch-highlight-cleanup (&rest _)
   (when (and nhexl-mode nhexl-isearch-hex-highlight)
-    (dolist (ol isearch-lazy-highlight-overlays)
-      (when (and (overlayp ol) (eq (overlay-buffer ol) (current-buffer)))
-        (put-text-property (overlay-start ol) (overlay-end ol)
-                           'fontified nil)))))
+    (with-silent-modifications
+      (dolist (ol isearch-lazy-highlight-overlays)
+        (when (and (overlayp ol) (eq (overlay-buffer ol) (current-buffer)))
+          (put-text-property (overlay-start ol) (overlay-end ol)
+                             'fontified nil))))))
 
 (advice-add 'isearch-lazy-highlight-match :after
             #'nhexl--isearch-highlight-match)
 (defun nhexl--isearch-highlight-match (&optional mb me)
   (when (and nhexl-mode nhexl-isearch-hex-highlight
              (integerp mb) (integerp me))
-    (put-text-property mb me 'fontified nil)))
+    (with-silent-modifications
+      (put-text-property mb me 'fontified nil))))
 
 (defun nhexl--line-width-watcher (_sym _newval op where)
   (when (eq op 'set)





^ permalink raw reply related	[flat|nested] 3+ messages in thread

* bug#33708: 26.1.90; nhexl-mode performance
       [not found]   ` <e64cb69b-c63c-0f8e-dcc5-b8d56006b67d@bgc-jena.mpg.de>
@ 2018-12-14 22:52     ` Stefan Monnier
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Monnier @ 2018-12-14 22:52 UTC (permalink / raw)
  To: Guido Kraemer

> - Thanks for fixing this, performance is great now!

Thanks for confirming.

> - Sorry for using that weird site for file sharing, you could have installed
> the bitcoin client (which is MIT licensed) and downloaded the
> blockchain ;-),

I hesitated to do that, indeed.

> what would you use for ad-hoc file sharing?

I'd put it on my web server, not linked from any page.

> - I think there is another minor bug: When the cursor is at the very
> beginning of the buffer and you search for the byte sequence at the very
> beginning of the file, search will jump to the second occurrence. Happens in
> the example of the original bug report.

Yeah, it's a misfeature that I'm not sure how to fix:

When you type `C-s f a`, you first search for `f` and this one is not
treated as a hex-search so it jumps to the first `f` char, so when you
get to type `a` Isearch keeps searching from that `f` rather than
restarting from the beginning of the buffer.

I could change the rule so that `C-s f` already treats `f` as
a hex-search, I guess.


        Stefan





^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-12-14 22:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-11 17:06 bug#33708: 26.1.90; nhexl-mode performance Guido Kraemer
2018-12-14 18:40 ` Stefan Monnier
     [not found]   ` <e64cb69b-c63c-0f8e-dcc5-b8d56006b67d@bgc-jena.mpg.de>
2018-12-14 22:52     ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).