From: Óscar Fuentes
Newsgroups: gmane.emacs.bugs
Subject: bug#13949: 24.4.1; `fill-paragraph' should not always put the buffer as modified
Date: Mon, 28 Mar 2016 00:03:05 +0200
Message-ID: <87lh538tpi.fsf@wanadoo.es>
References: <56F12360.5030301@ro.ru> <83y49a4hga.fsf@gnu.org>
 <56F1837D.4060300@ro.ru> <83io0e4b5r.fsf@gnu.org>
 <56F19203.5040501@ro.ru> <87a8lkd2bc.fsf@wanadoo.es>
 <9d1fed3c-fdcb-dfe3-e04d-47680d3e0531@yandex.ru>
 <87egawaq84.fsf@wanadoo.es>
 <5ae788d4-42cc-e1f9-dfa4-c25ff2acc10f@yandex.ru>
 <6d4fb517-8fc7-6d8f-afce-9387e509a46c@yandex.ru>
 <87shzb8wd5.fsf@wanadoo.es>
 <343192e9-ea13-21fa-45be-b2d1737bedb7@yandex.ru>
To: Dmitry Gutov
Cc: John Wiegley, Lars Magne Ingebrigtsen, 13949@debbugs.gnu.org
In-Reply-To: <343192e9-ea13-21fa-45be-b2d1737bedb7@yandex.ru> (Dmitry
 Gutov's message of "Mon, 28 Mar 2016 00:20:08 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.92 (gnu/linux)

Dmitry Gutov writes:

> On 03/28/2016 12:05 AM, Óscar Fuentes wrote:
>
>> I guess that the extra bits of entropy (160 vs 128) were a "fuzzy-warm"
>> factor too on using SHA-1 instead of MD5. Git must avoid collisions
>> among potentially hundreds of millions of objects (repos with that size
>> already exist or will exist in the near future.)
>
> Are there fewer different texts we'd have to be able to discern?

As stated in my previous message, statistically it is entirely
different to avoid collisions between a pair of objects than within an
arbitrarily large set. In this case we are in the pair scenario. IIUC,
Lars' idea about using hashes on buffers to test for modification is
also the pair case.

>> Each and every hash
>> must be different from all the others and hence avoid the Birthday
>> Problem. Anyway, 128 bit hashes still would be good enough for those
>> huge repos. fill-paragraph needs to discriminate only between 2 chunks
>> of data.
>
> I think you mean "2 chunks of data that must only be different in
> positioning and presence of newlines". Then yes, the odds of a
> collision must be slim. Still, I haven't seen (or performed) a
> sufficient analysis to evaluate them.
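
A rough back-of-the-envelope estimate of the two scenarios can be
sketched as follows. It assumes an ideal hash with uniformly
distributed digests, which is a simplification and not a proven
property of MD5 or SHA-1, and it uses the usual birthday-bound
approximation. The function names and the figure of 500 million
objects are illustrative only.

```python
import math


def pair_collision_probability(bits: int) -> float:
    """Chance that two given distinct inputs share a digest (ideal hash)."""
    return 2.0 ** -bits


def set_collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation of any collision among n digests."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when
    # the exponent is tiny.
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    objects = 500_000_000  # hypothetical "hundreds of millions" repo
    for bits in (128, 160):
        print(bits,
              pair_collision_probability(bits),
              set_collision_probability(objects, bits))
```

Under those assumptions the set case is many orders of magnitude more
likely to collide than the pair case, yet both stay negligible at 128
bits.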

For naturally occurring modifications (as opposed to modifications
specially chosen to create collisions), inserting newlines or any other
string makes little difference to the hash algorithm.

>>> b) Git has a global object index. It _can_ detect collisions, or at
>>> least that detection can be implemented.
>>
>> And what to do when a collision is detected?
>
> Abort the current operation? Wait 50ms and retry creating the commit?
> Not 100% sure how the file contents are indexed: e.g. whether mtime
> factors into its hash value, too.

This would not work, for several reasons (colliding commits exist
before they are merged or incorporated into a repo where they meet;
file and tree objects, whose content is identified by their SHA-1
hashes, cannot be "retried"; etc.) Having a collision is something that
Should Not Happen in Git, and the designers chose a crypto hash
precisely because those algorithms are the best at avoiding collisions.

>> Back to the topic, your suggestion about comparing the pre- and post-
>> contents of the paragraph (and avoiding huge copies of the pre- contents
>> by restricting the copied area to the paragraph itself) does not work
>> when the file contains just one paragraph. Try visiting a big CSV dump
>> or log and press M-q. You can abort the operation with C-g, but if Emacs
>> starts to swap like crazy or exceeds the process memory limit and it is
>> killed...
>
> You can choose to skip the "did it change" check if the region to
> check is too long. If the dump was one huge line, we can be confident
> that it will be changed upon filling.

What about a file with lots of lines? If you intentionally press M-q on
such a file and see the modified indicator, you will either assume that
the file changed or use `diff-buffer-with-file' to check for
modifications, and possibly be greeted with a very long diff (possibly
longer than the original file) that will bring Emacs to its knees.

Using the hash approach puts the "too long" threshold at a higher level
(or eliminates it altogether), does not require extra memory, and is
simpler to implement.

Dmitry, if your proposal about comparing the paragraphs is motivated
only by your fear of hash collisions, you are way off the mark
there :-)
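
The hash approach discussed above can be sketched roughly like this.
It is a Python illustration, not Emacs code: the `Buffer` class, the
`digest` and `fill_buffer` helpers, and the use of `textwrap.fill` as a
stand-in for `fill-paragraph' are all assumptions made for the example.
The point is only that a fixed-size digest is kept across the fill
instead of a copy of the old text.

```python
import hashlib
import textwrap


class Buffer:
    """Hypothetical stand-in for an editor buffer: text plus a flag."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.modified = False


def digest(text: str) -> bytes:
    # Only the 20-byte SHA-1 digest is retained, not a copy of the text.
    return hashlib.sha1(text.encode("utf-8")).digest()


def fill_buffer(buf: Buffer, width: int = 70) -> None:
    """Refill the text; set the modified flag only if the digest changed."""
    before = digest(buf.text)
    # textwrap.fill is an assumed stand-in for the real filling command.
    buf.text = textwrap.fill(buf.text, width=width)
    if digest(buf.text) != before:
        buf.modified = True


if __name__ == "__main__":
    buf = Buffer("word " * 40)
    fill_buffer(buf)
    print(buf.modified)
    fill_buffer(Buffer(buf.text))  # refilling filled text changes nothing
```

In Emacs itself the digest would presumably come from a primitive such
as `secure-hash' applied to the affected region; that detail is left
open here.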