unofficial mirror of emacs-devel@gnu.org 
From: "Óscar Fuentes" <ofv@wanadoo.es>
To: emacs-devel@gnu.org
Subject: Re: Recommend these .gitconfig settings for git integrity.
Date: Wed, 03 Feb 2016 17:20:47 +0100
Message-ID: <87k2ml4vfk.fsf@wanadoo.es>
In-Reply-To: 56B1ACB4.7090705@cs.ucla.edu

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 02/02/2016 10:48 AM, Óscar Fuentes wrote:
>> Emacs is no small project.
>
> Emacs is considerably smaller than other projects that Git regularly
> deals with. The Emacs master branch has 125,000 commits; the Linux
> kernel has 574,000 commits in its master branch (this omits history
> before 2005). I've seen reports of git repositories at Facebook where
> the .git subdirectory contains 50 GB. By comparison, my repository for
> Emacs master has 0.25 GB and for the Linux kernel has 1.7 GB. These
> numbers can vary quite a bit depending on packing and so forth; still,
> the point remains that Emacs is not nearly as large as other projects
> that use git.

The fact that there are projects with a larger repo than Emacs doesn't
mean that Emacs is not large.

>> If the slow down affects large transfers ("large" meaning either
>> many objects or big objects) what happens if an Emacs hacker pauses
>> his activity for several months and then pulls? (after monitoring
>> emacs-diffs for several years, I can attest that this scenario is
>> quite frequent.) 
>
> I doubt whether it will slow down such pulls significantly for typical
> Emacs development. I just now did a simple benchmark that cloned Emacs
> master from savannah to my desktop at UCLA, and the overhead from the
> transfer.fsckObjects setting was swamped by noise due to the network
> being slower or faster. Here are a few details:
>
>     average times
>     real      user+system   fsckObjects value
>     212.4       202.6           true
>     217.1       195.7           false
>
>     The command used was:
>     git clone --config transfer.fsckObjects=VALUE git://git.sv.gnu.org/emacs.git
>
>     I warmed up with a clone that I discarded. I then tried three of
>     each command, interleaved, and took the averages.
>
> This is just one benchmark of course, but it's suggestive.
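
For reference, the relative differences implied by the quoted table can be worked out with a short shell sketch. The figures are copied from the table; the variable names and the `rel_diff` helper are made up for illustration and are not part of the original benchmark.

```shell
# Average times in seconds, copied from the quoted table.
real_true=212.4;  real_false=217.1
cpu_true=202.6;   cpu_false=195.7

# Hypothetical helper: difference of A relative to B, in percent.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
rel_diff() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f", (a - b) / b * 100 }'
}

real_delta=$(rel_diff "$real_true" "$real_false")
cpu_delta=$(rel_diff "$cpu_true" "$cpu_false")
echo "fsckObjects=true vs false: real ${real_delta}%, user+system ${cpu_delta}%"
```

With these numbers the wall-clock time is about 2% lower and the CPU time about 3.5% higher with the setting enabled. With only three runs per setting, differences this small may well be within run-to-run variation, which is the point made in the quoted text about network noise.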

You are hypothesizing about the reasons why fsckObjects is not the
default. Instead of making up reasons and proving yourself right, the
reasonable thing to do is to ask the git maintainers. If they respond
citing performance concerns, you prove *them* wrong with some
experiments, so they get convinced about enabling the option.

I wouldn't trust any experiment about fsckObjects without asking the git
maintainers about its specific drawbacks (not just "performance", but
"performance on such and such scenario.")

>> If we wish to avoid tainted objects created by whatever cause, the
>> check can be enabled on Savannah's repo, hence limiting the
>> problem to the "infected" user.
>
> The problem would not be limited to the infected user, if that user
> pushes commits. Git on the client could remove the taint without
> actually fixing the problem. Although the pushed commits would have
> checksums that match their data, the pushed data would be corrupt.
>
> A possible source of problems in this area is an attack on the
> integrity of the Emacs repository by a determined outsider. Any such
> attacks cannot be discounted by appealing to estimates based purely on
> counts of random hardware or software or configuration errors, as the
> attacks would not be random.

So enabling the check on the server doesn't protect it, but for
protecting the server the check must be enabled on the user's clone...
That doesn't make sense. It either works or doesn't. If you are able to
push tainted objects to a server that has the check enabled, you can
also put them into the user's repo.
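
As a minimal sketch of what enabling the check on the receiving side could look like: git's `receive.fsckObjects` covers objects arriving via push, and `transfer.fsckObjects` is the fallback used when the receive-specific or fetch-specific key is unset. The scratch bare repository below is a hypothetical stand-in for a server-side repo; it does not describe how Savannah is actually configured.

```shell
# Scratch bare repository standing in for the server side (hypothetical).
server=$(mktemp -d)
git init --quiet --bare "$server"

# Ask git to check objects that arrive via push.
git -C "$server" config receive.fsckObjects true

# Read the value back to confirm it was recorded.
server_setting=$(git -C "$server" config --get receive.fsckObjects)
echo "receive.fsckObjects=$server_setting"
```

This only shows where the switch lives; it says nothing about which kinds of corruption the check catches, which is the actual point of disagreement here.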

>> The problem seems to be so rare that a single Emacs hacker
>> experiencing it every decade or so
>
> Perhaps you're right, but perhaps not. Really, we do not know how
> common this particular problem is. In my experience strange problems
> with git occur more often than once per decade.

Yes, and it is reasonable to expect that the more features you activate,
the more problems can arise.

>> doesn't warrant the risks and inconveniences associated with using
>> the setting (see my previous messages.)
>
> Although I recall concerns based on hypothetical scenarios, I don't
> recall descriptions of real inconveniences. Perhaps I missed an email
> or two.

See above.

[snip]

>> Thanks, although it is now a bit late. It is already installed on
>> several repos, for sure. And anybody that stumbles on the revisions that
>> contained your change (by bisecting some bug, for example) will have his
>> .git/config file modified. I'm pretty sure that you didn't think of these
>> issues when you made the change (neither did I at first.)
>
> No, I anticipated these issues. There's no evidence that they are
> significant problems. git bisect will still work.

Of course bisect will work. My point is that running autogen.sh on any
of those commits will put the setting in .git/config, even though you
reverted autogen.sh in a subsequent commit. So people like me, who often
use bisect, will end up with the setting in .git/config again and again.

And we are talking about a 4-day timeframe. Too early to talk about
evidence.
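
For anyone who ends up with the setting after a bisect session, a sketch of checking for it and removing it from a clone's local configuration follows. It assumes the key in question is `transfer.fsckObjects`, the one named earlier in this thread, and it uses a throwaway repository (hypothetical, created with `mktemp`) rather than a real Emacs checkout.

```shell
# Throwaway repository standing in for a developer's clone (hypothetical).
repo=$(mktemp -d)
git init --quiet "$repo"

# Simulate the situation described above: the setting lands in .git/config.
git -C "$repo" config --local transfer.fsckObjects true

# Remove it again. --unset exits non-zero if the key is absent, hence "|| true".
git -C "$repo" config --local --unset transfer.fsckObjects || true

# --get prints nothing and fails when the key is missing; report "unset" then.
remaining=$(git -C "$repo" config --local --get transfer.fsckObjects || echo "unset")
echo "transfer.fsckObjects: $remaining"
```

In a real checkout the same two `git config --local` commands would be run from inside the working tree, without `-C`.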

>> I appreciate your concern but that has an easy solution: enable it
>> on the server (and on your own machine, to be extra sure.) See,
>> problem solved
>
> That would not solve the problem. True, it would help against some
> random errors, but it won't work in general even there, much less
> against a determined attack.
>
> I take your point that there's no rush in making this change, and it
> will be helpful to gain experience on it among developers who prefer a
> more bleeding-edge environment, so I have reverted the change on
> emacs-25 and installed a considerably more conservative approach on
> master to help us get started. The new approach does not alter git
> configuration if a developer invokes plain './autogen.sh'. Instead a
> developer can invoke the script with a newly-introduced extension:
> either './autogen.sh git' to configure just git, or './autogen.sh all'
> to configure both autoconf and git. I hope this helps address the
> concerns raised in this thread.

Thank you. But in the future, I would appreciate it if you discussed a
change like this before rushing to commit it and causing irreversible
consequences.



