unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* `git patch-id --stable' vs quoted-printable
@ 2022-08-22  2:25 Eric Wong
  2022-08-22  4:06 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2022-08-22  2:25 UTC (permalink / raw)
  To: git; +Cc: meta

While poking around at the newish patchid indexing support in
public-inbox[1], I noticed an inconsistency in how it seems to
mishandle quoted-printable messages.

For instance, René's message can apply fine with `git am':

https://public-inbox.org/git/6727daf1-f077-7319-187e-ab4e55de3b2d@web.de/raw

However, note the blank context lines are truly blank, as in /^$/,
not /^ $/.  Running `git patch-id --stable' on the decoded content
of that message gives me: a245e99dbd2ce6e319955569eb8a111cb044f474
And that's the value public-inbox indexes.

That patch ends up being commit fc0f8bcd64eb0a03a7098f72da9c4008bd48cf11
in git.git.

However,
`git show fc0f8bcd64eb0a03a7098f72da9c4008bd48cf11 |git patch-id --stable'
gives me: fcac4cf581e11b703c229a129072c95c79b68bf

So, I'm wondering if the search indexing code of public-inbox
should s/^$/ /mgs before feeding stuff to `git patch-id'; and/or
if `git patch-id' should be assuming empty lines and lines with a
single SP are the same...


[1] https://public-inbox.org/meta/20220620192730.550803-3-e@80x24.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22  2:25 `git patch-id --stable' vs quoted-printable Eric Wong
@ 2022-08-22  4:06 ` Junio C Hamano
  2022-08-22  4:18   ` Junio C Hamano
  2022-08-22 15:58   ` René Scharfe
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-08-22  4:06 UTC (permalink / raw)
  To: Eric Wong, René Scharfe; +Cc: git, meta

Eric Wong <e@80x24.org> writes:

> While poking around at the newish patchid indexing support in
> public-inbox[1], I noticed an inconsistency in how it seems to
> mishandle quoted-printable messages.
> ...
> So, I'm wondering if the search indexing code of public-inbox
> should s/^$/ /mgs before feeding stuff to `git patch-id'; and/or
> if `git patch-id' should be assuming empty lines and lines with a
> single SP are the same...

I suspect that QP is a red herring.  I haven't looked at relevant
code at all for a while, but what I think is going on is:

 * patch-id algorithm was written back when "unified" format of
   "diff" did not have the extension of GNU origin to allow an empty
   context line to be expressed as a truely empty line, not a single
   whitespace that signals it is a context line, followed by the
   contents of the line that is empty

 * "git apply" hence "git am" was taught to grok the empty context
   line extention, https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
   has this:

      It is implementation-defined whether an empty unaffected line is
      written as an empty line or a line containing a single <space> character.

   IIRC, this was added after GNU diff started emitting such an
   output (--suppress-blank-empty) and people complained that such a
   patch is not understood by us.

 * "git diff" was updated to allow this with diff.suppressBlankEmpty
   configuration , but that is never turned on by default.

So, if a patch producer runs "git diff" with diff.suppressBlankEmpty
turned on, "git am" accepts it, and then you run "git show" without
the configuration, then the "shape" of the patch text would be
slightly different.  I do not offhand know if we added configuration
support to "patch-id", but even with a configuration knob, because
once you turn incoming e-mail into a commit, the single bit (i.e.
whether suppressBlankEmpty was in use or not) is forever lost, it
would not be of much help.  After all, the incoming patch can be
hand munged to use both "single whitespace and the end of line" and
"a completely empty line" to record an empty context line, and "am"
has to take such a patch happily.

I *think* the right thing to do is for patch-id that takes text
input to normalize the empty context line into one form or the other
(as a conservatist, I would say we should probably pretend as if an
empty context line is always expressed as a single whitespace on a
line by itself) before computing the ID.

René, do you remember if you used diff.suppressBlankEmpty
configuration when generating the patch in question at:

    https://public-inbox.org/git/6727daf1-f077-7319-187e-ab4e55de3b2d@web.de/raw

by the way?

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22  4:06 ` Junio C Hamano
@ 2022-08-22  4:18   ` Junio C Hamano
  2022-08-22  4:57     ` Eric Wong
  2022-08-22 15:58   ` René Scharfe
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-08-22  4:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: René Scharfe, git, meta

Junio C Hamano <gitster@pobox.com> writes:

>> So, I'm wondering if the search indexing code of public-inbox
>> should s/^$/ /mgs before feeding stuff to `git patch-id'; and/or
>> if `git patch-id' should be assuming empty lines and lines with a
>> single SP are the same...

Another potential source of issues (not for the patch from René that
was used as an example) is that a patch producer can use different
diff algorithm from the setting you would use to index resulting
commits via "git show | git patch-id".

    $ git show -U5 | git patch-id
    $ git show | git patch-id

likely result in different patch IDs.  

The --patience and the --histogram options affect how common lines
are matched up, again affecting the shape of the patches you compute
patch-ids over.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22  4:18   ` Junio C Hamano
@ 2022-08-22  4:57     ` Eric Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2022-08-22  4:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git, meta

Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> So, I'm wondering if the search indexing code of public-inbox
> >> should s/^$/ /mgs before feeding stuff to `git patch-id'; and/or
> >> if `git patch-id' should be assuming empty lines and lines with a
> >> single SP are the same...
> 
> Another potential source of issues (not for the patch from René that
> was used as an example) is that a patch producer can use different
> diff algorithm from the setting you would use to index resulting
> commits via "git show | git patch-id".
> 
>     $ git show -U5 | git patch-id
>     $ git show | git patch-id
> 
> likely result in different patch IDs.  
> 
> The --patience and the --histogram options affect how common lines
> are matched up, again affecting the shape of the patches you compute
> patch-ids over.

Yes, you're right; though I suppose most users use git defaults.
I'm not sure how useful patchid ends up being, actually...
The old use of dfpre/dfpost blob OIDs seems to have been working
well for years, already.

Possibly OR-ing them with patchid in Xapian will end up getting
good enough search engine coverage.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22  4:06 ` Junio C Hamano
  2022-08-22  4:18   ` Junio C Hamano
@ 2022-08-22 15:58   ` René Scharfe
  2022-08-22 16:21     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: René Scharfe @ 2022-08-22 15:58 UTC (permalink / raw)
  To: Junio C Hamano, Eric Wong; +Cc: git, meta

[-- Attachment #1: Type: text/plain, Size: 425 bytes --]

Am 22.08.22 um 06:06 schrieb Junio C Hamano:
> René, do you remember if you used diff.suppressBlankEmpty
> configuration when generating the patch in question at:
>
>     https://public-inbox.org/git/6727daf1-f077-7319-187e-ab4e55de3b2d@web.de/raw
>
> by the way?

I did not use that option.  Attached the copy from my Sent folder, which
has spaces at the start of the blank lines in the config.txt hunk.

René

[-- Attachment #2: [PATCH] revert: optionally refer to commit in the "reference" format.eml --]
[-- Type: message/rfc822, Size: 2823 bytes --]

From: "René Scharfe" <l.s.r@web.de>
To: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] revert: optionally refer to commit in the "reference" format
Date: Sun, 26 Jun 2022 11:29:35 +0200
Message-ID: <6727daf1-f077-7319-187e-ab4e55de3b2d@web.de>

Am 22.05.22 um 06:32 schrieb Junio C Hamano:
> ---
>  Documentation/config/revert.txt |  3 +++
>  Documentation/git-revert.txt    |  9 +++++++++
>  builtin/revert.c                |  2 ++
>  sequencer.c                     | 32 +++++++++++++++++++++++++++-----
>  sequencer.h                     |  1 +
>  t/t3501-revert-cherry-pick.sh   | 31 +++++++++++++++++++++++++++++++
>  6 files changed, 73 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/config/revert.txt
>
> diff --git a/Documentation/config/revert.txt b/Documentation/config/revert.txt
> new file mode 100644
> index 0000000000..797bfb6d62
> --- /dev/null
> +++ b/Documentation/config/revert.txt
> @@ -0,0 +1,3 @@
> +revert.reference::
> +	Setting this variable to true makes `git revert` to behave
> +	as if the `--reference` option is given.

Shouldn't this be "were" instead of "is"?  Not fully sure.  Anyway:

--- >8 ---
Subject: [PATCH] revert: config documentation fixes

43966ab315 (revert: optionally refer to commit in the "reference"
format, 2022-05-26) added the documentation file config/revert.txt.
Actually include it in config.txt.

Make is used with a bare infinitive after the object; remove the "to".

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/config.txt        | 2 ++
 Documentation/config/revert.txt | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e284b042f2..e376d547ce 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -495,6 +495,8 @@ include::config/repack.txt[]

 include::config/rerere.txt[]

+include::config/revert.txt[]
+
 include::config/safe.txt[]

 include::config/sendemail.txt[]
diff --git a/Documentation/config/revert.txt b/Documentation/config/revert.txt
index 797bfb6d62..802d6faca2 100644
--- a/Documentation/config/revert.txt
+++ b/Documentation/config/revert.txt
@@ -1,3 +1,3 @@
 revert.reference::
-	Setting this variable to true makes `git revert` to behave
+	Setting this variable to true makes `git revert` behave
 	as if the `--reference` option is given.
--
2.36.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22 15:58   ` René Scharfe
@ 2022-08-22 16:21     ` Junio C Hamano
  2022-08-22 17:01       ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-08-22 16:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Wong, git, meta

René Scharfe <l.s.r@web.de> writes:

> Am 22.08.22 um 06:06 schrieb Junio C Hamano:
>> René, do you remember if you used diff.suppressBlankEmpty
>> configuration when generating the patch in question at:
>>
>>     https://public-inbox.org/git/6727daf1-f077-7319-187e-ab4e55de3b2d@web.de/raw
>>
>> by the way?
>
> I did not use that option.  Attached the copy from my Sent folder, which
> has spaces at the start of the blank lines in the config.txt hunk.

Hmph, so I cannot quite explain what removed the leading spaces from
the copy public-inbox archived.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22 16:21     ` Junio C Hamano
@ 2022-08-22 17:01       ` Eric Wong
  2022-08-22 18:25         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2022-08-22 17:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git, meta

Junio C Hamano <gitster@pobox.com> wrote:
> René Scharfe <l.s.r@web.de> writes:
> 
> > Am 22.08.22 um 06:06 schrieb Junio C Hamano:
> >> René, do you remember if you used diff.suppressBlankEmpty
> >> configuration when generating the patch in question at:
> >>
> >>     https://public-inbox.org/git/6727daf1-f077-7319-187e-ab4e55de3b2d@web.de/raw
> >>
> >> by the way?
> >
> > I did not use that option.  Attached the copy from my Sent folder, which
> > has spaces at the start of the blank lines in the config.txt hunk.
> 
> Hmph, so I cannot quite explain what removed the leading spaces from
> the copy public-inbox archived.

Both the June 26 message and the latest which arrived directly
in my IMAP inbox (w/o public-inbox) are missing the space in
blank lines, matching what public-inbox.org and lore have.

I don't think postfix, SpamAssassin, or dovecot strip that..

I would expect the blank context lines in the QP version to have
"=20".  I sent this to test@public-inbox.org using
`git send-email --transfer-encoding=quoted-printable' and it
confirms the blank context lines having "=20" which are missing
in René's messages:
https://try.public-inbox.org/test/20220822165736.23246-1-e@80x24.org/raw

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: `git patch-id --stable' vs quoted-printable
  2022-08-22 17:01       ` Eric Wong
@ 2022-08-22 18:25         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-08-22 18:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: René Scharfe, git, meta

Eric Wong <e@80x24.org> writes:

> I don't think postfix, SpamAssassin, or dovecot strip that..
> ...
> I would expect the blank context lines in the QP version to have
> "=20".  I sent this to test@public-inbox.org using
> `git send-email --transfer-encoding=quoted-printable' and it
> confirms the blank context lines having "=20" which are missing
> in René's messages:
> https://try.public-inbox.org/test/20220822165736.23246-1-e@80x24.org/raw

Yeah, that is a mystery.  Unlike format=flawed, I think CTE-QP was
designed to be reversible without information loss.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-08-22 18:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-22  2:25 `git patch-id --stable' vs quoted-printable Eric Wong
2022-08-22  4:06 ` Junio C Hamano
2022-08-22  4:18   ` Junio C Hamano
2022-08-22  4:57     ` Eric Wong
2022-08-22 15:58   ` René Scharfe
2022-08-22 16:21     ` Junio C Hamano
2022-08-22 17:01       ` Eric Wong
2022-08-22 18:25         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).