linux-kernel - Re: [PATCH v2] docs: add backporting and conflict resolution document

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <12b082c8-908f-4de2-b0b5-4b638e10c402@oracle.com>
Date:   Sat, 14 Oct 2023 13:48:54 +0200
From:   Vegard Nossum <vegard.nossum@...cle.com>
To:     Willy Tarreau <w@....eu>
Cc:     Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org,
        backports@...r.kernel.org,
        Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        "Jason A . Donenfeld" <Jason@...c4.com>,
        Konstantin Ryabitsev <konstantin@...uxfoundation.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Hutchings <ben@...adent.org.uk>
Subject: Re: [PATCH v2] docs: add backporting and conflict resolution document


On 14/10/2023 11:43, Willy Tarreau wrote:
> Hi Vegard,
> 
> On Fri, Oct 13, 2023 at 05:24:31PM +0200, Vegard Nossum wrote:
>> I've now added Steven Rostedt and Willy Tarreau as well on the
>> off-chance that they have something to say about it (Steven presented
>> his conflict resolution method at Kernel Recipes and I think Willy is
>> experienced with backporting), but this is in no way meant as pressure
>> to review this patch. Here's a link to the top of the thread:
>>
>> https://lore.kernel.org/all/20230824092325.1464227-1-vegard.nossum@oracle.com/

(Adding Ben Hutchings to Cc as well for the same reasons.)

> That's a very nice description, I'm sure it can help (and I learned a
> few points there already). There are a few points I'm not seeing there,
> though, base on my habits:

Thanks for the quick and comprehensive response!

>    - in my experience, there's a huge difference between backporting
>      code you don't know and code you know. I'm dealing with haproxy
>      backports several times a week and I tend to know this code, so I
>      use my intuition a lot and have no problem adjusting stuff around
>      the conflicting parts. However when I was dealing with extended
>      kernels, that was not the case at all, because I didn't know that
>      code, and worse, I wasn't skilled at all on many of the parts I had
>      to deal with. While it's OK to take the blame for a failed backport,
>      it's generally not OK to expose users to risks caused by your lack
>      of knowledge. In this case it means you need to be extra cautious,
>      and very often to actually *ask* authors or maintainers for help.
>      If maintainers do not want to help backporting some patches to an
>      older version of their code, usually it should be perceived as a
>      hint that they'll find it complicated to do it right; then take that
>      as a hint that there's little chances you'll get it right by yourself
>      while ignoring that code. This may imply dropping the fix, documenting
>      the area as broken, or asking for help on various lists until someone
>      more knowledgeable can help.

I agree -- backports ARE very easy to get wrong, EVEN when you know the
code well. This is stressed several times in the document, especially in
the last two sections about build and runtime testing, but also in the
section on error handling.

However, I'm wary of being too stern as well. There are already a
million ways to introduce subtle bugs and put users at risk, but we
rarely try to put people off contributing regular patches (at least in
this specific way :-P).

Did you see this meme? https://i.imgur.com/yk5rf13.png

I think conflicts have a bit of a bad reputation exactly because you're
presented with something that's hard to make sense of at first sight.
I'd like to dispel this myth that you need to be an expert to make sense
of conflict markers. I think with the right attitude, the right tools,
and the right approach you can go a LONG way. Also, nobody is born an
expert and we should encourage people to gain experience in this area IMHO.

With that said, how about if we add a short section near the end about
submitting stable backports where we encourage submitters to 1) approach
the backporting process in a humble way, 2) describe the reason for the
conflict in the changelog and their resolution, 3) be honest about their
confidence in their resolution, 4) ask relevant maintainers for an
explicit ack?

I'm open to other ideas, I just want to make sure we strike the right
balance of encouragement vs. discouragement.

>    - the tool that helped me the most in resolving rename conflicts is
>      "patch". As you explained, "git am" is a bit stubborn. But patch is
>      way more lenient (and will often do mistakes). In the very old 2.6.32
>      I used to rely on "git show XX | patch -p1" way more often than
>      "git am". For a rename, you do "git show XX -- file |patch otherfile".
>      It works the same with file-based patches or mbox: "patch -p1 < mbox".
>      Patch however will not place the conflict markers and will leave .rej
>      files. I then opened them in an editor next to the file to edit, to
>      locate the area and copy the relevant part to its location. Emacs'
>      ediff is also extremely convenient to pick select parts of each file.
> 
>    - control the patches: after I'm pretty sure I have resolved a patch,
>      I compare it side by side with the original one. Normally, backported
>      patches should have the same structure as the original. Using whatever
>      editor supporting a vertical split helps a lot. Otherwise I also use
>      "diff -y --width 200" between them to focus on differences (typically
>      line numbers). It's not rare that a part is missing, either because
>      I messed up, or because I forgot to process a .rej that I mistakenly
>      removed, or because a file was added, I reset the tree and it's left
>      untracked. So any difference between the patches should have its own
>      explanation (line numbers, function names, file names, occurrences).
>      By the way, it can very easily happen that applying a patch will work
>      fine but it will apply to the wrong function because some code patterns
>      especially in error unrolling paths are often the same between several
>      functions. It happened to me quite a few times in the past, and even
>      a few of these persisted till the public patch review. That's really
>      a risk that must not be minimized!

There is a section on this: "Verifying the result", and also describes
doing a side-by-side diff of the old and new patches.

The bit about applying the patch to the wrong function -- I doubt this
happens that much when using cherry-pick, since it knows both sides of
the history and can tell when code moves around. You're probably right
that it can easily happen with plain git am/patch though. In my mind,
this is an argument in favour of using cherry-pick whenever possible.

>    - something quite common is that as code evolves, it gets refactored
>      so that what used to appear at 3 locations in the past now moved to
>      a single function. But when you're backporting, you're doing the
>      reverse work, you're taking a patch for a single location that may
>      apply to multiple ones. Often the hint is that the function name
>      changed. But sometimes it's not even the case. If what you've found
>      looks like a nasty bug pattern that looks like it could easily have
>      been copy-pasted at other places, it's worth looking for it elsewhere
>      using git grep. If you find the same pattern, then you search for it
>      into the tree the patch comes from. If you don't find it, it's likely
>      that you'll need to adjust it, and git log is your friend to figure
>      what happened to these areas. Note that git blame doesn't work for
>      removed code. If you find the same pattern elsewhere in mainline, it's
>      worth asking the patch author if that one is also an occurrence of the
>      same bug or just normal. It's not uncommon to find new bugs during a
>      backport.

Very good point.

I think this fits very well alongside the sections on function
arguments, error handling, etc. since it details a specific thing that
can go easily wrong.

Can I take what you wrote above, or do you want to submit your own
incremental patch? I think we could insert it almost verbatim.

> 
>    - color diff: usually I just rely on:
> 
>       [color]
>           ui = true
> 
>      But I also recently got used to using diff-highlight that will
>      highlight different characters between lines. It's nice for complex
>      "if" conditions where you don't see the difference, or for spaces
>      vs tabs:
> 
>      [pager]
>          log = /usr/doc/git-2.35.3/contrib/diff-highlight/diff-highlight | less
>          show = /usr/doc/git-2.35.3/contrib/diff-highlight/diff-highlight | less
>          diff = /usr/doc/git-2.35.3/contrib/diff-highlight/diff-highlight | less

Right, git log/show/diff --word-diff=color can also do this to some degree.

There is also core.whitespace/color.diff.whitespace that will highlight
some common whitespace errors.

I haven't used diff-highlight myself -- I'll give it a try.

In this case, I was using colordiff specifically to do the side-by-side
diff of the two patches since it handles the <() shell syntax _and_ does
the highlighting of differences between the patches.

>    - git add, git add and git add: when fixing patches by hand, it's very
>      common to leave some parts not added (especially with | patch -p1).
>      It's useful to work on a clean tree to more easily spot untracked
>      files with "git status".

Yet another reason to use git cherry-pick instead of manual patch
commands: it keeps track of unmerged files for you. ;-)

So I'm a bit torn on this. I think in this particular document I'd like
to encourage the use of git and doing things "the git way" as much as
possible.

>> I feel like in the worst case, somebody sees the document down the line
>> and vehemently disagrees with something and we either fix it or take it
>> out completely.
> 
> No I don't disagree and even find it useful. If at least it could help
> people figure the pain it is to backport any single patch, and encourage
> them to help stable maintainers, that would already be awesome!
> 
>> I'd like to add that my impression is that a LOT of people *fear*
>> backporting and conflict resolution -- and it doesn't have to be that
>> way. We should be talking about merge conflicts and what good workflows
>> look like (one of the reasons why I was very happy to see Steven's
>> presentation at KR), instead of leaving everybody to figure it out on
>> their own. This document is my contribution towards that.
> 
> I'm not completely sold to this. Yes we should teach more people to
> perform that task themselves. But there's a big difference between
> backporting a few patches and feeling like you could maintain your own
> kernel because now you know how to resolve conflicts. What I mentioned
> above about dealing with patches you don't understand must not be
> underestimated, that's the biggest challenge I faced when working on
> stable kernels. There's probably a feeling of shame of not understanding
> something, but I can say that many times I asked for help and was helped
> even by top-ranked developers, and nobody ever laughed at me for not
> understanding a certain area. But doing that in your garage for your
> own kernel or for your company's products is a huge problem because it's
> unlikely that you'll get help from the maintainers this time, so you're
> left on your own with your own understanding of certain patches.
> 
> Thus, yes to backports, no to kernel forks being a collection of
> backports.

Right; almost every time I talk about backporting it's really in the
context of contributing these backports to stable. I'm not in favour of
forks either and I'm not trying to encourage it.

Let me try to come up with a specific addition related to the changes
you requested above and see if you agree with the wording.

Thanks,


Vegard