[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8a2707a4-04d3-4bcf-b1ee-9a2ef15886a0@paulmck-laptop>
Date: Tue, 27 Jun 2023 16:41:37 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dinh Nguyen <dinguyen@...nel.org>, Shuah Khan <shuah@...nel.org>,
linux-kernel@...r.kernel.org, linux@...ck-us.net,
vishal.moola@...il.com, akpm@...ux-foundation.org,
sfr@...b.auug.org.au
Subject: Re: [PATCH] Revert "nios2: Convert __pte_free_tlb() to use ptdescs"
On Tue, Jun 27, 2023 at 03:35:45PM -0700, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 15:14, Dinh Nguyen <dinguyen@...nel.org> wrote:
> >
> > This reverts commit 6ebe94baa2b9ddf3ccbb7f94df6ab26234532734.
> >
> > The patch "nios2: Convert __pte_free_tlb() to use ptdescs" was supposed
> > to go together with a patchset that Vishal Moola had planned taking it
> > through the mm tree. By just having this patch, all NIOS2 builds are
> > broken.
>
> This is now at least the third time just this merge window where some
> base tree was broken, and people thought that linux-next is some kind
> of testing ground for it all.
>
> NO!
>
> Linux-next is indeed for testing, and for finding situations where
> there are interactions between different trees.
>
> But linux-next is *not* a replacement for "this tree has to work on
> its own". THAT testing needs to be done independently, and *before* a
> tree hits linux-next.
>
> It is *NOT* ok to say "this will work in combination with that other
> tree". EVERY SINGLE TREE needs to work on its own, because otherwise
> you cannot bisect the end result sanely.
>
> We apparently had the NIOS2 tree being broken. And the RCU tree was
> broken. And the KUnit tree was broken.
>
> In all those cases, the base tree did not compile properly on its own,
> and linux-next "magically fixed" it by either having Stephen Rothwell
> literally fix the build breakage by hand, or by having some other tree
> hide the problem.
>
> This is very much not ok.
>
> I'm not sure why it happened so much this release, but this needs to
> stop. People need to realize that you can't just throw shit at the
> wall and see if it sticks. You need to test your own trees *first*,
> and *independently* of other peoples trees.
>
> Then, if you have done basic testing, you can then have it in
> linux-next and that hopefully then finds any issues with bad
> interactions with other trees, and maybe also ends up getting more
> coverage testing on odd architectures and with odd configurations.
>
> But linux-next must not in *any* way be a replacement for doing basic
> testing on your own tree first.
On the off-chance that it helps someone else avoid my stupid mistakes,
here is exactly how I messed this up so badly:
1. This API-name-change series went well, except for the usual
lagging changes. This *should* not be a problem, as you
simply leave the old API in however long it takes for the
change to get in.
2. At some point -next was a single-argument kfree_rcu()-free zone.
So I queued the offending commit on the -rcu tree's rcu/next
branch, followed by a revert for my own testing. The idea was
to make new uses fail in -next testing.
So far, so good.
3. I noticed that -next was now free of kfree_rcu() calls.
At this point, I made three stupid mistakes:
a. I failed to wait for mainline itself to be free of the
single-argument kfree_rcu(), thus pulling the offending
single-argument kfree_rcu() removal commit into my pull
request a merge window too soon. This is of course
especially stupid since I tend to send you the RCU pull
request early.
b. I failed to identify exactly which -next commit eliminated
single-argument kfree_rcu(). Had I done so, I would
have seen that this was in fact Stephen's rcu/next
merge commit, which was never going to go to mainline.
c. Worst yet, out of force of habit, I left the revert
from #2 above in my testing, thus failing to see the
-rcu failure due to that remaining single-argument
kfree_rcu() call.
So a combination of three stupid mistakes on my part made the RCU
happen.
As you say, testing *exactly* the commit heading up the pull request
merged with your master branch would have spotted this, and I will
of course make sure that I do this in the future.
And again, please accept my apologies for this mess.
Thanx, Paul
Powered by blists - more mailing lists