lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 27 Jun 2023 16:41:37 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Dinh Nguyen <dinguyen@...nel.org>, Shuah Khan <shuah@...nel.org>,
        linux-kernel@...r.kernel.org, linux@...ck-us.net,
        vishal.moola@...il.com, akpm@...ux-foundation.org,
        sfr@...b.auug.org.au
Subject: Re: [PATCH] Revert "nios2: Convert __pte_free_tlb() to use ptdescs"

On Tue, Jun 27, 2023 at 03:35:45PM -0700, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 15:14, Dinh Nguyen <dinguyen@...nel.org> wrote:
> >
> > This reverts commit 6ebe94baa2b9ddf3ccbb7f94df6ab26234532734.
> >
> > The patch "nios2: Convert __pte_free_tlb() to use ptdescs" was supposed
> > to go together with a patchset that Vishal Moola had planned taking it
> > through the mm tree. By just having this patch, all NIOS2 builds are
> > broken.
> 
> This is now at least the third time just this merge window where some
> base tree was broken, and people thought that linux-next is some kind
> of testing ground for it all.
> 
> NO!
> 
> Linux-next is indeed for testing, and for finding situations where
> there are interactions between different trees.
> 
> But linux-next is *not* a replacement for "this tree has to work on
> its own". THAT testing needs to be done independently, and *before* a
> tree hits linux-next.
> 
> It is *NOT* ok to say "this will work in combination with that other
> tree". EVERY SINGLE TREE needs to work on its own, because otherwise
> you cannot bisect the end result sanely.
> 
> We apparently had the NIOS2 tree being broken. And the RCU tree was
> broken. And the KUnit tree was broken.
> 
> In all those cases, the base tree did not compile properly on its own,
> and linux-next "magically fixed" it by either having Stephen Rothwell
> literally fix the build breakage by hand, or by having some other tree
> hide the problem.
> 
> This is very much not ok.
> 
> I'm not sure why it happened so much this release, but this needs to
> stop. People need to realize that you can't just throw shit at the
> wall and see if it sticks. You need to test your own trees *first*,
> and *independently* of other peoples trees.
> 
> Then, if  you have done basic testing, you can then have it in
> linux-next and that hopefully then finds any issues with bad
> interactions with other trees, and maybe also ends up getting more
> coverage testing on odd architectures and with odd configurations.
> 
> But linux-next must not in *any* way be a replacement for doing basic
> testing on your own tree first.

On the off-chance that it helps someone else avoid my stupid mistakes,
here is exactly how I messed this up so badly:

1.	This API-name-change series went well, except for the usual
	lagging changes.  This *should* not be a problem, as you
	simply leave the old API in however long it takes for the
	change to get in.

2.	At some point -next was a single-argument kfree_rcu()-free zone.
	So I queued the offending commit on the -rcu tree's rcu/next
	branch, followed by a revert for my own testing.  The idea was
	to make new uses fail in -next testing.

	So far, so good.

3.	I noticed that -next was now free of kfree_rcu() calls.

	At this point, I made three stupid mistakes:

	a.	I failed to wait for mainline itself to be free of the
		single-argument kfree_rcu(), thus pulling the offending
		single-argument kfree_rcu() removal commit into my pull
		request a merge window too soon.  This is of course
		especially stupid since I tend to send you the RCU pull
		request early.

	b.	I failed to identify exactly which -next commit eliminated
		single-argument kfree_rcu().  Had I done so, I would
		have seen that this was in fact Stephen's rcu/next
		merge commit, which was never going to go to mainline.

	c.	Worst yet, out of force of habit, I left the revert
		from #2 above in my testing, thus failing to see the
		-rcu failure due to that remaining single-argument
		kfree_rcu() call.

So a combination of three stupid mistakes on my part made the RCU
happen.

As you say, testing *exactly* the commit heading up the pull request
merged with your master branch would have spotted this, and I will
of course make sure that I do this in the future.

And again, please accept my apologies for this mess.

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ