lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Y0gQbdbo5dC7IQkL@lunn.ch>
Date:   Thu, 13 Oct 2022 15:19:41 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Thorsten Glaser <t.glaser@...ent.de>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Dave Taht <dave.taht@...il.com>, netdev@...r.kernel.org
Subject: Re: RFH, where did I go wrong?

On Wed, Oct 12, 2022 at 11:56:11PM +0200, Thorsten Glaser wrote:
> On Wed, 12 Oct 2022, Andrew Lunn wrote:
> 
> > > Ooh! Will try! That’s what I get for getting, ahem, inspiration
> > > from other qdiscs.
> > 
> > Are other qdiscs also missing RTNL, or are you just using the
> > inspiration in a different context?
> 
> I think I was probably confused between which of the functions can
> be used when. Eric explained the why. What I was missing was… well,
> basically what I asked for weeks ago — what functions I need to
> provide when writing a qdisc, and which guarantees and expectations
> these have. That rtnl is held for… apparently all but enqueue/dequeue…
> was one of these. I doubt other qdiscs miss it, or their users would
> also run into this crash or so :/

Not all broken code, specially with locks, causes a simple to
reproduce crash. So i was hoping that now you have gained an
understanding of what you did wrong, maybe you can see other places
which make the same error, possibly from where you cut/pasted, which
might not yet of crashed. Thats another way of say, if you have found
a bug, look around and see if the same bug exists somewhere else
nearby.

> The thing I did first was to add ASSERT_RTNL(); directly before the
> rtnl_* call, just like it was in the other place. That, of course,
> crashed immediately. Now *this* could be done systematically.
> 
> In OpenBSD, things like that are often hidden behind #if DIAGNOSTIC
> which is a global option, disabled in “prod” or space-constrained
> (installer) kernels but enabled for the “generic” one for wide testing.
> Something to think about?

The network stack generally falls into two parts. The fast path tries
its best to avoid anything expensive, like mutexes. It uses spinlocks
if it needs any sort of locking. And there is the rest, which is
mostly the control plain, which is the slow path. It is protected by
RTNL. As you said above, "That rtnl is held for… apparently all but
enqueue/dequeue…" enqueue/dequeue are the fast path, so RTNL is not
required, the rest is mostly control plan, so RTNL is probably
required. Also, since it is the slow path, adding ASSERT_RTNL() is
fine, and it does not need to be hidden behind #if DIAGNOSTIC. Linux
does have some debug code hidden behind ifdefs, see the kernel hacking
section of make menuconfig. The sleep in atomic context test is a good
example of this. While doing development work, you want to turn on
most of the checks in this section. That sleep in atomic check will
probably help you find cases where you hold RTNL but should not. The
lock checking will help you find potential deadlocks with RTNL and any
other locks you might use, etc.

	Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ