netdev - Re: [PATCH 0/3] using guard/_

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240329102308.GM651713@kernel.org>
Date: Fri, 29 Mar 2024 10:23:08 +0000
From: Simon Horman <horms@...nel.org>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
	Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org
Subject: Re: [PATCH 0/3] using guard/__free in networking

On Tue, Mar 26, 2024 at 11:23:19AM -0400, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Tue, 26 Mar 2024 09:42:43 +0100 Johannes Berg wrote:
> > > On Mon, 2024-03-25 at 19:09 -0700, Jakub Kicinski wrote:
> > > > On Mon, 25 Mar 2024 23:31:25 +0100 Johannes Berg wrote:  
> > > > > Hi,
> > > > > 
> > > > > So I started playing with this for wifi, and overall that
> > > > > does look pretty nice, but it's a bit weird if we can do
> > > > > 
> > > > >   guard(wiphy)(&rdev->wiphy);
> > > > > 
> > > > > or so, but still have to manually handle the RTNL in the
> > > > > same code.  
> > > > 
> > > > Dunno, it locks code instead of data accesses.  
> > > 
> > > Well, I'm not sure that's a fair complaint. After all, without any more
> > > compiler help, even rtnl_lock()/rtnl_unlock() _necessarily_ locks code.
> > > Clearly
> > > 
> > > 	rtnl_lock();
> > > 	// something
> > > 	rtnl_unlock();
> > > 
> > > also locks the "// something" code, after all., and yeah that might be
> > > doing data accesses, but it might also be a function call or a whole
> > > bunch of other things?
> > > 
> > > Or if you look at something like bpf_xdp_link_attach(), I don't think
> > > you can really say that it locks only data. That doesn't even do the
> > > allocation outside the lock (though I did convert that one to
> > > scoped_guard because of that.)
> > > 
> > > Or even something simple like unregister_netdev(), it just requires the
> > > RTNL for some data accesses and consistency deep inside
> > > unregister_netdevice(), not for any specific data accessed there.
> > > 
> > > So yeah, this is always going to be a trade-off, but all the locking is.
> > > We even make similar trade-offs manually, e.g. look at
> > > bpf_xdp_link_update(), it will do the bpf_prog_put() under the RTNL
> > > still, for no good reason other than simplifying the cleanup path there.
> > 
> > At least to me the mental model is different. 99% of the time the guard
> > is covering the entire body. So now we're moving from "I'm touching X
> > so I need to lock" to "This _function_ is safe to touch X".
> > 
> > > Anyway, I can live with it either way (unless you tell me you won't pull
> > > wireless code using guard), just thought doing the wireless locking with
> > > guard and the RTNL around it without it (only in a few places do we
> > > still use RTNL though) looked odd.
> > > 
> > > 
> > > > Forgive the comparison but it feels too much like Java to me :)  
> > > 
> > > Heh. Haven't used Java in 20 years or so...
> > 
> > I only did at uni, but I think they had a decorator for a method, where
> > you can basically say "this method should be under lock X" and runtime
> > will take that lock before entering and drop it after exit,
> > appropriately. I wonder why the sudden love for this concept :S
> > Is it also present in Rust or some such?
> > 
> > > > scoped_guard is fine, the guard() not so much.  
> > > 
> > > I think you can't get scoped_guard() without guard(), so does that mean
> > > you'd accept the first patch in the series?
> > 
> > How can we get one without the other.. do you reckon Joe P would let us
> > add a checkpatch check to warn people against pure guard() under net/ ?
> > 
> > > > Do you have a piece of code in wireless where the conversion
> > > > made you go "wow, this is so much cleaner"?  
> > > 
> > > Mostly long and complex error paths. Found a double-unlock bug (in
> > > iwlwifi) too, when converting some locking there.
> > > 
> > > Doing a more broader conversion on cfg80211/mac80211 removes around 200
> > > lines of unlocking, mostly error handling, code.
> > > 
> > > Doing __free() too will probably clean up even more.
> > 
> > Not super convinced by that one either:
> > https://lore.kernel.org/all/20240321185640.6f7f4d6b@kernel.org/
> > maybe I'm too conservative..
> 
> +1 on the concept (fwiw).
> 
> Even the simple examples, such as unregister_netdevice_notifier_net,
> show how it avoids boilerplate and so simplifies control flow.
> 
> That benefit multiplies with the number of resources held and number
> of exit paths. Or in our case, gotos and (unlock) labels.
> 
> Error paths are notorious for seeing little test coverage and leaking
> resources. This is an easy class of bugs that this RAII squashes.

+1

While I'm ambivalent to the constructs that have been proposed,
I do see leaking resources in error paths as a very common pattern.
Especially in new code. Making that easier to get right seems
worthwhile to me.

> 
> Sprinkling guard statements anywhere in the scope itself makes it
> perhaps hard to follow. Perhaps a heuristic would be to require these
> statements at the start of scope (after variable declaration)?
> 
> Function level decorators could further inform static analysis.
> But that is somewhat tangential.
>