netdev - Re: RFH, where did I go wrong?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20221013073812.mvbs6wo64b4yr5cc@skbuf>
Date:   Thu, 13 Oct 2022 10:38:12 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Thorsten Glaser <t.glaser@...ent.de>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Dave Taht <dave.taht@...il.com>, netdev@...r.kernel.org
Subject: Re: RFH, where did I go wrong?

On Wed, Oct 12, 2022 at 11:56:11PM +0200, Thorsten Glaser wrote:
> The thing I did first was to add ASSERT_RTNL(); directly before the
> rtnl_* call, just like it was in the other place. That, of course,
> crashed immediately. Now *this* could be done systematically.
> 
> In OpenBSD, things like that are often hidden behind #if DIAGNOSTIC
> which is a global option, disabled in “prod” or space-constrained
> (installer) kernels but enabled for the “generic” one for wide testing.
> Something to think about?
> 
> I’m sure there’s lots of things like flow analysis around in the
> Linux world, however that wouldn’t help out-of-tree code being
> developed, whereas extra checks like that would. Just some thoughts,
> as said earlier this is basically¹ my start in Linux kernel dev.

Not trying to defend the Qdisc framework which has poor to non-existing
documentation, but with some minimal level of experience you'd kind of
expect that a function named rtnl_kfree_skbs() expects a calling context
where the rtnl_mutex is held. It is even more clear that this is the
case when you notice that in its implementation, there is a single
defer_kfree_skb_list global to the kernel, which is processed in
__rtnl_unlock().

I think as a takeaway from your debugging journey, you can add an
ASSERT_RTNL() to rtnl_kfree_skbs() and add a comment documenting the
function. You can use the commit message of 1b5c5493e3e6 ("net_sched:
add the ability to defer skb freeing") as inspiration for the function
description.