lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 2 Aug 2021 10:52:33 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Nikolay Aleksandrov <nikolay@...dia.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jiri Pirko <jiri@...nulli.us>,
        Ido Schimmel <idosch@...sch.org>,
        Roopa Prabhu <roopa@...dia.com>,
        "bridge@...ts.linux-foundation.org" 
        <bridge@...ts.linux-foundation.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        "syzbot+9ba1174359adba5a5b7c@...kaller.appspotmail.com" 
        <syzbot+9ba1174359adba5a5b7c@...kaller.appspotmail.com>
Subject: Re: [PATCH net] net: bridge: validate the NUD_PERMANENT bit when
 adding an extern_learn FDB entry

On Mon, Aug 02, 2021 at 12:42:17PM +0300, Nikolay Aleksandrov wrote:
> >>> Before, the two commands listed above both crashed the kernel in this
> >>> check from br_switchdev_fdb_notify:
> >>>
> >>
> >> Not before 52e4bec15546 though, the check used to be:
> >> struct net_device *dev = dst ? dst->dev : br->dev;
> >
> > "Before", as in "before this patch, on net-next/linux-next".
> >
>
> We still need that check, more below.
>
> >> which wouldn't crash. So the fixes tag below is incorrect, you could
> >> add a weird extern learn entry, but it wouldn't crash the kernel.
> >
> > :)
> >
> > Is our only criterion whether a patch is buggy or not that it causes a
> > NULL pointer dereference inside the kernel?
> >
> > I thought I'd mention the interaction with the net-next work for the
> > sake of being thorough, and because this is how the syzcaller caught it
> > by coincidence, but "kernel does not treat an FDB entry with the
> > 'permanent' flag as permanent" is enough of a reason to submit this as a
>
> Not exactly right, you may add it as permanent but it doesn't get "permanent" flag set.

And that is the bug I am addressing here, no?

> The actual bug is that it points to the bridge device, e.g. null dst without the flag.
>
> > bug fix for the commit that I pointed to. Granted, I don't have any use
> > case with extern_learn, so probably your user space programs simply
> > don't add permanent FDB entries, but as this is the kernel UAPI, it
> > should nevertheless do whatever the user space is allowed to say. For a
> > permanent FDB entry, that behavior is to stop forwarding for that MAC
> > DA, and that behavior obviously was not taking place even before any
> > change in br_switchdev_fdb_notify(), or even with CONFIG_NET_SWITCHDEV=n.
> >
>
> Actually I believe there is still a bug in 52e4bec15546 even with this fix.
> The flag can change after the dst has been read in br_switchdev_fdb_notify()
> so in theory you could still do a null pointer dereference. fdb_notify()
> can be called from a few places without locking. The code shouldn't dereference
> the dst based on the flag.

Are you thinking of a specific code path that triggers a race between
(a) a writer side doing WRITE_ONCE(fdb->dst, NULL) and then
    set_bit(BR_FDB_LOCAL, &fdb->flags), exactly in this order, and
(b) a reader side catching that fdb exactly in between the above 2
    statements, through fdb_notify or otherwise (br_fdb_replay)?

Because I don't see any.

Plus, I am a bit nervous about protecting against theoretical/unproven
races in a way that masks real bugs, as we would be doing if I add an
extra check in br_fdb_replay_one and br_switchdev_fdb_notify against the
case where an entry has fdb->dst == NULL but not BR_FDB_LOCAL.

>
> I'm okay with this change due to the null dst without permanent flag fix, but
> it doesn't fully fix the null pointer dereference.

So is there any change that I should make to this patch?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ