netdev - Re: missing retval check of call_netdevice_notifiers in dev_change_net

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87imfjt2qu.fsf@x220.int.ebiederm.org>
Date:   Mon, 22 Jun 2020 12:43:53 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Jiri Pirko <jiri@...lanox.com>,
        Johannes Berg <johannes@...solutions.net>,
        David Miller <davem@...emloft.net>,
        Petr Machata <petrm@...lanox.com>,
        David Ahern <dsahern@...il.com>,
        Netdev <netdev@...r.kernel.org>,
        Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: missing retval check of call_netdevice_notifiers in dev_change_net_namespace

"Jason A. Donenfeld" <Jason@...c4.com> writes:

> Hi,

Adding Herbert Xu who added support for failing notifications in
fcc5a03ac425 ("[NET]: Allow netdev REGISTER/CHANGENAME events to fail").

He might have some insight but 2007 was a long time ago.

All I remember is the ability of NETDEV_CHANGENAME to fail was abused by
sysfs to add a userspace ABI regression, and sysfs had to be fixed
not to do that.


> In register_netdevice, there's a call to
> call_netdevice_notifiers(NETDEV_REGISTER), whose return value is
> checked to determine whether or not to roll back the device
> registration. The reason checking that return value is important is
> because this represents an opportunity for the various notifiers to
> veto the registration, or for another error to happen. It's good that
> the error value is checked when that function is called from
> register_netdevice. But from other functions, the error value is not
> always checked.
>
> The notification is split up into two stages:
>
>         ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
>         if (ret & NOTIFY_STOP_MASK)
>                 return ret;
>         return raw_notifier_call_chain(&netdev_chain, val, info);
>
> One is per-net and the other is global. So there's ample space for
> something in either chain to abort the whole process.
>
> The wireguard module uses the notifier to keep track of netns changes
> in order to do some reference count bookkeeping. If this bookkeeping
> goes wrong, there's UaF potential. However, I noticed that the call to
> call_netdevice_notifiers(NETDEV_REGISTER) at the end of
> dev_change_net_namespace doesn't check its return value and doesn't
> implement any sort of rollback like register_netdevice does:
>
> int dev_change_net_namespace(struct net_device *dev, struct net *net,
> const char *pat)
> {
>        struct net *net_old = dev_net(dev);
>        int err, new_nsid, new_ifindex;
>
>        ASSERT_RTNL();
> [...]
>         /* Add the device back in the hashes */
>         list_netdevice(dev);
>
>         /* Notify protocols, that a new device appeared. */
>         call_netdevice_notifiers(NETDEV_REGISTER, dev);
> [...]
> }
>
> Notice that call_netdevice_notifiers isn't checking it's return value there.
>
> It seems like if any device vetoes the notification chain, it's bad
> news bears for modules that depend on getting a netns change
> notification.

In general we are talking a subsystem not a device that will veto
the change.  Implementations of devices should not need notifiers,
as they have enough methods.

It requires a level above a network device to care about your network
namespace.  Now wireguard happens to be both. So you care but I don't
believe an ordinary device will.

Nitpicks aside, if we are going to make sense of this we need to find
actualy notifiers that veto the change and look at them.

> I've been trying to audit the various registered notifiers to see if
> any of them pose a risk for wireguard. There are also unexpected
> errors that can happen, such as OOM conditions for kmalloc(GFP_KERNEL)
> or vmalloc and suchlike, which might be influenceable by attackers. In
> other words, relying on those notifications always being delivered
> seems kind of brittle. Not _super_ brittle, but brittle enough that
> it's at the moment making me a bit nervous. (See: UaF potential.)
>
> I've been trying to come up with a good solution to this.

The entire code consists of walking a linked list and calling the
callback function at each node on the list.   There is nothing
bad that can happen except list corruption and callback functions
that do silly things.  AKA implementation bugs.

There are no expected failures in walking a linked list.  So it should
be perfectly valid to rely on receiving your notification.

If callback function allocates memory or does something that is not 100%
reliable in a notifier that is a different issue and it is that
subystems problem.

> I'm not sure how reasonable it'd be to implement rollback inside of
> dev_change_net_namespace, but I guess that could technically be
> possible. The way that'd work would be that when vetoed, the function
> would complete, but then would start over again (a "goto top" sort of
> pattern), with oldnet and newnet reversed. But that could of course
> fail too and we could get ourselves in some sort of infinite loop. Not
> good.

Semantically it can't happen.  Once we reach dev_close we have caused
userspace visible effects that simply can not be undone.  So the code
has to validate that the change can happen before we it reaches
dev_close.

Which unfortunately makes the code incompatible with the concept of
black boxes on the notification chain adding failure cases.

It wouldn't hurt to add a warning:

	err = call_netdevice_notifiers(NETDEV_REGISTER, dev)
        WARN_ON_ONCE(err).

Just to validate the assumption that nothing ever happens.

> Finally, it could be the solution is that modules that use the netdev
> notifier never really rely too heavily upon getting notifications, and
> if they need something reliable, then they rearchitect their code to
> not need that. I could imagine this attitude holding sway here
> somehow, but it's also not very appealing from a code writing and
> refactoring perspective.

Perhaps we can architect the failure case out of NETDEV_REGISTER so
people don't have to worry about it.

> I figured before I go to town coding one of these up, maybe I should
> bring up the issue here, in case anybody has more developed thoughts
> than me about it.

I hope this gives you some insight.  I think this may be a case where
the code allows more than anyone has actually taken advantage of.

Eric