Message-ID: <20211029121324.GT2744544@nvidia.com>
Date:   Fri, 29 Oct 2021 09:13:24 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     "Ziyang Xuan (William)" <william.xuanziyang@...wei.com>
Cc:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-rdma@...r.kernel.org
Subject: Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()

On Fri, Oct 29, 2021 at 03:04:35PM +0800, Ziyang Xuan (William) wrote:
> > On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
> >>> But will make all the callers of vlan_dev_real_dev() feel like they
> >>> should NULL-check the result, which is not necessary.  
> >>
> >> Isn't it better to reliably return NULL instead of a silent UAF in
> >> this edge case? 
> > 
> > I don't know what the best practice is for maintaining sanity of
> > unregistered objects.
> > 
> > If there really is a requirement for the real_dev pointer to be sane we
> > may want to move the put_device(real_dev) to vlan_dev_free(). There
> > should not be any risk of circular dependency but I'm not 100% sure.
> > 
> >>> RDMA must be calling this helper on a vlan which was already
> >>> unregistered, can we fix RDMA instead?  
> >>
> >> RDMA holds a get on the netdev which prevents unregistration, however
> >> unregister_vlan_dev() does:
> >>
> >>         unregister_netdevice_queue(dev, head);
> >>         dev_put(real_dev);
> >>
> >> Which corrupts the still registered vlan device while it is sitting in
> >> the queue waiting to unregister. So, it is not true that a registered
> >> vlan device always has working vlan_dev_real_dev().
> > 
> > That's not my reading, unless we have a different definition of
> > "registered". The RDMA code in question runs from a workqueue, at the
> > time the UNREGISTER notification is generated all objects are still
> > alive and no UAF can happen. Past UNREGISTER extra care is needed when
> > accessing the object.
> > 
> > Note that unregister_vlan_dev() may queue the unregistration, without
> > running it. If it clears real_dev the UNREGISTER notification will no
> > longer be able to access real_dev, which used to be completely legal.
> > .
> > 
> 
> I am sorry. I misunderstood and drew the wrong conclusion, namely that
> unregister_vlan_dev() just moves the vlan_ndev to a list to be
> unregistered later, so real_dev may already have been freed when we
> access it in netdevice_queue_work().
> 
> The real_ndev UNREGISTER triggers the NETDEV_UNREGISTER notification in
> vlan_device_event(); unregister_vlan_dev() and unregister_netdevice_many()
> run within the real_ndev UNREGISTER process. real_dev and vlan_ndev are
> both alive until the real_ndev UNREGISTER has finished.
> 
> The above corrects my previous misunderstanding. But the real scenario
> of the problem is as follows:
> 
> __rtnl_newlink
> vlan_newlink
> register_vlan_dev(vlan_ndev, ...)
> register_netdevice(vlan_ndev)
> netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)]
> queue_work(gid_cache_wq, ...)

This is exactly what I'm saying, the rdma code saw a registered device
and captured a ref on it, passing it to a work queue.

> rtnl_configure_link(vlan_ndev, ...) [failed]
> ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev]
	/* Get rid of the vlan's reference to real_dev */
	dev_put(real_dev);
> unregister_netdevice_many(&list_kill)

Then it released the real_dev reference, leaving a dangling pointer, and
goes into unregister_netdevice_many(), which does:

		dev->reg_state = NETREG_UNREGISTERING;
and eventually

		net_set_todo(dev);

then unlocks RTNL. The get prevents it from progressing past
NETREG_UNREGISTERING.

Now later we touch the vlan dev, it is reg_state UNREGISTERED and its
memory is corrupted because it dropped the ref it was holding on the
pointer it returns, which has since been freed.

The only reason the dangling pointer doesn't cause larger problems is
because rtnl saves it - but continuing to reference a pointer that no
longer has a valid ref is certainly a bad practice.
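
The ordering described above can be illustrated with a small user-space
model. This is only a sketch: every name in it (struct model_dev,
model_hold(), model_put(), and so on) is hypothetical and none of it is
kernel code. It marks objects as "freed" with a flag instead of releasing
memory, so the stale pointer can be inspected safely. It compares dropping
the real_dev reference at unregister time with dropping it in the free
path, as was suggested earlier in the thread, and assumes the real device
itself is unregistered while the worker still holds the vlan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	int refcnt;                 /* outstanding references */
	bool freed;                 /* set instead of calling free() */
	struct model_dev *real_dev; /* lower device, NULL for a real device */
	bool put_real_on_free;      /* true: drop the real_dev ref in the free path */
};

static void model_put(struct model_dev *dev);

static void model_hold(struct model_dev *dev)
{
	assert(!dev->freed);
	dev->refcnt++;
}

/* Runs when the last reference goes away. */
static void model_free(struct model_dev *dev)
{
	dev->freed = true;
	if (dev->real_dev && dev->put_real_on_free)
		model_put(dev->real_dev);
}

static void model_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	if (--dev->refcnt == 0)
		model_free(dev);
}

/* Models the unregister step for the vlan device. */
static void model_unregister_vlan(struct model_dev *vlan)
{
	if (!vlan->put_real_on_free)
		model_put(vlan->real_dev); /* pointer stays set, ref is gone */
	model_put(vlan);                   /* drop the registration reference */
}

/*
 * Returns true if a worker that still holds a reference on the vlan
 * would find that vlan->real_dev points at an already freed object.
 */
static bool model_worker_sees_freed_real_dev(bool put_real_on_free)
{
	struct model_dev real = { .refcnt = 1 }; /* registration reference */
	struct model_dev vlan = {
		.refcnt = 1,                     /* registration reference */
		.real_dev = &real,
		.put_real_on_free = put_real_on_free,
	};
	bool stale;

	model_hold(&real); /* the vlan's reference on real_dev */
	model_hold(&vlan); /* the worker's reference on the vlan */

	model_unregister_vlan(&vlan);
	model_put(&real);  /* assumed: the real device is unregistered too */

	stale = vlan.real_dev->freed; /* safe here: the model never frees memory */

	model_put(&vlan);  /* the worker finishes and releases its reference */
	return stale;
}
```

In this model, model_worker_sees_freed_real_dev(false) reports a stale
pointer, while model_worker_sees_freed_real_dev(true) does not, because
the vlan keeps its reference on real_dev for as long as anyone holds the
vlan.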

> So my first solution for the problem, as follows, is correct.
> https://lore.kernel.org/linux-rdma/20211025163941.GA393143@nvidia.com/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

No, it still isn't.

Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
in the work queue, but that feels pretty hacky to me as the main point
of the UNREGISTERING state is to keep the object alive enough that
those with outstanding gets can complete their work and release the
get. Leaving a wrecked object in UNREGISTERING is a bad design.
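
For illustration only, the reg_state test described above might look
roughly like the following user-space sketch. The enum, the struct and
model_real_dev_if_registered() are hypothetical stand-ins, not the RDMA or
vlan code, and the model ignores locking entirely.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's registration states. */
enum model_reg_state {
	MODEL_REGISTERED,
	MODEL_UNREGISTERING,
	MODEL_UNREGISTERED,
};

struct model_netdev {
	enum model_reg_state reg_state;
	struct model_netdev *real_dev; /* may be stale once unregistering starts */
};

/*
 * Hand out real_dev only while the vlan is still registered. Every caller
 * in a work queue would then have to cope with a NULL result.
 */
static struct model_netdev *
model_real_dev_if_registered(const struct model_netdev *vlan)
{
	if (vlan->reg_state != MODEL_REGISTERED)
		return NULL;
	return vlan->real_dev;
}
```

The sketch shows the cost of this approach: the pointer stays in the
object but cannot be trusted, so the burden of checking moves to every
user that holds a get.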

Jason
