netdev - Re: BUG: KASAN: use-after-free in free_old_xmit


Message-ID: <4f6d4342-8476-882e-e7a8-b3f0ebd20d95@redhat.com>
Date:   Fri, 23 Jun 2017 16:43:13 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>,
        jean-philippe menil <jpmenil@...il.com>
Cc:     netdev@...r.kernel.org, John Fastabend <john.fastabend@...il.com>,
        virtualization@...ts.linux-foundation.org, qemu-devel@...gnu.org
Subject: Re: BUG: KASAN: use-after-free in free_old_xmit_skbs



On 2017-06-23 02:53, Michael S. Tsirkin wrote:
> On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:
>> 2017-06-06 1:52 GMT+02:00 Michael S. Tsirkin <mst@...hat.com>:
>>
>>      On Mon, Jun 05, 2017 at 05:08:25AM +0300, Michael S. Tsirkin wrote:
>>      > On Mon, Jun 05, 2017 at 12:48:53AM +0200, Jean-Philippe Menil wrote:
>>      > > Hi,
>>      > >
>>      > > while playing with xdp and ebpf, i'm hitting the following:
>>      > >
>>      > > [  309.993136]
>>      > > ==================================================================
>>      > > [  309.994735] BUG: KASAN: use-after-free in
>>      > > free_old_xmit_skbs.isra.29+0x2b7/0x2e0 [virtio_net]
>>      > > [  309.998396] Read of size 8 at addr ffff88006aa64220 by task sshd/323
>>      > > [  310.000650]
>>      > > [  310.002305] CPU: 1 PID: 323 Comm: sshd Not tainted 4.12.0-rc3+ #2
>>      > > [  310.004018] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>      BIOS
>>      > > 1.10.2-20170228_101828-anatol 04/01/2014
> ...
>
>>      >
>>      > Since commit 680557cf79f82623e2c4fd42733077d60a843513
>>      >     virtio_net: rework mergeable buffer handling
>>      >
>>      > we no longer must do the resets, we now have enough space
>>      > to store a bit saying whether a buffer is xdp one or not.
>>      >
>>      > And that's probably a cleaner way to fix these issues than
>>      > try to find and fix the race condition.
>>      >
>>      > John?
>>      >
>>      > --
>>      > MST
>>
>>
>>      I think I see the source of the race. virtio net calls
>>      netif_device_detach and assumes no packets will be sent after
>>      this point. However, all it does is stop all queues so
>>      no new packets will be transmitted.
>>
>>      Try locking with HARD_TX_LOCK?
>>     
>>
>>      --
>>      MST
>>
>>
>> Hi Michael,
>>
>> From what I see, the race appears when we hit virtnet_reset in virtnet_xdp_set.
>> virtnet_reset
>>    _remove_vq_common
>>      virtnet_del_vqs
>>        virtnet_free_queues
>>          kfree(vi->sq)
>> when the xdp program (with two instances of the program to trigger it faster)
>> is added or removed.
>>
>> It's easily repeatable, with 2 cpus and 4 queues on the qemu command line,
>> running the xdp_ttl tool from Jesper.
>>
>> For now, I'm able to continue my qualification by testing that xdp_qp is not null,
>> but that does not seem to be a sustainable trick.
>> if (xdp_qp && vi->xdp_queue_pairs != xdp_qp)
>>
>> Maybe this information will make it clearer to you.
>>
>> Best regards.
>>
>> Jean-Philippe
>
> I'm pretty clear about the issue here, I was trying to figure out a fix.
> Jason, any thoughts?
>
>

Hi Jean:

Does the following patch fix this issue? (I can't reproduce it locally with
xdp_ttl.)

Thanks

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15c..3e65c3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1801,7 +1801,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
         /* Make sure no work handler is accessing the device */
         flush_work(&vi->config_work);

+       netif_tx_lock_bh(vi->dev);
         netif_device_detach(vi->dev);
+       netif_tx_unlock_bh(vi->dev);
         cancel_delayed_work_sync(&vi->refill);
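
The idea behind the patch above can be illustrated outside the kernel. What follows is only a user-space sketch under stated assumptions, not driver code: a pthread mutex stands in for the TX lock taken by netif_tx_lock_bh(), a boolean stands in for the attached state cleared by netif_device_detach(), and a heap buffer stands in for vi->sq. The names fake_dev, xmit_loop and detach_then_free are hypothetical. The point it tries to show is that flipping the flag while holding the lock means any transmitter is either already finished or will see the flag before touching the queue memory, so the later free cannot race with it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the real virtnet_info. */
struct fake_dev {
    pthread_mutex_t tx_lock; /* plays the role of the netdev TX lock */
    bool attached;           /* plays the role of the "device present" state */
    unsigned long *sq;       /* plays the role of vi->sq, freed on reset */
};

/* Transmit path: touches dev->sq only while holding tx_lock and attached. */
static void *xmit_loop(void *arg)
{
    struct fake_dev *dev = arg;

    for (;;) {
        pthread_mutex_lock(&dev->tx_lock);
        if (!dev->attached) {
            /* Device was detached: leave without touching the queue. */
            pthread_mutex_unlock(&dev->tx_lock);
            return NULL;
        }
        dev->sq[0]++; /* "send a packet": access the queue memory */
        pthread_mutex_unlock(&dev->tx_lock);
    }
}

/* Detach under the lock, then free the queue. Returns 0 on success,
 * -1 if setup (mutex, allocation, thread creation) failed. */
int detach_then_free(void)
{
    struct fake_dev dev = { .attached = true };
    pthread_t tx;

    if (pthread_mutex_init(&dev.tx_lock, NULL) != 0)
        return -1;

    dev.sq = calloc(1, sizeof(*dev.sq));
    if (!dev.sq) {
        pthread_mutex_destroy(&dev.tx_lock);
        return -1;
    }

    if (pthread_create(&tx, NULL, xmit_loop, &dev) != 0) {
        free(dev.sq);
        pthread_mutex_destroy(&dev.tx_lock);
        return -1;
    }

    /* Analogue of lock + detach + unlock: once the lock is released,
     * no transmitter is inside its critical section with a stale view. */
    pthread_mutex_lock(&dev.tx_lock);
    dev.attached = false;
    pthread_mutex_unlock(&dev.tx_lock);

    /* Safe: xmit_loop can no longer reach dev.sq. */
    free(dev.sq);
    dev.sq = NULL;

    pthread_join(tx, NULL);
    pthread_mutex_destroy(&dev.tx_lock);
    return 0;
}
```

To try it, call detach_then_free() from a small main() and build with -pthread. Clearing the flag without taking the lock would leave a window in which xmit_loop has already passed its check and is about to dereference dev->sq when the free happens, which is the user-space analogue of the use-after-free reported in this thread.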