Message-ID: <bc945405-7269-b71b-4e2a-6ba290af70ea@gmail.com>
Date: Tue, 27 Jun 2017 14:35:21 +0200
From: Jean-Philippe Menil <jpmenil@...il.com>
To: Jason Wang <jasowang@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
virtualization@...ts.linux-foundation.org,
John Fastabend <john.fastabend@...il.com>,
qemu-devel Developers <qemu-devel@...gnu.org>,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [Qemu-devel] BUG: KASAN: use-after-free in free_old_xmit_skbs
On 06/27/2017 04:13 AM, Jason Wang wrote:
>
>
> On 2017年06月26日 15:35, Jean-Philippe Menil wrote:
>> On 06/26/2017 04:50 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017年06月24日 06:32, Cong Wang wrote:
>>>> On Fri, Jun 23, 2017 at 1:43 AM, Jason Wang <jasowang@...hat.com>
>>>> wrote:
>>>>>
>>>>> On 2017年06月23日 02:53, Michael S. Tsirkin wrote:
>>>>>> On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> From what I see, the race appears when we hit virtnet_reset in
>>>>>>> virtnet_xdp_set:
>>>>>>> virtnet_reset
>>>>>>>   _remove_vq_common
>>>>>>>     virtnet_del_vqs
>>>>>>>       virtnet_free_queues
>>>>>>>         kfree(vi->sq)
>>>>>>> when the xdp program is added or removed (running two instances
>>>>>>> of the program to trigger it faster).
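>>>>>>>
>>>>>>> To illustrate the window (a minimal sketch of how I understand
>>>>>>> the interleaving; illustration only, not actual kernel code):
>>>>>>>
>>>>>>>   /* CPU0: virtnet_xdp_set()          CPU1: tx path             */
>>>>>>>   virtnet_reset(vi);
>>>>>>>     _remove_vq_common(vi);            sq = &vi->sq[qnum];
>>>>>>>       virtnet_del_vqs(vi);            free_old_xmit_skbs(sq);
>>>>>>>         virtnet_free_queues(vi);      /* can still be running   */
>>>>>>>           kfree(vi->sq);              /* ... after this: UAF    */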
>>>>>>>
>>>>>>> It's easily repeatable, with 2 CPUs and 4 queues on the qemu
>>>>>>> command line, running the xdp_ttl tool from Jesper.
>>>>>>>
>>>>>>> For now, I'm able to continue my qualification by testing that
>>>>>>> xdp_qp is not null, but that does not seem to be a sustainable
>>>>>>> trick:
>>>>>>> if (xdp_qp && vi->xdp_queue_pairs != xdp_qp)
>>>>>>>
>>>>>>> Maybe it will be clearer to you with this information.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> Jean-Philippe
>>>>>>
>>>>>> I'm pretty clear about the issue here; I was trying to figure
>>>>>> out a fix.
>>>>>> Jason, any thoughts?
>>>>>>
>>>>>>
>>>>> Hi Jean:
>>>>>
>>>>> Does the following fix this issue? (I can't reproduce it locally
>>>>> through
>>>>> xdp_ttl)
>>>> It is tricky here.
>>>>
>>>> From my understanding of the code base, the tx_lock is not sufficient
>>>> here, because in virtnet_del_vqs() all vqs are deleted, and one vq
>>>> maps to one txq.
>>>>
>>>> I am afraid you have to add a spinlock somewhere to serialize
>>>> free_old_xmit_skbs() vs. vring_del_virtqueue(). As you can see,
>>>> they are in different layers, so it is hard to figure out where to
>>>> add it...
>>>>
>>>> Also, make sure we don't sleep inside the spinlock; I see a
>>>> synchronize_net().
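>>>>
>>>> As a rough sketch of the shape I mean (the lock name is hypothetical
>>>> and this is untested; and the teardown side is exactly where the
>>>> sleeping synchronize_net() gets in the way):
>>>>
>>>>   /* e.g. add to struct virtnet_info: spinlock_t sq_lock; */
>>>>
>>>>   /* tx cleanup side */
>>>>   spin_lock_bh(&vi->sq_lock);
>>>>   free_old_xmit_skbs(sq);
>>>>   spin_unlock_bh(&vi->sq_lock);
>>>>
>>>>   /* teardown side: the kfree(vi->sq) would have to happen under
>>>>    * the same lock, after the (sleeping) synchronize_net() is done.
>>>>    */
>>>>   spin_lock_bh(&vi->sq_lock);
>>>>   kfree(vi->sq);
>>>>   spin_unlock_bh(&vi->sq_lock);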
>>>
>>> Looks like I missed something. I thought free_old_xmit_skbs() was
>>> serialized in this case, since we disable all tx queues after
>>> netif_tx_unlock_bh()?
>>>
>>> Jean:
>>>
>>> I thought this could be easily reproduced by e.g. producing some
>>> traffic and at the same time trying to attach an xdp program, but
>>> apparently not. How do you trigger this? What's your qemu command
>>> line for this?
>>>
>>> Thanks
>>
>> Hi Jason,
>>
>> this is how I trigger the bug:
>> - on the guest, tcpdump on the interface
>> - on the guest, run iperf against the host
>> - on the guest, cat /sys/kernel/debug/tracing/trace_pipe
>> - on the guest, run one or two instances of xdp_ttl compiled with
>> DEBUG uncommented, which I start and stop until I trigger the bug.
>>
>> qemu command line is as follow:
>>
>> qemu-system-x86_64 -name ubuntu --enable-kvm -machine pc,accel=kvm
>> -smp 2 -drive file=/dev/LocalDisk/ubuntu,if=virtio,format=raw -m 2048
>> -rtc base=localtime,clock=host -usbdevice tablet --balloon virtio
>> -netdev
>> tap,id=ubuntu-0,ifname=ubuntu-0,script=/home/jenfi/WORK/jp/qemu/if-up,downscript=/home/jenfi/WORK/jp/qemu/if-down,vhost=on,queues=4
>> -device
>> virtio-net-pci,netdev=ubuntu-0,mac=de:ad:be:ef:01:03,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=2
>> -vnc 127.0.0.1:3 -nographic -serial
>> file:/home/jenfi/WORK/jp/qemu/ubuntu.out -monitor
>> unix:/home/jenfi/WORK/jp/qemu/ubuntu.sock,server,nowait
>>
>> Notice the -smp 2, queues=4 and vectors=2.
>> It seems that I forgot to mention that at the beginning of this
>> thread, sorry for that.
>>
>> Best regards.
>>
>> Jean-Philippe
>>
>
> Thanks Jean, I managed to reproduce the issue.
>
> I thought netif_tx_unlock_bh() would do the tx locking, but it looks
> like it does not; that's why the previous patch doesn't work.
>
> Could you please test this patch? (At least it no longer triggers the
> warning after more than 20 rounds of xdp start/stop.)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1f8c15c..a18f859 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1802,6 +1802,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
> flush_work(&vi->config_work);
>
> netif_device_detach(vi->dev);
> + netif_tx_disable(vi->dev);
> cancel_delayed_work_sync(&vi->refill);
>
> if (netif_running(vi->dev)) {
>
>
Hi Jason,
Seems to do the trick!
With your patch, I'm unable to reproduce the problem anymore (running
for more than 2h without any issue).
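
If it helps to see why it works: as far as I understand, netif_tx_disable()
takes each tx queue's xmit lock and marks every queue stopped, so once it
returns, no free_old_xmit_skbs() can still be running from the transmit
path. Roughly (paraphrased from include/linux/netdevice.h; the exact body
may differ between kernel versions):

static inline void netif_tx_disable(struct net_device *dev)
{
        unsigned int i;
        int cpu;

        local_bh_disable();
        cpu = smp_processor_id();
        spin_lock(&dev->tx_global_lock);
        for (i = 0; i < dev->num_tx_queues; i++) {
                struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

                /* take the per-queue xmit lock, then mark it stopped */
                __netif_tx_lock(txq, cpu);
                netif_tx_stop_queue(txq);
                __netif_tx_unlock(txq);
        }
        spin_unlock(&dev->tx_global_lock);
        local_bh_enable();
}
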
Best regards.
Jean-Philippe