Date:   Wed, 17 Oct 2018 14:18:23 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     ake <ake@...l.co.jp>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] virtio_net: enable tx after resuming from suspend


On 2018/10/16 6:15 PM, ake wrote:
>
> On 2018/10/16 17:53, Jason Wang wrote:
>> On 2018/10/15 6:08 PM, ake wrote:
>>> On 2018/10/12 18:18, ake wrote:
>>>> On 2018/10/12 17:23, Jason Wang wrote:
>>>>> On 2018/10/12 12:30, ake wrote:
>>>>>> On 2018/10/11 22:06, Jason Wang wrote:
>>>>>>> On 2018/10/11 18:22, ake wrote:
>>>>>>>> On 2018/10/11 18:44, Jason Wang wrote:
>>>>>>>>> On 2018/10/11 15:51, Ake Koomsin wrote:
>>>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>>>>> disabled the virtio tx before going to suspend to avoid a
>>>>>>>>>> use-after-free. However, after resuming, this causes the virtio_net
>>>>>>>>>> device to lose its network connectivity.
>>>>>>>>>>
>>>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>>>
>>>>>>>>>> Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>>>>> Signed-off-by: Ake Koomsin <ake@...l.co.jp>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/net/virtio_net.c | 1 +
>>>>>>>>>>       1 file changed, 1 insertion(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>>>>>>>>>           }
>>>>>>>>>>             netif_device_attach(vi->dev);
>>>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>>>> I believe this duplicates the netif_tx_wake_all_queues() call in
>>>>>>>>> netif_device_attach() above?
>>>>>>>> Thank you for your review.
>>>>>>>>
>>>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() clear
>>>>>>>> __QUEUE_STATE_DRV_XOFF, is it possible that some condition in
>>>>>>>> netif_device_attach() is not satisfied?
>>>>>>> Yes, maybe. One case I can see now is when the device is down; in
>>>>>>> that case netif_device_attach() won't try to wake up the queue.
>>>>>>>
>>>>>>>> Without netif_start_queue(), the virtio_net device does not resume
>>>>>>>> properly after waking up.
>>>>>>> How do you trigger the issue? Just do suspend/resume?
>>>>>> Yes, simply suspend and resume.
>>>>>>
>>>>>> Here is how I trigger the issue:
>>>>>>
>>>>>> 1) Start the Virtual Machine Manager GUI program.
>>>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>>>       >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>>>       In addition, make sure that the video adapter is VGA. Otherwise,
>>>>>>       waking up with the virtual power button does not work.
>>>>>> 3) After installing the guest OS, log in, and test the network
>>>>>>       connectivity by pinging the host machine.
>>>>>> 4) Suspend. After this, the screen is blank.
>>>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>>>       appears again.
>>>>>> 6) Log in again. The guest loses its network connection.
>>>>>>
>>>>>> In my test:
>>>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>>>> I cannot reproduce this issue if the virtio-net interface is up in
>>>>> the guest before suspend. I'm using net-next.git and qemu master. But
>>>>> I can reproduce it when the virtio-net interface is down in the guest
>>>>> before suspend: after resume, even if I bring it up, the network is
>>>>> still lost.
>>>>>
>>>>> I think the interface is up in your case, but please confirm this.
>>>> If you mean the interface state before I hit the suspend button,
>>>> the answer is yes. The interface is up before I suspend the guest
>>>> machine.
>>>>
>>>> Note that my current QEMU version is QEMU emulator version 2.5.0
>>>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>>>
>>>> I will try with net-next.git and qemu master later and see if I can
>>>> reproduce the issue.
>>> Update. I tried with net-next and qemu master. Interestingly, the result
>>> is different from yours. The network is lost even if the virtio_net
>>> interface is up before suspending.
>>>
>>> Host: Ubuntu 16.04 with net-next kernel (default configuration)
>>> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
>>> Qemu: master
>>> Qemu command:
>>> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
>>> -bios /usr/share/OVMF/OVMF_CODE.fd \
>>> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
>>> -netdev user,id=hostnet0 \
>>> -device virtio-net-pci,netdev=hostnet0 \
>>> -device VGA,id=video0,vgamem_mb=16 \
>>> -global PIIX4_PM.disable_s3=1 \
>>> -global PIIX4_PM.disable_s4=1 -monitor stdio
>>
>> Interesting, I just noticed you're using userspace networking. To
>> isolate the issue, can you retry with e.g. tap or e1000 to make sure
>> it's not a fault of slirp or virtio-net?
> I will try.
>
>> Thanks
>>
> There is another thing that I want to discuss. I notice that
> netif_device_detach() already sets __QUEUE_STATE_DRV_XOFF if the
> network interface is running. Isn't calling netif_tx_disable() after
> netif_device_detach() redundant when the interface is running? If the
> goal is to serialize the tx routine, wouldn't netif_tx_lock() and
> netif_tx_unlock() be more appropriate? Like this:
>
> netif_tx_lock(vi->dev);
> netif_device_detach(vi->dev);
> netif_tx_unlock(vi->dev);
>
> Currently, netif_tx_disable() seems to break the symmetry between
> netif_device_detach() and netif_device_attach(): it stops the tx
> queues unconditionally, while netif_device_attach() wakes them only
> if the interface is running. That is why you can reproduce the
> problem when the interface is down before suspending.


Yes, I agree.

Thanks
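The detach/attach asymmetry discussed in this thread can be sketched as a small userspace model. This is a hypothetical illustration, not kernel code: model_detach(), model_attach(), model_tx_disable(), tx_ok() and cycle() are invented names. They mirror the conditions in netif_device_detach() (stop the tx queues only if the device was present and is running), netif_device_attach() (wake them only if the device was absent and is running) and netif_tx_disable() (stop them unconditionally). The model assumes, as Jason's test suggests, that bringing the interface up after resume does not by itself clear __QUEUE_STATE_DRV_XOFF. It covers only queue state, so it says nothing about the failure ake saw with the interface up while using slirp.

```c
/* Hypothetical userspace model of tx-queue state across suspend/resume.
 * Not kernel code; names are invented for illustration. */
#include <stdbool.h>

#define NUM_TXQ 2 /* arbitrary queue count for illustration */

struct model_dev {
    bool present;       /* models __LINK_STATE_PRESENT */
    bool running;       /* models netif_running() */
    bool xoff[NUM_TXQ]; /* models __QUEUE_STATE_DRV_XOFF per tx queue */
};

static void stop_all(struct model_dev *d)
{
    for (int i = 0; i < NUM_TXQ; i++)
        d->xoff[i] = true;
}

static void wake_all(struct model_dev *d)
{
    for (int i = 0; i < NUM_TXQ; i++)
        d->xoff[i] = false;
}

/* Like netif_device_detach(): stops queues only if present and running. */
static void model_detach(struct model_dev *d)
{
    bool was_present = d->present;
    d->present = false;
    if (was_present && d->running)
        stop_all(d);
}

/* Like netif_device_attach(): wakes queues only if absent and running. */
static void model_attach(struct model_dev *d)
{
    bool was_present = d->present;
    d->present = true;
    if (!was_present && d->running)
        wake_all(d);
}

/* Like netif_tx_disable(): stops queues regardless of running state. */
static void model_tx_disable(struct model_dev *d)
{
    stop_all(d);
}

/* True if every tx queue is allowed to transmit. */
bool tx_ok(const struct model_dev *d)
{
    for (int i = 0; i < NUM_TXQ; i++)
        if (d->xoff[i])
            return false;
    return true;
}

/* One suspend/resume cycle. use_tx_disable selects a freeze path of
 * detach + tx_disable; false models the lock-only variant, where the
 * tx lock serializes but leaves queue state untouched. */
bool cycle(bool up_at_suspend, bool use_tx_disable)
{
    struct model_dev d = { .present = true, .running = up_at_suspend };

    model_detach(&d);            /* freeze */
    if (use_tx_disable)
        model_tx_disable(&d);
    model_attach(&d);            /* restore */

    /* Assumption: bringing the interface up does not clear XOFF. */
    d.running = true;
    return tx_ok(&d);
}
```

With netif_tx_disable() in the freeze path, cycle(false, true) returns false (an interface that was down at suspend still cannot transmit after it is brought up), while cycle(true, true) returns true. Without it, as in the lock-only variant, both cycle(true, false) and cycle(false, false) return true, because detach and attach then stop and wake the queues under the same condition.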
