Message-ID: <53E079C8.7050700@huawei.com>
Date:	Tue, 5 Aug 2014 14:29:28 +0800
From:	"Zhangjie (HZ)" <zhangjie14@...wei.com>
To:	Jason Wang <jasowang@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>
CC:	<netdev@...r.kernel.org>, <qinchuanyu@...wei.com>,
	<liuyongan@...wei.com>, <davem@...emloft.net>
Subject: Re: Query: Is it possible to lose interrupts between vhost and virtio_net
 during migration?

Jason is right; the new order is not the cause of the network becoming unreachable.
Changing the order does not seem to help: after about 40 migrations, the problem occurs again.
There may be some other hidden reason for it.

On 2014/8/1 19:14, Jason Wang wrote:
> On 08/01/2014 06:47 PM, Jason Wang wrote:
>> On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>>>>>>>>>>> [The test scenario]:
>>>>>>>>>>>>
>>>>>>>>>>>> Doing round-trip migration between two hosts (A->B, B->A), after about 20 times the network of the VM becomes unreachable.
>>>>>>>>>>>> There are 20 other VMs on each host, and they send IPv4 or IPv6 and multicast packets to each other.
>>>>>>>>>>>> Sometimes the host CPU idle may be 0.
>>>>>>>>>>>>
>>>>>>>>>>>> [Problem description]:
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder whether missing interrupts cause the network to become unreachable.
>>>>>>>>>>>> In the KVM migration process, the source side has to suspend, which includes the following steps:
>>>>>>>>>>>> 1.	do_vm_stop->pause_all_vcpus
>>>>>>>>>>>> 2.	vm_state_notify->vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>>>>>>>>>>> 3.	vm_state_notify->vhost_net_stop->vhost_net_stop_one->VHOST_NET_SET_BACKEND->vhost_net_flush_vq->vhost_work_flush
>>>>>>>>>>>> This may cause interrupts to be missed. Suppose virtqueue_notify() is called in virtio_net,
>>>>>>>>>>>> and then the VM is paused. If the eventfd of kvm is released just before the port I/O write is handled,
>>>>>>>>>>>> vhost cannot sense the notify, and the tx notify is lost.
>>>>>>>>>>>> On the other side, if the eventfd of kvm is released just after vhost_notify() and before eventfd_signal(), the rx signal from vhost is lost.
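
To make the suspected window concrete, here is a minimal userspace sketch
of this class of race, built on a bare eventfd. All names are illustrative
assumptions; this is not vhost or QEMU code:

/* Minimal userspace sketch of the suspected race (illustrative names
 * only; not vhost or QEMU code). A kick written into an eventfd after
 * the consumer's final read is silently discarded. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int kick_fd;                     /* stands in for the ioeventfd */

static void *vcpu_thread(void *arg)     /* stands in for virtqueue_notify() */
{
    uint64_t one = 1;
    (void)arg;
    usleep(2000);                       /* kick lands inside the stop window */
    write(kick_fd, &one, sizeof one);
    return NULL;
}

int main(void)
{
    pthread_t vcpu;
    uint64_t val;

    kick_fd = eventfd(0, EFD_NONBLOCK);
    pthread_create(&vcpu, NULL, vcpu_thread, NULL);

    /* "vm_state_notify -> vhost_net_stop": the consumer drains once and
     * then goes away; any kick written after this point is never seen. */
    usleep(1000);
    if (read(kick_fd, &val, sizeof val) < 0)
        printf("nothing pending at teardown; the later kick is lost\n");

    pthread_join(vcpu, NULL);
    close(kick_fd);                     /* eventfd released, kick discarded */
    return 0;
}

Built with -lpthread, the kick at t=2ms arrives after the consumer's final
read at t=1ms and is never serviced: the tx case described above in miniature.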
>>>>>>>>
>>>>>>>> Could be a bug in userspace: it should clean up the notifiers
>>>>>>>> after it stops vhost.
>>>>>>>>
>>>>>>>> Could you please send this to the appropriate mailing lists?
>>>>>>>> I have a policy against off-list discussions.
>>>> Also, Jason, could you take a look please?
>>>> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>>>> changed the order of stopping the device.
>>>> Previously, vhost_dev_stop would disable the backend and only afterwards
>>>> unset the guest notifiers. You now unset the guest notifiers while vhost
>>>> is still active. Looks like this can lose events?
>> Not sure it will really cause the issue, since during guest notifier
>> deassignment in virtio_queue_set_guest_notifier_fd_handler() it tests
>> the notifier and triggers the callback if it is set. That looks like it
>> guarantees the interrupt is not lost.
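
For reference, here is a self-contained analog of the guarantee being
described (hypothetical types and names, not the actual QEMU code): on
deassign, test the eventfd and deliver any latched interrupt before
dropping the handler:

/* Self-contained analog (hypothetical types/names, not the actual QEMU
 * code) of the guarantee above: drain the eventfd on deassign so a
 * pending interrupt is not silently dropped with the notifier. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

struct guest_notifier {
    int fd;                               /* eventfd carrying the irq */
    void (*handler)(struct guest_notifier *n);
};

static bool notifier_test_and_clear(struct guest_notifier *n)
{
    uint64_t val;
    return read(n->fd, &val, sizeof val) == sizeof val;
}

static void deliver_irq(struct guest_notifier *n)
{
    (void)n;
    printf("pending interrupt delivered at deassign time\n");
}

static void notifier_deassign(struct guest_notifier *n)
{
    /* Drain anything already latched before the handler goes away. */
    if (notifier_test_and_clear(n))
        n->handler(n);
    n->handler = NULL;
}

int main(void)
{
    struct guest_notifier n = { eventfd(0, EFD_NONBLOCK), deliver_irq };
    uint64_t one = 1;

    write(n.fd, &one, sizeof one);        /* vhost signals just before stop */
    notifier_deassign(&n);                /* interrupt is still delivered */
    close(n.fd);
    return 0;
}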
> 
> On further thought, it looks like there is still a window between
> disabling the guest notifiers and stopping vhost_net.
> 
> Zhang Jie, please test a patch that changes the order, and if it works,
> send a formal patch to qemu-devel.
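
A hypothetical sketch of the ordering under test (stub functions, not a
real patch): quiesce vhost first, so that by the time the guest notifiers
are released nothing can still signal them:

/* Hypothetical sketch of the proposed ordering (stub functions, not a
 * real patch): quiesce vhost, then release the guest notifiers. */
#include <stdbool.h>
#include <stdio.h>

static bool vhost_active = true;

static void vhost_stop_backend(void)
{
    /* Stands in for VHOST_NET_SET_BACKEND(-1) + vhost_work_flush(). */
    vhost_active = false;
}

static void unset_guest_notifiers(void)
{
    if (vhost_active)
        printf("unsafe: vhost can still signal a released notifier\n");
    else
        printf("safe: vhost was quiesced before the notifiers went away\n");
}

int main(void)
{
    vhost_stop_backend();      /* step 1: stop the backend and flush */
    unset_guest_notifiers();   /* step 2: only now drop the notifiers */
    return 0;
}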
> 
> By the way, vhost_scsi may need the fix as well, since it may hit the same issue.
> 
> Thanks
> 

-- 
Best Wishes!
Zhang Jie

