Message-ID: <53DB705F.2000405@redhat.com>
Date: Fri, 01 Aug 2014 18:47:59 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>,
"Zhangjie (HZ)" <zhangjie14@...wei.com>
CC: netdev@...r.kernel.org, qinchuanyu@...wei.com,
liuyongan@...wei.com, davem@...emloft.net
Subject: Re: Query: Is it possible to lose interrupts between vhost and virtio_net
during migration?
On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>> > On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>> > > [The test scenario]:
>>> > >
>>> > > We migrate a VM back and forth between two Hosts (A->B, B->A); after about 20 rounds, the VM's network becomes unreachable.
>>> > > There are 20 other VMs on each Host, and they send IPv4, IPv6 and multicast packets to each other.
>>> > > Sometimes the Host's CPU idle may drop to 0.
>>> > >
>>> > > [Problem description]:
>>> > >
>>> > > I wonder whether missing interrupts are what makes the network unreachable.
>>> > > During KVM migration, the source end has to suspend the VM, which includes the following steps:
>>> > > 1. do_vm_stop->pause_all_vcpus
>>> > > 2. vm_state_notify->vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>> > > 3. vm_state_notify->vhost_net_stop->vhost_net_stop_one->VHOST_NET_SET_BACKEND->vhost_net_flush_vq->vhost_work_flush
>>> > > This can lose interrupts. Suppose virtqueue_notify() is called in virtio_net and the VM is then paused;
>>> > > if KVM's eventfd is released just before the pio write is handled, vhost never sees the
>>> > > notification and the tx kick is lost.
>>> > > On the other side, if KVM's eventfd is released after vhost_notify() but before eventfd_signal(), the rx signal from vhost is lost.
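
The loss described above is easy to show in isolation with a plain
eventfd. The sketch below is only stand-alone toy code of mine (nothing
from vhost or QEMU), but it captures why a kick written into an eventfd
vanishes unless somebody does a final read before the fd is torn down:

    /* Toy illustration: a kick that lands in an eventfd is only ever
     * seen if someone still reads that eventfd afterwards. */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    int main(void)
    {
        int kick = eventfd(0, EFD_NONBLOCK);  /* stands in for the ioeventfd */
        uint64_t one = 1, pending = 0;

        if (kick < 0) {
            perror("eventfd");
            return 1;
        }

        /* "Guest" side: virtqueue_notify() -> pio write -> eventfd signal. */
        if (write(kick, &one, sizeof(one)) != sizeof(one))
            perror("write");

        /* If the consumer is torn down here and the fd is closed without a
         * final read, the pending count vanishes with the fd: the tx kick
         * is lost exactly as described above.  The defensive pattern is a
         * last test-and-clear read before teardown: */
        if (read(kick, &pending, sizeof(pending)) == sizeof(pending) && pending)
            printf("found a pending kick on teardown, handling it\n");

        close(kick);
        return 0;
    }
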
>> >
>> > Could be a bug in userspace: userspace should clean up the notifiers
>> > after it stops vhost.
>> >
>> > Could you please send this to appropriate mailing lists?
>> > I have a policy against off-list discussions.
> Also, Jason, could you take a look please?
> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
> changed the order of stopping the device.
> Previously vhost_dev_stop would disable the backend and only afterwards
> unset guest notifiers. You now unset guest notifiers while vhost is still
> active. Looks like this can lose events?
Not sure this will really cause the issue: during guest notifier
deassignment, virtio_queue_set_guest_notifier_fd_handler() tests the
notifier and triggers the callback if it is set. This looks like it
guarantees that the interrupt is not lost.
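
Roughly, the deassign path ends up doing the equivalent of the toy sketch
below. This is stand-alone code of mine with made-up names
(inject_guest_irq(), guest_notifier_read()), not the actual QEMU source,
but it shows why a signal raised by vhost right before teardown still
turns into a guest interrupt:

    /* Toy model of the "test and clear, then run the callback" step that
     * is performed when the guest notifier fd handler is removed. */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    static void inject_guest_irq(void)
    {
        /* Stand-in for raising the virtqueue interrupt in the guest. */
        printf("guest interrupt injected\n");
    }

    static void guest_notifier_read(int guest_notifier)
    {
        uint64_t cnt;

        /* Test-and-clear: consume any pending event and act on it. */
        if (read(guest_notifier, &cnt, sizeof(cnt)) == sizeof(cnt) && cnt)
            inject_guest_irq();
    }

    int main(void)
    {
        int guest_notifier = eventfd(0, EFD_NONBLOCK);
        uint64_t one = 1;

        if (guest_notifier < 0) {
            perror("eventfd");
            return 1;
        }

        /* vhost signals the guest notifier just before it is deassigned... */
        if (write(guest_notifier, &one, sizeof(one)) != sizeof(one))
            perror("write");

        /* ...and the deassign path still picks the event up, because after
         * removing the fd handler it does one final test-and-clear read. */
        guest_notifier_read(guest_notifier);

        close(guest_notifier);
        return 0;
    }
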