netdev - Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE

Message-ID: <60cd312f-86f9-47e9-0c72-f4c2109e2f87@redhat.com>
Date:   Tue, 13 Dec 2016 11:43:00 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Theodore Ts'o" <tytso@....edu>,
        "Michael S. Tsirkin" <mst@...hat.com>
Cc:     netdev@...r.kernel.org, nhorman@...driver.com, davem@...emloft.net
Subject: Re: "virtio-net: enable multiqueue by default" in linux-next breaks
 networking on GCE



On 2016-12-13 11:12, Theodore Ts'o wrote:
> On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
>> That's unfortunate, of course. It could be a hypervisor or
>> a guest kernel bug. Ideas:
>> - does host have mq capability? how many queues?
>> - how about # of msix vectors?
>> - after you send something on tx queues,
>>    are interrupts arriving on rx queues?
>> - is problem rx or tx?
>>    set ip and arp manually and send a packet to known MAC,
>>    does it get there?
> Sorry, I don't know how to debug virtio-net.  Given that it's in a
> cloud environment, I also can't set IP addresses manually, since IP
> addresses are assigned automatically.
>
> If you can send me a patch, I'm happy to apply it and send you back
> results.
>
> I can say that I've had _zero_ problems using pretty much any kernel
> from 3.10 to 4.9 using Google Compute Engine.  The commit I referenced
> caused things to stop working.  So in terms of regression, this is
> definitely a regression, and it's definitely caused by commit
> 449000102901.  Even if it is a hypervisor "bug", I'm pretty sure I
> know what Linus will say if I ask him to revert it.  Linux kernels are
> expected to work around hardware bugs, and breaking users just because
> hardware is "broken" by some definition is generally not considered
> friendly, especially when it has been working for years and years before
> some commit "fixed" things.
>
> I would very much like to work with you to fix it, but I will need
> your help, since virtio-net doesn't seem to print any informational
> messages during the boot sequence, and I don't know the best way to
> debug it.
>
> Cheers,
>
> 						- Ted
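Michael's first questions (does the host offer multiqueue, and how many queues?) can be partly answered from inside the guest without a patch. A minimal sketch, assuming sysfs is mounted at /sys and using eth0 only as a placeholder interface name: the kernel lists one rx-N and one tx-N directory per queue under /sys/class/net/eth0/queues, so counting them shows how many queues the driver registered. Note this is the driver's view, not the device's, which is exactly where the two can disagree. The helper names `is_queue_entry()` and `count_queues()` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: true if name is "<prefix><digits>", e.g. "rx-0" or "tx-15". */
int is_queue_entry(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0)
        return 0;
    const char *p = name + plen;
    if (*p == '\0')
        return 0;               /* prefix with no queue index */
    for (; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: count "<prefix>N" entries in a sysfs queues directory,
 * e.g. count_queues("/sys/class/net/eth0/queues", "rx-").
 * Returns -1 if the directory cannot be opened.
 */
int count_queues(const char *queues_dir, const char *prefix)
{
    assert(queues_dir != NULL && prefix != NULL);
    DIR *dir = opendir(queues_dir);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL)
        if (is_queue_entry(ent->d_name, prefix))
            count++;
    closedir(dir);
    return count;
}
```

If ethtool is installed in the image, `ethtool -l eth0` should report comparable channel counts; the sysfs walk just avoids depending on extra tools.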

Thanks for reporting this issue. It looks like I blindly set the affinity
instead of setting the number of queues during probe. Could you please try
the following patch to see if it works?

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b425fa1..fe9f772 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
                 goto free_unregister_netdev;
         }

-       virtnet_set_affinity(vi);
+       rtnl_lock();
+       virtnet_set_queues(vi, vi->curr_queue_pairs);
+       rtnl_unlock();

         /* Assume link up if device can't report link status,
            otherwise get link status from config. */
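To make the failure mode concrete, here is a toy model in C. It is hypothetical and not taken from virtio_net.c. It assumes, as the patch suggests, that probe had already set vi->curr_queue_pairs above 1, and that the device keeps servicing only the first queue pair until the driver sends the multiqueue control command (the virtio specification leaves multiqueue disabled until VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET). It further assumes packets placed on a non-enabled pair are simply never serviced. `model_set_queues()` stands in for what virtnet_set_queues() does; the rtnl_lock()/rtnl_unlock() pair from the patch is left out.

```c
#include <assert.h>

/* Hypothetical device state: how many queue pairs the device services. */
struct model_dev {
    unsigned int active_pairs;      /* 1 until multiqueue is enabled */
};

/* Hypothetical driver state: how many pairs the driver thinks are active. */
struct model_drv {
    unsigned int curr_queue_pairs;
};

/* Stand-in for the control command sent by virtnet_set_queues():
 * both sides end up agreeing on the number of pairs. */
void model_set_queues(struct model_drv *drv, struct model_dev *dev,
                      unsigned int pairs)
{
    assert(pairs >= 1);
    drv->curr_queue_pairs = pairs;
    dev->active_pairs = pairs;
}

/* Transmit n flows using a hypothetical round-robin (flow i goes to tx
 * queue i % curr_queue_pairs); return how many the device services. */
unsigned int model_transmit(const struct model_drv *drv,
                            const struct model_dev *dev, unsigned int n)
{
    assert(drv->curr_queue_pairs >= 1);
    unsigned int delivered = 0;
    for (unsigned int flow = 0; flow < n; flow++) {
        unsigned int txq = flow % drv->curr_queue_pairs;
        if (txq < dev->active_pairs)    /* queues past this are ignored */
            delivered++;
    }
    return delivered;
}
```

With the driver at four pairs but the device still at one, only flows 0 and 4 out of eight get through; after model_set_queues(&drv, &dev, 4) all eight do. That mirrors the patch: changing only the driver-side count (plus affinity) is not enough, the device also has to be told.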