Message-ID: <20161213031243.avq5g5m5r5ylcnnk@thunk.org>
Date: Mon, 12 Dec 2016 22:12:43 -0500
From: Theodore Ts'o <tytso@....edu>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: jasowang@...hat.com, netdev@...r.kernel.org, nhorman@...driver.com,
davem@...emloft.net
Subject: Re: "virtio-net: enable multiqueue by default" in linux-next breaks
networking on GCE
On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
>
> That's unfortunate, of course. It could be a hypervisor or
> a guest kernel bug. ideas:
> - does host have mq capability? how many queues?
> - how about # of msix vectors?
> - after you send something on tx queues,
> are interrupts arriving on rx queues?
> - is problem rx or tx?
> set ip and arp manually and send a packet to known MAC,
> does it get there?
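The first three questions (queue count, MSI-X vectors, rx interrupts after tx traffic) can be checked from inside the guest without a patch. Below is a minimal sketch in Python. It assumes a Linux guest whose virtio-net device has per-queue MSI-X vectors that show up in /proc/interrupts with names like `virtio0-input.0` (rx queue 0) and `virtio0-output.0` (tx queue 0), and it uses `eth0` as the interface name. Both are assumptions, so adjust them for the real system. The script counts the `rx-N`/`tx-N` directories under /sys/class/net/<iface>/queues, which are the queues the guest kernel actually set up. It also sums the per-CPU interrupt counts for each queue vector. Running it before and after sending some traffic shows whether interrupts arrive on the rx queues.

```python
import re
import sys
from pathlib import Path

# Assumed naming scheme for per-queue MSI-X vectors, e.g. "virtio0-input.0"
# (rx queue 0) and "virtio0-output.0" (tx queue 0). Guests without per-queue
# vectors show shared names instead, which this pattern deliberately ignores.
VQ_NAME = re.compile(r"^(virtio\d+)-(input|output)\.(\d+)$")


def virtio_queue_irqs(text):
    """Sum the per-CPU interrupt counts for each virtio-net queue vector.

    Returns {(device, "input" or "output", queue_index): total_count}.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue
        match = VQ_NAME.match(fields[-1])  # device name is the last column
        if not match:
            continue
        counts = [int(f) for f in fields[1:1 + ncpus] if f.isdigit()]
        key = (match.group(1), match.group(2), int(match.group(3)))
        totals[key] = sum(counts)
    return totals


def sysfs_queue_counts(iface, root="/sys/class/net"):
    """Count the rx-N and tx-N queue directories the kernel exposes."""
    qdir = Path(root) / iface / "queues"
    if not qdir.is_dir():
        return 0, 0
    names = [p.name for p in qdir.iterdir()]
    rx = sum(1 for n in names if n.startswith("rx-"))
    tx = sum(1 for n in names if n.startswith("tx-"))
    return rx, tx


if __name__ == "__main__":
    # "eth0" is only a default guess; pass the real interface name as argv[1].
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    print("queues (rx, tx):", sysfs_queue_counts(iface))
    proc = Path("/proc/interrupts")
    if proc.exists():
        for (dev, kind, idx), count in sorted(virtio_queue_irqs(proc.read_text()).items()):
            print(f"{dev} {kind}.{idx}: {count} interrupts")
```

Run it twice, once before and once after a ping. If an `input` queue's count stays at zero while the `output` counts grow, that would suggest the receive side is where things break.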
Sorry, I don't know how to debug virtio-net. Given that it's in a
cloud environment, I also can't set IP addresses manually, since IP
addresses are assigned automatically.
If you can send me a patch, I'm happy to apply it and send you back
results.
I can say that I've had _zero_ problems using pretty much any kernel
from 3.10 to 4.9 on Google Compute Engine. The commit I referenced
caused things to stop working. So in terms of regression, this is
definitely a regression, and it's definitely caused by commit
449000102901. Even if it is a hypervisor "bug", I'm pretty sure I
know what Linus will say if I ask him to revert it. Linux kernels are
expected to work around hardware bugs, and breaking users just because
hardware is "broken" by some definition is generally not considered
friendly, especially when it has been working for years and years before
some commit "fixed" things.
I would very much like to work with you to fix it, but I will need
your help, since virtio-net doesn't seem to print any informational
messages during the boot sequence, and I don't know the best way to
debug it.
Cheers,
- Ted