[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yvs2qqx/sG0C3zvz@bullseye>
Date: Tue, 16 Aug 2022 06:18:54 +0000
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Bobby Eshleman <bobby.eshleman@...il.com>,
Bobby Eshleman <bobby.eshleman@...edance.com>,
Cong Wang <cong.wang@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>,
Jason Wang <jasowang@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, kvm@...r.kernel.org,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
On Tue, Aug 16, 2022 at 12:38:52PM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> > In order to support usage of qdisc on vsock traffic, this commit
> > introduces a struct net_device to vhost and virtio vsock.
> >
> > Two new devices are created, vhost-vsock for vhost and virtio-vsock
> > for virtio. The devices are attached to the respective transports.
> >
> > To bypass the usage of the device, the user may "down" the associated
> > network interface using common tools. For example, "ip link set dev
> > virtio-vsock down" lets vsock bypass the net_device and qdisc entirely,
> > simply using the FIFO logic of the prior implementation.
>
> Ugh. That's quite a hack. Mark my words, at some point we will decide to
> have down mean "down". Besides, multiple net devices with link up tend
> to confuse userspace. So might want to keep it down at all times
> even short term.
>
I have to admit, this choice was born more of perceived necessity than
of a love for the design... but I can explain the pain points that led
to the current state, which I hope sparks some discussion.
When the state is down, dev_queue_xmit() will fail. To avoid this and
preserve the "zero-configuration" guarantee of vsock, I chose to make
transmission work regardless of device state by implementing this
"ignore up/down state" hack.
This is unfortunate because what we are really after here is just packet
scheduling, i.e., qdisc. We don't really need the rest of the
net_device, and I don't think up/down buys us anything of value. The
relationship between qdisc and net_device is so tightly knit together
though, that using qdisc without a net_device doesn't look very
practical (and maybe impossible).
Some alternative routes might be:
1) Default the state to up, and let users disable vsock by downing the
device if they'd like. It still works out-of-the-box, but if users
really want to disable vsock they may.
2) vsock may simply turn the device to state up when a socket is first
used. For instance, the HCI device in net/bluetooth/hci_* uses a
technique where the net_device is turned to up when bind() is called on
any HCI socket (BTPROTO_HCI). It can also be turned up/down via
ioctl().
3) Modify net_device registration to allow us to have an invisible
device that is only known by the kernel. It may default to up and remain
unchanged. The qdisc controls alone may be exposed to userspace,
hopefully via netlink to still work with tc. This is not
currently supported by register_netdevice(), but a series from 2007 was
sent to the ML, tentatively approved in concept, but was never merged[1].
4) Currently NETDEV_UP/NETDEV_DOWN commands can't be vetoed.
NETDEV_PRE_UP, however, is used to effectively veto NETDEV_UP
commands[2]. We could introduce NETDEV_PRE_DOWN to support vetoing of
NETDEV_DOWN. This would allow us to install a hook to determine if
we actually want to allow the device to be downed.
In an ideal world, we could simply pass a set of netdev queues, a
packet, and maybe a blob of state to qdisc and let it work its
scheduling magic...
Any thoughts?
[1]: https://lore.kernel.org/netdev/20070129140958.0cf6880f@freekitty/
[2]: https://lore.kernel.org/all/20090529.220906.243061042.davem@davemloft.net/
Thanks,
Bobby
Powered by blists - more mailing lists