[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1460477269.24107.1.camel@docker.com>
Date: Tue, 12 Apr 2016 17:07:49 +0100
From: Ian Campbell <ian.campbell@...ker.com>
To: "Michael S. Tsirkin" <mst@...hat.com>,
Stefan Hajnoczi <stefanha@...il.com>
Cc: Stefan Hajnoczi <stefanha@...hat.com>, kvm@...r.kernel.org,
netdev@...r.kernel.org, Matt Benjamin <mbenjamin@...hat.com>,
Christoffer Dall <christoffer.dall@...aro.org>,
Alex Bennée <alex.bennee@...aro.org>,
marius vlad <marius.vlad0@...il.com>, areis@...hat.com,
Claudio Imbrenda <imbrenda@...ux.vnet.ibm.com>,
Greg Kurz <gkurz@...ux.vnet.ibm.com>,
virtualization@...ts.linux-foundation.org
Subject: Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
Some how Stefan's reply disapeared from my INBOX (although I did see
it) so replying here.
On Mon, 2016-04-11 at 15:54 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> >
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > >
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > >
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > >
> > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > > first I want to share the latest version of the code. Several people are
> > > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > >
> > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > problem is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > >
> > > 1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > > 2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > > 3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > >
> > > I orignally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > > If these bits are set and there are no more virtqueue buffers
> > > pending the socket is disconnected.
> > >
> > > but I'm not entirely sure if this behaviour contradicts this or not
> > > (the bits have both been set at #2, but not at the same time).
> > >
> > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > or not while processing the op?
> > #2 is odd. The shutdown bits are sticky so they cannot be cleared once
> > set. I would have expected just #1 and #3. The behavior you observe
> > look like a bug.
> >
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection. "there are no more virtqueue buffers pending the
> > socket" really means that this isn't an immediate close from the
> > perspective of the application. If the application still has unread rx
> > buffers then the socket stays readable until the rx data has been fully
> > read.
Thanks, distinguishing the local buffer to the application from the
vring would make that clearer. Perhaps by not talking about "virtqueue
buffers" since they sound like a vring thing.
However, as Michael observes I'm not sure that's the whole story.
> Yes but you also wrote:
> If these bits are set and there are no more virtqueue buffers
> pending the socket is disconnected.
>
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now?
Indeed this is one of the things I struggled with. e.g. If I send a
SHUTDOWN_RX to my peer am I supposed to wait for that buffer to come
back (so I know the peer has seen it) and then wait for an entire
"cycle" of the TX ring to know there is nothing still in flight? That's
some tricky book-keeping.
> Maybe destination
> should send RST at that point?
i.e. upon receipt of SHUTDOWN_RX|SHUTDOWN_TX from the peer you are
expected to send a RST. When the peer observes that then they know
there is no further data in that connection on the ring?
That sounds like it would be helpful.
> > > Another thing I noticed, which is really more to do with the generic
> > > AF_VSOCK bits than anything to do with your patches is that there is no
> > > limitations on which vsock ports a non-privileged user can bind to and
> > > relatedly that there is no netns support so e.g. users in unproivileged
> > > containers can bind to any vsock port and talk to the host, which might
> > > be undesirable. For my use for now I just went with the big hammer
> > > approach of denying access from anything other than init_net
> > > namespace[1] while I consider what the right answer is.
> > From the vhost point of view each netns should have its own AF_VSOCK
> > namespace. This way two containers could act as "the host" (CID 2) for
> > their respective guests.
When you say "should" you mean that's the intended design as opposed to
what the current code is actually doing, right?
Ian.
Powered by blists - more mailing lists