Message-ID: <3448e588f11dad913e93dfce8031fbd60ba4c85b.camel@intel.com>
Date: Thu, 13 Feb 2020 09:51:36 +0000
From: "Boeuf, Sebastien" <sebastien.boeuf@...el.com>
To: "sgarzare@...hat.com" <sgarzare@...hat.com>
CC: "stefanha@...hat.com" <stefanha@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>
Subject: Re: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
Hi Stefano,
On Thu, 2020-02-13 at 10:41 +0100, Stefano Garzarella wrote:
> Hi Sebastien,
>
> On Thu, Feb 13, 2020 at 09:16:11AM +0000, Boeuf, Sebastien wrote:
> > From 2f1276d02f5a12d85aec5adc11dfe1eab7e160d6 Mon Sep 17 00:00:00 2001
> > From: Sebastien Boeuf <sebastien.boeuf@...el.com>
> > Date: Thu, 13 Feb 2020 08:50:38 +0100
> > Subject: [PATCH] net: virtio_vsock: Fix race condition between bind and listen
> >
> > Whenever the vsock backend on the host sends a packet through the RX
> > queue, it expects an answer on the TX queue. Unfortunately, there is
> > one case where the host side will hang waiting for the answer and
> > will effectively never recover.
>
> Do you have a test case?
Yes I do. This is a bug we've been investigating in Kata Containers
for quite some time now. It was happening when using Kata along with
Cloud-Hypervisor (which relies on the hybrid vsock implementation from
Firecracker). The thing is, this bug is very hard to reproduce, and it
was happening for Kata because of its connection strategy: the
kata-runtime tries to connect a million times after it has started the
VM, hoping the kata-agent will start listening on the guest side at
some point.
>
> In the host, af_vsock.c:vsock_stream_connect() sets a timeout, so if
> the host tries to connect before the guest starts listening, the
> connect() should return ETIMEDOUT if the guest does not answer
> anything.
>
> Anyway, maybe the patch makes sense, with the description changed a
> bit (if the host connect() receives the ETIMEDOUT).
> I'm just concerned that this code is common between guest and host.
> If a malicious guest starts sending us wrong requests, we spend time
> sending a reset packet. But we already do that if we can't find the
> bound socket, so it might make sense.
Yes, I don't think this is going to cause more trouble, and at least
we cannot end up in the weird situation I described.
I was just not sure whether the function we should use to do the reset
should be virtio_transport_reset_no_sock() or virtio_transport_reset(),
since at this point the socket is already bound.
Thanks,
Sebastien
>
> Thanks,
> Stefano
>
> > This issue happens when the guest side starts binding to the socket,
> > which inserts a new bound socket into the list of already bound
> > sockets. At this time, we expect the guest to also start listening,
> > which will trigger the sk_state to move from TCP_CLOSE to
> > TCP_LISTEN. The problem occurs if the host side queued an RX packet
> > and triggered an interrupt right between the end of the binding
> > process and the beginning of the listening process. In this specific
> > case, the function processing the packet, virtio_transport_recv_pkt(),
> > will find a bound socket, which means it will hit the switch
> > statement checking for the sk_state, but the state won't have been
> > changed to TCP_LISTEN yet, which leads the code to pick the default
> > statement. This default statement will only free the buffer, while
> > it should also respond to the host side by sending a packet on its
> > TX queue.
> >
> > To fix this unfortunate chain of events, in case the default
> > statement is entered, and because at this stage we know the host
> > side is waiting for an answer, we must send back a packet containing
> > the operation VIRTIO_VSOCK_OP_RST.
> >
> > Signed-off-by: Sebastien Boeuf <sebastien.boeuf@...el.com>
> > ---
> > net/vmw_vsock/virtio_transport_common.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index e5ea29c6bca7..909334d58328 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -1143,6 +1143,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >  		virtio_transport_free_pkt(pkt);
> >  		break;
> >  	default:
> > +		(void)virtio_transport_reset_no_sock(t, pkt);
> >  		virtio_transport_free_pkt(pkt);
> >  		break;
> >  	}
> > --
> > 2.20.1
> >
---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number: 302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.