[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4cd620bec54b0b57792dd3ffd874637ef97a77f1.camel@intel.com>
Date: Thu, 13 Feb 2020 17:18:57 +0000
From: "Boeuf, Sebastien" <sebastien.boeuf@...el.com>
To: "sgarzare@...hat.com" <sgarzare@...hat.com>
CC: "stefanha@...hat.com" <stefanha@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>
Subject: Re: [PATCH] net: virtio_vsock: Fix race condition between bind and
listen
On Thu, 2020-02-13 at 18:14 +0100, Stefano Garzarella wrote:
> On Thu, Feb 13, 2020 at 5:51 PM Boeuf, Sebastien <
> sebastien.boeuf@...el.com> wrote:
> > On Thu, 2020-02-13 at 14:04 +0100, Stefano Garzarella wrote:
> > > On Thu, Feb 13, 2020 at 11:39:49AM +0000, Stefan Hajnoczi wrote:
> > > > On Thu, Feb 13, 2020 at 10:44:18AM +0000, Boeuf, Sebastien
> > > > wrote:
> > > > > On Thu, 2020-02-13 at 11:22 +0100, Stefano Garzarella wrote:
> > > > > > On Thu, Feb 13, 2020 at 09:51:36AM +0000, Boeuf, Sebastien
> > > > > > wrote:
> > > > > > > Hi Stefano,
> > > > > > >
> > > > > > > On Thu, 2020-02-13 at 10:41 +0100, Stefano Garzarella
> > > > > > > wrote:
> > > > > > > > Hi Sebastien,
> > > > > > > >
> > > > > > > > On Thu, Feb 13, 2020 at 09:16:11AM +0000, Boeuf,
> > > > > > > > Sebastien
> > > > > > > > wrote:
> > > > > > > > > From 2f1276d02f5a12d85aec5adc11dfe1eab7e160d6 Mon Sep
> > > > > > > > > 17
> > > > > > > > > 00:00:00
> > > > > > > > > 2001
> > > > > > > > > From: Sebastien Boeuf <sebastien.boeuf@...el.com>
> > > > > > > > > Date: Thu, 13 Feb 2020 08:50:38 +0100
> > > > > > > > > Subject: [PATCH] net: virtio_vsock: Fix race
> > > > > > > > > condition
> > > > > > > > > between
> > > > > > > > > bind
> > > > > > > > > and listen
> > > > > > > > >
> > > > > > > > > Whenever the vsock backend on the host sends a packet
> > > > > > > > > through
> > > > > > > > > the
> > > > > > > > > RX
> > > > > > > > > queue, it expects an answer on the TX queue.
> > > > > > > > > Unfortunately,
> > > > > > > > > there
> > > > > > > > > is one
> > > > > > > > > case where the host side will hang waiting for the
> > > > > > > > > answer
> > > > > > > > > and
> > > > > > > > > will
> > > > > > > > > effectively never recover.
> > > > > > > >
> > > > > > > > Do you have a test case?
> > > > > > >
> > > > > > > Yes I do. This has been a bug we've been investigating on
> > > > > > > Kata
> > > > > > > Containers for quite some time now. This was happening
> > > > > > > when
> > > > > > > using
> > > > > > > Kata
> > > > > > > along with Cloud-Hypervisor (which rely on the hybrid
> > > > > > > vsock
> > > > > > > implementation from Firecracker). The thing is, this bug
> > > > > > > is
> > > > > > > very
> > > > > > > hard
> > > > > > > to reproduce and was happening for Kata because of the
> > > > > > > connection
> > > > > > > strategy. The kata-runtime tries to connect a million
> > > > > > > times
> > > > > > > after
> > > > > > > it
> > > > > > > started the VM, just hoping the kata-agent will start to
> > > > > > > listen
> > > > > > > from
> > > > > > > the guest side at some point.
> > > > > >
> > > > > > Maybe is related to something else. I tried the following
> > > > > > which
> > > > > > should be
> > > > > > your case simplified (IIUC):
> > > > > >
> > > > > > guest$ python
> > > > > > import socket
> > > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> > > > > > s.bind((socket.VMADDR_CID_ANY, 1234))
> > > > > >
> > > > > > host$ python
> > > > > > import socket
> > > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> > > > > > s.connect((3, 1234))
> > > > > >
> > > > > > Traceback (most recent call last):
> > > > > > File "<stdin>", line 1, in <module>
> > > > > > TimeoutError: [Errno 110] Connection timed out
> > > > >
> > > > > Yes this is exactly the simplified case. But that's the
> > > > > point, I
> > > > > don't
> > > > > think the timeout is the best way to go here. Because this
> > > > > means
> > > > > that
> > > > > when we run into this case, the host side will wait for quite
> > > > > some time
> > > > > before retrying, which can cause a very long delay before the
> > > > > communication with the guest is established. By simply
> > > > > answering
> > > > > the
> > > > > host with a RST packet, we inform it that nobody's listening
> > > > > on
> > > > > the
> > > > > guest side yet, therefore the host side will close and try
> > > > > again.
> > > >
> > > > My expectation is that TCP/IP will produce ECONNREFUSED in this
> > > > case but
> > > > I haven't checked. Timing out is weird behavior.
> > >
> > > I just tried and yes, TCP/IP produces ECONNREFUSED. The same
> > > error
> > > returned
> > > when no one's bound to the port.
> > >
> > > Instead virtio-vsock returns ECONNRESET in the last case.
> > > I'm not sure it's correct (looking at the man page connect(2) it
> > > would
> > > seem not), but if I understood correctly VMCI returns the same
> > > ECONNRESET in this case.
> > >
> > > > In any case, the reference for virtio-vsock semantics is:
> > > > 1. How does VMCI (VMware) vsock behave? We strive to be
> > > > compatible
> > > > with
> > > > the VMCI transport.
> > >
> > > Looking at the code, it looks like VMCI returns ECONNRESET in
> > > this
> > > case,
> > > so this patch should be okay. (I haven't tried it)
> > >
> > > > 2. If there is no clear VMCI behavior, then we look at TCP/IP
> > > > because
> > > > those semantics are expected by most applications.
> > > >
> > > > This bug needs a test case in tools/testings/vsock/ and that
> > > > test
> > > > case
> > > > will run against VMCI, virtio-vsock, and Hyper-V. Doing that
> > > > will
> > > > answer the question of how VMCI handles this case.
> > >
> > > I agree.
> >
> > I'm trying to write the test but I'm kinda stuck as I have no way
> > to
>
> Great!
>
> > ensure the proper timing between the test on the server(guest) and
> > the
> > test on from the client(host).
> >
> > Basically I was thinking about creating a new test reusing
> > test_stream_connection_reset() from vsock_test.c. The reused
> > function
> > would be used for the run_client callback, while I would define a
> > new
> > function for the run_server. I wanted to basically do a bind only,
> > and
> > don't go up to the listen/accept calls.
> > Problem is, the server won't block after the bind is done, and I
> > don't
> > know how much the test should wait between the bind and the end.
> > I would like to show that even when the server reaches the point
> > where
> > the socket is bound, the connect will still return with ECONNRESET.
> > The
> > same test without the patch would simply hang till we hit the
> > timeout
> > on the client side.
> > Does that make sense? And how much time should I wait for after the
> > bind on the server side?
>
> You can use control_writeln() and control_expectln() from control.c
> to
> syncronize server and client. The control path uses a TCP/IP socket.
Oh that's cool! I completely missed this feature :)
Let me resume my work then!
>
> You can do something like this:
>
> server client
>
> s = socket(); s = socket();
> bind(s, ...);
> control_writeln("BIND");
> control_expectln("BIND");
> connect(s, ...);
> # check ret and errno
> control_writeln("DONE");
> control_expectln("DONE");
> close(s); close(s);
>
> Cheers,
> Stefano
>
---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number: 302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
Powered by blists - more mailing lists