[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGxU2F6v99haTtnGP8FT-GKuDEG03uDip+L-NFe3yFZJt7jUxQ@mail.gmail.com>
Date: Thu, 13 Feb 2020 18:14:48 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: "Boeuf, Sebastien" <sebastien.boeuf@...el.com>
Cc: "stefanha@...hat.com" <stefanha@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>
Subject: Re: [PATCH] net: virtio_vsock: Fix race condition between bind and
listen
On Thu, Feb 13, 2020 at 5:51 PM Boeuf, Sebastien <sebastien.boeuf@...el.com> wrote:
>
> On Thu, 2020-02-13 at 14:04 +0100, Stefano Garzarella wrote:
> > On Thu, Feb 13, 2020 at 11:39:49AM +0000, Stefan Hajnoczi wrote:
> > > On Thu, Feb 13, 2020 at 10:44:18AM +0000, Boeuf, Sebastien wrote:
> > > > On Thu, 2020-02-13 at 11:22 +0100, Stefano Garzarella wrote:
> > > > > On Thu, Feb 13, 2020 at 09:51:36AM +0000, Boeuf, Sebastien
> > > > > wrote:
> > > > > > Hi Stefano,
> > > > > >
> > > > > > On Thu, 2020-02-13 at 10:41 +0100, Stefano Garzarella wrote:
> > > > > > > Hi Sebastien,
> > > > > > >
> > > > > > > On Thu, Feb 13, 2020 at 09:16:11AM +0000, Boeuf, Sebastien
> > > > > > > wrote:
> > > > > > > > From 2f1276d02f5a12d85aec5adc11dfe1eab7e160d6 Mon Sep 17
> > > > > > > > 00:00:00
> > > > > > > > 2001
> > > > > > > > From: Sebastien Boeuf <sebastien.boeuf@...el.com>
> > > > > > > > Date: Thu, 13 Feb 2020 08:50:38 +0100
> > > > > > > > Subject: [PATCH] net: virtio_vsock: Fix race condition
> > > > > > > > between
> > > > > > > > bind
> > > > > > > > and listen
> > > > > > > >
> > > > > > > > Whenever the vsock backend on the host sends a packet
> > > > > > > > through
> > > > > > > > the
> > > > > > > > RX
> > > > > > > > queue, it expects an answer on the TX queue.
> > > > > > > > Unfortunately,
> > > > > > > > there
> > > > > > > > is one
> > > > > > > > case where the host side will hang waiting for the answer
> > > > > > > > and
> > > > > > > > will
> > > > > > > > effectively never recover.
> > > > > > >
> > > > > > > Do you have a test case?
> > > > > >
> > > > > > Yes I do. This has been a bug we've been investigating on
> > > > > > Kata
> > > > > > Containers for quite some time now. This was happening when
> > > > > > using
> > > > > > Kata
> > > > > > along with Cloud-Hypervisor (which rely on the hybrid vsock
> > > > > > implementation from Firecracker). The thing is, this bug is
> > > > > > very
> > > > > > hard
> > > > > > to reproduce and was happening for Kata because of the
> > > > > > connection
> > > > > > strategy. The kata-runtime tries to connect a million times
> > > > > > after
> > > > > > it
> > > > > > started the VM, just hoping the kata-agent will start to
> > > > > > listen
> > > > > > from
> > > > > > the guest side at some point.
> > > > >
> > > > > Maybe is related to something else. I tried the following which
> > > > > should be
> > > > > your case simplified (IIUC):
> > > > >
> > > > > guest$ python
> > > > > import socket
> > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> > > > > s.bind((socket.VMADDR_CID_ANY, 1234))
> > > > >
> > > > > host$ python
> > > > > import socket
> > > > > s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> > > > > s.connect((3, 1234))
> > > > >
> > > > > Traceback (most recent call last):
> > > > > File "<stdin>", line 1, in <module>
> > > > > TimeoutError: [Errno 110] Connection timed out
> > > >
> > > > Yes this is exactly the simplified case. But that's the point, I
> > > > don't
> > > > think the timeout is the best way to go here. Because this means
> > > > that
> > > > when we run into this case, the host side will wait for quite
> > > > some time
> > > > before retrying, which can cause a very long delay before the
> > > > communication with the guest is established. By simply answering
> > > > the
> > > > host with a RST packet, we inform it that nobody's listening on
> > > > the
> > > > guest side yet, therefore the host side will close and try again.
> > >
> > > My expectation is that TCP/IP will produce ECONNREFUSED in this
> > > case but
> > > I haven't checked. Timing out is weird behavior.
> >
> > I just tried and yes, TCP/IP produces ECONNREFUSED. The same error
> > returned
> > when no one's bound to the port.
> >
> > Instead virtio-vsock returns ECONNRESET in the last case.
> > I'm not sure it's correct (looking at the man page connect(2) it
> > would
> > seem not), but if I understood correctly VMCI returns the same
> > ECONNRESET in this case.
> >
> > > In any case, the reference for virtio-vsock semantics is:
> > > 1. How does VMCI (VMware) vsock behave? We strive to be compatible
> > > with
> > > the VMCI transport.
> >
> > Looking at the code, it looks like VMCI returns ECONNRESET in this
> > case,
> > so this patch should be okay. (I haven't tried it)
> >
> > > 2. If there is no clear VMCI behavior, then we look at TCP/IP
> > > because
> > > those semantics are expected by most applications.
> > >
> > > This bug needs a test case in tools/testings/vsock/ and that test
> > > case
> > > will run against VMCI, virtio-vsock, and Hyper-V. Doing that will
> > > answer the question of how VMCI handles this case.
> >
> > I agree.
>
> I'm trying to write the test but I'm kinda stuck as I have no way to
Great!
> ensure the proper timing between the test on the server(guest) and the
> test on from the client(host).
>
> Basically I was thinking about creating a new test reusing
> test_stream_connection_reset() from vsock_test.c. The reused function
> would be used for the run_client callback, while I would define a new
> function for the run_server. I wanted to basically do a bind only, and
> don't go up to the listen/accept calls.
> Problem is, the server won't block after the bind is done, and I don't
> know how much the test should wait between the bind and the end.
> I would like to show that even when the server reaches the point where
> the socket is bound, the connect will still return with ECONNRESET. The
> same test without the patch would simply hang till we hit the timeout
> on the client side.
> Does that make sense? And how much time should I wait for after the
> bind on the server side?
You can use control_writeln() and control_expectln() from control.c to
syncronize server and client. The control path uses a TCP/IP socket.
You can do something like this:
server client
s = socket(); s = socket();
bind(s, ...);
control_writeln("BIND");
control_expectln("BIND");
connect(s, ...);
# check ret and errno
control_writeln("DONE");
control_expectln("DONE");
close(s); close(s);
Cheers,
Stefano
Powered by blists - more mailing lists