[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzY1v=0fri=5jpk4yXLTbNhmAMxCpL+5EH9PKKBRtM0YTg@mail.gmail.com>
Date: Fri, 21 May 2021 17:12:33 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: John Fastabend <john.fastabend@...il.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
On Fri, May 21, 2021 at 12:31 AM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> On Thu, May 20, 2021 at 10:47 AM John Fastabend
> <john.fastabend@...il.com> wrote:
> >
> > Cong Wang wrote:
> > > On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@...il.com> wrote:
> > > >
> > > > Cong Wang wrote:
> > > > > From: Cong Wang <cong.wang@...edance.com>
> > > > >
> > > > > We use non-blocking sockets for testing sockmap redirections,
> > > > > and got some random EAGAIN errors from UDP tests.
> > > > >
> > > > > There is no guarantee the packet would be immediately available
> > > > > to receive as soon as it is sent out, even on the local host.
> > > > > For UDP, this is especially true because it does not lock the
> > > > > sock during BH (unlike the TCP path). This is probably why we
> > > > > only saw this error in UDP cases.
> > > > >
> > > > > No matter how hard we try to make the queue empty check accurate,
> > > > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > > > Therefore, we should just retry in case of EAGAIN.
> > > > >
> > > > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > > > Reported-by: Jiang Wang <jiang.wang@...edance.com>
> > > > > Cc: John Fastabend <john.fastabend@...il.com>
> > > > > Cc: Daniel Borkmann <daniel@...earbox.net>
> > > > > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > > > > Cc: Lorenz Bauer <lmb@...udflare.com>
> > > > > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > > > > ---
> > > > > tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > > > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > index 648d9ae898d2..b1ed182c4720 100644
> > > > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > > > > if (pass != 1)
> > > > > FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > > > >
> > > > > +again:
> > > > > n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > > > - if (n < 0)
> > > > > + if (n < 0) {
> > > > > + if (errno == EAGAIN)
> > > > > + goto again;
> > > > > FAIL_ERRNO("%s: read", log_prefix);
> > > >
> > > > Needs a counter and abort logic we don't want to loop forever in the
> > > > case the packet is lost.
> > >
> > > It should not be lost because selftests must be self-contained,
> > > if the selftests could not even predict whether its own packet is
> > > lost or not, we would have a much bigger trouble than just this
> > > infinite loop.
> > >
> > > Thanks.
> >
> > Add the retry counter its maybe 4 lines of code. When I run this in a container
>
> Sure, then the next question is how many times are you going to retry?
> Let's say, 10. Then the next question would be is 10 sufficient? Clearly
> not, because if the packet can be dropped (let's say by firewall), it can
> be delayed too (let's say by netem).
10 is fine. If something unexpected happens (whatever hostility of the
environment), getting stuck is much-much worse than erroring out. So
please add the counter and be done with it.
>
> Really, are we going to handle all of such cases? Or if we simply
> assume the environment is self-contained and not hostile, none of
> the above would happen hence nothing needs to be checked.
We have many different environments in which selftests are running. We
shouldn't have an infinite loop in any of them, even if some selftests
can't succeed in some of them. Not in all environments it is possible
to do Ctrl-C (e.g., CIs).
>
> > and my memcg kicks in for some unknown reason I don't want it to loop
> > forever I want it to fail gracefully. Plus it just looks bad to encode
> > a non-terminating loop, in my opinion, so I want the counter.
>
> There could be hundreds of reasons to drop a packet, or delay a
> packet. How many of them do you want to consider and how many
> of them do you not consider? Please draw a boundary.
>
> The boundary I drew is very clear: we just assume the selftests
See above, the minimum bar is that selftest shouldn't get stuck, no matter what.
> environment is not hostile. In fact, I believe selftests should setup
> such an environment before running any test, for example, by creating
> a container. And I believe this would make everyone happy.
>
> Thanks.
Powered by blists - more mailing lists