netdev - Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN


Message-ID: <CAEf4BzbJENxR6nrL47tKa+mL8Cxf7JtTjkX7ysBSE0iYB0Ey5Q@mail.gmail.com>
Date:   Fri, 13 Mar 2020 11:30:58 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Jakub Sitnicki <jakub@...udflare.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        kernel-team@...udflare.com
Subject: Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept
 due to EAGAIN

On Fri, Mar 13, 2020 at 9:42 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>
> On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> > Thanks for looking into this. Can you please verify that the test
> > fails cleanly (rather than hanging) when, say, the network is down
> > (run `ip link set lo down` before running the test)? The reason I'm
> > asking is that I just fixed a problem in the tcp_rtt selftest, in
> > which accept() would block forever, even if the listening socket was
> > closed.
>
> While on the topic of writing network tests with test_progs:
>
> There are a couple of pain points because all tests run as one process:
>
> 1) resource cleanup on failure
>
>    Tests can't simply exit(), abort(), or error() on failure. Instead
>    they need to clean up all resources, like opened file descriptors and
>    memory allocations, and propagate the error up to the main test
>    function so it can return to the test runner.
>
> 2) terminating in timely fashion
>
>    We don't have the option of simply setting alarm() to terminate after
>    a reasonable timeout, so we have to worry about I/O syscalls in
>    blocking mode getting stuck.
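A minimal sketch of how a single in-process test could respect both constraints. The hypothetical helper accept_timeout() waits on the listening socket with poll() and fails with ETIMEDOUT instead of blocking forever, and the hypothetical test body run_accept_test() routes every error path through one cleanup label, so no descriptor leaks into later tests sharing the process. Neither name is an existing test_progs or network_helpers API, and the loopback listener on an ephemeral port is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: wait up to timeout_ms for a queued connection,
 * then accept it. Returns the new fd, or -1 with errno set
 * (ETIMEDOUT if nothing arrived in time). */
static int accept_timeout(int lfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = lfd, .events = POLLIN };
	int n;

	/* A negative timeout would make poll() wait forever. */
	assert(timeout_ms >= 0);

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return -1;		/* errno set by poll() */
	if (n == 0) {
		errno = ETIMEDOUT;	/* report a timeout, don't hang */
		return -1;
	}
	return accept(lfd, NULL, NULL);
}

/* Hypothetical test body: returns 0 on pass, -1 on failure, and
 * releases every resource it acquired on all paths. */
static int run_accept_test(int timeout_ms)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
		.sin_port = 0,		/* let the kernel pick a free port */
	};
	int lfd, cfd = -1, err = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0) {
		perror("socket");
		return -1;
	}
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, 1)) {
		perror("bind/listen");
		goto out;
	}

	/* Nobody connects, so this must time out rather than block. */
	cfd = accept_timeout(lfd, timeout_ms);
	if (cfd >= 0) {
		fprintf(stderr, "accept unexpectedly succeeded\n");
		goto out;
	}
	if (errno != ETIMEDOUT) {
		fprintf(stderr, "accept failed: %s\n", strerror(errno));
		goto out;
	}
	err = 0;
out:
	/* Single exit path: close whatever was opened, then propagate. */
	if (cfd >= 0)
		close(cfd);
	close(lfd);
	return err;
}
```

Because the listening socket stays in blocking mode and accept() is only called after poll() reports a pending connection, the common case avoids the EAGAIN that accept() returns on a non-blocking socket or when an SO_RCVTIMEO timeout expires.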

I agree, those APIs suck, unfortunately.

>
> Careful error and timeout handling makes test code more complicated than
> it really needs to be, IMHO, and makes tests harder to write as well as
> to maintain.

Well, I think it's actually a good thing. Tests are as important as
features, if not more, so it pays to invest in having reliable tests.

>
> What if we extended test_progs runner to support process-per-test
> execution model? Perhaps as an opt-in for selected tests.
>
> Is that in line with the plans/vision for BPF selftests?

It would be nice indeed, though I'd still maintain that tests
shouldn't be sloppy. But having that would allow parallelizing tests,
which would be awesome. So yeah, it would be good to have, IMO.
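A rough sketch of what the process-per-test model discussed above might look like, assuming each test is a plain function returning 0 on success. Each test runs in a forked child that arms alarm() as a watchdog, so a test may abort() on failure or get stuck in a blocking syscall without taking down the runner. run_isolated() and the three demo tests are hypothetical; this is not how test_progs is structured today.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test signature: return 0 on success. */
typedef int (*test_fn)(void);

/* Run one test in its own process with a timeout_s watchdog.
 * Returns 0 on pass, 1 on failure, crash, or timeout. */
static int run_isolated(const char *name, test_fn fn, unsigned int timeout_s)
{
	int status;
	pid_t pid;

	/* alarm(0) would cancel the watchdog instead of arming it. */
	assert(timeout_s > 0);

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		/* Child: SIGALRM's default action kills the process, so a
		 * stuck blocking syscall ends when the timer fires. _exit()
		 * avoids flushing stdio buffers inherited from the parent. */
		alarm(timeout_s);
		_exit(fn() == 0 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return 1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	if (WIFSIGNALED(status))
		fprintf(stderr, "%s: killed by signal %d\n", name,
			WTERMSIG(status));
	else
		fprintf(stderr, "%s: exited with status %d\n", name,
			WEXITSTATUS(status));
	return 1;
}

/* Demo tests (hypothetical). */
static int test_pass(void) { return 0; }

/* May bail out without cleanup; only the child process dies. */
static int test_abort(void) { abort(); }

/* Blocks until a signal arrives; the child's alarm() ends it. */
static int test_hang(void) { pause(); return 0; }
```

Since every test has its own process, a runner built this way could also start several children at once and reap them with waitpid(-1, ...), which is what would make running tests in parallel possible.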