netdev - Re: [PATCH net-next 02/13] selftests: ncdevmem: Remove validation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izM1SNJtusPScJyzrP+Afbo1G2Xdr4-KbLeyuA6P62UMqw@mail.gmail.com>
Date: Fri, 13 Sep 2024 16:42:04 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com
Subject: Re: [PATCH net-next 02/13] selftests: ncdevmem: Remove validation

On Fri, Sep 13, 2024 at 10:15 AM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 09/13, Mina Almasry wrote:
> > On Thu, Sep 12, 2024 at 2:57 PM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > >
> > > On 09/12, Mina Almasry wrote:
> > > > On Thu, Sep 12, 2024 at 10:12 AM Stanislav Fomichev <sdf@...ichev.me> wrote:
> > > > >
> > > > > ncdevmem should (see next patches) print the payload on the stdout.
> > > > > The validation can and should be done by the callers:
> > > > >
> > > > > $ ncdevmem -l ... > file
> > > > > $ sha256sum file
> > > > >
> > > > > Cc: Mina Almasry <almasrymina@...gle.com>
> > > > > Signed-off-by: Stanislav Fomichev <sdf@...ichev.me>
> > > > > ---
> > > > >  tools/testing/selftests/net/ncdevmem.c | 56 +++-----------------------
> > > > >  1 file changed, 6 insertions(+), 50 deletions(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/net/ncdevmem.c b/tools/testing/selftests/net/ncdevmem.c
> > > > > index 352dba211fb0..3712296d997b 100644
> > > > > --- a/tools/testing/selftests/net/ncdevmem.c
> > > > > +++ b/tools/testing/selftests/net/ncdevmem.c
> > > > > @@ -64,24 +64,13 @@
> > > > >  static char *server_ip = "192.168.1.4";
> > > > >  static char *client_ip = "192.168.1.2";
> > > > >  static char *port = "5201";
> > > > > -static size_t do_validation;
> > > > >  static int start_queue = 8;
> > > > >  static int num_queues = 8;
> > > > >  static char *ifname = "eth1";
> > > > >  static unsigned int ifindex;
> > > > >  static unsigned int dmabuf_id;
> > > > >
> > > > > -void print_bytes(void *ptr, size_t size)
> > > > > -{
> > > > > -       unsigned char *p = ptr;
> > > > > -       int i;
> > > > > -
> > > > > -       for (i = 0; i < size; i++)
> > > > > -               printf("%02hhX ", p[i]);
> > > > > -       printf("\n");
> > > > > -}
> > > > > -
> > > > > -void print_nonzero_bytes(void *ptr, size_t size)
> > > > > +static void print_nonzero_bytes(void *ptr, size_t size)
> > > > >  {
> > > > >         unsigned char *p = ptr;
> > > > >         unsigned int i;
> > > > > @@ -91,30 +80,6 @@ void print_nonzero_bytes(void *ptr, size_t size)
> > > > >         printf("\n");
> > > > >  }
> > > > >
> > > > > -void validate_buffer(void *line, size_t size)
> > > > > -{
> > > > > -       static unsigned char seed = 1;
> > > > > -       unsigned char *ptr = line;
> > > > > -       int errors = 0;
> > > > > -       size_t i;
> > > > > -
> > > > > -       for (i = 0; i < size; i++) {
> > > > > -               if (ptr[i] != seed) {
> > > > > -                       fprintf(stderr,
> > > > > -                               "Failed validation: expected=%u, actual=%u, index=%lu\n",
> > > > > -                               seed, ptr[i], i);
> > > > > -                       errors++;
> > > >
> > > > FWIW the index at where the validation started to fail often gives
> > > > critical clues about where the bug is, along with this line, which I'm
> > > > glad is not removed:
> > > >
> > > > printf("received frag_page=%llu, in_page_offset=%llu,
> > > > frag_offset=%llu, frag_size=%u, token=%u, total_received=%lu,
> > > > dmabuf_id=%u\n",
> > > >
> > > > I think we can ensure that what is doing the validation above ncdevmem
> > > > prints enough context about the error. Although, just to understand
> > > > your thinking a bit, why not have this binary do the validation
> > > > itself?
> > >
> > > Right, the debugging stuff will still be there to track the offending
> > > part. And the caller can print out the idx of data where the validation
> > > failed.
> > >
> >
> > Sorry to harp on this, but on second thought I don't think just
> > printing out the idx is good enough. In many cases all the context
> > printed by ncdevmem validation (page_frag/offset/dmabuf_id/etc) is
> > useful, and it's useful to have it inline with where the check failed.
> >
> > IIUC after your changes the frag_page/offset/dmabuf_id will go to
> > stderr output of ncdevmem, but the validation check fail will go to a
> > different log by the parent checker. Matching the failure in the 2
> > logs in megs of frag output will be annoying.
>
> Yes, it will definitely be more annoying to piece those two things
> together. But I don't expect us to debug the payload validation issues
> from the nipa dashboard logs. Even if you get a clear message of
> "byte at position X is not expected" plus all the chunk info logs,
> what do you really do with this info (especially if it's flaky)?
>

Oh, they provide important clues, even for flakes. See an example of
Taehee using these clues to arrive at a root cause here:
https://lore.kernel.org/netdev/CAMArcTUXm13xJO9XqcT=0uQAn_ZQOQ=Y49EPpHqV+jkkhihMcw@mail.gmail.com/

> For development we can have some script to put those two things together
> for debugging.
>
> > > The reason I removed it from the tool is to be able to do the validation
> > > with regular nc on the other side. I just do:
> > >
> >
> > To be honest I don't think the ncdevmem validation gets in the way of that.
> >
> > To test rx, we can set up regular nc on the client side, and have
> > ncdevmem do validation on rx.
> > To test tx, we can have ncdevmem do tx (no validation), and have a
> > script on top of nc do validation on the rx side.
> >
> > I guess you may find the lack of symmetry annoying, but IMO it's less
> > annoying than losing some debuggability when the test fails.
> >
> > I think probably there are 3 hopefully agreeable things we can do here:
> >
> > 1. Keep the validation as is and do the rx/tx test as I described above.
> > 2. keep the ncdevmem validation, dump it to stderr, and ignore it in
> > the test. Leave it be for folks wanting to run the test manually to
> > debug it.
> > 3. Move the validation to a parent process of ncdevmem, but that
> > parent process needs to print full context of the failure in 1 log.
> >
> > I prefer #1 TBH.
>
> TBH, I find the existing ncdevmem validation scheme a bit lacking. It is
> basically the same payload over and over (or maybe I misread how it
> works). Maybe we should implement a PRNG-like sequence for validation
> if you prefer to keep it internal?
>

It's just a repeating sequence that's easy to implement. Technically
it would not detect if we drop an N iterations in the sequence, but in
practice it's hard to be that unlucky repeatedly, especially if the
number of repeating bytes is not a power of 2. Improving it to a more
robust sequence sounds fine to me.

> How about we start with what I have (simple 'hello\nworld') and once
> you do tx path, we can add the internal validation back? Both sides,
> with a proper selftest this time. For the time being, we can use the existing
> rx selftest as a smoke test.
>
> Or I can just drop this patch from the series and let you follow up
> on the validation in the selftests (i.e., convert from 'hello\nworld'
> to whatever you prefer)

To be honest the validation has been too important in the testing so
far in root causing issues that I would not like it temporarily
removed. Dropping this patch and letting me add validation to NIPA
later on sounds good to me.

-- 
Thanks,
Mina