Message-ID: <CAHApi-kzaJxQTRgZqYmMSWYa6CW6b0U6x9Sdpk_Kt=fd2hPCjA@mail.gmail.com>
Date: Wed, 26 Apr 2023 15:02:17 +0200
From: Kal Cutter Conley <kal.conley@...tris.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
netdev@...r.kernel.org, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] xsk: Use pool->dma_pages to check for DMA
> > > Was it noticeable in some sort of performance test?
> >
> > This patch is part of the patchset found at
> > https://lore.kernel.org/all/20230412162114.19389-3-kal.conley@dectris.com/
> > which is being actively discussed and needs to be resubmitted anyway
> > because of a conflict. While the discussion continues, I am submitting
> > this patch by itself because I think it's an improvement on its own
> > (regardless of what happens with the rest of the linked patchset). On
> > one system, I measured a performance regression of 2-3% with xdpsock
> > and the linked changes without the current patch. With the current
> > patch, the performance regression was no longer observed.
>
> Would be nice to have that in the commit message so the reader has an
> idea the perf numbers are in fact better.
When I measured this patch by itself (on bpf-next), I didn't measure
any statistically significant performance gains. However, it did allow
me to avoid a regression when combined with the other linked patch (as
mentioned). I don't know if it makes sense to mention that other
change, which isn't applied to any tree yet. I was mainly submitting
this patch on the grounds that the code is better on its own, not
contingent on any provable performance gains.
>
> >
> > > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > > > index d318c769b445..a8d7b8a3688a 100644
> > > > --- a/include/net/xsk_buff_pool.h
> > > > +++ b/include/net/xsk_buff_pool.h
> > > > @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
> > > > if (likely(!cross_pg))
> > > > return false;
> > > >
> > > > - return pool->dma_pages_cnt &&
> > > > + return pool->dma_pages &&
> > > > !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
> > > > }
> >
> > I would consider the above code part of the "fast path". It may be
> > executed approximately once per frame in unaligned mode.
>
> Only in the unlikely case though, as I read it. So it really shouldn't
> be called for every packet, or we have other perf issues given the
> likely() there.
>
> I assume the above is where the perf is being gained, because the
> two things below are in setup/tear-down. But then we are benchmarking
> an unlikely() path?
I was testing with large chunk sizes in unaligned mode (4000-4096
bytes) with ZC. For chunk sizes nearly as large as PAGE_SIZE, the
unlikely path is actually the main path. For example:
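
Here is a quick standalone sketch (not kernel code; it just mirrors what
I understand the cross_pg page-crossing condition to be) showing that
with a ~4000-byte chunk size, almost every descriptor straddles a page
boundary, so the branch behind likely(!cross_pg) is taken for nearly
every frame:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* Assumed to mirror the cross_pg check in xp_desc_crosses_non_contig_pg(). */
static bool crosses_page(uint64_t addr, uint32_t len)
{
	return (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE;
}

int main(void)
{
	const uint32_t len = 4000;	/* chunk size from the benchmark */
	unsigned int crossing = 0, total = 0;
	uint64_t addr;

	/* Descriptors packed back to back, as in unaligned mode. */
	for (addr = 0; addr + len <= (1u << 20); addr += len, total++)
		crossing += crosses_page(addr, len);

	printf("%u of %u descriptors cross a page boundary\n", crossing, total);
	return 0;
}

With these numbers, over 95% of descriptors cross a page boundary, which
is why this check ends up on the hot path for chunk sizes close to
PAGE_SIZE.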
> >
> > > This seems to be used in the setup/tear-down paths so you're optimizing
> > > a control side. Is there a fast path with this code? I walked the
> > > ice driver. If it's just setup code we should do whatever is more
> > > readable.
> >
> > It is not only used in setup/tear-down paths (see above).
> > Additionally, I believe the code is also _more_ readable with this
> > patch applied. In particular, this patch reduces cognitive complexity
> > since people (and compilers) reading the code don't need to
> > additionally think about pool->dma_pages_cnt.
> >
> > > Both the _alloc_ cases read neighboring free_heads_cnt so you're saving a load I guess?
> > > This is so deep into micro-optimizing I'm curious if you could measure it?
> >
> > It is saving a load, which also reduces code size. This will affect
> > other decisions such as what to inline. Also, in the linked patchset,
> > dma_pages and dma_pages_cnt do not share a cache line (on x86_64).
>
> But again, it's buried in an unlikely path. Sure, but removing the
> conditional altogether would be even better.
Yeah, I think that is another improvement to consider.
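
Just to sketch what that could look like (an assumption on my part, not
part of this patch: it supposes the pool could guarantee dma_pages is
always populated before ZC descriptors reach this helper, and it reuses
the existing cross_pg computation):

static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
						 u64 addr, u32 len)
{
	bool cross_pg = (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE;

	if (likely(!cross_pg))
		return false;

	/* Assumes pool->dma_pages is guaranteed non-NULL on this path. */
	return !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
}

That would shrink the helper further, but it depends on making the setup
path enforce that invariant, so it belongs in a follow-up.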
> So my understanding is ZC is the preferred and default mode and copy
> modes are primarily fallback modes. So we are punishing the good case
> here for a fallback to copy mode. I think overall refactoring the code
> to avoid burdening the fast case with a fallback slow case would be the
> ideal solution.
I agree that ZC is preferred and this patch is aimed at improving the
ZC path. The performance gain I observed was for ZC.
> However, I agree that just on readability the patch is fine and good.
> No objection on my side. But if we are making performance arguments
> for 2-3% here, I think the better thing to do is remove the check and
> the unlikely(); then we would see better benchmarks when using ZC
> mode, which as I understand it is what performance-aware folks should
> be doing.
I totally agree that other, better improvements exist, but I don't think
they make this patch any less desirable. This change is only meant as
a small incremental improvement.