lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHApi-=Vr4VARgoDNB1T906gfDNB5L5_U24zE=ZHQi+qd__e8w@mail.gmail.com>
Date:   Tue, 25 Apr 2023 00:52:53 +0200
From:   Kal Cutter Conley <kal.conley@...tris.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        netdev@...r.kernel.org, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] xsk: Use pool->dma_pages to check for DMA

> > Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an
> > active DMA mapping. pool->dma_pages needs to be read anyway to access
> > the map so this compiles to more efficient code.
>
> Was it noticable in some sort of performance test?

This patch is part of the patchset found at
https://lore.kernel.org/all/20230412162114.19389-3-kal.conley@dectris.com/
which is being actively discussed and needs to be resubmitted anyway
because of a conflict. While the discussion continues, I am submitting
this patch by itself because I think it's an improvement on its own
(regardless of what happens with the rest of the linked patchset). On
one system, I measured a performance regression of 2-3% with xdpsock
and the linked changes without the current patch. With the current
patch, the performance regression was no longer observed.

> > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > index d318c769b445..a8d7b8a3688a 100644
> > --- a/include/net/xsk_buff_pool.h
> > +++ b/include/net/xsk_buff_pool.h
> > @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
> >       if (likely(!cross_pg))
> >               return false;
> >
> > -     return pool->dma_pages_cnt &&
> > +     return pool->dma_pages &&
> >              !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
> >  }

I would consider the above code part of the "fast path". It may be
executed approximately once per frame in unaligned mode.

> This seems to be used in the setup/tear-down paths so your optimizing
> a control side. Is there a fast path with this code? I walked the
> ice driver. If its just setup code we should do whatever is more
> readable.

It is not only used in setup/tear-down paths (see above).
Additionally, I believe the code is also _more_ readable with this
patch applied. In particular, this patch reduces cognitive complexity
since people (and compilers) reading the code don't need to
additionally think about pool->dma_pages_cnt.

> Both the _alloc_ cases read neighboring free_heads_cnt so your saving a load I guess?
> This is so deep into micro-optimizing I'm curious if you could measure it?

It is saving a load which also reduces code size. This will affect
other decisions such as what to inline. Also in the linked patchset,
dma_pages and dma_pages_cnt do not share a cache line (on x86_64).

>
> >               } else {
> >                       xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];
>
> I'm not actually against optimizing but maybe another idea. Why do we have to
> check at all? Seems if the DMA has been disabled/unmapped the driver shouldn't
> be trying to call xsk_buff_alloc_batch? Then you can just drop the 'if' check.
>
> It feels to me the drivers shouldn't even be calling this after unmapping
> the dma. WDYT?

Many of these code paths are used both for ZC and copy modes. You
might be right that this particular case is only used with DMA.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ