Message-ID: <CAKgT0UeV9+=AcQ1J+UA=KGWKAV2E4CW566qYHNv_XxQMC3Us-Q@mail.gmail.com>
Date:   Thu, 8 Sep 2022 12:26:21 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Alexander Duyck <alexanderduyck@...com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs

On Thu, Sep 8, 2022 at 11:01 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Thu, 2022-09-08 at 07:53 -0700, Alexander H Duyck wrote:
> > On Thu, 2022-09-08 at 13:00 +0200, Paolo Abeni wrote:
> > > In most builds GRO_MAX_HEAD packets are even larger (should be 640)
> >
> > Right, which is why I am thinking we may want to default to a 1K slice.
>
> Ok it looks like there is agreement to force a minimum frag size of 1K.
> Side note: that should not cause a memory usage increase compared to
> the slab allocator as kmalloc(640) should use the kmalloc-1k slab.
>
> [...]
>
> > > >
> > > If the pagecnt optimization should be dropped, it would be probably
> > > more straight-forward to use/adapt 'page_frag' for the page_order0
> > > allocator.
> >
> > That would make sense. Basically we could get rid of the pagecnt bias
> > and add the fixed number of slices to the count at allocation so we
> > would just need to track the offset to decide when we need to allocate
> > a new page. In addition if we are flushing the page when it is depleted
> > we don't have to mess with the pfmemalloc logic.
>
> Uhmm... it looks like the existing page_frag allocator does not
> always flush the depleted page:
>
> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> {
>         if (pfrag->page) {
>                 if (page_ref_count(pfrag->page) == 1) {
>                         pfrag->offset = 0;
>                         return true;
>                 }

Right, we have the option to reuse the page if the page count is 1.
However, in the 4K page with 1K slices scenario it means you have to
bump the count back up every 3 frags, so you would be looking at
roughly 1.33 atomic accesses per frag. Just doing the bump once at the
start and using all 4 slices would give you 1.25 atomic accesses per
frag. That is why I assumed it would be better to just let it go.
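
As a rough check of those figures, here is a small sketch. It assumes an
accounting that is not spelled out in this thread: one atomic add bumps
the page count for a whole batch of frags, and each frag then costs one
atomic decrement when it is freed. The helper names are made up for
illustration and do not exist in the kernel.

```c
#include <assert.h>

/*
 * Assumed cost model (an illustration, not kernel code):
 *   - one atomic add per batch of frags_per_bump frags
 *   - one atomic decrement per frag when it is released
 */
static unsigned int total_atomics(unsigned int nfrags,
				  unsigned int frags_per_bump)
{
	unsigned int bumps;

	assert(frags_per_bump > 0);
	/* Round up: a partial batch still needs its own bump. */
	bumps = (nfrags + frags_per_bump - 1) / frags_per_bump;
	return bumps + nfrags;
}

/* Steady-state atomic operations per frag under the same model. */
static double atomics_per_frag(unsigned int frags_per_bump)
{
	assert(frags_per_bump > 0);
	return (1.0 + frags_per_bump) / frags_per_bump;
}
```

Under that model atomics_per_frag(3) is about 1.33 (recycling, one bump
per 3 frags) and atomics_per_frag(4) is 1.25 (bump once, use all 4
slices, then drop the page).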

> so I'll try adding some separate/specialized code and see if the
> overall complexity would be reasonable.

The other thing to keep in mind is that once you start adding
recycling you will have best case and worst case scenarios to
consider. The code above seems to either recycle the page in place or
allocate a new one in its place.
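
For contrast with recycling, the non-recycling variant discussed earlier
in the thread (set the count for all slices once, track only the offset,
forget the page when it is depleted) could look roughly like the
userspace sketch below. Everything prefixed toy_/TOY_ is hypothetical:
malloc() stands in for the page allocator, a C11 atomic stands in for
the page reference count, and the caller is handed the owning page
explicitly instead of looking it up from the address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_SLICE_SIZE 1024u
#define TOY_SLICES     (TOY_PAGE_SIZE / TOY_SLICE_SIZE)

/* Hypothetical stand-in for a page and its reference count. */
struct toy_page {
	atomic_uint refcount;
	unsigned char data[TOY_PAGE_SIZE];
};

/* Allocator state: just the current page and the next free offset. */
struct toy_slice_cache {
	struct toy_page *page;
	unsigned int offset;
};

/* Hand out one slice; *owner is the page to pass to toy_slice_free(). */
static void *toy_slice_alloc(struct toy_slice_cache *c,
			     struct toy_page **owner)
{
	struct toy_page *p = c->page;
	void *slice;

	if (!p) {
		p = malloc(sizeof(*p));
		if (!p)
			return NULL;
		/* One reference per slice, set once; no bias, no per-frag bump. */
		atomic_init(&p->refcount, TOY_SLICES);
		c->page = p;
		c->offset = 0;
	}

	slice = p->data + c->offset;
	c->offset += TOY_SLICE_SIZE;
	if (c->offset == TOY_PAGE_SIZE)
		c->page = NULL;	/* depleted: forget the page, never reuse it */

	*owner = p;
	return slice;
}

/* Drop one slice reference; returns 1 if this freed the page. */
static int toy_slice_free(struct toy_page *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1) {
		free(p);
		return 1;
	}
	return 0;
}
```

Because the page is dropped as soon as the last slice is handed out,
there is no reuse check and no best/worst case to reason about. The
sketch leaves out teardown: a cache destroyed with a partially used page
would still have to drop the references for the slices it never handed
out.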

> > > BTW it's quite strange/confusing having two very similar APIs (page_frag
> > > and page_frag_cache) with very similar names and no references between
> > > them.
> >
> > I'm not sure what you are getting at here. There are plenty of
> > references between them, they just aren't direct.
>
> Looking/grepping the tree I could not trivially understand when 'struct
> page_frag' should be preferred over 'struct page_frag_cache' and/or
> vice versa, I had to look at the respective implementation details.

The page_frag_cache is mostly there to store a higher order page to
slice up to generate page fragments that can be stored in the
page_frag struct. Honestly I am surprised we still have page_frag
floating around. I thought we replaced that with bio_vec some time
ago. At least that is the structure that skb_frag_t is typedef-ed as.
