Date:   Tue, 12 Jan 2021 13:23:16 +0100
From:   Eric Dumazet <>
To:     Alexander Lobakin <>
Cc:     Edward Cree <>,
        "David S. Miller" <>,
        Jakub Kicinski <>,
        Edward Cree <>,
        Jonathan Lemon <>,
        Willem de Bruijn <>,
        Miaohe Lin <>,
        Steffen Klassert <>,
        Guillaume Nault <>,
        Yadu Kishore <>,
        Al Viro <>,
        netdev <>,
        LKML <>
Subject: Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing

On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <> wrote:
> From: Edward Cree <>
> Date: Tue, 12 Jan 2021 09:54:04 +0000
> > Without wishing to weigh in on whether this caching is a good idea...
> Well, we already have a cache to bulk flush "consumed" skbs, although
> kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> a page frag cache to allocate skb->head that is also bulking the
> operations, since it contains a (compound) page with the size of
> max(SZ_32K, PAGE_SIZE).
> If they didn't give any visible boost, I think they wouldn't have hit
> mainline.
> > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> >  caches, to have a single larger cache, such that whenever it becomes full
> >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> >  half?  That should mean fewer branches, fewer instructions etc. than
> >  having to decide which cache to act upon every time.
> I thought about a unified cache, but couldn't decide whether to flush
> or to allocate heads and how much to process. Your suggestion answers
> these questions and generally seems great. I'll try that one, thanks!
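
For reference, the unified cache suggested in the quoted text could be
sketched roughly as below. This is a user-space illustration only, not code
from the patch series: the names head_cache, cache_get(), cache_put(),
bulk_alloc() and bulk_free(), as well as CACHE_SIZE and OBJ_SIZE, are
hypothetical, and malloc()/free() stand in for the slab bulk calls that a
kernel version would presumably use (kmem_cache_alloc_bulk() and
kmem_cache_free_bulk()).

```c
#include <stdlib.h>

#define CACHE_SIZE 64                 /* hypothetical capacity */
#define CACHE_HALF (CACHE_SIZE / 2)
#define OBJ_SIZE   256                /* hypothetical object size */

/* One array used as a stack; count is the number of cached objects. */
struct head_cache {
	void *objs[CACHE_SIZE];
	unsigned int count;
};

/* Stand-in for a bulk allocator; returns how many objects it got. */
static unsigned int bulk_alloc(void **p, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		p[i] = malloc(OBJ_SIZE);
		if (!p[i])
			break;
	}
	return i;
}

/* Stand-in for a bulk free of n objects. */
static void bulk_free(void **p, unsigned int n)
{
	while (n--)
		free(p[n]);
}

/* Take one object; when the cache is empty, bulk-fill the bottom half. */
static void *cache_get(struct head_cache *c)
{
	if (c->count == 0) {
		c->count = bulk_alloc(c->objs, CACHE_HALF);
		if (c->count == 0)
			return NULL;
	}
	return c->objs[--c->count];
}

/* Return one object; when the cache is full, bulk-flush the top half. */
static void cache_put(struct head_cache *c, void *obj)
{
	if (c->count == CACHE_SIZE) {
		bulk_free(&c->objs[CACHE_HALF], CACHE_HALF);
		c->count = CACHE_HALF;
	}
	c->objs[c->count++] = obj;
}
```

In this sketch each operation tests a single condition (empty on get, full
on put), which is the "fewer branches" property the suggestion is after. A
real per-CPU version would also need to deal with context and locking,
which are left out here.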

The thing is: kmalloc() is supposed to have batches already, and nice
per-cpu caches.

This looks like an mm issue, are we sure we want to get over it?

I would like a full analysis of why SLAB/SLUB does not work well for
your test workload.

More details, more numbers.... before we accept yet another
'networking optimization' adding more code to the 'fast' path.

More code means more latencies when all code needs to be brought up in
cpu caches.
