netdev - Re: [patch 0/3] net, refcount: Address dst_entry reference count scalability issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230228191705.3bc8bed6@kernel.org>
Date:   Tue, 28 Feb 2023 19:17:05 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...uxfoundation.org>, x86@...nel.org,
        Wangyang Guo <wangyang.guo@...el.com>,
        Arjan van De Ven <arjan@...ux.intel.com>,
        "David S. Miller" <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Mark Rutland <mark.rutland@....com>,
        Marc Zyngier <maz@...nel.org>
Subject: Re: [patch 0/3] net, refcount: Address dst_entry reference count
 scalability issues

FWIW looks good to me, especially the refcount part.
We do see 10% of jitter in microbenchmarks due to random cache
effects, so forgive the questioning. But again, the refcount seems 
like an obvious win to my noob eyes.

While I have you it would be remiss of me not to mention my ksoftirq
change which makes a large difference in production workloads:
https://lore.kernel.org/all/20221222221244.1290833-3-kuba@kernel.org/
Is Peter's "rework" of softirq going in for 6.3?

On Wed, 01 Mar 2023 02:00:20 +0100 Thomas Gleixner wrote:
> >> We looked at this because the reference count operations stood out in
> >> perf top and we analyzed it down to the false sharing _and_ the
> >> non-scalability of atomic_inc_not_zero().
> >
> > Please share your recipe and perf results.  
> 
> Sorry for being not explicit enough about this, but I was under the
> impression that explicitely mentioning memcached and memtier would be
> enough of a hint for people famiiar with this matter.

I think the disconnect may be that we are indeed familiar with 
the workloads, but these exact workloads don't hit the issue
in production (I don't work at Google but a similarly large corp).
My initial reaction was also to see if I can find the issue in prod.
Not to question but in hope that I can indeed find a repro, and make
this series an easy sell.

> Run memcached with -t $N and memtier_benchmark with -t $M and
> --ratio=1:100 on the same machine. localhost connections obviously
> amplify the problem,
> 
> Start with the defaults for $N and $M and increase them. Depending on
> your machine this will tank at some point. But even in reasonably small
> $N, $M scenarios the refcount operations and the resulting false sharing
> fallout becomes visible in perf top. At some point it becomes the
> dominating issue while the machine still has capacity...
> 
> > We must have been very lucky to not see this at Google.  
> 
> There _is_ a world outside of Google? :)
> 
> Seriously. The point is that even if you @google cannot obverse this as
> a major issue and it just gives your usecase a minimal 0.X gain, it
> still is contributing to the overall performance, no?