Message-Id: <1563317552.qsi08y8lyr.astroid@bobo.none>
Date: Wed, 17 Jul 2019 09:07:09 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Alex Kogan <alex.kogan@...cle.com>
Cc: Arnd Bergmann <arnd@...db.de>, bp@...en8.de,
daniel.m.jordan@...cle.com, dave.dice@...cle.com,
guohanjun@...wei.com, hpa@...or.com, jglauber@...vell.com,
linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux@...linux.org.uk, linux-kernel@...r.kernel.org,
longman@...hat.com, mingo@...hat.com, peterz@...radead.org,
rahul.x.yadav@...cle.com, steven.sistare@...cle.com,
tglx@...utronix.de, will.deacon@....com, x86@...nel.org
Subject: Re: [PATCH v3 0/5] Add NUMA-awareness to qspinlock
Alex Kogan's on July 17, 2019 12:45 am:
>
>> On Jul 16, 2019, at 7:47 AM, Nicholas Piggin <npiggin@...il.com> wrote:
>>
>> Alex Kogan's on July 16, 2019 5:25 am:
>>> Our evaluation shows that CNA also improves performance of user
>>> applications that have hot pthread mutexes. Those mutexes are
>>> blocking, and waiting threads park and unpark via the futex
>>> mechanism in the kernel. Given that kernel futex chains, which
>>> are hashed by the mutex address, are each protected by a
>>> chain-specific spin lock, the contention on a user-mode mutex
>>> translates into contention on a kernel level spinlock.
>>
>> What applications are those, what performance numbers? Arguably that's
>> much more interesting than microbenchmarks (which are mainly useful to
>> help ensure the fast paths are not impacted IMO).
>
> Those are applications that use locks in which waiting threads can park (block),
> e.g., pthread mutexes. Under (user-level) contention, the park-unpark mechanism
> in the kernel creates contention on (kernel) spin locks protecting futex chains.
> As an example, we experimented with LevelDB (key-value store), and included
> performance numbers in the patch. Or you are looking for something else?
Oh, no, that's good. I confused myself into thinking that was another
will-it-scale benchmark. The speedup becomes significant on readrandom.
I wonder if it might be that you're gating which threads get to complete
the futex operation, so the effect is amplified beyond just the critical
section of the spin lock?

Am I reading the table correctly: does this test get about a 2.1x speedup
when scaling from 1 to 142 threads in the patch-CNA case?
Thanks,
Nick