Message-ID: <20190716110427.GP3419@hirez.programming.kicks-ass.net>
Date: Tue, 16 Jul 2019 13:04:27 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <longman@...hat.com>
Cc: Alex Kogan <alex.kogan@...cle.com>, linux@...linux.org.uk,
mingo@...hat.com, will.deacon@....com, arnd@...db.de,
linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, tglx@...utronix.de, bp@...en8.de,
hpa@...or.com, x86@...nel.org, guohanjun@...wei.com,
jglauber@...vell.com, steven.sistare@...cle.com,
daniel.m.jordan@...cle.com, dave.dice@...cle.com,
rahul.x.yadav@...cle.com
Subject: Re: [PATCH v3 3/5] locking/qspinlock: Introduce CNA into the slow
path of qspinlock
On Mon, Jul 15, 2019 at 05:30:01PM -0400, Waiman Long wrote:
> On 7/15/19 3:25 PM, Alex Kogan wrote:
> > /*
> > - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
> > - * size and four of them will fit nicely in one 64-byte cacheline. For
> > - * pvqspinlock, however, we need more space for extra data. To accommodate
> > - * that, we insert two more long words to pad it up to 32 bytes. IOW, only
> > - * two of them can fit in a cacheline in this case. That is OK as it is rare
> > - * to have more than 2 levels of slowpath nesting in actual use. We don't
> > - * want to penalize pvqspinlocks to optimize for a rare case in native
> > - * qspinlocks.
> > + * On 64-bit architectures, the mcs_spinlock structure will be 20 bytes in
> > + * size. For pvqspinlock or the NUMA-aware variant, however, we need more
> > + * space for extra data. To accommodate that, we insert two more long words
> > + * to pad it up to 36 bytes.
> > */
> The 20-byte figure is wrong. It is actually 24 bytes on 64-bit, because
> the mcs_spinlock structure is 8-byte aligned. For better cacheline
> alignment, I would like to keep mcs_spinlock at 16 bytes as before.
> Instead, you can use encode_tail() to store the CNA node pointer in
> "locked". For instance, use (encode_tail() << 1) in locked to
> distinguish it from the regular locked=1 value.
Yes, please don't bloat this. I already don't like what Waiman did for
the paravirt case, but this is horrible.