Message-ID: <YGNNCEAMSWbBU+hd@hirez.programming.kicks-ass.net>
Date: Tue, 30 Mar 2021 18:08:40 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Guo Ren <guoren@...nel.org>
Cc: linux-riscv <linux-riscv@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-csky@...r.kernel.org,
linux-arch <linux-arch@...r.kernel.org>,
Guo Ren <guoren@...ux.alibaba.com>,
Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Waiman Long <longman@...hat.com>,
Arnd Bergmann <arnd@...db.de>, Anup Patel <anup@...infault.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
On Tue, Mar 30, 2021 at 11:13:55AM +0800, Guo Ren wrote:
> On Mon, Mar 29, 2021 at 8:50 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Mon, Mar 29, 2021 at 08:01:41PM +0800, Guo Ren wrote:
> > > u32 a = 0x55aa66bb;
> > > u16 *ptr = &a;
> > >
> > >      CPU0                    CPU1
> > >      =========               =========
> > >      xchg16(ptr, new)        while(1)
> > >                                  WRITE_ONCE(*(ptr + 1), x);
> > >
> > > When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
> >
> > Then I think your LL/SC is broken.
> >
> > That also means you really don't want to build super complex locking
> > primitives on top, because that live-lock will percolate through.
> Do you mean the below implementation has live-lock risk?
> +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> +{
> +	u32 old, new, val = atomic_read(&lock->val);
> +
> +	for (;;) {
> +		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
> +		old = atomic_cmpxchg(&lock->val, val, new);
> +		if (old == val)
> +			break;
> +
> +		val = old;
> +	}
> +	return old;
> +}
That entirely depends on the architecture (and cmpxchg() implementation).
There are a number of cases:

 * architecture has cmpxchg() instruction (x86, s390, sparc, etc.).

   - architecture provides fwd progress (x86)
   - architecture requires backoff for progress (sparc)

 * architecture does not have cmpxchg, and implements it using LL/SC.

   and here things get *really* interesting, because while an
   architecture can have LL/SC fwd progress, that does not translate into
   cmpxchg() also having the same guarantees and all bets are off.
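
To make the shape of the problem concrete, here is a rough user-space
rendering of the loop quoted above, using C11 atomics instead of the
kernel's atomic_cmpxchg(). Everything in it is an assumption for
illustration: the name xchg_tail_sketch(), the retry counter, and the
mask value (low 16 bits for locked+pending, tail in the upper half).
It is a sketch, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: bits 0-15 locked+pending, bits 16-31 tail. */
#define LOCKED_PENDING_MASK_SKETCH 0x0000ffffu

/* Hypothetical counter, only here to make failed attempts visible. */
static unsigned long xchg_tail_retries;

static uint32_t xchg_tail_sketch(_Atomic uint32_t *val, uint32_t tail)
{
	uint32_t old = atomic_load(val);
	uint32_t new;

	/* The caller passes a tail that is already shifted into place. */
	assert((tail & LOCKED_PENDING_MASK_SKETCH) == 0);

	for (;;) {
		new = (old & LOCKED_PENDING_MASK_SKETCH) | tail;
		/*
		 * On failure 'old' is refreshed with the current value.
		 * The weak form is also allowed to fail spuriously, which
		 * is what a store-conditional underneath may do.
		 */
		if (atomic_compare_exchange_weak(val, &old, new))
			break;
		xchg_tail_retries++;
	}
	return old;	/* value that was replaced */
}
```

Nothing in the C bounds the number of trips around that for (;;);
whether it terminates under contention is decided entirely by what
the compare-exchange is built from.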
The real bummer is that C can do cmpxchg(), but there is no way it can
do LL/SC. And even if we'd teach C how to do LL/SC, it couldn't be
generic because architectures lacking it can't emulate it using
cmpxchg() (there's a fun class of bugs there).
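
One member of that class, as far as I can tell the best known one, is
ABA. The sketch below replays it single-threaded; fake_ll(), fake_sc()
and struct fake_llsc are made-up names for a hypothetical emulation,
not anything that exists in the kernel or in C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LL/SC emulation built on compare-and-swap. */
struct fake_llsc {
	_Atomic uint32_t *addr;
	uint32_t linked;	/* value seen by the fake load-linked */
};

static uint32_t fake_ll(struct fake_llsc *r, _Atomic uint32_t *addr)
{
	r->addr = addr;
	r->linked = atomic_load(addr);
	return r->linked;
}

/* Succeeds whenever the value matches, even if it was rewritten. */
static bool fake_sc(struct fake_llsc *r, uint32_t new)
{
	uint32_t expected = r->linked;

	assert(r->addr);
	return atomic_compare_exchange_strong(r->addr, &expected, new);
}

/* Single-threaded replay of an A -> B -> A interleaving. */
static bool aba_goes_unnoticed(void)
{
	_Atomic uint32_t word = 1;
	struct fake_llsc r;

	fake_ll(&r, &word);		/* "CPU0" links and sees A */
	atomic_store(&word, 2);		/* "CPU1" writes B */
	atomic_store(&word, 1);		/* "CPU1" writes A again */

	/* A real SC fails after intervening stores; this one succeeds. */
	return fake_sc(&r, 3);
}
```

The compare only sees that the value is A again, so the two stores in
between are invisible to it, whereas a real reservation would have
been lost.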
So while the above code might be the best we can do in generic code,
it's really up to the architecture to make it work.