linux-kernel - RE: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll


Message-ID: <efcf1ad325064d47ba027db9a98222ac@AcuMS.aculab.com>
Date:   Mon, 23 Jan 2023 15:42:08 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Uros Bizjak' <ubizjak@...il.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mateusz Guzik <mjguzik@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll

From: Uros Bizjak
> Sent: 23 January 2023 15:05
> 
> On Thu, Jan 19, 2023 at 1:47 PM David Laight <David.Laight@...lab.com> wrote:
> >
> > > BTW: Recently, it was determined [1] that the usage of cpu_relax()
> > > inside the cmpxchg loop can be harmful for performance. We actually
> > > have the same situation here, so perhaps cpu_relax() should be removed
> > > in the same way it was removed from the lockref.
> >
> > I'm not sure you can ever want a cpu_relax() in a loop that
> > is implementing an atomic operation.
> > Even the ia64 (die...) issue was with a loop that was waiting
> > for another cpu to change the location (eg a spinlock).
> >
> > For an atomic operation an immediate retry is likely to succeed.
> > > Any kind of deferral to another cpu can only make it worse.
> >
> > > Clearly if you have 100s of cpus looping doing atomic operations
> > > on the same cache line it is likely that some get starved.
> > But to fix that you need to increase the time between successful
> > operations, not delay on failure.
> 
> I would like to point out that the wikipedia article on
> compare-and-swap claims [1] that:
> 
> Instead of immediately retrying after a CAS operation fails,
> researchers have found that total system performance can be improved
> in multiprocessor systems—where many threads constantly update some
> particular shared variable—if threads that see their CAS fail use
> exponential backoff—in other words, wait a little before retrying the
> CAS [2].

Probably, but the real solution is 'don't do that'.
In any case I suspect the cpu_relax() explicitly lets the
other hyperthreading cpu run - which isn't useful at all.

What you actually want is for the cache logic to avoid losing
'exclusive' access to the cache line for enough clocks after a
failed compare+exchange to allow the cpu to re-issue the memory
cycle with an updated value.
You can't do anything about one cpu being starved, but a short
delay there is almost certainly beneficial.
(Some hardware cache engineer will probably say otherwise.)
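
For illustration, here is a minimal userspace sketch of the two retry
strategies discussed in this thread. It uses C11 atomics rather than the
kernel's try_cmpxchg(), and it is not the genalloc code. The names
set_bits_retry, set_bits_backoff and BACKOFF_MAX are hypothetical, and the
busy-wait loop is only a crude stand-in for a real pause.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BACKOFF_MAX 1024u	/* arbitrary cap on the spin count (assumption) */

/*
 * Immediate retry: a failed compare-exchange writes the current contents
 * of *addr back into val, so the loop can go straight round again.
 * Returns false if any bit in mask is already set.
 */
static bool set_bits_retry(_Atomic unsigned long *addr, unsigned long mask)
{
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if (val & mask)
			return false;	/* at least one bit is already set */
	} while (!atomic_compare_exchange_weak(addr, &val, val | mask));

	return true;
}

/*
 * Exponential backoff: after each failed compare-exchange, spin for an
 * interval that doubles (up to BACKOFF_MAX) before trying again.
 */
static bool set_bits_backoff(_Atomic unsigned long *addr, unsigned long mask)
{
	unsigned long val = atomic_load_explicit(addr, memory_order_relaxed);
	unsigned int delay = 1;

	for (;;) {
		if (val & mask)
			return false;
		if (atomic_compare_exchange_weak(addr, &val, val | mask))
			return true;

		for (volatile unsigned int i = 0; i < delay; i++)
			;	/* busy-wait; hypothetical stand-in for a pause */
		if (delay < BACKOFF_MAX)
			delay *= 2;

		/* the value seen by the failed CAS is stale by now; reload it */
		val = atomic_load_explicit(addr, memory_order_relaxed);
	}
}
```

Both functions give the same result when there is no contention. They
differ only in what happens after a failed compare-exchange, which is the
point being argued above. Which one behaves better under heavy contention
depends on the hardware and would have to be measured.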

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)