linux-kernel - Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits

Message-ID: <CAFULd4aDORSrq7zf_LcAZRP8HOHcrq2-rGMaroKyG2zQDHNpOA@mail.gmail.com>
Date:   Mon, 23 Jan 2023 16:04:43 +0100
From:   Uros Bizjak <ubizjak@...il.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mateusz Guzik <mjguzik@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll

On Thu, Jan 19, 2023 at 1:47 PM David Laight <David.Laight@...lab.com> wrote:
>
> > BTW: Recently, it was determined [1] that the usage of cpu_relax()
> > inside the cmpxchg loop can be harmful for performance. We actually
> > have the same situation here, so perhaps cpu_relax() should be removed
> > in the same way it was removed from the lockref.
>
> I'm not sure you can ever want a cpu_relax() in a loop that
> is implementing an atomic operation.
> Even the ia64 (die...) issue was with a loop that was waiting
> for another cpu to change the location (eg a spinlock).
>
> For an atomic operation an immediate retry is likely to succeed.
> Any kind of deferral to another cpu can only make it worse.
>
> Clearly if you have 100s of cpus looping doing atomic operations
> on the same cache line it is likely that some get starved.
> But to fix that you need to increase the time between successful
> operations, not delay on failure.
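
For readers following along outside the kernel tree, here is a minimal userspace sketch of the loop shape under discussion. It assumes C11 <stdatomic.h>: atomic_compare_exchange_weak() stands in for the kernel's try_cmpxchg(), since both write the observed value back into the "expected" variable on failure, so the loop can retry immediately without a separate re-read. The names set_bits_sketch() and clear_bits_sketch() are hypothetical; they are modelled on genalloc's set_bits_ll()/clear_bits_ll() semantics but are not the kernel code. There is deliberately no cpu_relax()-style pause in the loop, matching the immediate-retry argument above.

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of set_bits_ll(): atomically set all
 * bits in 'mask', or fail with -EBUSY if any of them is already set.
 * On CAS failure, 'old' is refreshed with the current value and the
 * loop retries at once (no pause between attempts).
 */
static int set_bits_sketch(_Atomic unsigned long *addr, unsigned long mask)
{
	unsigned long old = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if (old & mask)
			return -EBUSY;	/* some requested bits are taken */
	} while (!atomic_compare_exchange_weak(addr, &old, old | mask));

	return 0;
}

/*
 * Hypothetical analogue of clear_bits_ll(): atomically clear all bits
 * in 'mask', or fail with -EBUSY if any of them is not currently set.
 */
static int clear_bits_sketch(_Atomic unsigned long *addr, unsigned long mask)
{
	unsigned long old = atomic_load_explicit(addr, memory_order_relaxed);

	do {
		if ((old & mask) != mask)
			return -EBUSY;	/* some requested bits were not set */
	} while (!atomic_compare_exchange_weak(addr, &old, old & ~mask));

	return 0;
}
```

For example, starting from a zero word, set_bits_sketch(&word, 0x0f) returns 0; a following set_bits_sketch(&word, 0x03) returns -EBUSY because those bits are already set, and clear_bits_sketch(&word, 0x0f) then returns the word to zero.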

I would like to point out that the Wikipedia article on
compare-and-swap claims [1] that:

Instead of immediately retrying after a CAS operation fails,
researchers have found that total system performance can be improved
in multiprocessor systems—where many threads constantly update some
particular shared variable—if threads that see their CAS fail use
exponential backoff—in other words, wait a little before retrying the
CAS [2].

[1] https://en.wikipedia.org/wiki/Compare-and-swap#Overview
[2] https://arxiv.org/pdf/1305.5800.pdf
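
To make the quoted idea concrete, here is a rough sketch of a CAS loop with capped exponential backoff on failure, again using C11 atomics rather than kernel primitives. The constants BACKOFF_MIN/BACKOFF_MAX and the helpers spin_wait() and add_with_backoff() are hypothetical and chosen only for illustration; the volatile-counter loop is a crude stand-in for a real pause (a kernel version would presumably use cpu_relax() or similar).

```c
#include <stdatomic.h>

/* Hypothetical tuning knobs, for illustration only. */
#define BACKOFF_MIN	1u
#define BACKOFF_MAX	1024u

/* Crude delay: a volatile counter keeps the compiler from deleting the loop. */
static void spin_wait(unsigned int n)
{
	for (volatile unsigned int i = 0; i < n; i++)
		;
}

/*
 * Atomically add 'delta' to *counter. After each failed CAS, wait for an
 * exponentially growing (capped) number of iterations before retrying.
 * Returns the number of failed attempts, purely for observation; note that
 * a weak CAS may also fail spuriously without any contention.
 */
static unsigned int add_with_backoff(_Atomic unsigned long *counter,
				     unsigned long delta)
{
	unsigned long old = atomic_load_explicit(counter, memory_order_relaxed);
	unsigned int delay = BACKOFF_MIN;
	unsigned int failures = 0;

	while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
		/* 'old' now holds the current value; back off, then retry */
		failures++;
		spin_wait(delay);
		if (delay < BACKOFF_MAX)
			delay *= 2;
	}
	return failures;
}
```

This delays on failure, which is exactly what David argues against above: under light contention it only adds latency, while under heavy contention it spreads the retries out. Which effect dominates for a loop like set_bits_ll() would have to be measured.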

Uros.