linux-ext4 - Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last

Message-ID: <20250714030113.GA23343@mit.edu>
Date: Sun, 13 Jul 2025 23:01:13 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Baokun Li <libaokun1@...wei.com>, linux-ext4@...r.kernel.org,
        adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com
Subject: Re: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update
 s_mb_last_group

On Thu, Jul 10, 2025 at 04:38:33PM +0200, Jan Kara wrote:
> 
> Yes, apparently both approaches have their pros and cons. I'm actually
> surprised the impact of additional barriers on ARM is so big for the
> single container case. A 10% gain for the single container case looks
> nice; OTOH realistic workloads will have more containers, so maybe
> that's not worth optimizing for. Ted, do you have any opinion?

Let me try to summarize: regardless of whether we use
{READ,WRITE}_ONCE or smp_load_acquire / smp_store_release, both are
significantly better than using the spinlock.  The other thing about
the "single-threaded performance" is that the additional cost of
CPU-to-CPU syncing is not free.  But that synchronization cost applies
only when the single thread is bouncing between CPUs.  If we had a
single-threaded application pinned to a single CPU, the cost of
smp_load_acquire wouldn't be there, since the cache line wouldn't be
bouncing back and forth.  Is that correct, or am I missing something?
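
To make the three options concrete, here is a rough userspace sketch
using C11 atomics rather than the kernel primitives.  The struct and
function names are invented for illustration and are not taken from the
patch.  It assumes that memory_order_relaxed is a fair approximation of
{READ,WRITE}_ONCE and that memory_order_acquire / memory_order_release
approximate smp_load_acquire / smp_store_release; the atomic_flag loop
only models a spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocator's "last group" hint. */
struct goal_hint {
	atomic_flag lock;        /* models the spinlock being removed */
	atomic_uint last_group;  /* models s_mb_last_group */
};

static inline void hint_init(struct goal_hint *h)
{
	assert(h != NULL);
	atomic_flag_clear(&h->lock);
	atomic_init(&h->last_group, 0);
}

/* Baseline: take a lock around every update of the hint. */
static inline void hint_store_locked(struct goal_hint *h, unsigned int group)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		;	/* spin until the lock is free */
	atomic_store_explicit(&h->last_group, group, memory_order_relaxed);
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Approximates WRITE_ONCE(): an untorn store with no ordering guarantee. */
static inline void hint_store_relaxed(struct goal_hint *h, unsigned int group)
{
	atomic_store_explicit(&h->last_group, group, memory_order_relaxed);
}

/* Approximates READ_ONCE(): an untorn load with no ordering guarantee. */
static inline unsigned int hint_load_relaxed(struct goal_hint *h)
{
	return atomic_load_explicit(&h->last_group, memory_order_relaxed);
}

/* Approximates smp_store_release(): earlier accesses stay before the store. */
static inline void hint_store_release(struct goal_hint *h, unsigned int group)
{
	atomic_store_explicit(&h->last_group, group, memory_order_release);
}

/* Approximates smp_load_acquire(): later accesses stay after the load. */
static inline unsigned int hint_load_acquire(struct goal_hint *h)
{
	return atomic_load_explicit(&h->last_group, memory_order_acquire);
}
```

On x86 the relaxed and acquire/release variants typically compile to the
same plain loads and stores, while on arm64 the acquire/release variants
typically become ldar/stlr, which is where the extra cost under
discussion would come from.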

In any case, so long as the single-threaded performance doesn't
regress relative to the current spin_lock implementation, I'm inclined
to prefer the smp_load_acquire approach if it improves
multi-threaded allocation performance on ARM64.

Cheers,

							- Ted