Message-ID: <20250330064732.3781046-1-mcgrof@kernel.org>
Date: Sat, 29 Mar 2025 23:47:29 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: brauner@...nel.org,
	jack@...e.cz,
	tytso@....edu,
	adilger.kernel@...ger.ca,
	linux-ext4@...r.kernel.org,
	riel@...riel.com
Cc: willy@...radead.org,
	hannes@...xchg.org,
	oliver.sang@...el.com,
	dave@...olabs.net,
	david@...hat.com,
	axboe@...nel.dk,
	hare@...e.de,
	david@...morbit.com,
	djwong@...nel.org,
	ritesh.list@...il.com,
	linux-fsdevel@...r.kernel.org,
	linux-block@...r.kernel.org,
	linux-mm@...ck.org,
	gost.dev@...sung.com,
	p.raghav@...sung.com,
	da.gomez@...sung.com,
	mcgrof@...nel.org
Subject: [PATCH 0/3] mm: move migration work around to buffer-heads

We have an eyesore of a spin lock held during page migration. It was
added as a fix for an ext4 jbd corruption for which we have no clear
public corruption data. We want to remove the spin lock from mm/migrate
to help buffer-head filesystems embrace large folios, since
folio_mc_copy() can call cond_resched() on large folios and so must not
run in atomic context. I've managed to reproduce a corruption by just
removing the spin lock and stressing ext4 with generic/750; the
corruption happens after 3 hours.

The spin lock was added to help ext4 jbd and the other users of
buffer_migrate_folio_norefs(), namely the block device cache and nilfs2.
This series moves the heuristic needed to avoid racing with page
migration back into the buffer-head code, in __find_get_block_slow(),
and applies it only to users of buffer_migrate_folio_norefs(). I have
run generic/750 for over 20 hours and don't see the corruption issue.

I've also run this patchset against all of the following ext4 profiles
on all fstests tests and found no regressions. I've published the
baseline, based on linux-next tag next-20250328, to kdevops [0]. For
further sanity I've also tested this patchset against blktests and
found no regressions.

ext4-defaults
ext4-1k
ext4-2k
ext4-4k
ext4-bigalloc16k-4k
ext4-bigalloc32k-4k
ext4-bigalloc64k-4k
ext4-bigalloc1024k-4k
ext4-bigalloc2048k-4k
ext4-advanced-features

[0] https://github.com/linux-kdevops/kdevops/commit/3ecd638e67b14162b76b733a120e6e1b55698cc9

Luis Chamberlain (3):
  mm/migrate: add might_sleep() on __migrate_folio()
  fs/buffer: avoid races with folio migrations on
    __find_get_block_slow()
  mm/migrate: avoid atomic context on buffer_migrate_folio_norefs()
    migration

 fs/buffer.c  | 9 +++++++++
 mm/migrate.c | 6 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.47.2

