Message-ID: <20251231024316.4643-1-CFSworks@gmail.com>
Date: Tue, 30 Dec 2025 18:43:11 -0800
From: Sam Edwards <cfsworks@...il.com>
To: Xiubo Li <xiubli@...hat.com>,
	Ilya Dryomov <idryomov@...il.com>
Cc: Viacheslav Dubeyko <Slava.Dubeyko@....com>,
	Christian Brauner <brauner@...nel.org>,
	Milind Changire <mchangir@...hat.com>,
	Jeff Layton <jlayton@...nel.org>,
	ceph-devel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Sam Edwards <CFSworks@...il.com>
Subject: [PATCH 0/5] ceph: CephFS writeback correctness and performance fixes

Hello list,

This series addresses several interrelated CephFS writeback issues,
particularly for fscrypted files. My work began with a performance problem:
encrypted files caused a write storm during writeback because the writeback
code was inadvertently selecting the crypto block instead of the stripe unit as
the maximum write unit size.
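
To give a rough sense of the scale of the write storm, here is a minimal
user-space sketch of the arithmetic. It is not code from fs/ceph/addr.c; the
helper name writes_needed() is hypothetical, and the 4 KiB crypto block and
4 MiB stripe unit used in the note below are illustrative assumptions, not
values taken from this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of write requests needed to flush
 * `dirty_bytes` of contiguous dirty data when each request may carry
 * at most `max_write_unit` bytes. Rounds up to whole requests.
 */
static size_t writes_needed(size_t dirty_bytes, size_t max_write_unit)
{
    assert(max_write_unit > 0); /* a zero-sized unit is meaningless */
    return (dirty_bytes + max_write_unit - 1) / max_write_unit;
}
```

Under those assumed sizes, writes_needed(4u << 20, 4096) evaluates to 1024,
while writes_needed(4u << 20, 4u << 20) evaluates to 1: capping each write at
the crypto block turns a single stripe-unit-sized write into roughly a
thousand small ones.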

While testing that fix, I encountered a correctness bug: failures to allocate
bounce pages during writeback were incorrectly propagated as batch errors,
which trigger kernel oopses/panics because the writeback loop does not free
the page array on that path. While investigating, I found that a failure in
ceph_submit_write() could trigger the same oopses.

The patches in this series:

1. Prevent bounce page allocation failures from aborting the writeback batch
   and causing a kernel oops/panic due to the page array not being freed.
2. Remove the now-redundant error return from ceph_process_folio_batch().
3. Free page arrays during failure in ceph_submit_write(), preventing another
   path to the same kernel oops/panic. This was not an issue I encountered in
   testing, and it is tricky to trigger organically. I used the fault injection
   framework to confirm it and verify the fix.
4. Assert writeback loop invariants explicitly to help prevent regressions and
   aid debugging should the problem reappear.
5. Fix the write storm on fscrypted files by using the correct stripe unit.
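
The pattern behind patches 1, 3, and 4 can be sketched in user space as
follows. This is only an illustration under stated assumptions, not the code
in fs/ceph/addr.c: struct wb_batch, submit_batch(), and writeback_loop() are
hypothetical names, and the failure is injected with a plain flag rather than
the kernel's fault injection framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-batch writeback state. */
struct wb_batch {
    void **pages;     /* page array; NULL whenever no batch is pending */
    size_t nr_pages;
};

/*
 * Hypothetical submit step. In this sketch the array is released on both
 * the success and the failure path, so the caller never sees a stale
 * pointer after the call returns.
 */
static int submit_batch(struct wb_batch *b, bool inject_failure)
{
    int ret = inject_failure ? -1 : 0;

    free(b->pages);
    b->pages = NULL;
    b->nr_pages = 0;
    return ret;
}

/* Runs `iterations` batches, failing the one at index `fail_at`.
 * Returns the number of failed batches. */
static int writeback_loop(size_t iterations, size_t fail_at)
{
    struct wb_batch b = { NULL, 0 };
    int failures = 0;

    for (size_t i = 0; i < iterations; i++) {
        /* Loop invariant: no page array survives from the last pass. */
        assert(b.pages == NULL && b.nr_pages == 0);

        b.pages = calloc(16, sizeof(*b.pages));
        if (!b.pages) {
            failures++;   /* skip this batch; do not abort the loop */
            continue;
        }
        b.nr_pages = 16;

        if (submit_batch(&b, i == fail_at) != 0)
            failures++;
    }

    assert(b.pages == NULL);
    return failures;
}
```

Calling writeback_loop(4, 2) fails only the third batch and still completes
the remaining one. If the failure path in submit_batch() skipped the cleanup,
the assertion at the top of the loop would fire on the next pass, which is the
kind of early, explicit failure an invariant check is meant to provide.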

Note that this series follows a "fix-then-refactor" cadence: patches 1, 3, and
5 fix bugs and are intended for stable, while patches 2 and 4 represent code
cleanup and are intended only for next.

Wishing you all a prosperous 2026 ahead,
Sam

Sam Edwards (5):
  ceph: Do not propagate page array emplacement errors as batch errors
  ceph: Remove error return from ceph_process_folio_batch()
  ceph: Free page array when ceph_submit_write fails
  ceph: Assert writeback loop invariants
  ceph: Fix write storm on fscrypted files

 fs/ceph/addr.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

-- 
2.51.2

