linux-ext4 - [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

Message-ID: <bug-217965-13602-YI7vwozwNJ@https.bugzilla.kernel.org/>
Date: Sat, 18 Nov 2023 12:10:04 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #40 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hey Eyal,

Thanks for the data, the perf probes you added were correct!

I see that the problem is as I suspected: ext4 keeps looking for aligned
blocks when probably none of them exist. Aligned allocation is currently done
only as an optimization, when the stripe mount option is passed. We don't seem
to fall back to normal allocation when aligned allocation doesn't work, and
this causes the very long, seemingly infinite looping.
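
To illustrate the idea of falling back from aligned to normal allocation, here
is a toy sketch in Python. It is not ext4's mballoc code: the free-block list,
the function names and the return values are all hypothetical, and the real
allocator works on block groups and buddy bitmaps rather than a flat list.

```python
from typing import Optional, Sequence, Tuple


def find_free_run(free: Sequence[bool], length: int, align: int = 1) -> Optional[int]:
    """Return the first start index that is a multiple of `align` and begins
    `length` consecutive free blocks, or None if there is no such run."""
    for start in range(0, len(free) - length + 1, align):
        if all(free[start:start + length]):
            return start
    return None


def allocate(free: Sequence[bool], length: int, stripe: int) -> Optional[Tuple[int, str]]:
    """Try a stripe-aligned run first (only when stripe > 1), then fall back
    to any free run instead of retrying the aligned search."""
    if stripe > 1:
        start = find_free_run(free, length, align=stripe)
        if start is not None:
            return start, "aligned"
    # Fallback (hypothetical): a single unaligned pass over the same blocks.
    start = find_free_run(free, length)
    return (start, "normal") if start is not None else None
```

In this toy model, a request that has no aligned candidate is still satisfied
by the second pass, so the search terminates after at most two scans.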

I can try to work on a patchset that fixes this. As a temporary fix, however,
you can continue with the stripe mount option turned off for ext4. This
instructs ext4 to use normal allocation rather than aligned allocation.

One point to note is that -o stripe=xyz is sometimes added automatically during
mount even when we don't pass it. You can look at Comments #6, #7 and #8 in
this bug for more info. To confirm it's off, look at the
/proc/fs/ext4/<dev>/options file, which lists all the currently active mount
options; you shouldn't see stripe there.
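
A minimal sketch of that check in Python, in case it is handy to script. The
helper names are hypothetical, the device name passed to read_stripe() is
whatever appears under /proc/fs/ext4/ on your system, and the parser assumes
the file holds one option per line (it also tolerates commas or spaces).

```python
import re
from pathlib import Path
from typing import Optional


def stripe_from_options(text: str) -> Optional[int]:
    """Return the stripe value found in the options text, or None if absent."""
    for token in re.split(r"[\s,]+", text.strip()):
        if token.startswith("stripe="):
            return int(token.split("=", 1)[1])
    return None


def read_stripe(dev: str) -> Optional[int]:
    """Read /proc/fs/ext4/<dev>/options and report the active stripe, if any."""
    options = Path("/proc/fs/ext4") / dev / "options"
    return stripe_from_options(options.read_text())
```

If read_stripe() returns None for the device, stripe is not among the active
mount options and ext4 should be using normal allocation.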

Further, this is not something that was changed between 6.4 and 6.5. However,
it seems the allocator changes in 6.5 made it even more difficult to come out
of this loop, thus prolonging the time taken to flush.

Also, just wanted to check whether you have any non-prod setup where you'd be
open to compiling a kernel with patches, to see if we are able to fix the issue.

Lastly, thank you so much for all the probe data and logs, it's been a huge
help :)

Regards,
ojaswin

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.