linux-ext4 - [Bug 219166] ext4 hang when setting echo noop

Message-ID: <bug-219166-13602-UmX07mEQiJ@https.bugzilla.kernel.org/>
Date: Fri, 16 Aug 2024 20:36:42 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 219166] ext4 hang when setting echo noop >
 /sys/block/sda/queue/scheduler

https://bugzilla.kernel.org/show_bug.cgi?id=219166

--- Comment #3 from Theodore Tso (tytso@....edu) ---
So FWIW, what we saw in our data center kernel was a failure when switching
from one valid scheduler to a different valid scheduler while I/O was in
flight.   And this was with a kernel that didn't have any modules (or not any
modules that would be loaded under normal circumstances, so modprobe wouldn't
have been in the picture).   It triggered rarely as well, and I don't remember
whether it was an oops or a hang --- if I remember correctly, it was an oops.
So it might not be the same thing, but our workaround was to quiesce the
device before changing the scheduler.  Since this was happening in the boot
sequence, it was something we could do relatively easily, and like you we then
lost interest.  :-)
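
For illustration, here is a minimal sketch of a "quiesce, then switch"
workaround.  It is not the code used in the data center kernel described
above; it assumes the standard sysfs files /sys/block/<dev>/inflight (read and
write counts of requests in flight) and /sys/block/<dev>/queue/scheduler.  The
function names, the timeout, and the polling interval are hypothetical.  Note
that nothing here stops new I/O from arriving between the idle check and the
switch, so this only narrows the window; it is practical mainly early in boot,
when nothing else is using the device.

```python
import os
import time


def inflight_total(text):
    """Sum the read and write counts from /sys/block/<dev>/inflight."""
    return sum(int(field) for field in text.split())


def wait_until_idle(inflight_path, timeout=30.0, poll=0.1):
    """Poll the inflight counters until they reach zero or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(inflight_path) as f:
            if inflight_total(f.read()) == 0:
                return True
        time.sleep(poll)
    return False


def switch_scheduler_when_idle(dev, scheduler, sysfs_root="/sys/block"):
    """Flush dirty data, wait for the device to go idle, then switch scheduler.

    sysfs_root is a parameter only so the function can be exercised against
    a scratch directory; on a real system it needs root privileges.
    """
    os.sync()  # push out dirty pages so writeback is not racing the switch
    dev_dir = os.path.join(sysfs_root, dev)
    if not wait_until_idle(os.path.join(dev_dir, "inflight")):
        raise TimeoutError(f"{dev} still has I/O in flight")
    with open(os.path.join(dev_dir, "queue", "scheduler"), "w") as f:
        f.write(scheduler)
```

A call such as switch_scheduler_when_idle("sda", "mq-deadline") would be the
typical use, assuming that scheduler name is one the running kernel offers.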

Whether or not to close it comes down to whether we think it's worth asking
the block layer developers to take a look at it.   Right now it's mostly only
ext4 developers who are paying attention to this bug component, so someone
would need to take it up with the block developers, and the thing that would
be most useful is a reliable reproducer.

I don't know what guestfish is doing, but if we were trying to create a
reproducer from what I was seeing a few years ago, it would be something like
running fsstress or fio to exercise the block layer, and then switching the
I/O scheduler to see if we can make it go *boom* regularly.   Maybe with
something like that we could get a reproducer that doesn't require launching a
VM multiple times and only seeing the failure less than 0.5% of the time....
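
A rough sketch of what such a reproducer could look like is below.  It is
untested against the actual bug and is only meant to show the shape of the
idea: start an I/O load, then cycle through every scheduler the kernel lists
for the device until the load exits.  The function names and the switching
interval are hypothetical, and the load command is left to the caller (fio or
fsstress would be the obvious choices).  It needs root and a scratch device,
since a successful reproduction means an oops or a hang.

```python
import itertools
import subprocess
import time


def parse_schedulers(text):
    """Parse the contents of queue/scheduler, e.g. 'noop deadline [cfq]'.

    Returns (all scheduler names, currently active name or None).
    """
    names, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        names.append(token)
    return names, active


def switch_under_load(scheduler_path, load_cmd, interval=0.5):
    """Run load_cmd and keep switching schedulers until it exits.

    Returns the number of switches performed.
    """
    with open(scheduler_path) as f:
        names, _ = parse_schedulers(f.read())
    load = subprocess.Popen(load_cmd)
    switches = 0
    try:
        for name in itertools.cycle(names):
            if load.poll() is not None:  # the load generator has finished
                break
            with open(scheduler_path, "w") as f:
                f.write(name)
            switches += 1
            time.sleep(interval)
    finally:
        if load.poll() is None:
            load.terminate()
        load.wait()
    return switches


# Example invocation (assumes fio is installed and /mnt/scratch lives on sda):
# switch_under_load(
#     "/sys/block/sda/queue/scheduler",
#     ["fio", "--name=stress", "--filename=/mnt/scratch/fio.dat",
#      "--rw=randrw", "--size=256m", "--direct=1", "--ioengine=libaio",
#      "--iodepth=32", "--time_based", "--runtime=60"],
# )
```

Shortening the interval and raising the queue depth would presumably widen
the race window, but that is a guess until someone has it failing on demand.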
