Date:   Mon, 17 Oct 2016 20:52:50 -0800
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Ming Lei <ming.lei@...onical.com>, Jens Axboe <axboe@...com>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Bug in fua code

Ming,

I recently discovered a bug in the FUA code - a recent bcachefs change exposed
it - and my best guess is it's related to your recent changes to blk-flush.c.

What I'm seeing is that if all writes are issued as FUA writes, within a short
time the request queue gets stuck: writes are on the queue but they aren't being
issued or completed. This is with an AHCI device, so no blk-mq, and it's
emulating FUA with flushes.

You ought to be able to reproduce this yourself by changing
generic_make_request() to make all writes FUA, and then just doing O_DIRECT
writes with dd or something. I suspect that if there are non-FUA flushes being
issued, they end up kicking the queue and keeping things from getting stuck. In
my testing, I only see things get completely stuck when running bcachefs in
multi-device mode, with no metadata or journal IO to the device in question,
just FUA data writes.
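
For reference, here is a minimal user-space sketch of the kind of O_DIRECT
write load described above, roughly what dd with oflag=direct would generate.
It is an illustration only: the function name write_blocks() and the 4096-byte
alignment are assumptions, and it does not make writes FUA by itself; that
part still needs the kernel-side change to generic_make_request().

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` blocks of `block_size` bytes to `path`,
 * opened with O_WRONLY | O_CREAT | extra_flags. Pass O_DIRECT in extra_flags
 * to bypass the page cache. Returns total bytes written, or -1 on any error
 * (including a short write).
 */
long write_blocks(const char *path, int extra_flags, size_t block_size, int count)
{
    void *buf = NULL;
    long total = 0;
    int fd;

    /* O_DIRECT requires an aligned buffer; 4096 is an assumed safe alignment. */
    if (posix_memalign(&buf, 4096, block_size) != 0)
        return -1;
    memset(buf, 0xab, block_size);

    fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (int i = 0; i < count; i++) {
        ssize_t n = write(fd, buf, block_size);
        if (n != (ssize_t)block_size) {
            total = -1;
            break;
        }
        total += n;
    }

    close(fd);
    free(buf);
    return total;
}
```

Calling write_blocks(path, O_DIRECT, 4096, n) against a file or block device
on the affected disk should produce a steady stream of direct writes. With
O_DIRECT, block_size also has to be a multiple of the device's logical block
size.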

After things get stuck, kgdb shows a request on the request queue that has
flush_data_end_io as its endio function. I'm still trying to figure out how
the flush machinery is supposed to work, so I don't know what else you'd want
to know.

Much appreciated if you could take a look.
