lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1516314012.2676.76.camel@wdc.com>
Date:   Thu, 18 Jan 2018 22:20:13 +0000
From:   Bart Van Assche <Bart.VanAssche@....com>
To:     "snitzer@...hat.com" <snitzer@...hat.com>
CC:     "dm-devel@...hat.com" <dm-devel@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "hch@...radead.org" <hch@...radead.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "osandov@...com" <osandov@...com>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "ming.lei@...hat.com" <ming.lei@...hat.com>
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

On Thu, 2018-01-18 at 17:01 -0500, Mike Snitzer wrote:
> And yet Laurence cannot reproduce any such lockups with your test...

Hmm ... maybe I misunderstood Laurence but I don't think that Laurence has
already succeeded at running an unmodified version of my tests. In one of the
e-mails Laurence sent me this morning I read that he modified these scripts
to get past a kernel module unload failure that was reported while starting
these tests. So the next step is to check which changes were made to the test
scripts and also whether the test results are still valid.

> Are you absolutely certain this patch doesn't help you?
> https://patchwork.kernel.org/patch/10174037/
> 
> If it doesn't then that is actually very useful to know.

The first I tried this morning is to run the srp-test software against a merge
of Jens' for-next branch and your dm-4.16 branch. Since I noticed that the dm
queue locked up I reinserted a blk_mq_delay_run_hw_queue() call in the dm code.
Since even that was not sufficient I tried to kick the queues via debugfs (for
s in /sys/kernel/debug/block/*/state; do echo kick >$s; done). Since that was
not sufficient to resolve the queue stall I reverted the following tree patches
that are in Jens' tree:
* "blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback"
* "blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request"
* "blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is busy"

Only after I had done this the srp-test software ran again without triggering
dm queue lockups. Sorry but I have not yet had the time to test patch "[RFC]
blk-mq: fixup RESTART when queue becomes idle".

> Please just focus on helping Laurence get his very capable testbed to
> reproduce this issue.  Once we can reproduce these "unkillable" "stalls"
> in-house it'll be _much_ easier to analyze and fix.

OK, I will work with Laurence on this. Maybe Laurence and I should work on this
before analyzing the lockup that was mentioned above further?

Bart.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ