Message-ID: <0d7bdb05-07e4-d414-dced-8bd30d1fd9c0@nvidia.com>
Date: Wed, 2 Nov 2022 01:09:58 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Bart Van Assche <bvanassche@....org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "axboe@...nel.dk" <axboe@...nel.dk>,
"damien.lemoal@...nsource.wdc.com" <damien.lemoal@...nsource.wdc.com>,
"johannes.thumshirn@....com" <johannes.thumshirn@....com>,
"ming.lei@...hat.com" <ming.lei@...hat.com>,
"shinichiro.kawasaki@....com" <shinichiro.kawasaki@....com>,
"vincent.fu@...sung.com" <vincent.fu@...sung.com>,
"yukuai3@...wei.com" <yukuai3@...wei.com>
Subject: Re: [PATCH] null_blk: allow teardown on request timeout
On 10/19/22 10:41, Bart Van Assche wrote:
> On 10/18/22 21:19, Chaitanya Kulkarni wrote:
>> Also, I've listed a problem I have seen first hand with keeping a
>> device in the system when it is non-responsive due to request
>> timeouts. In that case we should let the user decide whether to
>> remove or keep the device, instead of forcing the user to keep it and
>> bringing down the whole system; these problems are really hard to
>> debug even with Teledyne LeCroy tools [1]. This patch follows the same
>> philosophy: the user can opt in to removal with a module parameter,
>> and once opted in, the user knows what they are getting into.
>
> Hi Chaitanya,
>
> From commit f2298c0403b0 ("null_blk: multi queue aware block test
> driver"): "Written to facilitate testing of the blk-mq code". I'm not
> sure of this, but adding a mechanism like the one in this patch may fall
> outside the original scope of the null_blk driver.
>
I don't understand your comment; this patch uses the blk_mq_XXX() APIs:
+ blk_freeze_queue_start(nullb->q);
+ blk_mq_quiesce_queue(nullb->q);
+ /*
+ * We already ensured that submit_bio() will not add any plugging by
+ * quiescing so it is safe to sync queue now.
+ */
+ blk_sync_queue(nullb->q);
+ blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb);
+ blk_mq_tagset_wait_completed_request(nullb->tag_set);
+ /*
+ * Unblock any pending dispatch I/Os before we destroy the device.
+ * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag
+ * causing any new I/O from __bio_queue_enter() to fail with -ENODEV.
+ */
+ blk_mq_unquiesce_queue(nullb->q);
These are called from error_work, which is queued from the blk-mq
timeout handler, and they need to be part of null_blk so that I can
submit the test cases to blktests. These test cases are part of the
smoke tests under the block category: the block tests need to run first
to establish the stability of the block layer baseline with a minimal
driver, before moving on to complex subsystems with additional driver
code.
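To make the ordering in the quoted hunk concrete, here is a toy userspace
model in C. It is only a sketch under stated assumptions: the toy_* names
are hypothetical and are not kernel APIs, each flag merely stands in for
the effect of the corresponding block layer call, and cancelled requests
are assumed to complete with -EIO (the status actually used by
null_cancel_request() is not shown in the excerpt).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy userspace model of the teardown ordering in the patch excerpt.
 * All names are hypothetical; none of these are kernel APIs.
 */
#define TOY_NR_TAGS 4

enum toy_state { TOY_FREE, TOY_IN_FLIGHT, TOY_COMPLETED };

struct toy_req {
	enum toy_state state;
	int status;              /* 0, or a negative errno once completed */
};

struct toy_queue {
	bool frozen;             /* ~ blk_freeze_queue_start(): stop new entries */
	bool quiesced;           /* ~ blk_mq_quiesce_queue(): stop dispatch */
	bool dead;               /* ~ GD_DEAD set on the destroy path */
	struct toy_req tags[TOY_NR_TAGS];
};

/* ~ queue enter + tag allocation; returns a tag or a negative errno. */
static int toy_submit(struct toy_queue *q)
{
	if (q->dead)
		return -ENODEV;
	if (q->frozen)
		return -EAGAIN;  /* a blocking submitter would sleep instead */
	for (int i = 0; i < TOY_NR_TAGS; i++) {
		if (q->tags[i].state == TOY_FREE) {
			q->tags[i].state = TOY_IN_FLIGHT;
			return i;
		}
	}
	return -EBUSY;
}

/* ~ busy-tag iteration with a cancel callback; status assumed -EIO. */
static int toy_cancel_inflight(struct toy_queue *q)
{
	int cancelled = 0;

	assert(q->quiesced);     /* cancel only once dispatch is stopped */
	for (int i = 0; i < TOY_NR_TAGS; i++) {
		if (q->tags[i].state == TOY_IN_FLIGHT) {
			q->tags[i].state = TOY_COMPLETED;
			q->tags[i].status = -EIO;
			cancelled++;
		}
	}
	return cancelled;
}

/* Same order as the hunk: freeze, quiesce, cancel, unquiesce, mark dead. */
static int toy_teardown(struct toy_queue *q)
{
	int cancelled;

	q->frozen = true;
	q->quiesced = true;
	cancelled = toy_cancel_inflight(q);
	q->quiesced = false;     /* unblock pending dispatchers first */
	q->dead = true;          /* later submissions now fail with -ENODEV */
	return cancelled;
}
```

For example, submitting two requests and then calling toy_teardown()
cancels both with -EIO, leaves the queue unquiesced, and makes any later
toy_submit() fail with -ENODEV. The point of the model is the ordering:
dispatch is stopped before in-flight requests are cancelled, and the
queue is unquiesced before the device is marked dead.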
Calling the blk_mq_XXX() APIs from error work queued by the block layer
timeout handler makes it possible to test the blk-mq code under
combinations of different null_blk error injection parameters together
with blk_should_fake_timeout(), as posted in the patch test report.
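The error-injection flow described above can be sketched the same way.
This is another hypothetical userspace model, not null_blk code:
toy_should_fake_timeout() drops every Nth request (loosely similar in
spirit to a fault-injection interval behind blk_should_fake_timeout()),
the timeout handler queues teardown work only when the user has opted in
(teardown_opt_in is a hypothetical stand-in for the module parameter),
and the work is queued at most once per device.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: fake timeouts -> timeout handler -> error work -> teardown.
 * All names and fields are hypothetical; this is not kernel code.
 */
struct toy_dev {
	unsigned int fake_interval;   /* drop every Nth request; 0 = off */
	unsigned int nr_issued;
	unsigned int nr_completed;
	unsigned int nr_timed_out;
	bool teardown_opt_in;         /* stand-in for the opt-in parameter */
	bool error_work_queued;       /* never queued twice */
	bool torn_down;
};

/* True if the request just issued should be "lost" so that it times out. */
static bool toy_should_fake_timeout(const struct toy_dev *d)
{
	return d->fake_interval != 0 && d->nr_issued % d->fake_interval == 0;
}

/* Issue one request; returns false if it was dropped to fake a timeout. */
static bool toy_issue(struct toy_dev *d)
{
	d->nr_issued++;
	if (toy_should_fake_timeout(d))
		return false;             /* never completed; timer will fire */
	d->nr_completed++;
	return true;
}

/* Timeout handler: queue teardown work only if opted in, and only once. */
static void toy_timeout_handler(struct toy_dev *d)
{
	d->nr_timed_out++;
	if (d->teardown_opt_in && !d->error_work_queued)
		d->error_work_queued = true;
}

/* Work context: this is where the teardown sequence would run. */
static void toy_run_error_work(struct toy_dev *d)
{
	if (!d->error_work_queued || d->torn_down)
		return;
	assert(d->teardown_opt_in);   /* work is only queued after opt-in */
	d->torn_down = true;
}
```

With fake_interval set to 3 and nine requests issued, three requests time
out, the teardown work is queued once, and running it marks the device as
torn down. With the opt-in flag cleared, the same timeouts are counted but
no teardown work is queued, so the device stays in the system.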
-ck