Message-ID: <0d7bdb05-07e4-d414-dced-8bd30d1fd9c0@nvidia.com>
Date:   Wed, 2 Nov 2022 01:09:58 +0000
From:   Chaitanya Kulkarni <chaitanyak@...dia.com>
To:     Bart Van Assche <bvanassche@....org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "axboe@...nel.dk" <axboe@...nel.dk>,
        "damien.lemoal@...nsource.wdc.com" <damien.lemoal@...nsource.wdc.com>,
        "johannes.thumshirn@....com" <johannes.thumshirn@....com>,
        "ming.lei@...hat.com" <ming.lei@...hat.com>,
        "shinichiro.kawasaki@....com" <shinichiro.kawasaki@....com>,
        "vincent.fu@...sung.com" <vincent.fu@...sung.com>,
        "yukuai3@...wei.com" <yukuai3@...wei.com>
Subject: Re: [PATCH] null_blk: allow teardown on request timeout

On 10/19/22 10:41, Bart Van Assche wrote:
> On 10/18/22 21:19, Chaitanya Kulkarni wrote:
>> Also, I've listed the problem that I've seen first hand for keeping the
>> device in the system that is non-responsive due to request timeouts, in
>> that case we should let user decide whether user wants to remove or keep
>> the device in the system instead of forcing user to keep the device in
>> the system bringing down whole system, and these problems are really
>> hard to debug even with Teledyne LeCroy [1]. This patch follows the same
>> philosophy where user can decide to opt in for removal with module
>> parameter. Once opt-in user knows what he is getting into.
> 
> Hi Chaitanya,
> 
>  From commit f2298c0403b0 ("null_blk: multi queue aware block test 
> driver"): "Written to facilitate testing of the blk-mq code". I'm not 
> sure of this but adding a mechanism like the one in this patch may fall 
> outside the original scope of the null_blk driver.
> 

I did not understand your comment. This patch uses the blk_mq_XXX() APIs:

+	blk_freeze_queue_start(nullb->q);
+	blk_mq_quiesce_queue(nullb->q);
+	/*
+	 * We already ensured that submit_bio() will not add any plugging by
+	 * quiescing so it is safe to sync queue now.
+	 */
+	blk_sync_queue(nullb->q);
+	blk_mq_tagset_busy_iter(nullb->tag_set, null_cancel_request, nullb);
+	blk_mq_tagset_wait_completed_request(nullb->tag_set);
+	/*
+	 * Unblock any pending dispatch I/Os before we destroy the device.
+	 * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag
+	 * causing any new I/O from __bio_queue_enter() to fail with -ENODEV.
+	 */
+	blk_mq_unquiesce_queue(nullb->q);

These are called from error_work, which is issued from the blk-mq
timeout handler. That needs to be part of null_blk so that I can submit
the test cases to blktests. These test cases are part of the smoke tests
under the block category, where the block tests need to be run first to
establish the stability of the block layer baseline with a minimal
driver before moving on to complex subsystems with additional driver
code.

Calling the blk_mq_XXX() APIs from error work issued from the block
layer timeout handler facilitates testing of the blk-mq code in
combination with different error injection parameters for null_blk,
together with blk_should_fake_timeout(), as posted in the patch test
report.
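
To make the ordering of the calls quoted above easier to follow, here is
a rough userspace model of the sequence. It is only a sketch and not the
patch itself: every type and function name in it (model_queue,
model_error_work, and so on) is hypothetical, and it only assumes the
behaviour described in the patch comments, namely that cancelled
requests complete before the queue is unquiesced and that new I/O fails
with -ENODEV once the disk is deleted.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_TAGS 4

enum model_rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct model_queue {
	bool frozen;	/* freeze started: new submitters are held back */
	bool quiesced;	/* dispatch to the driver is paused */
	bool dead;	/* disk deleted: new I/O is rejected */
	enum model_rq_state tags[MODEL_NR_TAGS];
};

/* Stand-in for the per-request cancel callback: complete one stuck request. */
static bool model_cancel_request(struct model_queue *q, int tag)
{
	if (q->tags[tag] != RQ_IN_FLIGHT)
		return false;
	q->tags[tag] = RQ_COMPLETE;
	return true;
}

/* Follows the order of the quoted calls; returns the number cancelled. */
static int model_error_work(struct model_queue *q)
{
	int cancelled = 0;

	q->frozen = true;	/* step 1: start freezing the queue */
	q->quiesced = true;	/* step 2: stop dispatch */
	/* step 3 (sync) has no counterpart: the model has no timers. */
	for (int tag = 0; tag < MODEL_NR_TAGS; tag++)	/* step 4: cancel */
		if (model_cancel_request(q, tag))
			cancelled++;
	/* step 5: cancellation is synchronous here, so just verify it. */
	for (int tag = 0; tag < MODEL_NR_TAGS; tag++)
		assert(q->tags[tag] != RQ_IN_FLIGHT);
	q->quiesced = false;	/* step 6: unquiesce before destroying */
	return cancelled;
}

/* Models the effect the patch comment attributes to del_gendisk(). */
static void model_destroy_dev(struct model_queue *q)
{
	q->dead = true;
}

/* New I/O after teardown fails; blocking on a frozen queue is not modelled. */
static int model_submit(const struct model_queue *q)
{
	return q->dead ? -ENODEV : 0;
}
```

In this model, calling model_error_work() on a queue with two in-flight
tags completes both of them, and after model_destroy_dev() any further
model_submit() returns -ENODEV.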

-ck
