linux-kernel - Re: [PATCH] null_blk: allow teardown on request timeout

Message-ID: <Y00riC6UxmLDhI5P@T590>
Date:   Mon, 17 Oct 2022 18:16:40 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Chaitanya Kulkarni <chaitanyak@...dia.com>
Cc:     "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "damien.lemoal@...nsource.wdc.com" <damien.lemoal@...nsource.wdc.com>,
        "johannes.thumshirn@....com" <johannes.thumshirn@....com>,
        "bvanassche@....org" <bvanassche@....org>,
        "shinichiro.kawasaki@....com" <shinichiro.kawasaki@....com>,
        "vincent.fu@...sung.com" <vincent.fu@...sung.com>,
        "yukuai3@...wei.com" <yukuai3@...wei.com>
Subject: Re: [PATCH] null_blk: allow teardown on request timeout

On Mon, Oct 17, 2022 at 10:04:26AM +0000, Chaitanya Kulkarni wrote:
> On 10/17/22 02:50, Ming Lei wrote:
> > On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote:
> >>
> >>>> +	/*
> >>>> +	 * Unblock any pending dispatch I/Os before we destroy the device.
> >>>> +	 * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag
> >>>> +	 * causing any new I/O from __bio_queue_enter() to fail with -ENODEV.
> >>>> +	 */
> >>>> +	blk_mq_unquiesce_queue(nullb->q);
> >>>> +
> >>>> +	null_destroy_dev(nullb);
> >>>
> >>> destroying the device is never a good cleanup for handling a timeout/abort;
> >>> it should only ever be the last resort.
> >>>
> >>
> >> That is exactly why I've added rq_abort_limit: until the limit
> >> is reached, null_abort_work() will not get scheduled and the device is
> >> not destroyed.
> > 
> > I meant that destroying the device should only be done iff the normal abort
> > handler can't recover the device; however, your patch simply destroys the
> > device without running any abort handling.
> > 
> 
> I did not understand your comment. Can you please elaborate on exactly
> where, and which, abort handlers need to be called in this patch before
> null_destroy_nullb()?

On a request timeout, something may have gone wrong that needs to be
recovered first.
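To make the escalation order concrete, here is a minimal userspace sketch
in C. It is not null_blk code: struct fake_dev, handle_timeout(),
teardown_limit and the recover_* callbacks are hypothetical names, and the
enum is only loosely modelled on the kernel's enum blk_eh_timer_return
(BLK_EH_DONE / BLK_EH_RESET_TIMER). The point is the ordering: recovery is
always attempted first, and teardown is chosen only after recovery has
failed teardown_limit times in a row.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, loosely modelled on enum blk_eh_timer_return. */
enum timeout_action {
	ACTION_RESET_TIMER, /* recovery failed, limit not hit: retry later */
	ACTION_ABORTED,     /* normal abort/recovery worked: keep the device */
	ACTION_TEARDOWN,    /* recovery failed teardown_limit times: last resort */
};

struct fake_dev {
	unsigned int failed_recoveries; /* consecutive failed recovery attempts */
	unsigned int teardown_limit;    /* hypothetical, similar in spirit to rq_abort_limit */
	bool (*try_recover)(struct fake_dev *dev);
};

/* Toy recovery callbacks used to exercise the policy. */
static bool recover_always_fails(struct fake_dev *dev)
{
	(void)dev;
	return false;
}

static bool recover_always_succeeds(struct fake_dev *dev)
{
	(void)dev;
	return true;
}

static enum timeout_action handle_timeout(struct fake_dev *dev)
{
	assert(dev->teardown_limit > 0);

	/* Step 1: always run the normal abort/recovery path first. */
	if (dev->try_recover(dev)) {
		dev->failed_recoveries = 0;
		return ACTION_ABORTED;
	}

	/* Step 2: escalate to teardown only after repeated failures. */
	if (++dev->failed_recoveries >= dev->teardown_limit)
		return ACTION_TEARDOWN;

	return ACTION_RESET_TIMER;
}
```

With teardown_limit = 3 and a device whose recovery keeps failing, the first
two timeouts return ACTION_RESET_TIMER and the third returns ACTION_TEARDOWN.
A successful recovery returns ACTION_ABORTED and resets the counter, so a
device that still works is never removed.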

> 
> The objective of this patch is to simulate the teardown scenario
> from the timeout handler so it can be tested on a regular basis with
> null_blk ...

Why does the teardown scenario have to be triggered by a timeout? It looks
like you think tearing down and destroying the device on timeout is a normal
and common approach, but I don't think it is: the device shouldn't be removed
if it can still work. I have received complaints about disks disappearing
just because of a request timeout, for example with nvme-pci.


thanks,
Ming