Message-ID: <021b5195-9a09-4cc2-064f-940ada9cf764@deltatee.com>
Date:   Wed, 24 Jul 2019 13:12:03 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Sagi Grimberg <sagi@...mberg.me>, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org
Cc:     Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] nvme-core: Fix deadlock when deleting the ctrl while
 scanning

Hey,

Sorry for the delay.

I tested your patch and it does work. Do you want me to send your change
as a full patch? Can I add your signed-off-by?

On 2019-07-18 6:50 p.m., Sagi Grimberg wrote:
>> I didn't think the scan_lock was that contested or that
>> nvme_change_ctrl_state() was really called that often...
> 
> it shouldn't be, but I think it makes the flow more convoluted
> as we serialize by flushing the scan_work right after...

I would argue that the check for state in nvme_scan_work() without a
lock is racy and confusing. There's nothing to prevent the state from
changing immediately after the check.
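A rough userspace sketch of the race-free shape being argued for here: the state check in the scan path and every state change take the same lock, so the state cannot change between the check and the scan. This is a hypothetical pthread model, not nvme-core code. `struct ctrl`, `ctrl_change_state()`, `scan_work()` and the `scanned` counter are illustrative names. In the real driver the state lives in `ctrl->state` and is changed by nvme_change_ctrl_state().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the locking pattern; not the nvme-core API. */
enum ctrl_state { CTRL_LIVE, CTRL_DELETING };

struct ctrl {
	pthread_mutex_t scan_lock;	/* serializes scanning and state changes */
	enum ctrl_state state;
	int scanned;			/* stands in for namespaces added by a scan */
};

static void ctrl_init(struct ctrl *c)
{
	pthread_mutex_init(&c->scan_lock, NULL);
	c->state = CTRL_LIVE;
	c->scanned = 0;
}

/* State changes take scan_lock too, so a scan holding the lock
 * sees a state that cannot change underneath it. */
static void ctrl_change_state(struct ctrl *c, enum ctrl_state new_state)
{
	pthread_mutex_lock(&c->scan_lock);
	c->state = new_state;
	pthread_mutex_unlock(&c->scan_lock);
}

/* Returns true if a scan was performed. */
static bool scan_work(struct ctrl *c)
{
	pthread_mutex_lock(&c->scan_lock);
	if (c->state != CTRL_LIVE) {
		pthread_mutex_unlock(&c->scan_lock);	/* unlock before the early return */
		return false;
	}
	/* Cannot change while scan_lock is held. */
	assert(c->state == CTRL_LIVE);
	c->scanned++;
	pthread_mutex_unlock(&c->scan_lock);
	return true;
}
```

With an unlocked check, a state change could land right after `c->state != CTRL_LIVE` is evaluated and the scan would proceed on a controller that is already being deleted. Holding the lock across both the check and the work closes that window.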

> The design principle is met as we do get the I/O failing,
> but it's just that with mpath we simply queue the I/O again
> because the head->list happens to not be empty.
> Perhaps taking care of that check is cleaner.

Yes, I feel your patch is a good solution on its own merits.

> Thanks. Do you have a firm reproducer for it?

Yes. If you connect to and then immediately disconnect from a target (at
least with nvme-loop) you will reliably trigger this bug -- or one of
the others I've sent patches for.

>>>> +    mutex_lock(&ctrl->scan_lock);
>>>> +
>>>>        if (ctrl->state != NVME_CTRL_LIVE)
>>>>            return;
>>>
>>> unlock
>>
>> If we unlock here and relock below, we'd have to recheck the ctrl->state
>> to avoid any races. If you don't want to call nvme_identify_ctrl with
>> the lock held, then it would probably be better to move the state check
>> below it.
> 
> Meant before the return statement.

Ah, right, my mistake.
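For comparison, a rough sketch of the alternative mentioned above: run the slow identify step without the lock, then take the lock and check the state after it, so the check cannot go stale before the scan runs. This is a hypothetical pthread model, not nvme-core code. `identify_ctrl()` stands in for nvme_identify_ctrl(), and its return value of 4 "namespaces" is invented purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not the nvme-core API. */
enum ctrl_state { CTRL_LIVE, CTRL_DELETING };

struct ctrl {
	pthread_mutex_t scan_lock;
	enum ctrl_state state;
	int nr_ns;	/* result published by a successful scan */
};

static void ctrl_init(struct ctrl *c)
{
	pthread_mutex_init(&c->scan_lock, NULL);
	c->state = CTRL_LIVE;
	c->nr_ns = 0;
}

/* Stand-in for a slow admin command; touches no shared state. */
static int identify_ctrl(void)
{
	return 4;	/* hypothetical namespace count */
}

static bool scan_work(struct ctrl *c)
{
	int nr_ns = identify_ctrl();	/* slow step, lock not held */

	pthread_mutex_lock(&c->scan_lock);
	/* Check the state only once the lock is held. */
	if (c->state != CTRL_LIVE) {
		pthread_mutex_unlock(&c->scan_lock);	/* unlock before returning */
		return false;
	}
	assert(c->state == CTRL_LIVE);
	c->nr_ns = nr_ns;
	pthread_mutex_unlock(&c->scan_lock);
	return true;
}
```

The trade-off is that identify may be issued for a controller that is already going away; its result is then simply discarded once the locked check fails.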

Logan
