Message-ID: <54b5e83b-4b08-4c5e-afec-1c672561fa81@gmail.com>
Date: Tue, 16 Jan 2024 13:14:05 -0600
From: stuart hayes <stuart.w.hayes@...il.com>
To: Max Gurtovoy <mgurtovoy@...dia.com>, Keith Busch <kbusch@...nel.org>
Cc: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
 Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 linux-nvme@...ts.infradead.org
Subject: Re: [PATCH] nvme_core: scan namespaces asynchronously



On 1/12/2024 1:36 PM, stuart hayes wrote:
> 
>>
>>
>> On 04/01/2024 18:47, Keith Busch wrote:
>>> On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
>>>> Currently NVMe namespaces are scanned serially, so it can take a long time
>>>> for all of a controller's namespaces to become available, especially with a
>>>> slower (fabrics) interface with a large number (~1000) of namespaces.
>>>>
>>>> Use async function calls to make namespace scanning happen in parallel,
>>>> and add a (boolean) module parameter "async_ns_scan" to enable this.
>>>
>>> Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
>>> so I'm a little surprised it makes a noticeable difference. How much time
>>> improvement are you observing by parallelizing the scan? Is there a
>>> tipping point in Number of Namespaces where inline scanning is better
>>> than asynchronous? And if it is a meaningful gain, let's not introduce
>>> another module parameter to disable it.
>>
>> I don't think it is a good idea, since some of the namespace characteristics must be validated at re-connection time, for example.
>> I actually prepared a patch that makes sure we sync the ns scanning before kicking the ns blk queue to avoid such situations.
>> For example, if for some reason ns1 changes its uuid, then we must remove it and open a new bdev instead. We can't kick old requests to it...
>>
> 
> 
> Sorry for the delayed response--I thought I could get exact data on how long it takes with and
> without the patch before I responded, but it is taking a while (I'm having to rely on someone else
> to do the testing).  I'll respond with the data as soon as I get it--hopefully it won't be too
> much longer.  The time it takes to scan namespaces adds up when there are 1000 namespaces and
> you have a fabrics controller on a network that isn't too fast.
> 
> I don't expect there would be any reason to disable this.  I only put the module parameter to
> disable it in case there was some unforeseen issue, but I can remove that.
> 
> To Max Gurtovoy--this patch wouldn't change when or how namespaces are validated... it just
> puts the actual scan work function on a workqueue so the scans can happen in parallel.  It will
> do the same work to scan, at the same point, and it will wait for all the scanning to finish
> before proceeding.  I don't understand how this patch would make the situation you mention any
> worse.
> 
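
The pattern described above (hand each scan to a worker, then wait for every scan to finish before proceeding) can be illustrated with a small user-space sketch. This is not the patch and not kernel code: scan_namespace(), the thread pool, and the simulated latency are all assumptions made purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_namespace(nsid, latency_s):
    """Hypothetical stand-in for one namespace scan: a single simulated round trip."""
    time.sleep(latency_s)
    return nsid


def scan_serial(nsids, latency_s):
    # One scan at a time: total time grows as len(nsids) * latency_s.
    return [scan_namespace(n, latency_s) for n in nsids]


def scan_parallel(nsids, latency_s, workers=32):
    # Same per-namespace work, but the round trips overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() collects every result, so all scans are done before we return.
        return list(pool.map(lambda n: scan_namespace(n, latency_s), nsids))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    nsids = list(range(1, 33))
    serial, t_serial = timed(scan_serial, nsids, 0.01)
    parallel, t_parallel = timed(scan_parallel, nsids, 0.01)
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s, same result: {serial == parallel}")
```

Both functions return the same list in the same order; only the elapsed time differs, which is the point being made about the patch.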

I have numbers for the namespace scan time improvement.  Below is the amount of time it took for
all of the namespaces to show up when connecting to a controller with 1002 namespaces:

network latency   time without patch   time with patch
        0                  6s                 1s
       50                210s                10s
      100                417s                18s
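
For anyone who wants the ratios, here is a short sketch that derives the speedup and the average serial time per namespace from the figures above. The values are copied from the table; the helper names are made up for this example, and the latency column is carried through without a unit because none is stated.

```python
# (network latency, seconds without patch, seconds with patch), from the table above
RESULTS = [(0, 6, 1), (50, 210, 10), (100, 417, 18)]
NAMESPACES = 1002  # namespace count reported for the test controller


def speedup(without_s, with_s):
    """How many times faster the scan completes with the patch."""
    return without_s / with_s


def per_namespace_ms(total_s, namespaces=NAMESPACES):
    """Average wall-clock time per namespace, in milliseconds."""
    return total_s * 1000 / namespaces


for latency, without_s, with_s in RESULTS:
    print(f"latency {latency:>3}: {speedup(without_s, with_s):.1f}x faster, "
          f"{per_namespace_ms(without_s):.0f} ms per namespace without the patch")
```

That works out to roughly 6x, 21x, and 23x for the three rows.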

I'll prepare a v2, removing the module parameter and including this data.
