Message-ID: <CY4PR04MB375105E77F87B60E74E025BAE7280@CY4PR04MB3751.namprd04.prod.outlook.com>
Date:   Mon, 7 Sep 2020 12:53:20 +0000
From:   Damien Le Moal <Damien.LeMoal@....com>
To:     Kanchan Joshi <joshiiitr@...il.com>
CC:     Christoph Hellwig <hch@....de>,
        Kanchan Joshi <joshi.k@...sung.com>,
        Jens Axboe <axboe@...nel.dk>,
        "sagi@...mberg.me" <sagi@...mberg.me>,
        Johannes Thumshirn <Johannes.Thumshirn@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        Keith Busch <kbusch@...nel.org>,
        Selvakumar S <selvakuma.s1@...sung.com>,
        Javier Gonzalez <javier.gonz@...sung.com>,
        Nitesh Shetty <nj.shetty@...sung.com>
Subject: Re: [PATCH 1/2] nvme: set io-scheduler requirement for ZNS

On 2020/09/07 20:54, Kanchan Joshi wrote:
> On Mon, Sep 7, 2020 at 5:07 PM Damien Le Moal <Damien.LeMoal@....com> wrote:
>>
>> On 2020/09/07 20:24, Kanchan Joshi wrote:
>>> On Mon, Sep 7, 2020 at 1:52 PM Damien Le Moal <Damien.LeMoal@....com> wrote:
>>>>
>>>> On 2020/09/07 16:01, Kanchan Joshi wrote:
>>>>>> Even for SMR, the user is free to set the elevator to none, which disables zone
>>>>>> write locking. Issuing writes correctly then becomes the responsibility of the
>>>>>> application. This can be useful for settings that for instance use NCQ I/O
>>>>>> priorities, which give better results when "none" is used.
>>>>>
>>>>> Was it not a problem that even if the application is sending writes
>>>>> correctly, scheduler may not preserve the order.
>>>>> And even when none is being used, re-queue can happen which may lead
>>>>> to different ordering.
>>>>
>>>> "Issuing writes correctly" means doing small writes, one per zone at most. In
>>>> that case, it does not matter if the block layer reorders writes. Per zone, they
>>>> will still be sequential.
>>>>
>>>>>> As far as I know, zoned drives are always used in tightly controlled
>>>>>> environments. Problems like "does not know what other applications would be
>>>>>> doing" are non-existent. Setting up the drive correctly for the use case at hand
>>>>>> is a sysadmin/server setup problem, based on *the* application (singular)
>>>>>> requirements.
>>>>>
>>>>> Fine.
>>>>> But what about the null-block-zone which sets MQ-deadline but does not
>>>>> actually use write-lock to avoid race among multiple appends on a
>>>>> zone.
>>>>> Does that deserve a fix?
>>>>
>>>> In nullblk, commands are executed under a spinlock. So there is no concurrency
>>>> problem. The spinlock serializes the execution of all commands. null_blk zone
>>>> append emulation thus does not need to take the scheduler level zone write lock
>>>> like scsi does.
>>>
>>> I do not see spinlock for that. There is one "nullb->lock", but its
>>> scope is limited to memory-backed handling.
>>> For concurrent zone-appends on a zone, multiple threads may set the
>>> "same" write-pointer into incoming request(s).
>>> Are you referring to any other spinlock that can avoid "same wp being
>>> returned to multiple threads".
>>
>> Checking again, it looks like you are correct. nullb->lock is indeed only used
>> for processing read/write with memory backing turned on.
>> We either need to extend that spinlock use, or add one to protect the zone array
>> when doing zoned commands and checks of read/write against a zone wp.
>> Care to send a patch ? I can send one too.
> 
> Sure, I can send.
> Do you think it is not OK to use zone write-lock (same like SCSI
> emulation) instead of introducing a new spinlock?

The zone write lock will not protect against reads or zone management commands
executed concurrently with writes. Scheduler zone write locking serializes only
concurrent writes to the same zone, and it is not used at all if the user sets
the scheduler to none. A lock for exclusive access and changes to the zone
array is needed.
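
To illustrate the idea, here is a rough userspace sketch (not the null_blk
code, and not a proposed patch). It assumes one lock covering the whole zone
array, taken by zone append, by the read check against the wp, and by zone
reset alike. All names (zoned_dev, zone_append, zone_read_ok, zone_reset), the
zone geometry, and the C11 atomic_flag spinlock standing in for a kernel
spinlock are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ZONES  4u
#define ZONE_SIZE 1024u	/* sectors per zone, arbitrary for this sketch */

struct zone {
	uint64_t start;	/* first sector of the zone */
	uint64_t wp;	/* write pointer: next sector to be written */
};

struct zoned_dev {
	atomic_flag zone_lock;	/* hypothetical lock protecting zones[] */
	struct zone zones[NR_ZONES];
};

/* Minimal spinlock; a stand-in for spin_lock()/spin_unlock(). */
static void zone_lock(struct zoned_dev *dev)
{
	while (atomic_flag_test_and_set_explicit(&dev->zone_lock,
						 memory_order_acquire))
		;
}

static void zone_unlock(struct zoned_dev *dev)
{
	atomic_flag_clear_explicit(&dev->zone_lock, memory_order_release);
}

void zoned_dev_init(struct zoned_dev *dev)
{
	atomic_flag_clear(&dev->zone_lock);
	for (unsigned int i = 0; i < NR_ZONES; i++) {
		dev->zones[i].start = (uint64_t)i * ZONE_SIZE;
		dev->zones[i].wp = dev->zones[i].start;
	}
}

/*
 * Zone append: read the wp and advance it under the lock, so two
 * concurrent appends can never be handed the same sector.
 * Returns 0 and sets *sector on success, -1 if the data does not fit.
 */
int zone_append(struct zoned_dev *dev, unsigned int zno,
		uint64_t nr_sectors, uint64_t *sector)
{
	int ret = -1;

	if (zno >= NR_ZONES)
		return -1;

	zone_lock(dev);
	struct zone *z = &dev->zones[zno];
	if (z->wp + nr_sectors <= z->start + ZONE_SIZE) {
		*sector = z->wp;
		z->wp += nr_sectors;
		ret = 0;
	}
	zone_unlock(dev);
	return ret;
}

/* Read check against the wp, done under the same lock as the writes. */
bool zone_read_ok(struct zoned_dev *dev, uint64_t sector, uint64_t nr_sectors)
{
	uint64_t zno = sector / ZONE_SIZE;
	bool ok;

	if (zno >= NR_ZONES)
		return false;

	zone_lock(dev);
	ok = sector + nr_sectors <= dev->zones[zno].wp;
	zone_unlock(dev);
	return ok;
}

/* Zone management (reset) also takes the lock before touching the wp. */
void zone_reset(struct zoned_dev *dev, unsigned int zno)
{
	if (zno >= NR_ZONES)
		return;

	zone_lock(dev);
	dev->zones[zno].wp = dev->zones[zno].start;
	zone_unlock(dev);
}
```

Since every path goes through the same lock, this does not depend on which
scheduler is set, which is the difference with the scheduler zone write lock.
A single device-wide lock is the simplest choice for a sketch; per-zone locks
would be another option.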


-- 
Damien Le Moal
Western Digital Research
