Message-ID: <yq11rofghlq.fsf@oracle.com>
Date:   Wed, 22 Apr 2020 20:40:01 -0400
From:   "Martin K. Petersen" <martin.petersen@...cle.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     "Martin K. Petersen" <martin.petersen@...cle.com>,
        Chaitanya Kulkarni <chaitanya.kulkarni@....com>, hch@....de,
        darrick.wong@...cle.com, axboe@...nel.dk, tytso@....edu,
        adilger.kernel@...ger.ca, ming.lei@...hat.com, jthumshirn@...e.de,
        minwoo.im.dev@...il.com, damien.lemoal@....com,
        andrea.parri@...rulasolutions.com, hare@...e.com, tj@...nel.org,
        hannes@...xchg.org, khlebnikov@...dex-team.ru, ajay.joshi@....com,
        bvanassche@....org, arnd@...db.de, houtao1@...wei.com,
        asml.silence@...il.com, linux-block@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE


Dave,

>> Not before overwriting, no. Once you have allocated an LBA it remains
>> allocated until you discard it.

> Ok, so you are confirming what I thought: it's almost completely
> useless to us.
>
> i.e. this requires issuing IO to "reserve" space whilst preserving
> data before every metadata object goes from clean to dirty in memory.

You can only reserve the space prior to writing a block for the first
time. Once an LBA has been written ("Mapped" in the SCSI state machine),
it remains allocated until it is explicitly deallocated (via a
discard/Unmap operation).
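
As a rough illustration of the rule above, here is a minimal sketch of
the per-LBA provisioning state. It is a toy model, not taken from the
spec: the names ThinLun, write, unmap and needs_allocation are
hypothetical, and it deliberately ignores the anchored state and
anything dedup-related.

```python
from enum import Enum


class LbaState(Enum):
    DEALLOCATED = "deallocated"  # no backing space assigned
    MAPPED = "mapped"            # written at least once, space assigned


class ThinLun:
    """Toy thin-provisioned LUN tracking one state per LBA (hypothetical)."""

    def __init__(self, nr_lbas):
        self.state = [LbaState.DEALLOCATED] * nr_lbas

    def write(self, lba):
        # First write maps the LBA; later writes leave it mapped.
        self.state[lba] = LbaState.MAPPED

    def unmap(self, lba):
        # Only an explicit discard/Unmap returns the LBA to deallocated.
        self.state[lba] = LbaState.DEALLOCATED

    def needs_allocation(self, lba):
        # Space can only be reserved for an LBA that is not yet mapped.
        return self.state[lba] is LbaState.DEALLOCATED
```

In this model an overwrite never changes the state, which is exactly
the assumption the allocation semantics were built on.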

This part of the SCSI spec was written eons ago under the assumption
that when a physical resource backing a given LBA had been established,
you could write the block over and over without having to allocate new
space.

This used to be true, but obviously the introduction of de-duplication
blew a major hole in that. I have been perusing the spec over and over
trying to understand how block provisioning state transitions are
defined when dedup is in the picture. However, much is left unexplained.
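
To make the problem concrete, here is a minimal sketch of a
content-deduplicating store. It is an assumption-laden toy (the class
DedupStore and its methods are hypothetical, and real arrays are far
more involved), but it shows how overwriting an already written LBA
can still consume new physical space.

```python
class DedupStore:
    """Toy dedup store: identical data shares one physical block."""

    def __init__(self):
        self.lba_to_data = {}  # LBA -> data currently stored there
        self.refcount = {}     # data -> number of LBAs referencing it

    def physical_blocks(self):
        # One physical block per distinct piece of data.
        return len(self.refcount)

    def write(self, lba, data):
        old = self.lba_to_data.get(lba)
        if old == data:
            return  # same content, nothing changes
        if old is not None:
            # Drop this LBA's reference to the old physical block.
            self.refcount[old] -= 1
            if self.refcount[old] == 0:
                del self.refcount[old]
        # New content may need a physical block that did not exist before.
        self.refcount[data] = self.refcount.get(data, 0) + 1
        self.lba_to_data[lba] = data


store = DedupStore()
store.write(0, b"A")
store.write(1, b"A")         # deduplicated against LBA 0
before = store.physical_blocks()
store.write(1, b"B")         # overwrite of an already written LBA
after = store.physical_blocks()
```

Here LBA 1 was already written, yet the overwrite grows physical usage
because the block it shared with LBA 0 cannot be modified in place.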

As a result, I reached out to various folks, including the people who
worked on this feature in the standards way back. The response I got
from them is that the allocation operation got irreparably broken when
support for de-duplication was added to the spec. Nobody attempted to
fix the state transitions since most vendors only cared about
deallocation. Consequently, specifying the exact behavior of the
allocation operation in the context of dedup fell by the wayside.

The recommendation I got was that we should not rely on this feature
despite it being advertised as supported by the storage. I looked at
whether it was feasible to support it on non-dedup devices only, but it
does not look like it's worthwhile to pursue. As a result, there is
no need for the block layer allocation operation to have parity with
SCSI, although we may want to keep NVMe in mind when defining the
semantics.

-- 
Martin K. Petersen	Oracle Linux Engineering
