Message-ID: <Y7bxjKusa2L/TNRE@mit.edu>
Date:   Thu, 5 Jan 2023 10:49:32 -0500
From:   "Theodore Ts'o" <tytso@....edu>
To:     Sarthak Kukreti <sarthakkukreti@...omium.org>
Cc:     "Darrick J. Wong" <djwong@...nel.org>, sarthakkukreti@...gle.com,
        dm-devel@...hat.com, linux-block@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Alasdair Kergon <agk@...hat.com>,
        Mike Snitzer <snitzer@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Brian Foster <bfoster@...hat.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Bart Van Assche <bvanassche@...gle.com>,
        Daniil Lunev <dlunev@...gle.com>
Subject: Re: [PATCH v2 3/7] fs: Introduce FALLOC_FL_PROVISION

On Wed, Jan 04, 2023 at 01:22:06PM -0800, Sarthak Kukreti wrote:
> > How expensive is this expected to be?  Is this why you wanted a separate
> > mode flag?
>
> Yes, the exact latency will depend on the stacked block devices and
> the fragmentation at the allocation layers.
> 
> I did a quick benchmark of fallocate() with:
> A) ext4 filesystem mounted with 'noprovision'
> B) ext4 filesystem mounted with 'provision' on a dm-thin device.
> C) ext4 filesystem mounted with 'provision' on a loop device with a
> sparse backing file on the filesystem in (B).
> 
> I tested file sizes from 512M to 8G. The time taken for fallocate() in (A)
> remains flat, as expected, at ~0.01-0.02s, but for (B) it scales from
> 0.03s to 0.4s and for (C) it scales from 0.04s to 0.52s (I captured the exact
> time distribution in the cover letter:
> https://marc.info/?l=linux-ext4&m=167230113520636&w=2).
> 
> +0.5s for an 8G fallocate doesn't sound like a lot, but I think fragmentation
> and how the block device is layered can make this worse...
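A rough way to reproduce the shape of this measurement from userspace is to time a single preallocation call per file size. The sketch below is an assumption-laden illustration, not the benchmark from the cover letter: it uses Python's os.posix_fallocate (which on Linux normally reaches fallocate(2) with mode 0) on a scratch file, and it cannot pass FALLOC_FL_PROVISION, since Python does not expose fallocate mode flags. The helper name time_fallocate is hypothetical.

```python
import os
import tempfile
import time
from typing import Optional


def time_fallocate(size_bytes: int, directory: Optional[str] = None) -> float:
    """Preallocate size_bytes in a fresh temporary file under `directory`
    and return the wall-clock seconds spent in the allocation call."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        # On Linux, posix_fallocate() normally reaches fallocate(2) with
        # mode 0; it raises OSError (e.g. ENOSPC) if space is unavailable.
        os.posix_fallocate(fd, 0, size_bytes)
        return time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if allocation failed.
        os.close(fd)
        os.unlink(path)
```

For example, calling time_fallocate(512 * 1024 * 1024, "/mnt/test") for each size from 512M to 8G on the mount under test gives numbers comparable in spirit to setups (A) through (C); "/mnt/test" is a placeholder path.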

If userspace uses fallocate(2), there are generally two reasons.
Either they **really** don't want to get ENOSPC, in which case
noprovision will not give them what they want unless we modify their
source code to add this new FALLOC_FL_PROVISION flag --- which may not
be possible if it is provided in a binary-only format (for example,
proprietary databases shipped by companies beginning with the letters
'I' or 'O').
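The first use case boils down to reserving space up front so that an out-of-space failure shows up at allocation time rather than in the middle of later writes. A minimal sketch of that userspace pattern, assuming os.posix_fallocate as the preallocation call; the helper name reserve_or_fail is hypothetical. Whether the guarantee actually holds on a thin-provisioned device is exactly what the provision/noprovision semantics in this thread decide.

```python
import errno
import os


def reserve_or_fail(path: str, size_bytes: int) -> bool:
    """Try to reserve size_bytes for `path` before writing any data.

    Returns True if the space was reserved, False if the filesystem
    reported ENOSPC. Any other error is re-raised.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Mode-0 preallocation also extends the file size to size_bytes.
        os.posix_fallocate(fd, 0, size_bytes)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)
```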

Or, they really care about avoiding fragmentation by giving a hint to
the file system that layout is important, and so **please** allocate
the space right away so that it is more likely that the space will be
laid out in a contiguous fashion.  Of course, the moment you use
thin-provisioning, this goes out the window, since even if the space is
contiguous on the dm-thin layer, on the underlying storage layer it is
likely that things will be fragmented to a fare-thee-well, and either
(a) you have a vast amount of flash to try to mitigate the performance
hit of using thin-provisioning (for example, hardware thin-provisioning
such as EMC storage arrays), or (b) you really don't care about
performance since space savings is what you're going for.

So, because this changes the semantics of what fallocate(2) will
guarantee unless programs are modified to use this new FALLOC flag,
I really am not very fond of it.

I suspect that using a mount option is OK, as long as it defaults to
"provision": if you want to break user API expectations, it should
require a mount option so that the system administrator explicitly
OKs such a change.
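The semantics argued for here can be written down as a small decision model. The Python below is purely illustrative: the Mount enum, parse_mount_options and provisions_space are hypothetical names, and the behavior is inferred from this thread, not taken from the patch series. It captures the point that, with "provision" as the default, unmodified binaries calling plain fallocate(2) keep their no-ENOSPC guarantee, and only an explicit "noprovision" mount relaxes it for callers that do not pass FALLOC_FL_PROVISION.

```python
from enum import Enum


class Mount(Enum):
    PROVISION = "provision"      # proposed default: fallocate(2) fully honoured
    NOPROVISION = "noprovision"  # administrator explicitly opts out


def parse_mount_options(options: str) -> Mount:
    """Parse a comma-separated option string such as "noatime,noprovision".

    With no explicit choice, default to PROVISION; if both options
    appear, the last one wins.
    """
    mode = Mount.PROVISION
    for opt in options.split(","):
        opt = opt.strip()
        if opt == "noprovision":
            mode = Mount.NOPROVISION
        elif opt == "provision":
            mode = Mount.PROVISION
    return mode


def provisions_space(mount: Mount, caller_passed_provision_flag: bool) -> bool:
    """Return True if, in this hypothetical model, a fallocate() call also
    reserves space on the underlying thin-provisioned device."""
    if mount is Mount.PROVISION:
        return True
    # Under noprovision, only callers changed to pass the new flag keep
    # the "no ENOSPC later" guarantee.
    return caller_passed_provision_flag
```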

As far as the per-file mode --- I'm not convinced it's really
necessary.  In general, if you are using thin-provisioning, file systems
tend to be dedicated to one purpose, so adding the complexity
of doing it on a per-file basis is probably not really needed.  That
being said, your existing prototype requires searching for the
extended attribute on every single file allocation, which is not a
great idea.  On a system with SELinux enabled, every file will have an
xattr block, and requiring that it be searched on every file
allocation would be unfortunate.  It would be better to check for the
xattr when the file is opened and then set a flag in the struct
file.  However, it might be better to see if there is real demand
for such a feature before adding it.
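The suggested alternative to per-allocation xattr lookups is to pay the lookup cost once, at open time, and cache the answer in the open file. The Python model below uses a counter to make the difference visible; the attribute name "user.provision", the Inode and OpenFile classes and their methods are hypothetical stand-ins for the ext4 xattr search and a struct file flag, not kernel code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PROVISION_XATTR = "user.provision"  # hypothetical attribute name


@dataclass
class Inode:
    # Simulated xattr store; with SELinux enabled every inode carries at
    # least a security.selinux entry, so there is always something to search.
    xattrs: Dict[str, bytes] = field(default_factory=dict)
    lookups: int = 0  # how many times the xattr store has been searched

    def get_xattr(self, name: str) -> Optional[bytes]:
        self.lookups += 1
        return self.xattrs.get(name)


def allocate_uncached(inode: Inode, n_allocs: int) -> List[bool]:
    """Prototype-style: search the xattrs on every single allocation."""
    return [inode.get_xattr(PROVISION_XATTR) == b"1" for _ in range(n_allocs)]


@dataclass
class OpenFile:
    inode: Inode
    provision: bool = False  # cached flag, analogous to a bit in struct file

    @classmethod
    def from_inode(cls, inode: Inode) -> "OpenFile":
        # One xattr lookup at open time; later allocations read the flag.
        return cls(inode, inode.get_xattr(PROVISION_XATTR) == b"1")

    def allocate(self, n_allocs: int) -> List[bool]:
        return [self.provision for _ in range(n_allocs)]
```

With 100 allocations, the uncached path searches the xattrs 100 times, while the open-time path searches them once.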

						- Ted
