lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 08 Apr 2020 00:10:12 -0400
From:   "Martin K. Petersen" <martin.petersen@...cle.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     "Martin K. Petersen" <martin.petersen@...cle.com>,
        Chaitanya Kulkarni <chaitanya.kulkarni@....com>, hch@....de,
        darrick.wong@...cle.com, axboe@...nel.dk, tytso@....edu,
        adilger.kernel@...ger.ca, ming.lei@...hat.com, jthumshirn@...e.de,
        minwoo.im.dev@...il.com, damien.lemoal@....com,
        andrea.parri@...rulasolutions.com, hare@...e.com, tj@...nel.org,
        hannes@...xchg.org, khlebnikov@...dex-team.ru, ajay.joshi@....com,
        bvanassche@....org, arnd@...db.de, houtao1@...wei.com,
        asml.silence@...il.com, linux-block@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE


Hi Dave!

>> In the standards space, the allocation concept was mainly aimed at
>> protecting filesystem internals against out-of-space conditions on
>> devices that dedup identical blocks and where simply zeroing the blocks
>> therefore is ineffective.

> Um, so we're supposed to use space allocation before overwriting
> existing metadata in the filesystem?

Not before overwriting, no. Once you have allocated an LBA it remains
allocated until you discard it.

> So that the underlying storage can reserve space for it before we
> write it? Which would mean we have to issue a space allocation before
> we dirty the metadata, which means before we dirty any metadata in a
> transaction. Which means we'll basically have to redesign the
> filesystems from the ground up, yes?

My understanding is that this facility was aimed at filesystems that do
not dynamically allocate metadata. The intent was that mkfs would
preallocate the metadata LBA ranges, not the filesystem. For filesystems
that allocate metadata dynamically, then yes, an additional step is
required if you want to pin the LBAs.

> You might be talking about filesystem metadata and block devices,
> but this patchset ends up connecting ext4's user data fallocate() to
> the block device, thereby allowing users to reserve space directly
> in the underlying block device and directly exposing this issue to
> userspace.

I missed that Chaitanya's repost of this series included the ext4 patch.
Sorry!

>> How XFS decides to enforce space allocation policy and potentially
>> leverage this plumbing is entirely up to you.
>
> Do I understand this correctly? i.e. that it is the filesystem's
> responsibility to prevent users from preallocating more space than
> exists in an underlying storage pool that has been intentionally
> hidden from the filesystem so it can be underprovisioned?

No. But as an administrative policy it is useful to prevent runaway
applications from writing a petabyte of random garbage to media. My
point was that it is up to you and the other filesystem developers to
decide how you want to leverage the low-level allocation capability and
how you want to provide it to processes. And whether CAP_SYS_ADMIN,
ulimit, or something else is the appropriate policy interface for this.

In terms of thin provisioning and space management there are various
thresholds that may be reported by the device. In past discussions there
haven't been much interest in getting these exposed. It is also unclear
to me whether it is actually beneficial to send low space warnings to
hundreds or thousands of hosts attached to an array. In many cases the
individual server admins are not even the right audience. The most
common notification mechanism is a message to the storage array admin
saying "click here to buy more disk".

If you feel there is merit in having the kernel emit the threshold
warnings you could use as a feedback mechanism, I can absolutely look
into that.

-- 
Martin K. Petersen	Oracle Linux Engineering

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ