linux-kernel - Re: [PATCH 1/6] fs: add hole punching to fallocate

Message-ID: <AANLkTimwmJ_ZoE9oAuA1WGhCgK585jDznqnc6k0=9Ntb@mail.gmail.com>
Date:	Tue, 11 Jan 2011 16:13:42 -0500
From:	Lawrence Greenfield <leg@...gle.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	"Ted Ts'o" <tytso@....edu>, Josef Bacik <josef@...hat.com>,
	linux-kernel@...r.kernel.org, linux-btrfs@...r.kernel.org,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	xfs@....sgi.com, joel.becker@...cle.com, cmm@...ibm.com,
	cluster-devel@...hat.com
Subject: Re: [PATCH 1/6] fs: add hole punching to fallocate

On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner <david@...morbit.com> wrote:
> On Tue, Nov 09, 2010 at 04:41:47PM -0500, Ted Ts'o wrote:
>> On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote:
>> > Implementation is up to the filesystem. However, XFS does (b)
>> > because:
>> >
>> >     1) it was extremely simple to implement (one of the
>> >        advantages of having an exceedingly complex allocation
>> >        interface to begin with :P)
>> >     2) conversion is atomic, fast and reliable
>> >     3) it is independent of the underlying storage; and
>> >     4) reads of unwritten extents operate at memory speed,
>> >        not disk speed.
>>
>> Yeah, I was thinking that using a device-style TRIM might be better
>> since future attempts to write to it won't require a separate seek to
>> modify the extent tree.  But yeah, there are a bunch of advantages of
>> simply mutating the extent tree.
>>
>> While we're on the subject of changes to fallocate, what do people
>> think of FALLOC_FL_EXPOSE_OLD_DATA, which requires either root
>> privileges or (if capabilities are in use) CAP_DAC_OVERRIDE &&
>> CAP_MAC_OVERRIDE && CAP_SYS_ADMIN.  This would allow a trusted process
>> to fallocate blocks with the extent already marked initialized.  I've
>> had two requests for such functionality for ext4 already.
>
> We removed that ability from XFS about three years ago because it's
> a massive security hole. e.g. what happens if the file is world
> readable, even though the process that called
> FALLOC_FL_EXPOSE_OLD_DATA was privileged and was allowed to expose
> such data? Or the file is chmod 777 after being exposed?
>
> The historical reason for such behaviour existing in XFS was that in
> 1997 the CPU and IO latency cost of unwritten extent conversion was
> significant, so users with real physical security (i.e. marines with
> guns) were able to make use of fast preallocation with no conversion
> overhead without caring about the security implications. These days,
> the performance overhead of unwritten extent conversion is minimal -
> I generally can't measure a difference in IO performance as a result
> of it - so there is simply no good reason for leaving such a gaping
> security hole in the system.
>
> If anyone wants to read the underlying data, then use fiemap to map
> the physical blocks and read it directly from the block device. That
> requires root privileges but does not open any new stale data
> exposure problems....
>
>> (Take for example a trusted cluster filesystem backend that checks the
>> object checksum before returning any data to the user; and if the
>> check fails the cluster file system will try to use some other replica
>> stored on some other server.)
>
> IOWs, all they want to do is avoid the unwritten extent conversion
> overhead. Time has shown that a bad security/performance tradeoff
> decision was made 13 years ago in XFS, so I see little reason to
> repeat it for ext4 today....

I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. The problem isn't the CPU
overhead of extent conversion. It's that extent conversion causes more
metadata operations than you'd otherwise have, which means systems that
want to use O_DIRECT and make sure the data doesn't go away have to
either write with O_DIRECT|O_DSYNC or call fdatasync().
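
To make the two options concrete, here is a minimal sketch of the write
path in question. It is an illustration only, not code from any real
cluster file system: the helper name prealloc_and_write(), the 4096-byte
BLOCK alignment, and the 16-block preallocation size are all assumptions
made up for the example. The caller picks between passing O_DSYNC in the
open flags and asking for an explicit fdatasync() after the write.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for fallocate() and O_DIRECT on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096              /* assumed alignment, device dependent */

/*
 * Hypothetical helper: preallocate space, then write one aligned block
 * into the preallocated range.
 *
 *   extra_flags    - e.g. O_DIRECT, O_DIRECT | O_DSYNC, or 0
 *   call_fdatasync - nonzero to call fdatasync() after the write
 *
 * Returns 0 on success, -1 on failure.
 */
int prealloc_and_write(const char *path, int extra_flags, int call_fdatasync)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_WRONLY | extra_flags, 0600);
    if (fd < 0)
        return -1;

    int rc = -1;
    void *buf = NULL;

    /* Mode 0 preallocates blocks. Filesystems without fallocate support
     * return EOPNOTSUPP; the sketch then continues without preallocation. */
    if (fallocate(fd, 0, 0, 16 * BLOCK) != 0 && errno != EOPNOTSUPP)
        goto out;

    /* O_DIRECT generally needs an aligned buffer, offset and length. */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 'x', BLOCK);

    if (pwrite(fd, buf, BLOCK, 0) != (ssize_t)BLOCK)
        goto out;

    /* Without O_DSYNC, an explicit fdatasync() is the other way to make
     * sure the data and the metadata needed to read it back are stable. */
    if (call_fdatasync && fdatasync(fd) != 0)
        goto out;

    rc = 0;
out:
    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling prealloc_and_write(path, O_DIRECT | O_DSYNC, 0) corresponds to
the first option and prealloc_and_write(path, O_DIRECT, 1) to the second.
Note that some filesystems reject O_DIRECT at open() time, in which case
the helper simply returns -1.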

cluster file system implementor,
Larry

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>