Message-ID: <20121208005042.GQ27172@dastard>
Date:	Sat, 8 Dec 2012 11:50:42 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Howard Chu <hyc@...as.com>
Cc:	Ric Wheeler <rwheeler@...hat.com>, Theodore Ts'o <tytso@....edu>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Christoph Hellwig <hch@...radead.org>,
	Martin Steigerwald <Martin@...htvoll.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate
 UAPI

On Fri, Dec 07, 2012 at 03:25:53PM -0800, Howard Chu wrote:
> Ric Wheeler wrote:
> >On 12/07/2012 04:14 PM, Theodore Ts'o wrote:
> >>On Fri, Dec 07, 2012 at 02:30:19PM -0500, Steven Rostedt wrote:
> >>>How is this similar? By adding this bit, we removed incentive from a
> >>>group of developers that have the means to fix the real issue at hand
> >>>(the performance problem with ext4). Thus, it means that they have a
> >>>workaround that's good enough for them, but the rest of us suffer.
> >>That assumes that there **is** a way to claw back the performance
> >>loss, and Chris Mason has demonstrated the performance hit exists with
> >>xfs as well (950 MB/s vs. 400 MB/s; that's more than a factor of two).
> >>Sometimes, you have to make the engineering tradeoffs.  That's why
> >>we're engineers, for goodness' sake.  Sometimes, it's just not
> >>possible to square the circle.
> >>
> >>I don't believe that the technique of forcing people who need that
> >>performance to suffer in order to induce them to try to engineer a
> >>solution which may or may not exist is really the best or fairest way
> >>to go about things.
> >>
> >>					- Ted
> >
> >This is not a generally useful feature and won't ship in a way that helps most
> >users with this issue.
> >
> >Let's fix the problem properly.
> >
> >In the meantime, there are several obvious ways to avoid this performance hit
> >without changing the kernel (fully allocate and write the data, which is
> >certainly feasible even for reasonably sized files).
> 
> I have to agree that, if this is going to be an ext4-specific
> feature, then it can just be implemented via an ext4-specific ioctl
> and be done with it. But I'm not convinced this should be an
> ext4-specific feature.
> 
> As for "fix the problem properly" - you're fixing the wrong problem.
> This type of feature is important to me, not just because of the
> performance issue. As has already been pointed out, the performance
> difference may even be negligible.
> 
> But on SSDs, the issue is write endurance. The whole point of
> preallocating a file is to avoid doing incremental metadata updates.
> Particularly when each of those 1-bit status updates costs entire
> blocks, and gratuitously shortens the life of the media. The fact
> that avoiding the unnecessary wear and tear may also yield a
> performance boost is just icing on the cake. (And if the perf boost
> is over a factor of 2:1 that's some pretty damn good icing.)

That's a filesystem-implementation-specific problem, not a generic
fallocate() or unwritten extent conversion problem.

Besides, ext4 doesn't write back every metadata modification that is
made - modifications are aggregated in memory and only written when
the journal is full or the metadata ages out. Hence unwritten extent
conversion has very little impact on the number of writes that reach
the flash, because the total is vastly dominated by the data writes.

Similarly, in XFS you might see a few thousand or tens of thousands
of metadata blocks get written once every 30s under such a random
write workload, but each metadata block might have gone through a
million changes in memory since the last time it was written.
Indeed, in that 30s, there would have been a few million random data
writes so the metadata writes are well and truly lost in the
noise...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
