Date:	Wed, 21 Apr 2010 17:56:47 -0400
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Greg Freemyer <greg.freemyer@...il.com>
Cc:	Ric Wheeler <rwheeler@...hat.com>, sandeen@...hat.com,
	Eric Sandeen <esandeen@...hat.com>,
	Jeff Moyer <jmoyer@...hat.com>,
	Mark Lord <kernel@...savvy.com>,
	Lukas Czerner <lczerner@...hat.com>,
	linux-ext4@...r.kernel.org, Edward Shishkin <eshishki@...hat.com>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.

On Wed, 2010-04-21 at 17:47 -0400, Greg Freemyer wrote:
> Adding James Bottomley because high-end scsi is entering the
> discussion.  James, I have a couple scsi questions for you at the end.
> 
> On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@...hat.com> wrote:
> > On 04/21/2010 05:01 PM, Eric Sandeen wrote:
> >>
> >> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
> >>
> >>
> >>>
> >>> Mark's benchmarks showed this as doable in seconds, which seems like a
> >>> reasonable amount of time for a mount-time operation.
> >>>
> >>
> >> All the other things aside, mount-time is interesting, but it's an
> >> infrequent operation, at least in my world.  I think we need something
> >> that can be done at runtime.
> >>
> >> For anything with uptime, I don't think it's acceptable to wait until
> >> the next mount to trim unused blocks.

So what's wrong with using wiper.sh?  It can do online discard of
filesystems that support delayed allocation (ext4, xfs, etc.).
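
(For reference, a minimal sketch only: the block layer already exposes a
single-range discard to userspace via the BLKDISCARD ioctl, which is
roughly the primitive an online trim tool can sit on top of.  wiper.sh
itself drives the drive's TRIM through hdparm, so this is illustrative
rather than how wiper.sh works internally.)

/* Illustrative only: discard one byte range on a block device with the
 * BLKDISCARD ioctl.  start and len are in bytes and must be aligned to
 * the device's logical sector size.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKDISCARD */

static int discard_range(const char *dev, uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };
	int fd = open(dev, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (ioctl(fd, BLKDISCARD, &range) < 0) {
		perror("BLKDISCARD");
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

Each such call is one discard at the block layer, i.e. exactly the
per-range cost that the vectored-trim discussion below is trying to
avoid.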

> >> But as long as the mechanism can be called at mount time and/or
> >> kicked off at runtime somehow, I'm happy.
> >>
> >> -Eric
> >>
> >
> > That makes sense to me.  Most enterprise servers will go without remounting
> > a file system for (hopefully!) a very long time.
> >
> > It is really important to keep in mind that this is not just a laptop
> > feature for laptop SSDs; this is also used by high-end arrays and *could*
> > be useful for virt IO, etc., as well :-)
> >
> > ric
> 
> I'm not arguing that a runtime solution is not needed.
> 
> I'm arguing that, at least for SSD-backed filesystems, Mark's userspace
> implementation shows how the mount-time initialization of the runtime
> bitmap can be accomplished in a few seconds by leveraging the hardware
> and using vector'ed trims, as opposed to having to build an additional
> on-disk structure.
> 
> At least for SSDs, the primary purpose of the proposed on-disk
> structure seems to be to overcome the current lack of a vector'ed
> discard implementation.
> 
> If it is too difficult to implement a fully functional vector'ed
> discard in the block layer due to locking issues, possibly a
> special-purpose version could be written that is only used at mount
> time, when one can be assured no other i/o is occurring to the
> filesystem.
> 
> James,
> 
> The ATA-8 spec supports vectored trims and requires that a minimum of
> 255 sectors' worth of range payload be supported.  That equates to a
> single trim being able to trim thousands of ranges in one command.
> 
> Mark Lord has benchmarked and found a vectored trim to be drastically
> faster than calling trim individually for each of those ranges.
> 
> Does scsi support vector'ed discard? (i.e. write-same commands)

only with UNMAP.  WRITE SAME is effectively single range.
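
To put numbers on the "thousands of ranges" point above: an ATA DATA SET
MANAGEMENT (TRIM) payload carries one 8-byte entry per range (a 48-bit
starting LBA plus a 16-bit sector count), so 64 entries fit in each
512-byte payload sector, and the 255 payload sectors mentioned above
allow up to 255 * 64 = 16320 ranges in a single command.  UNMAP is the
vectored analogue on the SCSI side: its parameter list carries 16-byte
block descriptors (8-byte LBA, 4-byte block count), whereas WRITE SAME
takes a single LBA/length in the CDB.  A rough sketch of packing a TRIM
payload (illustrative only, not drive-ready code):

/* Illustrative only: pack TRIM range entries per ATA-8 ACS DATA SET
 * MANAGEMENT.  Each entry is 8 bytes, little-endian: bits 0-47 are the
 * starting LBA, bits 48-63 the length in sectors (an entry with a zero
 * length is ignored by the drive).
 */
#include <stdint.h>
#include <string.h>

#define TRIM_ENTRIES_PER_SECTOR	64	/* 512 bytes / 8 bytes per entry */

struct trim_range {
	uint64_t lba;		/* starting sector */
	uint16_t nsectors;	/* length in sectors, max 65535 */
};

/* Pack 'count' ranges into 'payload' (caller sizes it to whole 512-byte
 * sectors).  Returns the number of payload sectors used.
 */
static unsigned int pack_trim_payload(uint8_t *payload,
				      const struct trim_range *r,
				      unsigned int count)
{
	unsigned int sectors = (count + TRIM_ENTRIES_PER_SECTOR - 1) /
			       TRIM_ENTRIES_PER_SECTOR;
	unsigned int i, j;

	for (i = 0; i < count; i++) {
		uint64_t entry = (r[i].lba & 0xffffffffffffULL) |
				 ((uint64_t)r[i].nsectors << 48);

		for (j = 0; j < 8; j++)		/* little-endian on the wire */
			payload[i * 8 + j] = entry >> (8 * j);
	}
	/* zero-fill the tail of the last payload sector */
	memset(payload + count * 8, 0, sectors * 512 - count * 8);

	return sectors;
}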

> Or are high-end scsi arrays so fast that they can process tens of
> thousands of discard commands in a reasonable amount of time, unlike
> what SSDs have so far proven able to do?

No ... they actually have two problems: firstly, they can only use
discard ranges which align with their internal block size (usually
something huge like 3/4MB); secondly, a trim operation tends to be O(1)
and slow (roughly as expensive for a small range as for a large one),
so they'd actually like discard accumulation.
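
As an illustration of the alignment point (a sketch only; the granule
size is whatever the array uses internally): a discard that is not
aligned to the array's allocation unit has to be clipped inward to whole
units, and anything smaller than one unit frees nothing, which is one
more reason small scattered discards want to be accumulated before being
sent down.

/* Illustrative only: clip a discard range to whole array allocation
 * units.  Partial units at either end cannot be unmapped and are
 * simply dropped.
 */
#include <stdint.h>

static int clip_to_granule(uint64_t start, uint64_t len, uint64_t granule,
			   uint64_t *out_start, uint64_t *out_len)
{
	uint64_t first, last;

	if (!granule)
		return 0;

	first = (start + granule - 1) / granule;	/* round start up */
	last = (start + len) / granule;			/* round end down */
	if (last <= first)
		return 0;	/* nothing the array can actually free */

	*out_start = first * granule;
	*out_len = (last - first) * granule;
	return 1;
}

With a 3/4MB granule, for instance, a 1MB discard that starts mid-unit
frees at most one unit, and possibly nothing at all.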

> It would be interesting to find out whether an SSD can discard thousands
> of ranges drastically faster than a high-end scsi device can.  But if
> true, that might argue for the on-disk bitmap to track previously
> discarded blocks/extents.

I think SSDs and arrays both have discard problems: arrays more to do
with the time and expense of the operation, SSDs because the TRIM
command isn't queued.

James


