Message-ID: <1271887007.2893.352.camel@mulgrave.site>
Date: Wed, 21 Apr 2010 17:56:47 -0400
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Greg Freemyer <greg.freemyer@...il.com>
Cc: Ric Wheeler <rwheeler@...hat.com>, sandeen@...hat.com,
Eric Sandeen <esandeen@...hat.com>,
Jeff Moyer <jmoyer@...hat.com>,
Mark Lord <kernel@...savvy.com>,
Lukas Czerner <lczerner@...hat.com>,
linux-ext4@...r.kernel.org, Edward Shishkin <eshishki@...hat.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.
On Wed, 2010-04-21 at 17:47 -0400, Greg Freemyer wrote:
> Adding James Bottomley because high-end scsi is entering the
> discussion. James, I have a couple scsi questions for you at the end.
>
> On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@...hat.com> wrote:
> > On 04/21/2010 05:01 PM, Eric Sandeen wrote:
> >>
> >> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
> >>
> >>
> >>>
> >>> Mark's benchmarks showed this as doable in seconds which seems like a
> >>> reasonable amount of time for a mount time operation.
> >>>
> >>
> >> All the other things aside, mount-time is interesting, but it's an
> >> infrequent operation, at least in my world. I think we need something
> >> that can be done runtime.
> >>
> >> For anything with uptime, I don't think it's acceptable to wait until
> >> the next mount to trim unused blocks.
So what's wrong with using wiper.sh? It can do online discard of
filesystems that support delayed allocation (ext4, xfs, etc.).
> >> But as long as the mechanism can be called either at mount time and/or
> >> kicked off runtime somehow, I'm happy.
> >>
> >> -Eric
> >>
> >
> > That makes sense to me. Most enterprise servers will go without remounting
> > a file system for (hopefully!) a very long time.
> >
> > It is really important to keep in mind that this is not just a laptop
> > feature for laptop SSD's, this is also used by high end arrays and *could*
> > be useful for virt IO, etc as well :-)
> >
> > ric
>
> I'm not arguing that a runtime solution is not needed.
>
> I'm arguing that at least for SSD backed filesystems Mark's userspace
> implementation shows how the mount time initialization of the runtime
> bitmap can be accomplished in a few seconds by leveraging the hardware
> and using vector'ed trims as opposed to having to build an additional
> on-disk structure.
>
> At least for SSDs, the primary purpose of the proposed on-disk
> structure seems to be to overcome the current lack of a vector'ed
> discard implementation.
>
> If it is too difficult to implement a fully functional vector'ed
> discard in the block layer due to locking issues, possibly a special
> purpose version could be written that is only used at mount time when
> one can be assured no other i/o is occurring to the filesystem.
>
> James,
>
> The ATA-8 spec. supports vectored trims and requires a minimum of 255
> sectors worth of range payload be supported. That equates to a single
> trim being able to trim thousands of ranges in one command.
>
> Mark Lord has benchmarked this and found a vectored trim to be drastically
> faster than calling trim individually for each of those ranges.
>
> Does scsi support vector'ed discard? (ie. write-same commands)
Only with UNMAP. WRITE SAME is effectively single-range.
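For the ATA side of the question quoted above, the "thousands of ranges"
figure can be checked with a little arithmetic. The sketch below assumes
the usual TRIM range-entry layout (an 8-byte little-endian entry holding
a 48-bit starting LBA and a 16-bit sector count) and 512-byte payload
blocks. The helper names pack_trim_ranges and entries_per_command are
made up for illustration and are not taken from any kernel or hdparm
source.

```python
import struct

SECTOR_BYTES = 512   # assumed size of one payload block
ENTRY_BYTES = 8      # one range entry: 48-bit LBA + 16-bit sector count
MAX_COUNT = 0xFFFF   # largest sector count a single entry can describe


def pack_trim_ranges(ranges):
    """Pack (start_lba, sector_count) pairs into a zero-padded payload.

    Ranges longer than MAX_COUNT sectors are split across several entries.
    """
    entries = []
    for lba, count in ranges:
        if lba < 0 or count < 0 or lba + count > (1 << 48):
            raise ValueError("range does not fit in a 48-bit LBA space")
        while count > 0:
            n = min(count, MAX_COUNT)
            # Low 48 bits: starting LBA; high 16 bits: number of sectors.
            entries.append(struct.pack("<Q", (n << 48) | lba))
            lba += n
            count -= n
    payload = b"".join(entries)
    # Pad with zeroed entries up to a whole number of payload blocks.
    padding = -len(payload) % SECTOR_BYTES
    return payload + b"\x00" * padding


def entries_per_command(payload_sectors):
    """Number of range entries that fit in the given number of payload blocks."""
    return payload_sectors * SECTOR_BYTES // ENTRY_BYTES
```

Under these assumptions one payload block carries 64 entries, so 255
blocks would carry 16320 ranges in a single command, which matches the
"thousands of ranges" estimate.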
> Or are high-end scsi arrays so fast they can process tens of thousands
> of discard commands in a reasonable amount of time, unlike what SSDs
> have so far proven able to do?
No ... they actually have two problems: firstly, they can only use
discard ranges which align with their internal block size (usually
something huge like 3/4MB), and secondly, a trim operation tends to be
O(1) but slow, so they'd actually like discard accumulation.
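A minimal sketch of what "discard accumulation" could look like for a
device with a large internal block size: collect freed byte ranges,
merge the ones that touch, and only then trim each merged range inward
to the device's chunk boundaries. The function names and the chunk size
used in the usage note are hypothetical, chosen only to illustrate the
idea, and do not describe any particular array or the ext4 patch.

```python
def merge_ranges(ranges):
    """Coalesce overlapping or adjacent (start, length) byte ranges."""
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def align_inward(start, length, chunk):
    """Shrink a range to whole chunks; return None if no full chunk is covered."""
    first = -(-start // chunk) * chunk          # round the start up
    last = (start + length) // chunk * chunk    # round the end down
    if last <= first:
        return None
    return (first, last - first)


def accumulated_discards(ranges, chunk):
    """Merge freed ranges first, then keep only the chunk-aligned parts."""
    result = []
    for start, length in merge_ranges(ranges):
        aligned = align_inward(start, length, chunk)
        if aligned is not None:
            result.append(aligned)
    return result
```

With an assumed 1 MiB chunk, two freed half-chunk ranges (0 to 512 KiB
and 512 KiB to 1 MiB) are each too small to discard on their own, but
once accumulated they cover one whole chunk and can be sent as a single
discard.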
> It would be interesting to find out that a SSD can discard thousands
> of ranges drastically faster than a high-end scsi device can. But if
> true, that might argue for the on-disk bitmap to track previously
> discarded blocks/extents.
I think SSDs and arrays both have discard problems: for arrays it's
more to do with the time and expense of the operation, for SSDs it's
because the TRIM command isn't queued.
James