Message-ID: <n2x87f94c371004211447w1e57ae43j4c988cf0205d7ec2@mail.gmail.com>
Date:	Wed, 21 Apr 2010 17:47:27 -0400
From:	Greg Freemyer <greg.freemyer@...il.com>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	sandeen@...hat.com, Eric Sandeen <esandeen@...hat.com>,
	Jeff Moyer <jmoyer@...hat.com>,
	Mark Lord <kernel@...savvy.com>,
	Lukas Czerner <lczerner@...hat.com>,
	linux-ext4@...r.kernel.org, Edward Shishkin <eshishki@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	James Bottomley <James.Bottomley@...senpartnership.com>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.

Adding James Bottomley because high-end scsi is entering the
discussion.  James, I have a couple of scsi questions for you at the end.

On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@...hat.com> wrote:
> On 04/21/2010 05:01 PM, Eric Sandeen wrote:
>>
>> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
>>
>>
>>>
>>> Mark's benchmarks showed this as doable in seconds, which seems like a
>>> reasonable amount of time for a mount-time operation.
>>>
>>
>> All the other things aside, mount time is interesting, but it's an
>> infrequent operation, at least in my world.  I think we need something
>> that can be done at runtime.
>>
>> For anything with uptime, I don't think it's acceptable to wait until
>> the next mount to trim unused blocks.
>>
>> But as long as the mechanism can be called at mount time and/or
>> kicked off at runtime somehow, I'm happy.
>>
>> -Eric
>>
>
> That makes sense to me.  Most enterprise servers will go without remounting
> a file system for (hopefully!) a very long time.
>
> It is really important to keep in mind that this is not just a laptop
> feature for laptop SSDs; it is also used by high-end arrays and *could*
> be useful for virtio, etc., as well :-)
>
> ric

I'm not arguing that a runtime solution is not needed.

I'm arguing that, at least for SSD-backed filesystems, Mark's userspace
implementation shows how the mount-time initialization of the runtime
bitmap can be accomplished in a few seconds by leveraging the hardware
and using vectored trims, as opposed to having to build an additional
on-disk structure.
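
For reference, the only discard interface the block layer exposes to
userspace is the single-range BLKDISCARD ioctl, so a naive tool pays
one command per free extent.  A minimal sketch of that per-extent loop
(the helper name and extent list are hypothetical, standing in for
whatever the filesystem's free-space map provides):

/* Hypothetical sketch: one BLKDISCARD ioctl, and hence one discard
 * command, per free extent. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKDISCARD */

struct extent { uint64_t start, len; };	/* byte offsets */

static int discard_extents(const char *dev, const struct extent *ext,
			   size_t n)
{
	int fd = open(dev, O_WRONLY);
	size_t i;

	if (fd < 0)
		return -1;
	for (i = 0; i < n; i++) {
		uint64_t range[2] = { ext[i].start, ext[i].len };

		if (ioctl(fd, BLKDISCARD, range) < 0)
			perror("BLKDISCARD");
	}
	close(fd);
	return 0;
}

With thousands of free extents that is thousands of round trips to the
device, which is exactly the overhead a vectored trim avoids.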

At least for SSDs, the primary purpose of the proposed on-disk
structure seems to be to overcome the current lack of a vectored
discard implementation.

If it is too difficult to implement a fully functional vectored
discard in the block layer due to locking issues, a special-purpose
version could perhaps be written that is used only at mount time, when
one can be assured no other I/O is occurring to the filesystem.

James,

The ATA-8 spec supports vectored trims and requires that a minimum of
255 sectors' worth of range payload be supported.  That equates to a
single trim being able to cover thousands of ranges in one command.
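
For concreteness (this is my reading of the draft layout, so treat the
exact bit positions as an assumption): each trim range entry is 8
bytes, a 48-bit starting LBA plus a 16-bit sector count, so one
512-byte payload sector holds 64 entries and 255 sectors hold
255 * 64 = 16,320 ranges.  A sketch of packing one entry:

/* Pack one 8-byte TRIM range entry, little-endian:
 * bits 47:0 = starting LBA, bits 63:48 = sector count.
 * 512 / 8 = 64 entries per payload sector, so a 255-sector
 * payload carries 255 * 64 = 16320 ranges in a single command. */
#include <stdint.h>

static void pack_trim_range(uint8_t buf[8], uint64_t lba, uint16_t count)
{
	uint64_t entry = (lba & 0xFFFFFFFFFFFFULL) | ((uint64_t)count << 48);
	int i;

	for (i = 0; i < 8; i++)
		buf[i] = (entry >> (8 * i)) & 0xFF;
}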

Mark Lord has benchmarked and found a vectored trim to be drastically
faster than calling trim individually for each of those ranges.

Does scsi support vectored discard (i.e., write-same commands)?

Or are high-end scsi arrays fast enough to process tens of thousands
of individual discard commands in a reasonable amount of time, which
the SSDs tested so far have not proven able to do?

It would be interesting to find out whether an SSD can discard
thousands of ranges drastically faster than a high-end scsi device
can.  If true, that might argue for the on-disk bitmap to track
previously discarded blocks/extents.

Greg
