linux-ext4 - Re: [PATCH] mke2fs: Inform user about ongoing discard

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101122202002.GA2767@thunk.org>
Date:	Mon, 22 Nov 2010 15:20:02 -0500
From:	Ted Ts'o <tytso@....edu>
To:	Lukas Czerner <lczerner@...hat.com>
Cc:	linux-ext4@...r.kernel.org, sandeen@...hat.com, adilger@...ger.ca
Subject: Re: [PATCH] mke2fs: Inform user about ongoing discard

On Thu, Oct 21, 2010 at 04:23:02PM +0200, Lukas Czerner wrote:
> Since there are some slow SSD's out there and big thinly provisioned
> storages on which it takes quite long to issue discard through whole
> device, it would be nice to provide user the information about what is
> going on and how long it will take (approximately).
> 
> Signed-off-by: Lukas Czerner <lczerner@...hat.com>

Hi Lukas,

I've looked at this patch, and one thing that disturbs me about it ---
you are discarding the first percentage of the disk five percent times
for no good reason just to get the timing, before then executing the
discard for the entire disk.   There are a couple of problems with this:

*) For smart/competently implemented SSD's, discarding the same part
of the disk five times might lead to a misleading timing --- the smart
device could easily determine that the first 1% is already not in use
after the first discard, and the subsequent 4 discards could be
discard as no-ops.

*) Mark Lord has claimed that there exists a large number of
incomptently implemented SSD's out there, that may actually be
executing a flash erase of the discarded region.  If true, executing
an extra flash erase on 1% of the disk for no good reason five times
might not be the best thing to do for the longetivity of the device.

I was tempted to fix this up myself, but since I'm trying to get
better at delegating work to others, may I suggest the following
changes?

1)  Implement block device ioctl's for the kernel that export the
discard_granularity, discard_alignment, and max_discard_sectors.  

2) Change mke2fs so that the discard is done in a separate function.
Said function should attempt to fetch the discard_granularity,
discard_alignment, and max_discard_sectors.

3) This new function in mke2fs should start by discarding
approximately 1% of device at a time, respecting discard_granularity
and discard_alignment.  If the time to discard 1% of the device is
less than a second, then it should double the amount that it discards
at a time.  If the time to discard takes longer than 4 seconds, it
should reduce the amount that it discards by half (again, always
respecting discard_granularity and discard_alignment).  The function
can display the amount of time elapsed and the estimated amount of
time remaining after each chunk of the device that it discards,
assuming it can use ^M to redraw the progress report (which of course
should be suppressed if the -q option is specified on the command
line).

This design doesn't "waste" any discards, which is both faster and
reduces wear on badly designed SSD's.  It also continuously updates
the user with the amount of time it takes to complete the discard
process.  It also will respect the discard_granularity and
discard_alignment restrictions; and of course, it allows the user to
interrupt the discard, without needing a special kernel patch.

Does this make sense to you?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html