linux-ext4 - Re: [PATCH] e2fsck: Discard free data and inode blocks.


Date:	Fri, 22 Oct 2010 17:19:16 -0400
From:	"Martin K. Petersen" <martin.petersen@...cle.com>
To:	Ric Wheeler <ricwheeler@...il.com>
Cc:	Andreas Dilger <adilger.kernel@...ger.ca>,
	Lukas Czerner <lczerner@...hat.com>,
	linux-ext4@...r.kernel.org, tytso@....edu, sandeen@...hat.com
Subject: Re: [PATCH] e2fsck: Discard free data and inode blocks.

>>>>> "Ric" == Ric Wheeler <ricwheeler@...il.com> writes:

Ric> Just to further confuse things, if we just want to zero a device,
Ric> there is the (relatively old) WRITE_SAME command that arrays
Ric> use. Note that it is quite a bit faster than doing this from the
Ric> server since you only transfer over one block of data and the disk
Ric> firmware does the rest - no data transfer for each block once you
Ric> start.

Ric> It can certainly take a long, long time, but would be faster than
Ric> zeroing a drive with write() system calls :)
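To put rough numbers on Ric's point, here is a back-of-the-envelope sketch (not from the thread) of the payload each approach pushes over the link. It assumes 512-byte logical blocks and ignores command and protocol overhead. The function names are made up for illustration.

```python
# Rough comparison of payload bytes sent over the link when zeroing a region.
# Assumptions: 512-byte logical blocks; command/protocol overhead is ignored.

LOGICAL_BLOCK_SIZE = 512  # bytes (assumed)


def payload_bytes_write(region_bytes: int) -> int:
    """Ordinary WRITE commands must carry every zeroed byte over the link."""
    return region_bytes


def payload_bytes_write_same(region_bytes: int, max_blocks_per_cmd: int) -> int:
    """Each WRITE SAME carries one logical block; the device replicates it."""
    region_blocks = -(-region_bytes // LOGICAL_BLOCK_SIZE)  # ceiling division
    commands = -(-region_blocks // max_blocks_per_cmd)      # ceiling division
    return commands * LOGICAL_BLOCK_SIZE


one_gib = 1 << 30
print(payload_bytes_write(one_gib))              # 1073741824
print(payload_bytes_write_same(one_gib, 32768))  # 64 commands * 512 = 32768
```

Even with a per-command cap, the data sent shrinks by a factor equal to the number of blocks per command, which is why the win only shows up when the link itself is the bottleneck.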

I took a few stabs at this in the spring. While it looked like a good
idea on paper, it turned out not to be a huge win unless the FC link was
heavily congested by traffic to other devices.

First of all, many drives cap the number of blocks that can be written
with a single WRITE SAME command. Typically you can only write 16-32 MB
at a time. So I needed a fair bit of magic to scale down and retry while
searching for the sweet spot.
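A minimal sketch of the scale-down-and-retry idea against a simulated device. `FakeDevice`, `WriteSameRejected` and `zero_range` are hypothetical stand-ins, not the actual patches or any kernel API. The loop halves the chunk size whenever the device rejects a command and keeps the last size that worked for the rest of the range.

```python
class WriteSameRejected(Exception):
    """Hypothetical error: the device refused a WRITE SAME of this length."""


class FakeDevice:
    """Simulated device with an undisclosed per-command block cap (hypothetical)."""

    def __init__(self, max_write_same_blocks: int):
        self._cap = max_write_same_blocks
        self.commands = []  # (lba, nblocks) of every accepted command

    def write_same(self, lba: int, nblocks: int) -> None:
        if nblocks > self._cap:
            raise WriteSameRejected(nblocks)
        self.commands.append((lba, nblocks))


def zero_range(dev, start_lba: int, nblocks: int,
               initial_chunk: int = 1 << 16, min_chunk: int = 1) -> int:
    """Zero [start_lba, start_lba + nblocks), halving the chunk on rejection.

    Returns the chunk size that was in effect when the range completed.
    """
    chunk = initial_chunk
    lba, end = start_lba, start_lba + nblocks
    while lba < end:
        n = min(chunk, end - lba)
        try:
            dev.write_same(lba, n)
        except WriteSameRejected:
            if n <= min_chunk:
                raise  # nothing left to shrink; a caller could fall back to WRITE
            chunk = max(min_chunk, n // 2)
            continue  # retry the same LBA with the smaller chunk
        lba += n
    return chunk


dev = FakeDevice(max_write_same_blocks=40000)
print(zero_range(dev, 0, 100000))  # 32768
print(dev.commands)
```

Starting at 65536 blocks (32 MB with 512-byte blocks) matches the upper end of the range above. A real implementation would also have to tell "too many blocks" apart from other failures, which this sketch does not attempt.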

Fred tried to convince T10 that it would be nice to have a field in the
Block Limits VPD page indicating the maximum number of blocks a device
supports in a single WRITE SAME. But T10 thought that was a bad idea and
the proposal was rejected. Otherwise I would have wired that up, and we
could have handled generic WRITE SAME the same way we handle discard.

The other problem is that a WRITE SAME may take a very long time. So we
need special timeouts in place to keep the regular error handling from
kicking in while the drive is busy wiping data.
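One way such a timeout might be sized, sketched under clearly stated assumptions: the expected duration is taken as bytes written divided by an assumed sustained media rate, padded by a safety factor, and never set below an assumed normal per-command timeout. The 30 s default, 50 MB/s rate and factor of 4 are illustrative numbers only, and `write_same_timeout` is a hypothetical helper.

```python
DEFAULT_TIMEOUT_S = 30.0  # assumed normal per-command timeout (illustrative)


def write_same_timeout(nblocks: int, block_size: int = 512,
                       media_mb_per_s: float = 50.0, safety: float = 4.0) -> float:
    """Hypothetical per-command timeout for a WRITE SAME of nblocks blocks.

    The estimate assumes the drive writes the media at media_mb_per_s
    (decimal MB/s); the result is padded by `safety` and never drops below
    the normal timeout.
    """
    est_s = nblocks * block_size / (media_mb_per_s * 1_000_000)
    return max(DEFAULT_TIMEOUT_S, safety * est_s)


print(write_same_timeout(32768))          # 16 MiB chunk -> 30.0 (default wins)
print(write_same_timeout(3_906_250_000))  # ~2 TB in one command -> 160000.0
```

The second call shows why a single whole-device WRITE SAME collides with ordinary error handling: its plausible duration is hours, not seconds.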

I guess we could just pick a number (16 MB, maybe) and define that as
the maximum per command. Picking a low number also has the benefit of
being less likely to interfere with timeouts.
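With a fixed cap, splitting a range becomes a simple loop. A minimal sketch, assuming "16 MB" means 16 MiB and a 512-byte logical block by default. `split_write_same` is a hypothetical helper that returns (LBA, block count) pairs, one per WRITE SAME command.

```python
MAX_WRITE_SAME_BYTES = 16 * 1024 * 1024  # fixed cap; "16 MB" read as 16 MiB (assumption)


def split_write_same(start_lba: int, nblocks: int, block_size: int = 512):
    """Split [start_lba, start_lba + nblocks) into commands no larger than the cap."""
    per_cmd = MAX_WRITE_SAME_BYTES // block_size  # 32768 blocks at 512 bytes
    cmds = []
    lba, end = start_lba, start_lba + nblocks
    while lba < end:
        n = min(per_cmd, end - lba)
        cmds.append((lba, n))
        lba += n
    return cmds


print(split_write_same(0, 70000))  # [(0, 32768), (32768, 32768), (65536, 4464)]
```

With 4096-byte logical blocks the same byte cap works out to 4096 blocks per command, so the per-command duration stays roughly constant regardless of block size.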

If there's interest I'll be happy to revisit my patches...

-- 
Martin K. Petersen	Oracle Linux Engineering