Message-ID: <k2p87f94c371004211352h34ddd0c4ve7cb1f1747c0e9f8@mail.gmail.com>
Date:	Wed, 21 Apr 2010 16:52:55 -0400
From:	Greg Freemyer <greg.freemyer@...il.com>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	Ric Wheeler <rwheeler@...hat.com>,
	Eric Sandeen <sandeen@...hat.com>,
	Mark Lord <kernel@...savvy.com>,
	Lukas Czerner <lczerner@...hat.com>,
	linux-ext4@...r.kernel.org, Edward Shishkin <eshishki@...hat.com>,
	Eric Sandeen <esandeen@...hat.com>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.

correcting Christoph's email address - no other edits/comments

On Wed, Apr 21, 2010 at 3:22 PM, Jeff Moyer <jmoyer@...hat.com> wrote:
> Ric Wheeler <rwheeler@...hat.com> writes:
>
>> On 04/21/2010 02:59 PM, Greg Freemyer wrote:
>>> On Tue, Apr 20, 2010 at 10:45 PM, Eric Sandeen <sandeen@...hat.com> wrote:
>>>> Mark Lord wrote:
>>>>> On 20/04/10 05:21 PM, Greg Freemyer wrote:
>>>>>> Mark,
>>>>>>
>>>>>> This is the patch implementing the new discard logic.
>>>>> ..
>>>>>> Signed-off-by: Lukas Czerner <lczerner@...hat.com>
>>>>> ..
>>>>>>> +void ext4_trim_extent(struct super_block *sb, int start, int count,
>>>>>>> +               ext4_group_t group, struct ext4_buddy *e4b)
>>>>>>> +{
>>>>>>> +       ext4_fsblk_t discard_block;
>>>>>>> +       struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>>>>>>> +       struct ext4_free_extent ex;
>>>>>>> +
>>>>>>> +       assert_spin_locked(ext4_group_lock_ptr(sb, group));
>>>>>>> +
>>>>>>> +       ex.fe_start = start;
>>>>>>> +       ex.fe_group = group;
>>>>>>> +       ex.fe_len = count;
>>>>>>> +
>>>>>>> +       mb_mark_used(e4b, &ex);
>>>>>>> +       ext4_unlock_group(sb, group);
>>>>>>> +
>>>>>>> +       discard_block = (ext4_fsblk_t)group *
>>>>>>> +                       EXT4_BLOCKS_PER_GROUP(sb)
>>>>>>> +                       + start
>>>>>>> +                       + le32_to_cpu(es->s_first_data_block);
>>>>>>> +       trace_ext4_discard_blocks(sb,
>>>>>>> +                       (unsigned long long)discard_block,
>>>>>>> +                       count);
>>>>>>> +       sb_issue_discard(sb, discard_block, count);
>>>>>>> +
>>>>>>> +       ext4_lock_group(sb, group);
>>>>>>> +       mb_free_blocks(NULL, e4b, start, ex.fe_len);
>>>>>>> +}
>>>>>>
>>>>>> Mark, unless I'm missing something, sb_issue_discard() above is going
>>>>>> to trigger a trim command for just the one range.  I thought the
>>>>>> benchmarks you did showed that a collection of ranges needed to be
>>>>>> built, then a single trim command invoked that trimmed that group of
>>>>>> ranges.
>>>>> ..
>>>>>
>>>>> Mmm.. If that's what it is doing, then this patch set would be a
>>>>> complete disaster.
>>>>> It would take *hours* to do the initial TRIM.
>
> Except it doesn't.  Lukas did provide numbers in his original email.
>
>>>>> Lukas ?
>>>>
>>>> I'm confused; do we have an interface to send a trim command for multiple ranges?
>>>>
>>>> I didn't think so ...  Lukas' patch is finding free ranges (above a size threshold)
>>>> to discard; it's not doing it a block at a time, if that's the concern.
>>>>
>>>> -Eric
>>>
>>> Eric,
>>>
>>> I don't know what kernel APIs have been created to support discard,
>>> but the ATA8 draft spec. allows for specifying multiple ranges in one
>>> trim command.
>
> Well, sb_issue_discard is what ext4 is using, and that takes a single
> range.  I don't know if anyone has looked into adding a vectored API.
>
>>
>> Greg,
>>
>> We have full support for this in the "discard" support at the file
>> system layer for several file systems.
>
> Actually, we don't support what Greg is talking about, to my knowledge.
>
>> The block layer effectively muxes the "discard" into the right target
>> device command. TRIM for ATA, WRITE_SAME (with unmap) or UNMAP for
>> SCSI...
>>
>> If your favourite fs supports this, you can enable this feature with
>> "-o discard" for fine grained discards,
>
> Thanks, it's worth pointing out that TRIM is not the only backend to the
> discard API.  However, even if we do implement a vectored API, we can
> translate that to dumber commands if a given spec doesn't support it.
>
> Getting back to the problem...
>
> From the file system, you want to discard discrete ranges of blocks.
> The API to support this can either take care of the data integrity
> guarantees by itself, or make the upper layer ensure that trim and write
> do not pass each other.  The current implementation does the latter.  In
> order to do the former, there is the potential for a lot of overhead to
> be introduced into the block allocation layers for the file systems.
>
> So, given the above, it is up to the file system to send down the
> biggest discard requests it can in order to reduce the overhead of the
> command.  If a vectored approach is made available, then that would be
> even better.  Christoph, is this something that's on your radar?
>
> Cheers,
> Jeff
>

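For anyone following the arithmetic in the quoted ext4_trim_extent(), here is a minimal userspace sketch of how the discard start block is derived from the group number, the offset within the group, and the first data block. The function name discard_start_block() and the plain integer types are stand-ins of mine, not kernel identifiers; the real code uses ext4_fsblk_t, EXT4_BLOCKS_PER_GROUP(sb) and le32_to_cpu(es->s_first_data_block).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the block arithmetic in ext4_trim_extent().
 * The group number is widened to 64 bits before the multiply so the
 * product cannot overflow on large file systems, mirroring the
 * (ext4_fsblk_t) cast in the patch.
 */
static uint64_t discard_start_block(uint32_t group,
                                    uint32_t blocks_per_group,
                                    uint32_t start,
                                    uint32_t first_data_block)
{
    /* The offset must lie inside the group (assumption of this sketch). */
    assert(start < blocks_per_group);

    return (uint64_t)group * blocks_per_group
           + start
           + first_data_block;
}
```

As an example, group 3 with 32768 blocks per group, an in-group offset of 100 and a first data block of 0 gives file system block 98404, which is what would be handed to sb_issue_discard() together with the extent length.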

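To make the "multiple ranges in one trim command" point concrete, here is a rough sketch of packing several ranges into a TRIM payload as I read the ATA8 draft: each entry is 8 bytes, with the starting LBA in bits 47:0 and the sector count in bits 63:48, so a 512-byte payload block holds 64 entries. This is not an existing kernel API; struct lba_range, trim_entry() and trim_pack() are hypothetical names, and the little-endian conversion needed on the wire is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIM_MAX_SECTORS_PER_ENTRY 0xFFFFu /* 16-bit count field */
#define TRIM_ENTRIES_PER_BLOCK     64      /* 512 bytes / 8 bytes per entry */

/* Hypothetical description of one free range, in 512-byte sectors. */
struct lba_range {
    uint64_t lba;
    uint64_t nsect;
};

/* Build one entry: LBA in bits 47:0, sector count in bits 63:48. */
static uint64_t trim_entry(uint64_t lba, uint16_t nsect)
{
    assert(lba < (1ULL << 48));
    return ((uint64_t)nsect << 48) | lba;
}

/*
 * Pack n ranges into payload[0..cap). Ranges longer than 65535 sectors
 * are split across several entries. Returns the number of entries
 * written, or SIZE_MAX if the ranges do not fit in cap entries.
 */
static size_t trim_pack(const struct lba_range *r, size_t n,
                        uint64_t *payload, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t lba = r[i].lba;
        uint64_t left = r[i].nsect;

        while (left > 0) {
            uint64_t chunk = left > TRIM_MAX_SECTORS_PER_ENTRY
                             ? TRIM_MAX_SECTORS_PER_ENTRY : left;

            if (used == cap)
                return SIZE_MAX;
            payload[used++] = trim_entry(lba, (uint16_t)chunk);
            lba += chunk;
            left -= chunk;
        }
    }
    return used;
}
```

With a payload of TRIM_ENTRIES_PER_BLOCK entries, two ranges such as {0, 8} and {100, 70000} would take three entries, since the second range exceeds the 65535-sector limit of a single entry. A vectored discard interface would have to do something along these lines for ATA, and fall back to one command per range for targets that lack a multi-range form.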

-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
   http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com