Message-ID: <v2w87f94c371004211159ue3a48923u6a543e8090dcdfa6@mail.gmail.com>
Date: Wed, 21 Apr 2010 14:59:21 -0400
From: Greg Freemyer <greg.freemyer@...il.com>
To: Eric Sandeen <sandeen@...hat.com>
Cc: Mark Lord <kernel@...savvy.com>,
Lukas Czerner <lczerner@...hat.com>,
linux-ext4@...r.kernel.org, Jeff Moyer <jmoyer@...hat.com>,
Edward Shishkin <eshishki@...hat.com>,
Eric Sandeen <esandeen@...hat.com>,
Ric Wheeler <rwheeler@...hat.com>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.
On Tue, Apr 20, 2010 at 10:45 PM, Eric Sandeen <sandeen@...hat.com> wrote:
> Mark Lord wrote:
>> On 20/04/10 05:21 PM, Greg Freemyer wrote:
>>> Mark,
>>>
>>> This is the patch implementing the new discard logic.
>> ..
>>> Signed-off-by: Lukas Czerner <lczerner@...hat.com>
>> ..
>>>> +void ext4_trim_extent(struct super_block *sb, int start, int count,
>>>> + ext4_group_t group, struct ext4_buddy *e4b)
>>>> +{
>>>> + ext4_fsblk_t discard_block;
>>>> + struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>>>> + struct ext4_free_extent ex;
>>>> +
>>>> + assert_spin_locked(ext4_group_lock_ptr(sb, group));
>>>> +
>>>> + ex.fe_start = start;
>>>> + ex.fe_group = group;
>>>> + ex.fe_len = count;
>>>> +
>>>> + mb_mark_used(e4b,&ex);
>>>> + ext4_unlock_group(sb, group);
>>>> +
>>>> + discard_block = (ext4_fsblk_t)group *
>>>> + EXT4_BLOCKS_PER_GROUP(sb)
>>>> + + start
>>>> + + le32_to_cpu(es->s_first_data_block);
>>>> + trace_ext4_discard_blocks(sb,
>>>> + (unsigned long long)discard_block,
>>>> + count);
>>>> + sb_issue_discard(sb, discard_block, count);
>>>> +
>>>> + ext4_lock_group(sb, group);
>>>> + mb_free_blocks(NULL, e4b, start, ex.fe_len);
>>>> +}
>>>
>>> Mark, unless I'm missing something, sb_issue_discard() above is going
>>> to trigger a trim command for just the one range. I thought the
>>> benchmarks you did showed that a collection of ranges needed to be
>>> built, then a single trim command invoked that trimmed that group of
>>> ranges.
>> ..
>>
>> Mmm.. If that's what it is doing, then this patch set would be a
>> complete disaster.
>> It would take *hours* to do the initial TRIM.
>>
>> Lukas ?
>
> I'm confused; do we have an interface to send a trim command for multiple ranges?
>
> I didn't think so ... Lukas' patch is finding free ranges (above a size threshold)
> to discard; it's not doing it a block at a time, if that's the concern.
>
> -Eric
Eric,
I don't know what kernel APIs have been created to support discard,
but the ATA8 draft spec. allows for specifying multiple ranges in one
trim command.
See sections 7.10.3.1 and 7.10.3.2 of the latest draft spec.
Both talk about multiple trim ranges per trim command (think thousands
of ranges per command).
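For reference, the range list in a trim payload is an array of fixed-size entries. Below is a minimal sketch of packing one entry. It assumes the ATA8-ACS DATA SET MANAGEMENT layout: 8 bytes per range, little-endian, with the starting LBA in bits 0..47 and the sector count in bits 48..63. The name trim_encode_entry is a hypothetical helper, not an hdparm or kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack one (lba, count) range into an 8-byte
 * little-endian trim range entry. Bits 0..47 hold the starting LBA,
 * bits 48..63 hold the sector count.
 */
static void trim_encode_entry(uint8_t out[8], uint64_t lba, uint16_t count)
{
    uint64_t entry;

    assert(lba < (UINT64_C(1) << 48));  /* the LBA field is only 48 bits wide */
    entry = lba | ((uint64_t)count << 48);
    for (size_t i = 0; i < 8; i++)
        out[i] = (uint8_t)(entry >> (8 * i));  /* least significant byte first */
}
```

A full payload would be these entries laid back to back. The sketch assumes unused slots at the end of a 512-byte payload sector are simply left zero-filled.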
Recent hdparm versions accept a trim command argument that causes
multiple ranges to be trimmed per command:

    --trim-sector-ranges        Tell SSD firmware to discard unneeded
                                data sectors: lba:count ..
    --trim-sector-ranges-stdin  Same as above, but reads lba:count pairs from stdin
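The stdin form reads lba:count pairs. Below is a minimal sketch of validating one such pair. It assumes decimal values and rejects a zero count. This is an illustrative parser, and parse_lba_count is a hypothetical name. It is not hdparm's own code.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser for one "lba:count" pair such as "2048:16"
 * (decimal only). Returns 0 on success, -1 on malformed input or a
 * zero count.
 */
static int parse_lba_count(const char *tok, uint64_t *lba, uint64_t *count)
{
    char *end;

    assert(tok != NULL && lba != NULL && count != NULL);

    /* strtoull() would accept leading spaces and '-', so require a digit. */
    if (!isdigit((unsigned char)*tok))
        return -1;
    errno = 0;
    *lba = strtoull(tok, &end, 10);
    if (errno != 0 || *end != ':')
        return -1;

    tok = end + 1;
    if (!isdigit((unsigned char)*tok))
        return -1;
    *count = strtoull(tok, &end, 10);
    if (errno != 0 || *end != '\0' || *count == 0)
        return -1;
    return 0;
}
```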
As I understand it, this is critical for performance on the SSDs Mark
tested with, i.e. he found that a single trim command with 1000 ranges
takes much less time than 1000 discrete trim commands.
Per Mark's comments in wiper.sh, a trim command can have a minimum of
128KB of associated range information, so thousands of ranges can be
discarded in a single command.
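Here are the numbers, assuming 8 bytes per range entry (48-bit LBA plus 16-bit count). A 128KB payload holds 128 * 1024 / 8 = 16384 ranges, and even a single 512-byte payload sector holds 64. The following is a small sketch of that arithmetic. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TRIM_ENTRY_BYTES       8u      /* assumed: 48-bit LBA + 16-bit sector count */
#define TRIM_MAX_SECTORS_ENTRY 65535u  /* largest value of the 16-bit count field */

/* Hypothetical: how many range entries fit in a payload of the given size. */
static uint64_t trim_entries_per_payload(uint64_t payload_bytes)
{
    assert(payload_bytes % TRIM_ENTRY_BYTES == 0);  /* payload holds whole entries */
    return payload_bytes / TRIM_ENTRY_BYTES;
}

/* Hypothetical: upper bound on sectors one command can cover with that many entries. */
static uint64_t trim_max_sectors_per_command(uint64_t entries)
{
    return entries * TRIM_MAX_SECTORS_ENTRY;
}
```

With each entry capped at 65535 sectors, 16384 entries could describe up to 1073725440 sectors, just under 512 GiB at 512 bytes per sector. The amount a particular drive actually accepts per command may be lower.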
That is, hdparm can accept extremely large lists of ranges on stdin, but
it splits the list into discrete trim commands, each carrying thousands
of ranges.
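The splitting step can be sketched as follows. This is not hdparm's actual logic. It is an illustration under two assumptions. First, each entry covers at most 65535 sectors, so a longer range takes several entries. Second, a command carries at most max_entries entries. The struct trim_range type and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIM_MAX_SECTORS_PER_ENTRY 65535u  /* assumed 16-bit count field */

/* Hypothetical range type: one contiguous run of sectors to discard. */
struct trim_range {
    uint64_t lba;
    uint64_t count;  /* sectors; may exceed what one entry can describe */
};

/* Entries needed for one range, rounding up to whole 65535-sector entries. */
static uint64_t entries_for_range(uint64_t count)
{
    return (count + TRIM_MAX_SECTORS_PER_ENTRY - 1) / TRIM_MAX_SECTORS_PER_ENTRY;
}

/* Commands needed to send every range, packing up to max_entries per command. */
static uint64_t trim_commands_needed(const struct trim_range *r, size_t n,
                                     uint64_t max_entries)
{
    uint64_t entries = 0;

    assert(max_entries > 0);
    for (size_t i = 0; i < n; i++)
        entries += entries_for_range(r[i].count);
    return (entries + max_entries - 1) / max_entries;
}
```

For example, 1000 short ranges at 64 entries per command (one 512-byte payload sector) need 16 commands rather than 1000.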
A kernel implementation that issues after-the-fact discards, as this
patch does, also needs to build trim commands carrying a large payload
of ranges if it is going to be efficient.
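Here is what that could look like in outline. Currently, ext4_trim_extent() calls sb_issue_discard() once per free extent. Instead, the scan could append each extent to a batch and send the whole batch as one multi-range command when it fills up. The following is a user-space sketch of that accumulator. BATCH_MAX, struct trim_batch, and the submit callback are all hypothetical. They stand in for a block-layer multi-range interface that, per this thread, may not exist yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 64  /* hypothetical: ranges per command (one 512-byte payload) */

struct trim_range {
    uint64_t lba;    /* first sector of the free extent */
    uint32_t count;  /* sectors; splitting into 65535-sector entries is left to the encoder */
};

/* Hypothetical hook standing in for "send one multi-range trim command". */
typedef void (*trim_submit_fn)(const struct trim_range *r, size_t n, void *ctx);

struct trim_batch {
    struct trim_range r[BATCH_MAX];
    size_t n;               /* ranges currently queued */
    trim_submit_fn submit;
    void *ctx;
};

/* Send whatever has been collected as one command, then start a new batch. */
static void trim_batch_flush(struct trim_batch *b)
{
    assert(b->submit != NULL);
    if (b->n > 0) {
        b->submit(b->r, b->n, b->ctx);
        b->n = 0;
    }
}

/* Queue one free extent, flushing first if the batch is already full. */
static void trim_batch_add(struct trim_batch *b, uint64_t lba, uint32_t count)
{
    if (b->n == BATCH_MAX)
        trim_batch_flush(b);
    b->r[b->n].lba = lba;
    b->r[b->n].count = count;
    b->n++;
}

/* Example hook for testing: counts commands and ranges instead of doing I/O. */
struct submit_stats {
    size_t commands;
    size_t ranges;
};

static void record_submit(const struct trim_range *r, size_t n, void *ctx)
{
    struct submit_stats *st = ctx;

    (void)r;  /* a real hook would encode and send these ranges */
    st->commands++;
    st->ranges += n;
}
```

With BATCH_MAX set to 64, feeding 1000 extents through trim_batch_add() followed by a final trim_batch_flush() produces 16 submit calls instead of 1000.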
If the block layer cannot do this yet, then in my opinion this type of
batched discarding needs to stay in user space, as done with Mark's
wiper.sh script and enhanced hdparm, until the block layer grows that
ability.
Greg