Date:	Wed, 21 Apr 2010 15:22:18 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Greg Freemyer <greg.freemyer@...il.com>,
	Eric Sandeen <sandeen@...hat.com>,
	Mark Lord <kernel@...savvy.com>,
	Lukas Czerner <lczerner@...hat.com>,
	linux-ext4@...r.kernel.org, Edward Shishkin <eshishki@...hat.com>,
	Eric Sandeen <esandeen@...hat.com>,
	Christoph Hellwig <hch@...adead.org>
Subject: Re: [PATCH 2/2] Add batched discard support for ext4.

Ric Wheeler <rwheeler@...hat.com> writes:

> On 04/21/2010 02:59 PM, Greg Freemyer wrote:
>> On Tue, Apr 20, 2010 at 10:45 PM, Eric Sandeen<sandeen@...hat.com>  wrote:
>>> Mark Lord wrote:
>>>> On 20/04/10 05:21 PM, Greg Freemyer wrote:
>>>>> Mark,
>>>>>
>>>>> This is the patch implementing the new discard logic.
>>>> ..
>>>>> Signed-off-by: Lukas Czerner<lczerner@...hat.com>
>>>> ..
>>>>>> +void ext4_trim_extent(struct super_block *sb, int start, int count,
>>>>>> +               ext4_group_t group, struct ext4_buddy *e4b)
>>>>>> +{
>>>>>> +       ext4_fsblk_t discard_block;
>>>>>> +       struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>>>>>> +       struct ext4_free_extent ex;
>>>>>> +
>>>>>> +       assert_spin_locked(ext4_group_lock_ptr(sb, group));
>>>>>> +
>>>>>> +       ex.fe_start = start;
>>>>>> +       ex.fe_group = group;
>>>>>> +       ex.fe_len = count;
>>>>>> +
>>>>>> +       mb_mark_used(e4b,&ex);
>>>>>> +       ext4_unlock_group(sb, group);
>>>>>> +
>>>>>> +       discard_block = (ext4_fsblk_t)group *
>>>>>> +                       EXT4_BLOCKS_PER_GROUP(sb)
>>>>>> +                       + start
>>>>>> +                       + le32_to_cpu(es->s_first_data_block);
>>>>>> +       trace_ext4_discard_blocks(sb,
>>>>>> +                       (unsigned long long)discard_block,
>>>>>> +                       count);
>>>>>> +       sb_issue_discard(sb, discard_block, count);
>>>>>> +
>>>>>> +       ext4_lock_group(sb, group);
>>>>>> +       mb_free_blocks(NULL, e4b, start, ex.fe_len);
>>>>>> +}
>>>>>
>>>>> Mark, unless I'm missing something, sb_issue_discard() above is going
>>>>> to trigger a trim command for just the one range.  I thought the
>>>>> benchmarks you did showed that a collection of ranges needed to be
>>>>> built, then a single trim command invoked that trimmed that group of
>>>>> ranges.
>>>> ..
>>>>
>>>> Mmm.. If that's what it is doing, then this patch set would be a
>>>> complete disaster.
>>>> It would take *hours* to do the initial TRIM.

Except it doesn't.  Lukas did provide numbers in his original email.

>>>> Lukas ?
>>>
>>> I'm confused; do we have an interface to send a trim command for multiple ranges?
>>>
>>> I didn't think so ...  Lukas' patch is finding free ranges (above a size threshold)
>>> to discard; it's not doing it a block at a time, if that's the concern.
>>>
>>> -Eric
>>
>> Eric,
>>
>> I don't know what kernel APIs have been created to support discard,
>> but the ATA8 draft spec. allows for specifying multiple ranges in one
>> trim command.

Well, sb_issue_discard is what ext4 is using, and that takes a single
range.  I don't know if anyone has looked into adding a vectored API.
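
For reference, a minimal sketch of how several ranges could be packed for
one TRIM command. It assumes the range entry layout from the ATA8 draft
(8 bytes per entry, LBA in bits 47:0, sector count in bits 63:48, so at
most 65535 sectors per entry). trim_pack_entry() and trim_pack_range()
are hypothetical names, not existing kernel functions, and byte order
conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit: the count field of one range entry is 16 bits wide. */
#define TRIM_MAX_SECTORS_PER_ENTRY 0xFFFFu

/* Hypothetical helper: pack one (lba, nsectors) pair into a 64-bit entry.
 * Bits 47:0 hold the LBA, bits 63:48 hold the sector count. */
static uint64_t trim_pack_entry(uint64_t lba, uint16_t nsectors)
{
	assert(lba <= 0xFFFFFFFFFFFFULL);	/* LBA must fit in 48 bits */
	return lba | ((uint64_t)nsectors << 48);
}

/* Hypothetical helper: describe [lba, lba + nsectors) as range entries,
 * splitting at the 65535-sector limit. Returns the number of entries
 * written, or 0 if the range does not fit in out_cap entries. */
static size_t trim_pack_range(uint64_t lba, uint64_t nsectors,
			      uint64_t *out, size_t out_cap)
{
	size_t n = 0;

	while (nsectors > 0) {
		uint16_t chunk = nsectors > TRIM_MAX_SECTORS_PER_ENTRY
				 ? TRIM_MAX_SECTORS_PER_ENTRY
				 : (uint16_t)nsectors;

		if (n == out_cap)
			return 0;
		out[n++] = trim_pack_entry(lba, chunk);
		lba += chunk;
		nsectors -= chunk;
	}
	return n;
}
```

A vectored API would call something like trim_pack_range() once per free
extent and send the accumulated entries in a single command.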

>
> Greg,
>
> We have full support for this in the "discard" support at the file
> system layer for several file systems.

Actually, we don't support what Greg is talking about, to my knowledge.

> The block layer effectively muxes the "discard" into the right target
> device command. TRIM for ATA, WRITE_SAME (with unmap) or UNMAP for
> SCSI...
>
> If your favourite fs supports this, you can enable this feature with
> "-o discard" for fine grained discards,

Thanks, it's worth pointing out that TRIM is not the only backend to the
discard API.  Even so, if we do implement a vectored API, we can
translate it into simpler single-range commands for any spec that
doesn't support multiple ranges.
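
A rough user-space sketch of that translation, assuming a hypothetical
per-device limit on ranges per command. struct discard_dev, dev_send()
and discard_vec() are made up for illustration; none of them exist in
the block layer today, and dev_send() only counts commands instead of
doing I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct discard_range {
	uint64_t start;	/* first block of the range */
	uint64_t len;	/* number of blocks */
};

/* Hypothetical device description. */
struct discard_dev {
	size_t max_ranges_per_cmd;	/* 1 means a single-range backend */
	size_t cmds_issued;		/* stands in for real commands */
	uint64_t blocks_discarded;
};

/* Stand-in for sending one command that carries n ranges. */
static void dev_send(struct discard_dev *dev,
		     const struct discard_range *r, size_t n)
{
	assert(n >= 1 && n <= dev->max_ranges_per_cmd);
	for (size_t i = 0; i < n; i++)
		dev->blocks_discarded += r[i].len;
	dev->cmds_issued++;
}

/* Vectored entry point: send as many ranges per command as the device
 * accepts, which degrades to one command per range when the limit is 1. */
static void discard_vec(struct discard_dev *dev,
			const struct discard_range *r, size_t n)
{
	assert(dev->max_ranges_per_cmd >= 1);
	while (n > 0) {
		size_t batch = n < dev->max_ranges_per_cmd
			       ? n : dev->max_ranges_per_cmd;

		dev_send(dev, r, batch);
		r += batch;
		n -= batch;
	}
}
```

With five ranges, a backend limited to one range per command sees five
commands, while one that accepts four ranges per command sees two.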

Getting back to the problem...

From the file system, you want to discard discrete ranges of blocks.
The API to support this can either take care of the data integrity
guarantees by itself, or make the upper layer ensure that a trim and a
write to the same blocks do not pass each other.  The current
implementation does the latter.  Doing the former could introduce a lot
of overhead into the block allocation layers of the file systems.

So, given the above, it is up to the file system to send down the
biggest discard requests it can in order to reduce the overhead of the
command.  If a vectored approach is made available, then that would be
even better.  Christoph, is this something that's on your radar?
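
To illustrate "biggest discard requests it can", here is a small
user-space sketch that merges adjacent free ranges and drops anything
below a size threshold before discarding. struct free_range and
coalesce_ranges() are hypothetical and this is not what Lukas' patch
does internally; it assumes the input is already sorted by start block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct free_range {
	uint64_t start;	/* first free block */
	uint64_t len;	/* number of free blocks */
};

/* Merge touching or overlapping ranges in place, then drop merged ranges
 * shorter than min_len. Input must be sorted by start. Returns the number
 * of ranges left at the front of r[]. */
static size_t coalesce_ranges(struct free_range *r, size_t n,
			      uint64_t min_len)
{
	size_t out = 0, kept = 0;
	uint64_t prev_start = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].start >= prev_start);	/* sorted input */
		prev_start = r[i].start;

		if (out > 0 &&
		    r[i].start <= r[out - 1].start + r[out - 1].len) {
			/* Touches or overlaps the previous range: extend it. */
			uint64_t end = r[i].start + r[i].len;
			uint64_t cur_end = r[out - 1].start + r[out - 1].len;

			if (end > cur_end)
				r[out - 1].len = end - r[out - 1].start;
		} else {
			r[out++] = r[i];
		}
	}

	/* Keep only ranges worth the cost of a discard command. */
	for (size_t i = 0; i < out; i++)
		if (r[i].len >= min_len)
			r[kept++] = r[i];
	return kept;
}
```

Each range that survives would then turn into one sb_issue_discard()
call, or one entry in a vectored request if that ever exists.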

Cheers,
Jeff