linux-kernel - Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)

Message-ID: <46b8a8850908131520s747e045cnd8db9493e072939d@mail.gmail.com>
Date:	Thu, 13 Aug 2009 15:20:39 -0700
From:	Richard Sharpe <realrichardsharpe@...il.com>
To:	Greg Freemyer <greg.freemyer@...il.com>
Cc:	david@...g.hm, Markus Trippelsdorf <markus@...ppelsdorf.de>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Nitin Gupta <ngupta@...are.org>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux-scsi@...r.kernel.org, linux-ide@...r.kernel.org,
	Linux RAID <linux-raid@...r.kernel.org>
Subject: Re: Discard support (was Re: [PATCH] swap: send callback when swap 
	slot is freed)

On Thu, Aug 13, 2009 at 2:28 PM, Greg Freemyer<greg.freemyer@...il.com> wrote:
> On Thu, Aug 13, 2009 at 4:44 PM, <david@...g.hm> wrote:
>> On Thu, 13 Aug 2009, Greg Freemyer wrote:
>>
>>> On Thu, Aug 13, 2009 at 12:33 PM, <david@...g.hm> wrote:
>>>>
>>>> On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
>>>>
>>>>> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
>>>>>>
>>>>>> I am planning a complete overhaul of the discard work.  Users can send
>>>>>> down discard requests as frequently as they like.  The block layer will
>>>>>> cache them, and invalidate them if writes come through.  Periodically,
>>>>>> the block layer will send down a TRIM or an UNMAP (depending on the
>>>>>> underlying device) and get rid of the blocks that have remained
>>>>>> unwanted in the interim.
>>>>>
>>>>> That is a very good idea. I tested your original TRIM implementation
>>>>> on my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
>>>>> milliseconds to digest a single TRIM command. And since your
>>>>> implementation sends a TRIM for each extent of each deleted file, the
>>>>> whole system is unusable after a short while.
>>>>> An optimal solution would be to consolidate the discard requests, bundle
>>>>> them and send them to the drive as infrequently as possible.
>>>>
>>>> or queue them up and send them when the drive is idle (you would need to
>>>> keep track to make sure the space isn't re-used)
>>>>
>>>> as an example, if you would consider spinning down a drive you don't hurt
>>>> performance by sending accumulated trim commands.
>>>>
>>>> David Lang
>>>
>>> An alternate approach is for the block layer to maintain its own bitmap
>>> of used/unused sectors / blocks. Unmap commands from the filesystem just
>>> cause the bitmap to be updated.  No other effect.
>>
>> how does the block layer know what blocks are unused by the filesystem?
>>
>> or would it be a case of the filesystem generating discard/trim requests to
>> the block layer so that it can maintain its bitmap, and then the block
>> layer generating the requests to the drive below it?
>>
>> David Lang
>
> Yes, my thought was that the block layer would consume the discard/trim
> requests from the filesystem in realtime to maintain the bitmap, then
> at some later point in time when the system has extra resources it
> would generate the calls down to the lower layers and eventually the
> drive.
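
For reference, the scheme quoted above (the block layer absorbs discards,
forgets any range that gets written again, and only later sends consolidated
requests down) can be sketched roughly as follows. This is only an
illustration under assumptions: PendingDiscards and its methods are
hypothetical names, not kernel interfaces, and it tracks sorted sector ranges
rather than a literal bitmap.

```python
# Hypothetical sketch only: PendingDiscards is not a kernel structure.
class PendingDiscards:
    """Discarded-but-not-yet-trimmed sector ranges, stored as half-open [start, end)."""

    def __init__(self):
        self.ranges = []  # kept sorted, non-overlapping and non-adjacent

    def discard(self, start, end):
        """Record a discard from the filesystem, merging with touching ranges."""
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))          # disjoint and not adjacent: keep as-is
            else:
                start, end = min(s, start), max(e, end)  # overlap or adjacency: absorb
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def write(self, start, end):
        """A write reuses [start, end), so that part must no longer be trimmed."""
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))            # untouched by this write
                continue
            if s < start:
                kept.append((s, start))        # piece left of the write survives
            if e > end:
                kept.append((end, e))          # piece right of the write survives
        self.ranges = kept

    def flush(self):
        """Return the consolidated ranges to issue as TRIM/UNMAP and clear the set."""
        out, self.ranges = self.ranges, []
        return out
```

With this sketch, discards of sectors 0-8, 8-16 and 32-40 followed by a write
to 4-6 leave three ranges to flush: 0-4, 6-16 and 32-40. When to call flush()
(idle time, a timer, before spin-down) is left open here, as it is in the
thread.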

Why should the block layer be forced to maintain something that is
probably of use for only a limited number of cases? For example, the
devices I work on already maintain their own mapping of host-visible
LBAs to underlying storage, and I suspect that most such devices do.
So, you are duplicating something that we already do, and there is no
way that I am aware of to synchronise the two.

All we really need, I believe, is for the UNMAP requests to come down
to us with writes barriered until we respond. It is a relatively
cheap operation, although writes that are already in the cache and
not yet committed to disk present some issues if an UNMAP request comes
down for recently written blocks.

> I highlight the lower layers because mdraid is also going to have to
> be in the mix if raid5/6 is in use.  ie. At a minimum it will have to
> adjust the block range to align with the stripe boundaries.
>
> Greg
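
As a rough illustration of the stripe-alignment point, one possible reading is
that a discard range gets shrunk inward to whole stripes, with the partial
stripes at either end left alone. That reading is an assumption, as are the
function name align_to_stripes and the stripe size used in the example; none
of this is taken from mdraid.

```python
def align_to_stripes(start, length, stripe_sectors):
    """Shrink the range [start, start + length) inward to whole stripes.

    Returns (aligned_start, aligned_length) in sectors, or None when the
    range does not cover even one full stripe.
    """
    end = start + length
    first = -(-start // stripe_sectors) * stripe_sectors  # round start up
    last = (end // stripe_sectors) * stripe_sectors       # round end down
    if last <= first:
        return None
    return first, last - first
```

Assuming a hypothetical stripe of 384 sectors (say, three data disks with
128-sector chunks), a discard of 1000 sectors starting at sector 100 shrinks
to 384 sectors starting at sector 384, while a discard of 500 sectors at the
same offset covers no full stripe and is dropped.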

-- 
Regards,
Richard Sharpe