linux-kernel - Re: RFC: default group_isolation to 1, remove option

Date:	Mon, 7 Mar 2011 15:41:50 -0800
From:	Justin TerAvest <teravest@...gle.com>
To:	Jens Axboe <axboe@...nel.dk>
Cc:	Vivek Goyal <vgoyal@...hat.com>,
	Chad Talbott <ctalbott@...gle.com>,
	Nauman Rafique <nauman@...gle.com>,
	Divyesh Shah <dpshah@...gle.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	Corrado Zoccolo <czoccolo@...il.com>
Subject: Re: RFC: default group_isolation to 1, remove option

On Mon, Mar 7, 2011 at 12:47 PM, Jens Axboe <axboe@...nel.dk> wrote:
> On 2011-03-07 21:46, Vivek Goyal wrote:
>> On Mon, Mar 07, 2011 at 09:32:54PM +0100, Jens Axboe wrote:
>>
>> [..]
>>>> So given the fact that per-ioc-per-disk accounting of request descriptors
>>>> makes the accounting complicated and also makes it hard for block IO
>>>> controller to use it, the other approach of implementing per group limit
>>>> and per-group-per-bdi congested might be reasonable. Having said that, the
>>>> patch I had written for per group descriptor was also not necessarily very
>>>> simple.
>>>
>>> So before all of this gets over designed a lot... If we get rid of the
>>> one remaining direct buffered writeback in bdp(), then only the flusher
>>> threads should be sending huge amounts of IO. So if we attack the
>>> problem from that end instead, have it do that accounting in the bdi.
>>> With that in place, I'm fairly confident that we can remove the request
>>> limits.
>>>
>>> Basically just replace the congestion_wait() in there with a bit of
>>> accounting logic. Since it's per bdi anyway, we don't even have to
>>> maintain that state in the bdi itself. It can remain in the thread
>>> stack.
>>
>> Moving the accounting up sounds interesting. For cgroup stuff we again
>> shall have to do something additional, like having per cgroup per bdi
>> flusher threads, or maintaining the number of pending IOs per group and
>> having the flusher thread not submit IOs for groups which have lots of
>> pending IOs (to avoid a faster group getting blocked behind a slower one).
>
> So since there are at least two use cases, we could easily provide
> helpers to do that sort of blocking to not throw too much work at it.
>
> I think we are making progress :-)

This generally sounds good to me, though I didn't think per-cgroup limits
were terribly complicated.

I wanted to make a quick note-- it sounds like part of the intent here is to
avoid doing any page tracking in the page_cgroup structure, but I think that
we will inevitably have to do some tracking there for css ids, to provide
isolation between buffered writers. I'd like to send out a patchset soon
to track buffered writers, but we should probably work out the request
descriptor limits first.
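
To make the per-group pending-IO idea quoted above concrete, here is a toy
userspace model. It is only a sketch and not kernel code: the names Group,
flusher_pass and complete, and the per_group_limit parameter, are all
hypothetical and do not correspond to anything in the block layer. It only
shows that a flusher which skips groups already at their limit keeps a fast
group moving while a slow group's IO is still outstanding.

```python
class Group:
    """Hypothetical stand-in for a cgroup's writeback state on one bdi."""

    def __init__(self, name, dirty):
        self.name = name
        self.dirty = dirty      # pages still waiting to be written back
        self.pending = 0        # IOs submitted but not yet completed


def flusher_pass(groups, per_group_limit):
    """One pass of a single flusher thread over all groups.

    A group that already has per_group_limit IOs in flight is skipped
    instead of blocking the flusher, so other groups still make progress.
    Returns the group names in submission order.
    """
    submitted = []
    for g in groups:
        while g.dirty > 0 and g.pending < per_group_limit:
            g.dirty -= 1
            g.pending += 1
            submitted.append(g.name)
    return submitted


def complete(group, n):
    """Model the device completing up to n of the group's in-flight IOs."""
    done = min(n, group.pending)
    group.pending -= done
    return done
```

If only the fast group's IOs complete between two passes, the second pass
submits IO for the fast group alone; the slow group stays at its limit and
does not hold up the flusher.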


>
> --
> Jens Axboe
>
>