linux-kernel - Re: [PATCH 21/25] io-controller: Per cgroup request descriptor support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090721140134.GB540@redhat.com>
Date:	Tue, 21 Jul 2009 10:01:34 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Nauman Rafique <nauman@...gle.com>
Cc:	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	linux-kernel@...r.kernel.org,
	containers@...ts.linux-foundation.org, dm-devel@...hat.com,
	jens.axboe@...cle.com, dpshah@...gle.com, lizf@...fujitsu.com,
	mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
	ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
	taka@...inux.co.jp, jmoyer@...hat.com, dhaval@...ux.vnet.ibm.com,
	balbir@...ux.vnet.ibm.com, righi.andrea@...il.com,
	m-ikeda@...jp.nec.com, jbaron@...hat.com, agk@...hat.com,
	snitzer@...hat.com, akpm@...ux-foundation.org, peterz@...radead.org
Subject: Re: [PATCH 21/25] io-controller: Per cgroup request descriptor
	support

On Mon, Jul 20, 2009 at 10:55:31PM -0700, Nauman Rafique wrote:
> On Mon, Jul 20, 2009 at 10:37 PM, Gui
> Jianfeng<guijianfeng@...fujitsu.com> wrote:
> > Vivek Goyal wrote:
> >> o Currently a request queue has got fixed number of request descriptors for
> >>   sync and async requests. Once the request descriptors are consumed, new
> >>   processes are put to sleep and they effectively become serialized. Because
> >>   sync and async queues are separate, async requests don't impact sync ones
> >>   but if one is looking for fairness between async requests, that is not
> >>   achievable if request queue descriptors become bottleneck.
> >>
> >> o Make request descriptor's per io group so that if there is lots of IO
> >>   going on in one cgroup, it does not impact the IO of other group.
> >>
> >> o This is just one relatively simple way of doing things. This patch will
> >>   probably change after the feedback. Folks have raised concerns that in
> >>   hierchical setup, child's request descriptors should be capped by parent's
> >>   request descriptors. May be we need to have per cgroup per device files
> >>   in cgroups where one can specify the upper limit of request descriptors
> >>   and whenever a cgroup is created one needs to assign request descritor
> >>   limit making sure total sum of child's request descriptor is not more than
> >>   of parent.
> >>
> >>   I guess something like memory controller. Anyway, that would be the next
> >>   step. For the time being, we have implemented something simpler as follows.
> >>
> >> o This patch implements the per cgroup request descriptors. request pool per
> >>   queue is still common but every group will have its own wait list and its
> >>   own count of request descriptors allocated to that group for sync and async
> >>   queues. So effectively request_list becomes per io group property and not a
> >>   global request queue feature.
> >>
> >> o Currently one can define q->nr_requests to limit request descriptors
> >>   allocated for the queue. Now there is another tunable q->nr_group_requests
> >>   which controls the requests descriptr limit per group. q->nr_requests
> >>   supercedes q->nr_group_requests to make sure if there are lots of groups
> >>   present, we don't end up allocating too many request descriptors on the
> >>   queue.
> >>
> >
> >  Hi Vivek,
> >
> >  In order to prevent q->nr_requests from becoming the bottle-neck of allocating
> >  requests, whether we can update nr_requests accordingly when allocating or removing
> >  a cgroup?
> 
> Vivek,
> I agree with Gui here. In fact, it does not make much sense to keep
> the nr_requests limit if we already have per cgroup limit in place.
> This change also simplifies code quite a bit, as we can get rid of all
> that sleep_on_global logic.
> 

Hi Nauman, Gui,

There were few reasons to keep a total limit on number of request
descriptors (q->nr_requests) apart from per group limit.

- We have this notion of queue being congested or not depending on out of
  q->nr_requests how many are currently being used. Writeback threads,
  some filesystems and other places make use of this information to either
  not to block or to avoid pushing too much of data on device if queue is
  congested. 

  With q->nr_requests removed, how do you define queue full and congested
  semantics?

- I think slee_on_global logic makes sense even without q->nr_requests.
  Assume that a group allows request descriptor allocation but due to lack
  of memory, allocation fails. Where do you make this process wait to
  attempt next time? Making all such failed processes on gloabl list on
  queue instead of per group list makes more sense to me for following
  reasons.

	- If this is the first request allocation from the group and we
 	  make the process sleep on group list, it will never be woken up
	  as no request from that group will complete.

	- If there are many processes who failed request descriptor
	  allocation, when some request completes, I think it is more
	  fair to wake these up in FIFO manner to try out allocation again
	  instead of waiting for request to complete from the group
	  process belongs to. The reason being that io controller did not
          fail the request descriptor allocation.

  So even if you get rid of q->nr_requests, you still shall have to have
  some logic of global wait list where failed allocations can wait.

- It is backward compatible and there are less chances of higher layers
  being broken due to this.


Gui, I think automatic updation of q->nr_requests is probably not a very
good thing. It is user defined tunable and user does not expect this to
change automatically.

At this point of time I really can't think of simpler and cleaner way.
Ideas are welcome.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/