Message-ID: <20100202041448.GA17333@in.ibm.com>
Date: Tue, 2 Feb 2010 09:44:48 +0530
From: Bharata B Rao <bharata@...ux.vnet.ibm.com>
To: Paul Turner <pjt@...gle.com>
Cc: Bharata B Rao <bharata.rao@...il.com>,
linux-kernel@...r.kernel.org,
Dhaval Giani <dhaval.giani@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Gautham R Shenoy <ego@...ibm.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Pavel Emelyanov <xemul@...nvz.org>,
Herbert Poetzl <herbert@...hfloor.at>,
Avi Kivity <avi@...hat.com>,
Chris Friesen <cfriesen@...tel.com>,
Paul Menage <menage@...gle.com>,
Mike Waychison <mikew@...gle.com>
Subject: Re: [RFC v5 PATCH 0/8] CFS Hard limits - v5
On Mon, Feb 01, 2010 at 10:25:11AM -0800, Paul Turner wrote:
> On Mon, Feb 1, 2010 at 3:04 AM, Paul Turner <pjt@...gle.com> wrote:
> > On Mon, Feb 1, 2010 at 12:21 AM, Bharata B Rao
> > <bharata@...ux.vnet.ibm.com> wrote:
> >> On Thu, Jan 28, 2010 at 08:26:08PM -0800, Paul Turner wrote:
> >>> On Thu, Jan 28, 2010 at 7:49 PM, Bharata B Rao <bharata.rao@...il.com> wrote:
> >>> > On Sat, Jan 9, 2010 at 2:15 AM, Paul Turner <pjt@...gle.com> wrote:
> >>> >>
> >>> >> What are your thoughts on using a separate mechanism for the general case? A
> >>> >> draft proposal follows:
> >>> >>
> >>> >> - Maintain a global run-time pool for each tg. The runtime specified by the
> >>> >> user represents the value that this pool will be refilled to each period.
> >>> >> - We continue to maintain the local notion of runtime/period in each cfs_rq,
> >>> >> and continue to accumulate runtime locally there.
> >>> >>
> >>> >> Upon locally exceeding the period, acquire new credit from the global pool
> >>> >> (either under lock or more likely using atomic ops). This can either be in
> >>> >> fixed steppings (e.g. 10ms, could be tunable) or following some quasi-curve
> >>> >> variant with historical demand.
> >>> >>
> >>> >> One caveat here is that there is some over-commit in the system: the local
> >>> >> differences of runtime vs period represent additional credit beyond the global
> >>> >> pool. However, it should not be possible to consistently exceed limits since
> >>> >> the rate of refill is gated by the runtime being input into the system via the
> >>> >> per-tg pool.
> >>> >>
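To make the "fixed steppings" idea above concrete, here is a minimal userspace
sketch of acquiring a slice from the global pool with atomic ops. The names
(tg_pool, tg_pool_acquire) and the microsecond units are illustrative
assumptions on my part, not taken from the patch set:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct tg_pool {
        int64_t quota;              /* runtime the pool is refilled to each period (us) */
        _Atomic int64_t remaining;  /* runtime left in the global pool this period */
};

/* Called on period expiry: refill the global pool back up to the quota. */
static void tg_pool_refill(struct tg_pool *p)
{
        atomic_store(&p->remaining, p->quota);
}

/*
 * Pull up to 'slice' (the fixed stepping) from the global pool for one
 * local cfs_rq. Returns the amount granted; 0 means the caller must throttle.
 */
static int64_t tg_pool_acquire(struct tg_pool *p, int64_t slice)
{
        int64_t old = atomic_load(&p->remaining);

        while (old > 0) {
                int64_t grant = old < slice ? old : slice;

                /* On failure 'old' is reloaded with the current value; retry. */
                if (atomic_compare_exchange_weak(&p->remaining, &old, old - grant))
                        return grant;
        }
        return 0;
}

int main(void)
{
        struct tg_pool pool = { .quota = 50000 };   /* 50ms quota */
        int64_t got;

        tg_pool_refill(&pool);
        got = tg_pool_acquire(&pool, 10000);        /* ask for a 10ms slice */
        printf("granted %lld us, %lld us left in global pool\n",
               (long long)got, (long long)atomic_load(&pool.remaining));
        return 0;
}
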
> >>> >
> >>> > We borrow from what is actually available as spare (spare = unused or
> >>> > remaining). With a global pool, I see that would be difficult. Is the
> >>> > inability/difficulty of keeping the global pool in sync with the actual
> >>> > available spare time the reason for the over-commit?
> >>> >
> >>>
> >>> We maintain two pools, a global pool (new) and a per-cfs_rq pool
> >>> (similar to existing rt_bw).
> >>>
> >>> When consuming time you charge vs your local bandwidth until it is
> >>> expired; at this point you must either refill from the global pool, or
> >>> throttle.
> >>>
> >>> The "slack" in the system is the sum of unconsumed time in local pools
> >>> from the *previous* global pool refill. This is bounded above, per cpu,
> >>> by the amount by which a local pool is refilled at each expiry. We call
> >>> the size of a refill a 'slice'.
> >>>
> >>> e.g.
> >>>
> >>> Task limit of 50ms, slice=10ms, 4 cpus, period of 500ms
> >>>
> >>> Task A runs on cpus 0 and 1 for 5ms each, then blocks.
> >>>
> >>> When A first executes on each cpu we take slice=10ms from the global
> >>> pool of 50ms and apply it to the local rq. Execution then proceeds vs
> >>> local pool.
> >>>
> >>> Current state is: 5 ms in local pools on {0,1}, 30ms remaining in global pool
> >>>
> >>> Upon period expiration we issue a global pool refill. At this point we have:
> >>> 5 ms in local pools on {0,1}, 50ms remaining in global pool.
> >>>
> >>> That 10ms of slack time is over-commit in the system. However, it
> >>> should be clear that this can only be a local effect since over any
> >>> period of time the rate of input into the system is limited by the
> >>> global pool refill rate.
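
The bookkeeping in this example can be traced with a small single-threaded
simulation (all times in ms). The names (period_refill, account) and the fixed
arrays are illustrative assumptions, not taken from the patch set; the sketch
just reproduces the 5ms-per-local-pool / 30ms-global state above and the 10ms
of slack left after the refill:

#include <stdio.h>

#define NR_CPUS 4
#define QUOTA   50      /* tg runtime per period, ms */
#define SLICE   10      /* fixed refill stepping, ms */

static long global_remaining;           /* per-tg global pool */
static long local_remaining[NR_CPUS];   /* per-cfs_rq local pools */

/* Period expiry: the global pool goes back to the quota; slack left in the
 * local pools persists, which is the over-commit being discussed. */
static void period_refill(void)
{
        global_remaining = QUOTA;
}

/* Charge 'delta' ms of execution on 'cpu', pulling a slice from the global
 * pool whenever the local pool runs dry; throttle when nothing is left. */
static void account(int cpu, long delta)
{
        while (delta > 0) {
                if (local_remaining[cpu] == 0) {
                        long grant = global_remaining < SLICE ?
                                     global_remaining : SLICE;

                        if (grant == 0) {
                                printf("cpu%d: throttled\n", cpu);
                                return;
                        }
                        global_remaining -= grant;
                        local_remaining[cpu] = grant;
                }
                long run = delta < local_remaining[cpu] ?
                           delta : local_remaining[cpu];

                local_remaining[cpu] -= run;
                delta -= run;
        }
}

int main(void)
{
        period_refill();
        account(0, 5);          /* task A: 5ms on cpu 0 */
        account(1, 5);          /* task A: 5ms on cpu 1, then blocks */
        printf("locals: {%ld, %ld} ms, global: %ld ms\n",
               local_remaining[0], local_remaining[1], global_remaining);

        period_refill();        /* period expiration */
        printf("after refill: global %ld ms, %ld ms slack in local pools\n",
               global_remaining,
               local_remaining[0] + local_remaining[1]);
        return 0;
}
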
> >>
> >> With the same setup as above, consider 5 such tasks which block after
> >> consuming 5ms each, so now we have 25ms of slack time. If 5 cpu hogs start
> >> running in the next bandwidth period, they would consume this 25ms in
> >> addition to the 50ms from that period, i.e. 75ms against a 50ms quota. So we
> >> gave 50% extra to a group in a bandwidth period. Just wondering how common
> >> such scenarios could be.
> >>
> >
> > Yes, within a single given period you may exceed your reservation due
> > to slack. However, of note is that across any 2 successive periods
> > you are guaranteed to be within your reservation, i.e. usage over the
> > two periods <= 2*runtime, as available slack means that you
> > under-consumed your previous period.
> >
> > For those needing a hard guarantee (independent of amelioration
> > strategies), halving the configured period would then provide this across
> > their target period with the basic v1 implementation.
> >
>
> Actually, now that I think about it, this observation only holds when
> the slack is consumed within the second of the two periods. It should
> be restated something like:
>
> for any n contiguous periods your maximum usage is n*runtime +
> nr_cpus*slice; note that the slack term is constant and is dominated for
> any observation window involving several periods.
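
As a quick sanity check of this bound with the numbers used earlier in the
thread (runtime=50ms, slice=10ms, 4 cpus), here is a small sketch (the loop
and variable names are mine, purely illustrative). Since the nr_cpus*slice
slack term is constant, the relative over-run shrinks as the observation
window grows:

#include <stdio.h>

int main(void)
{
        const double runtime = 50.0, slice = 10.0;  /* ms, from the example above */
        const int nr_cpus = 4;
        int n;

        /* Maximum usage over n contiguous periods: n*runtime + nr_cpus*slice. */
        for (n = 1; n <= 16; n *= 2) {
                double max_usage = n * runtime + nr_cpus * slice;

                printf("n=%2d: max usage %5.0fms of %5.0fms quota (%.0f%%)\n",
                       n, max_usage, n * runtime,
                       100.0 * max_usage / (n * runtime));
        }
        return 0;
}
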
Ok. We are talking about 'hard limits' here, and it looks like there is
a theoretical possibility of exceeding the limit often. Need to understand
how good/bad this is in real life.
Regards,
Bharata.