linux-kernel - Re: [patch 00/16] sched: per-entity load-tracking

Message-ID: <CAPM31RJOzCBE6N6v-oRV+O=LC_NVAg6Y6LCgYXYrdOyrpLD2kA@mail.gmail.com>
Date:	Fri, 5 Oct 2012 02:07:08 -0700
From:	Paul Turner <pjt@...gle.com>
To:	Benjamin Segall <bsegall@...gle.com>
Cc:	Jan H. Schönherr <schnhrr@...tu-berlin.de>,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Venki Pallipadi <venki@...gle.com>,
	Mike Galbraith <efault@....de>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Nikunj A Dadhania <nikunj@...ux.vnet.ibm.com>,
	Morten Rasmussen <Morten.Rasmussen@....com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Namhyung Kim <namhyung@...nel.org>
Subject: Re: [patch 00/16] sched: per-entity load-tracking

On Mon, Sep 24, 2012 at 10:16 AM, Benjamin Segall <bsegall@...gle.com> wrote:
> "Jan H. Schönherr" <schnhrr@...tu-berlin.de> writes:
>
>> Hi Paul.
>>
>> On 23.08.2012 16:14, pjt@...gle.com wrote:
>>> Please find attached the latest version for CFS load-tracking.
>>
>> Originally, I thought this series also took care of
>> the leaf-cfs-runqueue ordering issue described here:
>>
>> http://lkml.org/lkml/2011/7/18/86
>>
>> Now that I have had a closer look, I see that it does not take
>> care of it.
>>
>> Is there still any reason why the leaf_cfs_rq-list must be sorted?
>> Or could we just get rid of the ordering requirement now?
>
> Ideally yes, since a parent's __update_cfs_rq_tg_load_contrib and
> update_cfs_shares still depend on accurate values in
> runnable_load_avg/blocked_load_avg from its children. That said, nothing
> should completely fall over; it would make load decay take longer to
> propagate to the root.
>>
>> (That seems easier than to fix the issue, as I suspect that
>> __update_blocked_averages_cpu() might still punch some holes
>> in the hierarchy in some edge cases.)
>
> Yeah, I suspect it's possible that the parent ends up with a slightly
> lower runnable_avg_sum if they're both hovering around the max value
> since it isn't quite continuous, and it might be the case that this
> difference is large enough to require one more tick to decay to zero.

OK so coming back to this.  I had a look at this last week and
realized I'd managed to pervert my original intent.

Specifically, the idea here was that, barring numerical rounding errors
around LOAD_AVG_MAX, we can guarantee that a parent's runnable average
is greater than or equal to its child's, since a parent is runnable
whenever its child is runnable by definition.  Provided we fix up
possible rounding errors (e.g. with a clamp), this then guarantees
we'll always remove child nodes before their parent.
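
To make the clamp idea concrete, here is a minimal user-space sketch.
It is not code from the series: struct toy_entity, TOY_LOAD_AVG_MAX
(an arbitrary cap, not the kernel's value) and toy_clamp_ancestors()
are hypothetical names made up for illustration.  It only shows the
invariant being restored after rounding has let a parent slip below
its child.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LOAD_AVG_MAX 1000u  /* arbitrary illustrative cap */

/* Hypothetical stand-in for a scheduling entity's tracked sum. */
struct toy_entity {
    uint32_t runnable_avg_sum;
    struct toy_entity *parent;  /* NULL at the root */
};

/*
 * Walk from a child towards the root and raise any ancestor whose sum
 * has fallen below its child's, so that parent >= child holds again.
 */
void toy_clamp_ancestors(struct toy_entity *se)
{
    for (; se->parent; se = se->parent) {
        assert(se->runnable_avg_sum <= TOY_LOAD_AVG_MAX);
        if (se->parent->runnable_avg_sum < se->runnable_avg_sum)
            se->parent->runnable_avg_sum = se->runnable_avg_sum;
    }
}
```

With the invariant in place, a parent's sum cannot reach zero before
its child's does, which is what the child-before-parent removal
argument relies on.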

So I did this.  Then I thought: oh dear.  When I'd previously proposed
the above as a resolution for out-of-order removal I had not tackled
the problem of correct accounting on bandwidth-constrained entities.
It turns out we end up having to "stop" time to handle this
efficiently / correctly.  But this means that we can then no longer
depend on the constraint above as the sums on a sub-tree can
potentially become out of sync.
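
A toy illustration of how stopping time can break the constraint
follows.  Everything in it is an assumption for the sake of the
example: struct toy_avg, its clock_stopped flag and the halve-per-period
decay are hypothetical simplifications, not the accounting used in the
series.  The point is only that an entity whose clock is stopped keeps
its sum while its parent keeps decaying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entity's tracked runnable sum. */
struct toy_avg {
    uint32_t sum;
    int clock_stopped;  /* 1 while the entity's time is "stopped" */
};

/* Toy decay: halve once per elapsed period, unless time is stopped. */
void toy_decay(struct toy_avg *a, unsigned int periods)
{
    unsigned int i;

    assert(a != NULL);
    if (a->clock_stopped)
        return;
    for (i = 0; i < periods && a->sum; i++)
        a->sum >>= 1;
}

/* The ordering argument needs this to hold: parent sum >= child sum. */
int toy_parent_covers_child(const struct toy_avg *parent,
                            const struct toy_avg *child)
{
    return parent->sum >= child->sum;
}
```

Starting both at 800 and decaying for two periods leaves the parent at
200 and the stopped child at 800, so toy_parent_covers_child() no
longer holds.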

So I got back to this again tonight and spent a few hours looking at
some alternate approaches to resolve this.  There are a few games we
can play here, but after all of that I now re-realize we still won't
handle an on-list grand-parent correctly when the parent/child are
not on tree; and that this is fundamentally an issue with enqueue's
ordering; it does not require any hole punching from
parent-before-child removal.

I suspect we might want to do a segment splice on enqueue after all.
Let me sleep on it.
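
For readers following along, one possible reading of "segment splice
on enqueue" is sketched below in plain user-space C.  This is an
assumption about the idea, not code from the series: struct toy_rq and
the toy_* helpers are hypothetical, and the list is a hand-rolled
circular list rather than the kernel's list_head.  The off-list part
of the hierarchy is inserted as one contiguous run, child first,
directly before the nearest ancestor that is already on the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq; not the kernel's struct. */
struct toy_rq {
    struct toy_rq *parent;       /* NULL for the top level */
    struct toy_rq *prev, *next;  /* links in the leaf list */
    int on_list;
};

/* Circular list with a sentinel head; the head is never an ancestor. */
void toy_list_init(struct toy_rq *head)
{
    head->parent = NULL;
    head->prev = head->next = head;
    head->on_list = 0;
}

static void toy_insert_before(struct toy_rq *pos, struct toy_rq *node)
{
    node->prev = pos->prev;
    node->next = pos;
    pos->prev->next = node;
    pos->prev = node;
    node->on_list = 1;
}

/*
 * Enqueue leaf and every off-list ancestor as one segment, child
 * first, placed directly before the nearest on-list ancestor (or at
 * the tail if no ancestor is on the list).
 */
void toy_enqueue_segment(struct toy_rq *head, struct toy_rq *leaf)
{
    struct toy_rq *anchor = head;  /* before head == list tail */
    struct toy_rq *rq;

    assert(head != NULL && leaf != NULL);

    /* Find the nearest ancestor that is already on the list. */
    for (rq = leaf; rq; rq = rq->parent) {
        if (rq->on_list) {
            anchor = rq;
            break;
        }
    }

    /* Insert bottom-up; each node lands just before the anchor. */
    for (rq = leaf; rq && !rq->on_list; rq = rq->parent)
        toy_insert_before(anchor, rq);
}
```

In the on-list grand-parent case, with the grand-parent already queued
ahead of some unrelated entry, enqueueing the child this way yields
child, parent, grand-parent in that order, so children still precede
their ancestors when the list is walked.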

- Paul