Message-ID: <a4a7a4de-c58d-d667-a4b3-0f7bfb2b09f1@joelfernandes.org>
Date: Fri, 30 Sep 2022 13:42:05 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Youssef Esmat <youssefesmat@...gle.com>,
Qais Yousef <qais.yousef@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>, juri.lelli@...hat.com,
vincent.guittot@...aro.org,
Dietmar Eggemann <dietmar.eggemann@....com>,
Thomas Gleixner <tglx@...utronix.de>, bristot@...hat.com,
clark.williams@...il.com, bigeasy@...utronix.de,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: Sum of weights idea for CFS PI
On 9/30/2022 11:44 AM, Youssef Esmat wrote:
> Hi Everyone!
Hi Youssef,
(Youssef is new to LKML, though in no way new to OS or software development. I
gave him the usual 'don't-top-post' chat already - fyi).
> I am not sure we should care about A's sleeping pattern. The case we
> care about is when A is running or wants to run but can't because it
> is blocked on C. In that case C should get the weight of A as if A was
> running.
Just to clarify - Youssef did mean the sum of the weights of the different
tasks in the chain, not just a single task's weight (he confirmed on chat that
this is what he meant).
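To make that concrete, here is a minimal sketch of what "sum of weights of
the chain" could look like (the task struct and blocked_on field below are
purely illustrative, not actual kernel API): walk the blocked-on chain from
the waiter and accumulate the waiters' weights into a boost for the final
lock holder.

struct task {
	int weight;			/* CFS load weight, e.g. W or 2W */
	struct task *blocked_on;	/* task holding the lock we wait on */
};

/*
 * Extra weight the lock holder at the end of the chain should receive:
 * the sum of the weights of every waiter along the way.
 */
static int chain_boost(struct task *waiter)
{
	int boost = 0;
	struct task *t;

	for (t = waiter; t && t->blocked_on; t = t->blocked_on)
		boost += t->weight;

	return boost;
}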
> Ideally this is also a temporary boost since critical sections should
> be relatively small, so erring on the side of giving C slightly more
> runtime would be safe I think.
True. Though I would not hold my breath on user space never holding a lock for
very long periods of time. But I agree that should generally be true.
thanks,
- Joel
>
> Thanks,
> Youssef
>
> On Fri, Sep 30, 2022 at 8:49 AM Qais Yousef <qais.yousef@....com> wrote:
>>
>> Hi Joel
>>
>> I'm interested in the topic; if I can be CCed on any future discussions I'd
>> appreciate it :)
>>
>> On 09/29/22 16:38, Joel Fernandes wrote:
>>> Hi Peter, all,
>>>
>>> Just following up on the idea Peter suggested at LPC22 of using a sum of
>>> weights to solve the CFS priority inversion issues via priority inheritance.
>>> I am not sure if a straightforward summation of the weights of dependencies
>>> in the chain is sufficient (or whether it may cause too much unfairness).
>>>
>>> I think it will work if all the tasks on the CPU are at 100% utilization:
>>>
>>> Say if you have 4 tasks (A, B, C, D) running and each one has equal
>>> weight (W) except for A which has twice the weight (2W).
>>> So the CPU bandwidth distribution is (assuming all are running):
>>> A: 2 / 5
>>> B, C, D: 1 / 5
>>>
>>> Say 3 of the 4 tasks (A, B and C) are part of a classical priority
>>> inversion scenario.
>>>
>>> Say now A blocks on a lock whose owner C is running. Because A has blocked,
>>> B gets 1/3 bandwidth, whereas it should have been limited to 1/5. To remedy
>>> this, say you give C a weight of 2W. B now gets 1/4 bandwidth - still not
>>> fair, since B is eating away CPU bandwidth and causing the very priority
>>> inversion we want to remedy.
>>>
>>> The correct bandwidth distribution should be (B and D unchanged):
>>>
>>> B = 1/5
>>> D = 1/5
>>> C = 3/5
>>>
>>> This means that C's weight should be 3W, and B and D should be W each as
>>> before. So indeed, C's new weight is its original weight PLUS the weight
>>> of A - that's needed to keep the CPU usage of the other tasks (B, D) in
>>> check, so that C makes forward progress on behalf of A and the other tasks
>>> don't eat into the CPU utilization.
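>>>
>>> To double-check the arithmetic, here is a tiny standalone program (plain
>>> C, nothing kernel-specific) that computes each runnable task's share as
>>> weight / total_weight, the way CFS distributes bandwidth:
>>>
>>> #include <stdio.h>
>>>
>>> static void shares(const char *names, const int *w, int n)
>>> {
>>> 	int i, total = 0;
>>>
>>> 	for (i = 0; i < n; i++)
>>> 		total += w[i];
>>> 	for (i = 0; i < n; i++)
>>> 		printf("%c: %d/%d\n", names[i], w[i], total);
>>> }
>>>
>>> int main(void)
>>> {
>>> 	/* A is blocked on C and C is boosted to 3W: runnable are B, C, D. */
>>> 	int boosted[] = { 1, 3, 1 };	/* B = W, C = 3W, D = W */
>>>
>>> 	shares("BCD", boosted, 3);	/* prints B: 1/5, C: 3/5, D: 1/5 */
>>> 	return 0;
>>> }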
>>>
>>> However, I think this will kinda fall apart if A is asleep 50% of the time
>>> (assume the sleep is because of I/O and unrelated to the PI chain).
>>>
>>> Because now, if all were running (and assuming no PI dependencies), with A
>>> runnable only 50% of the time, the bandwidth of B, C and D would each be
>>> divided into 2 components:
>>>
>>> a. when A is running, it would be as above.
>>> b. but if A was sleeping, B, C, and D would get 1/3.
>>>
>>> So on average, B, C and D each get: (1/3 + 1/5) / 2 = 8/30. This leaves A
>>> with 6/30, i.e. 1/5, of the bandwidth.
>>
>> The average metric is an interesting one. It can be confusing to reason
>> about, too.
>>
>> I think we have 3 events to take into account here, not 2:
>>
>> a. when A is running and NOT blocked on C.
>> b. when A wants to run but is BLOCKED on C.
>> c. A is sleeping.
>>
>> This means A, B, C and D's shares will be:
>>
>> A , B , C , D
>> a. 2/5, 1/5, 1/5, 1/5
>> b. - , 1/5, 3/5, 1/5
>> c. - , 1/3, 1/3, 1/3
>>
>> Since A is sleeping 50% of the time, I don't think we can assume an equal
>> distribution across the 3 events (we can't just divide by 3).
>>
>> I believe we can assume that
>>
>> a. occurs 25% of the time
>> b. occurs 25% of the time
>> c. occurs 50% of the time
>>
>> I *think* this should provide something more representative.
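>>
>> As a quick sanity check, the expected shares under that split are just the
>> probability-weighted sums of the rows above (standalone C, only doing the
>> expected-value arithmetic):
>>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>> 	/* Per-event shares for A, B, C, D, from the table above. */
>> 	double ev[3][4] = {
>> 		{ 2.0/5, 1.0/5, 1.0/5, 1.0/5 },	/* a: A running, not blocked */
>> 		{ 0.0,   1.0/5, 3.0/5, 1.0/5 },	/* b: A blocked, C boosted */
>> 		{ 0.0,   1.0/3, 1.0/3, 1.0/3 },	/* c: A sleeping */
>> 	};
>> 	double p[3] = { 0.25, 0.25, 0.50 };	/* assumed probabilities */
>> 	int t, e;
>>
>> 	for (t = 0; t < 4; t++) {
>> 		double share = 0.0;
>>
>> 		for (e = 0; e < 3; e++)
>> 			share += p[e] * ev[e][t];
>> 		printf("%c: %.3f\n", "ABCD"[t], share);
>> 	}
>> 	return 0;	/* prints A: 0.100, B: 0.267, C: 0.367, D: 0.267 */
>> }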
>>
>>>
>>> But now say A happens to block on a lock that C is holding. You would boost
>>> C to weight 3W, which gives it 3/5 (or 18/30) as we saw above - more than
>>> what C should actually get.
>>>
>>> C should get (8/30 + 6/30 = 14/30) AFAICS.
>>>
>>> Hopefully one can see that a straight summation of weights is not enough. It
>>> needs to be something like:
>>>
>>> C's new weight = C's original weight + (A's weight) * (A's utilization)
>>>
>>> Or something along those lines - otherwise the inherited weight may be too
>>> large to properly solve the inversion.
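>>>
>>> As a rough sketch of that formula (hypothetical helper; assuming a
>>> PELT-style utilization in [0, SCHED_CAPACITY_SCALE], i.e. 0..1024 - none
>>> of this is existing kernel API):
>>>
>>> #define SCHED_CAPACITY_SCALE	1024
>>>
>>> static unsigned long boosted_weight(unsigned long holder_weight,
>>> 				    unsigned long waiter_weight,
>>> 				    unsigned long waiter_util)
>>> {
>>> 	/* C's new weight = C's weight + A's weight * A's utilization */
>>> 	return holder_weight +
>>> 	       waiter_weight * waiter_util / SCHED_CAPACITY_SCALE;
>>> }
>>>
>>> With W = 1024: A is 2W at 50% utilization (util = 512), so
>>> boosted_weight(1024, 2048, 512) = 1024 + 1024 = 2W, rather than the full
>>> 3W a straight summation would give.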
>>>
>>> Any thoughts on this? You mentioned you had some notes on this and/or proxy
>>> execution - could you share them?
>>
>> I assume we'll be using the rt-mutex inheritance property to handle this? If
>> this was discussed during a talk, I'd appreciate a link to it.
>>
>> At a past OSPM conference we brought up an issue with performance inversion,
>> where a task running on a smaller (or, more generically, slower) CPU holds
>> the lock and causes massive delays for waiters. This is an artefact of DVFS.
>> For HMP, there's an additional cause due to the unequal capacities of the
>> CPUs.
>>
>> Proxy execution seems to be a nice solution to all of these problems, but
>> it's a long way away. I'm interested to learn how this inheritance will be
>> implemented, and whether there are any userspace conversion issues, i.e., do
>> we need to convert all locks to rt-mutex locks?
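>>
>> (For what it's worth, userspace pthread mutexes can already opt in to the
>> kernel's PI-futex path - which uses rt_mutex under the hood - via the
>> standard POSIX protocol attribute, so at least that class of locks would
>> not need a new type:
>>
>> #include <pthread.h>
>>
>> static int init_pi_mutex(pthread_mutex_t *m)
>> {
>> 	pthread_mutexattr_t attr;
>> 	int ret;
>>
>> 	pthread_mutexattr_init(&attr);
>> 	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
>> 	if (!ret)
>> 		ret = pthread_mutex_init(m, &attr);
>> 	pthread_mutexattr_destroy(&attr);
>> 	return ret;
>> }
>>
>> Whether the CFS weight inheritance would plumb through the same path is of
>> course an open question.)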
>>
>>
>> Thanks
>>
>> --
>> Qais Yousef