Message-ID: <af55979a-a19e-d6b8-314d-b573d0dcf513@bytedance.com>
Date:   Wed, 22 Mar 2023 11:12:15 +0800
From:   Hao Jia <jiahao.os@...edance.com>
To:     Joel Fernandes <joel@...lfernandes.org>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, mingo@...nel.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com,
        mgorman@...hsingularity.net, linux-kernel@...r.kernel.org,
        Josh Don <joshdon@...gle.com>,
        Vineeth Pillai <vineethrp@...il.com>,
        Vineeth Pillai <vineethrp@...gle.com>
Subject: Re: [External] Re: [PATCH] sched/core: Minor optimize
 pick_next_task() when core-sched enable



On 2023/3/22 Joel Fernandes wrote:
> On Mon, Mar 20, 2023 at 4:55 AM Hao Jia <jiahao.os@...edance.com> wrote:
>>
>> kindly ping...
>>
>> On 2023/3/8 Hao Jia wrote:
>>> If core-sched is enabled, sometimes we will traverse each CPU on the core
>>> to find the highest priority task 'max' on the entire core, and then try
>>> and find a runnable task that matches @max.
>>> But in the following case, the runnable task we choose is not the best.
>>>
>>> core max: task2 (cookie 0)
>>>
>>>        rq0                             rq1
>>> task0(cookie non-zero)                task2(cookie 0)
>>> task1(cookie 0)
>>> task3(cookie 0)
>>> ...
>>>
>>> pick-task: idle                       pick-task: task2
>>>
>>> CPU0 and CPU1 are two CPUs on the same core, task0 and task2 are the
>>> highest priority tasks on rq0 and rq1 respectively, task2 is @max
>>> on the entire core.
>>>
>>> In the case that 'max' has a zero cookie, instead of continuing to
>>> search for a runnable task on rq0 that matches @max's cookie, we
>>> choose idle for rq0 directly.
>>> At this time, it is obviously better to choose task1 to run on rq0,
>>> which would increase CPU utilization.
>>> Therefore, we queue tasks with zero cookies in core_tree, and record
>>> the number of non-zero cookie tasks of each rq to detect the status
>>> of the sched-core.
> 
> I do remember this as a known issue (more of a known but unimplemented
> optimization) which happens when you have a high priority non-cookie
> task which is in front of several low priority ones on the same
> thread/rq. Adding +Vineeth Pillai to see if he remembers the issue.

Thank you for sharing the information, I will check the previous emails.


> 
> I can try to take a look at it this week to make sense of your patch.
> The code in upstream has changed quite a bit since we did this work on
> the older kernels, so allow some time to page it all in.

My patch makes non-cookie tasks also queue in the core_tree when 
CONFIG_SCHED_CORE is defined.
This allows us to quickly find the highest-priority non-cookie task on 
this rq through sched_core_find(). But as Peter said, it makes the 
overhead of the second rb tree unconditional.

Finding the highest-priority non-cookie task is complex and 
time-consuming, especially with CONFIG_FAIR_GROUP_SCHED enabled.

I am now trying to let the forced-idle CPU look for the highest-priority 
non-cookie task through a mechanism similar to queue_balance_callback.
I'm not sure if this is feasible; do you have a better suggestion?

> 
> Meanwhile, could you please provide some more details of your
> usecase/workload and how this patch improves it? Also out of
> curiosity, is bytedance using core scheduling for security or
> something else?
> 

At ByteDance, we try to use core-sched to deploy high-priority services 
(online) and low-priority services (offline) on the same physical machine.

core-sched can ensure that offline tasks (cookie non-zero) will not 
affect online's L1 and L2 caches while online is running. In our 
scenario, core-sched is very useful for online services.

Thanks,
Hao


> Thanks,
> 
>   - Joel
> 
> 
>>>
>>> Signed-off-by: Hao Jia <jiahao.os@...edance.com>
>>> ---
>>>    kernel/sched/core.c       | 29 +++++++++++++++--------------
>>>    kernel/sched/core_sched.c |  9 ++++-----
>>>    kernel/sched/sched.h      |  1 +
>>>    3 files changed, 20 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index af017e038b48..765cd14c52e1 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -236,8 +236,8 @@ void sched_core_enqueue(struct rq *rq, struct task_struct *p)
>>>    {
>>>        rq->core->core_task_seq++;
>>>
>>> -     if (!p->core_cookie)
>>> -             return;
>>> +     if (p->core_cookie)
>>> +             rq->cookied_count++;
>>>
>>>        rb_add(&p->core_node, &rq->core_tree, rb_sched_core_less);
>>>    }
>>> @@ -246,11 +246,16 @@ void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
>>>    {
>>>        rq->core->core_task_seq++;
>>>
>>> +     if (p->core_cookie)
>>> +             rq->cookied_count--;
>>> +
>>>        if (sched_core_enqueued(p)) {
>>>                rb_erase(&p->core_node, &rq->core_tree);
>>>                RB_CLEAR_NODE(&p->core_node);
>>>        }
>>>
>>> +     if (!sched_core_enabled(rq))
>>> +             return;
>>>        /*
>>>         * Migrating the last task off the cpu, with the cpu in forced idle
>>>         * state. Reschedule to create an accounting edge for forced idle,
>>> @@ -370,7 +375,7 @@ static void sched_core_assert_empty(void)
>>>        int cpu;
>>>
>>>        for_each_possible_cpu(cpu)
>>> -             WARN_ON_ONCE(!RB_EMPTY_ROOT(&cpu_rq(cpu)->core_tree));
>>> +             WARN_ON_ONCE(cpu_rq(cpu)->cookied_count);
>>>    }
>>>
>>>    static void __sched_core_enable(void)
