Message-ID: <4ad76a27-dfb4-23be-fdb3-49c0780df670@didichuxing.com>
Date:   Fri, 30 Sep 2022 08:58:51 +0800
From:   Honglei Wang <wanghonglei@...ichuxing.com>
To:     K Prateek Nayak <kprateek.nayak@....com>,
        Chen Yu <yu.c.chen@...el.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Rik van Riel <riel@...riel.com>,
        Aaron Lu <aaron.lu@...el.com>,
        Abel Wu <wuyun.abel@...edance.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] sched/fair: Choose the CPU where short task is
 running during wake up

Hi Prateek,


On 2022/9/30 01:34, K Prateek Nayak wrote:
> Hello Honglei,
> 
> Thank you for looking into this.
> 
> On 9/29/2022 12:29 PM, Honglei Wang wrote:
>>
>> [..snip..]
>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 914096c5b1ae..7519ab5b911c 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
>>>>>        return 1;
>>>>>    }
>>>>>    +/*
>>>>> + * If a task switches in and then voluntarily relinquishes the
>>>>> + * CPU quickly, it is regarded as a short running task.
>>>>> + * sysctl_sched_min_granularity is chosen as the threshold,
>>>>> + * as this value is the minimal slice if there are too many
>>>>> + * runnable tasks, see __sched_period().
>>>>> + */
>>>>> +static int is_short_task(struct task_struct *p)
>>>>> +{
>>>>> +    return (p->se.sum_exec_runtime <=
>>>>> +        (p->nvcsw * sysctl_sched_min_granularity));
>>>>> +}
>>>>> +
>>>>>    /*
>>>>>     * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>>>>     * soonest. For the purpose of speed we only consider the waking and previous
>>>>> @@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>>>>        if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>>>>>            return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>>>>>    -    if (sync && cpu_rq(this_cpu)->nr_running == 1)
>>>>> +    if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
>>>>> +        is_short_task(cpu_curr(this_cpu)))
>>
>> This seems to slightly break the idle (or about-to-be-idle) purpose of wake_affine_idle() here. Maybe we can do something like this?
>>
>> if ((sync || is_short_task(cpu_curr(this_cpu))) && cpu_rq(this_cpu)->nr_running == 1)
> 
> I believe this will still cause performance degradation on split-LLC
> systems for Stream-like workloads. Based on the logs below, we can
> have a situation as follows:
> 
> 	stream-4135    [029] d..2.   353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
> 
> Here sync is 0, but is_short_task() may return 1 and the
> current_rq->nr_running is 1. This will lead to two Stream threads
> getting placed on the same LLC during wakeup, which will cause cache
> contention and performance degradation.
> 

What I meant was that we should not break the purpose of 
wake_affine_idle(). The 'nr_running == 1' check makes sure there won't 
be a long queue on this CPU, and that might help in the benchmark tests 
as well. The short code snippet I sent was probably not well thought 
out; it was only meant as a hint.
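
To make the difference between the three conditions discussed above 
concrete, here is a small userspace sketch. It is not kernel code: 
struct wake_state, the cond_*() helper names, and the 750000 ns 
granularity value are assumptions made up for illustration; only the 
boolean expressions are taken from the diff and from my earlier 
suggestion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state wake_affine_idle() looks at. */
struct wake_state {
	bool sync;                    /* sync flag passed by the waker */
	unsigned int nr_running;      /* cpu_rq(this_cpu)->nr_running */
	uint64_t sum_exec_runtime_ns; /* p->se.sum_exec_runtime of the current task */
	uint64_t nvcsw;               /* p->nvcsw, voluntary context switches */
};

/* Assumed value, for illustration only; the real sysctl is tunable. */
static const uint64_t min_granularity_ns = 750000ULL;

/* Same comparison as is_short_task() in the RFC patch. */
static bool is_short_task(const struct wake_state *s)
{
	return s->sum_exec_runtime_ns <= s->nvcsw * min_granularity_ns;
}

/* Condition before the patch. */
static bool cond_before(const struct wake_state *s)
{
	return s->sync && s->nr_running == 1;
}

/* Condition in the RFC patch: a short task alone is enough. */
static bool cond_rfc(const struct wake_state *s)
{
	return (s->sync && s->nr_running == 1) || is_short_task(s);
}

/* Condition I suggested: nr_running == 1 is always required. */
static bool cond_suggested(const struct wake_state *s)
{
	return (s->sync || is_short_task(s)) && s->nr_running == 1;
}
```

With sync=0, nr_running=1 and a short current task (the case in the 
trace), cond_before() is false while cond_rfc() and cond_suggested() 
are both true, which is why the suggestion does not avoid the Stream 
case. The two only differ when nr_running > 1: cond_rfc() can still 
pick this_cpu, cond_suggested() cannot.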

I saw your test results in the other mail. They look great and are 
exactly what I was thinking we should test.

Thanks,
Honglei

>>
>> Thanks,
>> Honglei
>>
>>>>
>>>> This change seems to optimize for affine wakeup, which benefits
>>>> tasks with a producer-consumer pattern but is not ideal for Stream.
>>>> Currently the logic ends up doing an affine wakeup even if the sync
>>>> flag is not set:
>>>>
>>>>             stream-4135    [029] d..2.   353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
>>>>             stream-4135    [029] d..2.   353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>>>>             stream-4135    [029] d..2.   353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
>>>>             <idle>-0       [030] dNh2.   353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
> 
> This is the exact situation observed during our testing.
> 
>>>>
>>>> [..snip..]
>>>>   
> --
> Thanks and Regards,
> Prateek
