linux-kernel - Re: [RFC PATCH v4 0/2] sched/fair: Choose the CPU where short task is running during wake up

Message-ID: <Y8laqT+7OX0I3pCu@chenyu5-mobl1>
Date:   Thu, 19 Jan 2023 22:58:49 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     K Prateek Nayak <kprateek.nayak@....com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        "Rik van Riel" <riel@...riel.com>, Aaron Lu <aaron.lu@...el.com>,
        Abel Wu <wuyun.abel@...edance.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Daniel Bristot de Oliveira" <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Hillf Danton <hdanton@...a.com>,
        Honglei Wang <wanghonglei@...ichuxing.com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        "Tianchen Ding" <dtcccc@...ux.alibaba.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Don <joshdon@...gle.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v4 0/2] sched/fair: Choose the CPU where short task
 is running during wake up

On 2023-01-16 at 16:23:13 +0530, K Prateek Nayak wrote:
> Hello Chenyu,
> 
> On 12/30/2022 8:17 AM, Chen Yu wrote:
> > On 2022-12-29 at 12:46:59 +0530, K Prateek Nayak wrote:
> >> Hello Chenyu,
> >>
> >> Including the detailed results from testing below.
> >>
> >> tl;dr
> >>
> >> o There seem to be 3 noticeable regressions:
> >>   - tbench for lower number of clients. The schedstat data shows
> >>     an increase in wait time.
> >>   - SpecJBB MultiJVM performance drops as the workload prefers
> >>     an idle CPU over a busy one.
> >>   - Unixbench-pipe benchmark performance drops.
> >>
> >> o Most benchmark numbers remain the same.
> >>
> >> o Small gains seen for ycsb-mongodb and unixbench-syscall.
> >>
> 
> Please ignore the last test results. The tests did not use
> exactly the same config for the tip and sis_short kernels, which
> led to more overhead in the network stack for the sis_short kernel.
> The longer wait time seen in sched_stat data for tbench
> was a result of each loop taking longer to finish.
> 
> I reran the benchmarks on the latest tip, making sure the
> configs are identical this time, and noticed only one
> regression, in Spec-JBB Critical-jOPS.
>
> tl;dr
> 
> o tbench sees good improvement in the throughput when
>   the machine is fully loaded and beyond.
> o Some unixbench test cases show improvement, as does
>   ycsb-mongodb in NPS2 and NPS4 mode.
> o Most benchmark results are the same.
> o SpecJBB Critical-jOPS are still down. I'll share full
>   schedstat dump for tasks separately with you.
> 
Thanks Prateek! I checked the task duration for these workloads;
they fall into the short duration task range, so SIS_SHORT takes
effect.
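
To illustrate what "falls into the short duration task range" means,
here is a minimal userspace sketch, not the code from the patch. It
assumes a task's average run duration can be estimated as total
runtime divided by the number of voluntary context switches. The
struct, the helper names, and the 500us threshold are all hypothetical
and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff for illustration only; not taken from the patch. */
#define SHORT_TASK_THRESHOLD_NS 500000ULL

/* Hypothetical per-task counters, loosely modelled on scheduler stats. */
struct task_stats {
	uint64_t sum_exec_runtime_ns;   /* total time spent on a CPU */
	uint64_t nr_voluntary_switches; /* times the task went to sleep */
};

/* Average on-CPU time per run, i.e. per voluntary switch. */
static uint64_t avg_run_duration_ns(const struct task_stats *t)
{
	/* A task that has never slept: treat its whole runtime as one run. */
	if (t->nr_voluntary_switches == 0)
		return t->sum_exec_runtime_ns;

	return t->sum_exec_runtime_ns / t->nr_voluntary_switches;
}

/* A task counts as "short" if its average run is below the cutoff. */
static bool is_short_task(const struct task_stats *t)
{
	return avg_run_duration_ns(t) < SHORT_TASK_THRESHOLD_NS;
}
```

With these assumptions, a task that ran 1ms in total across 100
voluntary switches averages 10us per run and would be classified as
short, so a wakeup heuristic in the spirit of SIS_SHORT would apply.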
[snip]
> 
> SpecJBB Critical-jOPS performance is known to suffer when tasks
> queue behind each other. I'll share the data separately. I do see
> the average wait_sum go up 1.3%. The Max-jOPS throughput, however,
> is identical on both kernels, which means sis_short does not affect
> the overall throughput; only for the critical jobs do we see
> the regression, due to possible queuing of tasks.
>
If I understand correctly, most workloads on Zen3 prefer to be spread
across idle CPUs in the same LLC, except when the system is
extremely busy (and SIS_UTIL handles that). As Zen3 has 8C/16T per LLC,
it is unlikely to trigger the race condition I described in PATCH 2/2.
I'm preparing a patch to also take nr_llc into consideration.

thanks,
Chenyu
> --
> Thanks and Regards,
> Prateek