Message-ID: <e8872bd9-1c6b-fb12-b535-3d37740a0306@linux.alibaba.com>
Date: Fri, 31 May 2019 11:01:51 +0800
From: Aaron Lu <aaron.lu@...ux.alibaba.com>
To: Aubrey Li <aubrey.intel@...il.com>,
Vineeth Remanan Pillai <vpillai@...italocean.com>
Cc: Nishanth Aravamudan <naravamudan@...italocean.com>,
Julien Desfossez <jdesfossez@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Paul Turner <pjt@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Subhra Mazumdar <subhra.mazumdar@...cle.com>,
Frédéric Weisbecker <fweisbec@...il.com>,
Kees Cook <keescook@...omium.org>,
Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
On 2019/5/30 22:04, Aubrey Li wrote:
> On Thu, May 30, 2019 at 4:36 AM Vineeth Remanan Pillai
> <vpillai@...italocean.com> wrote:
>>
>> Third iteration of the Core-Scheduling feature.
>>
>> This version fixes mostly correctness related issues in v2 and
>> addresses performance issues. Also, addressed some crashes related
>> to cgroups and cpu hotplugging.
>>
>> We have tested and verified that incompatible processes are not
>> selected during schedule. In terms of performance, the impact
>> depends on the workload:
>> - on CPU intensive applications that use all the logical CPUs with
>> SMT enabled, enabling core scheduling performs better than nosmt.
>> - on mixed workloads with considerable io compared to cpu usage,
>> nosmt seems to perform better than core scheduling.
>
> My testing scripts could not complete on this version. I found that the
> number of cpu utilization report entries didn't reach my minimal
> requirement, so I wrote a simple script to verify.
> ====================
> $ cat test.sh
> #!/bin/sh
>
> for i in `seq 1 10`
> do
>         echo `date`, $i
>         sleep 1
> done
> ====================
Is the shell put into some cgroup and assigned a tag, or is it simply untagged?
>
> Normally it works as below:
>
> Thu May 30 14:13:40 CST 2019, 1
> Thu May 30 14:13:41 CST 2019, 2
> Thu May 30 14:13:42 CST 2019, 3
> Thu May 30 14:13:43 CST 2019, 4
> Thu May 30 14:13:44 CST 2019, 5
> Thu May 30 14:13:45 CST 2019, 6
> Thu May 30 14:13:46 CST 2019, 7
> Thu May 30 14:13:47 CST 2019, 8
> Thu May 30 14:13:48 CST 2019, 9
> Thu May 30 14:13:49 CST 2019, 10
>
> When the system was running 32 sysbench threads and
> 32 gemmbench threads, it worked as below (the system
> had ~38% idle time):
Are the two workloads assigned different tags?
And how many cores/threads do you have?
> Thu May 30 14:14:20 CST 2019, 1
> Thu May 30 14:14:21 CST 2019, 2
> Thu May 30 14:14:22 CST 2019, 3
> Thu May 30 14:14:24 CST 2019, 4 <=======x=
> Thu May 30 14:14:25 CST 2019, 5
> Thu May 30 14:14:26 CST 2019, 6
> Thu May 30 14:14:28 CST 2019, 7 <=======x=
> Thu May 30 14:14:29 CST 2019, 8
> Thu May 30 14:14:31 CST 2019, 9 <=======x=
> Thu May 30 14:14:34 CST 2019, 10 <=======x=
This looks like "date" failed to get scheduled on some CPU
in time.
> And it got worse when the system was running the 64/64 case;
> the system still had ~3% idle time:
> Thu May 30 14:26:40 CST 2019, 1
> Thu May 30 14:26:46 CST 2019, 2
> Thu May 30 14:26:53 CST 2019, 3
> Thu May 30 14:27:01 CST 2019, 4
> Thu May 30 14:27:03 CST 2019, 5
> Thu May 30 14:27:11 CST 2019, 6
> Thu May 30 14:27:31 CST 2019, 7
> Thu May 30 14:27:32 CST 2019, 8
> Thu May 30 14:27:41 CST 2019, 9
> Thu May 30 14:27:56 CST 2019, 10
>
> Any thoughts?
My first reaction is: when the shell wakes up from sleep, it will
fork date. If the script is untagged while those workloads are
tagged, and all available cores are already running workload
threads, the forked date can lose to the running workload threads
because __prio_less() can't properly compare vruntime for tasks
on different CPUs. So those idle siblings can't run date and stay
idle instead. See my previous post on this:
https://lore.kernel.org/lkml/20190429033620.GA128241@aaronlu/
(Now that I re-read that post, I see that I didn't make it clear
there that se_bash and se_hog are assigned different tags, e.g.
hog is tagged and bash is untagged.)
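
To make the comparison problem concrete, here is a small userspace
model of it (illustrative only: the toy struct names, the numbers
and the normalization below are my own sketch, not the actual
__prio_less()/CFS code). Each runqueue advances its own
min_vruntime, so raw vruntime values taken from two different CPUs
carry an arbitrary per-CPU offset; one way to make them comparable
is to look at each task's lag relative to its own runqueue's
min_vruntime:
====================
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for cfs_rq and sched_entity, just enough to show
 * the cross-CPU comparison problem. */
struct toy_cfs_rq { unsigned long long min_vruntime; };
struct toy_se     { unsigned long long vruntime; struct toy_cfs_rq *cfs_rq; };

/* Broken: compares raw vruntime of tasks on different runqueues. */
static bool prio_less_raw(struct toy_se *a, struct toy_se *b)
{
	return a->vruntime < b->vruntime;
}

/* A possible fix (sketch only): compare each task's lag against its
 * own runqueue's min_vruntime so the per-CPU offset cancels out. */
static bool prio_less_normalized(struct toy_se *a, struct toy_se *b)
{
	long long lag_a = (long long)(a->vruntime - a->cfs_rq->min_vruntime);
	long long lag_b = (long long)(b->vruntime - b->cfs_rq->min_vruntime);

	return lag_a < lag_b;
}

int main(void)
{
	/* The shell's CPU happens to have a much larger min_vruntime
	 * than the workload's CPU (made-up values). */
	struct toy_cfs_rq shell_cpu = { .min_vruntime = 90000000ULL };
	struct toy_cfs_rq hog_cpu   = { .min_vruntime = 1000ULL };

	/* date just woke up, so it sits close to its rq's min_vruntime;
	 * the hog has been running and is a bit further ahead. */
	struct toy_se date = { .vruntime = 90000100ULL, .cfs_rq = &shell_cpu };
	struct toy_se hog  = { .vruntime = 1300ULL,     .cfs_rq = &hog_cpu };

	/* The raw comparison says the hog is higher priority purely
	 * because of the per-CPU offset; the normalized one lets the
	 * freshly woken date win. */
	printf("raw:        date wins? %d\n", prio_less_raw(&date, &hog));
	printf("normalized: date wins? %d\n", prio_less_normalized(&date, &hog));
	return 0;
}
====================
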
Siblings being forced idle is expected due to the nature of core
scheduling, but when two tasks belonging to two siblings are
competing to be scheduled, we should let the higher-priority one
win. The reason it used to work in v2 is probably that we
mistakenly allowed differently tagged tasks to be scheduled on the
same core at the same time; that is fixed in v3.
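
As a side note, the selection rule described above can be sketched
like this (hypothetical names and a toy structure, not the RFC's
actual pick_next_task(): "cookie" stands for the cgroup tag and
"lag" for a priority value that is assumed comparable across CPUs,
which is exactly the assumption the raw vruntime comparison
breaks):
====================
/* Toy candidate: cookie models the cgroup tag (0 = untagged) and
 * lag models a cross-CPU-comparable priority (lower runs first). */
struct candidate {
	const char   *name;
	unsigned long cookie;
	long long     lag;
};

static const struct candidate forced_idle = { "idle", 0, 0 };

/* Given each sibling's best runnable candidate, decide what each
 * sibling of the core actually runs. */
void pick_core(const struct candidate *a, const struct candidate *b,
	       const struct candidate **run_a, const struct candidate **run_b)
{
	if (a->cookie == b->cookie) {
		/* Compatible tags: both siblings run their own pick. */
		*run_a = a;
		*run_b = b;
		return;
	}

	/* Incompatible tags: the higher-priority task takes the core
	 * and the other sibling is forced idle.  If lag were a raw
	 * vruntime, this is where an untagged, just-woken date could
	 * lose and leave a sibling idling. */
	if (a->lag <= b->lag) {
		*run_a = a;
		*run_b = &forced_idle;
	} else {
		*run_a = &forced_idle;
		*run_b = b;
	}
}
====================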