Message-ID: <90e76dec-aacf-5619-1116-b4c5640a65a4@bytedance.com>
Date: Wed, 13 Jul 2022 18:25:58 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Josh Don <joshdon@...gle.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 6/7] sched/fair: skip busy cores in SIS search
On 7/11/22 8:02 PM, Chen Yu wrote:
>>> ...
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index d3e2c5a7c1b7..452eb63ee6f6 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5395,6 +5395,7 @@ void scheduler_tick(void)
>>>  	resched_latency = cpu_resched_latency(rq);
>>>  	calc_global_load_tick(rq);
>>>  	sched_core_tick(rq);
>>> +	update_overloaded_rq(rq);
>>
>> I didn't see this update in the idle path. Is this intentional?
>>
> It is intended to exclude the idle path. My thought was that, since
> avg_util already contains the historic activity, checking the cpu
> status on each idle entry doesn't seem to make much difference...
I presume the avg_util is used to decide how many cpus to scan, while
the update determines which cpus to scan.
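Roughly, the two compose in select_idle_cpu() like this (just a sketch to
illustrate the point, not the exact code of this series):

	/* SIS_UTIL: avg_util decides *how many* cpus may be scanned */
	nr = READ_ONCE(sd_share->nr_idle_scan) + 1;

	/* SIS_FILTER: the tick-time update decides *which* cpus to skip */
	cpumask_andnot(cpus, cpus, sdo_mask(sd_share));

	for_each_cpu_wrap(cpu, cpus, target + 1) {
		/* !has_idle_core path */
		if (!--nr)
			return -1;
		idle_cpu = __select_idle_cpu(cpu, p);
		if ((unsigned int)idle_cpu < nr_cpumask_bits)
			break;
	}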
>>>  	rq_unlock(rq, &rf);
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index f80ae86bb404..34b1650f85f6 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6323,6 +6323,50 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>>> #endif /* CONFIG_SCHED_SMT */
>>> +/* derived from group_is_overloaded() */
>>> +static inline bool rq_overloaded(struct rq *rq, int cpu, unsigned int imbalance_pct)
>>> +{
>>> +	if (rq->nr_running - rq->cfs.idle_h_nr_running <= 1)
>>> +		return false;
>>> +
>>> +	if ((SCHED_CAPACITY_SCALE * 100) <
>>> +	    (cpu_util_cfs(cpu) * imbalance_pct))
>>> +		return true;
>>> +
>>> +	if ((SCHED_CAPACITY_SCALE * imbalance_pct) <
>>> +	    (cpu_runnable(rq) * 100))
>>> +		return true;
>>
>> So the filter now contains cpus that are over-utilized or overloaded.
>> This goes a step further in making the filter reliable, but at the
>> cost of scan efficiency.
>>
> Right. Ideally, if there were a 'realtime' idle cpumask for SIS, the
> scan would be quite accurate. The issue is how to maintain this
> cpumask at low cost.
Yes indeed.
>> The idea behind my recent patches is to keep the filter radical,
>> but use it conservatively.
>>
> Do you mean updating the per-core idle filter frequently, but only
> propagating it to the LLC cpumask when the system is overloaded?
Not exactly. I want to update the filter (BTW there is only the LLC
filter, no per-core filters :)) whenever a core's state changes, and
apply it in the SIS domain scan only if the domain is busy enough.
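For the "once core state changes" part, one possible place (purely a
sketch, not what this patch does) is the idle-entry path where the
per-core idle state is already maintained:

void __update_idle_core(struct rq *rq)
{
	int core = cpu_of(rq);
	int cpu;

	rcu_read_lock();

	/* this cpu just went idle: refresh the LLC filter right away */
	update_overloaded_rq(rq);

	if (test_idle_cores(core, true))
		goto unlock;

	for_each_cpu(cpu, cpu_smt_mask(core)) {
		if (cpu == core)
			continue;

		if (available_idle_cpu(cpu)) {
			set_idle_cores(core, 1);
			break;
		}
	}
unlock:
	rcu_read_unlock();
}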
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +void update_overloaded_rq(struct rq *rq)
>>> +{
>>> +	struct sched_domain_shared *sds;
>>> +	struct sched_domain *sd;
>>> +	int cpu;
>>> +
>>> +	if (!sched_feat(SIS_FILTER))
>>> +		return;
>>> +
>>> +	cpu = cpu_of(rq);
>>> +	sd = rcu_dereference(per_cpu(sd_llc, cpu));
>>> +	if (unlikely(!sd))
>>> +		return;
>>> +
>>> +	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
>>> +	if (unlikely(!sds))
>>> +		return;
>>> +
>>> +	if (rq_overloaded(rq, cpu, sd->imbalance_pct)) {
>>
>> I'm not sure whether it is appropriate to use the LLC's imbalance_pct
>> here, because we are comparing within the LLC rather than between LLCs.
>>
> Right, the imbalance_pct should not be the LLC's; it could be the core
> domain's imbalance_pct instead.
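Something like this, perhaps (a sketch only: take the pct from the
lowest domain attached to the cpu; the fallback value is made up):

	struct sched_domain *sd_core;
	unsigned int pct;

	sd_core = rcu_dereference(rq->sd);		/* lowest attached domain */
	pct = sd_core ? sd_core->imbalance_pct : 117;	/* fallback, made up */

	if (rq_overloaded(rq, cpu, pct))
		cpumask_set_cpu(cpu, sdo_mask(sds));
	else
		cpumask_clear_cpu(cpu, sdo_mask(sds));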
>>> +		/* avoid duplicated write, mitigate cache contention */
>>> +		if (!cpumask_test_cpu(cpu, sdo_mask(sds)))
>>> +			cpumask_set_cpu(cpu, sdo_mask(sds));
>>> +	} else {
>>> +		if (cpumask_test_cpu(cpu, sdo_mask(sds)))
>>> +			cpumask_clear_cpu(cpu, sdo_mask(sds));
>>> +	}
>>> +}
>>> /*
>>> * Scan the LLC domain for idle CPUs; this is dynamically regulated by
>>> * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>>> @@ -6383,6 +6427,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>  		}
>>>  	}
>>> +	if (sched_feat(SIS_FILTER) && !has_idle_core && sd->shared)
>>> +		cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));
>>> +
>>>  	for_each_cpu_wrap(cpu, cpus, target + 1) {
>>>  		if (has_idle_core) {
>>>  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
>>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>>> index ee7f23c76bd3..1bebdb87c2f4 100644
>>> --- a/kernel/sched/features.h
>>> +++ b/kernel/sched/features.h
>>> @@ -62,6 +62,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
>>> */
>>> SCHED_FEAT(SIS_PROP, false)
>>> SCHED_FEAT(SIS_UTIL, true)
>>> +SCHED_FEAT(SIS_FILTER, true)
>>
>> The filter should only be enabled when there is a need. If the system
>> is idle enough, I don't think it's a good idea to exclude the
>> overloaded cpus from the domain scan, and making the filter a
>> sched_feat won't help with that.
>>
>> My latest patch only applies the filter when nr is less than
>> the LLC size.
> Do you mean only updating the filter (idle cpu mask), or only using the
> filter in SIS when the system meets: nr_running < LLC size?
>
In the SIS domain search, apply the filter when nr < LLC_size. But I
haven't tested this with SIS_UTIL, and in the SIS_UTIL case this
condition seems to be always true.
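Roughly (untested), right after the SIS_UTIL block that computes nr:

	if (sched_feat(SIS_UTIL)) {
		sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
		if (sd_share) {
			/* because !--nr is the condition to stop scan */
			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
			if (nr == 1)
				return -1;
		}
	}

	/*
	 * IIUC nr_idle_scan is derived from the LLC utilization and is
	 * bounded by the LLC weight, so nr < LLC size only fails when
	 * the LLC is close to idle, hence the condition being almost
	 * always true under SIS_UTIL.
	 */
	if (sched_feat(SIS_FILTER) && !has_idle_core && sd->shared &&
	    nr < per_cpu(sd_llc_size, target))
		cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));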
Thanks,
Abel