Message-ID: <Y01NPB4sa8Z98ntK@li-4a3a4a4c-28e5-11b2-a85c-a8d192c6f089.ibm.com>
Date:   Mon, 17 Oct 2022 14:40:28 +0200
From:   Alexander Gordeev <agordeev@...ux.ibm.com>
To:     John Stultz <jstultz@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        "Connor O'Brien" <connoro@...gle.com>,
        John Dias <joaodias@...gle.com>, Rick Yiu <rickyiu@...gle.com>,
        John Kacur <jkacur@...hat.com>,
        Qais Yousef <qais.yousef@....com>,
        Chris Redpath <chris.redpath@....com>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com,
        "J . Avila" <elavila@...gle.com>
Subject: Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores
 handling long softirqs

On Mon, Oct 03, 2022 at 11:20:32PM +0000, John Stultz wrote:
> From: Connor O'Brien <connoro@...gle.com>

Hi John, Connor,

I took a cursory look and have a couple of hopefully meaningful
comments, but mostly questions.

[...]

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 55f39c8f4203..3c628db807c8 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1599,6 +1599,44 @@ static void yield_task_rt(struct rq *rq)
>  #ifdef CONFIG_SMP
>  static int find_lowest_rq(struct task_struct *task);
>  
> +#ifdef CONFIG_RT_SOFTIRQ_OPTIMIZATION
> +#define __use_softirq_opt 1
> +/*
> + * Return whether the given cpu is currently non-preemptible
> + * while handling a potentially long softirq, or if the current
> + * task is likely to block preemptions soon because it is a
> + * ksoftirq thread that is handling slow softirq.

What are "slow" softirqs in this context, as opposed to "long" ones?

> + */
> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	u32 softirqs = per_cpu(active_softirqs, cpu) |
> +		       __cpu_softirq_pending(cpu);
> +	struct task_struct *cpu_ksoftirqd = per_cpu(ksoftirqd, cpu);
> +	struct task_struct *curr;
> +	struct rq *rq = cpu_rq(cpu);
> +	int ret;
> +
> +	rcu_read_lock();
> +	curr = READ_ONCE(rq->curr); /* unlocked access */

select_task_rq_rt() already takes the lock and reads curr
before calling this function. I think there is a way to
decompose this better.
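
Something along these lines perhaps (completely untested, and it
leaves open how rt_task_fits_cpu() would obtain curr) -- just to
illustrate the decomposition, with the caller passing in the curr
it already read:

static bool cpu_busy_with_softirqs(int cpu, struct task_struct *curr)
{
	/* Unsynchronized snapshot, approximate answers are fine */
	u32 softirqs = per_cpu(active_softirqs, cpu) |
		       __cpu_softirq_pending(cpu);

	return (softirqs & LONG_SOFTIRQ_MASK) &&
	       (curr == per_cpu(ksoftirqd, cpu) ||
		preempt_count() & SOFTIRQ_MASK);
}

That would also make the rcu_read_lock()/unlock() pair here
unnecessary.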

> +	ret = (softirqs & LONG_SOFTIRQ_MASK) &&
> +		 (curr == cpu_ksoftirqd ||

There is extra whitespace at the end of the line.

> +		  preempt_count() & SOFTIRQ_MASK);

Could you please clarify this whole check in more detail?

What is the point in checking whether a remote CPU is handling
a "long" softirq while the local one -- whose preempt_count() is
what actually gets checked here -- is handling any softirq?

> +	rcu_read_unlock();

Why does ret need to be calculated under the lock?

> +	return ret;
> +}
> +#else
> +#define __use_softirq_opt 0
> +static bool cpu_busy_with_softirqs(int cpu)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_RT_SOFTIRQ_OPTIMIZATION */
> +
> +static bool rt_task_fits_cpu(struct task_struct *p, int cpu)

To me, the new name is unfortunate, since it strips the notion
of the reason. Instead of "CPU fits/does not fit because of
capacity", it now reads as "CPU fits/does not fit because of ..."
what?

> +{
> +	return !cpu_busy_with_softirqs(cpu) && rt_task_fits_capacity(p, cpu);

I guess the order needs to be swapped, as rt_task_fits_capacity()
is rather "quick" while cpu_busy_with_softirqs() is rather "slow".

> +}
> +
>  static int
>  select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  {
> @@ -1894,14 +1934,17 @@ static int find_lowest_rq(struct task_struct *task)
>  		return -1; /* No other targets possible */
>  
>  	/*
> -	 * If we're on asym system ensure we consider the different capacities
> -	 * of the CPUs when searching for the lowest_mask.
> +	 * If we're using the softirq optimization or if we are
> +	 * on asym system, ensure we consider the softirq processing
> +	 * or different capacities of the CPUs when searching for the
> +	 * lowest_mask.
>  	 */
> -	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> +	if (__use_softirq_opt ||

Why use __use_softirq_opt and not IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION)?
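
I.e. (untested):

	if (IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION) ||
	    static_branch_unlikely(&sched_asym_cpucapacity)) {

which would make the extra __use_softirq_opt define unnecessary.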

> +	    static_branch_unlikely(&sched_asym_cpucapacity)) {
>  
>  		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
>  					  task, lowest_mask,
> -					  rt_task_fits_capacity);
> +					  rt_task_fits_cpu);
>  	} else {
>  
>  		ret = cpupri_find(&task_rq(task)->rd->cpupri,
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..35ee79dd8786 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -60,6 +60,13 @@ static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp
>  
>  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
>  
> +/*
> + * active_softirqs -- per cpu, a mask of softirqs that are being handled,
> + * with the expectation that approximate answers are acceptable and therefore
> + * no synchronization.
> + */
> +DEFINE_PER_CPU(u32, active_softirqs);

I guess all active_softirqs uses need to be coupled with an
IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION) check.
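
E.g. for the stores in __do_softirq() below, something like
(untested):

	if (IS_ENABLED(CONFIG_RT_SOFTIRQ_OPTIMIZATION))
		__this_cpu_write(active_softirqs, pending);

so that the stores get optimized out when the option is off.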

>  const char * const softirq_to_name[NR_SOFTIRQS] = {
>  	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
>  	"TASKLET", "SCHED", "HRTIMER", "RCU"
> @@ -551,6 +558,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  restart:
>  	/* Reset the pending bitmask before enabling irqs */
>  	set_softirq_pending(0);
> +	__this_cpu_write(active_softirqs, pending);
>  
>  	local_irq_enable();
>  
> @@ -580,6 +588,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  		pending >>= softirq_bit;
>  	}
>  
> +	__this_cpu_write(active_softirqs, 0);
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
>  	    __this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();

Thanks!
