linux-kernel - Re: [PATCH] sched/fair: no sync wakeup from interrupt context

Message-ID: <20220715100738.GD3493@suse.de>
Date:   Fri, 15 Jul 2022 11:07:38 +0100
From:   Mel Gorman <mgorman@...e.de>
To:     Libo Chen <libo.chen@...cle.com>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>, peterz@...radead.org,
        vincent.guittot@...aro.org, 21cnbao@...il.com,
        dietmar.eggemann@....com, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: Re: [PATCH] sched/fair: no sync wakeup from interrupt context

On Thu, Jul 14, 2022 at 01:21:14PM -0700, Libo Chen wrote:
> > state explicitly that "the interrupt CPU isn't as performance critical as
> > cache from its previous CPU" so that assumption was incorrect, at least
> > in your case. I don't have a counter example where the interrupt data *is*
> > more important than any other cache-hot data so the check can go.
> > 
> > I think a revert would not achieve what you want as a plain revert would
> > still allow an interrupt to pull a task from an arbitrary location as sync
> This is the tricky part, I didn't explain it well. For rds-stress, it's a
> lot (~30%) better to allow pulling across nodes when the waking CPU is idle.

Ah, the exact opposite then.

> I think this may be an example of interrupt data being more important.
> Something like below will help a lot for this particular benchmark
> (rds-stress):
> 
> if (available_idle_cpu(this_cpu))
>         return this_cpu;
> 

I see, but this will likely regress workloads that receive interrupts on
arbitrary CPUs that are not related to the task's preferred location. This
can happen for IO completions, for example, where interrupts can be
delivered round-robin to many CPUs in the system. It's all described in the
changelog for 7332dec055f2:

	Unfortunately, depending on the type of interrupt and IRQ
	configuration, there may not be a strong relationship between the
	CPU an interrupt was delivered on and the CPU a task was running
	on. For example, the interrupts could all be delivered to CPUs on
	one particular node due to the machine topology or IRQ affinity
	configuration. Another example is an interrupt for an IO completion
	which can be delivered to any CPU where there is no guarantee the
	data is either cache hot or even local.
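
To make the changelog's point concrete, here is a minimal userspace sketch,
not kernel code. It assumes a made-up machine with 8 CPUs split evenly across
2 nodes and completions delivered round-robin starting at CPU 0. Every name
in it (sketch_cpu_to_node, sketch_local_deliveries, the SKETCH_* constants)
is hypothetical and exists only for this illustration.

```c
/* Toy topology, assumed for illustration only: 8 CPUs, 4 per node. */
#define SKETCH_NR_CPUS       8
#define SKETCH_CPUS_PER_NODE 4

/* Map a CPU number to its node in the toy topology. */
static int sketch_cpu_to_node(int cpu)
{
	return cpu / SKETCH_CPUS_PER_NODE;
}

/*
 * Deliver nr_irqs completions round-robin over all CPUs, starting at
 * CPU 0, and count how many land on the same node as prev_cpu, the CPU
 * the task last ran on.
 */
static int sketch_local_deliveries(int prev_cpu, int nr_irqs)
{
	int local = 0;

	for (int i = 0; i < nr_irqs; i++) {
		int irq_cpu = i % SKETCH_NR_CPUS;

		if (sketch_cpu_to_node(irq_cpu) == sketch_cpu_to_node(prev_cpu))
			local++;
	}
	return local;
}
```

Under these assumptions only half of the deliveries land on the task's node,
wherever the task last ran, so the interrupt CPU says little about where the
task's data is.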

> still pulls the wakee task to that CPU across nodes irrespective of its
> previous CPU. And that's what this patch tries to address.
> 

> Mel, I am thinking about a follow-up patch like below then we can continue
> the discussion there since this is kinda a separate issue:
> 
> -	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> -		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> -
> 
> +       if (available_idle_cpu(this_cpu))
> +               if (cpus_share_cache(this_cpu, prev_cpu))
> +                       return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> +       else
> +               return this_cpu;
> 

That will also pull tasks cross-node and while it might work well for a
network stress test, it will hurt other cases where the interrupt data
is relatively unimportant to the waking task.
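
As a rough illustration of the difference, the following userspace sketch
models only the first check touched by the quoted diff. It is not the kernel
implementation: the toy_* helpers stand in for available_idle_cpu() and
cpus_share_cache(), the 8-CPU topology with 4 CPUs per last-level cache is
assumed, and TOY_NO_DECISION is a hypothetical marker meaning "fall through
to the remaining checks". In C the else in the quoted diff pairs with the
nearest if (the cpus_share_cache() test) regardless of its indentation, and
the sketch spells that out with braces.

```c
#include <stdbool.h>

/* Toy topology, assumed for illustration only. */
#define TOY_NR_CPUS      8
#define TOY_CPUS_PER_LLC 4
#define TOY_NO_DECISION  (-1)	/* hypothetical: no CPU chosen by this check */

/* Idle state per CPU; all CPUs start busy (zero-initialized). */
static bool toy_idle[TOY_NR_CPUS];

/* Stand-in for available_idle_cpu(). */
static bool toy_available_idle_cpu(int cpu)
{
	return toy_idle[cpu];
}

/* Stand-in for cpus_share_cache(): same LLC in the toy topology. */
static bool toy_cpus_share_cache(int a, int b)
{
	return a / TOY_CPUS_PER_LLC == b / TOY_CPUS_PER_LLC;
}

/* The check removed by the quoted diff: only acts within one LLC. */
static int toy_current_check(int this_cpu, int prev_cpu)
{
	if (toy_available_idle_cpu(this_cpu) &&
	    toy_cpus_share_cache(this_cpu, prev_cpu))
		return toy_available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;

	return TOY_NO_DECISION;
}

/* The check added by the quoted diff, with the else made explicit. */
static int toy_proposed_check(int this_cpu, int prev_cpu)
{
	if (toy_available_idle_cpu(this_cpu)) {
		if (toy_cpus_share_cache(this_cpu, prev_cpu))
			return toy_available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
		else
			return this_cpu;	/* idle waker wins even across LLCs */
	}

	return TOY_NO_DECISION;
}
```

With CPU 0 idle as the waking CPU and a busy CPU 5 as prev_cpu, the current
check makes no decision while the proposed one returns CPU 0, which is the
cross-node pull. When both CPUs share a cache the two variants agree.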

-- 
Mel Gorman
SUSE Labs