Message-ID: <20220715100738.GD3493@suse.de>
Date: Fri, 15 Jul 2022 11:07:38 +0100
From: Mel Gorman <mgorman@...e.de>
To: Libo Chen <libo.chen@...cle.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>, peterz@...radead.org,
vincent.guittot@...aro.org, 21cnbao@...il.com,
dietmar.eggemann@....com, linux-kernel@...r.kernel.org,
tglx@...utronix.de, Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: Re: [PATCH] sched/fair: no sync wakeup from interrupt context
On Thu, Jul 14, 2022 at 01:21:14PM -0700, Libo Chen wrote:
> > state explicitly that "the interrupt CPU isn't as performance critical as
> > cache from its previous CPU" so that assumption was incorrect, at least
> > in your case. I don't have a counter example where the interrupt data *is*
> > more important than any other cache-hot data so the check can go.
> >
> > I think a revert would not achieve what you want as a plain revert would
> > still allow an interrupt to pull a task from an arbitrary location as sync
> This is the tricky part, I didn't explain it well. For rds-stress, it's a
> lot (~30%) better to allow pulling across nodes when the waking CPU is idle.
Ah, the exact opposite then.
> I think this may be an example of interrupt data being more important.
> Something like below will help a lot for this particular benchmark (rds-stress):
>
>       if (available_idle_cpu(this_cpu))
>               return this_cpu;
>
I see, but this will likely regress for workloads that receive interrupts on
arbitrary CPUs that are not related to the task's preferred location. This
can happen for IO completions, for example, where interrupts can be delivered
round-robin to many CPUs in the system. It's all described in the changelog
for 7332dec055f2:
    Unfortunately, depending on the type of interrupt and IRQ
    configuration, there may not be a strong relationship between the
    CPU an interrupt was delivered on and the CPU a task was running
    on. For example, the interrupts could all be delivered to CPUs on
    one particular node due to the machine topology or IRQ affinity
    configuration. Another example is an interrupt for an IO completion
    which can be delivered to any CPU where there is no guarantee the
    data is either cache hot or even local.
> still pulls the wakee task to that CPU across nodes irrespective of its
> previous CPU. And that's what this patch tries to address.
>
> Mel, I am thinking about a follow-up patch like below then we can continue
> the discussion there since this is kinda a separate issue:
>
> -       if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> -               return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> -
>
> +       if (available_idle_cpu(this_cpu))
> +               if (cpus_share_cache(this_cpu, prev_cpu))
> +                       return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> +               else
> +                       return this_cpu;
>
That will also pull tasks cross-node and while it might work well for a
network stress test, it will hurt other cases where the interrupt data
is relatively unimportant to the waking task.
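
To make the difference concrete, here is a minimal user-space sketch that
models only the idle check being discussed, not the rest of the wakeup path.
Everything prefixed with toy_ is hypothetical and invented for illustration:
the struct, the helpers and the NO_CPU sentinel are assumptions that stand in
for available_idle_cpu(), cpus_share_cache() and "no decision here, fall
through to the remaining checks".

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TOY_CPUS 4
#define NO_CPU      (-1)  /* assumption: stands in for "fall through" */

/* Hypothetical toy topology: one idle flag and one LLC id per CPU. */
struct toy_topology {
	bool idle[NR_TOY_CPUS];
	int  llc_id[NR_TOY_CPUS];
};

/* Toy stand-in for available_idle_cpu(). */
static bool toy_cpu_idle(const struct toy_topology *t, int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	return t->idle[cpu];
}

/* Toy stand-in for cpus_share_cache(). */
static bool toy_share_cache(const struct toy_topology *t, int a, int b)
{
	assert(a >= 0 && a < NR_TOY_CPUS);
	assert(b >= 0 && b < NR_TOY_CPUS);
	return t->llc_id[a] == t->llc_id[b];
}

/* Check as removed by the quoted diff: an idle waking CPU is only
 * considered when it shares cache with the task's previous CPU. */
static int toy_pick_existing(const struct toy_topology *t,
			     int this_cpu, int prev_cpu)
{
	if (toy_cpu_idle(t, this_cpu) && toy_share_cache(t, this_cpu, prev_cpu))
		return toy_cpu_idle(t, prev_cpu) ? prev_cpu : this_cpu;
	return NO_CPU;
}

/* Check as added by the quoted diff: an idle waking CPU is chosen even
 * when it does not share cache with prev_cpu, i.e. a cross-node pull. */
static int toy_pick_proposed(const struct toy_topology *t,
			     int this_cpu, int prev_cpu)
{
	if (toy_cpu_idle(t, this_cpu)) {
		if (toy_share_cache(t, this_cpu, prev_cpu))
			return toy_cpu_idle(t, prev_cpu) ? prev_cpu : this_cpu;
		return this_cpu;
	}
	return NO_CPU;
}
```

With CPUs 0-1 on one LLC and CPUs 2-3 on another, an interrupt landing on an
idle CPU 0 for a task that last ran on CPU 2 is left alone by the first
function and pulled to CPU 0 by the second. Whether that pull helps depends
entirely on whether the interrupt data matters more to the task than its
cache footprint near prev_cpu, which the model cannot tell you.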
--
Mel Gorman
SUSE Labs