Message-ID: <20200116104428.GP2827@hirez.programming.kicks-ass.net>
Date: Thu, 16 Jan 2020 11:44:28 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Valentin Schneider <valentin.schneider@....com>
Cc: linux-kernel@...r.kernel.org, sudeep.holla@....com,
prime.zeng@...ilicon.com, dietmar.eggemann@....com,
morten.rasmussen@....com, mingo@...nel.org
Subject: Re: [PATCH] sched/topology: Assert non-NUMA topology masks don't
(partially) overlap
On Wed, Jan 15, 2020 at 04:09:15PM +0000, Valentin Schneider wrote:
> A "less intrusive" alternative is to assert the sd->groups list doesn't get
> re-written, which is a symptom of such bogus topologies. I've briefly
> tested this, you can have a look at it here:
>
> http://www.linux-arm.org/git?p=linux-vs.git;a=commit;h=e0ead72137332cbd3d69c9055ab29e6ffae5b37b
Something like that might still make sense. You can never be too careful,
right ;-)
> kernel/sched/topology.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6ec1e595b1d4..dfb64c08a407 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1879,6 +1879,42 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
> 	return sd;
> }
>
> +/*
> + * Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
> + * any two given CPUs at this (non-NUMA) topology level.
> + */
> +static bool topology_span_sane(struct sched_domain_topology_level *tl,
> +			      const struct cpumask *cpu_map, int cpu)
> +{
> +	int i;
> +
> +	/* NUMA levels are allowed to overlap */
> +	if (tl->flags & SDTL_OVERLAP)
> +		return true;
> +
> +	/*
> +	 * Non-NUMA levels cannot partially overlap - they must be either
> +	 * completely equal or completely disjoint. Otherwise we can end up
> +	 * breaking the sched_group lists - i.e. a later get_group() pass
> +	 * breaks the linking done for an earlier span.
> +	 */
> +	for_each_cpu(i, cpu_map) {
> +		if (i == cpu)
> +			continue;
> +		/*
> +		 * We should 'and' all those masks with 'cpu_map' to exactly
> +		 * match the topology we're about to build, but that can only
> +		 * remove CPUs, which only lessens our ability to detect
> +		 * overlaps
> +		 */
> +		if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
> +		    cpumask_intersects(tl->mask(cpu), tl->mask(i)))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> /*
> * Find the sched_domain_topology_level where all CPU capacities are visible
> * for all CPUs.
> @@ -1975,6 +2011,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> 				has_asym = true;
> 			}
>
> +			if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> +				goto error;
> +
> 			sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);
>
> 			if (tl == sched_domain_topology)
This is O(nr_cpus), but then, that function already is, so I don't see a
problem with this.
I'll take it, thanks!
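
For readers who want to poke at the "equal or disjoint" rule outside the kernel, here is a minimal userspace sketch of the check in the quoted patch. It is not the kernel code: it assumes at most 64 CPUs, stands in a plain uint64_t per CPU for tl->mask(cpu), and the names masks_partially_overlap(), span_sane_sketch() and topology_sane_sketch() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the cpumask helpers: two masks conflict when
 * they share at least one CPU but are not identical.
 */
static bool masks_partially_overlap(uint64_t a, uint64_t b)
{
	return a != b && (a & b) != 0;
}

/*
 * Sketch of the per-CPU check: mask[i] plays the role of tl->mask(i).
 * Returns false if the mask of 'cpu' partially overlaps any other mask.
 */
static bool span_sane_sketch(const uint64_t *mask, int nr_cpus, int cpu)
{
	/* Assumption of this sketch: one bit per CPU in a 64-bit word. */
	assert(nr_cpus > 0 && nr_cpus <= 64);
	assert(cpu >= 0 && cpu < nr_cpus);

	for (int i = 0; i < nr_cpus; i++) {
		if (i == cpu)
			continue;
		if (masks_partially_overlap(mask[cpu], mask[i]))
			return false;
	}
	return true;
}

/* Run the per-CPU check for every CPU, as the build loop would. */
static bool topology_sane_sketch(const uint64_t *mask, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!span_sane_sketch(mask, nr_cpus, cpu))
			return false;
	}
	return true;
}
```

With masks {0x3, 0x3, 0xc, 0xc} for four CPUs the sketch reports a sane level. Changing the second entry to 0x7 makes CPU1's span reach into CPU2's without matching it, which is the kind of partial overlap the patch warns about.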