Message-ID: <20140916032920.GH2840@worktop.localdomain>
Date: Tue, 16 Sep 2014 05:29:20 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Dave Hansen <dave@...1.net>
Cc: linux-kernel@...r.kernel.org, borislav.petkov@....com,
andreas.herrmann3@....com, mingo@...nel.org, hpa@...ux.intel.com,
ak@...ux.intel.com
Subject: Re: [PATCH] x86: Consider multiple nodes in a single socket to be
"sane"
On Mon, Sep 15, 2014 at 03:26:41PM -0700, Dave Hansen wrote:
>
> I'm getting the spew below when booting with Haswell (Xeon
> E5-2699) CPUs and the "Cluster-on-Die" (CoD) feature enabled in
> the BIOS.
What is this Cluster-on-Die thing? I've heard of it before but could
never find anything on it.
> This also fixes sysfs because CPUs with the same 'physical_package_id'
> in /sys/devices/system/cpu/cpu*/topology/ are not listed together
> in the same 'core_siblings_list'. This violates a statement from
> Documentation/ABI/testing/sysfs-devices-system-cpu:
>
> core_siblings: internal kernel map of cpu#'s hardware threads
> within the same physical_package_id.
>
> core_siblings_list: human-readable list of the logical CPU
> numbers within the same physical_package_id as cpu#.
No, that statement is wrong; it assumes physical_package_id is a good
identifier for a node, and clearly that is no longer true.
The idea is that core_siblings (or rather cpu_core_mask) is a mask of
all cores on a node.
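To illustrate with made-up numbers (one socket split into two CoD nodes
of four cores each, no HT), you would see something like:

  $ cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
  0
  $ cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
  0-3

That is, core_siblings_list covers cpu0's node rather than the whole
package, which is exactly the case the documentation text above never
anticipated.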
> The sysfs effects here cause an issue with the hwloc tool where
> it gets confused and thinks there are more sockets than are
> physically present.
Meh, so then we need another mask.
The important bit you didn't show was the scheduler domain setup. I
suspect it all works by accident, not by design.
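For reference: with CONFIG_SCHED_DEBUG=y and the sched_debug boot
parameter the domain construction is dumped at boot, so something like

  $ dmesg | grep -A4 'attaching sched-domain'

(illustrative invocation; the exact message text may differ by kernel
version) would show which levels survived and what spans they ended up
with.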
> diff -puN arch/x86/kernel/smpboot.c~hsw-cod-is-sane arch/x86/kernel/smpboot.c
> --- a/arch/x86/kernel/smpboot.c~hsw-cod-is-sane 2014-09-15 14:56:20.012314468 -0700
> +++ b/arch/x86/kernel/smpboot.c 2014-09-15 14:58:58.837506644 -0700
> @@ -344,10 +344,13 @@ static bool match_llc(struct cpuinfo_x86
> static bool match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> if (c->phys_proc_id == o->phys_proc_id) {
> - if (cpu_has(c, X86_FEATURE_AMD_DCM))
> - return true;
> -
> - return topology_sane(c, o, "mc");
> + /*
> + * We used to enforce that 'c' and 'o' be on the
> + * same node, but AMD's DCM and Intel's Cluster-
> + * on-Die (CoD) support both have physical
> + * processors that span NUMA nodes.
> + */
> + return true;
> }
> return false;
> }
This is wrong (and I suppose the AMD case was already wrong). That
function is supposed to match a multi-core group, which is very much
meant to be smaller than or equal to a node, not to span nodes.
The scheduler assumes SMT <= LLC <= MC <= NODE; if setting the MC mask
to cover multiple nodes works, it's by accident.
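To make that nesting concrete, here is a minimal user-space sketch
(illustrative masks, not taken from any real topology) that checks the
SMT <= LLC <= MC <= NODE ordering with plain bitmasks; an MC mask that
covers both nodes of a CoD socket fails the last check:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Made-up masks for one CoD socket: 8 cpus, two nodes of 4, all seen
 * from cpu0's point of view.  Bit n set == cpu n is in the mask.
 */
static const uint64_t smt  = 0x01;	/* cpu0 alone (no HT here)         */
static const uint64_t llc  = 0x0f;	/* cpus sharing cpu0's LLC slice   */
static const uint64_t node = 0x0f;	/* cpus in cpu0's NUMA node        */
static const uint64_t mc   = 0xff;	/* whole package: spans both nodes */

static bool subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;		/* every cpu in a is also in b */
}

int main(void)
{
	printf("SMT <= LLC : %d\n", subset(smt, llc));	/* 1, fine   */
	printf("LLC <= MC  : %d\n", subset(llc, mc));	/* 1, fine   */
	printf("MC <= NODE : %d\n", subset(mc, node));	/* 0, broken */
	return 0;
}

With a per-node MC mask (0x0f here) all three checks hold; making
match_mc() say "true" for the whole package is precisely what breaks
the last one.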