Message-ID: <56027DB2.50203@citrix.com>
Date: Wed, 23 Sep 2015 11:23:46 +0100
From: George Dunlap <george.dunlap@...rix.com>
To: Juergen Gross <jgross@...e.com>,
Dario Faggioli <dario.faggioli@...rix.com>
CC: "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"Andrew Cooper" <Andrew.Cooper3@...rix.com>,
"Luis R. Rodriguez" <mcgrof@...not-panic.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
"David Vrabel" <david.vrabel@...rix.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Stefano Stabellini <stefano.stabellini@...citrix.com>
Subject: Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling
domain hierarchy
On 09/23/2015 05:36 AM, Juergen Gross wrote:
> On 09/22/2015 06:22 PM, George Dunlap wrote:
>> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>>> One other thing I just discovered: there are other consumers of the
>>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>>
>>> I think we would want to avoid any optimizations based on those in
>>> drivers as well, not only in the scheduler.
>>
>> I'm beginning to lose the thread of the discussion here a bit.
>>
>> Juergen / Dario, could one of you summarize your two approaches, and the
>> (alleged) advantages and disadvantages of each one?
>
> Okay, I'll have a try:
>
> The problem we want to solve:
> -----------------------------
>
> The Linux kernel gathers cpu topology data during boot via the
> CPUID instruction on each processor coming online. This data is
> primarily used in the scheduler to decide which cpu a thread should
> be migrated to when a migration seems necessary. There are other users
> of the topology information in the kernel (e.g. some drivers try to do
> optimizations like core-specific queues/lists).
>
> When started in a virtualized environment the obtained data is next to
> useless or even wrong, as it reflects only the state at the time the
> system was booted. The hypervisor's scheduling of the (v)cpus changes
> the topology beneath the feet of the Linux kernel without this being
> reflected in the gathered topology information. So any decisions
> taken based on that data will be uninformed and possibly just wrong.
>
> The minimal solution is to change the topology data in the kernel in a
> way that all cpus are regarded as equal regarding their relation to each
> other (e.g. when migrating a thread to another cpu no cpu is preferred
> as a target).
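
To make the "all cpus are regarded as equal" idea concrete, here is a toy sketch in Python. It is not kernel code: the (socket, core, thread) triples, the distance values and the helper names are all assumptions made up for illustration. It only shows that with the boot-time topology one cpu stands out as the preferred migration target, while with a flattened topology every other cpu is equally good.

```python
# Hypothetical toy model, not kernel code: each cpu is described by the
# (socket, core, thread) triple a guest might derive from CPUID at boot.
BOOT_TOPOLOGY = {
    0: (0, 0, 0),
    1: (0, 0, 1),  # sibling thread of cpu 0
    2: (0, 1, 0),  # same socket, different core
    3: (1, 0, 0),  # different socket
}


def distance(topology, a, b):
    """Return an illustrative distance; smaller means 'closer'."""
    if a == b:
        return 0
    socket_a, core_a, _ = topology[a]
    socket_b, core_b, _ = topology[b]
    if socket_a != socket_b:
        return 3  # different socket
    if core_a != core_b:
        return 2  # same socket, different core
    return 1      # same core, sibling thread


def flatten(topology):
    """One possible way to model 'flat': no two cpus share a core or socket."""
    return {cpu: (cpu, 0, 0) for cpu in topology}


def preferred_targets(topology, src):
    """All cpus tied for the smallest distance from src."""
    others = [cpu for cpu in topology if cpu != src]
    best = min(distance(topology, src, cpu) for cpu in others)
    return sorted(cpu for cpu in others if distance(topology, src, cpu) == best)
```

With the boot-time data, preferred_targets(BOOT_TOPOLOGY, 0) singles out the sibling thread; after flatten() all three other cpus tie, which is the behaviour the paragraph above asks for.
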
>
> The topology information of the CPUID instruction is, however, even
> accessible from user mode and might be used for licensing purposes by
> any user program (e.g. by limiting the software to run on a specific
> number of cores or sockets). So just mangling the data returned by
> CPUID in the hypervisor does not seem to be a general solution, though
> we might want to do it at least optionally in the future.
>
> In the future we might want to support either dynamic topology updates
> or be able to tell the kernel to use some of the topology data, e.g.
> when pinning vcpus.
>
>
> Solution 1 (Dario):
> -------------------
>
> Don't use the CPUID derived topology information in the Linux scheduler,
> but let it use a simple "flat" topology by setting its own scheduler
> domain data under Xen.
>
> Advantages:
> + very clean solution regarding the scheduler interface
> + scheduler decisions are based on a minimal data set
> + small patch
>
> Disadvantages:
> - covers the scheduler only, drivers still use the "wrong" data
> - a little bit hacky regarding some NUMA architectures (needs either a
> hook in the code dealing with that architecture or multiple scheduler
> domain data overwrites)
> - future enhancements will make the solution less clean (either need
> duplicating scheduler domain data or some new hooks in scheduler
> domain interface)
>
>
> Solution 2 (Juergen):
> ---------------------
>
> When booted as a Xen guest modify the topology data built during boot
> resulting in the same simple "flat" topology as in Dario's solution.
>
> Advantages:
> + the simple topology is seen by all consumers of topology data as the
> data itself is modified accordingly
> + small patch
> + future enhancements rather easy by selecting which data to modify
>
> Disadvantages:
> - interface to scheduler not as clean as in Dario's approach
> - scheduler decisions are based on multiple layers of topology data
> where one layer would be enough to describe the topology
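
The main difference between the two solutions, namely who ends up seeing the flat topology, can be sketched with another toy model. Again this is Python and not kernel code; the Guest class, the sibling sets and the function names are assumptions invented for this illustration and do not correspond to either patch.

```python
# Hypothetical toy model of the two proposals; all names are illustrative.
BOOT_SIBLINGS = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}


def flat(siblings):
    """Flat view: every cpu is its own and only sibling."""
    return {cpu: {cpu} for cpu in siblings}


class Guest:
    def __init__(self, siblings):
        # Topology data built during boot, shared by all consumers.
        self.topology = {cpu: set(sibs) for cpu, sibs in siblings.items()}
        # Scheduler-private domain data; None means "use the shared data".
        self.sched_override = None

    def scheduler_view(self):
        if self.sched_override is not None:
            return self.sched_override
        return self.topology

    def driver_view(self):
        # Drivers only ever look at the shared topology data.
        return self.topology


def apply_solution_1(guest):
    """Solution 1: only the scheduler gets its own flat domain data."""
    guest.sched_override = flat(guest.topology)


def apply_solution_2(guest):
    """Solution 2: the shared topology data itself is flattened."""
    guest.topology = flat(guest.topology)
```

In this model, after apply_solution_1() the scheduler sees cpu 0 alone while a driver still sees cpus 0 and 1 as siblings; after apply_solution_2() both views are flat.
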
>
>
> Dario, are you okay with this summary?
Thanks -- that's very helpful.
-George