linux-kernel - Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy

Message-ID: <56027DB2.50203@citrix.com>
Date:	Wed, 23 Sep 2015 11:23:46 +0100
From:	George Dunlap <george.dunlap@...rix.com>
To:	Juergen Gross <jgross@...e.com>,
	Dario Faggioli <dario.faggioli@...rix.com>
CC:	"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
	"Andrew Cooper" <Andrew.Cooper3@...rix.com>,
	"Luis R. Rodriguez" <mcgrof@...not-panic.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"David Vrabel" <david.vrabel@...rix.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	Stefano Stabellini <stefano.stabellini@...citrix.com>
Subject: Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling
 domain hierarchy

On 09/23/2015 05:36 AM, Juergen Gross wrote:
> On 09/22/2015 06:22 PM, George Dunlap wrote:
>> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>>> One other thing I just discovered: there are other consumers of the
>>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>>
>>> I think we would want to avoid any optimizations based on those in
>>> drivers as well, not only in the scheduler.
>>
>> I'm beginning to lose the thread of the discussion here a bit.
>>
>> Juergen / Dario, could one of you summarize your two approaches, and the
>> (alleged) advantages and disadvantages of each one?
> 
> Okay, I'll have a try:
> 
> The problem we want to solve:
> -----------------------------
> 
> The Linux kernel gathers cpu topology data during boot via the
> CPUID instruction on each processor coming online. This data is
> primarily used by the scheduler to decide which cpu a thread should
> be migrated to when migration seems necessary. There are other users of
> the topology information in the kernel (e.g. some drivers try to do
> optimizations like core-specific queues/lists).
> 
> When started in a virtualized environment, the obtained data is next to
> useless or even wrong, as it only reflects the situation at the time the
> system booted. Scheduling of the (v)cpus by the hypervisor changes the
> topology beneath the feet of the Linux kernel without this being
> reflected in the gathered topology information. So any decisions
> taken based on that data will be uninformed and possibly just wrong.
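To make the staleness problem concrete, here is a minimal user-space C model (not kernel code). The eight-pCPU host layout, the placement arrays and all function names are hypothetical illustrations: the guest computes its sibling masks once at boot, and nothing refreshes them when the hypervisor later moves vCPUs to other physical CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4
#define NR_PCPUS 8

/* Hypothetical host: 8 physical CPUs, 2 hardware threads per core. */
static const int pcpu_core[NR_PCPUS] = {0, 0, 1, 1, 2, 2, 3, 3};

/* Sibling masks as computed once by the guest at boot (bit w = vCPU w). */
static uint32_t boot_sibling_mask[NR_VCPUS];

/* vCPUs that really share a physical core with vCPU v for a given placement
 * (vcpu_to_pcpu[w] = pCPU the hypervisor is currently running vCPU w on). */
static uint32_t real_siblings(const int vcpu_to_pcpu[NR_VCPUS], int v)
{
    uint32_t mask = 0;

    assert(v >= 0 && v < NR_VCPUS);
    for (int w = 0; w < NR_VCPUS; w++) {
        assert(vcpu_to_pcpu[w] >= 0 && vcpu_to_pcpu[w] < NR_PCPUS);
        if (pcpu_core[vcpu_to_pcpu[w]] == pcpu_core[vcpu_to_pcpu[v]])
            mask |= 1u << w;
    }
    return mask;
}

/* Taken once at boot; nothing refreshes it when the hypervisor moves vCPUs. */
static void boot_snapshot(const int vcpu_to_pcpu[NR_VCPUS])
{
    for (int v = 0; v < NR_VCPUS; v++)
        boot_sibling_mask[v] = real_siblings(vcpu_to_pcpu, v);
}
```

For example, with a boot placement of {0, 1, 2, 3}, vCPUs 0 and 1 land on the same physical core, so `boot_sibling_mask[0]` is `0x3`. If the hypervisor later runs the vCPUs on pCPUs {0, 2, 4, 6}, `real_siblings()` for vCPU 0 yields just `0x1`, while the boot-time mask still claims `0x3`.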
> 
> The minimal solution is to change the topology data in the kernel so
> that all cpus are regarded as equal with respect to their relation to
> each other (e.g. when migrating a thread to another cpu, no cpu is
> preferred as a target).
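The "minimal solution" can be illustrated with another small user-space model, assuming a toy two-level topology (an SMT sibling mask and a package mask per vCPU) and a toy migration policy that prefers an idle sibling. `struct cpu_topo`, `flatten_topology()` and `pick_target()` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Hypothetical per-vCPU topology masks (bit w = vCPU w). */
struct cpu_topo {
    uint32_t sibling_mask[NR_VCPUS]; /* vCPUs on the same core (SMT) */
    uint32_t core_mask[NR_VCPUS];    /* vCPUs in the same package */
};

/* Make all vCPUs equal: no SMT siblings, one package holding every vCPU. */
static void flatten_topology(struct cpu_topo *t)
{
    const uint32_t all = (1u << NR_VCPUS) - 1;

    for (int v = 0; v < NR_VCPUS; v++) {
        t->sibling_mask[v] = 1u << v;
        t->core_mask[v] = all;
    }
}

/* Toy migration policy: prefer an idle SMT sibling of 'from', then any idle
 * vCPU in the same package; ties go to the lowest number. -1 if none. */
static int pick_target(const struct cpu_topo *t, int from, uint32_t idle)
{
    assert(from >= 0 && from < NR_VCPUS);

    uint32_t self = 1u << from;
    uint32_t cand = t->sibling_mask[from] & idle & ~self;

    if (!cand)
        cand = t->core_mask[from] & idle & ~self;
    for (int v = 0; v < NR_VCPUS; v++)
        if (cand & (1u << v))
            return v;
    return -1;
}
```

With boot-time masks pairing vCPUs {0,1} and {2,3} and vCPUs 1 and 2 idle, `pick_target(&t, 3, 0x6)` returns 2 (the "sibling"). After `flatten_topology(&t)` the sibling preference is gone and the same call falls back to the lowest-numbered idle vCPU, 1; the remaining choice is only an arbitrary tie-break.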
> 
> The topology information of the CPUID instruction is, however, also
> accessible from user mode and might be used by user programs for
> licensing purposes (e.g. by limiting the software to run on a specific
> number of cores or sockets). So simply mangling the data returned by
> CPUID in the hypervisor does not seem to be a general solution, although
> we might want to offer it at least as an option in the future.
> 
> In the future we might want to either support dynamic topology updates
> or be able to tell the kernel to use some of the topology data, e.g.
> when vcpus are pinned.
> 
> 
> Solution 1 (Dario):
> -------------------
> 
> Don't use the CPUID-derived topology information in the Linux scheduler;
> instead, let it use a simple "flat" topology by setting its own scheduler
> domain data when running under Xen.
> 
> Advantages:
> + very clean solution regarding the scheduler interface
> + scheduler decisions are based on a minimal data set
> + small patch
> 
> Disadvantages:
> - covers the scheduler only, drivers still use the "wrong" data
> - a little bit hacky on some NUMA architectures (needs either a
>   hook in the code dealing with that architecture or multiple overwrites
>   of the scheduler domain data)
> - future enhancements will make the solution less clean (requiring either
>   duplicated scheduler domain data or new hooks in the scheduler
>   domain interface)
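An approach of this kind would typically go through the kernel's `set_sched_topology()` hook, which lets architecture code replace the table of scheduler topology levels. The sketch below only models that idea in user space: `struct topo_level`, `model_set_sched_topology()` and `driver_queue_for()` are hypothetical stand-ins, not the real `struct sched_domain_topology_level` API. It also shows the first disadvantage listed above: a driver reading the sibling masks directly still sees the boot-time SMT pairing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VCPUS 4
#define ALL_VCPUS ((1u << NR_VCPUS) - 1)

/* Boot-time (CPUID-derived) SMT masks: vCPUs {0,1} and {2,3} look paired.
 * Solution 1 leaves this data untouched. */
static const uint32_t sibling_mask[NR_VCPUS] = {0x3, 0x3, 0xC, 0xC};

/* Hypothetical analogue of one scheduler topology level. */
struct topo_level {
    uint32_t (*mask)(int cpu); /* CPUs grouped with 'cpu' at this level */
    const char *name;
};

static uint32_t smt_level_mask(int cpu) { return sibling_mask[cpu]; }
static uint32_t all_vcpus_mask(int cpu) { (void)cpu; return ALL_VCPUS; }

static const struct topo_level default_levels[] = {
    { smt_level_mask, "SMT" },
    { all_vcpus_mask, "DIE" },
    { NULL, NULL },
};

/* Solution 1: a single level spanning every vCPU. */
static const struct topo_level flat_levels[] = {
    { all_vcpus_mask, "VCPU" },
    { NULL, NULL },
};

/* Table the (modelled) scheduler builds its domains from. */
static const struct topo_level *sched_levels = default_levels;

/* Hypothetical stand-in for the kernel's set_sched_topology(). */
static void model_set_sched_topology(const struct topo_level *tl)
{
    sched_levels = tl;
}

static int nr_sched_levels(void)
{
    int n = 0;

    while (sched_levels[n].mask)
        n++;
    return n;
}

/* A driver reading the topology directly: one queue per "core",
 * identified by the lowest vCPU in the sibling mask. */
static int driver_queue_for(int cpu)
{
    assert(cpu >= 0 && cpu < NR_VCPUS);
    for (int v = 0; v < NR_VCPUS; v++)
        if (sibling_mask[cpu] & (1u << v))
            return v;
    return -1;
}
```

After `model_set_sched_topology(flat_levels)` the modelled scheduler sees one level covering all four vCPUs, yet `driver_queue_for(3)` still returns 2, i.e. the driver keeps treating vCPUs 2 and 3 as one core.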
> 
> 
> Solution 2 (Juergen):
> ---------------------
> 
> When booted as a Xen guest, modify the topology data built during boot
> so that it describes the same simple "flat" topology as in Dario's solution.
> 
> Advantages:
> + the simple topology is seen by all consumers of topology data as the
>   data itself is modified accordingly
> + small patch
> + future enhancements are rather easy, by selecting which data to modify
> 
> Disadvantages:
> - interface to scheduler not as clean as in Dario's approach
> - scheduler decisions are based on multiple layers of topology data
>   where one layer would be enough to describe the topology
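Solution 2 can be modelled the same way, again as a hypothetical user-space sketch: the sibling masks are shared data read by both a scheduler-like check and a driver-like queue selector, and `xen_flatten_topology()` (an invented name) rewrites the data itself, so every consumer sees the flat view at once.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Shared per-CPU topology data (boot-time values: vCPUs {0,1} and {2,3}
 * appear to share a core). Every consumer reads this same array. */
static uint32_t sibling_mask[NR_VCPUS] = {0x3, 0x3, 0xC, 0xC};

static int lowest_cpu(uint32_t mask)
{
    for (int v = 0; v < NR_VCPUS; v++)
        if (mask & (1u << v))
            return v;
    return -1;
}

/* Consumer 1, scheduler-like: do vCPUs a and b share a core? */
static int sched_shares_core(int a, int b)
{
    assert(a >= 0 && a < NR_VCPUS && b >= 0 && b < NR_VCPUS);
    return (sibling_mask[a] >> b) & 1u;
}

/* Consumer 2, driver-like: one queue per core, named by its lowest vCPU. */
static int driver_queue_for(int cpu)
{
    assert(cpu >= 0 && cpu < NR_VCPUS);
    return lowest_cpu(sibling_mask[cpu]);
}

/* Solution 2 (hypothetical name): when running as a Xen guest, rewrite the
 * data itself so that no vCPU has an SMT sibling. */
static void xen_flatten_topology(void)
{
    for (int v = 0; v < NR_VCPUS; v++)
        sibling_mask[v] = 1u << v;
}
```

Before flattening, vCPUs 0 and 1 count as sharing a core and vCPU 3 uses queue 2; afterwards no two vCPUs share a core and each vCPU gets its own queue. A consumer that walks several topology levels keeps walking them; they are merely degenerate, which is the second disadvantage listed above.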
> 
> 
> Dario, are you okay with this summary?

Thanks -- that's very helpful.

 -George