Message-ID: <46b2e9c5-6186-5c37-197c-8acd0bce358e@amazon.com>
Date: Thu, 28 May 2020 20:06:19 +0300
From: "Paraschiv, Andra-Irina" <andraprs@...zon.com>
To: Greg KH <gregkh@...uxfoundation.org>,
Alexander Graf <graf@...zon.de>
CC: <linux-kernel@...r.kernel.org>,
Anthony Liguori <aliguori@...zon.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Colm MacCarthaigh <colmmacc@...zon.com>,
Bjoern Doebel <doebel@...zon.de>,
David Woodhouse <dwmw@...zon.co.uk>,
Frank van der Linden <fllinden@...zon.com>,
"Martin Pohlack" <mpohlack@...zon.de>,
Matt Wilson <msw@...zon.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Balbir Singh <sblbir@...zon.com>,
Stefano Garzarella <sgarzare@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stewart Smith <trawets@...zon.com>,
Uwe Dannowski <uwed@...zon.de>, <kvm@...r.kernel.org>,
<ne-devel-upstream@...zon.com>
Subject: Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the
ioctl interface
On 28/05/2020 16:12, Greg KH wrote:
>
> On Thu, May 28, 2020 at 03:01:36PM +0200, Alexander Graf wrote:
>>
>> On 27.05.20 00:24, Greg KH wrote:
>>> On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:
>>>>
>>>> On 26.05.20 15:17, Greg KH wrote:
>>>>> On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:
>>>>>>
>>>>>> On 26.05.20 14:33, Greg KH wrote:
>>>>>>> On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:
>>>>>>>>
>>>>>>>> On 26.05.20 08:51, Greg KH wrote:
>>>>>>>>> On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:
>>>>>>>>>> +#define NE "nitro_enclaves: "
>>>>>>>>> Again, no need for this.
>>>>>>>>>
>>>>>>>>>> +#define NE_DEV_NAME "nitro_enclaves"
>>>>>>>>> KBUILD_MODNAME?
>>>>>>>>>
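For reference, the KBUILD_MODNAME route would look roughly like this (only a
sketch; ne_fops stands in for the file_operations defined in the patch):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>

static const struct file_operations ne_fops = {
	.owner		= THIS_MODULE,
	/* .unlocked_ioctl and friends as in the patch */
};

/* The misc device can reuse the module name directly, no extra define needed. */
static struct miscdevice ne_misc_dev = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= KBUILD_MODNAME,
	.fops	= &ne_fops,
};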
>>>>>>>>>> +#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
>>>>>>>>>> +
>>>>>>>>>> +static char *ne_cpus;
>>>>>>>>>> +module_param(ne_cpus, charp, 0644);
>>>>>>>>>> +MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
>>>>>>>>> Again, please do not do this.
>>>>>>>> I actually asked her to put this one in specifically.
>>>>>>>>
>>>>>>>> The concept of this parameter is very similar to isolcpus= and maxcpus= in
>>>>>>>> that it takes CPUs away from Linux and instead donates them to the
>>>>>>>> underlying hypervisor, so that it can spawn enclaves using them.
>>>>>>>>
>>>>>>>> From an admin's point of view, this is a setting I would like to keep
>>>>>>>> persisted across reboots. How would this work with sysfs?
>>>>>>> How about just as the "initial" ioctl command to set things up? Don't
>>>>>>> grab any cpu pools until asked to. Otherwise, what happens when you
>>>>>>> load this module on a system that can't support it?
>>>>>> That would give any user with access to the enclave device the ability to
>>>>>> remove CPUs from the system. That's clearly a CAP_SYS_ADMIN task in my book.
>>>>> Ok, what's wrong with that?
>>>> Would you want random users to get the ability to hot unplug CPUs from your
>>>> system? At unlimited quantity? I don't :).
>>> A random user, no, but one with admin rights, why not? They can do that
>>> already today on your system, this isn't new.
>>>
>>>>>> Hence this whole split: The admin defines the CPU Pool, users can safely
>>>>>> consume this pool to spawn enclaves from it.
>>>>> But having the admin define that at module load / boot time, is a major
>>>>> pain. What tools do they have that allow them to do that easily?
>>>> The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
>>>> file.
>>> Editing grub files is horrid, come on...
>> It's not editing grub files, it's editing template config files that then
>> are used as input for grub config file generation :).
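Concretely, the persisted setting would be a one-liner either way (parameter
name as in this patch; the CPU range below is only an example):

  # /etc/modprobe.d/nitro_enclaves.conf
  options nitro_enclaves ne_cpus=2-7

  # or, via the kernel command line in /etc/default/grub
  GRUB_CMDLINE_LINUX="... nitro_enclaves.ne_cpus=2-7"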
>>
>>>> When, if not at module load / boot time, would you define it? I really don't want
>>>> to have a device node that in theory "the world" can use which then allows
>>>> any user on the system to hot unplug every CPU but 0 from my system.
>>> But you have that already when the PCI device is found, right? What is
>>> the initial interface to the driver? What's wrong with using that?
>>>
>>> Or am I really missing something as to how this all fits together with
>>> the different pieces? Seeing the patches as-is doesn't really provide a
>>> good overview, sorry.
>> Ok, let me walk you through the core donation process.
>>
>> Imagine you have a parent VM with 8 cores. Every one of those virtual cores
>> is 1:1 mapped to a physical core.
>>
>> You enumerate the PCI device, you start working with it. None of that
>> changes your topology.
>>
>> You now create an enclave spanning 2 cores. The hypervisor will remove the
>> 1:1 map for those 2 cores and instead mark them as "free floating" on the
>> remaining 6 cores. It then uses the 2 freed-up cores and creates a 1:1 map
>> for the enclave's 2 cores.
>>
>> To ensure that we still see a realistic mapping of core topology, we need to
>> remove those 2 cores from the parent VM's scope of execution. The way this
>> is done today is by going through CPU offlining.
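As a concrete illustration of that offlining step, the driver side could do
something along these lines when a core is moved into the enclave pool
(assuming the kernel's remove_cpu() helper; the user-space equivalent is
echo 0 > /sys/devices/system/cpu/cpuN/online):

#include <linux/cpu.h>
#include <linux/printk.h>

/* Take a parent VM vCPU out of the scheduler's view before donating it. */
static int ne_offline_cpu(unsigned int cpu)
{
	int rc;

	rc = remove_cpu(cpu);
	if (rc < 0)
		pr_err("Failed to offline CPU %u [rc=%d]\n", cpu, rc);

	return rc;
}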
>>
>> The first and obvious option would be to offline all respective CPUs when an
>> enclave gets created. But unprivileged users should be able to spawn
>> enclaves. So how do I ensure that my unprivileged user doesn't just offline
>> all of my parent VM's CPUs?
>>
>> The option implemented here is that we fold this into a two-stage approach.
>> The admin reserves a "pool" of cores for enclaves to use. Unprivileged users
>> can then consume cores from that pool, but not more than those.
>>
>> That way, unprivileged users have no influence over whether a core is
>> enabled or not. They can only consume the pool of cores that was dedicated
>> for enclave use.
>>
>> It also has the big advantage that you don't have dynamically changing CPU
>> topology in your system. Long-lived processes that adjust their environment
>> to the topology can still do so, without cores getting pulled out from under
>> their feet.
> Ok, that makes more sense, but:
>
>>>>>> So I really don't think an ioctl would be a great user experience. Same for
>>>>>> a sysfs file - although that's probably slightly better than the ioctl.
>>>>> You already are using ioctls to control this thing, right? What's wrong
>>>>> with "one more"? :)
>>>> So what we *could* do is add an ioctl to set the pool size which then does a
>>>> CAP_SYS_ADMIN check. That, however, means you now are in priority hell:
>>>>
>>>> A user that wants to spawn an enclave as part of an nginx service would need
>>>> to create another service to set the pool size and indicate the dependency
>>>> in systemd control files.
>>>>
>>>> Is that really better than a module parameter?
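For illustration, such a privileged pool ioctl could look roughly like the
sketch below; NE_SET_CPU_POOL and ne_setup_cpu_pool() are made-up names for
this example, not part of this series:

#include <linux/capability.h>
#include <linux/fs.h>
#include <linux/ioctl.h>

/* Hypothetical command number; magic and nr are placeholders. */
#define NE_SET_CPU_POOL		_IOW('N', 0x01, char *)

/* Hypothetical helper that parses and reserves the requested CPU list. */
static int ne_setup_cpu_pool(const char __user *cpu_list);

static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case NE_SET_CPU_POOL:
		/* Resizing the pool offlines/onlines CPUs, so gate it. */
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		return ne_setup_cpu_pool((const char __user *)arg);
	default:
		return -ENOTTY;
	}
}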
>>> module parameters are hard to change, and it's hard to control who really
>>> is changing them.
>> What is hard about
>>
>> $ echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus
> So at runtime, after all is booted and up and going, you just ripped
> cores out from under someone's feet? :)
>
> And the code really handles writing to that value while the module is
> already loaded and up and running? At a quick glance, it didn't seem
> like it would handle that very well as it only is checked at ne_init()
> time.
>
> Or am I missing something?
It's checked only at module init for now, true.
I started with the init-time check, and adapting the logic to handle the
setup via the sysfs file for the module remained a TODO on my side.
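Roughly, the plan is to switch to module_param_cb() so that later writes to
the sysfs parameter file go through a set callback; a minimal, untested
sketch of the direction (offlining/onlining the delta is the TODO part):

#include <linux/cpumask.h>
#include <linux/moduleparam.h>

static char ne_cpus[64];	/* textual CPU list, e.g. "1-5" */
static struct cpumask ne_cpu_pool;

/*
 * Called at module load and on each write to
 * /sys/module/nitro_enclaves/parameters/ne_cpus.
 */
static int ne_set_cpus(const char *val, const struct kernel_param *kp)
{
	int rc;

	rc = cpulist_parse(val, &ne_cpu_pool);
	if (rc < 0)
		return rc;

	/* TODO: offline newly added CPUs, online the ones removed from the pool. */

	return param_set_copystring(val, kp);
}

static const struct kernel_param_ops ne_cpu_pool_ops = {
	.get	= param_get_string,
	.set	= ne_set_cpus,
};

static struct kparam_string ne_cpus_arg = {
	.maxlen	= sizeof(ne_cpus),
	.string	= ne_cpus,
};

module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");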
Thanks,
Andra
>
> Anyway, yes, if you can dynamically do this at runtime, that's great,
> but it feels awkward to me to rely on one configuration thing as a
> module parameter, and everything else through the ioctl interface.
> Unification would seem to be a good thing, right?
>
> thanks,
>
> greg k-h