Message-ID: <alpine.DEB.2.10.1508201713070.13335@vshiva-Udesk>
Date: Thu, 20 Aug 2015 17:13:58 -0700 (PDT)
From: Vikas Shivappa <vikas.shivappa@...el.com>
To: Vikas Shivappa <vikas.shivappa@...el.com>
cc: Marcelo Tosatti <mtosatti@...hat.com>,
Matt Fleming <matt@...eblueprint.co.uk>,
Tejun Heo <tj@...nel.org>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
linux-kernel@...r.kernel.org, x86@...nel.org, hpa@...or.com,
tglx@...utronix.de, mingo@...nel.org, peterz@...radead.org,
matt.fleming@...el.com, will.auld@...el.com,
glenn.p.williamson@...el.com, kanaka.d.juvva@...el.com,
Karen Noel <knoel@...hat.com>
Subject: Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management
On Thu, 20 Aug 2015, Vikas Shivappa wrote:
>
>
> On Mon, 17 Aug 2015, Marcelo Tosatti wrote:
>
>> Vikas, Tejun,
>>
>> This is an updated interface. It addresses all comments made
>> so far and also covers all use-cases the cgroup interface
>> covers.
>>
>> Let me know what you think. I'll proceed to writing
>> the test applications.
>>
>> Usage model:
>> ------------
>>
>> This document details how CAT technology is
>> exposed to userspace.
>>
>> Each task has a list of task cache reservation entries (TCRE list).
>>
>> The init process is created with empty TCRE list.
>>
>> There is a system-wide unique ID space, each TCRE is assigned
>> an ID from this space. ID's can be reused (but no two TCREs
>> have the same ID at one time).
>>
>> The interface accommodates transient and independent cache allocation
>> adjustments from applications, as well as static cache partitioning
>> schemes.
>>
>> Allocation:
>> Usage of the system calls requires CAP_SYS_CACHE_RESERVATION capability.
>>
>> A configurable percentage is reserved to tasks with empty TCRE list.
>
> And how do you think you will do this without a system controlled mechanism?
> Every time in your proposal you include these caveats, which actually mean
> including a system controlled interface in the background,
> and your interfaces below make no mention of this at all! Why do we want to
> confuse ourselves like this?
>
> A syscall-only interface does not seem to work on its own for the cache
> allocation scenario. It can only be a nice-to-have interface on top of a
> system controlled mechanism like the cgroup interface. Sure, you can do with
> the syscall interface everything you did with cgroup, but the point is: what
> are the use cases that can't be done with this syscall-only interface?
> (ex: to deal with cases you brought up earlier, like when an app does cache
> intensive work for some time and later changes - it could use the syscall
> interface to quickly relinquish the cache lines or change a clos associated
> with it)
>
> I have repeatedly listed the use cases that can be dealt with , with this
Big typo above - that should read 'use cases that cannot be dealt with'.
> interface. How will you address cases like 1.1 and 1.2 with your
> syscall-only interface? So we expect all the millions of apps like SAP, Oracle
> etc. and all the millions of app developers to magically learn our new
> syscall interface and also cooperate among themselves to decide a cache
> allocation that is agreeable to all? (Which, btw, the interface below doesn't
> say how to do.) And then by some godly powers the noisy neighbour will
> decide by himself to give up the cache? (That would be the first app in the
> world not to request more resources for himself and to hurt his own
> performance - they surely don't want to do social service!)
>
> And how do we do case 1.5, where the administrator wants to assign cache to
> specific VMs in a cloud etc.? With the hypothetical syscall interface we
> should now expect all the apps to do the above, and they also need to know
> where they run (what VM, what socket etc.) and then decide and cooperate on an
> allocation. Compare this to a container environment like Rancher, where today
> the admin can conveniently use Docker underneath to allocate
> mem/storage/compute to containers and easily extend this to include shared
> L3.
>
> http://marc.info/?l=linux-kernel&m=143889397419199
>
> Without addressing the above, the details of the interface below are
> irrelevant.
>
> Your initial request was to extend the cgroup interface to include rounding
> off the size of cache (which can easily be done with a bash script on top of
> the cgroup interface!) and now you are proposing a syscall-only interface? This
> is very confusing and will only unnecessarily delay the process without
> adding any value.
>
> However, like I mentioned, the syscall interface, or the user/app being able
> to modify the cache alloc, could be used to address some very specific use
> cases on top of an existing system managed interface. This is not really a
> common case in a cloud or container environment, nor a feasible deployable
> solution. Just consider the millions of apps that have to transition to such
> an interface to even use it - if that's the only way to do it, it's dead on
> arrival.
>
> Also, please do not include the kernel automatically adjusting resources in
> your reply, as that's totally irrelevant and again more confusing; we have
> already exchanged some >100 emails on this same patch version without
> anything meaningful coming of it so far.
>
> The debate is purely between a syscall-only interface and a system manageable
> interface (like cgroup, where the admin or a central entity controls the
> resources). If not, define what it is first before going into details.
>
> Thanks,
> Vikas
>
>>
>> On fork, the child inherits the TCR from its parent.
>>
>> Semantics:
>> Once a TCRE is created and assigned to a task, that task has
>> guaranteed reservation on any CPU on which it is scheduled,
>> for the lifetime of the TCRE.
>>
>> A task can have its TCR list modified without notification.
>>
>> FIXME: Add a per-task flag to not copy the TCR list of a task but delete
>> all TCR's on fork.
>>
>> Interface:
>>
>> enum cache_rsvt_flags {
>> CACHE_RSVT_ROUND_DOWN = (1 << 0), /* round "kbytes" down */
>> };
>>
>> enum cache_rsvt_type {
>> CACHE_RSVT_TYPE_CODE = 0, /* cache reservation is for code */
>> CACHE_RSVT_TYPE_DATA, /* cache reservation is for data */
>> CACHE_RSVT_TYPE_BOTH, /* cache reservation is for code and data */
>> };
>>
>> struct cache_reservation {
>> unsigned long kbytes;
>> int type;
>> int flags;
>> int tcrid;
>> };
>>
>> The following syscalls modify the TCR of a task:
>>
>> * int sys_create_cache_reservation(struct cache_reservation *rsvt);
>> DESCRIPTION: Creates a cache reservation entry, and assigns
>> it to the current task.
>>
>> returns -ENOMEM if not enough space, -EPERM if no permission.
>> returns 0 if reservation has been successful, copying actual
>> number of kbytes reserved to "kbytes", type to type, and tcrid.
>>
>> * int sys_delete_cache_reservation(struct cache_reservation *rsvt);
>> DESCRIPTION: Deletes a cache reservation entry, deassigning it
>> from any task.
>>
>> Backward compatibility for processors with no support for code/data
>> differentiation: by default code and data cache allocation types
>> fall back to CACHE_RSVT_TYPE_BOTH on older processors (and return the
>> information that they have done so via "flags").
>>
>> * int sys_attach_cache_reservation(pid_t pid, unsigned int tcrid);
>> DESCRIPTION: Attaches the cache reservation identified by "tcrid" to
>> the task identified by pid.
>> returns 0 if successful.
>>
>> * int sys_detach_cache_reservation(pid_t pid, unsigned int tcrid);
>> DESCRIPTION: Detaches the cache reservation identified by "tcrid" from
>> the task identified by pid.
>>
>> The following syscalls list the TCRs:
>> * int sys_get_cache_reservations(size_t size, struct cache_reservation
>> list[]);
>> DESCRIPTION: Return all cache reservations in the system.
>> Size should be set to the maximum number of items that can be stored
>> in the buffer pointed to by list.
>>
>> * int sys_get_tcrid_tasks(unsigned int tcrid, size_t size, pid_t list[]);
>> DESCRIPTION: Return which pids are associated to tcrid.
>>
>> * sys_get_pid_cache_reservations(pid_t pid, size_t size,
>> struct cache_reservation list[]);
>> DESCRIPTION: Return all cache reservations associated with "pid".
>> Size should be set to the maximum number of items that can be stored
>> in the buffer pointed to by list.
>>
>> * sys_get_cache_reservation_info()
>> DESCRIPTION: ioctl to retrieve hardware info: cache round size, whether
>> code/data separation is supported.
>>
>>
>
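For reference, the create semantics quoted above (round "kbytes", fail with
-ENOMEM when there is not enough space, hand back the actual size and an ID)
can be modelled in plain userspace C. This is only a sketch under stated
assumptions: struct mock_cache and mock_create_cache_reservation() are
hypothetical names that exist in no kernel, and rounding up when
CACHE_RSVT_ROUND_DOWN is not set is a guess, since the proposal only defines
the round-down flag. Only the enums and struct cache_reservation are taken
from the quoted text, with the ID field spelled "tcrid" as in the syscall
descriptions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Taken from the quoted proposal. */
enum cache_rsvt_flags {
	CACHE_RSVT_ROUND_DOWN = (1 << 0),	/* round "kbytes" down */
};

enum cache_rsvt_type {
	CACHE_RSVT_TYPE_CODE = 0,	/* cache reservation is for code */
	CACHE_RSVT_TYPE_DATA,		/* cache reservation is for data */
	CACHE_RSVT_TYPE_BOTH,		/* cache reservation is for code and data */
};

struct cache_reservation {
	unsigned long kbytes;
	int type;
	int flags;
	int tcrid;
};

/* Hypothetical stand-in for the shared cache; not part of the proposal. */
struct mock_cache {
	unsigned long round_kbytes;	/* allocation granularity ("cache round size") */
	unsigned long free_kbytes;	/* space not yet reserved */
	int next_tcrid;			/* next ID to hand out */
};

/*
 * Userspace model of sys_create_cache_reservation(). Returns 0 and writes
 * the actual size and the new ID back into *rsvt, or -ENOMEM if the rounded
 * request does not fit. Treating a request that rounds to zero as -ENOMEM
 * is an assumption of this sketch.
 */
int mock_create_cache_reservation(struct mock_cache *c,
				  struct cache_reservation *rsvt)
{
	unsigned long unit, kb;

	assert(c != NULL && rsvt != NULL && c->round_kbytes > 0);

	unit = c->round_kbytes;
	kb = (rsvt->kbytes / unit) * unit;	/* rounded down */

	/* Assumed default: round up unless the caller asked to round down. */
	if (!(rsvt->flags & CACHE_RSVT_ROUND_DOWN) && rsvt->kbytes % unit) {
		if (kb > ULONG_MAX - unit)
			return -ENOMEM;
		kb += unit;
	}

	if (kb == 0 || kb > c->free_kbytes)
		return -ENOMEM;

	c->free_kbytes -= kb;
	rsvt->kbytes = kb;
	rsvt->tcrid = c->next_tcrid++;
	return 0;
}
```

With a 512 KB granularity, a 700 KB request comes back as 512 KB when
CACHE_RSVT_ROUND_DOWN is set and as 1024 KB otherwise. Note that nothing in
this model decides how much one task may take relative to another; whoever
asks first gets the space.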