[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d80e3d5d-1db2-0f47-dbf3-dfc607e1b451@intel.com>
Date: Wed, 9 Dec 2020 16:22:40 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: James Morse <james.morse@....com>, fenghua.yu@...el.com
Cc: tglx@...utronix.de, bp@...en8.de, tony.luck@...el.com,
kuo-lang.tseng@...el.com, shakeelb@...gle.com,
valentin.schneider@....com, mingo@...hat.com, babu.moger@....com,
hpa@...or.com, x86@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when
moving task to resource group
Hi James,
On 12/9/2020 8:51 AM, James Morse wrote:
> Hi Reinette, Fenghua,
>
> Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the
> guts of this change more quickly!
Sure. Thank you. A small change is that I plan to write "PQR_ASSOC MSR"
instead to closer match the name.
>
> On 03/12/2020 23:25, Reinette Chatre wrote:
>> From: Fenghua Yu <fenghua.yu@...el.com>
>>
>> Currently when moving a task to a resource group the PQR_ASSOC MSR
>> is updated with the new closid and rmid in an added task callback.
>> If the task is running the work is run as soon as possible. If the
>> task is not running the work is executed later
>
>> in the kernel exit path when the kernel returns to the task again.
>
> kernel exit makes me thing of user-space... is it enough to just say:
> "by __switch_to() when task is next run"?
I do not think that would be accurate. The paragraph to which you are
proposing the change states the current context, before the fix, when
task_work_add() is still used to update PQR_ASSOC. The "kernel exit"
text you refer to is quite close to task_work_add()'s comments and
indeed refers to the work that is run before returning to user space. If
a function name would make things clearer perhaps adding
exit_to_user_mode_loop() instead? Changing the text to __switch_to()
does not reflect the context described here since __switch_to() is what
is called when task is context switched back from where
resctrl_sched_in() is called, not the queued work being described here.
>> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
>> is running is the right thing to do. Queueing work for a task that is
>> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
>> task is scheduled in) and causing system resource waste with the way in
>> which it is implemented: Work to update the PQR_ASSOC register is queued
>> every time the user writes a task id to the "tasks" file, even if the task
>> already belongs to the resource group. This could result in multiple pending
>> work items associated with a single task even if they are all identical and
>> even though only a single update with most recent values is needed.
>> Specifically, even if a task is moved between different resource groups
>> while it is sleeping then it is only the last move that is relevant but
>> yet a work item is queued during each move.
>> This unnecessary queueing of work items could result in significant system
>> resource waste, especially on tasks sleeping for a long time. For example,
>> as demonstrated by Shakeel Butt in [1] writing the same task id to the
>> "tasks" file can quickly consume significant memory. The same problem
>> (wasted system resources) occurs when moving a task between different
>> resource groups.
>>
>> As pointed out by Valentin Schneider in [2] there is an additional issue with
>> the way in which the queueing of work is done in that the task_struct update
>> is currently done after the work is queued, resulting in a race with the
>> register update possibly done before the data needed by the update is available.
>>
>> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
>> right after the new closid and rmid are ready during the task movement,
>> only if the task is running. If a moved task is not running nothing is
>> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
>> This is the same way used to update the register when tasks are moved as
>> part of resource group removal.
>
> (as t->on_cpu is already used...)
>
> Reviewed-by: James Morse <james.morse@....com>
Thank you very much. I do plan to follow your suggestion below though.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 68db7d2dec8f..9d62f1fadcc3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
>
>
>> +static void update_task_closid_rmid(struct task_struct *t)
>> {
>> + int cpu;
>>
>> + if (task_on_cpu(t, &cpu))
>> + smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
>
>
> I think:
> | if (task_curr(t))
> | smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
>
> here would make for an easier backport as it doesn't depend on the previous patch.
>
Will do, thank you very much.
The previous patch has now become a bugfix in its own right after you
pointed out the issue with using t->on_cpu. In addressing that I plan to
remove the helpers found in patch #1 so backporting should continue to
be easier.
>
>> +}
>
> [...]
>
>> static int __rdtgroup_move_task(struct task_struct *tsk,
>> struct rdtgroup *rdtgrp)
>> {
>
>> + if (rdtgrp->type == RDTCTRL_GROUP) {
>> + tsk->closid = rdtgrp->closid;
>> + tsk->rmid = rdtgrp->mon.rmid;
>> + } else if (rdtgrp->type == RDTMON_GROUP) {
>
> [...]
>
>> + } else {
>
>> + rdt_last_cmd_puts("Invalid resource group type\n");
>> + return -EINVAL;
>
> Wouldn't this be a kernel bug?
> I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't
> user-space's fault!
You are right, this would be a kernel bug. I'll add a WARN_ON_ONCE().
>
>
>> }
>> - return ret;
>> +
>> + /*
>> + * By now, the task's closid and rmid are set. If the task is current
>> + * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>> + * group go into effect. If the task is not current, the MSR will be
>> + * updated when the task is scheduled in.
>> + */
>> + update_task_closid_rmid(tsk);
>> +
>> + return 0;
>> }
>
Thank you very much.
Reinette
Powered by blists - more mailing lists