Message-ID: <CALPaoChEMCdrVkYzqkWuXJVhsSyB0PB2CTZcvKXUJTavC7dYjg@mail.gmail.com>
Date:   Mon, 14 Nov 2022 11:02:42 +0100
From:   Peter Newman <peternewman@...gle.com>
To:     James Morse <james.morse@....com>
Cc:     reinette.chatre@...el.com, fenghua.yu@...el.com, bp@...en8.de,
        derkling@...gle.com, eranian@...gle.com, hpa@...or.com,
        jannh@...gle.com, kpsingh@...gle.com, linux-kernel@...r.kernel.org,
        mingo@...hat.com, tglx@...utronix.de, x86@...nel.org
Subject: Re: [PATCH v2 1/1] x86/resctrl: fix task closid/rmid update race

Hi James,

On Fri, Nov 11, 2022 at 7:38 PM James Morse <james.morse@....com> wrote:
> On 10/11/2022 13:53, Peter Newman wrote:
> > This was because stores updating the closid and rmid in the task
> > structure could reorder with the loads in task_curr() and task_cpu().
>
> Mentioning this happens in __rdtgroup_move_task() would make this easier to review.

ok
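
For reference, a sketch of the racy pre-patch flow (illustrative only,
condensed from __rdtgroup_move_task() and the old
update_task_closid_rmid(); the exact assignments may differ):

	tsk->closid = rdtgrp->closid;	/* plain stores */
	tsk->rmid = rdtgrp->mon.rmid;

	/*
	 * Nothing orders the stores above against the loads below
	 * (rq->curr via task_curr(), t->cpu via task_cpu()), so a
	 * concurrently switching task can be missed and keep running
	 * with its old closid/rmid.
	 */
	if (IS_ENABLED(CONFIG_SMP) && task_curr(tsk))
		smp_call_function_single(task_cpu(tsk),
					 _update_task_closid_rmid, tsk, 1);
	else
		_update_task_closid_rmid(tsk);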

> > Similar reordering also impacted resctrl_sched_in(), where reading the
> > updated values could reorder with prior stores to t->on_cpu.
>
> Where does resctrl_sched_in() depend on t->on_cpu?

I misworded this: I should have said "occurs" or "happens" rather than
"impacted". There is no impact on resctrl_sched_in() itself; rather, the
reordering that occurs around resctrl_sched_in() further breaks the
assumptions of __rdtgroup_move_task().

Also, I think the mention of t->on_cpu comes from a prototype on an
older branch, before you changed it to use task_curr(). Now it's
rq->curr and t->cpu reordering with t->{closid,rmid} that I'm concerned
about.
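
To make the intended serialization concrete, here's a rough sketch of
the task_call_func() usage (only the callback's signature appears in
the diff below; its body here is my assumption):

	/*
	 * task_call_func() holds the task's pi_lock (and the rq lock
	 * while the task is running or queued), so this callback
	 * cannot race with __schedule() for @t.
	 */
	static int update_locked_task_closid_rmid(struct task_struct *t,
						  void *arg)
	{
		struct rdtgroup *rdtgrp = arg;

		t->closid = rdtgrp->closid;
		t->rmid = rdtgrp->mon.rmid;

		/* Report back whether @t is currently running. */
		return task_curr(t);
	}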

>
>
> > Instead, when moving a single task, use task_call_func() to serialize
> > updates to the closid and rmid fields in the task_struct with context
> > switch.
> >
> > When deleting a group, just update the MSRs on all CPUs rather than
> > calling task_call_func() on every task in a potentially long list while
> > read-locking the tasklist_lock.
>
> This rmdir stuff feels like something that should go in a preparatory patch with an
> expanded justification. (the stuff in the comment below). Real-time users may care about
> unconditionally IPIing all CPUs, but I suspect changes to resctrl while the system is
> running aren't realistic.
>
> A group of smaller patches that make independent changes is easier to review than one big
> one! (especially as some of those changes are mechanical)

I think I made a mess of the patch when I pivoted to a different
approach in v2. It was more cohesive when I had the rmdir stuff using
update_task_closid_rmid().

>
>
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index e5a48f05e787..d645f9a6c22e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -538,12 +538,38 @@ static void _update_task_closid_rmid(void *task)
> >               resctrl_sched_in();
> >  }
> >
> > -static void update_task_closid_rmid(struct task_struct *t)
> > +static int update_locked_task_closid_rmid(struct task_struct *t, void *arg)
> >  {
> > -     if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
> > -             smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
> > -     else
> > -             _update_task_closid_rmid(t);
>
> [...]
>
> >  static int __rdtgroup_move_task(struct task_struct *tsk,
> > @@ -557,39 +583,26 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>
> > -     update_task_closid_rmid(tsk);
> > +     if (update_task_closid_rmid(tsk, rdtgrp) && IS_ENABLED(CONFIG_SMP))
> > +             /*
> > +              * If the task has migrated away from the CPU indicated by
> > +              * task_cpu() below, then it has already switched in on the
> > +              * new CPU using the updated closid and rmid and the call below
> > +              * is unnecessary, but harmless.
> > +              */
> > +             smp_call_function_single(task_cpu(tsk),
> > +                                      _update_task_closid_rmid, tsk, 1);
> > +     else
> > +             _update_task_closid_rmid(tsk);
>
> I think it would result in less churn if you kept this chunk in update_task_closid_rmid().

This is another thing that made more sense in the v1 approach, where it
had to be done a little differently in the rmdir case. I'll fix it.

Also, I find it strange that the same function calls
update_task_closid_rmid() and then _update_task_closid_rmid() directly.
It gives the impression that there was once a layering here which has
since been disregarded.
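
Something like this would restore the layering (a sketch of your
suggestion using the names from the diff; the exact body is an
assumption):

	static void update_task_closid_rmid(struct task_struct *t,
					    struct rdtgroup *rdtgrp)
	{
		if (task_call_func(t, update_locked_task_closid_rmid,
				   rdtgrp) && IS_ENABLED(CONFIG_SMP))
			smp_call_function_single(task_cpu(t),
						 _update_task_closid_rmid,
						 t, 1);
		else
			_update_task_closid_rmid(t);
	}

__rdtgroup_move_task() would then only see the one entry point.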

> > @@ -2385,12 +2398,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
> >   * Move tasks from one to the other group. If @from is NULL, then all tasks
> >   * in the systems are moved unconditionally (used for teardown).
> >   *
> > - * If @mask is not NULL the cpus on which moved tasks are running are set
> > - * in that mask so the update smp function call is restricted to affected
> > - * cpus.
> > + * Following this operation, the caller is required to update the MSRs on all
> > + * CPUs. The cost of constructing the precise mask of CPUs impacted by this
> > + * operation will likely be high, during which we'll be blocking writes to the
> > + * tasklist, and in non-trivial cases, the resulting mask would contain most of
> > + * the CPUs anyway.
>
> This is the argument for not building the mask. I think it would be better placed in the
> commit message of a patch that removes that support. It's not really necessary for new
> users to read about what the function doesn't do....

The first sentence documents an important requirement for the caller,
which I think should remain, but I should have stopped myself from
rambling on about the rationale inline.
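
For what it's worth, the caller-side obligation I have in mind looks
roughly like this (a sketch; whether rdt_move_group_tasks() keeps its
mask parameter is still to be decided):

	read_lock(&tasklist_lock);
	rdt_move_group_tasks(rdtgrp, &rdtgroup_default, NULL);
	read_unlock(&tasklist_lock);

	/* The caller must now refresh PQR_ASSOC on all CPUs. */
	update_closid_rmid(cpu_online_mask, NULL);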

>
>
> With the caveat that I don't understand memory ordering:
> Reviewed-by: James Morse <james.morse@....com>

Thanks for your review. I'll clean this up and send out an update.

Thanks!
-Peter
