lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230411084634.GA576825@hirez.programming.kicks-ass.net>
Date:   Tue, 11 Apr 2023 10:46:34 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     linux-kernel@...r.kernel.org, Aaron Lu <aaron.lu@...el.com>,
        Olivier Dion <odion@...icios.com>
Subject: Re: [RFC PATCH v3] sched: Fix performance regression introduced by
 mm_cid

On Fri, Apr 07, 2023 at 07:50:42PM -0400, Mathieu Desnoyers wrote:

> Let's looks at the relevant combinations of TSA/TSB, and TMA transitions.
> 
> Scenario A) (TSA)+(TMA) (from next task perspective)
> 
> CPU0                                                 CPU1
> 
> Context switch CS-1                                  Migrate-from
>   - store to rq->curr: (N)->(Y) (TSA)                - cmpxchg to *pcpu_id to LAZY (TMA)
>      *** missing barrier ?? ***                        (implied barrier after cmpxchg)
>   - prepare_task_switch()
>     - switch_mm_cid()
>       - mm_cid_get (next)
>         - READ_ONCE(*pcpu_cid)                       - rcu_dereference(src_rq->curr)
> 
> This Dekker ensures that either task (Y) is observed by the rcu_dereference() or the LAZY
> flag is observed by READ_ONCE(), or both are observed.
> 
> If task (Y) store is observed by rcu_dereference(), it means that there is still
> an active task on the cpu. Migrate-from will therefore not transition to UNSET, which
> fulfills property (1). That observed task will itself eventually need a migrate-from
> to be migrated away from that cpu, which fulfills property (2).
> 
> If task (Y) is not observed, but the lazy flag is observed by READ_ONCE(), it will
> move its state to UNSET, which clears the percpu cid perhaps uselessly (which is not
> an issue for correctness). Because task (Y) is not observed, CPU1 can move ahead to
> set the state to UNSET. Because moving state to UNSET is done with a cmpxchg expecting
> that the old state has the LAZY flag set, only one thread will successfully UNSET.
> 
> If both states (LAZY flag and task (Y)) are observed, the thread on CPU0 will observe
> the LAZY flag and transition to UNSET (perhaps uselessly), and CPU1 will observe task
> (Y) and do nothing more, which is fine.
> 
> What we are effectively preventing with this Dekker is a scenario where neither LAZY
> flag nor store (Y) are observed, which would fail property (1) because this would
> UNSET a cid which is actively used.

OK, this I'll buy. Let me go stare at this more.

> Scenario B) (TSB)+(TMA) (from prev task perspective)
> 
> CPU0                                                 CPU1
> 
> Context switch CS-1                                  Migrate-from
>   - store to rq->curr: (Y)->(N) (TSB)                - cmpxchg to *pcpu_id to LAZY (TMA)
>     *** missing barrier ?? ***                         (implied barrier after cmpxchg)
>   - prepare_task_switch()
>     - switch_mm_cid()
>       - cid_put_lazy() (prev)
>         - READ_ONCE(*pcpu_cid)                       - rcu_dereference(src_rq->curr)
> 

This I'm conflicted about, if we're running Y, then how the heck do we
get to setting LAZY in the first place?

For this scenario there must be at least an N->Y->N transition, such
that the first:

  if (src_task->mm_cid_active && src_task->mm == mm) {

can observe N and proceed to setting LAZY. But that then leads us to the
scenario above.

(And apparently I ended up doing an N->N transition, which really isn't
*that* interesting :-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ