Date:   Wed, 21 Nov 2018 17:33:16 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Wanpeng Li <wanpengli@...cent.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Yauheni Kaliuta <yauheni.kaliuta@...hat.com>,
        Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH 20/25] sched/kcpustat: Introduce vtime-aware kcpustat
 accessor

On Wed, Nov 21, 2018 at 09:18:19AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 20, 2018 at 11:40:22PM +0100, Frederic Weisbecker wrote:
> > On Tue, Nov 20, 2018 at 03:23:06PM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 14, 2018 at 03:46:04AM +0100, Frederic Weisbecker wrote:
> > > 
> > > > +void kcpustat_cputime(struct kernel_cpustat *kcpustat, int cpu,
> > > > +		      u64 *user, u64 *nice, u64 *system,
> > > > +		      u64 *guest, u64 *guest_nice)
> > > > +{
> > > > +	struct task_struct *curr;
> > > > +	struct vtime *vtime;
> > > > +	int err;
> > > > +
> > > > +	if (!vtime_accounting_enabled()) {
> > > > +		kcpustat_cputime_raw(kcpustat, user, nice,
> > > > +				     system, guest, guest_nice);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	rcu_read_lock();
> > > > +
> > > > +	do {
> > > > +		curr = rcu_dereference(kcpustat->curr);
> > > 
> > > Like I explained earlier, I don't think the above is correct.
> > > task_struct is itself not RCU protected.
> > 
> > But there is at least one put_task_struct() that is enqueued as an RCU callback
> > from release_task(). This patchset tries to make sure that kcpustat->curr can't
> > be assigned beyond that point.
> > 
> > Or did I misunderstand something?
> 
> Yeah; release_task() is not the normal exit path. Oleg can probably
> remember how all that works, because I always get lost there :-/
> 
> In any case, have a look at task_rcu_dereference(), but that still does
> not explain the rcu_assign_pointer() stuff you use to set
> kcpustat->curr.

So there are two reasons here.

One is the ordering. Here is the write side:


       schedule() {
           rq->curr = next;
           context_switch() {
               vtime_task_switch() {
                   vtime_seqcount_write_lock(prev);
                   // flush prev's pending vtime
                   vtime_seqcount_write_unlock(prev);

                   vtime_seqcount_write_lock(next);
                   // init next's vtime
                   vtime_seqcount_write_unlock(next);

                   rcu_assign_pointer(this_kcpustat->curr, next);
               }
           }
       }

Now from the read side, if I fetch the task from rq->curr, there is a risk
that I see the next task but not yet the vtime flushed by the prev task.
If the prev task has run for 1 second without being interrupted, I'm going
to miss that whole second.

Assigning the next task only after the prev task has flushed its vtime prevents
that kind of race, thanks to the barriers implied by the seqcount and even by
rcu_assign_pointer().
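
Here is roughly how the read side pairs with that (a simplified sketch of the
accessor in this patch; the actual delta computation is trimmed):

       kcpustat_cputime() {
           rcu_read_lock();

           do {
               curr = rcu_dereference(kcpustat->curr);
               // Pairs with the rcu_assign_pointer() above: if I see next
               // here, I also see the vtime that prev flushed before the
               // assignment.
               vtime = &curr->vtime;

               seq = read_seqcount_begin(&vtime->seqcount);
               // start from the flushed snapshot...
               kcpustat_cputime_raw(kcpustat, user, nice, system, guest, guest_nice);
               // ...then add the delta still pending in curr's vtime
           } while (read_seqcount_retry(&vtime->seqcount, seq));

           rcu_read_unlock();
       }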

The second reason is that I looked at task_rcu_dereference() and its guts are really
not appealing; the ordering guarantees there are hard to parse.

At least with kcpustat->curr I can make sure that the pointer is set to NULL before the
earliest opportunity for call_rcu(delayed_put_task_struct) to be queued, which is
exit_notify() in the autoreap case.
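
To illustrate (simplified, not the exact hunks from the series, and the exit
hook name below is made up for the example): the exit path clears the pointer
before exit_notify() gets a chance to queue delayed_put_task_struct(), and the
reader falls back to the raw snapshot when it observes NULL:

       do_exit() {
           ...
           // illustrative hook, exact placement per the series
           vtime_exit_task() {
               vtime_seqcount_write_lock(current);
               // flush current's pending vtime
               vtime_seqcount_write_unlock(current);
               rcu_assign_pointer(this_kcpustat->curr, NULL);
           }
           exit_notify();  // autoreap: release_task() -> call_rcu(delayed_put_task_struct)
           ...
       }

       // read side, under rcu_read_lock():
       curr = rcu_dereference(kcpustat->curr);
       if (!curr) {
           // CPU idle or task past its vtime flush: raw snapshot is up to date
           kcpustat_cputime_raw(kcpustat, user, nice, system, guest, guest_nice);
           return;
       }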
