Date:   Thu, 17 Aug 2017 18:50:42 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Mike Galbraith <efault@....de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [patch-rt] hotplug, hrtimer: Migrate expired/deferred timers
 during cpu offline

On 2017-08-14 09:59:48 [+0200], Mike Galbraith wrote:
> On Fri, 2017-08-11 at 10:15 +0200, Mike Galbraith wrote:
> > On Fri, 2017-08-11 at 09:55 +0200, Mike Galbraith wrote:
> > > The below fixes up the list debug explosion.
> > > 
> > > If we do not migrate expired/deferred timers during cpu offline, ->cb_entry
> > > will be corrupted by online initialization of base->expired, leading to a
> > > loud list debug complaint should someone call __remove_hrtimer() thereafter.
> > > 
> > > Signed-off-by: Mike Galvraith <efault@....de>
> > ahem.....................b
> 
> (actually, I shouldn't have signed, question being why we now leave
> them lying about when we _apparently_ previously did not)

takedown_cpu() invokes smpboot_park_threads() early, which parks/stops
the ksoftirqd. That means every hrtimer that fires after that point and
is not marked irqsafe won't be processed but just enqueued on the
->expired list. The timer would be processed once the CPU goes back
online. But not really: once the CPU comes back online, the "expired"
list head is re-initialized and that timer is lost. When you later try
to cancel it, it removes itself from the (now re-initialized) expired
list, and that is when the list corruption is noticed.
My guess here is that the hotplug rework changed the timing, so the bug
is more obvious now: if you cancel the timer before the CPU goes back
online, nothing happens.
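
Roughly, this is the sequence on CPU-up (a sketch based on the RT
tree's per-clock-base ->expired list; exact function and field names
may differ slightly from the tree at hand):

	/* hrtimers_prepare_cpu(): runs when the CPU comes back online */
	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
		/* other clock-base init omitted */
		INIT_LIST_HEAD(&cpu_base->clock_base[i].expired);
	}

	/*
	 * A timer that was still linked on ->expired before the CPU
	 * went down now has a stale cb_entry: its next/prev still point
	 * into the list, but the re-initialized head no longer points
	 * back at it. The list_del() done by a later __remove_hrtimer()
	 * is what CONFIG_DEBUG_LIST then complains about.
	 */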

> > > ---
> > >  kernel/time/hrtimer.c |   13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > --- a/kernel/time/hrtimer.c
> > > +++ b/kernel/time/hrtimer.c
> > > @@ -1802,6 +1802,19 @@ static void migrate_hrtimer_list(struct
> > >  		 */
> > >  		enqueue_hrtimer(timer, new_base);
> > >  	}
> > > +
> > > +	/*
> > > +	 * Finally, migrate any expired timers deferred by RT.
> > > +	 */
> > > +	while (!list_empty(&old_base->expired)) {
> > > +		struct list_head *entry = old_base->expired.next;
> > > +
> > > +		timer = container_of(entry, struct hrtimer, cb_entry);
> 
> (oops, forgot to change that back too. [scribble scribble])
> 
> > > +		/* XXX: hm, perhaps defer again instead of enqueueing. */
> > > +		__remove_hrtimer(timer, old_base, HRTIMER_STATE_ENQUEUED, 0);
> > > +		timer->base = new_base;
> > > +		enqueue_hrtimer(timer, new_base);

__remove_hrtimer() shouldn't be required because it has already been
done by the time the timer lands on ->expired. It should be enough to
just list_splice() one list onto the other and raise the softirq
afterwards.
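Something like this (untested sketch, reusing the names from the quoted
patch; the softirq raise assumes we are running on the CPU that takes
over the timers):

	if (!list_empty(&old_base->expired)) {
		/*
		 * These timers were already removed from the rbtree when
		 * they were deferred, so splicing the list entries over
		 * is sufficient - no __remove_hrtimer() needed.
		 */
		list_splice_init(&old_base->expired, &new_base->expired);
		/* Let the softirq on this CPU process the deferred timers. */
		raise_softirq_irqoff(HRTIMER_SOFTIRQ);
	}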

> > > +	}
> > >  }
> > >  
> > >  int hrtimers_dead_cpu(unsigned int scpu)

Sebastian
