Message-ID: <alpine.LFD.2.02.1210261359210.2756@ionos>
Date:	Fri, 26 Oct 2012 14:09:33 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Zhang, Yanmin" <yanmin.zhang@...el.com>
cc:	"He, Bo" <bo.he@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	"yanmin_zhang@...ux.intel.com" <yanmin_zhang@...ux.intel.com>
Subject: RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 26 Oct 2012, Zhang, Yanmin wrote:
> >From: Thomas Gleixner [mailto:tglx@...utronix.de]
> >Your code is returning HRTIMER_RESTART from the timer callback and at
> >the same time it starts the timer from some other context. That's what
> >needs to be fixed.
> 
> The timer user should fix it. But could we also change hrtimer to
> make it more stable?  At least, instead of panic, could we print
> some information and go ahead to let kernel continue?

That's unfortunately not possible. At this point the timer might be
already corrupted.

CPU0                            CPU1

timer expires
  callback runs
                                hrtimer_start()
                                   expiry value is set
                                   enqueue_hrtimer()

    hrtimer_forward()
       expiry value is set

    return HRTIMER_RESTART

So while we can prevent the double enqueue, we have no way to deal
with the corrupted expiry value and the inconsistent RB tree. We can
give better debugging information, but we can't pretend that
everything is nice and cool.
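
The interleaving above can be replayed deterministically in a small
userspace C model. This is only a sketch under loud assumptions: a sorted
singly linked list stands in for the rbtree, and every name in it
(model_timer, model_queue, model_enqueue, model_queue_is_sorted,
model_replay_race) is hypothetical, not a kernel API. It shows that
guarding against the double enqueue still leaves the ordering broken,
because the expiry key was rewritten while the node was already queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_timer {
    long expires;               /* sort key, like the hrtimer expiry */
    bool queued;                /* set while the node is linked in */
    struct model_timer *next;
};

struct model_queue {
    struct model_timer *head;   /* sorted by expires, earliest first */
};

/* Insert in expiry order; the key is read once, at insert time. */
static void model_enqueue(struct model_queue *q, struct model_timer *t)
{
    struct model_timer **pp = &q->head;

    while (*pp && (*pp)->expires <= t->expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
    t->queued = true;
}

/* True if every node expires no later than its successor. */
static bool model_queue_is_sorted(const struct model_queue *q)
{
    for (const struct model_timer *t = q->head; t && t->next; t = t->next)
        if (t->expires > t->next->expires)
            return false;
    return true;
}

/* Replays the diagram; returns true if the queue ends up inconsistent. */
static bool model_replay_race(void)
{
    struct model_timer a = { .expires = 100 };
    struct model_timer b = { .expires = 200 };
    struct model_timer racy = { 0 };
    struct model_queue q = { NULL };

    model_enqueue(&q, &a);
    model_enqueue(&q, &b);

    /* CPU0: racy has expired and its callback is running (not queued). */

    /* CPU1: start the timer: set the expiry, then enqueue. */
    racy.expires = 150;
    model_enqueue(&q, &racy);           /* order: a(100) racy(150) b(200) */
    assert(model_queue_is_sorted(&q));

    /* CPU0: forward the timer: rewrites the key of a queued node. */
    racy.expires = 250;                 /* order: a(100) racy(250) b(200) */

    /* CPU0: callback asks for a restart; the double enqueue is avoidable. */
    if (!racy.queued)
        model_enqueue(&q, &racy);

    return !model_queue_is_sorted(&q);
}
```

Calling model_replay_race() from a test harness reports the queue as
inconsistent even though the second enqueue was skipped.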

If we really want to do something about it that keeps the machine
alive, then we need to:

       1) dequeue the timer
       2) run a consistency check over the rbtree
       3) enqueue the timer
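
The three steps can be sketched in a userspace C model. The assumptions
are significant: a sorted singly linked list stands in for the rbtree, so
the consistency check is a trivial walk, whereas a real rbtree check would
have to visit the whole tree. All names (model_timer, model_queue,
model_dequeue, model_repair, model_repair_demo) are hypothetical and not
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct model_timer {
    long expires;
    struct model_timer *next;
};

struct model_queue {
    struct model_timer *head;   /* sorted by expires, earliest first */
};

static void model_enqueue(struct model_queue *q, struct model_timer *t)
{
    struct model_timer **pp = &q->head;

    while (*pp && (*pp)->expires <= t->expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}

/* Unlink by pointer identity, never by key: the key is not trustworthy. */
static bool model_dequeue(struct model_queue *q, struct model_timer *t)
{
    for (struct model_timer **pp = &q->head; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return true;
        }
    }
    return false;
}

static bool model_queue_is_sorted(const struct model_queue *q)
{
    for (const struct model_timer *t = q->head; t && t->next; t = t->next)
        if (t->expires > t->next->expires)
            return false;
    return true;
}

/* Returns true if the queue is consistent after the repair attempt. */
static bool model_repair(struct model_queue *q, struct model_timer *t)
{
    model_dequeue(q, t);                /* 1) dequeue the timer */
    if (!model_queue_is_sorted(q))      /* 2) consistency check */
        return false;                   /*    damage is not limited to t */
    model_enqueue(q, t);                /* 3) enqueue with the current key */
    return model_queue_is_sorted(q);
}

/* Builds the corrupted state from the race, then repairs it. */
static bool model_repair_demo(void)
{
    struct model_timer a = { 100, NULL };
    struct model_timer b = { 200, NULL };
    struct model_timer racy = { 150, NULL };
    struct model_queue q = { NULL };

    model_enqueue(&q, &a);
    model_enqueue(&q, &racy);
    model_enqueue(&q, &b);
    racy.expires = 250;                 /* key rewritten while queued */
    assert(!model_queue_is_sorted(&q));

    return model_repair(&q, &racy) &&
           q.head == &a && a.next == &b && b.next == &racy;
}
```

In this model the repair succeeds and the racy timer ends up last in the
queue. Whether the equivalent walk over a real rbtree is affordable in the
expiry path is a separate question.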

Not sure if that's worth the trouble.

Thanks,

	tglx