Date:	Wed, 10 Jun 2015 18:29:18 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: RFC: futex_wait() can DoS the tick

On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > The above was handed to me by a colleague working on a Xen guest that
> > livelocked.  I at first thought Xen arch must have a weird problem, but
> > when I tried proggy on my desktop box, while it didn't stop the tick
> > completely as it did the Xen box, it slowed it to a crawl.  I noticed
> > that this did not happen with newer kernels, so a bisecting I did go,
> > and found that...
> > 
> > 279f14614 x86: apic: Use tsc deadline for oneshot when available
> > 
> > ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
> 
> This does not make any sense at all. It does not matter whether the
> box uses tscdeadline or local apic timer. We do not even program the
> hardware because we see that the event is in the past already.

Yup.

> So we raise the hrtimer softirq, which then expires the timer. So all
> what happens is that ksoftirqd accumulates runtime, but I cannot at
> all see how that amounts to a DoS and brings the machine to a grinding
> halt.

The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomaly vs. local
> timer interrupts at all. If I run this pinned on an otherwise idle
> core, then I get ~ CONFIG_HZ interrupts per second, which is what you
> expect when the cpu never reaches idle.

Hm.  In order to successfully bisect the thing 3.7->3.8 I ran 2xCPUS
copies because the first bisect went gaga.  I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any on any of the kernels either stable or enterprise I tested, and
that's quite a few.  Whatever, that first bisect did go bad.

> > The below targets the symptom, consider it hrtimer cluebat attractant.
> 
> By now I know to take your patches with a grain of salt :)

Sodium being bad for blood pressure is a medical myth.

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

Virgin source or kernels with zillion+ patches, doesn't matter.  To test
virgin source earlier than EFI_STUB I had to pollute the source with
EFI backports, but nothing else.

Just a sec while I check yet again that absolutely virgin master really
really does stall....  Yup.  I pinned the testcase to CPU3..

while sleep 1; do grep LOC /proc/interrupts; done
LOC:       6706       5367       5053       6217       3031       2866       5477       3022   Local timer interrupts
LOC:       6753       5391       5074       6238       3058       2894       5576       3034   Local timer interrupts
LOC:       6791       5422       5104       6265       3066       2903       5582       3039   Local timer interrupts
LOC:       6846       5472       5154       6293       3096       2909       5595       3042   Local timer interrupts
LOC:       6855       5518       5177       6325       3199       2920       5613       3046   Local timer interrupts
LOC:       6892       5552       5217       6338       3234       2935       5637       3053   Local timer interrupts
LOC:       6983       5568       5236       6347       3244       2944       5660       3065   Local timer interrupts
LOC:       7028       5583       5251       6363       3262       2963       5673       3071   Local timer interrupts
LOC:       7217       5676       5343       6383       3305       2976       5682       3078   Local timer interrupts
LOC:       7432       5803       5418       6387       3371       3039       5757       3080   Local timer interrupts <== here
LOC:       7560       6028       5632       6394       3538       3195       5937       3084   Local timer interrupts
LOC:       7747       6135       5720       6394       3543       3262       6087       3086   Local timer interrupts
LOC:       7930       6206       5785       6394       3571       3288       6303       3087   Local timer interrupts
LOC:       8057       6299       5842       6394       3606       3346       6415       3088   Local timer interrupts
LOC:       8236       6361       5921       6394       3632       3409       6630       3090   Local timer interrupts
LOC:       8382       6448       6004       6394       3664       3478       6754       3090   Local timer interrupts
LOC:       8460       6571       6124       6394       3690       3542       6951       3092   Local timer interrupts
LOC:       8605       6670       6224       6394       3723       3614       7078       3093   Local timer interrupts
LOC:       8710       6842       6323       6394       3776       3702       7295       3123   Local timer interrupts
LOC:       8868       6947       6402       6394       3828       3784       7422       3149   Local timer interrupts
LOC:       9077       7124       6523       6394       3901       3848       7637       3172   Local timer interrupts
LOC:       9222       7189       6596       6394       3971       3928       7763       3174   Local timer interrupts
LOC:       9336       7325       6699       6394       4020       3948       7912       3176   Local timer interrupts
LOC:       9423       7414       6849       6395       4089       3979       7940       3177   Local timer interrupts
LOC:       9637       7595       6923       6395       4111       4039       7942       3179   Local timer interrupts
LOC:       9807       7734       7095       6395       4232       4108       8069       3180   Local timer interrupts
^C
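
The raw counters above are easier to read as per-interval deltas, since a stalled CPU then shows up as a run of zeros. A minimal Python sketch, assuming the "LOC:" line layout of /proc/interrupts shown above; the helper names parse_loc, deltas and sample are hypothetical and not part of the original test setup:

```python
import time


def parse_loc(line):
    """Return the per-CPU counters from one 'LOC:' line of /proc/interrupts."""
    fields = line.split()
    if not fields or fields[0] != "LOC:":
        raise ValueError("not a LOC line")
    counts = []
    for field in fields[1:]:
        # Counters come first; the trailing description is not numeric.
        if not field.isdigit():
            break
        counts.append(int(field))
    return counts


def deltas(prev, cur):
    """Per-CPU increase in local timer interrupts between two samples."""
    return [c - p for p, c in zip(prev, cur)]


def sample(path="/proc/interrupts"):
    """Read the current LOC counters (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.lstrip().startswith("LOC:"):
                return parse_loc(line)
    raise RuntimeError("no LOC line found")


def watch(interval=1.0, rounds=10):
    """Print per-CPU tick deltas once per interval, for a fixed number of rounds."""
    prev = sample()
    for _ in range(rounds):
        time.sleep(interval)
        cur = sample()
        print(" ".join("%6d" % d for d in deltas(prev, cur)))
        prev = cur
```

Calling watch() on the test box gives one row of deltas per second. Fed two consecutive samples from the listing above, taken after the "<== here" marker, deltas() reports 0 for CPU3 while the other CPUs keep ticking.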

Config attached.

	-Mike

Download attachment "config.xz" of type "application/x-xz" (23776 bytes)
