Message-ID: <1299594304.7938.33.camel@marge.simson.net>
Date:	Tue, 08 Mar 2011 15:25:04 +0100
From:	Mike Galbraith <efault@....de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Yong Zhang <yong.zhang0@...il.com>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [patchlet] sched: fix rt throttle runtime borrowing

On Tue, 2011-03-08 at 14:46 +0100, Peter Zijlstra wrote:
> On Tue, 2011-03-08 at 14:27 +0100, Mike Galbraith wrote:
> 
> > > Also, how much of a problem is it really? When I start a FIFO spinner on
> > > my machine I can still ssh in and kill the thing.
> > 
> > It's a problem if you have one box.  Also, try starting a hefty load
> > then having an rt task go nuts.  Nothing good happens here. 
> 
> Right, so I think we're not aggressive enough to migrate tasks away from
> very small cpu_power CPUs, trapping tasks on such CPUs.

I don't think that's it at all.  I just tried (again) with virgin tip.
Start a kbuild, start an RT hog.  Instant frozen box.  If you're in a
console shell, you may or may not save the box, but the desktop is
instantly toast, and there is no ssh possibility.  I can ping the box,
but that's it.  It's a pingable doorstop.

> Of course, this is no help for pinned tasks.. but then you get what you
> asked for isn't it ;-)

But events/N are pinned, and kinda critical.

> > > Not allowing 100% FIFO usage on SMP is going to make it very very hard
> > > to implement any kind of fifo-cgroup stuff.
> > 
> > The only thing I care much about is the default setup.  The safety net
> > should work, otherwise it's a waste.
> 
> Right, but how much trouble can be avoided by making the sched_fair
> load-balancer migrate tasks away from very small cpu_power CPUs?

I don't think that's the issue.  I think when any events thread is
blocked forever, it's game over.

> It won't avoid actual deadlocks when someone tries to wait for workqueue
> broadcasts and the like, but how much of that is actually happening?

Mmmmm.. enough to kill my box every time I test? :)

> > Maybe only doing the borrow thing when there are active RT groups is the
> > right thing to do.  (minus knob) 
> 
> Thing is the whole borrowing needs to go, Dario and me finally came up
> with a 'sane' way to implement fifo-cgroups, but that does include
> explicitly allowing starving CPUs.

Hm, borrowing going away sounds great.  Dunno about that starving CPUs
bit, that has never led to anything but BRB poking here.

> Not allowing that very quickly degenerates into massive trouble like
> gang-scheduling or bouncing tasks around like mad and generally messing
> up the 'load-balancer'.

If the problematic code is going away anyway, I'll just leave it. It's
the same problem that has existed since the dawn of time.  RT hogs can
be utterly deadly a while longer I suppose :)

	-Mike
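
For reference, the "safety net" in this thread is the RT throttle, which
is tuned through the kernel.sched_rt_period_us and
kernel.sched_rt_runtime_us sysctls.  Below is a minimal sketch of how
those two values translate into the CPU share left for non-RT tasks.  It
assumes the usual /proc/sys/kernel layout and the stock defaults of
1000000 and 950000; the helper names rt_budget and read_sysctl are
hypothetical.  It does not model runtime borrowing between CPUs, which is
the part under discussion above.

```python
from pathlib import Path

# Assumed location of the scheduler sysctls on a Linux box.
PROC_KERNEL = Path("/proc/sys/kernel")


def rt_budget(runtime_us: int, period_us: int) -> float:
    """Return the fraction of each period that RT tasks may consume.

    A negative runtime (-1) means throttling is disabled, so RT tasks
    may take the whole period.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_sysctl(name: str, default: int) -> int:
    """Read an integer sysctl, falling back to a default if unavailable."""
    try:
        return int((PROC_KERNEL / name).read_text().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    period = read_sysctl("sched_rt_period_us", 1_000_000)
    runtime = read_sysctl("sched_rt_runtime_us", 950_000)
    share = rt_budget(runtime, period)
    print(f"RT tasks may use {share:.1%} of each {period} us period")
    print(f"left for everything else: {1.0 - share:.1%}")
```

With the stock values this leaves 5% of each period for non-RT work per
CPU, which is the margin that events/N and friends would need in order to
keep running under an RT hog.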
