Date:	Thu, 7 Jun 2007 09:25:17 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Mark Hounschell <dmarkh@....rr.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, markh@...pro.net,
	linux-kernel@...r.kernel.org, Oleg Nesterov <oleg@...sign.ru>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: floppy.c soft lockup

On Thu, Jun 07, 2007 at 06:18:52AM -0400, Mark Hounschell wrote:
> Matt Mackall wrote:
> > On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
> >> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <markh@...pro.net> wrote:
> >>
> >>>> As for whether a 100% CPU-bound task is a valid thing to do: it has
> >>>> been done for many years on SMP machines. Surely any kernel limitation
> >>>> on this must be considered a bug?
> >>>>
> >>> Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
> >>> 100% CPU-bound process supported in an SMP environment on Linux (vanilla
> >>> or -rt)?
> >> It will kill the kernel, sorry.
> >>
> >> The only way in which we can fix that is to allow kernel threads to preempt
> >> rt-priority userspace threads.  But if we were to do that (to benefit the
> >> few) it would cause _all_ people's rt-prio processes to experience glitches
> >> due to kernel activity, which we believe to be worse.
> >>
> >> So we're between a rock and a hard place here.
> >>
> >> If we really did want to solve this then I guess the kernel would need some
> >> new code to detect a 100%-busy rt-prio process and to then start permitting
> >> preemption of it for kernel thread activity.  That detector would need to
> >> be smart enough to detect a number of 100%-busy rt-prio processes which are
> >> yielding to each other, and one rt-prio process which keeps forking others,
> >> etc.  It might get tricky.
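
[For illustration only: a rough sketch of the detector Andrew describes.
Every name and threshold below is invented, and the caveats he raises are
noted inline.]

	#include <stdbool.h>

	/* Hypothetical per-CPU state; all names/thresholds invented. */
	struct cpu_rt_state {
		unsigned long rt_busy_ticks; /* consecutive ticks consumed by rt tasks */
	};

	#define RT_HOG_THRESHOLD 1000	/* ticks before permitting preemption */

	/* Imagined hook in the per-tick accounting path. */
	static bool rt_hog_detected(struct cpu_rt_state *st, bool tick_was_rt)
	{
		if (tick_was_rt)
			st->rt_busy_ticks++;
		else
			st->rt_busy_ticks = 0; /* any non-rt tick resets the window */

		/* Several rt tasks yielding to each other, or one that keeps
		 * forking, still keep tick_was_rt true, so this naive counter
		 * catches them -- but a task that sleeps briefly once per
		 * window evades it, which is part of why "it might get
		 * tricky". */
		return st->rt_busy_ticks > RT_HOG_THRESHOLD;
	}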
> > 
> > The usual alternative is to manually chrt the relevant kernel threads
> > to RT priority and adjust the priority scheme of your processes accordingly.
> > 
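
[What Matt suggests is done with chrt(1), which wraps sched_setscheduler(2).
A minimal C equivalent, assuming the PID of events/1 has already been looked
up (e.g. via ps); the priority value 60 is only an example:]

	#include <sched.h>
	#include <stdio.h>
	#include <sys/types.h>

	/* Equivalent of: chrt -f -p 60 <pid-of-events/1> */
	static int make_kthread_rt(pid_t events1_pid)
	{
		struct sched_param sp = { .sched_priority = 60 };

		if (sched_setscheduler(events1_pid, SCHED_FIFO, &sp) == -1) {
			perror("sched_setscheduler");
			return -1;
		}
		return 0;
	}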
> 
> From an earlier message in this thread:
> 
> >> Mark writes:
> >> Again, I don't understand why flush_scheduled_work(), running on behalf
> >> of a process affinitized to processor 1, requires cooperation from
> >> events/2 (affinitized to processor 2) when there is an events/1 already
> >> affinitized to processor 1?
> 
> > Oleg writes:
> >flush_workqueue() blocks until any scheduled work on any CPU has run to
> >completion. If we have some work_struct pending on CPU 2, it can be
> >completed only when events/2 executes it.
> 
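
[For concreteness, a module-style sketch of the semantics Oleg describes,
against ~2.6.21-era workqueue APIs; the function and work names here are
illustrative:]

	#include <linux/module.h>
	#include <linux/workqueue.h>

	static void my_work_fn(struct work_struct *work)
	{
		/* runs in the context of one of the events/N kernel threads */
	}
	static DECLARE_WORK(my_work, my_work_fn);

	static int __init sketch_init(void)
	{
		schedule_work(&my_work);  /* queued on the submitting CPU's events/N */
		flush_scheduled_work();   /* waits for pending work on EVERY cpu,
					     so a starved events/2 stalls this caller */
		return 0;
	}

	static void __exit sketch_exit(void) { }
	module_init(sketch_init);
	module_exit(sketch_exit);
	MODULE_LICENSE("GPL");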
> Couldn't flush_scheduled_work() just follow the affinity mask of the
> task that made the call in the first place? If the calling task had a
> CPU mask of 3, then flush_scheduled_work() would flush events/0 and
> events/1; if the calling task had an affinity mask of 1, then only
> events/0 would be flushed?
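
[A hypothetical sketch of what Mark proposes; flush_scheduled_work_affine()
does not exist, and flush_cpu_workqueue()/keventd_wq stand in for the
per-CPU flush primitive and workqueue internal to kernel/workqueue.c:]

	/* Invented variant, imagined inside kernel/workqueue.c: only wait
	 * for the events/N threads whose CPU is in the caller's affinity
	 * mask. */
	void flush_scheduled_work_affine(void)
	{
		int cpu;

		for_each_online_cpu(cpu) {
			if (cpu_isset(cpu, current->cpus_allowed))
				flush_cpu_workqueue(per_cpu_ptr(keventd_wq->cpu_wq, cpu));
		}
	}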

The kernel's internal event API doesn't track any of this stuff, and
it's not clear we'd want it to. It would perhaps be simpler to allow
SIGSTOPping events/0. This might even work today from userspace.
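
[A userspace sketch of that suggestion. The PID of events/0 must be looked
up by hand (e.g. from /proc), and whether the kernel thread actually honors
SIGSTOP is exactly the open question:]

	#include <signal.h>
	#include <sys/types.h>

	static void run_rt_critical(pid_t events0_pid)
	{
		kill(events0_pid, SIGSTOP);  /* park the per-CPU worker */
		/* ... 100% CPU-bound RT work on that CPU ... */
		kill(events0_pid, SIGCONT);  /* let deferred work run again */
	}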

In general, it's considered a mistake to mark CPU hogs as RT precisely
because they present a starvation risk to everything else in the
system, not just kernel threads. We could add kernel infrastructure to
make events survive this sort of thing, but that will very likely just
expose another kernel or userspace livelock.
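
[One conventional way to reduce the starvation risk Matt describes is to
make the RT loop yield periodically; the priority and sleep values below
are only examples:]

	#include <sched.h>
	#include <time.h>

	int main(void)
	{
		struct sched_param sp = { .sched_priority = 50 };
		struct timespec gap = { .tv_sec = 0, .tv_nsec = 100000 }; /* 100us */

		if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
			return 1;

		for (;;) {
			/* ... a bounded chunk of work ... */
			nanosleep(&gap, NULL); /* briefly release the CPU so
						  kernel threads (events/N
						  etc.) can run */
		}
	}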

-- 
Mathematics is the supreme nostalgia of our time.
