Message-ID: <4A6BC2FC.7020700@billgatliff.com>
Date: Sat, 25 Jul 2009 21:44:12 -0500
From: Bill Gatliff <bgat@...lgatliff.com>
To: Jamie Lokier <jamie@...reable.org>
CC: Peter Zijlstra <peterz@...radead.org>,
sen wang <wangsen.linux@...il.com>, mingo@...e.hu,
akpm@...ux-foundation.org, kernel@...ivas.org, npiggin@...e.de,
arjan@...radead.org, linux-arm-kernel@...ts.arm.linux.org.uk,
linux-kernel@...r.kernel.org
Subject: Re: report a bug about sched_rt

Jamie Lokier wrote:
> Bill Gatliff wrote:
>
>> Jamie Lokier wrote:
>>
>>> For simple things like "try to keep the buffer to my DVD writer full"
>>> (no I don't know how much CPU that requires - it's a kind of "best
>>> effort but try very hard!"), it would be quite useful to have
>>> something like RT-bandwidth which grants a certain percentage of time
>>> as an RT task, and effectively downgrades it to SCHED_OTHER when that
>>> time is exceeded to permit some fairness with the rest of the system.
>>>
>>>
>> Useful perhaps, but an application design that explicitly communicates
>> your desires to the scheduler will be more robust, even if it does seem
>> more complex at the outset.
>>
>
> I agree with communicating the desire explicitly to the scheduler.
>
> In the above example, the exact desire is "give me as much CPU as I
> ask for, because my hardware servicing will be adversely but
> non-fatally affected if you don't, and the amount of CPU needed to
> service the hardware cannot be determined in advance, but prevent me
> from blocking progress in the rest of the system by limiting my
> exclusive ownership of the CPU".
>
> How do you propose to communicate that to the scheduler, if not by
> something rather like RT-bandwidth with downgrading to SCHED_OTHER
> when a policy limit is exceeded?
>
This is a great real-world problem. And there's no one-size-fits-all
answer, unfortunately.

RT-bandwidth will give you the system behavior you are after, but it's a
pretty blunt instrument.
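
For reference, mainline RT bandwidth is configured through two sysctls,
/proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us
(defaults 950000 and 1000000; a runtime of -1 disables throttling). The
sketch below only shows the arithmetic behind those two numbers. The helper
name rt_share() is hypothetical and is not a kernel interface.

```c
#include <assert.h>

/*
 * Fraction of each period that realtime tasks may consume, given the
 * values read from sched_rt_runtime_us and sched_rt_period_us.
 * rt_share() is an illustrative helper, not part of any kernel API.
 */
static double rt_share(long runtime_us, long period_us)
{
    assert(period_us > 0);          /* a zero or negative period is not meaningful */

    if (runtime_us < 0)             /* -1 means RT throttling is disabled */
        return 1.0;
    if (runtime_us > period_us)     /* clamp: cannot use more than the whole period */
        return 1.0;

    return (double)runtime_us / (double)period_us;
}
```

With the default settings this works out to 95% of each one-second period
for realtime tasks, leaving the remaining 5% for everything else. That one
system-wide ratio is the only knob, which is part of why it is a blunt
instrument.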

I'd consider putting some throttling in your interrupt handler that
prevents it from running more than a certain amount of calculation per
interrupt event. And perhaps it's looking at execution timestamps to
determine how often it's running, and can therefore do a rough
calculation of how much CPU it's eating. At least until threaded
interrupt scheduling is widespread, a runaway interrupt handler is
definitely an opportunity to hang up a system.
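
The timestamp-based throttling described above might look roughly like the
following. This is a user-space sketch under stated assumptions, not driver
code: struct irq_budget and its functions are hypothetical names, the
timestamps are assumed to come from whatever monotonic nanosecond clock the
platform offers, and a real handler would also need to think about locking
and per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accounting state for one interrupt source. */
struct irq_budget {
    uint64_t window_ns;     /* length of the accounting window */
    uint64_t limit_ns;      /* handler time allowed per window */
    uint64_t window_start;  /* timestamp at which the current window began */
    uint64_t used_ns;       /* handler time consumed in the current window */
};

static void irq_budget_init(struct irq_budget *b, uint64_t window_ns,
                            uint64_t limit_ns, uint64_t now_ns)
{
    assert(window_ns > 0 && limit_ns <= window_ns);
    b->window_ns = window_ns;
    b->limit_ns = limit_ns;
    b->window_start = now_ns;
    b->used_ns = 0;
}

/* Call at handler entry: true if this event may do its full work. */
static bool irq_budget_allow(struct irq_budget *b, uint64_t now_ns)
{
    /* Start a fresh window once the old one has elapsed. */
    if (now_ns - b->window_start >= b->window_ns) {
        b->window_start = now_ns;
        b->used_ns = 0;
    }
    return b->used_ns < b->limit_ns;
}

/* Call at handler exit with the entry and exit timestamps. */
static void irq_budget_charge(struct irq_budget *b, uint64_t entry_ns,
                              uint64_t exit_ns)
{
    b->used_ns += exit_ns - entry_ns;
}

/* Rough CPU share, in percent of the window, consumed so far. */
static unsigned irq_budget_percent(const struct irq_budget *b)
{
    return (unsigned)(b->used_ns * 100 / b->window_ns);
}
```

When irq_budget_allow() returns false, the handler would do only the
minimum (acknowledge the hardware, maybe bump a counter) and skip the
expensive part until the next window.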

Tasklets are nice for this, because tasklet_schedule() won't queue one
again while it's already pending. So if your interrupt handler's job is just to
launch the tasklet, and you know how much time the tasklet takes to run,
then if you get a burst of interrupts you don't end up launching an
equivalent burst of scheduled work: eventually the interrupt handler
overtakes the tasklet, and the additional interrupt events get dropped.
That's often a decent way to deal with system overload, especially if it
leaves the system functional enough to take some sort of "evasive
action" like reverting to polled i/o, issuing a diagnostic message, or
doing an orderly transition to a safe mode.
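
The coalescing behavior described above can be modeled in a few lines. This
is a single-threaded, user-space illustration only: struct deferred_work and
its functions are hypothetical names, and the plain bool stands in for the
atomic "already scheduled" bit that the real tasklet_schedule() tests and
sets. The dropped counter and the overload check are additions for the
example, as one possible trigger for the "evasive action".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one unit of deferred work fed by an interrupt. */
struct deferred_work {
    bool pending;           /* true while one run is queued */
    unsigned long runs;     /* times the deferred work actually ran */
    unsigned long dropped;  /* events that arrived while a run was queued */
};

/* Interrupt-handler side: returns true if this event queued a run. */
static bool work_schedule(struct deferred_work *w)
{
    if (w->pending) {
        w->dropped++;       /* burst event: coalesced into the queued run */
        return false;
    }
    w->pending = true;
    return true;
}

/* Deferred side: performs at most one run per queued request. */
static void work_run(struct deferred_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;     /* cleared first, so a new event can queue again */
    w->runs++;
}

/* Overload check: has the handler overtaken the work too many times? */
static bool work_overloaded(const struct deferred_work *w,
                            unsigned long threshold)
{
    return w->dropped >= threshold;
}
```

A burst of five events followed by one call to work_run() produces one run
and four dropped events. Code that sees work_overloaded() return true could
then switch to polled i/o or log a diagnostic rather than keep taking
interrupts it cannot service.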

A flood ping, lots of paging, and driver bugs are just a few ways you
can encounter an unexpected burst of interrupt activity that might, if
not dealt with on some level, cause the system to suddenly destabilize.

Point is, keep a mentality that you want to fall back onto RT-bandwidth
(or any other type of watchdog timer expiration) only after you've
exhausted all other options. Pretend it isn't there, but definitely
know what will happen if it ever steps in. A system coded that way is
much more resistant to breakage, in my experience anyway.
b.g.
--
Bill Gatliff
bgat@...lgatliff.com