[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080710085500.GA19918@elte.hu>
Date: Thu, 10 Jul 2008 10:55:00 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Elias Oltmanns <eo@...ensachen.de>
Cc: linux-kernel@...r.kernel.org, Roland McGrath <roland@...hat.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Dmitry Adamushko <dmitry.adamushko@...il.com>
Subject: Re: Scheduling issue with signal handling
* Elias Oltmanns <eo@...ensachen.de> wrote:
> By sprinkling some printk()s all over the place, I've managed to
> establish the following sequence of events taking place in the event
> of delayed signal handling as described above: The first Ctrl+Z event
> enqueues a SIGTSTP signal which eventually results in a call to
> kick_process(). For some reason though, the signal isn't handled
> straight away but remains on the queue for some time. Consequently,
> subsequent Ctrl+Z events result in echoing another ^Z to the terminal
> but everything related to sending a signal is skipped (and rightly so)
> because the kernel detects that a SIGTSTP is still pending.
> Eventually, get_signal_to_deliver() dequeues the SIGTSTP signal and
> the shell propt appears.
>
> My question is this: Even under high disk I/O pressure, the threads
> dealing with I/O to the terminal evidently still get their turn as
> indicated by the sequence of ^Z appearing on screen. Why is it then,
> that the threads which are meant to process the SIGTSTP or SIGINT
> signals aren't scheduled for some seconds and is there a way to change
> this?
>
> Please let me know if there is anything I can try to investigate this
> any further or if you need further information.
>
> Thanks in advance,
>
> Elias
>
> [1] http://lkml.org/lkml/2008/6/28/50
hm, kick_process() is a no-op on !SMP.
Basically, when a new signal is queued and a task is already running, it
will run in due course and process the signal the moment it's scheduled
again. (unless the signal is blocked)
If a task is not already running, then the signal code will wake up the
task and it will then process the signal the moment it's executed.
The maximum latency of a runnable task hitting the CPU is controlled via
/proc/sys/kernel/sched_latency [available if CONFIG_SCHED_DEBUG=y in the
.config] - 20 milliseconds on uniprocessors.
Several seconds of lag is almost out of question and would indicate a
serious scheduler bug, or - which is far more likely - either an
application signal processing hickup or a kernel signal processing
hickup.
If the lag happens with the task you can observe its worst-case
scheduling delay by looking at /proc/<PID>/sched, if you also have
CONFIG_SCHEDSTAT=y in your .config.
For example, a random shell's delays on a testbox:
phoenix:~> grep se.wait_max /proc/$$/sched
se.wait_max : 3.338588
That's 3.3 msecs _worst case_, on a system that has otherwise quite
insane load:
10:53:57 up 2:48, 2 users, load average: 77.56, 94.33, 102.75
So several seconds of delay, if it came from the scheduler, would be
really anomalous.
As a final measure, instead of printk's, you could try the scheduler
tracer in linux-next (CONFIG_CONTEXT_SWITCH_TRACER=y), to have an exact
idea about what is going on and when. (see /debug/tracing/README)
[ You might also want to try CONFIG_FTRACE=y and CONFIG_DYNAMIC_FTRACE=y
for extremely finegrained kernel tracing - available in linux-next
too. ]
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists