Message-ID: <alpine.DEB.2.02.1402192119480.4468@ionos.tec.linutronix.de>
Date: Thu, 20 Feb 2014 01:00:05 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Mike Galbraith <bitbucket@...ine.de>
cc: LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
Dan Williams <dan.j.williams@...el.com>,
dmaengine@...r.kernel.org
Subject: Re: [PATCH] softirq: stable v3.1[23] (others?) have screaming tasklet
disease - ksoftirqd[random] eats 100% CPU
On Wed, 19 Feb 2014, Mike Galbraith wrote:
> I'm seeing ksoftirqd chewing 100% CPU on one or more CPUs in both 3.12
> and 3.13, as below in a 40 core (+smt) box. It should look very
> familiar to CCs, especially Ingo.
>
> Below, tasklet is disabled by ioat2_free_chan_resources, and what I
> presume was systemd-udevd-1050 starts screaming when it meets same,
> until debug patchlet turns tracing off. Once the box was up such that I
> could login, 1050 was long gone, and ksoftirqd had taken over.
>
> systemd-udevd-976 [016] .... 27.467534: ioat_init_channel: tasklet_disable_nosync ffff880465b8bee8
> systemd-udevd-976 [016] .... 27.467649: ioat2_alloc_chan_resources: tasklet_enable ffff880465b8bee8
> <idle>-0 [072] ..s. 27.467659: tasklet_action: ENTER struct tasklet_struct *list: ffff880465b8bee8
> <idle>-0 [072] .Ns. 27.467667: tasklet_action: LOOP struct tasklet_struct *t = list: ffff880465b8bee8
> <idle>-0 [072] .Ns. 27.467673: tasklet_action: LOOP processed ffff880465b8bee8
> systemd-udevd-976 [016] .... 27.467679: ioat2_free_chan_resources: tasklet_disable_nosync ffff880465b8bee8
> systemd-udevd-1034 [000] .Ns. 27.467917: tasklet_action: ENTER struct tasklet_struct *list: ffff880465b8bee8
> systemd-udevd-1034 [000] .Ns. 27.467918: tasklet_action: LOOP struct tasklet_struct *t = list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.468203: tasklet_action: ENTER struct tasklet_struct *list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.468204: tasklet_action: LOOP struct tasklet_struct *t = list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.468204: tasklet_action: ENTER struct tasklet_struct *list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.468205: tasklet_action: LOOP struct tasklet_struct *t = list: ffff880465b8bee8
> ... many more iterations with no processing: see tasklet disabled, raise softirq - wash, rinse, repeat
> <...>-1050 [000] ..s. 27.469561: tasklet_action: ENTER struct tasklet_struct *list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.469562: tasklet_action: LOOP struct tasklet_struct *t = list: ffff880465b8bee8
> <...>-1050 [000] ..s. 27.469563: tasklet_action: LOOP tasklet disabled ffff880465b8bee8 - It's dead Jim
>
> Hm, he says, now where have I seen text describing that trace? Right,
> RT, and the below fixes screaming NOPREEMPT kernels.
>
> Taken from 3.12-rt, and applied to screaming 3.12.11-virgin

Indeed. That's a very similar issue, just for different reasons. The
RT case is special because the mainline usage side of tasklets does
not expect the preemption scenario.

But this one is clearly a driver issue.
The window in which you can bring a machine into that state is
infinitely large. Let's look at the tasklet_schedule --> softirq
sequence:

tasklet_schedule(t)
	set_bit(TASKLET_STATE_SCHED, &t->state);
	queue_tasklet_on_cpu_list(t);
	raise_softirq();

softirq()
	splice_tasklet_cpu_list(cpu_list, list);

	while (list) {
		t = list;
		list = t->next;

		/* Sets the TASKLET_STATE_RUN bit ! */
		if (tasklet_trylock(t)) {
			if (!atomic_read(&t->count)) {		<-----
				clear_bit(TASKLET_STATE_SCHED, &t->state);
				t->func();
				/* Clears the TASKLET_STATE_RUN bit */
				tasklet_unlock(t);
				continue;
			}
			tasklet_unlock(t);
			queue_tasklet_on_cpu_list(t);
			raise_softirq();
		}
	}

So up to the atomic_read in the softirq, every call to
tasklet_disable(), even one issued eons before that point, is going
to put the softirq into an infinite loop once the tasklet is
scheduled: the trylock succeeds, the count is non-zero, so the
tasklet is requeued and the softirq raised again, forever.
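
To make the failure mode concrete, here is a minimal sketch of the
broken pattern (hypothetical names, not the literal ioat code):

	static struct tasklet_struct t;		/* hypothetical tasklet */

	static void broken_teardown(void)
	{
		/* Bumps t->count, so the atomic_read above is non-zero */
		tasklet_disable_nosync(&t);

		/*
		 * From here on, any interrupt calling tasklet_schedule(&t)
		 * makes the softirq requeue t and raise itself again, over
		 * and over, until someone calls tasklet_enable(&t). That's
		 * exactly the ksoftirqd 100% CPU loop in the trace above.
		 */
	}
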
Even if we put a check for the disabled state into tasklet_schedule(),
there would still be a window between the schedule and the actual
softirq handling. And we can't add that check anyway, because it
would break "sane" use sites of tasklet_disable().

tasklet_disable()/tasklet_enable() is only meant to prevent execution
of the tasklet temporarily, i.e. across a very short code sequence.
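
A "sane" use site looks roughly like this (a sketch; the device
structure and helper are hypothetical):

	/* Hold the tasklet off across a short critical section */
	tasklet_disable(&md->tasklet);	/* also waits for a running tasklet */
	update_shared_state(md);	/* hypothetical helper, must be short */
	tasklet_enable(&md->tasklet);

If the tasklet gets scheduled inside that window, the softirq requeues
it a few times, but the loop ends as soon as tasklet_enable() drops
the count back to zero.
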
The usage of tasklet_disable() in teardown scenarios is completely
broken. The only way to do a teardown properly is to serialize it
against the interrupt which schedules the tasklet:

	/*
	 * First step.
	 */
	disable_interrupt_at_device_or_irq_line_level();

	/*
	 * This makes sure that even a spurious interrupt which
	 * arrives _AFTER_ the synchronize_irq() cannot schedule
	 * the tasklet anymore.
	 */
	tell_interrupt_to_not_schedule_tasklet();

	/* Make sure that no interrupt is in flight */
	synchronize_irq();

	/*
	 * Kill the tasklet, which also waits for an already
	 * scheduled one to complete.
	 */
	tasklet_kill();

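For illustration, those steps might map onto the real APIs like this.
The device structure, handler and names are hypothetical, not taken
from the ioat code:

	#include <linux/interrupt.h>

	struct mydev {
		int			irq;
		int			shutdown;	/* set during teardown */
		struct tasklet_struct	tasklet;
	};

	static irqreturn_t mydev_interrupt(int irq, void *dev_id)
	{
		struct mydev *md = dev_id;

		/* Second step: once shutdown is set, never schedule again */
		if (!ACCESS_ONCE(md->shutdown))
			tasklet_schedule(&md->tasklet);
		return IRQ_HANDLED;
	}

	static void mydev_teardown(struct mydev *md)
	{
		/* First step: mask the interrupt at the line level */
		disable_irq_nosync(md->irq);

		/* Even a spurious interrupt cannot schedule the tasklet now */
		ACCESS_ONCE(md->shutdown) = 1;

		/* Make sure that no interrupt handler is still running */
		synchronize_irq(md->irq);

		/* Kill the tasklet; waits for a scheduled one to complete */
		tasklet_kill(&md->tasklet);
	}

Note that disable_irq(), as opposed to disable_irq_nosync(), would
combine the line-level disable and the synchronize in a single call.
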
I tried to find something like that in the ioat code but I failed
miserably.
Instead, it uses tasklet_disable/enable for setup/teardown, which is
completely buggered and obviously written by people who have no clue
about the tasklet semantics at all.
What's worse is that at the point where this code was written it was
already well known that tasklets are a steaming pile of crap and
should die.
I know why and how the RT patch works around that issue, but do we
really want to make it simpler to (ab)use and introduce new users of
tasklets instead of getting rid of them? Definitely NOT!
Seriously, people who still use tasklets without being aware of their
subtle issues and without an extremely good reason to use them at all
should use a wire cutter or some other appropriate tool to render
their keyboard unusable and get a job in a bakery where they can eat
the mess they produce themselves.
Thanks,
tglx