Message-ID: <1307084189.23876.19.camel@pasglop>
Date: Fri, 03 Jun 2011 16:56:29 +1000
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: gregkh@...e.de,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Felipe Balbi <balbi@...com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>
Subject: Re: tty breakage in X (Was: tty vs workqueue oddities)
On Fri, 2011-06-03 at 16:17 +1000, Benjamin Herrenschmidt wrote:
> Some more data: It -looks- like what happens is that the flush_to_ldisc
> work queue entry constantly re-queues itself (because the PTY is full?)
> and the workqueue thread will basically loop forever calling it without
> ever scheduling, thus starving the consumer process that could have
> emptied the PTY.
>
> At least that's a semi half-assed theory. If I add a schedule() to
> process_one_work() after dropping the lock, the problem disappears.
>
> So there's a combination of things here that are quite interesting:
>
> - A lot of work queued for the kworker will essentially go on without
> scheduling for as long as it takes to empty all work items. That doesn't
> sound very nice latency-wise. At least on a non-PREEMPT kernel.
>
> - flush_to_ldisc seems to be nasty: from what I can tell, it requeues
> itself over and over again when it can't push the data out. In this
> case, I suspect that's because the PTY is full, but I don't know for
> sure yet.
Interesting results from x86. I could not initially reproduce there at
all on my little Atom board (the one from kernel summit last year).
Eventually I looked at the kernel config, switched off PREEMPT_VOLUNTARY,
and I can now reproduce on x86 too. Again, if you have both threads/cores
running, the problem isn't as visible (you do see "hiccups" when cat'ing
a large file; the Atom is slow enough, I suppose).

But offline a CPU (leave only one up) and cat a large file (dmesg is
enough for me to trigger it) and you see the hangs.

So I think my theory stands: flush_to_ldisc constantly reschedules
itself, causing the worker thread to eat all the CPU and starve the
consumer of the PTY. I won't have time to dig much deeper today, nor
probably this weekend, so I'm sending this email for others who want to
look.
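
For anyone who wants to poke at the theory without a kernel tree, here
is a toy user-space model in Python. It is only a sketch of the
suspected mechanism, not the actual tty or workqueue code: simulate(),
pty_capacity, total_bytes and max_steps are all made-up names, the
"flush" item is a stand-in for flush_to_ldisc, and the yield step is a
stand-in for the schedule() call added to process_one_work(). It assumes
one CPU and no preemption, so the consumer only runs when the worker
gives up the CPU.

```python
from collections import deque


def simulate(yield_between_items, pty_capacity=4, total_bytes=20,
             max_steps=1000):
    """Toy single-CPU, non-preemptive model of the suspected starvation.

    Returns (bytes_consumed, worker_iterations).
    """
    pending = total_bytes          # data still waiting to be flushed
    pty = 0                        # bytes currently sitting in the PTY
    consumed = 0                   # bytes read by the consumer process
    workqueue = deque(["flush"])   # hypothetical stand-in work item
    iterations = 0

    # The worker loop: keeps running work items until the queue is empty
    # (or until max_steps, so the model itself cannot hang).
    while workqueue and iterations < max_steps:
        workqueue.popleft()
        iterations += 1

        # Push as much as fits into the PTY.
        moved = min(pty_capacity - pty, pending)
        pty += moved
        pending -= moved

        # Could not push everything: the item requeues itself.
        if pending:
            workqueue.append("flush")

        if yield_between_items:
            # Stand-in for schedule(): the consumer gets the CPU and
            # drains the PTY before the worker runs again.
            consumed += pty
            pty = 0

    if not workqueue:
        # Queue empty: the worker sleeps, so the consumer finally runs.
        consumed += pty
        pty = 0

    return consumed, iterations


if __name__ == "__main__":
    print("no yield:  ", simulate(False))
    print("with yield:", simulate(True))
```

In this model, the run without the yield fills the PTY once and then
spins on the requeued item until max_steps with nothing consumed, while
the run with the yield moves all the data in a handful of iterations.
That matches the behaviour described above, but it says nothing about
why the real flush_to_ldisc requeues itself.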
Cheers,
Ben.