linux-kernel - Re: tty breakage in X (Was: tty vs workqueue oddities)

Message-ID: <1307084189.23876.19.camel@pasglop>
Date:	Fri, 03 Jun 2011 16:56:29 +1000
From:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	gregkh@...e.de,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Felipe Balbi <balbi@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: tty breakage in X (Was: tty vs workqueue oddities)

On Fri, 2011-06-03 at 16:17 +1000, Benjamin Herrenschmidt wrote:

> Some more data: It -looks- like what happens is that the flush_to_ldisc
> work queue entry constantly re-queues itself (because the PTY is full?)
> and the workqueue thread will basically loop forever calling it without
> ever scheduling, thus starving the consumer process that could have
> emptied the PTY.
> 
> At least that's a semi half-assed theory. If I add a schedule() to
> process_one_work() after dropping the lock, the problem disappears.
> 
> So there's a combination of things here that are quite interesting:
> 
>  - A lot of work queued for the kworker will essentially go on without
> scheduling for as long as it takes to empty all work items. That doesn't
> sound very nice latency-wise. At least on a non-PREEMPT kernel.
> 
>  - flush_to_ldisc seems to be nasty: from what I can tell, it requeues
> itself over and over again when it can't push the data out. In this
> case, I suspect that is because the PTY is full, but I don't know for
> sure yet.

Interesting results from x86. I could not initially reproduce it there
at all on my little Atom board (the one from kernel summit last year).

Eventually I looked at the kernel config, switched off PREEMPT_VOLUNTARY
and I can now reproduce it on x86 too. Again, if both threads of the
core are online, the problem isn't as visible (you do see "hiccups" when
cat'ing a large file; the Atom is slow enough, I suppose).

But offline a CPU (leave only one up) and cat a large file (dmesg output
is enough for me to trigger it) and you see the hangs.

So I think my theory stands: flush_to_ldisc constantly reschedules
itself, causing the worker thread to eat all the CPU and starve the
consumer of the PTY. I won't have time to dig much deeper today, nor
probably this weekend, so I'm sending this email for others who want to
look.
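
For anyone who wants to reason about the theory without a kernel at
hand, here is a toy user-space model of the suspected behaviour in C.
It is only a sketch and not kernel code: struct sim, flush_work(),
consumer_run(), simulate() and PTY_CAPACITY are hypothetical names made
up for illustration, and the real flush_to_ldisc may well behave
differently. The model assumes a single CPU, no preemption, and a work
item that requeues itself whenever it cannot push all of its data.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
#define PTY_CAPACITY 4          /* hypothetical PTY buffer size, in chunks */

struct sim {
    int pending;                /* chunks still waiting to be pushed */
    int pty_fill;               /* chunks currently sitting in the PTY */
    int consumed;               /* chunks the reader has drained */
    long worker_runs;           /* how often the flush work item ran */
};

/* Model of the flush work item: push what fits into the PTY.
 * Returns true when data is left over, i.e. the item requeues itself. */
static bool flush_work(struct sim *s)
{
    s->worker_runs++;
    while (s->pending > 0 && s->pty_fill < PTY_CAPACITY) {
        s->pending--;
        s->pty_fill++;
    }
    return s->pending > 0;
}

/* Model of the reader process: drain the PTY once it gets the CPU. */
static void consumer_run(struct sim *s)
{
    s->consumed += s->pty_fill;
    s->pty_fill = 0;
}

/* One simulated CPU, no preemption: the worker keeps the CPU for as long
 * as work is queued, and the consumer only runs if the worker yields
 * between items. max_runs bounds the simulation so a livelock ends. */
static struct sim simulate(int chunks, bool yield_between_items, long max_runs)
{
    struct sim s = { .pending = chunks };
    bool queued = true;

    assert(chunks >= 0 && max_runs > 0);

    while (queued && s.worker_runs < max_runs) {
        queued = flush_work(&s);
        if (yield_between_items)
            consumer_run(&s);   /* stands in for a schedule() call */
    }
    return s;
}
```

Under these assumptions, simulate(10, false, 1000) burns all 1000
worker runs with nothing consumed and 6 chunks still pending, while
simulate(10, true, 1000) finishes after 3 runs with all 10 chunks
consumed. The yield flag plays the role of the schedule() added to
process_one_work() in the experiment quoted above.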

Cheers,
Ben.
