lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070414172920.GA2433@1wt.eu>
Date:	Sat, 14 Apr 2007 19:29:20 +0200
From:	Willy Tarreau <w@....eu>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Nick Piggin <npiggin@...e.de>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Con Kolivas <kernel@...ivas.org>,
	Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jiri Slaby <jirislaby@...il.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

Hi Eric,

[...]
> >> the ramp up slows down after 700-800 processes, but something very 
> >> strange happens. If I'm under X, I can switch the focus to all xterms 
> >> (the WM is still alive) but all xterms are frozen. On the console, 
> >> after one moment I simply cannot switch to another VT anymore while I 
> >> can still start commands locally. But "chvt 2" simply blocks. SysRq-K 
> >> killed everything and restored full control. Dmesg shows lots of :
> >
> >> SAK: killed process xxxx (scheddos2): process_session(p)==tty->session.
> 
> This.  Yes. SAK is noisy and tells you everything it kills.

OK, that's what I suspected, but I did not know if the fact that it talked
about the session was systematic or related to any particular state when it
killed the task.

> >> I wonder if part of the problem would be too many processes bound to 
> >> the same tty :-/
> >
> > hm, that's really weird. I've Cc:-ed the tty experts (Erik, Jiri, Alan), 
> > maybe this description rings a bell with them?
> 
> Is there any swapping going on?

Not at all.

> I'm inclined to suspect that it is a problem that has more to do with the
> number of processes and has nothing to do with ttys.

It is clearly possible. What I found strange is that I could still fork
processes (eg: ls, dmesg|tail), ... but not switch to another VT anymore.
It first happened under X with frozen xterms but a perfectly usable WM,
then I reproduced it on pure console to rule out any potential X problem.

> Anyway you can easily rule out ttys by having your startup program
> detach from a controlling tty before you start everything.
> 
> I'm more inclined to guess something is reading /proc a lot, or doing
> something that holds the tasklist lock, a lot or something like that,
> if the problem isn't that you are being kicked into swap.

Oh I'm sorry you were invited into the discussion without a first description
of the context. I was giving a try to Ingo's new scheduler, and trying to
reach corner cases with lots of processes competing for CPU.

I simply used a "for" loop in bash to fork 1000 processes, and this problem
happened between 700-800 children. The program only uses a busy loop and a
pause. I then changed my program to close 0,1,2 and perform the fork itself,
and the problem vanished. So there are two differences here :

  - bash not forking anymore
  - far less FDs on /dev/tty1

At first, I had around 2200 fds on /dev/tty1, reason why I suspected something
in this area.

I agree that this is not normal usage at all, I'm just trying to attack
Ingo's scheduler to ensure it is more robust than the stock one. But
sometimes brute force methods can make other sleeping problems pop up.

Thinking about it, I don't know if there are calls to schedule() while
switching from tty1 to tty2. Alt-F2 had no effect anymore, and "chvt 2"
simply blocked. It would have been possible that a schedule() call
somewhere got starved due to the load, I don't know.

Thanks,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ