Message-ID: <487113AC.7000300@skyrush.com>
Date: Sun, 06 Jul 2008 12:49:16 -0600
From: Joe Peterson <joe@...rush.com>
To: Tim Connors <tim.w.connors@...il.com>
CC: Vegard Nossum <vegard.nossum@...il.com>,
Alan Cox <alan@...hat.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
David Newall <davidn@...idnewall.com>,
Willy Tarreau <w@....eu>,
Harald Dunkel <harald.dunkel@...nline.de>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: tty session leader issue [cause now known!] (was Re: 2.6.25.3:
su gets stuck for root)
Tim Connors wrote:
> On Wed, 2 Jul 2008, Joe Peterson wrote:
>
>> I have done some more investigation on this problem, and I am posting
>> here my results in hope that someone can point me in the right direction
>> for further investigation...
>>
>> Summary: during the initialization of a new bash shell, the terminal
>> foreground process group often reverts back to that of the parent of the
>> bash shell (after being set *to* the bash shell pgrp by bash),
>> prohibiting commands like stty from being run by the init scripts. The
>> result is that the execution of these commands will hang until killed,
>> causing the bash prompt to not appear. Adding a delay in the script
>> (using sleep) increases the chance of this having time to happen.
I have done more investigation, and I now know the cause of the
bash/stty problem. It appears to be a race condition in bash (well,
between two different bash shells, actually). I saw a post from a while
back about something similar by Ingo Molnar, so I have copied him here too.
Here is the ps tree of the test case where stty has hung:
 4704 ?      S   0:00  \_ xterm
 4706 pts/3  Ss  0:00  |   \_ -bash
 4739 pts/3  S   0:00  |       \_ su
 4742 pts/3  S   0:00  |           \_ bash
 4746 pts/3  S+  0:00  |               \_ su foo
 4747 pts/3  S   0:00  |                   \_ bash
 4752 pts/3  T   0:00  |                       \_ stty -ixany
What should happen is: when "su foo" (4746) is run, it spawns a bash
shell (4747) that then makes its own process group the terminal's
foreground process group (the "leader" below) when it initializes its
job control. The stty command (in the child bash's .bashrc) will then
be able to work (and not hang).
However, the hang happens when the parent bash (4742) interferes by
reverting the tty session leader back to its child (the "su foo"
process: 4746) shortly after the child bash (4747) becomes the leader.
The parent does this when it calls
execute_command_internal()->stop_pipeline()->give_terminal_to(). This
seems to happen at a slightly random time, making the issue
intermittent: it depends on which process wins the race.
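To see which side won the race from inside the child shell, a small
check like the following could be run just before the stty call. This is
only a rough sketch in Python, not anything taken from bash;
foreground_status is a made-up helper name, and it assumes fd 0 is the
controlling terminal:

```python
import os


def foreground_status(fd=0):
    """Compare the terminal's foreground process group with our own.

    Returns (fg_pgrp, my_pgrp, is_foreground), or None when fd is not
    a terminal we can query (not a tty, closed, or not our controlling
    terminal).
    """
    try:
        fg_pgrp = os.tcgetpgrp(fd)  # foreground pgrp of the tty on fd
    except OSError:
        return None
    my_pgrp = os.getpgrp()  # process group of the calling process
    return fg_pgrp, my_pgrp, fg_pgrp == my_pgrp


if __name__ == "__main__":
    print(foreground_status())
```

If the third value comes back False, the caller is in a background
process group, and a terminal-modifying command run from it would be
expected to stop, which would match the "T" state shown for stty above.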
In summary, when the bug does *not* occur, here is the approximate
sequence:
1) parent bash (4742) runs 'su foo' (4746)
2) parent bash sets tty leader to 'su' (4746)
3) child bash (4747) initializes and sets itself to be the leader
4) stty command in .bashrc runs successfully
When the bug occurs, here is the sequence:
1) parent bash (4742) runs 'su foo' (4746)
2) child bash (4747) initializes and sets itself to be the leader
3) parent bash sets tty leader *back* to 'su' (4746)
4) stty command runs and fails/hangs because its parent is not leader
The various calls to tcsetpgrp() that do this are interleaved from the
two bash processes, and sometimes the parent does it slightly *after*
the child bash initializes job control - that's when the problem happens.
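The two orderings can be illustrated with a toy model. This is a
simplified sketch under stated assumptions, not a reproduction of the
real bash code paths: no terminal or processes are involved, each
process group id is assumed to equal the pid from the ps tree, and stty
is assumed to succeed only when the child bash holds the terminal. The
step names and the simulate() helper are made up for illustration:

```python
# Toy model of the race: no real terminal or processes are involved.
PARENT_BASH, SU_FOO, CHILD_BASH = 4742, 4746, 4747  # pids from the ps tree

# Each step stands in for one tcsetpgrp() call; the value is the
# process group that becomes the terminal's foreground group.
STEPS = {
    "parent_gives_terminal_to_su": SU_FOO,
    "child_bash_takes_terminal": CHILD_BASH,
}


def simulate(order):
    """Apply the steps in the given order.

    Returns (final_foreground_pgrp, stty_can_run). Assumption: stty can
    run only if its parent, the child bash, ended up in the foreground.
    """
    foreground = PARENT_BASH  # assumed starting state
    for step in order:
        foreground = STEPS[step]
    return foreground, foreground == CHILD_BASH


if __name__ == "__main__":
    good = ["parent_gives_terminal_to_su", "child_bash_takes_terminal"]
    bad = ["child_bash_takes_terminal", "parent_gives_terminal_to_su"]
    print("no bug:", simulate(good))
    print("bug:   ", simulate(bad))
```

The only difference between the two runs is which call lands last,
which is why the hang is intermittent.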
I have not looked further to find a solution (but it's a great start to
know the cause...!). Any further help is welcome.
> The 6 year old inspiron 4000 gets stuck at stty erase ^? . Randomly, but
> most of the time.
>
> All of my machines exhibit the ctrl-C being slower than ctrl-Z discussed
> elsewhere (I've almost developed a habit of typing ctrl-Z kill %1 <RET>).
> Although even ctrl-Z recently has been reluctant to always work. I wonder
> if this is the cause of dpkg recently not responding to ctrl-Z's? (debian
> bug #486222). dpkg does respond to kill -STOP
I doubt that this is related. See the following thread for more info on
this:
http://marc.info/?l=linux-kernel&m=121528829718840&w=2
> ctrl-s doesn't always work anymore. Again, what prompted me to write this
> email, was I couldn't pause dpkg. It's particularly unreliable at
> stopping scrolling messages at bootup, and if I press it at the wrong time
> at bootup (not a specific place - it can be starting up any number of
> scripts), something deadlocks and won't resume upon a ctrl-q.
> alt-sysrq-k is enough to kill whatever has deadlocked. I have a feeling,
> but don't want to test on this system right now, that pressing scroll-lock
> as opposed to ctrl-q once unlocked such a stuck display.
Hmm, not sure; I have not seen that behavior.
> In summary, something in tty is certainly screwed. Does anyone see a
> connection between all of these?
I doubt there is a connection between the bash issue and what you are
seeing with ctrl-C/ctrl-S, etc.
-Joe