Date:	Sun, 06 Jul 2008 12:49:16 -0600
From:	Joe Peterson <joe@...rush.com>
To:	Tim Connors <tim.w.connors@...il.com>
CC:	Vegard Nossum <vegard.nossum@...il.com>,
	Alan Cox <alan@...hat.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	David Newall <davidn@...idnewall.com>,
	Willy Tarreau <w@....eu>,
	Harald Dunkel <harald.dunkel@...nline.de>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: tty session leader issue [cause now known!] (was Re: 2.6.25.3:
 su gets stuck for root)

Tim Connors wrote:
> On Wed, 2 Jul 2008, Joe Peterson wrote:
> 
>> I have done some more investigation on this problem, and I am posting
>> here my results in hope that someone can point me in the right direction
>> for further investigation...
>>
>> Summary: during the initialization of a new bash shell, the terminal
>> foreground process group often reverts back to that of the parent of the
>> bash shell (after being set *to* the bash shell pgrp by bash),
>> prohibiting commands like stty from being run by the init scripts.  The
>> result is that the execution of these commands will hang until killed,
>> causing the bash prompt to not appear.  Adding a delay in the script
>> (using sleep) increases the chance of this having time to happen.

I have done more investigation, and I now know the cause of the
bash/stty problem.  It appears to be a race condition in bash (well,
between two different bash shells, actually).  I saw a post from a while
back about something similar by Ingo Molnar, so I have copied him here too.

Here is the ps tree of the test case where stty has hung:

 4704 ?        S      0:00  \_ xterm
 4706 pts/3    Ss     0:00  |   \_ -bash
 4739 pts/3    S      0:00  |       \_ su
 4742 pts/3    S      0:00  |           \_ bash
 4746 pts/3    S+     0:00  |               \_ su foo
 4747 pts/3    S      0:00  |                   \_ bash
 4752 pts/3    T      0:00  |                       \_ stty -ixany
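
As a rough aid to reading this tree, here is a small Python sketch that picks out the interesting STAT flags: "s" marks a session leader, "+" marks membership in the terminal's foreground process group, and "T" marks a stopped process. The function name parse_ps_tree and the PS_TREE constant are illustrative only; the data is simply the listing above pasted in.

```python
# Listing copied from the ps tree above (raw string keeps the "\_" markers).
PS_TREE = r"""
 4704 ?        S      0:00  \_ xterm
 4706 pts/3    Ss     0:00  |   \_ -bash
 4739 pts/3    S      0:00  |       \_ su
 4742 pts/3    S      0:00  |           \_ bash
 4746 pts/3    S+     0:00  |               \_ su foo
 4747 pts/3    S      0:00  |                   \_ bash
 4752 pts/3    T      0:00  |                       \_ stty -ixany
"""


def parse_ps_tree(tree):
    """Return a list of (pid, stat, command) tuples from a ps forest listing."""
    rows = []
    for line in tree.splitlines():
        if not line.strip():
            continue
        pid, _tty, stat, _time, rest = line.split(None, 4)
        command = rest.split("\\_ ", 1)[-1]  # drop the tree-drawing prefix
        rows.append((int(pid), stat, command))
    return rows


rows = parse_ps_tree(PS_TREE)
session_leaders = [pid for pid, stat, _ in rows if "s" in stat]
foreground = [pid for pid, stat, _ in rows if "+" in stat]
stopped = [pid for pid, stat, _ in rows if "T" in stat]

print("session leaders:", session_leaders)
print("foreground group members:", foreground)
print("stopped:", stopped)
```

Run on the listing above, this reports 4706 as the only session leader, 4746 ("su foo") as the only foreground process, and 4752 (stty) as stopped, which matches the hang described here.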

What should happen is: when "su foo" (4746) is run, it spawns a bash
shell (4747) that then makes itself the tty leader when it initializes
its job control (more precisely, it makes its own process group the
terminal's foreground process group).  The stty command (in the child
bash's .bashrc) will then be able to work (and not hang).
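
One way to observe this from inside a process is to compare its own process group with the terminal's current foreground process group. The following is a minimal Python sketch, not something bash itself does; it relies only on os.getpgrp() and os.tcgetpgrp(), which wrap the POSIX calls of the same names. The helper names tty_foreground_status and in_background are made up for illustration.

```python
import os


def tty_foreground_status(fd=0):
    """Return (own_pgrp, tty_foreground_pgrp) for the terminal on fd.

    Returns None when fd is not usable as this process's controlling
    terminal (for example, when stdin is a pipe or is closed).
    """
    try:
        return os.getpgrp(), os.tcgetpgrp(fd)
    except OSError:
        return None


def in_background(fd=0):
    """True if we are attached to a terminal but not in its foreground group."""
    status = tty_foreground_status(fd)
    if status is None:
        return False
    own_pgrp, fg_pgrp = status
    return own_pgrp != fg_pgrp


if __name__ == "__main__":
    print("status:", tty_foreground_status())
    print("in background:", in_background())
```

A process for which in_background() is true is in the same position as the stty above: an attempt to change terminal settings would normally stop it rather than succeed.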

However, the hang happens when the parent bash (4742) interferes by
setting the terminal's foreground process group back to that of its
child (the "su foo" process: 4746) shortly after the child bash (4747)
has claimed it.  The parent does this when it calls
execute_command_internal()->stop_pipeline()->give_terminal_to().  The
timing of this call relative to the child's initialization seems to
vary, making the issue intermittent - it depends on which one wins the
race.

In summary, when the bug does *not* occur, here is the approximate
sequence:

1) parent bash (4742) runs 'su foo' (4746)
2) parent bash sets tty leader to 'su' (4746)
3) child bash (4747) initializes and sets itself to be the leader
4) stty command in .bashrc runs successfully

When the bug occurs, here is the sequence:

1) parent bash (4742) runs 'su foo' (4746)
2) child bash (4747) initializes and sets itself to be the leader
3) parent bash sets tty leader *back* to 'su' (4746)
4) stty command runs and hangs (it is stopped: state T in the ps output
   above) because its parent is no longer the leader

The various calls to tcsetpgrp() that do this are interleaved from the
two bash processes, and sometimes the parent does it slightly *after*
the child bash initializes job control - that's when the problem happens.
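
The two orderings can be replayed in a toy Python model. This is only a sketch of the description above: the terminal's foreground process group is modeled as a single value that each tcsetpgrp()-style call overwrites, and stty is assumed to succeed only if the child bash's group owns the terminal when it runs. The PIDs come from the ps tree; the names GOOD_ORDER, BAD_ORDER, final_foreground and stty_outcome are hypothetical.

```python
SU_FOO_PGRP = 4746      # group the parent bash hands the terminal to
CHILD_BASH_PGRP = 4747  # group the child bash claims the terminal for

# Each event is (caller, pgrp passed to the tcsetpgrp()-style call).
GOOD_ORDER = [
    ("parent bash 4742", SU_FOO_PGRP),
    ("child bash 4747", CHILD_BASH_PGRP),
]
BAD_ORDER = [
    ("child bash 4747", CHILD_BASH_PGRP),
    ("parent bash 4742", SU_FOO_PGRP),
]


def final_foreground(events):
    """Replay the calls in order; the last writer wins."""
    foreground = None
    for _caller, pgrp in events:
        foreground = pgrp
    return foreground


def stty_outcome(events):
    """Model assumption: stty works only if the child bash owns the tty."""
    if final_foreground(events) == CHILD_BASH_PGRP:
        return "runs"
    return "stopped"


print("parent first:", stty_outcome(GOOD_ORDER))
print("child first: ", stty_outcome(BAD_ORDER))
```

The model ignores everything except ordering, which is the point: the same two calls give opposite results depending on which process makes its call last.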

I have not looked further to find a solution (but it's a great start to
know the cause...!).  Any further help is welcome.

> The 6 year old inspiron 4000 gets stuck at stty erase ^? .  Randomly, but
> most of the time.
> 
> All of my machines exhibit the ctrl-C being slower than ctrl-Z discussed
> elsewhere (I've almost developed a habit of typing ctrl-Z kill %1 <RET>).
> Although even ctrl-Z recently has been reluctant to always work.  I wonder
> if this is the cause of dpkg recently not responding to ctrl-Z's? (debian
> bug #486222).  dpkg does respond to kill -STOP

I doubt that this is related.  See the following thread for more info on
this:

	http://marc.info/?l=linux-kernel&m=121528829718840&w=2

> ctrl-s doesn't always work anymore.  Again, what prompted me to write this
> email, was I couldn't pause dpkg.  It's particularly unreliable at
> stopping scrolling messages at bootup, and if I press it at the wrong time
> at bootup (not a specific place - it can be starting up any number of
> scripts), something deadlocks and won't resume upon a ctrl-q.
> alt-sysrq-k is enough to kill whatever has deadlocked.  I have a feeling,
> but don't want to test on this system right now, that pressing scroll-lock
> as opposed to ctrl-q once unlocked such a stuck display.

Hmm, not sure; I have not seen that behavior.

> In summary, something in tty is certainly screwed.  Does anyone see a
> connection between all of these?

I doubt there is a connection between the bash issue and what you are
seeing with ctrl-C/ctrl-S, etc.

					-Joe