Date: Tue, 25 Jul 2006 10:57:44 +0200
From: Jens Axboe <axboe@...e.de>
To: gmu 2k6 <gmu2006@...il.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Re: i686 hang on boot in userspace

On Tue, Jul 25 2006, gmu 2k6 wrote:
> On 7/25/06, gmu 2k6 <gmu2006@...il.com> wrote:
> >On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> >> On Tue, Jul 25 2006, gmu 2k6 wrote:
> >> > On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> >> > >On Tue, Jul 25 2006, gmu 2k6 wrote:
> >> > >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> >> > >> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> >> > >> >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> >> > >> >> >On Mon, Jul 24 2006, gmu 2k6 wrote:
> >> > >> >> >> the problem I have with hangs is related to changes in CFQ
> >> > >> >> >> and that CFQ is now the default. 2.6.17-git12 had the
> >> > >> >> >> problem but booting it with elevator=deadline fixes the hang.
> >> > >> >> >>
> >> > >> >> >> symptoms encountered during git-bisecting between v2.6.17 and
> >> > >> >> >> v2.6.18-rc1:
> >> > >> >> >> A hang while starting network services
> >> > >> >> >> B hang while trying to login
> >> > >> >> >>  1 on remote console [not SSH] it hang after typing <uid><CR>
> >> > >> >> >>  1 via OpenSSH it hang after typing <pwd><CR> when doing
> >> > >> >> >>    slogin root@<IP>
> >> > >> >> >>
> >> > >> >> >> A is the problem I got in the first place and this seems to
> >> > >> >> >> be the case since 2.6.17-git11 definitely although git-bisect
> >> > >> >> >> pointed me at the following
> >> > >> >> >> changeset which is included since 2.6.17-git12:
> >> > >> >> >>
> >> > >> >> >> caaa5f9f0a75d1dc5e812e69afdbb8720e077fd3
> >> > >> >> >> by Jens Axboe
> >> > >> >> >> titled "[PATCH] cfq-iosched: many performance fixes"
> >> > >> >> >>
> >> > >> >> >> strange enough it also hangs with 2.6.17-git11 which did not
> >> > >> >> >> include that one changeset yet.
> >> > >> >> >
> >> > >> >> >So perhaps your bisect isn't 100% trust worthy? Can you do a
> >> > >> >> >manual -gitX bisect to see which 2.6.17-gitX introduced the
> >> > >> >> >problem?
> >> > >> >> >
> >> > >> >> >Also please put a serial console or similar on the machine, so
> >> > >> >> >you can log + store the sysrq+t output.
> >> > >> >>
> >> > >> >> well I didn't say that caa....fd3 is the exact change which
> >> > >> >> broke it, just that it's related to 1) CFQ changes and 2) CFQ
> >> > >> >> being the default now.
> >> > >> >> I have a Remote Serial Console via HP's integrated Lights-Out
> >> > >> >> Java Applet but am not sure how to enable serial console via
> >> > >> >> kernel boot params (will try to find out).
> >> > >> >> I will first try to find the 2.6.17-git* revision working before
> >> > >> >> bisecting it against -git11 or git12.
> >> > >> >
> >> > >> >Thanks, would be much appreciated to try and narrow it down to a
> >> > >> >specific fix.
> >> > >> >
> >> > >> >Are you seeing the hang on cciss?
> >> > >>
> >> > >> I'm not sure it is in the cciss driver, but the SmartArray is
> >> > >> driven by cciss.
> >> > >> starting git<11 boot tests in a minute now.
> >> > >
> >> > >Ok, thanks for confirming it's cciss. The bug is likely an interaction
> >> > >between cciss and cfq I think, so it would be very useful if you can
> >> > >pin point which of the cfq patches make it stall.
> >> >
> >> > is there anything special about cciss or did you just deduce that it
> >> > must be cciss in that particular box and are suspecting interaction
> >> > problems with that driver and your CFQ changes?
> >>
> >> Nothing really special about cciss, but a few months ago I had a similar
> >> discussion about cciss and a strange hang.
> >>
> >> If possible, please also try a known bad kernel and apply the below
> >> patch and see if it still reproduces:
> >>
> >> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> >> index 1c4df22..2b36e7a 100644
> >> --- a/drivers/block/cciss.c
> >> +++ b/drivers/block/cciss.c
> >> @@ -2362,7 +2362,11 @@ static inline void complete_command(ctlr
> >>  	cmd->rq->completion_data = cmd;
> >>  	cmd->rq->errors = status;
> >>  	blk_add_trace_rq(cmd->rq->q, cmd->rq, BLK_TA_COMPLETE);
> >> +#if 1
> >> +	cciss_softirq_done(cmd->rq);
> >> +#else
> >>  	blk_complete_request(cmd->rq);
> >> +#endif
> >>  }
> >>
> >>  /*
> >
> >manually nailed it down to 2.6.17-git7 being the first broken revision.
> >going to try whether Linus' git tree knows the -git revisions and do a
> >bisect
> >otherwise interdiff and looking for CFQ or cciss changes as best I can.
>
> oops, doing git-status while running 2.6.17-git6 seems to have locked the
> box again :D, ping works though. *sigh*. Jens I will try your cciss.c
> change now.

I guess that's a good thing, if it was git7 that introduced it, then
things are looking fishy.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/