Message-ID: <f96157c40607250235t4cdd76ffxfd6f95389d2ddbdc@mail.gmail.com>
Date: Tue, 25 Jul 2006 09:35:23 +0000
From: "gmu 2k6" <gmu2006@...il.com>
To: "Jens Axboe" <axboe@...e.de>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Re: i686 hang on boot in userspace
On 7/25/06, gmu 2k6 <gmu2006@...il.com> wrote:
> On 7/25/06, gmu 2k6 <gmu2006@...il.com> wrote:
> > On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > > On Tue, Jul 25 2006, gmu 2k6 wrote:
> > > > On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > > > >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > > > >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > > > >> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > > > >> >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > > > >> >> >On Mon, Jul 24 2006, gmu 2k6 wrote:
> > > > >> >> >> the problem I have with hangs is related to changes in CFQ and that
> > > > >> >> >> CFQ is now the default. 2.6.17-git12 had the problem but booting
> > > > >> >> >> it with elevator=deadline fixes the hang.
> > > > >> >> >>
> > > > >> >> >> symptoms encountered during git-bisecting between v2.6.17 and
> > > > >> >> >> v2.6.18-rc1:
> > > > >> >> >> A hang while starting network services
> > > > >> >> >> B hang while trying to login
> > > > >> >> >> 1 on remote console [not SSH] it hang after typing <uid><CR>
> > > > >> >> >> 1 via OpenSSH it hang after typing <pwd><CR> when doing slogin
> > > > >> >> >> root@<IP>
> > > > >> >> >>
> > > > >> >> >> A is the problem I got in the first place and this seems to be the
> > > > >> >> >> case since 2.6.17-git11 definitely although git-bisect pointed me at
> > > > >> >> >> the following changeset which is included since 2.6.17-git12:
> > > > >> >> >>
> > > > >> >> >> caaa5f9f0a75d1dc5e812e69afdbb8720e077fd3
> > > > >> >> >> by Jens Axboe
> > > > >> >> >> titled "[PATCH] cfq-iosched: many performance fixes"
> > > > >> >> >>
> > > > >> >> >> strange enough it also hangs with 2.6.17-git11 which did not include
> > > > >> >> >> that one changeset yet.
> > > > >> >> >
> > > > >> >> >So perhaps your bisect isn't 100% trust worthy? Can you do a manual
> > > > >> >> >-gitX bisect to see which 2.6.17-gitX introduced the problem?
> > > > >> >> >
> > > > >> >> >Also please put a serial console or similar on the machine, so you can
> > > > >> >> >log + store the sysrq+t output.
> > > > >> >>
> > > > >> >> well I didn't say that caa....fd3 is the exact change which broke it,
> > > > >> >> just that it's related to 1) CFQ changes and 2) CFQ being the default
> > > > >> >> now.
> > > > >> >> I have a Remote Serial Console via HP's integrated Lights-Out Java
> > > > >> >> Applet but am not sure how to enable serial console via kernel boot
> > > > >> >> params (will try to find out).
> > > > >> >> I will first try to find the 2.6.17-git* revision working before
> > > > >> >> bisecting it against -git11 or git12.
> > > > >> >
> > > > >> >Thanks, would be much appreciated to try and narrow it down to a
> > > > >> >specific fix.
> > > > >> >
> > > > >> >Are you seeing the hang on cciss?
> > > > >>
> > > > >> I'm not sure it is in the cciss driver, but the SmartArray is driven by
> > > > >> cciss.
> > > > >> starting git<11 boot tests in a minute now.
> > > > >
> > > > >Ok, thanks for confirming it's cciss. The bug is likely an interaction
> > > > >between cciss and cfq I think, so it would be very useful if you can pin
> > > > >point which of the cfq patches make it stall.
> > > >
> > > > is there anything special about cciss or did you just deduce that it
> > > > must be cciss in that particular box and are suspecting interaction
> > > > problems with that driver and your CFQ changes?
> > >
> > > Nothing really special about cciss, but a few months ago I had a similar
> > > discussion about cciss and a strange hang.
> > >
> > > If possible, please also try a known bad kernel and apply the below
> > > patch and see if it still reproduces:
> > >
> > > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> > > index 1c4df22..2b36e7a 100644
> > > --- a/drivers/block/cciss.c
> > > +++ b/drivers/block/cciss.c
> > > @@ -2362,7 +2362,11 @@ static inline void complete_command(ctlr
> > > >  	cmd->rq->completion_data = cmd;
> > > >  	cmd->rq->errors = status;
> > > >  	blk_add_trace_rq(cmd->rq->q, cmd->rq, BLK_TA_COMPLETE);
> > > > +#if 1
> > > > +	cciss_softirq_done(cmd->rq);
> > > > +#else
> > > >  	blk_complete_request(cmd->rq);
> > > > +#endif
> > > >  }
> > >
> > > /*
> >
> > manually nailed it down to 2.6.17-git7 being the first broken revision.
> > going to try whether Linus' git tree knows the -git revisions and do a bisect
> > otherwise interdiff and looking for CFQ or cciss changes as best I can.
>
> oops, doing git-status while running 2.6.17-git6 seems to have locked the box
> again :D, ping works though. *sigh*. Jens I will try your cciss.c change now.
ok, let's nail it down to 2.6.17-git5 as the last working revision instead:
it survived git status, unlike -git6, which seems to have booted correctly
only by accident last time. timing issues, I guess.
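
Since the hang looks timing-dependent, a single clean boot does not prove a snapshot is good. Below is a rough sketch of how the manual -gitX search could be repeated with several trials per snapshot. Everything in it is an assumption for illustration: `first_bad`, `make_flaky_hang` and the `trials` count are hypothetical names, and the simulated hang pattern is made up, not measured on the affected box.

```python
from typing import Callable, Dict, Sequence


def first_bad(snapshots: Sequence[str],
              hangs: Callable[[str], bool],
              trials: int = 3) -> str:
    """Binary-search for the first snapshot that hangs.

    Assumes snapshots[0] is known good and snapshots[-1] is known bad.
    A snapshot counts as bad as soon as one trial hangs; it counts as
    good only after `trials` clean runs in a row.
    """
    def is_bad(name: str) -> bool:
        # any() stops at the first observed hang
        return any(hangs(name) for _ in range(trials))

    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid
        else:
            lo = mid
    return snapshots[hi]


def make_flaky_hang(snapshots: Sequence[str],
                    first_bad_index: int) -> Callable[[str], bool]:
    """Hypothetical stand-in for "boot the kernel and exercise the disk".

    Broken snapshots survive their first attempt by accident and hang
    on every later attempt; good snapshots never hang.
    """
    attempts: Dict[str, int] = {}

    def hangs(name: str) -> bool:
        attempts[name] = attempts.get(name, 0) + 1
        broken = snapshots.index(name) >= first_bad_index
        return broken and attempts[name] > 1

    return hangs
```

In this made-up simulation, a single trial per snapshot lets every broken kernel pass once, so the search drifts to the newest snapshot, while a few trials per snapshot find the simulated breakage. On real hardware `hangs` would be a manual boot plus some I/O load such as git status, and the right number of trials is a guess.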