lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f96157c40607250319w27de6956xf20d68a415503f4a@mail.gmail.com>
Date:	Tue, 25 Jul 2006 10:19:56 +0000
From:	"gmu 2k6" <gmu2006@...il.com>
To:	"Jens Axboe" <axboe@...e.de>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Re: i686 hang on boot in userspace

On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> On Tue, Jul 25 2006, gmu 2k6 wrote:
> > On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > >> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > >> >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > >> >> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > >> >> >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > >> >> >> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> > >> >> >> >> On 7/25/06, Jens Axboe <axboe@...e.de> wrote:
> > >> >> >> >> >On Mon, Jul 24 2006, gmu 2k6 wrote:
> > >> >> >> >> >> the problem I have with hangs is related to changes in CFQ
> > >and
> > >> >that
> > >> >> >> >> >> CFQ is now the default. 2.6.17-git12 had the problem but
> > >booting
> > >> >> >> >> >> it with elevator=deadline fixes the hang.
> > >> >> >> >> >>
> > >> >> >> >> >> symptoms encountered during git-bisecting between v2.6.17 and
> > >> >> >> >> >v2.6.18-rc1:
> > >> >> >> >> >> A hang while starting network services
> > >> >> >> >> >> B hang while trying to login
> > >> >> >> >> >>   1 on remote console [not SSH] it hang after typing
> > ><uid><CR>
> > >> >> >> >> >>   1 via OpenSSH it hang after typing <pwd><CR> when doing
> > >slogin
> > >> >> >> >> >root@<IP>
> > >> >> >> >> >>
> > >> >> >> >> >> A is the problem I got in the first place and this seems to
> > >be
> > >> >the
> > >> >> >> >> >> case since 2.6.17-git11 definitely although git-bisect
> > >pointed
> > >> >me
> > >> >> >at
> > >> >> >> >> >> the following
> > >> >> >> >> >> changeset which is included since 2.6.17-git12:
> > >> >> >> >> >>
> > >> >> >> >> >> caaa5f9f0a75d1dc5e812e69afdbb8720e077fd3
> > >> >> >> >> >> by Jens Axboe
> > >> >> >> >> >> titled "[PATCH] cfq-iosched: many performance fixes"
> > >> >> >> >> >>
> > >> >> >> >> >> strange enough it also hangs with 2.6.17-git11 which did not
> > >> >> >include
> > >> >> >> >that
> > >> >> >> >> >> one changeset yet.
> > >> >> >> >> >
> > >> >> >> >> >So perhaps your bisect isn't 100% trust worthy? Can you do a
> > >> >manual
> > >> >> >> >> >-gitX bisect to see which 2.6.17-gitX introduced the problem?
> > >> >> >> >> >
> > >> >> >> >> >Also please put a serial console or similar on the machine, so
> > >you
> > >> >> >can
> > >> >> >> >> >log + store the sysrq+t output.
> > >> >> >> >>
> > >> >> >> >> well I didn't say that caa....fd3 is the exact change which
> > >broke
> > >> >it,
> > >> >> >> >> just that it's related to 1) CFQ changes and 2) CFQ being the
> > >> >default
> > >> >> >> >> now.
> > >> >> >> >> I have a Remote Serial Console via HP's integrated Lights-Out
> > >Java
> > >> >> >> >> Applet but am not sure how to enable serial console via kernel
> > >boot
> > >> >> >> >> params (will try to find out).
> > >> >> >> >> I will first try to find the 2.6.17-git* revision working before
> > >> >> >> >> bisecting it against -git11 or git12.
> > >> >> >> >
> > >> >> >> >Thanks, would be much appreciated to try and narrow it down to a
> > >> >> >> >specific fix.
> > >> >> >> >
> > >> >> >> >Are you seeing the hang on cciss?
> > >> >> >>
> > >> >> >> I'm not sure it is in the cciss driver, but the SmartArray is
> > >driven
> > >> >by
> > >> >> >> cciss.
> > >> >> >> starting git<11 boot tests in a minute now.
> > >> >> >
> > >> >> >Ok, thanks for confirming it's cciss. The bug is likely an
> > >interaction
> > >> >> >between cciss and cfq I think, so it would be very useful if you can
> > >pin
> > >> >> >point which of the cfq patches make it stall.
> > >> >>
> > >> >> is there anything special about cciss or did you just deduce that it
> > >> >> must be cciss in that particular box and are suspecting interaction
> > >> >> problems with that driver and your CFQ changes?
> > >> >
> > >> >Nothing really special about cciss, but a few months ago I had a similar
> > >> >discussion about cciss and a strange hang.
> > >> >
> > >> >If possible, please also try a known bad kernel and apply the below
> > >> >patch and see if it still reproduces:
> > >> >
> > >> >diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> > >> >index 1c4df22..2b36e7a 100644
> > >> >--- a/drivers/block/cciss.c
> > >> >+++ b/drivers/block/cciss.c
> > >> >@@ -2362,7 +2362,11 @@ static inline void complete_command(ctlr
> > >> >        cmd->rq->completion_data = cmd;
> > >> >        cmd->rq->errors = status;
> > >> >        blk_add_trace_rq(cmd->rq->q, cmd->rq, BLK_TA_COMPLETE);
> > >> >+#if 1
> > >> >+       cciss_softirq_done(cmd->rq);
> > >> >+#else
> > >> >        blk_complete_request(cmd->rq);
> > >> >+#endif
> > >> > }
> > >> >
> > >> > /*
> > >>
> > >> manually nailed it down to 2.6.17-git7 being the first broken revision.
> > >> going to try whether Linus' git tree knows the -git revisions and do a
> > >> bisect
> > >> otherwise interdiff and looking for CFQ or cciss changes as best I can.
> > >
> > >Hmm, there are no cfq/cciss changes between git6 and git7. Some SCSI
> > >changes, though. Are you using SCSI for anything?
> >
> > I thought cciss (SmartArray) was SCSI, isn't it? I guess you mean "no
> > James Bottomley changes in the SCSI layer".
>
> Nope, cciss doesn't interact with the SCSI layer except for tapes.
>
> > >We really need that sysrq-t dump.
> >
> > I'm not able to get the virtual remote serial console working, so I
> > will try to go down to the datacenter and do "1) SysRq-t 2) SysRq-S 3)
> > reboot with livecd and get the content of the synced kern.log which
> > should contain the SysRq-t output (hopefully).
>
> Ok, hope it works out :)

I'm going downstairs now.

> You can also use netconsole, that might be a lot easier for you. That
> just requires networking and et netcat at the other end.

any howto?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ