Message-Id: <200801220925.50314.chris.mason@oracle.com>
Date: Tue, 22 Jan 2008 09:25:49 -0500
From: Chris Mason <chris.mason@...cle.com>
To: Al Boldi <a1426z@...ab.com>
Cc: Ingo Molnar <mingo@...e.hu>,
Oliver Pinter (Pintér Olivér)
<oliver.pntr@...il.com>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: konqueror deadlocks on 2.6.22
On Tuesday 22 January 2008, Al Boldi wrote:
> Ingo Molnar wrote:
> > * Oliver Pinter (Pintér Olivér) <oliver.pntr@...il.com> wrote:
> > > and then please update to CFS-v24.1
> > > http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch
> > >
> > > > Yes with CFSv20.4, as in the log.
> > > >
> > > > It also hangs on 2.6.23.13
> >
> > my feeling is that this is some sort of timing dependent race in
> > konqueror/kde/qt that is exposed when a different scheduler is put in.
> >
> > If it disappears with CFS-v24.1 it is probably just because the timings
> > will change again. Would be nice to debug this on the konqueror side and
> > analyze why it fails and how. You can probably tune the timings by
> > enabling SCHED_DEBUG and tweaking /proc/sys/kernel/*sched* values - in
> > particular sched_latency and the granularity settings. Setting wakeup
> > granularity to 0 might be one of the things that could make a
> > difference.
>
> Thanks Ingo, but Mike suggested that data=writeback may make a difference,
> which it does indeed.
>
> So the bug seems to be related to data=ordered, although I haven't gotten
> any feedback from the ext3 gurus yet.
>
> Seems rather critical though, as data=writeback is a dangerous mode to run.
Running fsync in data=ordered means that all of the dirty blocks on the FS
will get written before fsync returns. Your original stack trace shows
everyone either performing writeback for a log commit or waiting for the log
commit to return.
The key task in your trace is kjournald, stuck in get_request_wait. It could
be a block layer bug, not giving him requests quickly enough, or it could be
the scheduler not giving him back the cpu fast enough.
At any rate, that's where to concentrate the debugging. You should be able to
simulate this by running a few instances of the below loop and looking for
stalls:
while true; do
    time dd if=/dev/zero of=foo bs=50M count=4 oflag=sync
done
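
One possible way to run several instances at once and flag the slow passes
is sketched here. It assumes GNU dd (for oflag=sync) and a date that
supports +%s. The function names, the foo.N file names and the stall
threshold are made up for illustration and are not part of any existing
tool:

```shell
#!/bin/sh
# Sketch only: all function names, arguments and the threshold are assumptions.

# Do one synchronous write and print how long it took, in whole seconds.
# $1 = output file, $2 = dd block size, $3 = dd block count
timed_sync_write() {
    start=$(date +%s)
    dd if=/dev/zero of="$1" bs="$2" count="$3" oflag=sync 2>/dev/null || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Repeat the write and report every pass that takes at least the threshold.
# $1 = worker id, $2 = passes, $3 = threshold in seconds, $4 = bs, $5 = count
stall_worker() {
    n=0
    while [ "$n" -lt "$2" ]; do
        secs=$(timed_sync_write "foo.$1" "$4" "$5") || return 1
        if [ "$secs" -ge "$3" ]; then
            echo "worker $1 pass $n: ${secs}s (possible stall)"
        fi
        n=$((n + 1))
    done
}

# Start several workers in the background and wait for all of them.
# $1 = number of workers; the remaining arguments go to stall_worker
run_stall_test() {
    workers=$1
    shift
    w=0
    while [ "$w" -lt "$workers" ]; do
        stall_worker "$w" "$@" &
        w=$((w + 1))
    done
    wait
}
```

For example, "run_stall_test 3 20 10 50M 4" would start three workers that
each write 200MB twenty times and print any pass that takes 10 seconds or
longer. The 10 second cutoff is arbitrary; pick whatever counts as a stall
on the hardware being tested.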