Message-ID: <17923.8970.917375.917772@notabene.brown>
Date: Fri, 23 Mar 2007 11:44:58 +1100
From: Neil Brown <neilb@...e.de>
To: "Dan Williams" <dan.j.williams@...el.com>
Cc: "Jens Axboe" <jens.axboe@...cle.com>, linux@...izon.com,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
linux-kernel@...e.us, cebbert@...hat.com
Subject: Re: 2.6.20.3 AMD64 oops in CFQ code
On Thursday March 22, dan.j.williams@...el.com wrote:
>
> Not a cfq failure, but I have been able to reproduce a different oops
> at array stop time while i/o's were pending. I have not dug into it
> enough to suggest a patch, but I wonder if it is somehow related to
> the cfq failure since it involves congestion and drives going away:
Thanks. I know about that one and have a patch about to be posted
which should fix it. But I don't completely understand it.
When a raid5 array shuts down, it clears mddev->private, but doesn't
clear q->backing_dev_info.congested_fn. So if someone tries to call
that congested_fn, it will try to dereference mddev->private and Oops.
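To make the failure mode concrete, here is a toy userspace model of
the structures involved (not the real md/raid5 code - the kernel
types are mddev_t, raid5_conf_t and struct backing_dev_info, and all
the toy_* names below are made up):

/* Toy model: the queue holds a congestion callback plus an opaque
 * data pointer; stopping the "array" clears the private data but
 * leaves the callback installed.
 */
#include <stddef.h>

struct toy_bdi {
	int (*congested_fn)(void *data, int bits);
	void *congested_data;
};

struct toy_mddev {			/* stands in for mddev_t */
	void *private;			/* points at the raid5 conf */
	struct toy_bdi bdi;
};

struct toy_conf {			/* stands in for raid5_conf_t */
	int inactive_blocked;
};

static int toy_raid5_congested(void *data, int bits)
{
	struct toy_mddev *mddev = data;
	struct toy_conf *conf = mddev->private;

	(void)bits;
	return conf->inactive_blocked;	/* NULL dereference after stop */
}

int main(void)
{
	struct toy_conf conf = { 0 };
	struct toy_mddev mddev = { .private = &conf };

	mddev.bdi.congested_fn = toy_raid5_congested;
	mddev.bdi.congested_data = &mddev;

	mddev.private = NULL;	/* what the raid5 stop path does ... */

	/* ... and what a late caller then does: */
	return mddev.bdi.congested_fn(mddev.bdi.congested_data, 0);
}

Run it and it segfaults in toy_raid5_congested - the userspace
analogue of the Oops above.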
Except that by the time raid5 is shutting down, no-one should have a
reference to the device any more, so no-one should be in a position
to call congested_fn !!
Maybe pdflush is just trying to sync the block device, even though
there is no dirty data .... dunno....
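The shape of the fix (just a sketch of the idea, not the actual patch
I'll be posting) is to tear the callback down before the data it
depends on, somewhere in the stop path:

/* Sketch: detach the congestion callback before clearing the
 * private data that it dereferences.
 */
q->backing_dev_info.congested_fn = NULL;
q->backing_dev_info.congested_data = NULL;
mddev->private = NULL;

If I remember correctly, bdi_congested() already checks for a NULL
congested_fn before calling through it, so a late caller then falls
through harmlessly.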
But I don't think it is related to the cfq problem as this one is only
a problem when the array is being stopped.
Thanks,
NeilBrown