Message-ID: <20080414174540.GO16584@beardog.cca.cpqcorp.net>
Date: Mon, 14 Apr 2008 12:45:40 -0500
From: scameron@...rdog.cca.cpqcorp.net
To: Jens Axboe <jens.axboe@...cle.com>
Cc: linux-kernel@...r.kernel.org, mike.miller@...com,
mikem@...rdog.cca.cpqcorp.net
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

On Mon, Apr 14, 2008 at 07:37:20PM +0200, Jens Axboe wrote:
> On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> >
> >
> > > On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> > > >
> > > >
> > > > Fix race condition between cciss_init_one(), cciss_update_drive_info(),
> > > > and cciss_check_queues(). cciss_softirq_done would try to start
> > > > queues which were not quite ready to be started, as its checks for
> > > > readiness were not sufficiently synchronized with the queue initializing
> > > > code in cciss_init_one and cciss_update_drive_info. Slow cpu and
> > > > large numbers of logical drives seem to make the race more likely
> > > > to cause a problem.
> > >
> > > Hmm, this seems backwards to me. cciss_softirq_done() isn't going to
> > > start the queues, until an irq has triggered for instance. Why isn't the
> > > init properly ordered instead of band-aiding around this with a
> > > 'queue_ready' variable?
> > >
> >
> > Each call to add_disk() will trigger some interrupts,
> > and earlier added disks may cause the queues of later,
> > not-yet-completely added disks to be started.
> >
> > I suppose the init routine might be reorganized to initialize all
> > the queues, then have second loop call add_disk() for all
> > of them. Is that what you had in mind by "properly ordered?"
>
> Yep precisely, don't call add_disk() until everything is set up.
>
> > Disks may be added at run time though, and I think this tears
> > down all but the first disk, and re-adds them all, if I remember
> > right, so there is some complication there to think about.
>
> Well, other drivers manage quite fine without resorting to work-arounds
> :-)

Ok. Thanks for the constructive criticism. I'll rethink it.

Fortunately (or unfortunately), the race is apparently pretty hard
to trigger. It's been in there for ages, and we've only just seen it
manifest as a problem recently, and only in one particular configuration.

-- steve
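
The ordering discussed above (set up every queue first, then call
add_disk() in a second loop) can be illustrated with a small userspace
toy model. This is only a sketch, not the cciss code: every toy_* name
is hypothetical, and it assumes that registering a disk triggers a
completion that walks all drive slots on the controller, which is a
simplification of what the thread describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DISKS 4

/* Hypothetical stand-in for per-logical-drive state. */
struct toy_disk {
    bool queue_initialized;  /* request queue fully set up */
    bool registered;         /* toy_add_disk() has been called */
};

/* Hypothetical stand-in for per-controller state. */
struct toy_ctlr {
    struct toy_disk disk[MAX_DISKS];
    int bad_starts;  /* attempts to start a queue that was not ready */
};

/* Models the completion path: it walks every drive slot and counts
 * each attempt to start a queue that has not been initialized yet. */
static void toy_completion(struct toy_ctlr *h)
{
    for (size_t i = 0; i < MAX_DISKS; i++) {
        if (!h->disk[i].queue_initialized)
            h->bad_starts++;
    }
}

/* Models add_disk(): registration causes I/O, hence a completion. */
static void toy_add_disk(struct toy_ctlr *h, size_t i)
{
    assert(i < MAX_DISKS);
    h->disk[i].registered = true;
    toy_completion(h);
}

/* Interleaved order: init queue i, then immediately register disk i.
 * Completions caused by early disks see later, unready queues. */
static int toy_init_interleaved(struct toy_ctlr *h)
{
    *h = (struct toy_ctlr){0};
    for (size_t i = 0; i < MAX_DISKS; i++) {
        h->disk[i].queue_initialized = true;
        toy_add_disk(h, i);
    }
    return h->bad_starts;
}

/* Two-phase order: init every queue, then register every disk.
 * No completion can run before all queues are ready. */
static int toy_init_two_phase(struct toy_ctlr *h)
{
    *h = (struct toy_ctlr){0};
    for (size_t i = 0; i < MAX_DISKS; i++)
        h->disk[i].queue_initialized = true;
    for (size_t i = 0; i < MAX_DISKS; i++)
        toy_add_disk(h, i);
    return h->bad_starts;
}
```

In this model, toy_init_interleaved() returns a nonzero count of
premature starts, while toy_init_two_phase() returns zero. It does not
cover the run-time case mentioned in the thread, where disks are torn
down and re-added.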