Message-ID: <20080414175219.GR12774@kernel.dk>
Date: Mon, 14 Apr 2008 19:52:19 +0200
From: Jens Axboe <jens.axboe@...cle.com>
To: scameron@...rdog.cca.cpqcorp.net
Cc: linux-kernel@...r.kernel.org, mike.miller@...com,
mikem@...rdog.cca.cpqcorp.net
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler
On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> On Mon, Apr 14, 2008 at 07:37:20PM +0200, Jens Axboe wrote:
> > On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> > >
> > >
> > > > On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> > > > >
> > > > >
> > > > > Fix race condition between cciss_init_one(), cciss_update_drive_info(),
> > > > > and cciss_check_queues(). cciss_softirq_done would try to start
> > > > > queues which were not quite ready to be started, as its checks for
> > > > > readiness were not sufficiently synchronized with the queue initializing
> > > > > code in cciss_init_one and cciss_update_drive_info. A slow CPU and
> > > > > a large number of logical drives seem to make the race more likely
> > > > > to cause a problem.
> > > >
> > > > Hmm, this seems backwards to me. cciss_softirq_done() isn't going to
> > > > start the queues until an irq has triggered, for instance. Why isn't the
> > > > init properly ordered instead of band-aiding around this with a
> > > > 'queue_ready' variable?
> > > >
> > >
> > > Each call to add_disk() will trigger some interrupts, and
> > > earlier-added disks may cause the queues of later,
> > > not-yet-completely-added disks to be started.
> > >
> > > I suppose the init routine might be reorganized to initialize all
> > > the queues, then have a second loop call add_disk() for all
> > > of them. Is that what you had in mind by "properly ordered"?
> >
> > Yep, precisely: don't call add_disk() until everything is set up.
> >
> > > Disks may be added at run time, though, and if I remember right
> > > that tears down all but the first disk and re-adds them all, so
> > > there is some complication there to think about.
> >
> > Well, other drivers manage quite fine without resorting to work-arounds
> > :-)
>
> Ok. Thanks for the constructive criticism. I'll rethink it.
>
> Fortunately (or unfortunately), the race is apparently pretty hard
> to trigger: it's been in there for ages, and we've only just seen it
> manifest as a problem recently, and only in one particular configuration.
Hopefully that will not matter. If you rework the init code so that
everything is up and running before you allow any IO to happen, then
it'll be easier to 'prove' that you can't hit such races. If disks can
be added at runtime, make them go through the same init
process/function.
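
To make that ordering concrete, here is a minimal user-space C sketch
(not the actual cciss code; setup_disk_state(), publish_disk() and
completion_path() are hypothetical stand-ins for the driver's per-disk
queue setup, add_disk() and the softirq completion path): everything is
fully initialized in a first pass, and disks are only made visible in a
second pass, so a completion triggered by an earlier disk can never see
a half-initialized later one.

/*
 * Hypothetical sketch of the two-phase init ordering discussed above.
 * This is NOT the cciss driver; the helpers below only model the idea.
 */
#include <stdio.h>
#include <stdbool.h>

#define MAX_DRIVES 8

struct drive {
	bool queue_ready;	/* set only when setup is fully complete  */
	bool published;		/* visible to the rest of the system      */
};

static struct drive drives[MAX_DRIVES];

/* Phase 1: fully initialize per-drive state, including its queue. */
static void setup_disk_state(struct drive *d)
{
	/* ... allocate and configure the request queue here ... */
	d->queue_ready = true;
}

/* Phase 2: publish the disk; only now may IO (and interrupts) arrive. */
static void publish_disk(struct drive *d)
{
	d->published = true;
	/* the real driver would call add_disk() here */
}

/*
 * Completion path (softirq/interrupt in the real driver): because
 * publishing happens strictly after setup, any drive it can see is
 * already fully initialized, so no extra "ready" flag is needed.
 */
static void completion_path(void)
{
	for (int i = 0; i < MAX_DRIVES; i++)
		if (drives[i].published && !drives[i].queue_ready)
			printf("race: drive %d visible before ready\n", i);
}

int main(void)
{
	/* First loop: set up everything. */
	for (int i = 0; i < MAX_DRIVES; i++)
		setup_disk_state(&drives[i]);

	/* Second loop: only now make the disks visible. */
	for (int i = 0; i < MAX_DRIVES; i++) {
		publish_disk(&drives[i]);
		completion_path();	/* models an interrupt firing here */
	}

	printf("all drives published after full initialization\n");
	return 0;
}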
--
Jens Axboe