Message-ID: <20080414175219.GR12774@kernel.dk>
Date:	Mon, 14 Apr 2008 19:52:19 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	scameron@...rdog.cca.cpqcorp.net
Cc:	linux-kernel@...r.kernel.org, mike.miller@...com,
	mikem@...rdog.cca.cpqcorp.net
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> On Mon, Apr 14, 2008 at 07:37:20PM +0200, Jens Axboe wrote:
> > On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> > > 
> > > 
> > > > On Mon, Apr 14 2008, scameron@...rdog.cca.cpqcorp.net wrote:
> > > > > 
> > > > > 
> > > > > Fix a race condition between cciss_init_one(), cciss_update_drive_info(),
> > > > > and cciss_check_queues().  cciss_softirq_done() would try to start
> > > > > queues which were not quite ready to be started, as its checks for
> > > > > readiness were not sufficiently synchronized with the queue-initializing
> > > > > code in cciss_init_one() and cciss_update_drive_info().  A slow CPU and
> > > > > a large number of logical drives seem to make the race more likely
> > > > > to cause a problem.
> > > > 
> > > > Hmm, this seems backwards to me.  cciss_softirq_done() isn't going to
> > > > start the queues until an irq has triggered, for instance. Why isn't the
> > > > init properly ordered instead of band-aiding around this with a
> > > > 'queue_ready' variable?
> > > >
> > > 
> > > Each call to add_disk() will trigger some interrupts,
> > > and earlier-added disks may cause the queues of later,
> > > not-yet-completely-added disks to be started.
> > > 
> > > I suppose the init routine might be reorganized to initialize all
> > > the queues, then have a second loop call add_disk() for all
> > > of them.  Is that what you had in mind by "properly ordered"?
> > 
> > Yep, precisely: don't call add_disk() until everything is set up.
> > 
> > > Disks may be added at run time, though, and if I remember right
> > > this tears down all but the first disk and re-adds them all,
> > > so there is some complication there to think about.
> > 
> > Well, other drivers manage quite fine without resorting to work-arounds
> > :-)
> 
> Ok.  Thanks for the constructive criticism.  I'll rethink it.
> 
> Fortunately (or unfortunately), the race is apparently pretty hard
> to trigger; it's been in there for ages, and we've only just seen it
> manifest as a problem recently, and only in one particular configuration.

Hopefully that will not matter. If you rework the init code so that
everything is up and running before you allow any IO to go on, then
it'll be easier to 'prove' that you can't hit such races. If you can
have disks added at runtime, make them go through the same init
process/function.
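
Roughly something like the sketch below -- made-up names and a pared-down
controller struct, not the actual cciss code, just to illustrate the
ordering: every queue is fully initialized in a first pass, and add_disk()
only happens in a second pass.

	#include <linux/blkdev.h>
	#include <linux/genhd.h>
	#include <linux/errno.h>

	#define EXAMPLE_MAX_LUNS 128	/* made-up limit for the sketch */

	/* Hypothetical controller state; not the real ctlr_info_t. */
	struct example_ctlr {
		spinlock_t	lock;		/* assumed already spin_lock_init()ed */
		int		num_luns;
		struct gendisk	*disk[EXAMPLE_MAX_LUNS];	/* from alloc_disk() */
	};

	static void example_request_fn(struct request_queue *q);

	/*
	 * Phase 1: fully initialize every request queue up front.
	 * Phase 2: only then call add_disk().  Any interrupt raised while
	 * adding an earlier disk can only find queues that are completely
	 * set up, so the completion path never restarts a half-ready queue.
	 */
	static int example_init_disks(struct example_ctlr *h)
	{
		int i;

		for (i = 0; i < h->num_luns; i++) {
			struct request_queue *q;

			q = blk_init_queue(example_request_fn, &h->lock);
			if (!q)
				return -ENOMEM;
			q->queuedata = h;
			h->disk[i]->queue = q;
		}

		for (i = 0; i < h->num_luns; i++)
			add_disk(h->disk[i]);

		return 0;
	}

If disks added at runtime then go through the same function, the hotplug
path gets the same guarantee for free.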

-- 
Jens Axboe

