lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1302621318.2604.19.camel@mulgrave.site>
Date:	Tue, 12 Apr 2011 10:15:18 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Steven Whitehouse <swhiteho@...hat.com>,
	linux-kernel@...r.kernel.org, Jens Axboe <jaxboe@...ionio.com>
Subject: Re: Strange block/scsi/workqueue issue

On Tue, 2011-04-12 at 14:15 +0900, Tejun Heo wrote:
> > A fix might be to shunt more stuff off to workqueues, but that's
> > producing a more complex system which would be prone to entanglements
> > that would be even harder to spot.
> 
> I don't agree there.  To me, the cause for entanglement seems to be
> request_fn calling all the way through blocking destruction because it
> detected that the final put was called with sleepable context.  It's
> just weird and difficult to anticipate to directly call into sleepable
> destruction path from request_fn whether it had sleepable context or
> not.  With the yet-to-be-debugged bug caused by the conversion aside,
> I think simply using workqueue is the better solution.

So your idea is that all final puts should go through a workqueue?  Like
I said, that would work, but it's not just SCSI ... any call path that
destroys a queue has to be audited.

The problem is nothing to do with sleeping context ... it's that any
work called by the block workqueue can't destroy that queue.  In a
refcounted model, that's a bit nasty.

> > Perhaps a better solution is just not to use sync cancellations in
> > block?  As long as the work in the queue holds a queue ref, they can be
> > done asynchronously.
> 
> Hmmm... maybe but at least I prefer doing explicit shutdown/draining
> on destruction even if the base data structure is refcounted.  Things
> become much more predictable that way.

It is pretty much instantaneous.  Unless we're executing, we cancel the
work.  If the work is already running, we just let it complete instead
of waiting for it.

Synchronous waits are dangerous because they cause entanglement.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ