Message-ID: <20130522094840.472b263e@stein>
Date:	Wed, 22 May 2013 09:48:40 +0200
From:	Stefan Richter <stefanr@...6.in-berlin.de>
To:	Tejun Heo <tj@...nel.org>
Cc:	Peter Hurley <peter@...leysoftware.com>, stephan.gatzka@...il.com,
	linux1394-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: function call fw_iso_resource_mange(..) (core-iso.c) does not
 return

On May 22 Tejun Heo wrote:
> Hello,
> 
> On Tue, May 21, 2013 at 12:54:04PM -0400, Peter Hurley wrote:
> >> This rescuer thread is responsible to keep the queue working even
> >> under high memory pressure so that a memory allocation might
> >> sleep. If that happens, all work of that workqueue is designated to
> >> that particular rescuer thread. The work in this rescuer thread is
> >> done strictly sequentially. Now we have the situation that the
> >> rescuer thread runs
> >> fw_device_init->read_config_rom->read_rom->fw_run_transaction. fw_run_transaction
> >> blocks waiting for the completion object. This completion object
> >> will be completed in bus_reset_work, but this work will never be
> >> executed in the rescuer thread.
> > 
> > Interesting.
> > 
> > Tejun, is this workqueue behavior as designed?  Ie., that a workqueue used
> > as a domain for forward progress guarantees collapses under certain conditions,
> > such as scheduler overhead and no longer ensures forward progress?
> 
> Yeap, from Documentation/workqueue.txt
> 
>   WQ_MEM_RECLAIM
> 
> 	All wq which might be used in the memory reclaim paths _MUST_
> 	have this flag set.  The wq is guaranteed to have at least one
> 	execution context regardless of memory pressure.
> 		 
> All it guarantees is that there will be at least one execution thread
> working on the workqueue under any conditions.  If there are
> inter-dependent work items which are necessary to make forward
> progress in memory reclaim, they must be put into separate workqueues.
> In turn, workqueues w/ WQ_RESCUER set *must* be able to make forward
> progress in all cases at the concurrency level of 1.  Probably the
> documentation needs a bit of clarification.
[...]
> > I thought the whole point of needing WQ_MEM_RECLAIM is if a SBP-2
> > device is swap.
> > 
> > FWIW, I still believe that we should revert to the original bus
> > reset as tasklet and redo the TI workaround to use
> > TI-workaround-specific versions of non-sleeping PHY accesses.
> 
> The right fix would be either dropping WQ_MEM_RECLAIM or breaking it
> into two workqueues so that work items don't have interdependencies.
> 
> Thanks.

Argh, suddenly it all seems so obvious.  Tejun, Peter, Stephan, thank you
for getting this clarified.
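
For illustration only, here is a minimal userspace sketch of the problem
and of the two-workqueue fix.  It is an analogy in Python, not kernel
code: a single-worker thread pool stands in for a workqueue running at a
concurrency level of 1 (the rescuer case), and the names device_init,
bus_reset and waiter_completes are hypothetical.  The timeout on the wait
exists only so that the demonstration terminates; the situation described
above has no such timeout and blocks forever.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def waiter_completes(separate_queues: bool, timeout: float = 0.5) -> bool:
    """Return True if the waiting work item saw its completion signalled."""
    done = threading.Event()  # stands in for the completion object

    def device_init():
        # Blocks until bus_reset() runs, or gives up after `timeout`.
        return done.wait(timeout)

    def bus_reset():
        done.set()

    # One worker per pool models a queue at a concurrency level of 1.
    q_init = ThreadPoolExecutor(max_workers=1)
    q_reset = ThreadPoolExecutor(max_workers=1) if separate_queues else q_init
    try:
        waiter = q_init.submit(device_init)
        q_reset.submit(bus_reset)
        return waiter.result()
    finally:
        q_init.shutdown(wait=True)
        if q_reset is not q_init:
            q_reset.shutdown(wait=True)


if __name__ == "__main__":
    print("shared queue:   ", waiter_completes(separate_queues=False))
    print("separate queues:", waiter_completes(separate_queues=True))
```

With a shared queue the second item sits behind the first one, so the
wait can only end by timing out.  With separate queues the dependency is
served by another execution context and the wait completes.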

A third (fourth?) way to fix it --- feasible or not --- would be to break
the dependency between the worklets.  In this case, use a timer to cancel
outbound transactions if the request-transmit IRQ event was not received
before a timeout.  We had such a timeout in the older ieee1394 drivers and
we also had it in earlier versions of the firewire drivers, at a risk of a
race between CPU and OHCI.
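
A rough sketch of the timeout idea, again as a userspace analogy in
Python and not as driver code.  The Transaction class, its finish()
method and run_transaction() are hypothetical names.  A lock ensures that
only the first of "completion event" and "timeout" decides the outcome,
which models the software side of the race only; it says nothing about a
controller that still owns the transmit descriptor when the timer fires.

```python
import threading


class Transaction:
    """Outbound transaction that is finished exactly once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = threading.Event()
        self.status = None

    def finish(self, status):
        # First caller wins; a late completion or a late timeout is ignored.
        with self._lock:
            if self.status is not None:
                return False
            self.status = status
        self._done.set()
        return True

    def wait(self):
        self._done.wait()
        return self.status


def run_transaction(send, timeout):
    """Send a transaction and cancel it if no completion arrives in time."""
    t = Transaction()
    timer = threading.Timer(timeout, t.finish, args=("cancelled",))
    timer.start()
    send(t)            # `send` may call t.finish("complete") now or later
    status = t.wait()  # no longer depends on another work item running
    timer.cancel()
    return status


if __name__ == "__main__":
    # No completion event ever arrives: the timer cancels the transaction.
    print(run_transaction(lambda t: None, timeout=0.1))
    # The completion event arrives before the timeout.
    print(run_transaction(lambda t: t.finish("complete"), timeout=5.0))
```

The waiter then always returns after at most the timeout, so it cannot
block the single execution context indefinitely.
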
-- 
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/