lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 5 Dec 2006 14:59:31 -0600
From:	Andy Fleming <afleming@...escale.com>
To:	Andrew Morton <akpm@...l.org>
Cc:	"Maciej W. Rozycki" <macro@...ux-mips.org>,
	Ben Collins <ben.collins@...ntu.com>,
	linux-kernel@...r.kernel.org, Linus Torvalds <torvalds@...l.org>,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: [PATCH] Export current_is_keventd() for libphy


On Dec 5, 2006, at 14:39, Andrew Morton wrote:

> On Tue, 5 Dec 2006 17:48:05 +0000 (GMT)
> "Maciej W. Rozycki" <macro@...ux-mips.org> wrote:
>
>>  Essentially there is a race when disconnecting from a PHY, because
>> interrupt delivery uses the event queue for processing.  The  
>> function to
>> handle interrupts that is called from the event queue is phy_change 
>> ().
>> It takes a pointer to a structure that is associated with the  
>> PHY.  At the
>> time phy_stop_interrupts() is called there may be one or more  
>> calls to
>> phy_change() still pending on the event queue.  They may not be  
>> able to be
>> processed until the structure passed to phy_change() have been  
>> freed, at
>> which point calling the function is wrong.
>>
>>  One way of avoiding it is calling flush_scheduled_work() from
>> phy_stop_interrupts().  This is fine as long as a caller of
>> phy_stop_interrupts() (not necessarily the immediate one calling into
>> libphy) does not hold the netlink lock.
>
> So let me try to rephrase...
>
> - phy_change() is the workqueue callback function.  It is executed by
>   keventd.
>
> - Something under phy_change() takes rtnl_lock() (but what??)


I don't think it's phy_change().  It's something else that may be  
scheduled.


>
> - phy_stop_interrupts() does flush_scheduled_work().  This has to
>   following logic:
>
>   - if I am kevetnd, run phy_change() directly.
>
>   - If I am not keventd, wait for keventd() to run phy_change()
>
> - So if the caller of phy_stop_interrupt() already holds rtnl_lock(),
>   and if that caller is keventd then it will recur onto rntl_lock()  
> and
>   will deadlock.
>
> Problem is, if the caller of phy_stop_interrupt() is *not* keventd,  
> that
> caller will still deadlock, because that caller is waiting for  
> keventd to
> run phy_change(), and keventd cannot do that, because the not-keventd
> process already holds rtnl_lock.
>
>
> Now, afaict, there are only two callers of phy_stop_interrupts(): the
> close() handlers of gianfar.c and fs_enet-main.c (confusingly held in
> netdevice.stop (confusingly called by dev_close())).  Via  
> phy_disconnect.
> Did I miss anything?


Right now, that's probably about right.


>
> And the dev_close() caller holds rtnl_lock.
>


Ok, I think this is the summary:

- phy_change() is the work queue callback function (scheduled when a  
PHY interrupt occurs)

- dev_close() invokes the controller's stop/close/whatever function,  
and it calls phy_disconnect()

- phy_disconnect() calls phy_stop_interrupts().  To prevent any  
pending phy_change() calls from getting confused, phy_stop_interrupts 
() needs to flush the queue.  Otherwise, subsequent memory freeings  
will leave phy_change() hanging.

- If phy_stop_interrupts() calls flush_scheduled_work(), keventd will  
execute its queues while rtnl_lock is held, providing opportunity for  
other callbacks to deadlock.

- innocent puppies are slaughtered, and the world mourns.


Maciej's solution is to schedule phy_disconnect() to be called from a  
work queue.  That solution should work, but it sounds like it doesn't  
require the check for if keventd is running.

Of course, my objection to it is that it now requires the ethernet  
controller to be excessively aware of the details of how the PHY Lib  
is handling the PHY interrupts (by scheduling them on a work queue).

Andy


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ