linux-kernel - Re: [PATCH] Export current_is

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20061205123958.497a7bd6.akpm@osdl.org>
Date:	Tue, 5 Dec 2006 12:39:58 -0800
From:	Andrew Morton <akpm@...l.org>
To:	"Maciej W. Rozycki" <macro@...ux-mips.org>
Cc:	Andy Fleming <afleming@...escale.com>,
	Ben Collins <ben.collins@...ntu.com>,
	linux-kernel@...r.kernel.org, Linus Torvalds <torvalds@...l.org>,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: [PATCH] Export current_is_keventd() for libphy

On Tue, 5 Dec 2006 17:48:05 +0000 (GMT)
"Maciej W. Rozycki" <macro@...ux-mips.org> wrote:

>  Essentially there is a race when disconnecting from a PHY, because 
> interrupt delivery uses the event queue for processing.  The function to 
> handle interrupts that is called from the event queue is phy_change().  
> It takes a pointer to a structure that is associated with the PHY.  At the 
> time phy_stop_interrupts() is called there may be one or more calls to 
> phy_change() still pending on the event queue.  They may not be able to be 
> processed until the structure passed to phy_change() have been freed, at 
> which point calling the function is wrong.
> 
>  One way of avoiding it is calling flush_scheduled_work() from 
> phy_stop_interrupts().  This is fine as long as a caller of 
> phy_stop_interrupts() (not necessarily the immediate one calling into 
> libphy) does not hold the netlink lock.

So let me try to rephrase...

- phy_change() is the workqueue callback function.  It is executed by
  keventd.

- Something under phy_change() takes rtnl_lock() (but what??)

- phy_stop_interrupts() does flush_scheduled_work().  This has to
  following logic:

  - if I am kevetnd, run phy_change() directly.

  - If I am not keventd, wait for keventd() to run phy_change()

- So if the caller of phy_stop_interrupt() already holds rtnl_lock(),
  and if that caller is keventd then it will recur onto rntl_lock() and
  will deadlock.

Problem is, if the caller of phy_stop_interrupt() is *not* keventd, that
caller will still deadlock, because that caller is waiting for keventd to
run phy_change(), and keventd cannot do that, because the not-keventd
process already holds rtnl_lock.

Now, afaict, there are only two callers of phy_stop_interrupts(): the
close() handlers of gianfar.c and fs_enet-main.c (confusingly held in
netdevice.stop (confusingly called by dev_close())).  Via phy_disconnect. 
Did I miss anything?

And the dev_close() caller holds rtnl_lock.

Summary:

a) Please tell us what code under phy_change() wants to take rtnl_lock

b) I think it should deadlock whether or not the caller of
   phy_stop_interrupt() is keventd.  What am I missing?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/