Message-ID: <1274872584.20576.13579.camel@macbook.infradead.org>
Date: Wed, 26 May 2010 12:16:24 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: linux-atm-general@...ts.sourceforge.net, netdev@...r.kernel.org
Cc: Nathan Williams <nathan@...verse.com.au>
Subject: RX/close vcc race with solos/atmtcp/usbatm/he

I've had this crash reported to me...

[18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684] SS:ESP 0068:dfb89d14
[18845.090712] [<c13ecff3>] ? do_page_fault+0x0/0x2e1
[18845.120042] [<e082f490>] ? br2684_push+0x19/0x234 [br2684]
[18845.153530] [<e084fa13>] solos_bh+0x28b/0x7c8 [solos_pci]
[18845.186488] [<e084f711>] ? solos_irq+0x2d/0x51 [solos_pci]
[18845.219960] [<c100387b>] ? handle_irq+0x3b/0x48
[18845.247732] [<c10265cb>] ? irq_exit+0x34/0x57
[18845.274437] [<c1025720>] tasklet_action+0x42/0x69
[18845.303247] [<c102643f>] __do_softirq+0x8e/0x129
[18845.331540] [<c10264ff>] do_softirq+0x25/0x2a
[18845.358274] [<c102664c>] _local_bh_enable_ip+0x5e/0x6a
[18845.389677] [<c102666d>] local_bh_enable+0xb/0xe
[18845.417944] [<e08490a8>] ppp_unregister_channel+0x32/0xbb [ppp_generic]
[18845.458193] [<e08731ad>] pppox_unbind_sock+0x18/0x1f [pppox]
[18845.492712] [<e087f5f7>] pppoe_device_event+0xa7/0x159 [pppoe]
[18845.528269] [<c13ed2ff>] notifier_call_chain+0x2b/0x4a
[18845.559674] [<c1038a62>] raw_notifier_call_chain+0xc/0xe
[18845.592110] [<c1300867>] dev_close+0x51/0x8b
[18845.618266] [<c1300927>] rollback_registered_many+0x86/0x15e
[18845.652813] [<c1300ab3>] unregister_netdevice_queue+0x67/0x91
[18845.687849] [<c1300b79>] unregister_netdev+0x14/0x1c
[18845.718221] [<e082f4d1>] br2684_push+0x5a/0x234 [br2684]
[18845.750676] [<e083dc21>] vcc_release+0x64/0x100 [atm]

The problem is that the 'find_vcc' functions in these drivers are
returning a vcc with the ATM_VF_READY bit cleared, because it's already
in the process of being destroyed. If we fix that simple oversight,
there's still a race condition because the socket can still be closed
(and completely freed, afaict) between our call to find_vcc() and our
call to vcc->push() in the RX tasklet.
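
To make the window concrete, the RX side of solos_bh() does roughly this
(a simplified sketch of the PKT_DATA case, not verbatim; error handling
and header parsing trimmed):

	vcc = find_vcc(card->atmdev[port], le16_to_cpu(header->vpi),
		       le16_to_cpu(header->vci));
	if (!vcc)
		continue;	/* unknown VPI/VCI; the real code also warns */

	/* <-- nothing stops vcc_release() freeing the socket here,
	   between the lookup above and the push below */
	atm_charge(vcc, skb->truesize);
	vcc->push(vcc, skb);	/* e.g. br2684_push(), as in the oops above */
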
Here's a patch for solos-pci which should fix it. We prevent the race by
making the dev->ops->close() function wait for the RX tasklet to
complete, so it can't still be using the vcc in question.
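
For reference, on SMP tasklet_unlock_wait() just spins until the
tasklet's RUN bit clears, so by the time it returns, any tasklet run
which had already found the vcc is finished with it:

	/* include/linux/interrupt.h */
	static inline void tasklet_unlock_wait(struct tasklet_struct *t)
	{
		while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
	}

On !SMP it's a no-op, but that's OK -- tasklets always run to
completion there, so one can never be mid-run while pclose() is on the
CPU.
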
I think this same approach should work OK for usbatm and he. Less sure
about atmtcp -- we may need some extra locking there to protect
atmtcp_c_send(). And I'm ignoring eni_proc_read() for now.
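
For atmtcp, something like the following might do. This is entirely
hypothetical and untested -- 'atmtcp_find_lock' doesn't exist today, and
the skb allocation and copy would have to happen before taking the lock:

	static DEFINE_SPINLOCK(atmtcp_find_lock);	/* hypothetical */

	/* in atmtcp_c_send(), held across the lookup and the push */
	spin_lock(&atmtcp_find_lock);
	out_vcc = find_vcc(dev, vpi, vci);
	if (out_vcc)
		out_vcc->push(out_vcc, new_skb);
	spin_unlock(&atmtcp_find_lock);

	/* in the close path, after clearing ATM_VF_READY: taking and
	   dropping the lock waits out any push already in flight */
	spin_lock(&atmtcp_find_lock);
	spin_unlock(&atmtcp_find_lock);
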
Can anyone see a better approach -- short of rewriting the whole ATM
layer to make the locking saner?

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index c5f5186..a73f102 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -774,7 +774,8 @@ static struct atm_vcc *find_vcc(struct atm_dev *dev, short vpi, int vci)
 	sk_for_each(s, node, head) {
 		vcc = atm_sk(s);
 		if (vcc->dev == dev && vcc->vci == vci &&
-		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE)
+		    vcc->vpi == vpi && vcc->qos.rxtp.traffic_class != ATM_NONE &&
+		    test_bit(ATM_VF_READY, &vcc->flags))
 			goto out;
 	}
 	vcc = NULL;
@@ -900,6 +901,10 @@ static void pclose(struct atm_vcc *vcc)
 	clear_bit(ATM_VF_ADDR, &vcc->flags);
 	clear_bit(ATM_VF_READY, &vcc->flags);
 
+	/* Hold up vcc_destroy_socket() (our caller) until solos_bh() in the
+	   tasklet has finished processing any incoming packets (and, more to
+	   the point, using the vcc pointer). */
+	tasklet_unlock_wait(&card->tlet);
 	return;
 }

--
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@...el.com                              Intel Corporation