Date: Sat, 14 Mar 2015 10:38:20 -0400
From: "Ahmed S. Darwish" <darwish.07@...il.com>
To: Marc Kleine-Budde <mkl@...gutronix.de>
Cc: Olivier Sobrie <olivier@...rie.be>,
    Oliver Hartkopp <socketcan@...tkopp.net>,
    Wolfgang Grandegger <wg@...ndegger.com>,
    Andri Yngvason <andri.yngvason@...el.com>,
    Linux-CAN <linux-can@...r.kernel.org>,
    LKML <linux-kernel@...r.kernel.org>,
    netdev@...r.kernel.org
Subject: Re: [PATCH v4 1/3] can: kvaser_usb: Fix tx queue start/stop race conditions

Hi Marc,

On Sat, Mar 14, 2015 at 02:41:18PM +0100, Marc Kleine-Budde wrote:
> On 03/14/2015 02:02 PM, Ahmed S. Darwish wrote:
> > From: Ahmed S. Darwish <ahmed.darwish@...eo.com>
> >
> > A number of tx queue wake-up events went missing due to the
> > outlined scenario below. Start state is a pool of 16 tx URBs,
> > active tx_urbs count = 15, with the netdev tx queue open.
> >
> > CPU #1 [softirq]                         CPU #2 [softirq]
> > start_xmit()                             tx_acknowledge()
> > ................                         ................
> >
> > atomic_inc(&tx_urbs);
> > if (atomic_read(&tx_urbs) >= 16) {
> >                         -->
> >                                          atomic_dec(&tx_urbs);
> >                                          netif_wake_queue();
> >                                          return;
> >                         <--
> >         netif_stop_queue();
> > }
> >
> > At the end, the correct state expected is a 15 tx_urbs count
> > value with the tx queue state _open_. Due to the race, we get
> > the same tx_urbs value but with the tx queue state _stopped_.
> > The wake-up event is completely lost.
> >
> > Thus avoid hand-rolled concurrency mechanisms and use a proper
> > lock for contexts and tx queue protection.
> >
> > Signed-off-by: Ahmed S. Darwish <ahmed.darwish@...eo.com>
>
> Applied to can. This will go into David's net tree and finally into
> net-next. Then I'll apply patches 2+3. Nag me, if I forget about them ;)
>

Thanks! :-)

So if I've understood correctly, this patch will go to -rc5 and the
rest will go into net-next? If so, IMHO patch #2 should also go to
-rc5.
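For readers following the quoted commit message, here is a minimal
userspace sketch of the "use a proper lock" idea, not the driver code
itself. It assumes a pthread mutex standing in for the driver's lock, and
the names tx_state, model_start_xmit() and model_tx_acknowledge() are
hypothetical. The point is only that the counter update and the queue
start/stop decision happen inside one critical section, so the wake-up
cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_TX_URBS 16          /* pool size from the scenario above */

/* Hypothetical userspace model of the driver's tx bookkeeping. */
struct tx_state {
	pthread_mutex_t lock;   /* stands in for the driver's lock */
	int active_urbs;        /* number of in-flight tx URBs */
	bool queue_stopped;     /* models the netdev tx queue state */
};

static void tx_state_init(struct tx_state *s, int active)
{
	pthread_mutex_init(&s->lock, NULL);
	s->active_urbs = active;
	s->queue_stopped = false;
}

/* Models start_xmit(): claim a URB, stop the queue if the pool is full. */
static void model_start_xmit(struct tx_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->active_urbs++;
	if (s->active_urbs >= MAX_TX_URBS)
		s->queue_stopped = true;
	pthread_mutex_unlock(&s->lock);
}

/* Models tx_acknowledge(): release a URB and reopen the queue. */
static void model_tx_acknowledge(struct tx_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->active_urbs--;
	s->queue_stopped = false;
	pthread_mutex_unlock(&s->lock);
}
```

Starting from 15 active URBs, whichever of the two functions takes the
lock first, the end state is 15 active URBs with the queue open, which is
the state the commit message calls correct.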
I know it's usually frowned upon to add further patches at this
late -rc stage, but here's my logic:

The original driver code just _arbitrarily_ assumed a MAX_TX_URB
value of 16 parallel transmissions. This value was chosen, it seems,
because the driver was heavily based on esd_usb2.c and the esd code
just did so :-(

Meanwhile, in the Kvaser hardware at hand, if I increase the driver's
max parallel transmissions even a little above the recommended limit
reported by the firmware, the firmware breaks down badly, reports a
massive list of internal errors, and the candump traces become a mess
hardly related to the actual frames sent and received.

In my case, I was lucky that the brand we own here (*) had a higher
max outstanding transmissions value than 16. But if there is hardware
out there with a max outstanding tx support < 16 (#), such hardware
will break badly under a heavy transmission load :-(

(*) http://www.kvaser.com/products/kvaser-usb-hs-ii-hsls/
(#) There is a huge list of Kvaser products having the same
    controller but with different performance metrics, so this
    is quite a possibility.

Thanks,
Darwish
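To make the MAX_TX_URB concern above concrete, here is a small sketch
comparing a hard-coded limit of 16 with a per-device limit taken from the
firmware. It is an illustration only: struct dev_limits, its field
fw_max_outstanding_tx, and both helper functions are hypothetical names,
not the driver's actual structures.

```c
#include <assert.h>
#include <stdbool.h>

#define HARDCODED_MAX_TX_URBS 16   /* the value inherited from esd_usb2.c */

/* Hypothetical per-device limits; the field name is an assumption. */
struct dev_limits {
	unsigned int fw_max_outstanding_tx;  /* limit reported by firmware */
};

/* Old behaviour: every device is assumed to handle 16 transmissions. */
static bool may_submit_hardcoded(unsigned int active)
{
	return active < HARDCODED_MAX_TX_URBS;
}

/* Sketched alternative: never exceed what the firmware reported. */
static bool may_submit_per_device(const struct dev_limits *d,
				  unsigned int active)
{
	return active < d->fw_max_outstanding_tx;
}
```

For a device whose firmware reports a limit of 8, the hard-coded check
still allows a ninth transmission while 8 are in flight, which is the
overload case described above; the per-device check refuses it.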