Date: Mon, 23 May 2016 19:36:09 +0800
From: Shuyu Wei <wsy2220@...il.com>
To: Lino Sanfilippo <LinoSanfilippo@....de>
Cc: Francois Romieu <romieu@...zoreil.com>, David Miller <davem@...emloft.net>, wxt@...k-chips.com, heiko@...ech.de, linux-rockchip@...ts.infradead.org, netdev@...r.kernel.org, al.kochet@...il.com
Subject: Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

On Sun, May 22, 2016 at 01:30:27PM +0200, Lino Sanfilippo wrote:
>
> Thanks for testing. However, that extra check for skb not being NULL should not be
> necessary if the code were correct. The changes I suggested were all about having
> skb and info consistent with txbd_curr.
> But I just realized that there is still a big flaw in the last changes. While
> tx() looks correct now (we first set up the descriptor and assign the skb and _then_
> advance txbd_curr), tx_clean still is not:
>
> We _first_ have to read tx_curr and _then_ read the corresponding descriptor and its skb.
> (The last patch implemented just the reverse - and thus wrong - order, first get skb and
> descriptor and then read tx_curr).
>
> So the patch below hopefully also handles tx_clean correctly. Could you please test
> once more with this one?

Hi Lino,
This patch survived a whole night of stress testing.

> >
> > After further testing, my patch to barrier timestamp() didn't work.
> > Just like the original code in the tree, the EMAC still got stuck under
> > high load, even if I changed the smp_wmb() to dma_wmb(). So the original
> > code does have a race somewhere.
>
> So to make this clear: with the current code in net-next you still see a problem (lockup), right?

Yes, I mean the mainline kernel, which should be the same as net-next.

> > ... and why Francois' fix worked. Please be patient with me :-).
>
> So which fix(es) exactly work for you and solve your lockup issue?

I mean the patch below, which started this thread.
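The ordering Lino describes is the usual single-producer/single-consumer ring rule: the producer fills a slot completely before advancing its index, and the consumer reads that index before reading any slot behind it. A minimal userspace sketch, assuming C11 release/acquire atomics as a stand-in for the kernel's barriers; it does not model the EMAC reading descriptors by DMA, which is what the driver's wmb()/dma_wmb() calls also have to cover. struct ring, RING_SIZE, ring_init(), ring_produce() and ring_clean() are hypothetical names, not part of the driver.

```c
/* Hypothetical userspace model of the TX ring ordering rule.
 * C11 release/acquire atomics stand in for the kernel barriers. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ring size, standing in for the driver's TX_BD_NUM. */
#define RING_SIZE 8u

/* One slot: plays the role of a TX descriptor plus its tx_buff entry. */
struct ring_slot {
	unsigned int info; /* like the descriptor's info word */
	void *buf;         /* like tx_buff[i].skb */
};

struct ring {
	struct ring_slot slot[RING_SIZE];
	atomic_uint curr;  /* written only by the producer (cf. txbd_curr) */
	atomic_uint dirty; /* written only by the consumer (cf. txbd_dirty) */
};

static void ring_init(struct ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		r->slot[i].info = 0;
		r->slot[i].buf = NULL;
	}
	atomic_init(&r->curr, 0u);
	atomic_init(&r->dirty, 0u);
}

/* Producer (cf. tx()): set up the slot completely, THEN advance curr.
 * Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *buf, unsigned int info)
{
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_relaxed);
	/* Acquire pairs with the consumer's release: slots it freed are ours. */
	unsigned int dirty = atomic_load_explicit(&r->dirty, memory_order_acquire);

	/* One slot stays empty, so curr == dirty always means "empty". */
	if ((curr + 1) % RING_SIZE == dirty)
		return -1;

	r->slot[curr].buf = buf;
	r->slot[curr].info = info;

	/* Release: the slot writes above become visible before the new index. */
	atomic_store_explicit(&r->curr, (curr + 1) % RING_SIZE,
			      memory_order_release);
	return 0;
}

/* Consumer (cf. tx_clean()): read curr FIRST, THEN the slots before it.
 * Adds each reclaimed slot's info to *info_sum; returns the count. */
static unsigned int ring_clean(struct ring *r, unsigned int *info_sum)
{
	unsigned int dirty = atomic_load_explicit(&r->dirty, memory_order_relaxed);
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_acquire);
	unsigned int n = 0;

	while (dirty != curr) {
		struct ring_slot *s = &r->slot[dirty];

		/* Holds because the acquire above pairs with the producer's release. */
		assert(s->buf != NULL);
		*info_sum += s->info;
		s->buf = NULL;
		dirty = (dirty + 1) % RING_SIZE;
		n++;
	}
	/* Release: our slot writes are done before the producer may reuse them. */
	atomic_store_explicit(&r->dirty, dirty, memory_order_release);
	return n;
}
```

Producing two entries and then calling ring_clean() reclaims both; keeping one slot empty lets curr == dirty mean "empty" without a separate counter.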
diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..df3dfef 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -153,9 +153,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 {
 	struct arc_emac_priv *priv = netdev_priv(ndev);
 	struct net_device_stats *stats = &ndev->stats;
-	unsigned int i;
 
-	for (i = 0; i < TX_BD_NUM; i++) {
+	while (priv->txbd_dirty != priv->txbd_curr) {
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
@@ -685,13 +684,15 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	skb_tx_timestamp(skb);
+	priv->tx_buff[*txbd_curr].skb = skb;
+
+	dma_wmb();
 
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
 	/* Make sure info word is set */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
 
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
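Read together, the two hunks make the cleaner's range explicit. Because arc_emac_tx() now attaches the skb before it writes the info word, a slot that is still being set up can briefly carry an skb while its ownership bit is clear; a cleaner that scanned all TX_BD_NUM slots and judged completion by that bit alone could reclaim such a slot, whereas walking only from txbd_dirty up to txbd_curr never reaches it. The single-threaded model below sketches that reasoning. It assumes a reclaimed descriptor is left with info == 0, omits all barriers and the real skb freeing, and uses hypothetical names: struct txring, NUM_BD, OWN_HW and tx_clean_sketch() stand in for the driver's ring, TX_BD_NUM, FOR_EMAC and arc_emac_tx_clean().

```c
/* Hypothetical single-threaded model of the patched TX clean loop.
 * Barriers and real skb freeing are deliberately omitted. */
#include <assert.h>
#include <stddef.h>

#define NUM_BD 4u           /* hypothetical, in the role of TX_BD_NUM */
#define OWN_HW 0x80000000u  /* hypothetical ownership bit, in the role of FOR_EMAC */

struct bd {
	unsigned int info; /* OWN_HW | length while the hardware owns it */
	void *skb;         /* buffer attached by the transmit path */
};

struct txring {
	struct bd bd[NUM_BD];
	unsigned int curr;  /* next slot the transmit path will publish */
	unsigned int dirty; /* oldest slot not yet reclaimed */
};

/* Mirrors the patched loop: visit only [dirty, curr), i.e. slots the
 * transmit path has already published, and stop at the first one the
 * hardware still owns. Returns the number of slots reclaimed. */
static unsigned int tx_clean_sketch(struct txring *r)
{
	unsigned int freed = 0;

	assert(r->dirty < NUM_BD && r->curr < NUM_BD);
	while (r->dirty != r->curr) {
		struct bd *d = &r->bd[r->dirty];

		if (d->info & OWN_HW)
			break;      /* not transmitted yet */
		d->skb = NULL;  /* the driver would free the skb here */
		d->info = 0;
		r->dirty = (r->dirty + 1) % NUM_BD;
		freed++;
	}
	return freed;
}
```

For example, with slots 0 to 2 published (curr == 3) and the hardware done with slots 0 and 1, one call reclaims two slots and stops at slot 2; a slot 3 that already holds an skb but has not been published is left alone.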