lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Dec 2017 13:14:13 +0100
From:   Niklas Cassel <niklas.cassel@...s.com>
To:     Jose Abreu <Jose.Abreu@...opsys.com>
Cc:     Joao Pinto <Joao.Pinto@...opsys.com>,
        linux-netdev <netdev@...r.kernel.org>,
        Giuseppe CAVALLARO <peppe.cavallaro@...com>,
        alexandre.torgue@...com
Subject: Re: Commit 05cf0d1bf4 ("net: stmmac: free an skb first when there
 are no longer any descriptors using it") breaks stmmac?

On Thu, Nov 30, 2017 at 04:50:38PM +0000, Jose Abreu wrote:
> Hi Niklas,
> 
> Thanks for the detailed explanation!
> 
> On 30-11-2017 03:51, Niklas Cassel wrote:
> >
> > Could you try to see if the following patch
> > solves your problem:
> > (still don't see why it should be needed,
> > considering stmmac_tx_clean() should set all non-NULL
> > entries to NULL)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 1763e48c84e2..1d30b20b3740 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -2830,6 +2830,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >         tx_q->tx_skbuff_dma[first_entry].buf = des;
> >         tx_q->tx_skbuff_dma[first_entry].len = skb_headlen(skb);
> > +       tx_q->tx_skbuff[first_entry] = NULL;
> >  
> >         first->des0 = cpu_to_le32(des);
> >  
> > @@ -3003,6 +3004,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >         first = desc;
> >  
> > +       tx_q->tx_skbuff[first_entry] = NULL;
> > +
> >         enh_desc = priv->plat->enh_desc;
> >         /* To program the descriptors according to the size of the frame */
> >         if (enh_desc)
> 
> I confirm that applying 05cf0d1bf4 ("net: stmmac: free an skb
> first when there are no longer any descriptors using it") and
> this patch makes my setup work perfectly in multi-queue
> configuration. Can you send an official patch to -net?
> 
> You can add my:
> Tested-by: Jose Abreu <joabreu@...opsys.com>
> 
> Also, maybe we should try to understand why stmmac_tx_clean() is
> not setting the entry to NULL?

Hello Jose,

It is not easy to decode the databook, however,
after reading the databook several times more,
looking at:

3.3.2.2 Tx DMA Operation: OSP Mode

"The DMA writes the status, with a cleared Own bit, to the
corresponding TDES3, thus closing the descriptor."

So closing the descriptor means clearing the own bit.


Now looking at the mode we are interested in:

3.3.2.1 Tx DMA Operation: Default (Non-OSP) Mode

8. If an Ethernet packet is stored over data buffers in multiple descriptors,
the DMA closes the intermediate descriptor and fetches the next descriptor.
Steps 3 through 7 are repeated until the end-of-Ethernet-packet data is
transferred to the MTL.

So I think that my initial understanding was correct after all,
i.e., the commit 05cf0d1bf4 ("net: stmmac: free an skb first when
there are no longer any descriptors using it") is needed to make
sure that the DMA is not using skbs that might have been freed.

How certain are you that this is the offending commit?

Your crash is on v4.14-rc5, I can see an interesting
commit: 8d5f4b071749 ("stmmac: Don't access tx_q->dirty_tx before
netif_tx_lock"), which got included first in v4.14-rc6.



I've tried to understand if this could be a use-after-free,
but I don't see it.

An skb has a certain queue-mapping, so it shouldn't be in more
than one queue.

tx_q->tx_skbuff is allocated with kmalloc_array(), but is
initialized in init_dma_tx_desc_rings().

Other than that, it is only used in stmmac_tso_xmit()/stmmac_xmit()
and stmmac_tx_clean(). stmmac_tx_clean() will free and non-NULL skb,
and set tx_q->tx_skbuff[entry] to NULL.

I don't see why stmmac does:

for (i = 0; i < nfrags; i++) {
    ...
    tx_q->tx_skbuff[tx_q->cur_tx] = NULL;

since it is initialized to NULL in init_dma_tx_desc_rings(),
and if it has been cleaned, it should be set to NULL again.

If we haven't cleaned for a long time, we will not be able
to queue more skbs, since stmmac_tso_xmit()/stmmac_xmit()
checks current "clean" entries, see stmmac_tx_avail().

I removed the tx_q->tx_skbuff[tx_q->cur_tx] = NULL
from both stmmac_tso_xmit()/stmmac_xmit() locally,
and I could not see any problems with ping (different sizes)
or with iperf3.


Could you try to reproduce the bug with CONFIG_KASAN enabled?
(And please don't cut anything from the oops message).


Regards,
Niklas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ