Message-ID: <CAMRc=MeVyNzTWw_hk=J9kX1NE9reCE_O4P3wrNpMMc9z4xA_DA@mail.gmail.com>
Date:   Mon, 18 May 2020 16:07:23 +0200
From:   Bartosz Golaszewski <brgl@...ev.pl>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Jonathan Corbet <corbet@....net>, Rob Herring <robh+dt@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Matthias Brugger <matthias.bgg@...il.com>,
        John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Fabien Parent <fparent@...libre.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Edwin Peer <edwin.peer@...adcom.com>,
        DTML <devicetree@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "moderated list:ARM/Mediatek SoC..." 
        <linux-mediatek@...ts.infradead.org>,
        Stephane Le Provost <stephane.leprovost@...iatek.com>,
        Pedro Tsai <pedro.tsai@...iatek.com>,
        Andrew Perepech <andrew.perepech@...iatek.com>,
        Bartosz Golaszewski <bgolaszewski@...libre.com>
Subject: Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

On Fri, May 15, 2020 at 3:32 PM Arnd Bergmann <arnd@...db.de> wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
> > +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> > +                                struct mtk_mac_ring_desc_data *desc_data)
>
> I took another look at this function because of your comment on the
> locking around the descriptor updates, which seemed suspicious as the
> device side does not actually use the locks to access them.
>
> > +{
> > +       struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> > +       unsigned int status;
> > +
> > +       /* Let the device release the descriptor. */
> > +       dma_rmb();
> > +       status = desc->status;
> > +       if (!(status & MTK_MAC_DESC_BIT_COWN))
> > +               return -1;
>
> The dma_rmb() seems odd here, as I don't see which prior read
> is being protected by this.
>
> > +       desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> > +       desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> > +       desc_data->dma_addr = ring->dma_addrs[ring->tail];
> > +       desc_data->skb = ring->skbs[ring->tail];
> > +
> > +       desc->data_ptr = 0;
> > +       desc->status = MTK_MAC_DESC_BIT_COWN;
> > +       if (status & MTK_MAC_DESC_BIT_EOR)
> > +               desc->status |= MTK_MAC_DESC_BIT_EOR;
> > +
> > +       /* Flush writes to descriptor memory. */
> > +       dma_wmb();
>
> The comment and the barrier here seem odd as well. I would have expected
> a barrier after the update to the data pointer, and only a single store
> but no read of the status flag instead of the read-modify-write,
> something like
>
>       desc->data_ptr = 0;
>       dma_wmb(); /* make pointer update visible before status update */
>       desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);
>
> > +       ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> > +       ring->count--;
>
> I would get rid of the 'count' here, as it duplicates the information
> that is already known from the difference between head and tail, and you
> can't update it atomically without holding a lock around the access to
> the ring. The way I'd do this is to have the head and tail pointers
> in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> and smp barriers to access them, with each one updated on one
> thread but read by the other.
>

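Just to make sure I'm reading the lockless scheme right - you mean
something along these lines? (A rough sketch only - the helper names
are mine, and it assumes MTK_MAC_RING_NUM_DESCS is a power of two,
as required by CIRC_CNT()/CIRC_SPACE() from <linux/circ_buf.h>.)

    struct mtk_mac_ring {
            struct mtk_mac_ring_desc *descs;
            /* Written only by the xmit path, read by cleanup. */
            unsigned int head ____cacheline_aligned;
            /* Written only by the cleanup path, read by xmit. */
            unsigned int tail ____cacheline_aligned;
    };

    /* xmit path: */
    static bool mtk_mac_tx_ring_full(struct mtk_mac_ring *ring)
    {
            unsigned int tail = READ_ONCE(ring->tail);

            return !CIRC_SPACE(ring->head, tail,
                               MTK_MAC_RING_NUM_DESCS);
    }

    /* xmit path, after filling descs[ring->head]: */
    static void mtk_mac_tx_ring_push_head(struct mtk_mac_ring *ring)
    {
            /* Descriptor writes must be visible before the new
             * head is; the cleanup path re-checks the ownership
             * bit anyway before touching a descriptor.
             */
            smp_wmb();
            WRITE_ONCE(ring->head,
                       (ring->head + 1) % MTK_MAC_RING_NUM_DESCS);
    }

    /* cleanup path: */
    static bool mtk_mac_tx_ring_empty(struct mtk_mac_ring *ring)
    {
            unsigned int head = READ_ONCE(ring->head);

            return !CIRC_CNT(head, ring->tail,
                             MTK_MAC_RING_NUM_DESCS);
    }
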
Your previous solution seems much more reliable though. For instance,
in the above: when we're doing the TX cleanup (we got the TX-ready
irq and are iterating over descriptors until we either know there are
no more packets scheduled (count == 0) or we encounter one that's
still owned by the DMA engine), a parallel TX path can schedule new
packets to be sent, and I don't see how we can atomically check the
count (understood as the difference between head and tail) and run a
new iteration (where we'd modify the head or tail) without risking
the other path getting in the way. We'd have to always check the
descriptor itself.
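
For example (one possible interleaving):

    cleanup path (napi)              xmit path
    -------------------              ---------
    reads count, sees N
    reclaims N descriptors
                                     fills a descriptor, advances
                                     head (and the device completes
                                     it right away)
    exits the loop
    -> that descriptor is only
       reclaimed on the next irq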

I experimented a bit with this and couldn't come up with anything that
would pass any stress test.

On the other hand: spin_lock_bh() works fine and I like your approach
from the previous e-mail - except for the stats-update work, as we
could potentially lose some counts when updating them in process
context while the RX/TX paths run in parallel in napi context - but
that would be rare enough to overlook.
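
i.e. roughly something like this for the TX cleanup (the priv/lock
naming is just for illustration, not what's in the patch):

    static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
    {
            struct mtk_mac_ring *ring = &priv->tx_ring;
            struct mtk_mac_ring_desc_data desc_data;

            spin_lock_bh(&priv->lock);

            /* Trust only the descriptor's COWN bit - pop_tail()
             * returns non-zero once we hit a descriptor that is
             * still owned by the hardware.
             */
            while (!mtk_mac_ring_pop_tail(ring, &desc_data)) {
                    dma_unmap_single(priv->dev, desc_data.dma_addr,
                                     desc_data.len, DMA_TO_DEVICE);
                    dev_kfree_skb(desc_data.skb);
            }

            spin_unlock_bh(&priv->lock);
    }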

I hope v4 will be good enough even with spinlocks. :)

Bart
