netdev - Re: [PATCH net] net: enetc: avoid deadlock in enetc_tx_onestep

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKgT0UfnRceCA__HkpVnONO_AAp+wt+1GC2b6-vNk8oPh6aV9g@mail.gmail.com>
Date:   Thu, 12 Jan 2023 13:29:21 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Vladimir Oltean <vladimir.oltean@....com>
Cc:     netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        "Y . b . Lu" <yangbo.lu@....com>
Subject: Re: [PATCH net] net: enetc: avoid deadlock in enetc_tx_onestep_tstamp()

On Thu, Jan 12, 2023 at 10:54 AM Vladimir Oltean
<vladimir.oltean@....com> wrote:
>
> On Thu, Jan 12, 2023 at 09:48:40AM -0800, Alexander H Duyck wrote:
> > Looking at the patch this fixes I had a question. You have the tx_skbs
> > in the enet_ndev_priv struct and from what I can tell it looks like you
> > support multiple Tx queues. Is there a risk of corrupting the queue if
> > multiple Tx queues attempt to request the onestep timestamp?
>
> void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
> {
>         unsigned long flags;
>
>         spin_lock_irqsave(&list->lock, flags);
>         __skb_queue_tail(list, newsk);
>         spin_unlock_irqrestore(&list->lock, flags);
> }

So yet another layer of locking. As I said you could spare yourself
some cycles by moving this to a per queue list rather than a global
one. With that you could use the Tx lock to protect the list instead
of having to have the Tx lock and the queue lock.

> > Also I am confused. Why do you clear the TSTAMP_IN_PROGRESS flag in
> > enetc_tx_onestep_timestamp before checking the state of the queue?
>
> Because when enetc_tx_onestep_timestamp() is called, the one-step
> timestamping process is no longer in progress - which is what we need to
> know. The resource that needs serialized access is the MAC-wide
> ENETC_PM0_SINGLE_STEP register. So from enetc_start_xmit() and until
> enetc_clean_tx_ring(), there can only be one one-step Sync message in
> flight at a time.
>
> > It seems like something you should only be clearing once the queue is
> > empty.
>
> The flag tracks what it says: whether there's a one-step timestamp in
> progress. If no TS is in progress and a Sync message must be
> timestamped, the flag will be set but the skb will not be queued.
> It will be timestamped right away.
>
> The queue is there to ensure that Sync messages sent in a burst are
> eventually all sent (and timestamped). Each TX confirmation will
> schedule the work item again.
>
> By taking netif_tx_lock[_bh](), enetc_tx_onestep_tstamp() ensures that
> it has priority in sending the skbs already queued up in &priv->tx_skbs,
> over those coming from ndo_start_xmit -> enetc_xmit(). Not only that,
> but if enetc_tx_onestep_tstamp() doesn't clear TSTAMP_IN_PROGRESS before
> calling enetc_start_xmit(), this is a PEBKAC, because the skb will end
> up being queued right back into &priv->tx_skbs again, rather than ever
> getting sent. Keeping the netif_tx_lock() held ensures that the
> TSTAMP_IN_PROGRESS bit will remain unset enough for our own queued skb
> to make forward progress in enetc_start_xmit().

Yeah, what I realized is that I was looking at the "fixes" patch and
not the current code. I missed the patch "enetc: fix locking for
one-step timestamping packet transfer". It fixes the issue by moving
the test_and_set_bit_lock, but is still dependent on a global lock to
prevent anything else from taking the bit when we attempt a transmit.

So essentially we have to completely disable the Tx path in order to
make sure we don't race against any other Tx thread while we clear and
set the ENETC_TX_ONESTEP_TSTAMP_IN_PROGRESS flag.

Not pretty, but it addresses the issue it says it does.

Reviewed-by: Alexander Duyck <alexanderduyck@...com>


One other question I had. How do you handle the event that
enetc_start_xmit returns NETDEV_TX_BUSY or causes the packet to go
down the drop_packet_err path?