Message-ID: <CACKFLi=ZLAb1Y92LwvqjOGPCuinka7qbHwDP2pkG4-_a7DMorQ@mail.gmail.com>
Date: Fri, 3 Nov 2023 16:02:49 -0700
From: Michael Chan <michael.chan@...adcom.com>
To: Alex Pakhunov <alexey.pakhunov@...cex.com>
Cc: linux-kernel@...r.kernel.org, mchan@...adcom.com, netdev@...r.kernel.org, 
	prashant@...adcom.com, siva.kallam@...adcom.com, vincent.wong2@...cex.com
Subject: Re: [PATCH v2 1/2] tg3: Increment tx_dropped in tg3_tso_bug()

On Fri, Nov 3, 2023 at 10:07 AM Alex Pakhunov
<alexey.pakhunov@...cex.com> wrote:
> I'm not super familiar with the recommended approach for handling locks in
> network drivers, so I spent a bit of time looking at what tg3 does.
>
> It seems that there are a few ways to remove the race condition when
> working with these counters:
>
> 1. Use atomic increments. It is easy but every update is more expensive
>    than it needs to be. We might be able to say that these specific
>    counters are updated rarely, so maybe we don't care too much.
> 2. netif_tx_lock is already taken when tx_dropped is incremented - wrap
>    rx_dropped increment and reading both counters in netif_tx_lock. This
>    seems legal since tg3_tx() can take netif_tx_lock. I'm not sure how to
>    order netif_tx_lock and tp->lock, since tg3_get_stats64() takes
>    the latter. Should netif_tx_lock be taken inside tp->lock? Should they
>    not be nested?
> 3. Using tp->lock to protect rx_dropped (tg3_poll_link() already takes it
>    so it must be legal) and netif_tx_lock to protect tx_dropped.
>
> There are probably other options. Can you recommend an approach?

I recommend using per queue counters as briefly mentioned in my
earlier reply.  Move the tx_dropped and rx_dropped counters to the per
queue tg3_napi struct.  Incrementing tnapi->tx_dropped in
tg3_start_xmit() is serialized by the netif_tx_lock held by the stack.

Similarly, incrementing tnapi->rx_dropped in tg3_rx() is serialized by NAPI.

tg3_get_stats64() can just loop and sum all the tx_dropped and
rx_dropped counters in each tg3_napi struct.  We don't worry about
locks here since we are just reading.
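
As a rough illustration of the layout described above, here is a minimal
user-space sketch, not the actual tg3 code. Every name prefixed with
sketch_ (and SKETCH_MAX_QUEUES) is hypothetical; struct sketch_napi only
stands in for the per-queue tg3_napi struct. In the driver, the two
increments would sit in tg3_start_xmit() and tg3_rx(), where the
netif_tx_lock and NAPI respectively already serialize them.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 4	/* hypothetical, not the driver's real limit */

/* Stand-in for the per-queue tg3_napi struct; only the two counters. */
struct sketch_napi {
	uint64_t tx_dropped;	/* written only from this queue's TX path */
	uint64_t rx_dropped;	/* written only from this queue's RX poll */
};

struct sketch_dev {
	struct sketch_napi napi[SKETCH_MAX_QUEUES];
	unsigned int num_queues;
};

struct sketch_stats {
	uint64_t tx_dropped;
	uint64_t rx_dropped;
};

/* TX drop on queue q: one writer per queue, so a plain increment. */
static void sketch_tx_drop(struct sketch_dev *dev, unsigned int q)
{
	assert(q < dev->num_queues);
	dev->napi[q].tx_dropped++;
}

/* RX drop on queue q: likewise a plain increment. */
static void sketch_rx_drop(struct sketch_dev *dev, unsigned int q)
{
	assert(q < dev->num_queues);
	dev->napi[q].rx_dropped++;
}

/* Reader: loop over all queues and sum, taking no lock. */
static void sketch_get_stats(const struct sketch_dev *dev,
			     struct sketch_stats *out)
{
	unsigned int i;

	out->tx_dropped = 0;
	out->rx_dropped = 0;
	for (i = 0; i < dev->num_queues; i++) {
		out->tx_dropped += dev->napi[i].tx_dropped;
		out->rx_dropped += dev->napi[i].rx_dropped;
	}
}
```

Because each counter has a single writer context, no atomic operation or
extra lock is needed on the hot path. The reader may see a total that is
slightly behind the writers, which is the trade-off this approach accepts.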

>
> Also, this seems like a larger change that should be done separately from
> fixing the TX stall. Should we land just "[PATCH v2 2/2]"? Should we land
> the whole patch (since it does not make the race condition much worse) and fix
> the race condition separately?
>

Yes, we can merge patch #2 first which fixes the stall.  Please repost
just patch #2 standalone if you want to do that.  Thanks.
