lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231103170711.4006756-1-alexey.pakhunov@spacex.com>
Date:   Fri, 3 Nov 2023 10:07:11 -0700
From:   Alex Pakhunov <alexey.pakhunov@...cex.com>
To:     <michael.chan@...adcom.com>
CC:     <alexey.pakhunov@...cex.com>, <linux-kernel@...r.kernel.org>,
        <mchan@...adcom.com>, <netdev@...r.kernel.org>,
        <prashant@...adcom.com>, <siva.kallam@...adcom.com>,
        <vincent.wong2@...cex.com>
Subject: Re: [PATCH v2 1/2] tg3: Increment tx_dropped in tg3_tso_bug()

> This is prone to race conditions if we have more than one TX queue.

Yes, indeed.

> The original driver code only supported one TX queue and the counters
> were never modified properly to support multiple queues.  We should
> convert them to per queue counters by moving tx_dropped and rx_dropped
> to the tg3_napi struct.

I'm not super familiar with the recommended approach for handling locks in
network drivers, so I spent a bit of tme looking at what tg3 does.

It seems that there are a few ways to remove the race condition when
working with these counters:

1. Use atomic increments. It is easy but every update is more expensive
   than it needs to be. We might be able to say that there specific
   counters are updated rarely, so maybe we don't care too much.
2. netif_tx_lock is already taken when tx_droped is incremented - wrap
   rx_dropped increment and reading both counters in netif_tx_lock. This
   seems legal since tg3_tx() can take netif_tx_lock. I'm not sure how to
   order netif_tx_lock and tp->lock, since tg3_get_stats64() takes
   the latter. Should netif_tx_lock be takes inside tp->lock? Should they
   be not nested?
3. Using tp->lock to protect rx_dropped (tg3_poll_link() already takes it
   so it must be legal) and netif_tx_lock to protect tx_dropped.

There are probably other options. Can you recommend an aproach?

Also, this seems like a larger change that should be done separately from
fixing the TX stall. Should we land just "[PATCH v2 2/2]"? Should we land
the whole patch (since it does not make race condition much worse) and fix
the race condition separately?

Alex.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ