Message-ID: <b5ab983d43284d298fdc0d1268b33053@amazon.com>
Date: Thu, 1 Feb 2024 13:21:44 +0000
From: "Arinzon, David" <darinzon@...zon.com>
To: Paolo Abeni <pabeni@...hat.com>, "Nelson, Shannon"
<shannon.nelson@....com>, David Miller <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "Woodhouse, David" <dwmw@...zon.co.uk>, "Machulsky, Zorik"
<zorik@...zon.com>, "Matushevsky, Alexander" <matua@...zon.com>, "Bshara,
Saeed" <saeedb@...zon.com>, "Wilson, Matt" <msw@...zon.com>, "Liguori,
Anthony" <aliguori@...zon.com>, "Bshara, Nafea" <nafea@...zon.com>,
"Belgazal, Netanel" <netanel@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>, "Kiyanovski, Arthur"
<akiyano@...zon.com>, "Dagan, Noam" <ndagan@...zon.com>, "Agroskin, Shay"
<shayagr@...zon.com>, "Itzko, Shahar" <itzko@...zon.com>, "Abboud, Osama"
<osamaabb@...zon.com>, "Ostrovsky, Evgeny" <evostrov@...zon.com>, "Tabachnik,
Ofir" <ofirt@...zon.com>, "Koler, Nati" <nkoler@...zon.com>
Subject: RE: [PATCH v2 net-next 07/11] net: ena: Add more information on TX timeouts
> On Tue, 2024-01-30 at 09:53 +0000, darinzon@...zon.com wrote:
> > @@ -3408,25 +3437,45 @@ static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter,
> > adapter->missing_tx_completion_to);
> >
> > if (unlikely(is_tx_comp_time_expired)) {
> > - if (!tx_buf->print_once) {
> > - time_since_last_napi = jiffies_to_usecs(jiffies - tx_ring->tx_stats.last_napi_jiffies);
> > - missing_tx_comp_to = jiffies_to_msecs(adapter->missing_tx_completion_to);
> > - netif_notice(adapter, tx_err, adapter->netdev,
> > - "Found a Tx that wasn't completed on time, qid %d, index %d. %u usecs have passed since last napi execution. Missing Tx timeout value %u msecs\n",
> > - tx_ring->qid, i, time_since_last_napi, missing_tx_comp_to);
> > + time_since_last_napi =
> > + jiffies_to_usecs(jiffies - tx_ring->tx_stats.last_napi_jiffies);
> > + napi_scheduled = !!(ena_napi->napi.state &
> > + NAPIF_STATE_SCHED);
> > +
> > + if (missing_tx_comp_to < time_since_last_napi && napi_scheduled) {
> > + /* We suspect napi isn't called because the
> > + * bottom half is not run. Require a bigger
> > + * timeout for these cases
> > + */
>
> Not blocking this series, but I guess the above "the bottom half is not run",
> after commit d15121be7485655129101f3960ae6add40204463, happens only
> when running in napi threaded mode, right?
>
> cheers,
>
> Paolo
Hi Paolo,
The ENA driver's napi routine doesn't run in threaded mode.
We've seen cases where napi was indeed scheduled but didn't get a chance
to run and process TX completions for a noticeable amount of time.
From that we conclude that high CPU load was preventing the routine
from running in a timely manner.
Based on the information in commit d15121be7485655129101f3960ae6add40204463,
the stalls observed there are on the order of milliseconds, whereas the timeouts
here are in seconds. The code above is really just additional grace time.
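As a rough illustration of that grace-time idea, here is a minimal userspace sketch in C. It is not the driver code: classify_tx(), the verdict enum and GRACE_FACTOR (set to 2 here) are hypothetical names and values picked for the example. Only the condition "missing_tx_comp_to < time_since_last_napi && napi_scheduled" is taken from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical multiplier: the quoted hunk does not show how much
 * bigger the extended timeout is, so 2 is an assumption.
 */
#define GRACE_FACTOR 2u

enum tx_verdict {
	TX_OK,          /* completion is not late yet */
	TX_WAIT_LONGER, /* late, but napi looks starved: keep waiting */
	TX_MISSING      /* late with no excuse: treat as missing */
};

/* All times are in microseconds. */
static enum tx_verdict classify_tx(uint64_t since_tx_us,
				   uint64_t since_napi_us,
				   uint64_t timeout_us,
				   bool napi_scheduled)
{
	/* Guard the multiplication below against overflow. */
	assert(timeout_us <= UINT64_MAX / GRACE_FACTOR);

	if (since_tx_us <= timeout_us)
		return TX_OK;

	/* Same test as in the hunk: napi is scheduled but has not run
	 * for longer than the timeout, so the bottom half is probably
	 * being starved. Allow a bigger timeout in that case.
	 */
	if (timeout_us < since_napi_us && napi_scheduled &&
	    since_tx_us <= GRACE_FACTOR * timeout_us)
		return TX_WAIT_LONGER;

	return TX_MISSING;
}
```

With a 5 second timeout, a TX that has been pending for 6 seconds while napi has been scheduled but idle for 6 seconds would be classified as TX_WAIT_LONGER in this sketch. The same TX with napi having run 1 ms ago would be TX_MISSING.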
Thanks,
David