Date: Thu, 3 Mar 2022 16:01:31 +0000
From: Robert Hancock <robert.hancock@...ian.com>
To: "kuba@...nel.org" <kuba@...nel.org>
CC: "nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"claudiu.beznea@...rochip.com" <claudiu.beznea@...rochip.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"soren.brinkmann@...inx.com" <soren.brinkmann@...inx.com>,
"scott.mcnutt@...iusxm.com" <scott.mcnutt@...iusxm.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH net] net: macb: Fix lost RX packet wakeup race in NAPI receive

On Wed, 2022-03-02 at 22:40 -0800, Jakub Kicinski wrote:
> On Mon, 28 Feb 2022 12:33:28 -0600 Robert Hancock wrote:
> > There is an oddity in the way the RSR register flags propagate to the
> > ISR register (and the actual interrupt output) on this hardware: it
> > appears that RSR register bits only result in ISR being asserted if the
> > interrupt was actually enabled at the time, so enabling interrupts with
> > RSR bits already set doesn't trigger an interrupt to be raised. There
> > was already a partial fix for this race in the macb_poll function where
> > it checked for RSR bits being set and re-triggered NAPI receive.
> > However, there was still a race window between checking RSR and
> > actually enabling interrupts, where a lost wakeup could happen. It's
> > necessary to check again after enabling interrupts to see if RSR was set
> > just prior to the interrupt being enabled, and re-trigger receive in that
> > case.
> >
> > This issue was noticed in a point-to-point UDP request-response protocol
> > which periodically saw timeouts or abnormally high response times due to
> > received packets not being processed in a timely fashion. In many
> > applications, more packets arriving, including TCP retransmissions, would
> > cause the original packet to be processed, thus masking the issue.
> >
> > Also change from using napi_reschedule to napi_schedule, as the only
> > difference is the presence of a return value which wasn't used here
> > anyway.
>
> Let's leave that out from this particular patch - fixes should be
> minimal, this sounds like cleanup.
Can do.
>
> > Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
> > Cc: stable@...r.kernel.org
> > Co-developed-by: Scott McNutt <scott.mcnutt@...iusxm.com>
> > Signed-off-by: Scott McNutt <scott.mcnutt@...iusxm.com>
> > Signed-off-by: Robert Hancock <robert.hancock@...ian.com>
> > ---
> > drivers/net/ethernet/cadence/macb_main.c | 26 ++++++++++++++++++++++--
> > 1 file changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> > index 98498a76ae16..338660fe1d93 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1573,14 +1573,36 @@ static int macb_poll(struct napi_struct *napi, int budget)
> >  	if (work_done < budget) {
> >  		napi_complete_done(napi, work_done);
> >
> > -		/* Packets received while interrupts were disabled */
> > +		/* RSR bits only seem to propagate to raise interrupts when
> > +		 * interrupts are enabled at the time, so if bits are already
> > +		 * set due to packets received while interrupts were disabled,
> > +		 * they will not cause another interrupt to be generated when
> > +		 * interrupts are re-enabled.
> > +		 * Check for this case here.
> > +		 */
> >  		status = macb_readl(bp, RSR);
>
> Which case is more likely - status == 0 or != 0?
>
> Because MMIO reads are usually expensive so if status is likely
> to be zero your other suggestion could be lower overhead.
> It'd be good to mention this expectation in the commit message
> or comment here.
There was some measurement done on this that motivated a previous patch in this
area:

commit 504ad98df3a6b027ce997ca8f620e949cafb151f
Author: Soren Brinkmann <soren.brinkmann@...inx.com>
Date:   Sun May 4 15:43:01 2014 -0700

    net: macb: Remove 'unlikely' optimization

    Coverage data suggests that the unlikely case of receiving data while
    the receive handler is running may not be that unlikely.

    Coverage data after running iperf for a while:
    91320:  891:    work_done = bp->macbgem_ops.mog_rx(bp, budget);
    91320:  892:    if (work_done < budget) {
     2362:  893:        napi_complete(napi);
        -:  894:
        -:  895:        /* Packets received while interrupts were disabled */
     4724:  896:        status = macb_readl(bp, RSR);
     2362:  897:        if (unlikely(status)) {
      762:  898:            if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
      762:  899:                macb_writel(bp, ISR, MACB_BIT(RCOMP));
        -:  900:            napi_reschedule(napi);
        -:  901:        } else {
     1600:  902:            macb_writel(bp, IER, MACB_RX_INT_FLAGS);
        -:  903:        }
        -:  904:    }

    Signed-off-by: Soren Brinkmann <soren.brinkmann@...inx.com>
    Signed-off-by: David S. Miller <davem@...emloft.net>

So it looks like the non-zero status case was being hit roughly 1/3 of the
time (762 of 2362 passes), at least under that particular workload. It may
depend heavily on the workload, but it doesn't seem clear-cut to optimize one
way or the other.

For the new "double check" branch, debug output that I added shows it being
hit on the order of a few dozen times a day under constant load, so the
"unlikely" seems appropriate there.
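
For reference, the arithmetic behind the "roughly 1/3" figure can be checked with a small standalone sketch. The names hit_fraction and toy_unlikely are made up for illustration and are not part of the driver; toy_unlikely only mirrors the usual GCC/Clang __builtin_expect idiom behind the kernel's unlikely(), which is a branch-layout hint and does not change the result of the test.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's unlikely(): it only hints the
 * compiler about the expected branch direction and evaluates to 0 or 1.
 * Relies on the GCC/Clang builtin __builtin_expect.
 */
#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* Fraction of passes through a branch, given gcov-style execution counts.
 * hits:  how often the branch body ran (762 in the quoted coverage data)
 * total: how often the condition was evaluated (2362 in the quoted data)
 */
static double hit_fraction(unsigned long hits, unsigned long total)
{
	assert(hits <= total);	/* a branch cannot run more often than its test */

	if (total == 0)
		return 0.0;
	return (double)hits / (double)total;
}
```

Calling hit_fraction(762, 2362) gives a value a little above 0.32, which matches the "roughly 1/3" estimate; the remaining 1600 passes took the branch that re-enables interrupts.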
>
> >  		if (status) {
> >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >  				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > -			napi_reschedule(napi);
> > +			napi_schedule(napi);
> >  		} else {
> >  			queue_writel(queue, IER, bp->rx_intr_mask);
> > +
> > +			/* Packets could have been received in the window
> > +			 * between the check above and re-enabling interrupts.
> > +			 * Therefore, a double-check is required to avoid
> > +			 * losing a wakeup. This can potentially race with
> > +			 * the interrupt handler doing the same actions if an
> > +			 * interrupt is raised just after enabling them, but
> > +			 * this should be harmless.
> > +			 */
> > +			status = macb_readl(bp, RSR);
> > +			if (unlikely(status)) {
> > +				queue_writel(queue, IDR, bp->rx_intr_mask);
> > +				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > +					queue_writel(queue, ISR,
> > +						     MACB_BIT(RCOMP));
> > +				napi_schedule(napi);
> > +			}
> >  		}
> >  	}
> >
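
To make the race window easier to reason about, here is a toy userspace model of the behavior described in the commit message. It is only a sketch under stated assumptions, not driver code: struct toy_mac and all toy_* functions are invented for illustration, and the model assumes (as the patch description suggests) that a received frame always sets RSR but only raises the interrupt if RX interrupts are enabled at that moment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hardware state (hypothetical, for illustration only). */
struct toy_mac {
	bool rsr;		/* receive status: a frame is waiting */
	bool irq_enabled;	/* RX interrupt enabled (IER/IDR state) */
	bool irq_pending;	/* interrupt raised, handler would run */
};

/* A frame arrives: RSR is always set, but the interrupt is only raised
 * if RX interrupts happen to be enabled right now (assumed behavior).
 */
static void toy_rx_frame(struct toy_mac *m)
{
	m->rsr = true;
	if (m->irq_enabled)
		m->irq_pending = true;
}

/* Models the tail of the poll function once work_done < budget.
 * double_check:     apply the second RSR read after enabling interrupts
 * arrive_in_window: inject a frame between the RSR check and the IER write
 * Returns true if the poll would be rescheduled.
 */
static bool toy_poll_complete(struct toy_mac *m, bool double_check,
			      bool arrive_in_window)
{
	assert(!m->irq_enabled);	/* poll runs with RX interrupts masked */

	if (m->rsr)
		return true;		/* first check: reschedule */

	if (arrive_in_window)
		toy_rx_frame(m);	/* interrupts are still disabled here */

	m->irq_enabled = true;		/* IER write */

	if (double_check && m->rsr) {
		m->irq_enabled = false;	/* IDR write */
		return true;		/* second check: reschedule */
	}
	return false;
}

/* A wakeup is lost if a frame is waiting but nothing will process it. */
static bool toy_wakeup_lost(const struct toy_mac *m, bool rescheduled)
{
	return m->rsr && !rescheduled && !m->irq_pending;
}
```

With a zero-initialized struct toy_mac, toy_poll_complete(&m, false, true) returns false and toy_wakeup_lost() then reports true, which is the lost-wakeup case. The same call with double_check set to true returns true and leaves interrupts masked again, so the frame is picked up by the rescheduled poll.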