Message-ID: <CANn89iKLonovBaX7nAHHuwtuo9q=y5kFm7E0msKEz5xqVHF8Lw@mail.gmail.com>
Date: Sun, 23 Sep 2018 12:47:57 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>, michael.chan@...adcom.com,
Ariel Elior <ariel.elior@...ium.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Tariq Toukan <tariqt@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
jakub.kicinski@...ronome.com, songliubraving@...com,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>
Subject: Re: [PATCH net 00/15] netpoll: avoid capture effects for NAPI drivers
On Sun, Sep 23, 2018 at 12:29 PM David Miller <davem@...emloft.net> wrote:
>
> From: Eric Dumazet <edumazet@...gle.com>
> Date: Fri, 21 Sep 2018 15:27:37 -0700
>
> > As diagnosed by Song Liu, ndo_poll_controller() can
> > be very dangerous on loaded hosts, since the cpu
> > calling ndo_poll_controller() might steal all NAPI
> > contexts (for all RX/TX queues of the NIC).
> >
> > This capture, showing one ksoftirqd eating all cycles
> > can last for unlimited amount of time, since one
> > cpu is generally not able to drain all the queues under load.
> >
> > It seems that all networking drivers that do use NAPI
> > for their TX completions, should not provide a ndo_poll_controller() :
> >
> > Most NAPI drivers have netpoll support already handled
> > in core networking stack, since netpoll_poll_dev()
> > uses poll_napi(dev) to iterate through registered
> > NAPI contexts for a device.
>
> I'm having trouble understanding the difference.
>
> If the drivers are processing all of the RX/TX queue draining by hand
> in their ndo_poll_controller() method, how is that different from the
> generic code walking all of the registererd NAPI instances one by one?
(resent in plain text mode this time)

Reading poll_napi() and poll_one_napi(), I thought that we were using
NAPI_STATE_NPSVC and cmpxchg(&napi->poll_owner, -1, cpu) to
_temporarily_ [1] own each napi, one at a time.
But I do see we also have this part at the beginning of poll_one_napi() :
	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
		return;
So we probably should remove it. (The normal napi->poll() calls would
not proceed since napi->poll_owner would not be -1)
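To make the "temporary ownership" idea concrete, here is a minimal
userspace sketch using C11 atomics. It is not kernel code: struct
model_napi, model_napi_init() and model_poll_one() are hypothetical
names made up for illustration, and atomic_compare_exchange_strong()
stands in for the kernel's cmpxchg(). It only models the claim that a
poller which wins the cmpxchg on poll_owner does one round of work and
then hands the context back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct napi_struct. */
struct model_napi {
	atomic_int poll_owner;	/* -1 when no cpu owns the context */
	int polls_done;		/* number of poll rounds performed */
};

static void model_napi_init(struct model_napi *n)
{
	atomic_init(&n->poll_owner, -1);
	n->polls_done = 0;
}

/*
 * Try to own the context for exactly one poll round.
 * Returns true if this cpu polled, false if another owner was found.
 */
static bool model_poll_one(struct model_napi *n, int cpu)
{
	int expected = -1;

	assert(cpu >= 0);

	/* Userspace analogue of cmpxchg(&napi->poll_owner, -1, cpu). */
	if (!atomic_compare_exchange_strong(&n->poll_owner, &expected, cpu))
		return false;

	n->polls_done++;			/* one bounded round of work */
	atomic_store(&n->poll_owner, -1);	/* ownership ends here */
	return true;
}
```

In this model the owner releases the context after every round, so no
cpu can hold it for longer than one poll.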

[1]
By contrast, if a cpu succeeds in setting NAPI_STATE_SCHED, it has to
own the napi for as long as napi->poll() does not call
napi_complete_done(), and this can be forever (the capture effect).
Basically calling napi_schedule() is the dangerous part.
I believe busy_polling and netpoll are the same kind of intruders (as
they can run on arbitrary cpus).
But netpoll is far more problematic since it iterates through all
RX/TX queues.
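
The capture effect can be illustrated with a toy model, again in
userspace C and under loud assumptions: struct model_queue,
model_schedule(), model_poll() and model_rounds_until_release() are
hypothetical, MODEL_BUDGET is an arbitrary value, and an atomic_flag
stands in for NAPI_STATE_SCHED. The only behaviour borrowed from NAPI is
that the owner completes (and so releases the bit) when a poll round
does less work than its budget.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BUDGET 64	/* arbitrary per-round budget for this model */

struct model_queue {
	atomic_flag sched;	/* stands in for NAPI_STATE_SCHED */
	int backlog;		/* packets waiting in the queue */
};

/* Returns true if the caller set the flag and now owns the queue. */
static bool model_schedule(struct model_queue *q)
{
	return !atomic_flag_test_and_set(&q->sched);
}

/*
 * One poll round. New packets arrive, at most MODEL_BUDGET are drained.
 * The flag is released only if the round did less work than the budget.
 */
static bool model_poll(struct model_queue *q, int arrivals)
{
	int done;

	q->backlog += arrivals;
	done = q->backlog < MODEL_BUDGET ? q->backlog : MODEL_BUDGET;
	q->backlog -= done;

	if (done < MODEL_BUDGET) {
		atomic_flag_clear(&q->sched);	/* models napi_complete_done() */
		return true;
	}
	return false;
}

/*
 * Returns the round in which the owner released the queue,
 * or -1 if it still owned it after max_rounds.
 */
static int model_rounds_until_release(int arrivals_per_round, int max_rounds)
{
	struct model_queue q;
	int i;

	atomic_flag_clear(&q.sched);
	q.backlog = 0;

	if (!model_schedule(&q))
		return -1;

	for (i = 1; i <= max_rounds; i++)
		if (model_poll(&q, arrivals_per_round))
			return i;
	return -1;
}
```

With a light load the owner lets go after the first round; once the
arrival rate reaches the budget, the same cpu keeps the queue for as
long as the load lasts.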