netdev - Re: [PATCH net 00/15] netpoll: avoid capture effects for NAPI drivers

Message-ID: <CANn89iKLonovBaX7nAHHuwtuo9q=y5kFm7E0msKEz5xqVHF8Lw@mail.gmail.com>
Date:   Sun, 23 Sep 2018 12:47:57 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     David Miller <davem@...emloft.net>
Cc:     netdev <netdev@...r.kernel.org>, michael.chan@...adcom.com,
        Ariel Elior <ariel.elior@...ium.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        jakub.kicinski@...ronome.com, songliubraving@...com,
        Jay Vosburgh <j.vosburgh@...il.com>,
        Veaceslav Falico <vfalico@...il.com>,
        Andy Gospodarek <andy@...yhouse.net>
Subject: Re: [PATCH net 00/15] netpoll: avoid capture effects for NAPI drivers

On Sun, Sep 23, 2018 at 12:29 PM David Miller <davem@...emloft.net> wrote:
>
> From: Eric Dumazet <edumazet@...gle.com>
> Date: Fri, 21 Sep 2018 15:27:37 -0700
>
> > As diagnosed by Song Liu, ndo_poll_controller() can
> > be very dangerous on loaded hosts, since the cpu
> > calling ndo_poll_controller() might steal all NAPI
> > contexts (for all RX/TX queues of the NIC).
> >
> > This capture, showing one ksoftirqd eating all cycles
> > can last for unlimited amount of time, since one
> > cpu is generally not able to drain all the queues under load.
> >
> > It seems that all networking drivers that do use NAPI
> > for their TX completions, should not provide a ndo_poll_controller() :
> >
> > Most NAPI drivers have netpoll support already handled
> > in core networking stack, since netpoll_poll_dev()
> > uses poll_napi(dev) to iterate through registered
> > NAPI contexts for a device.
>
> I'm having trouble understanding the difference.
>
> If the drivers are processing all of the RX/TX queue draining by hand
> in their ndo_poll_controller() method, how is that different from the
> generic code walking all of the registererd NAPI instances one by one?

(resent in plain text mode this time)

Reading poll_napi() and poll_one_napi(), I thought that we were using
NAPI_STATE_NPSVC and cmpxchg(&napi->poll_owner, -1, cpu) to _temporarily_ [1]
own each napi, one at a time.

But I do see we also have this part at the beginning of poll_one_napi() :

if (!test_bit(NAPI_STATE_SCHED, &napi->state))
      return;

So we should probably remove that test. (The normal napi->poll() calls would
not proceed, since napi->poll_owner would not be -1.)
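To make the "temporary ownership" idea concrete, here is a minimal user-space
sketch using C11 atomics. It is not the kernel code: struct toy_napi and
toy_poll_once() are hypothetical names, and the counter merely stands in for
the napi->poll() call. It only illustrates the compare-and-swap on poll_owner
mentioned above, where ownership lasts for a single poll call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's struct napi_struct. */
struct toy_napi {
	atomic_int poll_owner;	/* -1 when nobody owns the context */
	int polls_done;		/* counts how many times we "polled" */
};

/* Try to own the context for the duration of one poll call only. */
static bool toy_poll_once(struct toy_napi *n, int cpu)
{
	int expected = -1;

	/* Models cmpxchg(&napi->poll_owner, -1, cpu). */
	if (!atomic_compare_exchange_strong(&n->poll_owner, &expected, cpu))
		return false;	/* another cpu owns it right now: back off */

	n->polls_done++;	/* stands in for napi->poll() */

	/* Ownership ends as soon as the single poll call returns. */
	atomic_store(&n->poll_owner, -1);
	return true;
}
```

The point of the sketch is that the owner field returns to -1 after every
call, so no cpu keeps the context across polls.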

[1]
By contrast, if a cpu succeeds in setting NAPI_STATE_SCHED, it has to own
the napi for as long as napi->poll() does not call napi_complete_done(),
and this can be forever (the capture effect).

Basically calling napi_schedule() is the dangerous part.
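
The capture effect described in [1] can be illustrated with a toy model, again
in user-space C. Everything here (struct toy_queue, toy_schedule(), toy_poll(),
toy_rounds_until_release()) is hypothetical and heavily simplified: a boolean
stands in for NAPI_STATE_SCHED, and clearing it stands in for
napi_complete_done(), which is assumed to happen only when a poll round uses
less than its budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI context and its queue. */
struct toy_queue {
	bool sched;	/* models NAPI_STATE_SCHED */
	int owner_cpu;	/* cpu that set sched, -1 if none */
	int backlog;	/* packets waiting to be processed */
};

/* Models napi_schedule(): whoever sets the bit first becomes the owner. */
static bool toy_schedule(struct toy_queue *q, int cpu)
{
	if (q->sched)
		return false;
	q->sched = true;
	q->owner_cpu = cpu;
	return true;
}

/* One poll round: release ownership only if the budget was not exhausted. */
static void toy_poll(struct toy_queue *q, int budget)
{
	int done = q->backlog < budget ? q->backlog : budget;

	q->backlog -= done;
	if (done < budget) {	/* models napi_complete_done() */
		q->sched = false;
		q->owner_cpu = -1;
	}
}

/* Returns the round in which ownership was released, or -1 if never. */
static int toy_rounds_until_release(struct toy_queue *q, int budget,
				    int arrivals, int max_rounds)
{
	for (int r = 1; r <= max_rounds; r++) {
		q->backlog += arrivals;	/* traffic keeps arriving */
		toy_poll(q, budget);
		if (!q->sched)
			return r;
	}
	return -1;
}
```

In this model, when arrivals per round are at least the budget, the cpu that
called toy_schedule() never gives the queue back, which is the capture effect
in miniature. Under light load the queue is released after one round.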

I believe busy_polling and netpoll are the same kind of intruders (as they
can run on arbitrary cpus).
But netpoll is far more problematic, since it iterates through all RX/TX queues.