netdev - ksoftirqd takes 100% of a core with ixgbe and netconsole (netpoll)


Message-ID: <23D4A6D4-0956-486D-8B82-C316BB7BF851@fb.com>
Date:   Mon, 10 Sep 2018 20:00:10 +0000
From:   Song Liu <songliubraving@...com>
To:     Networking <netdev@...r.kernel.org>
CC:     "john.r.fastabend@...el.com" <john.r.fastabend@...el.com>,
        "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>,
        "jeffrey.t.kirsher@...el.com" <jeffrey.t.kirsher@...el.com>,
        Kernel Team <Kernel-team@...com>
Subject: ksoftirqd takes 100% of a core with ixgbe and netconsole (netpoll)


We are debugging an issue with netconsole and ixgbe in which ksoftirqd takes
100% of a core. It happens with both the current net and net-next trees.

To reproduce the issue:

  1. Set up a server with ixgbe and netconsole. We bind each queue to a
     separate core via smp_affinity.
  2. Start a simple netperf job from the client, like:
        ./super_netperf 201 -P 0 -t TCP_RR -p 8888 -H <SERVER> -l 7200 -- -r 300,300 -o -s 1M,1M -S 1M,1M
  3. On the server, write to /dev/kmsg in a loop (to send netconsole messages):
        for x in {1..7200} ; do echo aa >> /dev/kmsg ; sleep 1; done
  4. On the server, monitor ksoftirqd in top.

Within a few minutes, top will show one ksoftirqd thread taking 100% of a core
for many seconds in a row.

When ksoftirqd takes 100% of a core, the driver hits the "clean_complete=false"
path below, so this NAPI instance stays in polling mode.

        ixgbe_for_each_ring(ring, q_vector->rx) {
                int cleaned = ixgbe_clean_rx_irq(q_vector, ring,
                                                 per_ring_budget);

                work_done += cleaned;
                if (cleaned >= per_ring_budget)
                        clean_complete = false;
        }

        /* If all work not completed, return budget and keep polling */
        if (!clean_complete)
                return budget;
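
To make the effect of that path concrete, here is a minimal userspace sketch of
the same budget logic. It is not the driver code: toy_ring, toy_clean_rx and
toy_poll are hypothetical names, the ring is reduced to a counter of pending
packets, and the per-ring budget split is an assumption about how the budget is
divided among rings. It only shows that a ring which always fills its share of
the budget makes the poll routine return the full budget, which tells the
caller to keep polling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RX ring: only a count of pending packets. */
struct toy_ring {
	int pending;
};

/* Clean at most `budget` packets and return how many were cleaned. */
static int toy_clean_rx(struct toy_ring *ring, int budget)
{
	int cleaned = ring->pending < budget ? ring->pending : budget;

	ring->pending -= cleaned;
	return cleaned;
}

/*
 * Same shape as the quoted loop. Returns `budget` if any ring used up
 * its whole per-ring share (poll again), otherwise the work done,
 * which is then always smaller than `budget`.
 */
static int toy_poll(struct toy_ring *rings, size_t nrings, int budget)
{
	/* Assumed split: share the budget across rings, at least 1 each. */
	int per_ring_budget = budget;
	bool clean_complete = true;
	int work_done = 0;
	size_t i;

	if (nrings > 1) {
		per_ring_budget = budget / (int)nrings;
		if (per_ring_budget < 1)
			per_ring_budget = 1;
	}

	for (i = 0; i < nrings; i++) {
		int cleaned = toy_clean_rx(&rings[i], per_ring_budget);

		work_done += cleaned;
		if (cleaned >= per_ring_budget)
			clean_complete = false;
	}

	/* If all work not completed, return budget and keep polling */
	if (!clean_complete)
		return budget;

	return work_done;
}
```

With one ring holding 100 pending packets and a budget of 64, the first
toy_poll() call returns 64 (keep polling) and the second returns 36 (done). If
the ring is refilled as fast as it is drained, every call returns the full
budget and polling never stops.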

We didn't see this issue on a 4.6-based kernel.

We are still debugging the issue, but we would like to check whether there is
a known solution for it. Any comments and suggestions are highly appreciated.

Best,
Song