Message-Id: <20180109133623.10711-2-dima@arista.com>
Date:   Tue,  9 Jan 2018 13:36:22 +0000
From:   Dmitry Safonov <dima@...sta.com>
To:     linux-kernel@...r.kernel.org
Cc:     0x7f454c46@...il.com, Dmitry Safonov <dima@...sta.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Levin, Alexander (Sasha Levin)" <alexander.levin@...izon.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Radu Rendec <rrendec@...sta.com>,
        Rik van Riel <riel@...hat.com>,
        Stanislaw Gruszka <sgruszka@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>
Subject: [RFC 1/2] softirq: Defer net rx/tx processing to ksoftirqd context

Warning: Not merge-ready

I. Current workflow of ksoftirqd.
  Softirqs are processed in the context of ksoftirqd iff they are
  being raised very frequently. How it works:
  do_softirq() and invoke_softirq() defer pending softirqs iff
  ksoftirqd is on the runqueue. Ksoftirqd is mostly scheduled at
  the end of softirq processing, if 2ms were not enough to process
  all pending softirqs.

  Here is a pseudo-picture of the workflow (for simplicity, on UP):
  -------------      ------------------      ------------------
  | ksoftirqd |      | User's process |      |   Softirqs     |
  -------------      ------------------      ------------------
   Not scheduled          Running
                             |
                             o------------------------o
                                                      |
                                                __do_softirq()
                                                      |
                                              2ms & softirq pending?
                                              Schedule ksoftirqd
                                                      |
    Scheduled                o------------------------o
                             |
        o--------------------o
        |
     Running             Scheduled
        |
        o--------------------o
                             |
   Not scheduled          Running

   Timegraph for the workflow:
     dash (-) means ksoftirqd is not scheduled;
     equal (=) means ksoftirqd is scheduled, a softirq may still be pending

                           Pending softirqs
                | | | |           | | | |       |
                v v v v           | | | |       v
   Processing   o-----o           | | | |       o--o
    softirqs    |     |           | | | |       |  |
                |     |           | | | |       |  |
                |     |           | | | |       |  |
   Userspace  o-o     o=========o | | | |  o----o  o---------o
                <-2ms->         | | | | |  |
                                | v v v v  |
   Ksoftirqd                    o----------o
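
  As an illustration only, the hand-over decision described above can
  be modelled in user space. This is a sketch, not kernel code: the
  function name and its parameters are hypothetical, the 2ms budget
  and the restart limit of 10 are assumed to match kernel/softirq.c
  in 4.15, and the need_resched() check of the real loop is left out.

```c
#include <stdbool.h>

/* Assumed values: a 2 ms time budget and at most 10 loop restarts. */
#define MODEL_BUDGET_US   2000
#define MODEL_MAX_RESTART 10

/*
 * Hypothetical model of one __do_softirq() invocation.
 * round_cost_us: time one pass over the pending softirqs takes.
 * rounds_raised: how many more times softirqs get re-raised while
 *                processing is still in progress.
 * Returns true if ksoftirqd would be woken to take over.
 */
static bool model_do_softirq(unsigned round_cost_us, unsigned rounds_raised)
{
	unsigned elapsed_us = 0;
	int max_restart = MODEL_MAX_RESTART;

	for (;;) {
		elapsed_us += round_cost_us;	/* handle everything pending */
		if (rounds_raised == 0)
			return false;		/* nothing pending: back to the task */
		rounds_raised--;		/* re-raised while we were processing */
		if (elapsed_us < MODEL_BUDGET_US && --max_restart)
			continue;		/* still within budget: restart */
		return true;			/* budget used up: wake ksoftirqd */
	}
}
```

  In this model, model_do_softirq(500, 10) returns true because the
  fourth pass reaches the 2ms budget with work still pending, while
  model_do_softirq(100, 3) returns false because everything is
  handled within the budget.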

II. Corner-conditions.
  During testing of commit [1] on a non-mainstream driver,
  I found that, due to platform specifics, the IRQ is raised
  too late (after the softirq has been processed).
  As a result, softirqs steal time from the userspace process,
  leaving it starved of CPU time, while ksoftirqd is never or
  only rarely scheduled:

                          Pending softirqs
               |       |       |       |       |       |
               v       v       v       v       v       v
  Processing   o-----o o-----o o-----o o-----o o-----o o  ...
   softirqs    |     | |     | |     | |     | |     | |
               |     | |     | |     | |     | |     | |
               |     | |     | |     | |     | |     | |
  Userspace  o-o     o-o     o-o     o-o     o-o     o-o  (starving)

  Ksoftirqd                                        (rarely scheduled)

  Afterwards I thought that the same may happen on mainstream
  if the PPS rate is chosen so that an IRQ is raised just after
  the previous softirq was processed. I managed to reproduce
  this, see (IV).

III. RFC proposal.
  At first, I tried to account all time spent in softirq processing
  to the ksoftirqd thread that serves the local CPU, and to compare
  the vruntime of ksoftirqd and the current task to decide whether a
  softirq should be delayed. You can imagine what disgraceful hacks
  were involved. The current RFC has nothing of that kind and relies
  on fair scheduling of ksoftirqd and other tasks.
  To do that, we check the pending softirqs and serve them in the
  current context only if all of them are non-net softirqs.
  The following patch adds a mask to __do_softirq() to process
  net softirqs only in ksoftirqd context if multiple softirqs
  are pending.
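
  A user-space sketch of the deferral predicate may help when reading
  the diff. The pending word is a bitmask indexed by softirq number,
  so a test for net softirqs has to look at the bits
  (1 << NET_TX_SOFTIRQ) and (1 << NET_RX_SOFTIRQ). The enum below
  copies the softirq numbers from include/linux/interrupt.h (4.15);
  NET_SOFTIRQ_MASK and model_defer_softirq() are hypothetical names
  used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Softirq numbers, reproduced so the sketch builds outside the kernel. */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,
	NR_SOFTIRQS
};

/* Hypothetical helper mask: the two networking softirq bits. */
#define NET_SOFTIRQ_MASK ((1u << NET_TX_SOFTIRQ) | (1u << NET_RX_SOFTIRQ))

/*
 * Model of the decision: defer if there is nothing to do, if ksoftirqd
 * is already runnable, or if any net softirq is pending.
 */
static bool model_defer_softirq(uint32_t pending, bool ksoftirqd_running)
{
	if (!pending)
		return true;
	if (ksoftirqd_running)
		return true;
	return (pending & NET_SOFTIRQ_MASK) != 0;
}
```

  For example, a pending word with only the TIMER_SOFTIRQ bit set is
  served in the current context, while one that also has the
  NET_RX_SOFTIRQ bit set is left to ksoftirqd.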

IV. Test results.
  Unfortunately, I wasn't able to test it on hardware with a
  mainstream kernel, so I only have results from QEMU VMs running
  Fedora 26.
  The first VM stresses the second with UDP packets generated by
  pktgen. The receiver VM runs the udp_sink [2] program and prints
  the PPS rate served.
  The VMs use virtio network cards, have rt priority and are
  assigned to different CPUs on the host.
  The host's CPU is an Intel Core i7-7600U @ 2.80GHz.
  The RFC definitely needs some testing on real HW (because I
  don't expect anyone would quite believe VM perf testing) - any
  help with testing it would be appreciated.

   Source |       Destination (received PPS)
  --------|------------------------------------
   (sent  |     master       |      RFC       |
    PPS)  |   (4.15-rc4)     |                |
  --------|------------------|----------------|
     5000 |      5000.7      |     4999.7     |
  --------|------------------|----------------|
     7000 |      6997.42     |     6995.88    |
  --------|------------------|----------------|
     8000 |      7999.55     |     7999.86    |
  --------|------------------|----------------|
     9000 |      8951.37     |     8986.30    |
  --------|------------------|----------------|
    10000 |      9864.96     |     9972.05    |
  --------|------------------|----------------|
    11000 |     10711.92     |    10976.26    |
  --------|------------------|----------------|
    12000 |     11494.79     |    11962.40    |
  --------|------------------|----------------|
    13000 |     12161.76     |    12946.91    |
  --------|------------------|----------------|
    14000 |     11152.07     |    13942.96    |
  --------|------------------|----------------|
    15000 |      8650.22     |    14878.26    |
  --------|------------------|----------------|
    16000 |      7662.55     |    15880.60    |
  --------|------------------|----------------|
    17000 |      6485.49     |    16814.07    |
  --------|------------------|----------------|
    18000 |      5489.48     |    17679.69    |
  --------|------------------|----------------|
    19000 |      4679.59     |    18543.60    |
  --------|------------------|----------------|
    20000 |      4738.24     |    19233.56    |
  --------|------------------|----------------|
    21000 |      4015.00     |    20247.50    |
  --------|------------------|----------------|
    22000 |      4376.99     |    20654.62    |
  --------|------------------|----------------|
    23000 |      9429.80     |    20925.07    |
  --------|------------------|----------------|
    24000 |      8872.33     |    21336.31    |
  --------|------------------|----------------|
    25000 |     19824.67     |    21486.84    |
  --------|------------------|----------------|
    30000 |     20779.49     |    21487.15    |
  --------|------------------|----------------|
    40000 |     24559.83     |    21452.74    |
  --------|------------------|----------------|
    50000 |     18469.20     |    21191.34    |
  --------|------------------|----------------|
   100000 |     19773.00     |    22592.28    |
  --------|------------------|----------------|

  Note that I tested in VMs, and I found that if I produce more
  hw irqs on the host, then the results for master are not that
  dramatically bad, but still much worse than with the RFC.
  For that reason I have qualms about whether my test results are
  correct.
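
  When comparing rows of the table, it can help to express the
  received rate as a share of the offered load. A minimal helper for
  that arithmetic (the function name is hypothetical):

```c
/* Delivered share of the offered load, in percent. */
static double delivered_percent(double received_pps, double sent_pps)
{
	return 100.0 * received_pps / sent_pps;
}
```

  For the 19000 PPS row this gives roughly 24.6% for master
  (4679.59 / 19000) and roughly 97.6% for the RFC (18543.60 / 19000).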

V. References:
[1] 4cd13c21b207 ("softirq: Let ksoftirqd do its job")
[2] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

Signed-off-by: Dmitry Safonov <dima@...sta.com>
---
 kernel/softirq.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2f5e87f1bae2..ee48f194dcec 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,28 @@ static bool ksoftirqd_running(void)
 	return tsk && (tsk->state == TASK_RUNNING);
 }
 
+static bool defer_softirq(void)
+{
+	__u32 pending = local_softirq_pending();
+
+	if (!pending)
+		return true;
+
+	if (ksoftirqd_running())
+		return true;
+
+	/*
+	 * Defer net rx/tx softirqs to ksoftirqd processing, as they
+	 * may starve userspace of CPU time.
+	 */
+	if (pending & ((1 << NET_RX_SOFTIRQ) | (1 << NET_TX_SOFTIRQ))) {
+		wakeup_softirqd();
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * preempt_count and SOFTIRQ_OFFSET usage:
  * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
@@ -315,7 +337,6 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 asmlinkage __visible void do_softirq(void)
 {
-	__u32 pending;
 	unsigned long flags;
 
 	if (in_interrupt())
@@ -323,9 +344,7 @@ asmlinkage __visible void do_softirq(void)
 
 	local_irq_save(flags);
 
-	pending = local_softirq_pending();
-
-	if (pending && !ksoftirqd_running())
+	if (!defer_softirq())
 		do_softirq_own_stack();
 
 	local_irq_restore(flags);
@@ -352,7 +371,7 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running())
+	if (defer_softirq())
 		return;
 
 	if (!force_irqthreads) {
-- 
2.13.6
