Message-ID: <CY4PR21MB0741BC9992A7A945A0D4A62CCE840@CY4PR21MB0741.namprd21.prod.outlook.com>
Date: Tue, 24 Sep 2019 00:57:39 +0000
From: Long Li <longli@...rosoft.com>
To: Sagi Grimberg <sagi@...mberg.me>, Ming Lei <ming.lei@...hat.com>
CC: Jens Axboe <axboe@...com>, Hannes Reinecke <hare@...e.com>,
John Garry <john.garry@...wei.com>,
Bart Van Assche <bvanassche@....org>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
LKML <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
Keith Busch <keith.busch@...el.com>,
Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Christoph Hellwig <hch@....de>
Subject: RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>Thanks for the clarification.
>
>The problem with what Ming is proposing, in my mind (and it's an existing
>problem today), is that nvme takes precedence over anything else until it
>absolutely cannot hog the cpu in hardirq.
>
>In the thread, Ming referenced a case where today, if the cpu core has net
>softirq activity, it cannot make forward progress. So with Ming's suggestion,
>net softirq will eventually make progress, but it creates an inherent fairness
>issue. Who said that nvme completions should come faster than the net rx/tx
>or another I/O device (or hrtimers or sched events...)?
>
>As much as I'd like nvme to complete as soon as possible, I might have other
>activities in the system that are as important, if not more so. So I don't think
>we can solve this with something that is not cooperative or fair with the rest
>of the system.
>
>>> If we are context switching too much, it means the soft-irq operation
>>> is not efficient, not necessarily the fact that the completion path
>>> is running in soft-irq.
>>>
>>> Is your kernel compiled with full preemption or voluntary preemption?
>>
>> The tests are based on the Ubuntu 18.04 kernel configuration. Here are the
>> parameters:
>>
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>
>I see, so it seems that irq_poll_softirq is still not efficient in reaping
>completions. Reaping the completions on its own is pretty much the same in
>hard and soft irq, so it's really the scheduling part that is creating the
>overhead (which does not exist in hard irq).
>
>Question:
>when you test without the patch (completions are coming in hard-irq), do the
>fio threads running on the cpu cores that are handling interrupts get
>substantially lower throughput than the rest of the fio threads? I would
>expect the fio threads running on the first 32 cores to get very low iops
>(overpowered by the nvme interrupts) and the rest doing much more, given
>that nvme has almost no limit on how much time it can spend processing
>completions.
>
>If need_resched() is causing us to context switch too aggressively, does
>changing that to local_softirq_pending() make things better?
>--
>diff --git a/lib/irq_poll.c b/lib/irq_poll.c
>index d8eab563fa77..05d524fcaf04 100644
>--- a/lib/irq_poll.c
>+++ b/lib/irq_poll.c
>@@ -116,7 +116,7 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
> 		/*
> 		 * If softirq window is exhausted then punt.
> 		 */
>-		if (need_resched())
>+		if (local_softirq_pending())
> 			break;
> 	}
>--
>
>Although, this can potentially prevent other threads from making forward
>progress. If it is better, perhaps we also need a time limit.
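[ Illustration only -- a rough sketch, not part of either message and not
  actual lib/irq_poll.c code, of how the local_softirq_pending() check could
  be combined with a small jiffies-based time budget. The helper name and
  the budget value are assumptions. ]
--
#include <linux/types.h>	/* bool */
#include <linux/interrupt.h>	/* local_softirq_pending() */
#include <linux/jiffies.h>	/* jiffies, time_after() */

/*
 * True when the softirq window should be considered exhausted: either
 * another softirq is pending, or this invocation has been running longer
 * than the allowed number of jiffies.
 */
static bool irq_poll_window_exhausted(unsigned long start,
				      unsigned long budget_jiffies)
{
	return local_softirq_pending() ||
	       time_after(jiffies, start + budget_jiffies);
}

/*
 * This would be called where the break test sits in the polling loop, with
 * "start" sampled from jiffies at the top of irq_poll_softirq() and
 * budget_jiffies set to a small constant (e.g. 1, roughly one tick).
 */
--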
Thanks for this patch. The IOPS was about the same (it tends to fluctuate more, but within a 3% variation).
I captured the following from one of the CPUs; all CPUs tend to have similar numbers. The numbers were captured over 5 seconds and averaged:
Context switches/s:
Without any patch: 5
With the previous patch: 640
With this patch: 522
Process migrations/s:
Without any patch: 0.6
With the previous patch: 104
With this patch: 121
>
>Perhaps we should add statistics/tracing on how many completions we are
>reaping per invocation...
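[ Illustration only -- a minimal sketch of the per-invocation accounting
  suggested above, using hypothetical per-cpu counters rather than any
  existing kernel statistics or tracepoints. ]
--
#include <linux/kernel.h>	/* pr_debug() */
#include <linux/percpu.h>	/* DEFINE_PER_CPU(), this_cpu_*() */

/* Hypothetical counters, one pair per cpu. */
static DEFINE_PER_CPU(unsigned long, irqpoll_invocations);
static DEFINE_PER_CPU(unsigned long, irqpoll_completions);

/*
 * Would be called once per irq_poll_softirq() pass with the number of
 * completions reaped in that pass (the sum of the ->poll() return values),
 * and prints a running average every 1024 invocations.
 */
static void irqpoll_account(unsigned int reaped)
{
	this_cpu_inc(irqpoll_invocations);
	this_cpu_add(irqpoll_completions, reaped);

	if (!(this_cpu_read(irqpoll_invocations) % 1024))
		pr_debug("irq_poll: avg %lu completions/invocation\n",
			 this_cpu_read(irqpoll_completions) /
			 this_cpu_read(irqpoll_invocations));
}
--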
I'll look a bit more into the completions. From the numbers, I think the increased number of context switches/migrations is hurting performance the most.
Thanks
Long