Date: Tue, 24 Sep 2019 00:57:39 +0000
From: Long Li <longli@...rosoft.com>
To: Sagi Grimberg <sagi@...mberg.me>, Ming Lei <ming.lei@...hat.com>
CC: Jens Axboe <axboe@...com>, Hannes Reinecke <hare@...e.com>, John Garry <john.garry@...wei.com>, Bart Van Assche <bvanassche@....org>, "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>, Daniel Lezcano <daniel.lezcano@...aro.org>, LKML <linux-kernel@...r.kernel.org>, "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>, Keith Busch <keith.busch@...el.com>, Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Christoph Hellwig <hch@....de>
Subject: RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

>Thanks for the clarification.
>
>The problem with what Ming is proposing, in my mind (and it's an existing
>problem today), is that nvme takes precedence over everything else until
>it absolutely cannot hog the cpu in hardirq.
>
>In the thread, Ming referenced a case where, today, if the cpu core has net
>softirq activity, it cannot make forward progress. So with Ming's suggestion,
>net softirq will eventually make progress, but it creates an inherent fairness
>issue. Who said that nvme completions should come faster than the net rx/tx
>or another I/O device (or hrtimers or sched events...)?
>
>As much as I'd like nvme to complete as soon as possible, I might have other
>activities in the system that are as important, if not more. So I don't think we
>can solve this with something that is not cooperative or fair with the rest of
>the system.
>
>>> If we are context switching too much, it means the soft-irq operation
>>> is not efficient, not necessarily that the completion path
>>> is running in soft-irq..
>>>
>>> Is your kernel compiled with full preemption or voluntary preemption?
>>
>> The tests are based on the Ubuntu 18.04 kernel configuration.
>> Here are the parameters:
>>
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>
>I see, so it still seems that irq_poll_softirq is not efficient in reaping
>completions. Reaping the completions on its own is pretty much the same in
>hard and soft irq, so it's really the scheduling part that is creating the overhead
>(which does not exist in hard irq).
>
>Question:
>When you test without the patch (completions are coming in hard-irq),
>do the fio threads that run on the cpu cores that are assigned to the cores that
>are handling interrupts get substantially lower throughput than the rest of the
>fio threads? I would expect the fio threads that are running on the first 32
>cores to get very low iops (overpowered by the nvme interrupts) and the rest
>to do much more, given that nvme has almost no limit on how much time it
>can spend processing completions.
>
>If need_resched() is causing us to context switch too aggressively, does
>changing that to local_softirq_pending() make things better?
>--
>diff --git a/lib/irq_poll.c b/lib/irq_poll.c
>index d8eab563fa77..05d524fcaf04 100644
>--- a/lib/irq_poll.c
>+++ b/lib/irq_poll.c
>@@ -116,7 +116,7 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
>                /*
>                 * If softirq window is exhausted then punt.
>                 */
>-               if (need_resched())
>+               if (local_softirq_pending())
>                        break;
>        }
>--
>
>Although, this can potentially prevent other threads from making forward
>progress.. If it is better, perhaps we also need a time limit as well.

Thanks for this patch. The IOPS was about the same (it tends to fluctuate more, but within 3% variation).

I captured the following from one of the CPUs. All CPUs tend to have similar numbers.
The following numbers were captured over 5 seconds and averaged:

Context switches/s:
  Without any patch:       5
  With the previous patch: 640
  With this patch:         522

Process migrations/s:
  Without any patch:       0.6
  With the previous patch: 104
  With this patch:         121

>
>Perhaps we should add statistics/tracing on how many completions we are
>reaping per invocation...

I'll look into the completion path a bit more. From the numbers, I think the increased number of context switches and migrations is what hurts performance most.

Thanks
Long