Message-ID: <20200102102807.dc7yf6choxre2lbg@beryllium.lan>
Date: Thu, 2 Jan 2020 11:28:07 +0100
From: Daniel Wagner <dwagner@...e.de>
To: Ming Lei <ming.lei@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, Long Li <longli@...rosoft.com>,
Ingo Molnar <mingo@...hat.com>, Christoph Hellwig <hch@....de>,
Keith Busch <keith.busch@...el.com>,
Sagi Grimberg <sagi@...mberg.me>,
John Garry <john.garry@...wei.com>,
Hannes Reinecke <hare@...e.com>
Subject: Re: [RFC PATCH 2/3] softirq: implement interrupt flood detection
Hi,
On Tue, Dec 31, 2019 at 11:48:06AM +0800, Ming Lei wrote:
> On Thu, Dec 19, 2019 at 11:43:47AM +0100, Daniel Wagner wrote:
> get_util_irq() only works in case of HAVE_SCHED_AVG_IRQ which depends
> on IRQ_TIME_ACCOUNTING or PARAVIRT_TIME_ACCOUNTING.
>
> Also rq->avg_irq.util_avg is only updated when there is scheduler
> activities. However, when interrupt flood happens, scheduler can't
> have chance to be called. Looks get_util_irq() can't be relied on
> for this task.
I am not totally sold on the idea of doing as much work as possible in
IRQ context. I started to play with the patches from Keith [1], which
move the work to a proper kernel thread.
> > ps: A customer observes the same problem as Ming is reporting.
>
> Actually this issue should be more serious on ARM64 system, in which
> there are more CPU cores, and each CPU core is often slower than
> x86's, and each interrupt is only delivered to single CPU target.
>
> Meantime the storage device performance is same for the two kinds of
> systems.
As it turns out, we missed one fix, 2887e41b910b ("blk-wbt: Avoid lock
contention and thundering herd issue in wbt_wait"), in our enterprise
kernel. It helps but doesn't fix the root cause. But as I said,
moving the work out of IRQ context will address all those
problems. Obviously there is no free lunch; let's see if we can find a
way to address all the performance issues.
Thanks,
Daniel
[1] https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/