Message-ID: <DF4PR84MB01696549697831B4CEC4A3FDAB150@DF4PR84MB0169.NAMPRD84.PROD.OUTLOOK.COM>
Date: Thu, 18 Aug 2016 21:08:18 +0000
From: "Elliott, Robert (Persistent Memory)" <elliott@....com>
To: Sreekanth Reddy <sreekanth.reddy@...adcom.com>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"irqbalance@...ts.infradead.org" <irqbalance@...ts.infradead.org>
CC: Kashyap Desai <kashyap.desai@...adcom.com>,
Sathya Prakash Veerichetty <sathya.prakash@...adcom.com>,
Chaitra Basappa <chaitra.basappa@...adcom.com>,
Suganath Prabu Subramani
<suganath-prabu.subramani@...adcom.com>
Subject: RE: Observing Softlockup's while running heavy IOs
> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org
> [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Sreekanth Reddy
> Sent: Thursday, August 18, 2016 12:56 AM
> Subject: Observing Softlockup's while running heavy IOs
>
> Problem statement:
> Observing softlockups while running heavy IOs on 8 SSD drives
> connected behind our LSI SAS 3004 HBA.
>
...
> Observing a loop in the IO path, i.e. only one CPU is busy with
> processing the interrupts and the other CPUs (in the affinity_hint
> mask) are busy with sending the IOs (these CPUs are not receiving
> any interrupts at all). For example, only CPU6 is busy with processing
> the interrupts from IRQ 219 and the remaining CPUs, i.e. CPU 7, 8, 9,
> 10 & 11, are just busy with pumping the IOs and they never processed
> any IO interrupts from IRQ 219. So we are observing softlockups due to
> the existence of this loop in the IO path.
>
> We might not observe these softlockups if irqbalancer had balanced
> the interrupts among the CPUs enabled in the particular irq's
> affinity_hint mask, so that all the CPUs are equally busy with
> sending IOs and processing the interrupts. I am not sure how
> irqbalancer balances the load among the CPUs, but here I see that only
> one CPU from the irq's affinity_hint mask is busy with interrupts and
> the remaining CPUs do not receive any interrupts from this IRQ.
>
> Please help me with any suggestions/recommendations to solve/limit
> these kinds of softlockups. Also please let me know if I have missed
> any setting in irqbalance.
>
The CPUs need to be forced to self-throttle by processing the interrupts
for their own submissions, which reduces the time they have left to
submit more IOs.
See https://lkml.org/lkml/2014/9/9/931 for discussion of this
problem when blk-mq was added.
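
To check which CPUs an IRQ's mask actually covers, something like the
following might help. It is only a minimal sketch: it assumes the usual
hex mask format found in /proc/irq/<irq>/affinity_hint and
/proc/irq/<irq>/smp_affinity (optionally split into comma-separated
32-bit groups). The function names are illustrative, and the mask
"fc0" is an assumed value that would correspond to CPUs 6-11 in the
example above; it is not taken from the reported system.

```python
def parse_cpu_mask(mask: str) -> list[int]:
    """Convert a hex CPU mask such as 'fc0' or '00000000,00000fc0'
    into a sorted list of CPU numbers."""
    # The kernel may split long masks into comma-separated groups.
    value = int(mask.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1:  # bit N set means CPU N is in the mask
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


def read_irq_mask(irq: int, name: str) -> list[int]:
    """Read /proc/irq/<irq>/<name>, where name is e.g. 'affinity_hint'
    or 'smp_affinity' (Linux only; the file may need root to read)."""
    with open(f"/proc/irq/{irq}/{name}") as f:
        return parse_cpu_mask(f.read())


if __name__ == "__main__":
    # Assumed example mask: bits 6..11 set.
    print(parse_cpu_mask("fc0"))
```

Comparing read_irq_mask(219, "affinity_hint") with
read_irq_mask(219, "smp_affinity") would show whether the IRQ is
currently allowed on all the hinted CPUs or only on a subset of them.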
---
Robert Elliott, HPE Persistent Memory