linux-kernel - Question on threaded handlers for managed interrupts

Message-ID: <b8c4be8c-1d67-c16c-570e-d3c883c77ea2@huawei.com>
Date:   Thu, 22 Apr 2021 17:10:49 +0100
From:   John Garry <john.garry@...wei.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Question on threaded handlers for managed interrupts

Hi Thomas,

I am finding that I can pretty easily trigger a system hang in certain
scenarios with my storage controller.

So I'm getting something like this when running moderately heavy data 
throughput:

Starting 6 processes
[   70.656622] sched: RT throttling activated
[r=356k,w=0 IOPS][eta 01h:14m:43s]
[  207.632161] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[eta 01h:12m:26s]
[  207.638261] rcu:  0-...!: (1 GPs behind) idle=312/1/0x4000000000000000 softirq=508/512 fqs=0
[  207.646777] rcu:  1-...!: (1 GPs behind) idle=694/0/0x0

It ends pretty badly - see [0].

The multi-queue storage controller (see [1] for a refresher, but note
that I can also trigger this on a PCI device host controller) is using
managed interrupts and threaded handlers. Since the threaded handler
runs as SCHED_FIFO, aren't we always vulnerable to this situation with
the managed interrupt and threaded handler combo? Would the advice be to
just use irq polling here?
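
For context on the "RT throttling activated" message above: the scheduler prints it when real-time tasks (SCHED_FIFO or SCHED_RR) use up their runtime budget within a scheduling period. The sketch below, in Python, only does the arithmetic on the two sysctl knobs `sched_rt_period_us` and `sched_rt_runtime_us`. The default values are the documented kernel defaults, not values read from the system in this report, and the helper names are made up for illustration.

```python
from pathlib import Path

# Documented kernel defaults for the RT throttling knobs (microseconds).
# Assumption: the system in the report may have used different values.
DEFAULT_PERIOD_US = 1_000_000
DEFAULT_RUNTIME_US = 950_000


def non_rt_share(period_us: int, runtime_us: int) -> float:
    """Fraction of each period left to non-RT tasks when RT tasks use their full budget."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        # A runtime of -1 disables throttling: RT tasks may take the whole period.
        return 0.0
    return max(0.0, (period_us - runtime_us) / period_us)


def read_knob(name: str, default: int) -> int:
    """Read /proc/sys/kernel/<name>, falling back to the default if it is unavailable."""
    try:
        return int(Path("/proc/sys/kernel", name).read_text().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    period = read_knob("sched_rt_period_us", DEFAULT_PERIOD_US)
    runtime = read_knob("sched_rt_runtime_us", DEFAULT_RUNTIME_US)
    share = non_rt_share(period, runtime)
    print(f"period={period}us runtime={runtime}us non-RT share={share:.1%}")
```

With the default knobs, a SCHED_FIFO thread that never sleeps leaves roughly 5% of each period to everything else, so a threaded handler kept busy by sustained interrupt load could plausibly hit the throttle.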

I tried to trigger the same on NVMe PCI without success; however, I have
only one card, so I am hardly overloading the system.

Thanks,
John

[0] https://lore.kernel.org/rcu/412926e8-d3e1-3071-8cb9-098a7f49b64c@huawei.com/T/#mbd60463c543e04f87090d89301e1a5f10de958dd

[1] https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@huawei.com/#t