linux-kernel - Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

Message-ID: <20190903063125.GA21022@ming.t460p>
Date:   Tue, 3 Sep 2019 14:31:26 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Long Li <longli@...rosoft.com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Keith Busch <keith.busch@...el.com>, Jens Axboe <axboe@...com>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        John Garry <john.garry@...wei.com>,
        Hannes Reinecke <hare@...e.com>,
        linux-nvme@...ts.infradead.org, linux-scsi@...r.kernel.org
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

Hi Daniel,

On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
> 
> Hi Ming Lei,
> 
> On 03/09/2019 05:30, Ming Lei wrote:
> 
> [ ... ]
> 
> 
> >>> 2) irq/timing doesn't cover softirq
> >>
> >> That's solvable, right?
> > 
> > Yeah, we can extend irq/timing, but that would be ugly for irq/timing,
> > since irq/timing focuses on hardirq prediction, and softirq isn't
> > involved in that purpose.
> > 
> >>  
> >>> Daniel, could you take a look and see if irq flood detection can be
> >>> implemented easily by irq/timing.c?
> >>
> >> I assume you can take a look as well, right?
> > 
> > Yeah, I have looked at the code for a while, but I think that irq/timing
> > could become unnecessarily complicated if it had to cover irq flood
> > detection; meanwhile it is much less efficient for detecting an IRQ flood.
> 
> In the series, there is nothing rigorously describing the problem (I can
> only guess) or why the proposed solution solves it.
> 
> What is your definition of an 'irq flood'? A high irq load? An irq
> arriving while we are processing the previous one in the bottom halves?

So far, it means that handling interrupts & softirqs consumes all the time
of one CPU, so processes basically can't run on this CPU, and usually some
sort of CPU lockup warning will be triggered.

> 
> The patch 2/4 description says "however IO completion is only done on
> one of these submission CPU cores". That describes the bottleneck and
> then the patch says "Add IRQF_RESCUE_THREAD to create one interrupt
> thread handler", what is the rationale linking the bottleneck (problem)
> and the irqf_rescue_thread (solution)?

The solution is to switch to handling this interrupt in the context of the
created rescue irq thread once an irq flood is detected; 'this interrupt'
means the interrupt requested with IRQF_RESCUE_THREAD.

> 
> Is it really the solution to track the irq timings to detect a flood?

The solution tracks the time spent running do_IRQ() on each CPU.
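
As a rough userspace sketch of that kind of per-CPU accounting (not the
patch's implementation): the window-based check, the 90% threshold,
MODEL_NR_CPUS and the function names below are all assumptions made for
illustration, and timestamps are passed in explicitly instead of being
read from a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS	4	/* assumption: small fixed CPU count */
#define FLOOD_PERCENT	90	/* assumption: arbitrary threshold */

/* Hypothetical per-CPU state; not a kernel structure. */
struct cpu_irq_time {
	uint64_t window_start_ns;	/* start of the current window */
	uint64_t irq_ns;		/* time in interrupt handling this window */
};

static struct cpu_irq_time irq_time[MODEL_NR_CPUS];

/* Account one pass through interrupt handling on the given CPU. */
static void account_irq(unsigned int cpu, uint64_t enter_ns, uint64_t exit_ns)
{
	assert(cpu < MODEL_NR_CPUS && exit_ns >= enter_ns);
	irq_time[cpu].irq_ns += exit_ns - enter_ns;
}

/*
 * Close the current window on this CPU and report whether interrupt
 * handling consumed at least FLOOD_PERCENT of it. A new window is
 * started at now_ns.
 */
static bool window_is_flood(unsigned int cpu, uint64_t now_ns)
{
	struct cpu_irq_time *t;
	uint64_t window_ns;
	bool flood;

	assert(cpu < MODEL_NR_CPUS);
	t = &irq_time[cpu];
	window_ns = now_ns - t->window_start_ns;
	flood = window_ns > 0 &&
		t->irq_ns * 100 >= window_ns * FLOOD_PERCENT;

	t->window_start_ns = now_ns;
	t->irq_ns = 0;
	return flood;
}
```

Keeping the state per CPU means a flood on one CPU does not affect the
decision made for the others.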


Thanks,
Ming