lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CY4PR21MB0741091795CEE3D4624977CFCEBA0@CY4PR21MB0741.namprd21.prod.outlook.com>
Date:   Fri, 6 Sep 2019 17:50:49 +0000
From:   Long Li <longli@...rosoft.com>
To:     Keith Busch <kbusch@...nel.org>, Ming Lei <ming.lei@...hat.com>
CC:     Daniel Lezcano <daniel.lezcano@...aro.org>,
        Keith Busch <keith.busch@...el.com>,
        Hannes Reinecke <hare@...e.com>,
        Bart Van Assche <bvanassche@....org>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        John Garry <john.garry@...wei.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        Jens Axboe <axboe@...com>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>
Subject: RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>
>On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
>> When one IRQ flood happens on one CPU:
>>
>> 1) softirq handling on this CPU can't make progress
>>
>> 2) kernel thread bound to this CPU can't make progress
>>
>> For example, network may require softirq to xmit packets, or another
>> irq thread for handling keyboards/mice or whatever, or rcu_sched may
>> depend on that CPU for making progress, then the irq flood stalls the
>> whole system.
>>
>> >
>> > AFAIU, there are fast medium where the responses to requests are
>> > faster than the time to process them, right?
>>
>> Usually medium may not be faster than CPU, now we are talking about
>> interrupts, which can be originated from lots of devices concurrently,
>> for example, in Long Li'test, there are 8 NVMe drives involved.
>
>Why are all 8 nvmes sharing the same CPU for interrupt handling?
>Shouldn't matrix_find_best_cpu_managed() handle selecting the least used
>CPU from the cpumask for the effective interrupt handling?

The tests run on 10 NVMe disks on a system of 80 CPUs. Each NVMe disk has 32 hardware queues.
It seems matrix_find_best_cpu_managed() has done its job, but we may still have CPUs that service several hardware queues mapped from other issuing CPUs.
Another thing to consider is that there may be other managed interrupts on the system, so NVMe interrupts may not end up evenly distributed on such a system.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ