Message-ID: <9903df53-8a84-fe89-7ae0-aac8e6d3f42f@huawei.com>
Date:   Mon, 10 May 2021 11:19:43 +0800
From:   "liaochang (A)" <liaochang1@...wei.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        xuyihang <xuyihang@...wei.com>, "Ming Lei" <ming.lei@...hat.com>
CC:     Peter Xu <peterx@...hat.com>, Christoph Hellwig <hch@....de>,
        Jason Wang <jasowang@...hat.com>,
        Luiz Capitulino <lcapitulino@...hat.com>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>, <minlei@...hat.com>
Subject: Re: Virtio-scsi multiqueue irq affinity

Hi Thomas,

On 2021/5/8 20:26, Thomas Gleixner wrote:
> Yihang,
> 
> On Sat, May 08 2021 at 15:52, xuyihang wrote:
>>
>> We are dealing with a scenario which may need a default IRQ affinity
>> to be assigned for a managed IRQ.
>>
>> Assume we have an RT thread using a full CPU, bound to a specific
>> CPU.
>>
>> Meanwhile, an interrupt handler registered by a device, which runs in
>> ksoftirqd, may never have a chance to run. (And we don't want to use
>> CPU isolation.)
> 
> A device cannot register an interrupt handler in ksoftirqd.

After talking with Yihang offline, I understand the scenario better:
1. We have a machine with 36 CPUs, and several RT threads are assigned to the last two CPUs (CPU-34 and CPU-35).
2. The I/O device driver creates a single managed IRQ whose affinity includes CPU-34 and CPU-35.
3. A regular application issues I/O from CPUs other than the ones the RT threads use;
   CPU-34/35 then receive the hardware interrupt and wake up ksoftirqd to do the actual I/O processing.
4. Because the priority and scheduling policy of the RT threads take precedence over the per-CPU ksoftirqd,
   ksoftirqd appears to never get a chance to run on CPU-34/35, so the I/O processing cannot finish in time
   and the application gets stuck.
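
A rough way to check the overlap described in points 1 and 2 is to compare the IRQ's affinity list with the CPUs the RT threads are pinned to. The sketch below is only an illustration: it assumes the affinity is read from /proc/irq/<N>/smp_affinity_list, the helper names are made up for this mail, and the IRQ number has to be supplied by whoever runs it.

```python
def parse_cpu_list(text):
    """Parse a simple kernel-style CPU list such as "0-3,34-35" into a set of ints.

    Only plain numbers and ranges are handled; stride forms are not supported.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_irq_affinity(irq, proc_root="/proc"):
    """Read the affinity list of one IRQ from <proc_root>/irq/<irq>/smp_affinity_list."""
    with open(f"{proc_root}/irq/{irq}/smp_affinity_list") as f:
        return parse_cpu_list(f.read())


def overlapping_cpus(irq_cpus, rt_cpus):
    """Return the sorted CPUs that both the IRQ and the RT threads may use."""
    return sorted(set(irq_cpus) & set(rt_cpus))


if __name__ == "__main__":
    import sys

    # CPUs running the RT threads in the scenario above.
    rt_cpus = parse_cpu_list("34-35")
    # The IRQ number is a placeholder taken from the command line.
    irq_cpus = read_irq_affinity(int(sys.argv[1]))
    print("CPUs shared by the IRQ and the RT threads:", overlapping_cpus(irq_cpus, rt_cpus))
```

A non-empty result only shows that the interrupt may be delivered to a CPU occupied by an RT thread; it does not by itself prove that ksoftirqd is being starved there.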

> 
>> There could be a couple of ways to deal with this problem:
>>
>> 1. Adjust the priority of ksoftirqd or the RT thread, so the interrupt
>> handler could preempt the RT thread. However, I am not sure whether
>> that could have side effects.
>>
>> 2. Adjust the interrupt CPU affinity or the RT thread affinity. But
>> managed IRQs seem designed to forbid users from manipulating interrupt
>> affinity.
>>
>> To me, managed IRQs seem coupled to the user-side application.
>>
>> Would you share your thoughts about this issue please?
> 
> Can you please provide a more detailed description of your system?
> 
>     - Number of CPUs
> 
>     - Kernel version
>     - Is NOHZ full enabled?
>     - Any isolation mechanisms enabled, and if so how are they
>       configured (e.g. on the kernel command line)?
> 
>     - Number of queues in the multiqueue device
>           
>     - Is the RT thread issuing I/O to the multiqueue device?
> 
> Thanks,
> 
>         tglx
> .
> 
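
Part of the information requested above (isolation and NOHZ settings on the kernel command line) could be gathered with a small script. This is a minimal sketch, assuming the settings of interest are the isolcpus=, nohz_full=, rcu_nocbs= and irqaffinity= boot parameters; the function names are only for illustration.

```python
def parse_cmdline(cmdline, keys=("isolcpus", "nohz_full", "rcu_nocbs", "irqaffinity")):
    """Return the requested key=value boot parameters found in a command line string."""
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # Keep only parameters of the form key=value whose key we care about.
        if sep and name in keys:
            found[name] = value
    return found


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    import os
    import platform

    print("CPUs:", os.cpu_count())
    print("Kernel:", platform.release())
    print("Isolation parameters:", parse_cmdline(read_cmdline()))
```

The number of queues in the multiqueue device and whether the RT thread issues I/O are not covered by this sketch and still have to be reported separately.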
BR,
Liao Chang
