Date:   Sat, 11 Jan 2020 10:48:35 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Xu <peterx@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        Ming Lei <minlei@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-block@...r.kernel.org
Subject: Re: Kernel-managed IRQ affinity (cont)

Hi Thomas,

On Fri, Jan 10, 2020 at 08:43:14PM +0100, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@...hat.com> writes:
> > On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
> >> Ming Lei <ming.lei@...hat.com> writes:
> >>
> >> This is duct tape engineering with absolutely no semantics. I can't even
> >> figure out the intent of this 'managed_irq' parameter.
> >
> > The intent is to isolate the specified CPUs from handling managed
> > interrupts.
> 
> That's what I figured, but it still does not provide semantics and works
> just for specific cases.
> 
> > We can do that. The big problem is that in the RT case we can't
> > guarantee that IO is never submitted from isolated CPUs. blk-mq's queue
> > mapping relies on the setup affinity, so excluding isolated CPUs from
> > the interrupt affinity may cause undefined behavior (kernel crash, IO
> > hang, or something else).
> >
> > That is why I try to exclude isolated CPUs from the interrupt's effective
> > affinity instead; it turns out that approach is simple and doable.
> 
> Yes, it's doable. But it still is inconsistent behaviour. Assume the
> following configuration:
> 
>   8 CPUs, CPU0,1 assigned for housekeeping
> 
> With 8 queues the proposed change does nothing because each queue is
> mapped to exactly one CPU.

That is the expected behavior for this RT case, given that userspace won't
submit IO from the isolated CPUs.

> 
> With 4 queues you get the following:
> 
>  CPU0,1       queue 0
>  CPU2,3       queue 1
>  CPU4,5       queue 2
>  CPU6,7       queue 3
> 
> No effect on the isolated CPUs either.
> 
> With 2 queues you get the following:
> 
>  CPU0,1,2,3   queue 0
>  CPU4,5,6,7   queue 1
> 
> So here the isolated CPUs 2 and 3 get the isolation, but CPUs 4-7 do
> not. That's perhaps intended, but definitely not documented.

That is an intentional change: in the RT case almost no IO is submitted
from CPUs 4-7, so it is fine to select the effective CPU from the
isolated CPUs there. As Peter mentioned, IO may only be submitted from
isolated CPUs during boot. Once the system is set up, no IO comes from
the isolated CPUs, so no interrupts are delivered to them, which meets
the RT requirement.

We can document this change somewhere.
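
To make the queue-mapping examples above concrete, here is a simplified
user-space model of an even CPU-to-queue spread (illustration only: the
real kernel mapping also considers NUMA topology, and the helper below is
made up). It reproduces the 8-, 4- and 2-queue tables you listed:

#include <stdio.h>

#define NR_CPUS 8

/* Spread NR_CPUS evenly across nr_queues and print the mapping. */
static void spread(int nr_queues)
{
        int per_queue = NR_CPUS / nr_queues;
        int q, cpu;

        printf("%d queues:\n", nr_queues);
        for (q = 0; q < nr_queues; q++) {
                printf("  queue %d:", q);
                for (cpu = q * per_queue; cpu < (q + 1) * per_queue; cpu++)
                        printf(" CPU%d", cpu);
                printf("\n");
        }
}

int main(void)
{
        spread(8);      /* one CPU per queue: the change has no effect */
        spread(4);      /* CPU0,1 -> queue 0, CPU2,3 -> queue 1, ...   */
        spread(2);      /* CPU0-3 -> queue 0, CPU4-7 -> queue 1        */
        return 0;
}

With 2 queues, queue 1's mask contains only isolated CPUs (4-7), which is
exactly the case discussed above.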

> 
> So you really need to make your mind up and describe what the intended
> effect of this is and why you think that the result is correct.

In short, if there is at least one housekeeping CPU in the interrupt's
affinity mask, we choose the effective CPU from the housekeeping CPUs.
Otherwise, we keep the current behavior for selecting the effective CPU.

With this approach, no interrupts are delivered to isolated CPUs as long
as no IO is submitted from those CPUs.
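
For clarity, here is a minimal user-space sketch of that selection rule
(the bitmask layout, the housekeeping mask value and the lowest-CPU
tie-break are assumptions for illustration; the in-kernel version would
operate on struct cpumask and the housekeeping CPU mask):

#include <stdio.h>

#define NR_CPUS      8
#define HOUSEKEEPING 0x03u              /* CPU0,1 are housekeeping */

/* Prefer a housekeeping CPU; otherwise keep the current behavior. */
static int pick_effective_cpu(unsigned int affinity_mask)
{
        unsigned int hk = affinity_mask & HOUSEKEEPING;
        unsigned int mask = hk ? hk : affinity_mask;
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (mask & (1u << cpu))
                        return cpu;     /* first candidate wins in this model */
        return -1;
}

int main(void)
{
        printf("queue 0 (CPU0-3) -> CPU%d\n", pick_effective_cpu(0x0fu));
        printf("queue 1 (CPU4-7) -> CPU%d\n", pick_effective_cpu(0xf0u));
        return 0;
}

With the 2-queue configuration above, queue 0's interrupt goes to a
housekeeping CPU, while queue 1 falls back to an isolated CPU, which is
acceptable because no IO is submitted from those CPUs after boot.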

Please let us know if it addresses your concerns.


Thanks,
Ming
