Message-ID: <2C4F431F-8140-4C82-B4BD-E51DE618FC08@amazon.com>
Date:   Fri, 29 May 2020 12:36:42 +0000
From:   "Saidi, Ali" <alisaidi@...zon.com>
To:     Marc Zyngier <maz@...nel.org>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "Herrenschmidt, Benjamin" <benh@...zon.com>,
        "Woodhouse, David" <dwmw@...zon.co.uk>,
        "Zilberman, Zeev" <zeev@...zon.com>,
        "Machulsky, Zorik" <zorik@...zon.com>
Subject: Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq

Hi Marc,

> On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@...nel.org> wrote:
> 
> Hi Ali,
> 
>> On 2020-05-29 02:55, Ali Saidi wrote:
>> If an interrupt is disabled the ITS driver has sent a discard removing
>> the DeviceID and EventID from the ITT. After this occurs it can't be
>> moved to another collection with a MOVI and a command error occurs if
>> attempted. Before issuing the MOVI command make sure that the IRQ isn't
>> disabled and change the activate code to try and use the previous
>> affinity.
>> 
>> Signed-off-by: Ali Saidi <alisaidi@...zon.com>
>> ---
>> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>> 1 file changed, 15 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 124251b0ccba..1235dd9a2fb2 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>>      /* don't set the affinity when the target cpu is same as current one */
>>      if (cpu != its_dev->event_map.col_map[id]) {
>>              target_col = &its_dev->its->collections[cpu];
>> -             its_send_movi(its_dev, target_col, id);
>> +
>> +             /* If the IRQ is disabled a discard was sent so don't move */
>> +             if (!irqd_irq_disabled(d))
>> +                     its_send_movi(its_dev, target_col, id);
>> +
> 
> This looks wrong. What you are testing here is whether the interrupt
> is masked, not that there isn't a valid translation.
I’m not sure what the correct condition is, but what I’m looking for is interrupts that have been deactivated and for which we have therefore sent a discard.

> 
> In the commit message, you're saying that we've issued a discard. This
> hints at doing a set_affinity on an interrupt that has been deactivated
> (mapping removed). Is that actually the case? If so, why was it
> deactivated in the first place?
This is the case. If we down a NIC, that interface’s MSIs will be deactivated but remain allocated until the device is unbound from the driver or the NIC is brought up. 

While stress-testing down/up cycles on a device, I found that irqbalance can move interrupts in the meantime, which produces the situation described: the device is downed, its interrupts are deactivated but still allocated, and an attempt to move one of them sends a MOVI after the DISCARD, which is an error per the GIC spec.
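
To make the failing sequence concrete, here is a toy model of the mapping state. It is only a sketch and not driver code: the toy_* names, the fixed table size, and the status values are all hypothetical, and it assumes only what is described above, namely that a discard removes the event's translation and that a later MOVI for that event is rejected.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_EVENTS 8

/* Hypothetical stand-in for an ITT: one translation slot per EventID. */
struct toy_itt {
	bool mapped[TOY_NR_EVENTS];
	int collection[TOY_NR_EVENTS];
};

enum toy_status { TOY_OK = 0, TOY_ERR_UNMAPPED = -1 };

/* Activate: install a translation for the event (MAPTI-like). */
static void toy_mapti(struct toy_itt *itt, unsigned int event, int col)
{
	assert(event < TOY_NR_EVENTS);
	itt->mapped[event] = true;
	itt->collection[event] = col;
}

/* Deactivate: remove the translation (DISCARD-like). */
static void toy_discard(struct toy_itt *itt, unsigned int event)
{
	assert(event < TOY_NR_EVENTS);
	itt->mapped[event] = false;
}

/* Move: only valid while a translation exists (MOVI-like). */
static enum toy_status toy_movi(struct toy_itt *itt, unsigned int event,
				int col)
{
	assert(event < TOY_NR_EVENTS);
	if (!itt->mapped[event])
		return TOY_ERR_UNMAPPED;	/* the command error we hit */
	itt->collection[event] = col;
	return TOY_OK;
}
```

Calling toy_mapti(), then toy_discard(), then toy_movi() on the same event returns TOY_ERR_UNMAPPED, which mirrors the down-the-NIC-then-irqbalance ordering.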

> 
>>              its_dev->event_map.col_map[id] = cpu;
>>              irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>      }
>> @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>>      if (its_dev->its->numa_node >= 0)
>>              cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>> 
>> -     /* Bind the LPI to the first possible CPU */
>> -     cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
>> +     /* If the cpu set to a different CPU that is still online use it */
>> +     cpu = its_dev->event_map.col_map[event];
>> +
>> +     cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
>> +
>> +     if (!cpumask_test_cpu(cpu, cpu_mask)) {
>> +             /* Bind the LPI to the first possible CPU */
>> +             cpu = cpumask_first(cpu_mask);
>> +     }
>> +
>>      if (cpu >= nr_cpu_ids) {
>>              if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
>>                      return -EINVAL;
> 
> So you deactivate an interrupt, do a set_affinity that doesn't issue
> a MOVI but preserves the affinity, then reactivate it and hope that
> the new mapping will target the "right" CPU.
> 
> That seems a bit mad, but I presume this isn't the whole story...
From some experiments, it appears that other interrupt controllers do preserve affinity across deactivate/activate, so this is my attempt at doing the same.
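
As a standalone illustration of the selection logic I am after in the activate path, here is a minimal sketch. It uses plain 64-bit masks instead of struct cpumask, and toy_pick_cpu(), toy_first_cpu() and TOY_NR_CPUS are hypothetical names rather than kernel APIs. The idea is to keep the previously recorded CPU if it is still in the node mask and online, and otherwise fall back to the first candidate.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64

/* Index of the lowest set bit, or TOY_NR_CPUS if the mask is empty. */
static unsigned int toy_first_cpu(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if ((mask >> cpu) & 1)
			return cpu;
	return TOY_NR_CPUS;
}

/*
 * Prefer the previously used CPU when it is still a valid candidate;
 * otherwise bind to the first online CPU in the node mask. A return
 * value of TOY_NR_CPUS means no candidate exists, which corresponds
 * to the "cpu >= nr_cpu_ids" check in the patch.
 */
static unsigned int toy_pick_cpu(unsigned int prev_cpu, uint64_t node_mask,
				 uint64_t online_mask)
{
	/* Intersection kept in a local so neither input mask is modified. */
	uint64_t candidates = node_mask & online_mask;

	if (prev_cpu < TOY_NR_CPUS && ((candidates >> prev_cpu) & 1))
		return prev_cpu;
	return toy_first_cpu(candidates);
}
```

For example, with a node mask covering CPUs 4-7 and all of CPUs 0-7 online, a previous CPU of 5 is kept, while a previous CPU of 2 falls back to CPU 4.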

Thanks,
Ali
