[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1e4496c263e486be3438f2797630164a@kernel.org>
Date: Thu, 06 Aug 2020 13:25:35 +0100
From: Marc Zyngier <maz@...nel.org>
To: Jason Liu <jason.hui.liu@....com>
Cc: Sudeep Holla <sudeep.holla@....com>, catalin.marinas@....com,
will@...nel.org, sashal@...nel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it
already masked
On 2020-08-06 11:05, Jason Liu wrote:
>> -----Original Message-----
>> From: Marc Zyngier <maz@...nel.org>
[...]
>> > No, this patch is not papering over a much deeper issue in the driver.
>> > This is just to make things better for the ARM64 kexec.
>>
>> Yes, I'm sure it is... However:
>>
>> request_irq()
>> <goes into suspend, panic somewhere after having turned the irqchip
>> clock off>
>> if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
>> <explodes, as the interrupt isn't masked>
>>
>> This is because the PM in the irqsteer driver is completely busted:
>> request_irq() should get a reference on the driver to prevent it from
>> being
>> suspended. Since you don't implement it correctly, this doesn't happen
>> and
>> your "improvement" doesn't help at all.
>
> The request_irq will get a reference to prevent the irqchip from being
> suspended due to it call
> irq_chip_pm_get(). I am pretty sure we have implemented correctly and
> that is also the common Linux code.
Then it seems you cannot read your own driver. At no point do you
set the parent_device that would give you a fighting chance to
get the device clocked and powered on. How does it work? Magic?
> In order to save power and let the irqchip enter into runtime SUSPEND
> mode, the driver will call free_irq()
> When it was not used(idle). Then free_irq() will decrease the
> reference and let the irqchip enter suspend state.
The reference count on *what*? There is nothing to take a reference
on. So yes, you understand how the core kernel works. But you don't
seem to notice that there is no link between the irq and the device
that implements the controller.
> So, when the irqchip entered suspend, which means there is no user for
> the irqchip and all the irqs were DISABLED + MASKED.
> Due to the runtimePM support for the irqchip, when kexec runs, it will
> sometimes meet the situation that the irqchip is suspend due to
> no users for it. So from either the performance(time cost) or coding
> logic, the machine_kexec_mask_interrupts() should not do
> double mask for the irqs which already masked.
I strongly suggest you start by fixing the damn driver first.
In the meantime, NAK to this patch.
M.
--
Jazz is not dead. It just smells funny...
Powered by blists - more mailing lists