[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <01b7091e69d2b4e51f280b05f6218a65@kernel.org>
Date: Thu, 13 Aug 2020 11:08:54 +0100
From: Marc Zyngier <maz@...nel.org>
To: Jason Liu <jason.hui.liu@....com>
Cc: Sudeep Holla <sudeep.holla@....com>, catalin.marinas@....com,
will@...nel.org, sashal@...nel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it
already masked
On 2020-08-13 07:03, Jason Liu wrote:
>> -----Original Message-----
>> From: Marc Zyngier <maz@...nel.org>
>> Sent: Thursday, August 6, 2020 8:26 PM
>> To: Jason Liu <jason.hui.liu@....com>
>> Cc: Sudeep Holla <sudeep.holla@....com>; catalin.marinas@....com;
>> will@...nel.org; sashal@...nel.org; linux-kernel@...r.kernel.org;
>> linux-arm-kernel@...ts.infradead.org
>> Subject: Re: [PATCH 1/1] arm64: kexec: no need to do
>> irq_chip->irq_mask if it
>> already masked
>>
>> On 2020-08-06 11:05, Jason Liu wrote:
>> >> -----Original Message-----
>> >> From: Marc Zyngier <maz@...nel.org>
>>
>> [...]
>>
>> >> > No, this patch is not papering over a much deeper issue in the driver.
>> >> > This is just to make things better for the ARM64 kexec.
>> >>
>> >> Yes, I'm sure it is... However:
>> >>
>> >> request_irq()
>> >> <goes into suspend, panic somewhere after having turned the irqchip
>> >> clock off> if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
>> >> <explodes, as the interrupt isn't masked>
>> >>
>> >> This is because the PM in the irqsteer driver is completely busted:
>> >> request_irq() should get a reference on the driver to prevent it from
>> >> being suspended. Since you don't implement it correctly, this doesn't
>> >> happen and your "improvement" doesn't help at all.
>> >
>> > The request_irq will get a reference to prevent the irqchip from being
>> > suspended due to it call irq_chip_pm_get(). I am pretty sure we have
>> > implemented correctly and that is also the common Linux code.
>>
>> Then it seems you cannot read your own driver. At no point do you set
>> the
>> parent_device that would give you a fighting chance to get the device
>> clocked
>> and powered on. How does it work? Magic?
>>
>> > In order to save power and let the irqchip enter into runtime SUSPEND
>> > mode, the driver will call free_irq() When it was not used(idle). Then
>> > free_irq() will decrease the reference and let the irqchip enter
>> > suspend state.
>>
>> The reference count on *what*? There is nothing to take a reference
>> on. So yes,
>> you understand how the core kernel works. But you don't seem to notice
>> that
>> there is no link between the irq and the device that implements the
>> controller.
>
> See the code, it will call irq_chip_pm_put(&desc->irq_data)
>
> /*
> * Internal function to unregister an irqaction - used to free
> * regular and special interrupts that are part of the architecture.
> */
> static struct irqaction *__free_irq(struct irq_desc *desc, void
> *dev_id)
> {
> ..
> irq_chip_pm_put(&desc->irq_data);
> module_put(desc->owner);
> kfree(action->secondary);
> return action;
> }
This is getting tiresome. You want to play the code-quote game?
int irq_chip_pm_put(struct irq_data *data)
{
int retval = 0;
if (IS_ENABLED(CONFIG_PM) && data->chip->parent_device)
retval = pm_runtime_put(data->chip->parent_device);
return (retval < 0) ? retval : 0;
}
What is parent_device set to in your driver? Oh wait... Nothing.
So what does the code you quoted do? Not much.
>> > So, when the irqchip entered suspend, which means there is no user for
>> > the irqchip and all the irqs were DISABLED + MASKED.
>> > Due to the runtimePM support for the irqchip, when kexec runs, it will
>> > sometimes meet the situation that the irqchip is suspend due to no
>> > users for it. So from either the performance(time cost) or coding
>> > logic, the machine_kexec_mask_interrupts() should not do double mask
>> > for the irqs which already masked.
>>
>> I strongly suggest you start by fixing the damn driver first.
>
> Our driver does not have issue at all. What to fix?
[I've censored myself here...]
>
>>
>> In the meantime, NAK to this patch.
>
> Anyway, it seems don't really understand this issue and you just
> simply reject one reasonable fix. Sounds not good at all.
I reject it because your approach is flawed, and that you are
papering over a glaring bug in your driver that you are refusing
to fix.
Maybe the right thing to do is to remove this driver from the code
base altogether. I will prepare a patch to that effect.
M.
--
Jazz is not dead. It just smells funny...
Powered by blists - more mailing lists