linux-kernel - Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5f45485a-fee6-4ab9-9894-6da4491a985c@collabora.com>
Date:   Tue, 28 Nov 2023 17:10:17 +0100
From:   AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
To:     Boris Brezillon <boris.brezillon@...labora.com>
Cc:     robh@...nel.org, steven.price@....com,
        maarten.lankhorst@...ux.intel.com, mripard@...nel.org,
        tzimmermann@...e.de, airlied@...il.com, daniel@...ll.ch,
        dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        kernel@...labora.com, m.szyprowski@...sung.com,
        krzysztof.kozlowski@...aro.org
Subject: Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts
 before powering off

Il 28/11/23 16:53, Boris Brezillon ha scritto:
> On Tue, 28 Nov 2023 16:10:45 +0100
> AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
> wrote:
> 
>>>>    static void panfrost_job_handle_err(struct panfrost_device *pfdev,
>>>>    				    struct panfrost_job *job,
>>>>    				    unsigned int js)
>>>> @@ -792,9 +800,13 @@ static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data)
>>>>    	struct panfrost_device *pfdev = data;
>>>>    
>>>>    	panfrost_job_handle_irqs(pfdev);
>>>> -	job_write(pfdev, JOB_INT_MASK,
>>>> -		  GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
>>>> -		  GENMASK(NUM_JOB_SLOTS - 1, 0));
>>>> +
>>>> +	/* Enable interrupts only if we're not about to get suspended */
>>>> +	if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))
>>>
>>> The irq-line is requested with IRQF_SHARED, meaning the line might be
>>> shared between all three GPU IRQs, but also with other devices. I think
>>> if we want to be totally safe, we need to also check this is_suspending
>>> field in the hard irq handlers before accessing the xxx_INT_yyy
>>> registers.
>>>    
>>
>> This would mean that we would have to force canceling jobs in the suspend
>> handler, but if the IRQ never fired, would we still be able to find the
>> right bits flipped in JOB_INT_RAWSTAT?
> 
> There should be no jobs left if we enter suspend. If there is, that's a
> bug we should fix, but I'm digressing.
> 
>>
>>   From what I understand, are you suggesting to call, in job_suspend_irq()
>> something like
>>
>> void panfrost_job_suspend_irq(struct panfrost_device *pfdev)
>> {
>>           u32 status;
>>
>> 	set_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending);
>>
>> 	job_write(pfdev, JOB_INT_MASK, 0);
>> 	synchronize_irq(pfdev->js->irq);
>>
>> 	status = job_read(pfdev, JOB_INT_STAT);
> 
> I guess you meant _RAWSTAT. _STAT should always be zero after we've
> written 0 to _INT_MASK.
> 

Whoops! Yes, as I wrote up there, I meant _RAWSTAT, sorry! :-)

>> 	if (status)
>> 		panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);
> 
> Nope, we don't need to read the STAT reg and forcibly call the threaded
> handler if it's != 0. The synchronize_irq() call should do exactly that
> (make sure all pending interrupts are processed before returning), and
> our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
> interrupts will kick in after that point.
> 

Unless we synchronize_irq() *before* masking all interrupts (which would be
wrong, as some interrupt could still fire after execution of the ISR), we get
*either of* two scenarios:

  - COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by
    writing to JOB_INT_MASK; or
  - COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded
    interrupt handler doesn't get executed, jobs are not canceled.

So if we don't forbicly call the threaded handler if RAWSTAT != 0 in there,
and if the extra check is present in the hardirq handler, and if the hardirq
handler wasn't executed already before our synchronize_irq() call (so: if the
hardirq execution has to be done to synchronize irqs), we are not guaranteeing
that jobs cancellation/dequeuing/removal/whatever-handling is done before
entering suspend.

That, unless the suggestion was to call panfrost_job_handle_irqs() instead of
the handler thread like that (because reading it back, it makes sense to do so).

Cheers!

>> }
>>
>> and then while still retaining the check in the IRQ thread handler, also
>> check it in the hardirq handler like
>>
>> static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
>> {
>> 	struct panfrost_device *pfdev = data;
>> 	u32 status;
>>
>> 	if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))
>> 		return IRQ_NONE;
> 
> Yes, that's the extra check I was talking about, and that's also the
> very reason I'm suggesting to call this field suspended_irqs instead of
> is_suspending. Ultimately, each bit in this bitmap encodes the status
> of a specific IRQ, not the transition from active-to-suspended,
> otherwise we'd be clearing the bit at the end of
> panfrost_job_suspend_irq(), right after the synchronize_irq(). But if
> we were doing that, our hard IRQ handler could be called because other
> devices raised an interrupt on the very same IRQ line while we are
> suspended, and we'd be doing an invalid GPU reg read while the
> clks/power-domains are off.
> 
>>
>> 	status = job_read(pfdev, JOB_INT_STAT);
>> 	if (!status)
>> 	        return IRQ_NONE;
>>
>> 	job_write(pfdev, JOB_INT_MASK, 0);
>> 	return IRQ_WAKE_THREAD;
>> }
>>
>> (rinse and repeat for panfrost_mmu)
>>
>> ..or am I misunderstanding you?
>>
>> Cheers,
>> Angelo
>>
>>
>