linux-kernel - Re: [PATCH] drm/panfrost: Really power off GPU cores in panfrost_gpu_power

lists.openwall.net

Message-ID: <37d373e1-8850-4ab2-8fdb-6b069e2d6976@samsung.com>
Date:   Fri, 24 Nov 2023 13:45:32 +0100
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Boris Brezillon <boris.brezillon@...labora.com>
Cc:     Steven Price <steven.price@....com>, tzimmermann@...e.de,
        linux-kernel@...r.kernel.org, mripard@...nel.org,
        dri-devel@...ts.freedesktop.org, wenst@...omium.org,
        kernel@...labora.com,
        "linux-samsung-soc@...r.kernel.org" 
        <linux-samsung-soc@...r.kernel.org>
Subject: Re: [PATCH] drm/panfrost: Really power off GPU cores in
 panfrost_gpu_power_off()

On 22.11.2023 10:29, Krzysztof Kozlowski wrote:
> On 22/11/2023 10:06, AngeloGioacchino Del Regno wrote:
>>>>> Hey Krzysztof,
>>>>>
>>>>> This is interesting. It might be about the cores that are missing from the partial
>>>>> core_mask raising interrupts, but an external abort on non-linefetch is strange to
>>>>> see here.
>>>> I've seen such external aborts in the past, and the fault type has
>>>> often been misleading. It's unlikely to have anything to do with a
>>> Yeah, often from accessing a device with its power or clocks gated.
>>>
>> Except my commit does *not* gate SoC power, nor SoC clocks 🙂
> It could be that something (like clocks or power supplies) was missing
> on this board/SoC, which was not critical until your patch landed.
>
>> What the "Really power off ..." commit does is ask the GPU to internally power
>> off the shaders, tilers and L2; that's why I say it is strange to see that
>> kind of abort.
>>
>> The GPU_INT_CLEAR, GPU_INT_STAT, GPU_FAULT_STATUS and GPU_FAULT_ADDRESS_{HI/LO}
>> registers should still be accessible even with the shaders, tilers and cache OFF.
>>
>> Anyway, yes, synchronizing IRQs before calling the poweroff sequence would also
>> work, but that would add quite a bit of latency to the runtime_suspend() call, so
>> in this case I'd rather avoid executing any register r/w in the handler, either
>> by checking whether the GPU is supposed to be OFF, or by clearing interrupts, which
>> may not work if those are generated after the poweroff function has run.
>> Or we could simply disable the IRQ after power_off, but that would be hacky as well.
>>
>>
>> Let's see if asking to power off *everything* works:
> Worked.
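A minimal user-space model of the first option discussed above (having the interrupt handler check whether the GPU is supposed to be off before touching any register) is sketched below. It is not panfrost code: struct fake_gpu, gpu_irq_handler(), gpu_power_off(), the suspended flag and the IRQ_RESULT_* values are hypothetical names, and the register block is simulated with plain variables that count any access made while the device is unpowered.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes, loosely mirroring the kernel's IRQ_NONE/IRQ_HANDLED. */
enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };

/* Hypothetical device model: registers are only "safe" to touch while
 * regs_accessible is true (i.e. power and clocks are present). */
struct fake_gpu {
    atomic_bool suspended;   /* published before power is removed */
    bool regs_accessible;    /* models power/clock state */
    uint32_t int_stat;       /* models the GPU_INT_STAT register */
    unsigned bad_accesses;   /* register accesses made while unpowered */
};

/* Simulated register read: counts accesses that real hardware might abort on. */
static uint32_t read_int_stat(struct fake_gpu *gpu)
{
    if (!gpu->regs_accessible)
        gpu->bad_accesses++;
    return gpu->int_stat;
}

/* Simulated GPU_INT_CLEAR write: clears the given bits in int_stat. */
static void write_int_clear(struct fake_gpu *gpu, uint32_t mask)
{
    if (!gpu->regs_accessible)
        gpu->bad_accesses++;
    gpu->int_stat &= ~mask;
}

/* Interrupt handler: return early, before any register access,
 * if the device has been marked as suspended. */
static enum irq_result gpu_irq_handler(struct fake_gpu *gpu)
{
    if (atomic_load(&gpu->suspended))
        return IRQ_RESULT_NONE;

    uint32_t stat = read_int_stat(gpu);
    if (!stat)
        return IRQ_RESULT_NONE;

    write_int_clear(gpu, stat);
    return IRQ_RESULT_HANDLED;
}

/* Power-off path: mark the device suspended first, then remove power.
 * NOTE: this model is single-threaded; a real driver would also have to
 * wait for a handler that already passed the check (the kernel provides
 * synchronize_irq() for that) before gating power. */
static void gpu_power_off(struct fake_gpu *gpu)
{
    atomic_store(&gpu->suspended, true);
    gpu->regs_accessible = false;
}
```

For example, initializing a struct fake_gpu with .regs_accessible = true, calling gpu_power_off(), and then raising a late interrupt by setting int_stat leaves bad_accesses at zero, because the handler returns before reading GPU_INT_STAT. A flag check alone still leaves a window in which a handler that is already past the check keeps running while power is cut; in the kernel that window is what synchronize_irq() closes, at the latency cost mentioned in the quoted message.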

Yes, I also ran into this issue some time ago, but I didn't report it
because I also had some power-supply-related problems on my test farm
and everything was a bit unstable. I wasn't 100% sure that the $subject
patch was responsible for the observed issues. Now, after fixing the power
supply, I can confirm that the issue was revealed by the $subject patch and
that the above-mentioned change fixes the problem. Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland