linux-kernel - Re: [PATCH] drm/panfrost: Ignore core_mask for poweroff and sync interrupts

Message-ID: <43cc8641-6a60-41d9-b8f2-32227235702a@collabora.com>
Date:   Thu, 23 Nov 2023 16:14:12 +0100
From:   AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
To:     Boris Brezillon <boris.brezillon@...labora.com>
Cc:     steven.price@....com, robh@...nel.org,
        maarten.lankhorst@...ux.intel.com, mripard@...nel.org,
        tzimmermann@...e.de, airlied@...il.com, daniel@...ll.ch,
        dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        krzysztof.kozlowski@...aro.org, kernel@...labora.com
Subject: Re: [PATCH] drm/panfrost: Ignore core_mask for poweroff and sync
 interrupts

On 23/11/23 14:51, Boris Brezillon wrote:
> On Thu, 23 Nov 2023 14:24:57 +0100
> AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
> wrote:
> 
>>>>
>>>> So, while I agree that it'd be slightly more readable as a diff if those
>>>> were two different commits I do have reasons against splitting.....
>>>
>>> If we just need a quick fix to avoid PWRTRANS interrupts from kicking
>>> in when we power-off the cores, I think we'd be better off dropping
>>> GPU_IRQ_POWER_CHANGED[_ALL] from the value we write to GPU_INT_MASK
>>> at [re]initialization time, and then have a separate series that fixes
>>> the problem more generically.
>>>    
>>
>> But that didn't work:
>> https://lore.kernel.org/all/d95259b8-10cf-4ded-866c-47cbd2a44f84@linaro.org/
> 
> I meant, your 'ignore-core_mask' fix + the
> 'drop GPU_IRQ_POWER_CHANGED[_ALL] in GPU_INT_MASK' one.
> 
> So,
> 
> https://lore.kernel.org/all/4c73f67e-174c-497e-85a5-cb053ce657cb@collabora.com/
> +
> https://lore.kernel.org/all/d95259b8-10cf-4ded-866c-47cbd2a44f84@linaro.org/
> 
>>
>>
>> ...while this "full" solution worked:
>> https://lore.kernel.org/all/39e9514b-087c-42eb-8d0e-f75dc620e954@linaro.org/
>>
>> https://lore.kernel.org/all/5b24cc73-23aa-4837-abb9-b6d138b46426@linaro.org/
>>
>>
>> ...so this *is* a "quick fix" already... :-)
> 
> It's a half-baked solution for the missing irq-synchronization-on-suspend
> issue IMHO. I understand why you want it all in one patch that can serve
> as a fix for 123b431f8a5c ("drm/panfrost: Really power off GPU cores in
> panfrost_gpu_power_off()"), which is why I'm suggesting to go for an
> even simpler diff (see below), and then fully address the
> irq-synchronization-on-suspend issue in a follow-up patchset.
> 
> --->8---
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index 09f5e1563ebd..6e2d7650cc2b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -78,7 +78,10 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
>          }
>   
>          gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> -       gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
> +       gpu_write(pfdev, GPU_INT_MASK,
> +                 GPU_IRQ_MASK_ERROR |
> +                 GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
> +                 GPU_IRQ_CLEAN_CACHES_COMPLETED);
>   

...but if we do that, the next patch(es) will contain a partial revert of this
commit, restoring gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL)...

I'm not sure it's worth changing this and then changing it back right
after :-\

Anyway, if anyone else agrees with doing this and then partially reverting it, I
have no issues going with this one instead; ultimately, what I care about is
resolving the regression ASAP :-)
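
For reference, a minimal standalone sketch of what the narrower GPU_INT_MASK
value in the quoted diff would leave enabled compared to GPU_IRQ_MASK_ALL. The
bit positions below are assumptions (believed to match panfrost_regs.h, but
please check them there), and the two helper functions are hypothetical names
used only for illustration; they are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Assumed bit positions; verify against panfrost_regs.h before relying on them. */
#define GPU_IRQ_FAULT                    BIT(0)
#define GPU_IRQ_MULTIPLE_FAULT           BIT(7)
#define GPU_IRQ_RESET_COMPLETED          BIT(8)
#define GPU_IRQ_POWER_CHANGED            BIT(9)
#define GPU_IRQ_POWER_CHANGED_ALL        BIT(10)
#define GPU_IRQ_PERFCNT_SAMPLE_COMPLETED BIT(16)
#define GPU_IRQ_CLEAN_CACHES_COMPLETED   BIT(17)

#define GPU_IRQ_MASK_ERROR (GPU_IRQ_FAULT | GPU_IRQ_MULTIPLE_FAULT)
#define GPU_IRQ_MASK_ALL   (GPU_IRQ_MASK_ERROR | GPU_IRQ_RESET_COMPLETED | \
                            GPU_IRQ_POWER_CHANGED | GPU_IRQ_POWER_CHANGED_ALL | \
                            GPU_IRQ_PERFCNT_SAMPLE_COMPLETED | \
                            GPU_IRQ_CLEAN_CACHES_COMPLETED)

/* The value the quoted diff writes to GPU_INT_MASK. */
#define GPU_IRQ_MASK_QUICK_FIX (GPU_IRQ_MASK_ERROR | \
                                GPU_IRQ_PERFCNT_SAMPLE_COMPLETED | \
                                GPU_IRQ_CLEAN_CACHES_COMPLETED)

/* Compile-time check: the narrower mask enables no power-change interrupt. */
static_assert((GPU_IRQ_MASK_QUICK_FIX &
               (GPU_IRQ_POWER_CHANGED | GPU_IRQ_POWER_CHANGED_ALL)) == 0,
              "quick-fix mask must not enable power-change interrupts");

/* Hypothetical helper: the narrower mask from the quoted diff. */
static inline uint32_t gpu_int_mask_quick_fix(void)
{
	return GPU_IRQ_MASK_QUICK_FIX;
}

/* Hypothetical helper: bits enabled by GPU_IRQ_MASK_ALL but not by the narrower mask. */
static inline uint32_t gpu_int_mask_dropped(void)
{
	return GPU_IRQ_MASK_ALL & ~GPU_IRQ_MASK_QUICK_FIX;
}
```

Under these assumed values, the dropped bits are the two power-change
interrupts plus GPU_IRQ_RESET_COMPLETED, and those are what a follow-up patch
going back to GPU_IRQ_MASK_ALL would re-enable.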

Cheers,
Angelo

>          /*
>           * All in-flight jobs should have released their cycle
> @@ -425,11 +428,10 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
>   
>   void panfrost_gpu_power_off(struct panfrost_device *pfdev)
>   {
> -       u64 core_mask = panfrost_get_core_mask(pfdev);
>          int ret;
>          u32 val;
>   
> -       gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
> +       gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
>          ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
>                                           val, !val, 1, 1000);
>          if (ret)
> @@ -441,7 +443,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
>          if (ret)
>                  dev_err(pfdev->dev, "tiler power transition timeout");
>   
> -       gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
> +       gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
>          ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
>                                   val, !val, 0, 1000);
>          if (ret)
> 
>