linux-kernel - Re: [Freedreno] [PATCH v2 5/7] arm64: dts: qcom: sc7280: Update gpu register list


Message-ID: <b6ab023b-601d-1df2-b04b-af5961b73bea@quicinc.com>
Date:   Tue, 19 Jul 2022 15:26:15 +0530
From:   Rajendra Nayak <quic_rjendra@...cinc.com>
To:     Stephen Boyd <swboyd@...omium.org>,
        Akhil P Oommen <quic_akhilpo@...cinc.com>,
        Doug Anderson <dianders@...omium.org>,
        Taniya Das <quic_tdas@...cinc.com>
CC:     <devicetree@...r.kernel.org>, Jonathan Marek <jonathan@...ek.ca>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Andy Gross <agross@...nel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        "Bjorn Andersson" <bjorn.andersson@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Rob Clark <robdclark@...il.com>,
        Matthias Kaehlcke <mka@...omium.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        Jordan Crouse <jordan@...micpenguin.net>,
        freedreno <freedreno@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Freedreno] [PATCH v2 5/7] arm64: dts: qcom: sc7280: Update gpu
 register list



On 7/19/2022 12:49 PM, Stephen Boyd wrote:
> Quoting Akhil P Oommen (2022-07-18 23:37:16)
>> On 7/19/2022 11:19 AM, Stephen Boyd wrote:
>>> Quoting Akhil P Oommen (2022-07-18 21:07:05)
>>>> On 7/14/2022 11:10 AM, Akhil P Oommen wrote:
>>>>> IIUC, the qcom gdsc driver doesn't ensure the hardware is collapsed,
>>>>> since GDSCs are vote-able switches. Ideally, we should ensure that the
>>>>> hw has collapsed for gpu recovery, because there could be transient
>>>>> votes from other subsystems, like the hypervisor, using their vote register.
>>>>>
>>>>> I am not sure how complex the plumbing to the gpucc driver would be to
>>>>> allow the gpu driver to check the hw status. OTOH, with this patch, the gpu
>>>>> driver does a read operation on a gpucc register which is in the always-on
>>>>> domain. That means we don't need to vote for any resource to access this register.
> 
> Reading between the lines here, you're saying that you have to read the
> gdsc register to make sure that the gdsc is in some state? Can you
> clarify exactly what you're doing? And how do you know that something
> else in the kernel can't cause the register to change after it is read?
> It certainly seems like we can't be certain because there is voting
> involved.

Yes, this looks like a best-effort attempt to get the gpu to recover, but
the kernel driver really has no way to ensure this condition is always
met (because it depends on other entities like hyp, trustzone, etc., right?).
Why not just use a worst-case polling delay?

> 
>>>>>
>>>>> Stephen/Rajendra/Taniya, any suggestion?
>>> Why can't you assert a gpu reset signal with the reset APIs? This series
>>> seems to jump through a bunch of hoops to get the gdsc and power domain
>>> to "reset" when I don't know why any of that is necessary. Can't we
>>> simply assert a reset to the hardware after recovery completes so the
>>> device is back into a good known POR (power on reset) state?
>> That is because there is no register interface to reset the GPU CX domain.
>> The recommended sequence from the HW design folks is to collapse both the cx
>> and gx gdscs to properly reset the gpu/gmu.
>>
> 
> OK. One knee-jerk reaction is to treat the gdsc as a reset, then, and
> possibly mux that request along with any power domain on/off, so that if
> the reset is requested while the power domain is off, nothing happens.
> Otherwise, if the power domain is on, it manually sequences and
> controls the two gdscs so that the GPU is reset, and then restores the
> enable state of the power domain.