Date:   Tue, 31 May 2022 20:31:47 -0700
From:   Stephen Boyd <swboyd@...omium.org>
To:     Abhinav Kumar <quic_abhinavk@...cinc.com>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
        Douglas Anderson <dianders@...omium.org>,
        Rob Clark <robdclark@...il.com>
Cc:     AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...ainline.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Daniel Vetter <daniel@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        Kalyan Thota <quic_kalyant@...cinc.com>,
        Sean Paul <sean@...rly.run>,
        Vinod Polimera <quic_vpolimer@...cinc.com>,
        dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org,
        linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] drm/msm/dpu: Move min BW request and full BW disable
 back to mdss

Quoting Douglas Anderson (2022-05-31 16:01:26)
> In commit a670ff578f1f ("drm/msm/dpu: always use mdp device to scale
> bandwidth") we fully moved interconnect stuff to the DPU driver. This
> had no change for sc7180 but _did_ have an impact for other SoCs. It
> made them match the sc7180 scheme.
>
> Unfortunately, the sc7180 scheme seems like it was a bit broken.
> Specifically the interconnect needs to be on for more than just the
> DPU driver's AXI bus. At the very least it also needs to be on for the
> DSI driver's AXI bus. This can be seen fairly easily by doing this on
> a ChromeOS sc7180-trogdor class device:
>
>   set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
>   sleep 10
>   cd /sys/bus/platform/devices/ae94000.dsi/power
>   echo on > control
>
> When you do that, you'll get a warning splat in the logs about
> "gcc_disp_hf_axi_clk status stuck at 'off'".
>
> One could argue that perhaps what I have done above is "illegal" and
> that it can't happen naturally in the system because in normal system
> usage the DPU is pretty much always on when DSI is on. That being
> said:
> * In official ChromeOS builds (admittedly a 5.4 kernel with backports)
>   we have seen that splat at bootup.
> * Even though we don't use "autosuspend" for these components, we
>   don't use the "put_sync" variants. Thus plausibly the DSI could stay
>   "runtime enabled" past when the DPU is enabled. Technically we
>   shouldn't do that if the DPU's suspend ends up yanking our clock.
>
> Let's change things such that the "bare minimum" request for the
> interconnect happens in the mdss driver again. That means that all of
> the children can assume that the interconnect is on at the minimum
> bandwidth. We'll then let the DPU request the higher amount that it
> wants.
>
> It should be noted that this isn't as hacky of a solution as it might
> initially appear. Specifically:
> * Since MDSS and DPU individually get their own references to the
>   interconnect then the framework will actually handle aggregating
>   them. The two drivers are _not_ clobbering each other.
> * When the Qualcomm interconnect driver aggregates it takes the max of
>   all the peaks. Thus having MDSS request a peak, as we're doing here,
>   won't actually change the total interconnect bandwidth (it won't be
>   added to the request for the DPU). This perhaps explains why the
>   "average" requested in MDSS was historically 0 since that one
>   _would_ be added in.
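
The aggregation argument in the two bullets above can be illustrated with a
toy model. This is only a sketch of the rule as the commit message describes
it (averages are summed, peaks take the maximum); it is not the kernel's
interconnect code. The `Request` type, the `aggregate` helper and all
bandwidth numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One consumer's bandwidth vote (hypothetical model, not a kernel type)."""
    name: str
    avg: int   # average bandwidth, arbitrary units
    peak: int  # peak bandwidth, arbitrary units


def aggregate(requests):
    """Combine votes as described above: sum the averages, take the max peak."""
    total_avg = sum(r.avg for r in requests)
    max_peak = max((r.peak for r in requests), default=0)
    return total_avg, max_peak


# Hypothetical votes: MDSS asks only for a small peak (average 0),
# while the DPU asks for its real working bandwidth.
mdss = Request("mdss", avg=0, peak=400_000)
dpu = Request("dpu", avg=2_000_000, peak=6_000_000)

dpu_only = aggregate([dpu])
both = aggregate([mdss, dpu])
mdss_only = aggregate([mdss])

print("DPU only:   ", dpu_only)
print("MDSS + DPU: ", both)
print("MDSS only:  ", mdss_only)
```

In this model the MDSS vote leaves the aggregate unchanged whenever the DPU
is also voting, and it still keeps a non-zero peak in place when the DPU has
dropped its vote. A non-zero MDSS average, by contrast, would be added on top
of the DPU's average.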
>
> NOTE also that in the downstream ChromeOS 5.4 and 5.15 kernels, we're
> also seeing some RPMH hangs that are addressed by this fix. These
> hangs are showing up in the field and on _some_ devices with enough
> stress testing of suspend/resume. Specifically right at suspend time
> with a stack crawl that looks like this (from chromeos-5.15 tree):
>   rpmh_write_batch+0x19c/0x240
>   qcom_icc_bcm_voter_commit+0x210/0x420
>   qcom_icc_set+0x28/0x38
>   apply_constraints+0x70/0xa4
>   icc_set_bw+0x150/0x24c
>   dpu_runtime_resume+0x50/0x1c4
>   pm_generic_runtime_resume+0x30/0x44
>   __genpd_runtime_resume+0x68/0x7c
>   genpd_runtime_resume+0x12c/0x20c
>   __rpm_callback+0x98/0x138
>   rpm_callback+0x30/0x88
>   rpm_resume+0x370/0x4a0
>   __pm_runtime_resume+0x80/0xb0
>   dpu_kms_enable_commit+0x24/0x30
>   msm_atomic_commit_tail+0x12c/0x630
>   commit_tail+0xac/0x150
>   drm_atomic_helper_commit+0x114/0x11c
>   drm_atomic_commit+0x68/0x78
>   drm_atomic_helper_disable_all+0x158/0x1c8
>   drm_atomic_helper_suspend+0xc0/0x1c0
>   drm_mode_config_helper_suspend+0x2c/0x60
>   msm_pm_prepare+0x2c/0x40
>   pm_generic_prepare+0x30/0x44
>   genpd_prepare+0x80/0xd0
>   device_prepare+0x78/0x17c
>   dpm_prepare+0xb0/0x384
>   dpm_suspend_start+0x34/0xc0
>
> We don't completely understand all the mechanisms in play, but the
> hang seemed to come and go with random factors. It's not terribly
> surprising that the hang is gone after this patch since the line of
> code that was failing is no longer present in the kernel.
>
> Fixes: a670ff578f1f ("drm/msm/dpu: always use mdp device to scale bandwidth")
> Fixes: c33b7c0389e1 ("drm/msm/dpu: add support for clk and bw scaling for display")
> Signed-off-by: Douglas Anderson <dianders@...omium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@...omium.org>
