linux-kernel - Re: [PATCH 15/21] drm/msm/dpu: Configure CWB in writeback encoder

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA8EJpp1tcxKzftYot9td7_dBQwCLai_zWCB_AWYbZreAJbM6Q@mail.gmail.com>
Date: Sat, 31 Aug 2024 01:57:32 +0300
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Jessica Zhang <quic_jesszhan@...cinc.com>
Cc: Rob Clark <robdclark@...il.com>, quic_abhinavk@...cinc.com, 
	Sean Paul <sean@...rly.run>, Marijn Suijten <marijn.suijten@...ainline.org>, 
	David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, 
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>, 
	Thomas Zimmermann <tzimmermann@...e.de>, quic_ebharadw@...cinc.com, 
	linux-arm-msm@...r.kernel.org, dri-devel@...ts.freedesktop.org, 
	freedreno@...ts.freedesktop.org, linux-kernel@...r.kernel.org, 
	Rob Clark <robdclark@...omium.org>
Subject: Re: [PATCH 15/21] drm/msm/dpu: Configure CWB in writeback encoder

On Fri, 30 Aug 2024 at 23:28, Jessica Zhang <quic_jesszhan@...cinc.com> wrote:
>
>
>
> On 8/30/2024 10:47 AM, Dmitry Baryshkov wrote:
> > On Thu, Aug 29, 2024 at 01:48:36PM GMT, Jessica Zhang wrote:
> >> Cache the CWB block mask in the DPU virtual encoder and configure CWB
> >> according to the CWB block mask within the writeback phys encoder
> >>
> >> Signed-off-by: Jessica Zhang <quic_jesszhan@...cinc.com>
> >> ---
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c           | 29 +++++++++++++++
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c        | 43 ++++++++++++++++++++++
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h        |  9 ++++-
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   | 18 ++++++++-
> >>   .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c    | 43 +++++++++++++++++++++-
> >>   5 files changed, 139 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >> index 99eaaca405a4..c8ef59af444c 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> >> @@ -1236,6 +1236,33 @@ static bool dpu_crtc_has_valid_clones(struct drm_crtc *crtc,
> >>      return true;
> >>   }
> >>
> >> +static void dpu_crtc_set_encoder_cwb_mask(struct drm_crtc *crtc,
> >> +            struct drm_crtc_state *crtc_state)
> >> +{
> >> +    struct drm_encoder *drm_enc;
> >> +    struct dpu_crtc_state *cstate = to_dpu_crtc_state(crtc_state);
> >> +    int cwb_idx = 0;
> >> +    u32 cwb_mask = 0;
> >> +
> >> +    /*
> >> +     * Since there can only be one CWB session at a time, if the CRTC LM
> >> +     * starts with an even index we start with CWB_0. If the LM index is
> >> +     * odd, we start with CWB_1
> >
> > I'd prefer to get indices from RM. They can be set during mode_set /
> > atomic_check calls.
>
> Hi Dmitry,
>
> Do you mean having CWB being managed by RM or using rm's
> get_assigned_blks() to grab the LM indices?

I was thinking of using either get_assigned_blks() or a new
CWB-specific API. This way such odd-even requirements get abstracted
in the RM.

> >> +     */
> >> +    if (cstate->mixers[0].hw_lm)
> >> +            cwb_idx = (cstate->mixers[0].hw_lm->idx - LM_0) % 2;
> >> +
> >> +    if (drm_crtc_in_clone_mode(crtc_state)) {
> >> +            for (int i = 0; i < cstate->num_mixers; i++) {
> >> +                    cwb_mask |= (1 << cwb_idx);
> >> +                    cwb_idx++;
> >> +            }
> >> +    }
> >> +
> >> +    drm_for_each_encoder_mask(drm_enc, crtc->dev, crtc_state->encoder_mask)
> >> +            dpu_encoder_set_cwb_mask(drm_enc, cwb_mask);
> >
> > Ugh, no. This function writes to dpu_enc, however there is being called
> > from atomic_check(). You can not write non-state variables here as the
> > commit can get completely dropped.
>
> Ack -- will change this API to `dpu_crtc_get_cwb_mask()` and call it
> from dpu_encoder_virt_mode_set() instead.

LGTM.

>
> >
> >> +}
> >> +
> >>   static int dpu_crtc_assign_resources(struct drm_crtc *crtc, struct drm_crtc_state *crtc_state)
> >>   {
> >>      struct dpu_hw_blk *hw_ctl[MAX_CHANNELS_PER_CRTC];
> >> @@ -1329,6 +1356,8 @@ static int dpu_crtc_atomic_check(struct drm_crtc *crtc,
> >>                      !dpu_crtc_has_valid_clones(crtc, crtc_state))
> >>              return -EINVAL;
> >>
> >> +    dpu_crtc_set_encoder_cwb_mask(crtc, crtc_state);
> >> +
> >>      if (!crtc_state->enable || !drm_atomic_crtc_effectively_active(crtc_state)) {
> >>              DRM_DEBUG_ATOMIC("crtc%d -> enable %d, active %d, skip atomic_check\n",
> >>                              crtc->base.id, crtc_state->enable,
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> index f1bd14d1f89e..0f8f6c0182d5 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> @@ -162,6 +162,7 @@ enum dpu_enc_rc_states {
> >>    *                         clks and resources after IDLE_TIMEOUT time.
> >>    * @topology:                   topology of the display
> >>    * @idle_timeout:          idle timeout duration in milliseconds
> >> + * @cwb_mask:                       current encoder is in clone mode
> >>    * @wide_bus_en:           wide bus is enabled on this interface
> >>    * @dsc:                   drm_dsc_config pointer, for DSC-enabled encoders
> >>    */
> >> @@ -202,6 +203,7 @@ struct dpu_encoder_virt {
> >>      struct msm_display_topology topology;
> >>
> >>      u32 idle_timeout;
> >> +    u32 cwb_mask;
> >>
> >>      bool wide_bus_en;
> >>
> >> @@ -638,6 +640,33 @@ bool dpu_encoder_needs_modeset(struct drm_encoder *drm_enc, struct drm_atomic_st
> >>      return false;
> >>   }
> >>
> >> +void dpu_encoder_set_cwb_mask(struct drm_encoder *enc, u32 cwb_mask)
> >> +{
> >> +    struct dpu_encoder_virt *dpu_enc = to_dpu_encoder_virt(enc);
> >> +
> >> +    dpu_enc->cwb_mask = cwb_mask;
> >> +}
> >> +
> >> +u32 dpu_encoder_get_cwb_mask(struct drm_encoder *enc)
> >> +{
> >> +    struct dpu_encoder_virt *dpu_enc = to_dpu_encoder_virt(enc);
> >> +
> >> +    if (!dpu_enc)
> >> +            return 0;
> >> +
> >> +    return dpu_enc->cwb_mask;
> >> +}
> >> +
> >> +bool dpu_encoder_in_clone_mode(struct drm_encoder *enc)
> >> +{
> >> +    struct dpu_encoder_virt *dpu_enc = to_dpu_encoder_virt(enc);
> >> +
> >> +    if (!dpu_enc)
> >> +            return 0;
> >> +
> >> +    return dpu_enc->cwb_mask != 0;
> >> +}
> >> +
> >>   struct drm_dsc_config *dpu_encoder_get_dsc_config(struct drm_encoder *drm_enc)
> >>   {
> >>      struct msm_drm_private *priv = drm_enc->dev->dev_private;
> >> @@ -2019,6 +2048,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
> >>      struct dpu_hw_ctl *ctl = phys_enc->hw_ctl;
> >>      struct dpu_hw_intf_cfg intf_cfg = { 0 };
> >>      int i;
> >> +    enum dpu_cwb cwb_idx;
> >>      struct dpu_encoder_virt *dpu_enc;
> >>
> >>      dpu_enc = to_dpu_encoder_virt(phys_enc->parent);
> >> @@ -2040,6 +2070,19 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
> >>              /* mark WB flush as pending */
> >>              if (phys_enc->hw_ctl->ops.update_pending_flush_wb)
> >>                      phys_enc->hw_ctl->ops.update_pending_flush_wb(ctl, phys_enc->hw_wb->idx);
> >> +
> >> +            if (dpu_enc->cwb_mask) {
> >> +                    for (i = 0; i < hweight32(dpu_enc->cwb_mask); i++) {
> >
> > This will break if cwb_mask starts from bit 1: hweight will count bits,
> > so the loop will not cover the highest bit set.
>
> Ah, good point. I should be doing `hweight() + 1` here instead.

Not quite. I think it might be easier to loop up from CWB_0 to
CWB_MAX. The optimization gain from using hweight32 should be
negligible and using CWB constants will improve readability.

>
> >
> >> +                            if (!(dpu_enc->cwb_mask & (1 << i)))
> >> +                                    continue;
> >> +
> >> +                            cwb_idx = i + CWB_0;
> >> +
> >> +                            if (phys_enc->hw_wb->ops.setup_input_ctrl)
> >> +                                    phys_enc->hw_wb->ops.setup_input_ctrl(phys_enc->hw_wb,
> >> +                                                    cwb_idx, PINGPONG_NONE);
> >> +                    }
> >> +            }
> >>      } else {
> >>              for (i = 0; i < dpu_enc->num_phys_encs; i++) {
> >>                      if (dpu_enc->phys_encs[i] && phys_enc->hw_intf->ops.bind_pingpong_blk)
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
> >> index 0d27e50384f0..131bb8b2c0ee 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
> >> @@ -1,6 +1,6 @@
> >>   /* SPDX-License-Identifier: GPL-2.0-only */
> >>   /*
> >> - * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
> >> + * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
> >>    * Copyright (c) 2015-2018, The Linux Foundation. All rights reserved.
> >>    * Copyright (C) 2013 Red Hat
> >>    * Author: Rob Clark <robdclark@...il.com>
> >> @@ -188,6 +188,13 @@ void dpu_encoder_update_topology(struct drm_encoder *drm_enc,
> >>    */
> >>   bool dpu_encoder_needs_modeset(struct drm_encoder *drm_enc, struct drm_atomic_state *state);
> >>
> >> +/**
> >> + * dpu_encoder_set_cwb_mask - set the CWB blocks mask for the encoder
> >> + * @drm_enc:    Pointer to previously created drm encoder structure
> >> + * @cwb_mask:   CWB blocks mask to set for the encoder
> >> + */
> >> +void dpu_encoder_set_cwb_mask(struct drm_encoder *drm_enc, u32 cwb_mask);
> >> +
> >>   /**
> >>    * dpu_encoder_prepare_wb_job - prepare writeback job for the encoder.
> >>    * @drm_enc:    Pointer to previously created drm encoder structure
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> >> index e77ebe3a68da..f0e5c5b073e5 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
> >> @@ -1,6 +1,6 @@
> >>   /* SPDX-License-Identifier: GPL-2.0-only */
> >>   /*
> >> - * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
> >> + * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
> >>    * Copyright (c) 2015-2018 The Linux Foundation. All rights reserved.
> >>    */
> >>
> >> @@ -339,6 +339,22 @@ static inline enum dpu_3d_blend_mode dpu_encoder_helper_get_3d_blend_mode(
> >>    */
> >>   unsigned int dpu_encoder_helper_get_dsc(struct dpu_encoder_phys *phys_enc);
> >>
> >> +/**
> >> + * dpu_encoder_get_cwb_mask - get CWB blocks mask for DPU encoder
> >> + *   This helper function is used by physical encoders to get the CWB blocks
> >> + *   mask used for this encoder.
> >> + * @enc: Pointer to DRM encoder structure
> >> + */
> >> +u32 dpu_encoder_get_cwb_mask(struct drm_encoder *enc);
> >> +
> >> +/**
> >> + * dpu_encoder_in_clone_mode - determine if DPU encoder is in clone mode
> >> + *   This helper is used by physical encoders to determine if the encoder is in
> >> + *   clone mode.
> >> + * @enc: Pointer to DRM encoder structure
> >> + */
> >> +bool dpu_encoder_in_clone_mode(struct drm_encoder *enc);
> >> +
> >>   /**
> >>    * dpu_encoder_get_dsc_config - get DSC config for the DPU encoder
> >>    *   This helper function is used by physical encoder to get DSC config
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
> >> index 882c717859ce..e1ec64ffc742 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
> >> @@ -1,6 +1,6 @@
> >>   // SPDX-License-Identifier: GPL-2.0-only
> >>   /*
> >> - * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
> >> + * Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights reserved.
> >>    */
> >>
> >>   #define pr_fmt(fmt)        "[drm:%s:%d] " fmt, __func__, __LINE__
> >> @@ -264,6 +264,45 @@ static void dpu_encoder_phys_wb_setup_ctl(struct dpu_encoder_phys *phys_enc)
> >>      }
> >>   }
> >>
> >> +static void dpu_encoder_phys_wb_setup_cwb(struct dpu_encoder_phys *phys_enc)
> >> +{
> >> +    struct dpu_hw_wb *hw_wb;
> >> +    struct dpu_hw_ctl *hw_ctl;
> >> +    struct dpu_hw_pingpong *hw_pp;
> >> +    u32 cwb_mask, cwb_idx;
> >> +
> >> +    if (!phys_enc)
> >> +            return;
> >> +
> >> +    hw_wb = phys_enc->hw_wb;
> >> +    hw_ctl = phys_enc->hw_ctl;
> >> +    hw_pp = phys_enc->hw_pp;
> >> +
> >> +    if (!hw_wb || !hw_ctl || !hw_pp) {
> >> +            DPU_DEBUG("[wb:%d] no ctl or pp assigned\n", hw_wb->idx - WB_0);
> >> +            return;
> >> +    }
> >> +
> >> +    cwb_mask = dpu_encoder_get_cwb_mask(phys_enc->parent);
> >> +
> >> +    for (int i = 0; i < hweight32(cwb_mask); i++) {
> >> +            if (!(cwb_mask & (1 << i)))
> >> +                    continue;
> >> +
> >> +            cwb_idx = i + CWB_0;
> >> +
> >> +            if (hw_wb->ops.setup_input_ctrl)
> >> +                    hw_wb->ops.setup_input_ctrl(hw_wb, cwb_idx, hw_pp->idx + i);
>
> Just wanted to note that the value being programmed here is incorrect.
>
> Instead of passing in the index of the dedicated cwb pingpong, it should
> instead be the index of the *real time display* pingpong.
>
> To grab the correct value, I'm thinking of adding a DPU encoder API to
> query the RM for (non-dedicated CWB) HW_BLK_PINGPONG to grab the
> pingpong indices for the real-time encoder. I'll then call that API here
> and pass the resulting real-time hw_pp index.

Hmm. Having SDM845 and below and LM_2 / LM_5 pair in mind I thought
that we might need to get a list of PPs. But then I noticed that this
pair uses PP_2 / PP_3 pair of pingpong blocks. Yes, the proposed API
sounds good.

>
> Please let me know if you have any feedback on this proposal or
> alternate solutions.
>
> Thanks,
>
> Jessica Zhang
>
> >> +
> >> +            /*
> >> +             * The CWB mux supports using LM or DSPP as tap points. For now,
> >> +             * always use DSPP tap point
> >> +             */
> >> +            if (hw_wb->ops.setup_input_mode)
> >> +                    hw_wb->ops.setup_input_mode(hw_wb, cwb_idx, INPUT_MODE_DSPP_OUT);
> >> +    }
> >> +}
> >> +
> >>   /**
> >>    * _dpu_encoder_phys_wb_update_flush - flush hardware update
> >>    * @phys_enc:      Pointer to physical encoder
> >> @@ -342,6 +381,8 @@ static void dpu_encoder_phys_wb_setup(
> >>
> >>      dpu_encoder_helper_phys_setup_cdm(phys_enc, dpu_fmt, CDM_CDWN_OUTPUT_WB);
> >>
> >> +    dpu_encoder_phys_wb_setup_cwb(phys_enc);
> >> +
> >>      dpu_encoder_phys_wb_setup_ctl(phys_enc);
> >>   }
> >>
> >>
> >> --
> >> 2.34.1
> >>
> >
> > --
> > With best wishes
> > Dmitry



-- 
With best wishes
Dmitry