linux-kernel - Re: [PATCH v5 2/3] drm/tidss: Remove max_pclk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67c54a2e-953e-4385-ac85-324a0d20e5c1@ideasonboard.com>
Date: Wed, 3 Sep 2025 12:31:03 +0300
From: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
To: Swamil Jain <s-jain1@...com>, Maxime Ripard <mripard@...nel.org>
Cc: h-shenoy@...com, devarsht@...com, vigneshr@...com, praneeth@...com,
 u-kumar1@...com, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, jyri.sarha@....fi,
 maarten.lankhorst@...ux.intel.com, tzimmermann@...e.de, airlied@...il.com,
 simona@...ll.ch, aradhya.bhatia@...ux.dev
Subject: Re: [PATCH v5 2/3] drm/tidss: Remove max_pclk_khz from tidss display
 features

Hi,

On 03/09/2025 11:38, Swamil Jain wrote:
> Hi Tomi, Maxime,
> 
> On 8/27/25 15:19, Tomi Valkeinen wrote:
>> Hi,
>>
>> On 27/08/2025 12:27, Maxime Ripard wrote:
>>> On Wed, Aug 27, 2025 at 11:49:22AM +0300, Tomi Valkeinen wrote:
>>>> On 19/08/2025 22:21, Swamil Jain wrote:
>>>>> From: Jayesh Choudhary <j-choudhary@...com>
>>>>>
>>>>> TIDSS hardware by itself does not have variable max_pclk for each VP.
>>>>> The maximum pixel clock is determined by the limiting factor between
>>>>> the functional clock and the PLL (parent to the VP/pixel clock).
>>>>
>>>> Hmm, this is actually not in the driver, is it? We're not limiting the
>>>> pclk based on the fclk.
>>>>
>>>>> The limitation that has been modeled till now comes from the clock
>>>>> (PLL can only be programmed to a particular max value). Instead of
>>>>> putting it as a constant field in dispc_features, we can query the
>>>>> DM to see if requested clock can be set or not and use it in
>>>>> mode_valid().
>>>>>
>>>>> Replace constant "max_pclk_khz" in dispc_features with
>>>>> max_successful_rate and max_attempted_rate, both of these in
>>>>> tidss_device structure would be modified in runtime. In mode_valid()
>>>>> call, check if a best frequency match for mode clock can be found or
>>>>> not using "clk_round_rate()". Based on that, propagate
>>>>> max_successful_rate and max_attempted_rate and query DM again only if
>>>>> the requested mode clock is greater than max_attempted_rate. (As the
>>>>> preferred display mode is usually the max resolution, driver ends up
>>>>> checking the highest clock the first time itself which is used in
>>>>> subsequent checks).
>>>>>
>>>>> Since TIDSS display controller provides clock tolerance of 5%, we use
>>>>> this while checking the max_successful_rate. Also, move up
>>>>> "dispc_pclk_diff()" before it is called.
>>>>>
>>>>> This will make the existing compatibles reusable if DSS features are
>>>>> same across two SoCs with the only difference being the pixel clock.
>>>>>
>>>>> Fixes: 7246e0929945 ("drm/tidss: Add OLDI bridge support")
>>>>> Reviewed-by: Devarsh Thakkar <devarsht@...com>
>>>>> Signed-off-by: Jayesh Choudhary <j-choudhary@...com>
>>>>> Signed-off-by: Swamil Jain <s-jain1@...com>
>>>>> ---
>>>>>   drivers/gpu/drm/tidss/tidss_dispc.c | 85 ++++++++++++
>>>>> +----------------
>>>>>   drivers/gpu/drm/tidss/tidss_dispc.h |  1 -
>>>>>   drivers/gpu/drm/tidss/tidss_drv.h   | 11 +++-
>>>>>   3 files changed, 47 insertions(+), 50 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/
>>>>> tidss/tidss_dispc.c
>>>>> index c0277fa36425..c2c0fe0d4a0f 100644
>>>>> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
>>>>> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
>>>>> @@ -58,10 +58,6 @@ static const u16
>>>>> tidss_k2g_common_regs[DISPC_COMMON_REG_TABLE_LEN] = {
>>>>>   const struct dispc_features dispc_k2g_feats = {
>>>>>       .min_pclk_khz = 4375,
>>>>>   -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_DPI] = 150000,
>>>>> -    },
>>>>> -
>>>>>       /*
>>>>>        * XXX According TRM the RGB input buffer width up to 2560
>>>>> should
>>>>>        *     work on 3 taps, but in practice it only works up to 1280.
>>>>> @@ -144,11 +140,6 @@ static const u16
>>>>> tidss_am65x_common_regs[DISPC_COMMON_REG_TABLE_LEN] = {
>>>>>   };
>>>>>     const struct dispc_features dispc_am65x_feats = {
>>>>> -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_DPI] = 165000,
>>>>> -        [DISPC_VP_OLDI_AM65X] = 165000,
>>>>> -    },
>>>>> -
>>>>>       .scaling = {
>>>>>           .in_width_max_5tap_rgb = 1280,
>>>>>           .in_width_max_3tap_rgb = 2560,
>>>>> @@ -244,11 +235,6 @@ static const u16
>>>>> tidss_j721e_common_regs[DISPC_COMMON_REG_TABLE_LEN] = {
>>>>>   };
>>>>>     const struct dispc_features dispc_j721e_feats = {
>>>>> -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_DPI] = 170000,
>>>>> -        [DISPC_VP_INTERNAL] = 600000,
>>>>> -    },
>>>>> -
>>>>>       .scaling = {
>>>>>           .in_width_max_5tap_rgb = 2048,
>>>>>           .in_width_max_3tap_rgb = 4096,
>>>>> @@ -315,11 +301,6 @@ const struct dispc_features dispc_j721e_feats = {
>>>>>   };
>>>>>     const struct dispc_features dispc_am625_feats = {
>>>>> -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_DPI] = 165000,
>>>>> -        [DISPC_VP_INTERNAL] = 170000,
>>>>> -    },
>>>>> -
>>>>>       .scaling = {
>>>>>           .in_width_max_5tap_rgb = 1280,
>>>>>           .in_width_max_3tap_rgb = 2560,
>>>>> @@ -376,15 +357,6 @@ const struct dispc_features dispc_am625_feats = {
>>>>>   };
>>>>>     const struct dispc_features dispc_am62a7_feats = {
>>>>> -    /*
>>>>> -     * if the code reaches dispc_mode_valid with VP1,
>>>>> -     * it should return MODE_BAD.
>>>>> -     */
>>>>> -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_TIED_OFF] = 0,
>>>>> -        [DISPC_VP_DPI] = 165000,
>>>>> -    },
>>>>> -
>>>>>       .scaling = {
>>>>>           .in_width_max_5tap_rgb = 1280,
>>>>>           .in_width_max_3tap_rgb = 2560,
>>>>> @@ -441,10 +413,6 @@ const struct dispc_features dispc_am62a7_feats
>>>>> = {
>>>>>   };
>>>>>     const struct dispc_features dispc_am62l_feats = {
>>>>> -    .max_pclk_khz = {
>>>>> -        [DISPC_VP_DPI] = 165000,
>>>>> -    },
>>>>> -
>>>>>       .subrev = DISPC_AM62L,
>>>>>         .common = "common",
>>>>> @@ -1347,25 +1315,57 @@ static void
>>>>> dispc_vp_set_default_color(struct dispc_device *dispc,
>>>>>               DISPC_OVR_DEFAULT_COLOR2, (v >> 32) & 0xffff);
>>>>>   }
>>>>>   +/*
>>>>> + * Calculate the percentage difference between the requested pixel
>>>>> clock rate
>>>>> + * and the effective rate resulting from calculating the clock
>>>>> divider value.
>>>>> + */
>>>>> +unsigned int dispc_pclk_diff(unsigned long rate, unsigned long
>>>>> real_rate)
>>>>> +{
>>>>> +    int r = rate / 100, rr = real_rate / 100;
>>>>> +
>>>>> +    return (unsigned int)(abs(((rr - r) * 100) / r));
>>>>> +}
>>>>> +
>>>>> +static int check_pixel_clock(struct dispc_device *dispc,
>>>>> +                 u32 hw_videoport, unsigned long clock)
>>>>> +{
>>>>> +    unsigned long round_clock;
>>>>> +
>>>>> +    if (dispc->tidss->is_ext_vp_clk[hw_videoport])
>>>>> +        return 0;
>>>>> +
>>>>> +    if (clock <= dispc->tidss->max_successful_rate[hw_videoport])
>>>>> +        return 0;
>>>>> +
>>>>> +    if (clock < dispc->tidss->max_attempted_rate[hw_videoport])
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    round_clock = clk_round_rate(dispc->vp_clk[hw_videoport], clock);
>>>>> +
>>>>> +    if (dispc_pclk_diff(clock, round_clock) > 5)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    dispc->tidss->max_successful_rate[hw_videoport] = round_clock;
>>>>> +    dispc->tidss->max_attempted_rate[hw_videoport] = clock;
>>>>
>>>> I still don't think this logic is sound. This is trying to find the
>>>> maximum clock rate, and optimize by avoiding the calls to
>>>> clk_round_rate() if possible. That makes sense.
>>>>
>>>> But checking for the 5% tolerance breaks it, in my opinion. If we find
>>>> out that the PLL can do, say, 100M, but we need pclk of 90M, the
>>>> current
>>>> maximum is still the 100M, isn't it?
>>>
>>> 5% is pretty large indeed. We've been using .5% in multiple drivers and
>>> it proved to be pretty ok. I would advise you tu use it too.
>>
>> The 5% comes from OMAP DSS, where we had to do pixel clock with a few
>> dividers and multipliers. The rates were quite coarse, and we ended up
>> having quite a large tolerance.
>>
>> I think with tidss, we always have a PLL we control, so we should always
>> have very exact clocks. So I'm fine with dropping it to .5%. However,
>> this patch and series is about removing the a-bit-too-hardcoded VP clk
>> max rate code in the driver, so I would leave everything else to another
>> series.
>>
>>> It's not clear to me why avoiding a clk_round_rate() call is something
>>> worth doing though?
>>
>> Hard to say if it's worth doing, someone should make some perf tests.
>> However, afaik, the calls do go to the firmware, so it involves
>> inter-processor calls. On OMAP DSS checking the clock rates was slow, as
>> it involved lots of iterating with dividers and multipliers. Perhaps
>> it's much faster here.
>>
>>> Even caching the maximum rate you have been able to reach before is
>>> pretty fragile: if the PLL changes its rate, or if a sibling clock has
>>> set some limits on what the PLL can do, your maximum isn't relevant
>>> anymore.
>>
>> You're right, although afaik it should not happen with TI's SoCs. We
>> would be in trouble anyway if that were the case (e.g. someone starts
>> the camera, and suddenly we can't support 1080p anymore).
>>
>>> in other words, what's wrong with simply calling clk_round_rate() and
>>> checking if it's within a .5% deviation?
>>
>> This started with discussions how to replace the hardcoded max VP clock
>> rate (used to quickly weed out impossible rates), which in reality was
>> actually PLL max clock rate. We don't know the PLL max rate, and can't
>> query it, so this approach was taken.
>>
>>> At the very least, this should be explained in comments or the commit
>>> message.
>>
>> I agree.
>>
>> Swamil, can you do some perf tests with clk_round_rate()? If it's fast
>> (enough), it will simplify the driver.
> 
> Average execution time is around 112 us.
> Trace file including the execution time for clk_round_rate(): https://
> gist.github.com/swamiljain/2abe86982cdeba1d69223d2d525e0cb6
> It is better to reduce calls to clk_round_rate().
> 
> Need your suggestions for a better approach.

We can cache the clk_round_rate calls. Checking my monitor, there are 36
modes it offers me, but only 20 different pclk rates. Also, we could
have multiple clk_round_rate calls happening in the driver for the same
mode, and that would also be handled.

Even if clk_round_rate takes a bit long, it only happens once (I hope)
when an app does a modeset, and multiple times when a display is
connected, I wonder if 100 us is an issue?

Just using clk_round_rate() without any tricks would simplify the driver
nicely, so I think we should try to see if we can get that working.

Do you know if there's anything to improve on the clock side, ti-sci or
firmare?

 Tomi