Message-ID: <20221012094004.jgiyvmbgomiyedik@houat>
Date: Wed, 12 Oct 2022 11:40:04 +0200
From: Maxime Ripard <maxime@...no.tech>
To: AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>
Cc: sboyd@...nel.org, mturquette@...libre.com, matthias.bgg@...il.com,
chun-jie.chen@...iatek.com, miles.chen@...iatek.com,
wenst@...omium.org, linux-clk@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] clk: mediatek: clk-mux: Add .determine_rate() callback
On Wed, Oct 12, 2022 at 11:09:59AM +0200, AngeloGioacchino Del Regno wrote:
> > On 12/10/22 10:55, Maxime Ripard wrote:
> > Hi,
> >
> > On Tue, Oct 11, 2022 at 03:55:48PM +0200, AngeloGioacchino Del Regno wrote:
> > > Since commit 262ca38f4b6e ("clk: Stop forwarding clk_rate_requests
> > > to the parent"), the clk_rate_request is, as the title says, no longer
> > > forwarded to the parent:
> >
> > That's not entirely true: the rate request should still be forwarded, but
> > we no longer pass the same clk_rate_request instance.
> >
> > > this produces an issue with the MediaTek clock MUX driver during GPU
> > > DVFS on MT8195, but not on MT8192 or others.
> > >
> > > This is because, unlike other SoCs such as MT8192, where all of the
> > > clocks in the MFG parent tree are of the mtk_mux type, the parent tree
> > > of MT8195's MFG clock contains one mtk_mux clock and one (clk framework
> > > generic) mux clock, like so:
> > >
> > > names: mfg_bg3d -> mfg_ck_fast_ref -> top_mfg_core_tmp (or) mfgpll
> > > types: mtk_gate -> mux -> mtk_mux (or) mtk_pll
> > >
> > > To solve this issue and also keep the GPU DVFS clock code working as
> > > expected, wire up a .determine_rate() callback for the mtk_mux ops;
> > > for that, the standard clk_mux_determine_rate_flags() helper was used,
> > > since it fits this case.
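> > >
> > > Roughly, the wiring could look like this (a sketch only; the ops struct
> > > name and the flags value are illustrative, not necessarily the exact
> > > driver code):
> > >
> > > static int mtk_clk_mux_determine_rate(struct clk_hw *hw,
> > >                                       struct clk_rate_request *req)
> > > {
> > >         /*
> > >          * Delegate to the generic mux helper, just like a standard clk
> > >          * framework mux would. 0 is used for the flags here only as an
> > >          * illustration; the real callback would pass the mux's own clk
> > >          * flags, if any.
> > >          */
> > >         return clk_mux_determine_rate_flags(hw, req, 0);
> > > }
> > >
> > > static const struct clk_ops mtk_mux_ops = {
> > >         /* ... existing parent get/set callbacks ... */
> > >         .determine_rate = mtk_clk_mux_determine_rate,
> > > };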
> >
> > It probably does fix things, but I'm a bit worried that it just works
> > around the issue instead of fixing the actual bug...
> >
> > > This commit was successfully tested on MT6795 Xperia M5, MT8173 Elm,
> > > MT8192 Spherion and MT8195 Tomato; no regressions were seen.
> > >
> > > For the sake of documentation, here is the trace of the issue:
> > >
> > > [ 12.211587] ------------[ cut here ]------------
> > > [ 12.211589] WARNING: CPU: 6 PID: 78 at drivers/clk/clk.c:1462 clk_core_init_rate_req+0x84/0x90
> > > [ 12.211593] Modules linked in: stp crct10dif_ce mtk_adsp_common llc rfkill snd_sof_xtensa_dsp
> > > panfrost(+) sbs_battery cros_ec_lid_angle cros_ec_sensors snd_sof_of
> > > cros_ec_sensors_core hid_multitouch cros_usbpd_logger snd_sof gpu_sched
> > > snd_sof_utils fuse ipv6
> > > [ 12.211614] CPU: 6 PID: 78 Comm: kworker/u16:2 Tainted: G W 6.0.0-next-20221011+ #58
> > > [ 12.211616] Hardware name: Acer Tomato (rev2) board (DT)
> > > [ 12.211617] Workqueue: devfreq_wq devfreq_monitor
> > > [ 12.211620] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [ 12.211622] pc : clk_core_init_rate_req+0x84/0x90
> > > [ 12.211625] lr : clk_core_forward_rate_req+0xa4/0xe4
> > > [ 12.211627] sp : ffff80000893b8e0
> > > [ 12.211628] x29: ffff80000893b8e0 x28: ffffdddf92f9b000 x27: ffff46a2c0e8bc05
> > > [ 12.211632] x26: ffff46a2c1041200 x25: 0000000000000000 x24: 00000000173eed80
> > > [ 12.211636] x23: ffff80000893b9c0 x22: ffff80000893b940 x21: 0000000000000000
> > > [ 12.211641] x20: ffff46a2c1039f00 x19: ffff46a2c1039f00 x18: 0000000000000000
> > > [ 12.211645] x17: 0000000000000038 x16: 000000000000d904 x15: 0000000000000003
> > > [ 12.211649] x14: ffffdddf9357ce48 x13: ffffdddf935e71c8 x12: 000000000004803c
> > > [ 12.211653] x11: 00000000a867d7ad x10: 00000000a867d7ad x9 : ffffdddf90c28df4
> > > [ 12.211657] x8 : ffffdddf9357a980 x7 : 0000000000000000 x6 : 0000000000000004
> > > [ 12.211661] x5 : ffffffffffffffc8 x4 : 00000000173eed80 x3 : ffff80000893b940
> > > [ 12.211665] x2 : 00000000173eed80 x1 : ffff80000893b940 x0 : 0000000000000000
> > > [ 12.211669] Call trace:
> > > [ 12.211670] clk_core_init_rate_req+0x84/0x90
> > > [ 12.211673] clk_core_round_rate_nolock+0xe8/0x10c
> > > [ 12.211675] clk_mux_determine_rate_flags+0x174/0x1f0
> > > [ 12.211677] clk_mux_determine_rate+0x1c/0x30
> > > [ 12.211680] clk_core_determine_round_nolock+0x74/0x130
> > > [ 12.211682] clk_core_round_rate_nolock+0x58/0x10c
> > > [ 12.211684] clk_core_round_rate_nolock+0xf4/0x10c
> > > [ 12.211686] clk_core_set_rate_nolock+0x194/0x2ac
> > > [ 12.211688] clk_set_rate+0x40/0x94
> > > [ 12.211691] _opp_config_clk_single+0x38/0xa0
> > > [ 12.211693] _set_opp+0x1b0/0x500
> > > [ 12.211695] dev_pm_opp_set_rate+0x120/0x290
> > > [ 12.211697] panfrost_devfreq_target+0x3c/0x50 [panfrost]
> > > [ 12.211705] devfreq_set_target+0x8c/0x2d0
> > > [ 12.211707] devfreq_update_target+0xcc/0xf4
> > > [ 12.211708] devfreq_monitor+0x40/0x1d0
> > > [ 12.211710] process_one_work+0x294/0x664
> > > [ 12.211712] worker_thread+0x7c/0x45c
> > > [ 12.211713] kthread+0x104/0x110
> > > [ 12.211716] ret_from_fork+0x10/0x20
> > > [ 12.211718] irq event stamp: 7102
> > > [ 12.211719] hardirqs last enabled at (7101): [<ffffdddf904ea5a0>] finish_task_switch.isra.0+0xec/0x2f0
> > > [ 12.211723] hardirqs last disabled at (7102): [<ffffdddf91794b74>] el1_dbg+0x24/0x90
> > > [ 12.211726] softirqs last enabled at (6716): [<ffffdddf90410be4>] __do_softirq+0x414/0x588
> > > [ 12.211728] softirqs last disabled at (6507): [<ffffdddf904171d8>] ____do_softirq+0x18/0x24
> > > [ 12.211730] ---[ end trace 0000000000000000 ]---
> >
> > ... Indeed, you shouldn't hit that warning at all. It happens in
> > clk_core_round_rate_nolock(), which (before your patch) takes the
> > CLK_SET_RATE_PARENT branch. That branch has indeed been changed by the
> > patch you mentioned and will now call clk_core_forward_rate_req(), which
> > in turn calls clk_core_init_rate_req().
> >
> > I think the warning you hit is because core->parent is NULL, which is
> > passed to clk_core_forward_rate_req() as the parent argument, and we'll
> > call clk_core_init_rate_req() with parent set as the core argument.
> >
> > In clk_core_init_rate_req(), the first thing we do is a WARN_ON(!core),
> > which is what you hit here I think.
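> >
> > Condensed, the path is roughly the following (paraphrased, argument lists
> > abbreviated; not the verbatim clk.c source):
> >
> > /*
> >  * clk_core_round_rate_nolock(), CLK_SET_RATE_PARENT branch, after
> >  * commit 262ca38f4b6e: build a new request for the parent and forward it.
> >  * core->parent is NULL for this clock.
> >  */
> > clk_core_forward_rate_req(core, req, core->parent, &parent_req, ...);
> >
> > /*
> >  * clk_core_forward_rate_req() initialises the new request for the parent,
> >  * so the NULL parent ends up as the "core" argument here:
> >  */
> > clk_core_init_rate_req(parent, &parent_req, ...);
> >
> > /*
> >  * clk_core_init_rate_req() then immediately trips the warning seen at
> >  * drivers/clk/clk.c:1462 in the trace above:
> >  */
> > if (WARN_ON(!core))
> >         return;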
> >
> > This is different from the previous behavior, which called
> > clk_core_round_rate_nolock() with core->parent directly; and if its core
> > argument is NULL, clk_core_round_rate_nolock() sets req->rate to 0 and
> > bails out without returning an error.
> >
> > Your patch probably works because, now that you provide a
> > determine_rate implementation, clk_core_can_round() returns true and you
> > take a different branch in clk_core_round_rate_nolock(), avoiding that
> > issue entirely.
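> >
> > That is, roughly (again paraphrased, not the verbatim clk.c source):
> >
> > static bool clk_core_can_round(struct clk_core * const core)
> > {
> >         return core->ops->determine_rate || core->ops->round_rate;
> > }
> >
> > /* in clk_core_round_rate_nolock() */
> > if (clk_core_can_round(core))
> >         /* with your patch: mtk_mux has .determine_rate, so we round here */
> >         return clk_core_determine_round_nolock(core, req);
> >
> > /*
> >  * Only without a determine_rate/round_rate op do we fall through to the
> >  * CLK_SET_RATE_PARENT forwarding path that triggered the warning above.
> >  */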
> >
> > Does that patch work better (on top of next-20221012)?
>
> I admit I didn't dig too deep, as my brain processed this as "this is a
> mux clock, not really different from a standard mux, this callback is
> missing, that's not optimal"... then the patch fixed it and I called it a day.
>
> I should have dug deeper to get a better understanding of what was
> actually going on.
No worries :)
> What you said actually opened my mind and, unsurprisingly, your patch
> works as well as mine - no warnings and the clock scales as expected!
I'm actually wondering whether you encountered two separate issues. What
kernel were you testing before? If it's older than today's next
(next-20221012), you're likely missing
https://lore.kernel.org/linux-clk/20221010-rpi-clk-fixes-again-v1-0-d87ba82ac404@cerno.tech/
which is likely what fixed the clock scaling, while my patch only fixes
the warning. Could you test next-20221012? If I'm right, you should only
get the warning.
> I still think that the mtk-mux driver should get a determine_rate callback but,
> at this point, that's going to have an entirely different commit description...
Yeah, it might, but as you said, it's a separate discussion.
> Please go ahead and send your patch: if you can, please remember to add me
> to Cc, so that I can give you my R-b tag in a timely manner.
Thanks!
Maxime