lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 18 Oct 2022 09:06:53 +0200
From:   Maxime Ripard <maxime@...no.tech>
To:     Stephen Boyd <sboyd@...nel.org>
Cc:     AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>, mturquette@...libre.com,
        matthias.bgg@...il.com, chun-jie.chen@...iatek.com,
        miles.chen@...iatek.com, wenst@...omium.org,
        linux-clk@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] clk: mediatek: clk-mux: Add .determine_rate() callback

Hi Stephen,

On Fri, Oct 14, 2022 at 12:36:50PM -0700, Stephen Boyd wrote:
> Quoting Maxime Ripard (2022-10-12 06:56:19)
> > 
> > I think we need to address this in multiple ways:
> > 
> > - If you have ftrace enabled on that platform, running with "tp_printk
> >   trace_event=clk:*" in the kernel command line on a failing kernel would
> >   be great, so that we can figure out what is happening exactly.
> > 
> > - We really need to merge your patch above as well;
> > 
> > - And, if we can, we should forbid to register a mux without a
> >   determine_rate implementation. We have to take into account that some
> >   muxes are going to be RO and won't need a determine_rate
> >   implementation at all, but a clock with a set_parent and without a
> >   determine_rate seems like a weird combination.
> > 
> 
> So should I apply this patch now on clk-next? Given this regression I'm
> leaning towards not sending off the clk rate request this merge window
> and letting it bake another cycle.

I saw that you sent the PR, thanks

We spent some time with Angelo yesterday debugging this, and it's still
not clear to me what happens.

He provided a significant amount of traces, and we first checked that
the round_rate part of set_rate was returning something meaningful, and
it does. The rate is fine, the parent is too, everything's great.

Next we checked the decisions by clk_calc_new_rates, and it does return
the proper top clock, and its proper rate.

Finally, we hooked into clk_change_rate() to see what kind of decision
it was enforcing, and it seems to be ok as well. It doesn't change
parent, and it sets the proper rate, in both cases.

There's still one thing we haven't checked: one of the clock in the tree
(the parent of the one we want to change the rate on, and it has
SET_RATE_PARENT) has a notifier. As we've had a bug recently over this
I've not ruled out that this could be a similar bug.

I don't really think it is though, since the notifier callback doesn't
use the data provided by the framework:
https://elixir.bootlin.com/linux/v6.1-rc1/source/drivers/clk/mediatek/clk-mux.c#L279

I've pushed a branch for Angelo to test, just to confirm.

So... yeah. I can't explain the regression at this point. Do you have an
idea?

The good news is, since you merged this patch the regression is
invisible now to that platform. We still could encounter it on another
platform, but maybe it will also have a more obvious setup to replicate?

Thanks!
Maxime

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ