[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220630200717.zlc6z6zcqbsw7euk@bogus>
Date: Thu, 30 Jun 2022 21:07:17 +0100
From: Sudeep Holla <sudeep.holla@....com>
To: Conor.Dooley@...rochip.com
Cc: linux-kernel@...r.kernel.org, gregkh@...uxfoundation.org,
atishp@...shpatra.org, atishp@...osinc.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
wangqing@...o.com, robh+dt@...nel.org, rafael@...nel.org,
ionela.voinescu@....com, pierre.gondois@....com,
linux-arm-kernel@...ts.infradead.org,
linux-riscv@...ts.infradead.org, gshan@...hat.com,
Valentina.FernandezAlanis@...rochip.com
Subject: Re: [PATCH v5 09/19] arch_topology: Use the last level cache
information from the cacheinfo
On Thu, Jun 30, 2022 at 07:20:04PM +0000, Conor.Dooley@...rochip.com wrote:
>
>
> On 30/06/2022 18:35, Sudeep Holla wrote:
> > On Thu, Jun 30, 2022 at 04:37:50PM +0000, Conor.Dooley@...rochip.com wrote:
> >> On 30/06/2022 11:39, Sudeep Holla wrote:
> >>>
> >>> I can't think of any reason for that to happen unless detect_cache_attributes
> >>> is failing from init_cpu_topology and we are ignoring that.
> >>>
> >>> Are all RISC-V platforms failing on -next or is it just this platform ?
> >>
> >> I don't know. I only have SoCs with this core complex & one that does not
> >> work with upstream. I can try my other board with this SoC - but I am on
> >> leave at the moment w/ a computer or internet during the day so it may be
> >> a few days before I can try it.
> >>
> >
> > Sure, no worries.
> >
> >> However, Niklas Cassel has tried to use the Canaan K210 on next-20220630
> >> but had issues with RCU stalling:
> >> https://lore.kernel.org/linux-riscv/Yr3PKR0Uj1bE5Y6O@x1-carbon/T/#m52016996fcf5fa0501066d73352ed8e806803e06
> >> Not going to claim any relation, but that's minus 1 to the platforms that
> >> can be used to test this on upstream RISC-V.
> >>
> >
> > Ah OK, will check and ask full logs to see if there is any relation.
> >
> >>> We may have to try with some logs in detect_cache_attributes,
> >>> last_level_cache_is_valid and last_level_cache_is_shared to check where it
> >>> is going wrong.
> >>>
> >>> It must be crashing in smp_callin->update_siblings_masks->last_level_cache_is_shared
>
>
> So, looks like there's a problem in cache_leaves_are_shared() which is hit
> by the above path. Both of the if clauses are false, and the function falls
> through to return sib_leaf->fw_token == this_leaf->fw_token;
Both if() failing is expected and that statement
return sib_leaf->fw_token == this_leaf->fw_token;
execution is correct.
> Both sib_leaf & this_leaf seem to be null.
>
But this is wrong as last_level_cache_is_shared checks for
last_level_cache_is_valid which must return false if the fw_token = NULL
So we must not hit the above return statement with NULL fw_token.
> static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
> struct cacheinfo *sib_leaf)
> {
> /*
> * For non DT/ACPI systems, assume unique level 1 caches,
> * system-wide shared caches for all other levels. This will be used
> * only if arch specific code has not populated shared_cpu_map
> */
> if (!(IS_ENABLED(CONFIG_OF) || IS_ENABLED(CONFIG_ACPI)))
> return !(this_leaf->level == 1);
>
> if ((sib_leaf->attributes & CACHE_ID) &&
> (this_leaf->attributes & CACHE_ID))
> return sib_leaf->id == this_leaf->id;
>
> return sib_leaf->fw_token == this_leaf->fw_token;
> }
>
> Any ideas what to look at next?
I wonder how did we not get last_level_cache_is_valid as false if the
fw_node is NULL. But it should not be NULL at the first place.
--
Regards,
Sudeep
Powered by blists - more mailing lists