[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A6F550BE-D7F6-4FE1-8FED-85AA69835FFB@lca.pw>
Date: Tue, 10 Dec 2019 07:11:37 -0500
From: Qian Cai <cai@....pw>
To: Ryan Chen <yu.chen.surf@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Reinette Chatre <reinette.chatre@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
"H. Peter Anvin" <hpa@...or.com>,
John Stultz <john.stultz@...aro.org>, sboyd@...nel.org,
Tony Luck <tony.luck@...el.com>, tj@...nel.org,
the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/resctrl: fix an imbalance in domain_remove_cpu
> On Dec 10, 2019, at 2:44 AM, Ryan Chen <yu.chen.surf@...il.com> wrote:
>
> Could you elaborate a little more on the failure symptom?
> If I understand correctly, the error you described was due to
> r->mon_capable set to false while is_mbm_enabled() returns true?
Yes.
> Which means on this platform rdt_mon_features is non zero?
No idea. I did add some debug code that indicated resctrl_online_cpu() found 3 resources in for_each_capable_rdt_resource(r). Only the first one set r->mon_capable.
> And in get_rdt_mon_resources() it will invoke rdt_get_mon_l3_config(),
> however the only possible failure to do not set r->mon_capable is that it
> failed in dom_data_init() due to kcalloc() failure? Then the logic in
Very likely. Should be easy to confirm.
> get_rdt_resources() is that it will ignore the return error if rdt allocate
> feature is supported on this platform? If
Yes.
> this is the case, the r->mon_capable
> is not an indicator for whether the overflow thread has been created, right?
I am not sure about that.
> Can we simply remove the check of r->mon_capable in domain_add_cpu() and
> invoke domain_setup_mon_state() directly?
That should work too, but it is so perfect align with the r->alloc_capable check above, so I am not sure it is a good idea to break it.
> Humm, it looks like there are two places within this function
> invoked cancel_delayed_work(&d->mbm_over),
> why not adding the check for both of them?
Because I am not sure about the second one. It was never executed due to “cpu != d->mbm_work_cpu“ even after offlining all CPUs except cpu 0 and never cause anything wrong yet, so I could not test it yet, but I can see why it might need a similar check too if d->mbm_work_cpu is non-zero and could trigger the same imbalance.
Powered by blists - more mailing lists