[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Zrj2v4IFbS/1yPnz@vaxr-BM6660-BM6360>
Date: Mon, 12 Aug 2024 01:37:03 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: Yury Norov <yury.norov@...il.com>
Cc: andriy.shevchenko@...ux.intel.com, linux@...musvillemoes.dk,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpumask: Ensure the visibility of set_nr_cpu_ids
On Sun, Aug 11, 2024 at 08:18:01AM -0700, Yury Norov wrote:
> On Sun, Aug 11, 2024 at 05:25:01PM +0800, I Hsin Cheng wrote:
> > The variable "nr_cpu_ids" is a system-wide variable which should be seen
> > as consistent all the time. For example it's set in one of the kernel
> > setup procedure "prefill_possible_map", the operations here should
> > happens before all the code after setup, which means the operations here
> > should be visible to all the code after setup.
> >
> > set_cpu_possible() ensure it's visibility because it eventually falls
> > into an atomic instruction, however the function "set_nr_cpu_ids()"
> > fails to make the guarantee since it only performs a normal write
> > operations.
>
> Set_cpu_possible() is a completely different thing.
>
> > Adding the macro "WRITE_ONCE()" will prevent the compiler from re-order
> > the instruction of the write operation for "nr_cpu_ids", so we can
> > guarantee the operation is visible to all the codes coming after it.
> >
> > Signed-off-by: I Hsin Cheng <richard120310@...il.com>
>
> I don't understand this. nr_cpu_ids is initialized at compile time
> to NR_CPUS, to represent maximum number of bits in cpumasks.
>
> Later on runtime we update nr_cpu_ids with an actual number of possible
> CPUs in the system. The type of the variable is unsigned int, and it
> means that threads accessing it will either fetch NR_CPUS, or new value
> coherently.
>
> Having nr_cpu_ids == NR_CPUS is not an error, it's just a non-optimal
> value. The only effect of it is that kernel algorithms traverse unused
> part of cpumasks for the first few microseconds after boot.
>
> Can you explain in details what type of race you're trying to fix?
> Which architecture? What is the race scenario?
>
> > ---
> > include/linux/cpumask.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index f10fb87d4..3731f5e43 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -46,7 +46,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
> > #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
> > WARN_ON(nr != nr_cpu_ids);
> > #else
> > - nr_cpu_ids = nr;
> > + WRITE_ONCE(nr_cpu_ids, nr);
>
>
> WRITE_ONCE()? How is that supposed to work? The only possible effect
> would be reordering of a couple of instructions. How would that help
> threads running on other CPUs synchronize any better?
>
> Regardless, WRITE_ONCE() should always be paired with READ_ONCE() to
> make it working. So, if we take this, we should also make every read of
> nr_cpu_ids by using READ_ONCE(). nr_cpu_ids is used in fast paths in
> many places, particularly as loop termination condition. Things like
> this:
>
> while (cpu < READ_ONCE(nr_cpu_ids))
> do_something_very_quick();
>
> would definitely hit performance.
>
> Thanks,
> Yury
>
> > #endif
> > }
> >
> > --
> > 2.34.1
Thanks for the reply and your detailed explanation, I wasn't trying to
fix any race conditions, I was thinking the same as you mentioned that
some of the cpumask's operation would traverse more times than it
actually needs to. I took set_cpu_possible() as an example trying to
express I think it would be nice for nr_cpu_ids to make synchronized
guarantees across CPUs, as WRITE_ONCE() would also provide write memory
barrier.
> Regardless, WRITE_ONCE() should always be paired with READ_ONCE() to
> make it working. So, if we take this, we should also make every read of
> nr_cpu_ids by using READ_ONCE(). nr_cpu_ids is used in fast paths in
> many places, particularly as loop termination condition. Things like
> this:
>
> while (cpu < READ_ONCE(nr_cpu_ids))
> do_something_very_quick();
>
> would definitely hit performance.
Indeed if WRITCE_ONCE() is going to be applied here, every read of
nr_cpu_ids will have to use READ_ONCE() and it'll harm the fast path's
performance. I didn't considered this part, thanks for pointing it out.
Comparing to the number of redundant traversal, I think the performance
hit introduced by READ_ONCE() in fast path will be larger, so we should
stick to the same way. I learned alot and should reconsidered many parts
of my former knowledge about cpumask from your reply.
Thanks again for your detail explanation and kindly reply !
Best regards,
I-Hsin Cheng
Powered by blists - more mailing lists