Message-ID: <20230227230502.GJ2948950@paulmck-ThinkPad-P17-Gen-1>
Date: Mon, 27 Feb 2023 15:05:02 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: Uladzislau Rezki <urezki@...il.com>,
"Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>, linux-kernel@...r.kernel.org,
Frederic Weisbecker <frederic@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
linux-doc@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as
completed
On Mon, Feb 27, 2023 at 02:10:30PM -0500, Joel Fernandes wrote:
> On Mon, Feb 27, 2023 at 1:57 PM Uladzislau Rezki <urezki@...il.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 01:27:20PM -0500, Joel Fernandes wrote:
> > >
> > >
> > > > On Feb 27, 2023, at 1:20 PM, Uladzislau Rezki <urezki@...il.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 01:15:47PM -0500, Joel Fernandes wrote:
> > > >>
> > > >>
> > > >>>> On Feb 27, 2023, at 1:06 PM, Uladzislau Rezki <urezki@...il.com> wrote:
> > > >>>
> > > >>> On Mon, Feb 27, 2023 at 10:16:51AM -0500, Joel Fernandes wrote:
> > > >>>>> On Mon, Feb 27, 2023 at 9:55 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > > >>>>>
> > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@...el.com> wrote:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> From: Joel Fernandes (Google) <joel@...lfernandes.org>
> > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM
> > > >>>>>>>> To: linux-kernel@...r.kernel.org
> > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@...lfernandes.org>; Frederic Weisbecker
> > > >>>>>>>> <frederic@...nel.org>; Lai Jiangshan <jiangshanlai@...il.com>; linux-
> > > >>>>>>>> doc@...r.kernel.org; Paul E. McKenney <paulmck@...nel.org>;
> > > >>>>>>>> rcu@...r.kernel.org
> > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as
> > > >>>>>>>> completed
> > > >>>>>>>>
> > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the
> > > >>>>>>>> boot has completed. It is difficult to determine if the system has really
> > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing
> > > >>>>>>>> down boot time if, say, a callback has been added that the boot
> > > >>>>>>>> synchronously depends on.
> > > >>>>>>>>
> > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to
> > > >>>>>>>> stay expedited for as long as the system is still booting.
> > > >>>>>>>>
> > > >>>>>>>> For these reasons, this commit adds a config option
> > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter
> > > >>>>>>>> rcupdate.boot_end_delay.
> > > >>>>>>>>
> > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value
> > > >>>>>>>> here to keep RCU from marking boot completion. The boot sequence will not
> > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed.
> > > >>>>>>>
> > > >>>>>>> Hi Joel,
> > > >>>>>>>
> > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-).
> > > >>>>>>>
> > > >>>>>>> Does an OS with a CONFIG_PREEMPT_RT=y kernel care more about
> > > >>>>>>> real-time latency than about the overall OS boot time?
> > > >>>>>>
> > > >>>>>> But every system has to boot, even an RT system.
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value
> > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels?
> > > >>>>>>
> > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs?
> > > >>>>>>
> > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system.
> > > >>>>>
> > > >>>>> Provide a /sys location that the userspace code writes to when it
> > > >>>>> is ready? Different systems with different hardware and software
> > > >>>>> configurations are going to take different amounts of time to boot,
> > > >>>>> correct?
> > > >>>>
> > > >>>> I could add a sysfs node, but I still wanted this patch as well
> > > >>>> because I am wary of systems where yet more userspace changes are
> > > >>>> required. I feel the kernel should itself be able to do this. Yes, it
> > > >>>> is possible the system completes "booting" at a different time than
> > > >>>> what the kernel thinks. But it does that anyway (even without this
> > > >>>> patch), so I am not seeing a good reason to not do this in the kernel.
> > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too
> > > >>>> long, then the patch will have no effect.
> > > >>>>
> > > >>>> Thoughts?
> > > >>>>
> > > >>> Why is "rcu_boot_ended" not enough? As I see it, right after that an "init"
> > > >>> process or shell or panic is going to be invoked by the kernel. It basically
> > > >>> indicates that the kernel is fully functional.
> > > >>>
> > > >>> Or is the idea to wait even further? Until all kernel modules are loaded by
> > > >>> user space.
> > > >>
> > > >> I mentioned in the commit message that it is daemons, userspace initialization, etc. There is a lot of userspace booting up as well and using the kernel while doing so.
> > > >>
> > > >> So, it does not make sense to me to mark the kernel as booted too early. And there is no harm in adding some built-in kernel hysteresis. What am I missing?
> > > >>
> > > > Then it is up to user space to decide when it is ready in terms of "boot completed".
> > >
> > > I don't know if you caught up with the other threads. See replies from Paul and my reply to that.
> > >
> > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse.
> > >
> > I saw that Paul proposed to have a sysfs attribute using which you can
> > send a notification.
>
> Maybe I am missing something but how will a sysfs node on its own work really?
>
> 1. delete kernel marking itself boot completed -- and then sysfs
> marks it completed?
>
> 2. delete kernel marking itself boot completed -- and then sysfs
> marks it completed, if sysfs does not come in within N seconds, then
> kernel marks as completed?
>
> #1 is a no go, that just means a bug waiting to happen if userspace
> forgets to write to sysfs.
>
> #2 is just an extension of this patch. So I can add a sysfs node on
> top of this. And we can make the minimum time as a long period of
> time, as you noted below:
>
> > IMHO, to me this patch does not provide a clear correlation between what
> > is a boot complete and when it occurs. A boot complete is a synchronous
> > event whereas the patch thinks that after some interval a "boot" is completed.
>
> But that is exactly how the kernel code is now without this patch, so
> it is already broken in that sense, I am not really breaking it more
> ;-)
>
> > We can imply that after, say 100 seconds an initialization of user space
> > is done. Maybe 100 seconds then? :)
>
> Yes I am Ok with that. So are you suggesting we change the default to
> 100 seconds and then add a sysfs node to mark as boot done whenever
> userspace notifies?

The combination of sysfs manipulated by userspace and a kernel failsafe
makes sense to me. Especially if by default triggering the failsafe
splats. That way, bugs where userspace fails to update the sysfs file
get caught.

The non-default silent-failsafe mode is also useful to allow some power
savings in advance of userspace getting the sysfs updating in place.

And of course the default splatting setup can be used in internal testing
with the release software being more tolerant of userspace foibles.

Thanx, Paul
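
For illustration only, the combination discussed above (userspace
notification plus a kernel failsafe that optionally splats) can be
modelled in a few lines of userspace Python. This is a rough sketch, not
kernel code and not part of the patch: the class name, the method names,
the warning text, and the 100-second default are all hypothetical, the
last one simply echoing the figure floated earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BootTracker:
    """Toy model of 'boot ended' tracking; all names here are hypothetical."""

    failsafe_ms: int = 100_000        # assumed failsafe deadline (100 seconds)
    splat_on_failsafe: bool = True    # default mode: warn when the failsafe fires
    boot_ended: bool = False
    ended_by: Optional[str] = None    # "userspace" or "failsafe"
    log: List[str] = field(default_factory=list)

    def _end(self, reason: str) -> None:
        self.boot_ended = True
        self.ended_by = reason

    def userspace_notify(self) -> None:
        """Models userspace writing to the sysfs file when it is ready."""
        if not self.boot_ended:
            self._end("userspace")

    def tick(self, now_ms: int) -> None:
        """Models a kernel-side check of the time elapsed since boot."""
        if self.boot_ended or now_ms < self.failsafe_ms:
            return
        if self.splat_on_failsafe:
            # Stand-in for a kernel warning; catches a missing sysfs write.
            self.log.append(
                f"WARN: boot end forced by failsafe after {now_ms} ms"
            )
        self._end("failsafe")
```

With the default settings, a userspace that never calls
userspace_notify() leaves one warning in the log once tick() passes the
deadline. Setting splat_on_failsafe=False gives the silent-failsafe mode,
where boot is still marked ended at the deadline but nothing is logged.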