lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <E5E8224E-B5C4-4796-8ACA-F2E26BEA1374@joelfernandes.org>
Date:   Sun, 5 Mar 2023 10:03:12 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     linux-kernel@...r.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linux-doc@...r.kernel.org, "Paul E. McKenney" <paulmck@...nel.org>,
        rcu@...r.kernel.org
Subject: Re: [PATCH v3] rcu: Add a minimum time for marking boot as completed



> On Mar 5, 2023, at 6:39 AM, Uladzislau Rezki <urezki@...il.com> wrote:
> 
> On Fri, Mar 03, 2023 at 09:38:51PM +0000, Joel Fernandes (Google) wrote:
>> On many systems, a great deal of boot (in userspace) happens after the
>> kernel thinks the boot has completed. It is difficult to determine if
>> the system has really booted from the kernel side. Some features like
>> lazy-RCU can risk slowing down boot time if, say, a callback has been
>> added that the boot synchronously depends on. Further expedited callbacks
>> can get unexpedited way earlier than it should be, thus slowing down
>> boot (as shown in the data below).
>> 
>> For these reasons, this commit adds a config option
>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
>> Userspace can also make RCU's view of the system as booted, by writing the
>> time in milliseconds to: /sys/module/rcupdate/parameters/rcu_boot_end_delay
>> Or even just writing a value of 0 to this sysfs node.
>> However, under no circumstance will the boot be allowed to end earlier
>> than just before init is launched.
>> 
>> The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
>> suites ChromeOS and also a PREEMPT_RT system below very well, which need
>> no config or parameter changes, and just a simple application of this patch. A
>> system designer can also choose a specific value here to keep RCU from marking
>> boot completion.  As noted earlier, RCU's perspective of the system as booted
>> will not be marker until at least rcu_boot_end_delay milliseconds have passed
>> or an update is made via writing a small value (or 0) in milliseconds to:
>> /sys/module/rcupdate/parameters/rcu_boot_end_delay.
>> 
>> One side-effect of this patch is, there is a risk that a real-time workload
>> launched just after the kernel boots will suffer interruptions due to expedited
>> RCU, which previous ended just before init was launched. However, to mitigate
>> such an issue (however unlikely), the user should either tune
>> CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
>> of 0 to /sys/module/rcupdate/parameters/rcu_boot_end_delay, once userspace
>> boots, and before launching the real-time workload.
>> 
>> Qiuxu also noted impressive boot-time improvements with earlier version
>> of patch. An excerpt from the data he shared:
>> 
>> 1) Testing environment:
>>    OS            : CentOS Stream 8 (non-RT OS)
>>    Kernel     : v6.2
>>    Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
>>    Qemu  args  : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …
>> 
>> 2) OS boot time definition:
>>    The time from the start of the kernel boot to the shell command line
>>    prompt is shown from the console. [ Different people may have
>>    different OS boot time definitions. ]
>> 
>> 3) Measurement method (very rough method):
>>    A timer in the kernel periodically prints the boot time every 100ms.
>>    As soon as the shell command line prompt is shown from the console,
>>    we record the boot time printed by the timer, then the printed boot
>>    time is the OS boot time.
>> 
>> 4) Measured OS boot time (in seconds)
>>   a) Measured 10 times w/o this patch:
>>        8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
>>        The average OS boot time was: ~8.7s
>> 
>>   b) Measure 10 times w/ this patch:
>>        8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
>>        The average OS boot time was: ~8.3s.
>> 
>> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
>> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>> ---
>> v1->v2:
>>    Update some comments and description.
>> v2->v3:
>>        Add sysfs param, and update with Test data.
>> 
>> .../admin-guide/kernel-parameters.txt         | 12 ++++
>> cc_list                                       |  8 +++
>> kernel/rcu/Kconfig                            | 19 ++++++
>> kernel/rcu/update.c                           | 68 ++++++++++++++++++-
>> 4 files changed, 106 insertions(+), 1 deletion(-)
>> create mode 100644 cc_list
>> 
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 2429b5e3184b..611de90d9c13 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -5085,6 +5085,18 @@
>>    rcutorture.verbose= [KNL]
>>            Enable additional printk() statements.
>> 
>> +    rcupdate.rcu_boot_end_delay= [KNL]
>> +            Minimum time in milliseconds that must elapse
>> +            before the boot sequence can be marked complete
>> +            from RCU's perspective, after which RCU's behavior
>> +            becomes more relaxed. The default value is also
>> +            configurable via CONFIG_RCU_BOOT_END_DELAY.
>> +            Userspace can also mark the boot as completed
>> +            sooner by writing the time in milliseconds, say once
>> +            userspace considers the system as booted, to:
>> +            /sys/module/rcupdate/parameters/rcu_boot_end_delay
>> +            Or even just writing a value of 0 to this sysfs node.
>> +
>>    rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
>>            Dump ftrace buffer after reporting RCU CPU
>>            stall warning.
>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>> index 9071182b1284..4b5ffa36cbaf 100644
>> --- a/kernel/rcu/Kconfig
>> +++ b/kernel/rcu/Kconfig
>> @@ -217,6 +217,25 @@ config RCU_BOOST_DELAY
>> 
>>      Accept the default if unsure.
>> 
>> +config RCU_BOOT_END_DELAY
>> +    int "Minimum time before RCU may consider in-kernel boot as completed"
>> +    range 0 120000
>> +    default 15000
>> +    help
>> +      Default value of the minimum time in milliseconds that must elapse
>> +      before the boot sequence can be marked complete from RCU's perspective,
>> +      after which RCU's behavior becomes more relaxed.
>> +      Userspace can also mark the boot as completed sooner than this default
>> +      by writing the time in milliseconds, say once userspace considers
>> +      the system as booted, to: /sys/module/rcupdate/parameters/rcu_boot_end_delay.
>> +      Or even just writing a value of 0 to this sysfs node.
>> +
>> +      The actual delay for RCU's view of the system to be marked as booted can be
>> +      higher than this value if the kernel takes a long time to initialize but it
>> +      will never be smaller than this value.
>> +
>> +      Accept the default if unsure.
>> +
>> config RCU_EXP_KTHREAD
>>    bool "Perform RCU expedited work in a real-time kthread"
>>    depends on RCU_BOOST && RCU_EXPERT
>> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
>> index 19bf6fa3ee6a..93138c92136e 100644
>> --- a/kernel/rcu/update.c
>> +++ b/kernel/rcu/update.c
>> @@ -224,18 +224,84 @@ void rcu_unexpedite_gp(void)
>> }
>> EXPORT_SYMBOL_GPL(rcu_unexpedite_gp);
>> 
>> +/*
>> + * Minimum time in milliseconds until RCU can consider in-kernel boot as
>> + * completed.  This can also be tuned at runtime to end the boot earlier, by
>> + * userspace init code writing the time in milliseconds (even 0) to:
>> + * /sys/module/rcupdate/parameters/rcu_boot_end_delay
>> + */
>> +static int rcu_boot_end_delay = CONFIG_RCU_BOOT_END_DELAY;
>> +
>> static bool rcu_boot_ended __read_mostly;
>> +static bool rcu_boot_end_called __read_mostly;
>> +static DEFINE_MUTEX(rcu_boot_end_lock);
>> +
>> +static int param_set_rcu_boot_end(const char *val, const struct kernel_param *kp)
>> +{
>> +    uint end_ms;
>> +    int ret = kstrtouint(val, 0, &end_ms);
>> +
>> +    if (ret)
>> +        return ret;
>> +    WRITE_ONCE(*(uint *)kp->arg, end_ms);
>> +
>> +    /*
>> +     * rcu_end_inkernel_boot() should be called at least once during init
>> +     * before we can allow param changes to end the boot.
>> +     */
>> +    mutex_lock(&rcu_boot_end_lock);
>> +    rcu_boot_end_delay = end_ms;
>> +    if (!rcu_boot_ended && rcu_boot_end_called) {
>> +        mutex_unlock(&rcu_boot_end_lock);
>> +        rcu_end_inkernel_boot();
>> +    }
>> +    mutex_unlock(&rcu_boot_end_lock);
>> +    return ret;
>> +}
>> +
>> +static const struct kernel_param_ops rcu_boot_end_ops = {
>> +    .set = param_set_rcu_boot_end,
>> +    .get = param_get_uint,
>> +};
>> +module_param_cb(rcu_boot_end_delay, &rcu_boot_end_ops, &rcu_boot_end_delay, 0644);
>> 
>> /*
>> - * Inform RCU of the end of the in-kernel boot sequence.
>> + * Inform RCU of the end of the in-kernel boot sequence. The boot sequence will
>> + * not be marked ended until at least rcu_boot_end_delay milliseconds have passed.
>>  */
>> +void rcu_end_inkernel_boot(void);
>> +static void rcu_boot_end_work_fn(struct work_struct *work)
>> +{
>> +    rcu_end_inkernel_boot();
>> +}
>> +static DECLARE_DELAYED_WORK(rcu_boot_end_work, rcu_boot_end_work_fn);
>> +
>> void rcu_end_inkernel_boot(void)
>> {
>> +    mutex_lock(&rcu_boot_end_lock);
>> +    rcu_boot_end_called = true;
>> +
>> +    if (rcu_boot_ended)
>> +        return;
>> +
>> +    if (rcu_boot_end_delay) {
>> +        u64 boot_ms = div_u64(ktime_get_boot_fast_ns(), 1000000UL);
>> +
>> +        if (boot_ms < rcu_boot_end_delay) {
>> +            schedule_delayed_work(&rcu_boot_end_work,
>> +                    rcu_boot_end_delay - boot_ms);
> <snip>
> urezki@...38:~/data/raid0/coding/linux-rcu.git$ git diff
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 93138c92136e..93f426f0f4ec 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -289,7 +289,7 @@ void rcu_end_inkernel_boot(void)
> 
>                if (boot_ms < rcu_boot_end_delay) {
>                        schedule_delayed_work(&rcu_boot_end_work,
> -                                       rcu_boot_end_delay - boot_ms);
> +                               msecs_to_jiffies(rcu_boot_end_delay - boot_ms));
>                        mutex_unlock(&rcu_boot_end_lock);
>                        return;
>                }
> urezki@...38:~/data/raid0/coding/linux-rcu.git$
> <snip>
> 
> I think you need to apply above patch. I am not sure maybe Paul
> has already mentioned about it. But just in case.

Ah, the reason my testing did not catch it is because for HZ=1000, msecs
and jiffies are the same.

Great eyes and thank you Vlad, I’ll make the fix and repost it.

 - Joel

> 
> --
> Uladzislau Rezki

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ