Message-ID: <3FF1C1CB-ACE5-485A-8BFB-AFE561CA6EC1@oracle.com>
Date: Fri, 23 Jan 2026 17:41:16 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Arnd Bergmann <arnd@...db.de>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
Randy Dunlap <rdunlap@...radead.org>, Ron Geva <rongevarg@...il.com>,
Waiman Long <longman@...hat.com>
Subject: Re: [patch V6 07/11] rseq: Implement time slice extension enforcement timer
> On Jan 17, 2026, at 1:57 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Dec 15, 2025 at 05:52:22PM +0100, Thomas Gleixner wrote:
>
>> +rseq_slice_extension_nsec
>> +=========================
>> +
>> +A task can request to delay its scheduling if it is in a critical section
>> +via the prctl(PR_RSEQ_SLICE_EXTENSION_SET) mechanism. This sets the maximum
>> +allowed extension in nanoseconds before scheduling of the task is enforced.
>> +Default value is 30000ns (30us). The possible range is 10000ns (10us) to
>> +50000ns (50us).
> +
> +This value has a direct correlation to the worst case scheduling latency;
> +increment at your own risk.
>
>
>> +unsigned int rseq_slice_ext_nsecs __read_mostly = 30 * NSEC_PER_USEC;
>
> Changed default to 10us
>
> Also, given the results of that slice_test thing, we might possibly get
> away with a much lower value still.
>
> Prakash, could you possibly capture a trace of hrtimer_start,
> hrtimer_cancel and hrtimer_expire_entry for your Oracle workload and run
> that python thing on it?
The database setup runs on an older environment where the provided Python script does not run as-is,
so I modified it to gather the stats by parsing trace-cmd report output.
The modified script is below.
Here are the rseq stats from the benchmark run:
# cat /sys/kernel/debug/rseq/stats
[…]
sgrant: 707530
sexpir: 19717
srevok: 26548
syield: 680982
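To put those counters in proportion, the sketch below expresses each of sexpir, srevok and syield as a percentage of sgrant. It assumes only that sgrant counts granted slice extensions; it does not assume the other counters partition the grants (with these numbers the three sum to more than sgrant). The helper names parse_stats and slice_ratios are hypothetical.

```python
# Hypothetical helpers: summarize the slice counters from
# /sys/kernel/debug/rseq/stats as percentages of sgrant.
# Assumption: sgrant counts granted slice extensions; the other counters are
# only scaled against it, not treated as disjoint outcomes.

def parse_stats(text):
    """Parse 'name: value' lines into a dict of ints; skip anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def slice_ratios(stats):
    """Return each of sexpir/srevok/syield as a percentage of sgrant."""
    granted = stats["sgrant"]
    return {key: 100.0 * stats[key] / granted
            for key in ("sexpir", "srevok", "syield") if key in stats}


if __name__ == "__main__":
    sample = """\
sgrant: 707530
sexpir: 19717
srevok: 26548
syield: 680982
"""
    for key, pct in slice_ratios(parse_stats(sample)).items():
        print(f"{key}: {pct:.2f}% of sgrant")
```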
Below is a snippet of the histogram data showing typical usage, gathered from
10-second trace-cmd samples taken during the benchmark run:
# trace-cmd record -e hrtimer_start -e hrtimer_cancel -e hrtimer_expire_entry -- sleep 10
# trace-cmd report -t > trace.report
This is with a slice size of 30us.
The kernel includes the following fix:
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1742,7 +1742,7 @@ static void __run_hrtimer(struct hrtimer
>
> lockdep_assert_held(&cpu_base->lock);
>
> - debug_deactivate(timer);
> + debug_hrtimer_deactivate(timer);
> base->running = timer;
========================================
RSEQ SLICE HISTOGRAM (us)
========================================
Task: ora_dbwr_lmb-666257 Mean: 1593.058 ns
Latency (us) | Count
------------------------------
EXPIRED | 3
0 us | 8
1 us | 116
2 us | 25
3 us | 2
4 us | 1
7 us | 1
Task: ora_dbwd_lmb-666196 Mean: 1548.641 ns
Latency (us) | Count
------------------------------
EXPIRED | 6
0 us | 19
1 us | 203
2 us | 33
4 us | 2
5 us | 1
[..]
Task: oracle_668951_l-668951 Mean: 1797.796 ns
Latency (us) | Count
------------------------------
1 us | 5
2 us | 1
3 us | 2
Task: oracle_671571_l-671571 Mean: 3285.425 ns
Latency (us) | Count
------------------------------
1 us | 1
3 us | 1
5 us | 1
6 us | 1
10 us | 1
Task: oracle_672277_l-672277 Mean: 2361.600 ns
Latency (us) | Count
------------------------------
1 us | 4
2 us | 3
5 us | 1
7 us | 1
9 us | 1
11 us | 1
Task: ora_dbwb_lmb-666192 Mean: 1548.157 ns
Latency (us) | Count
------------------------------
EXPIRED | 10
0 us | 24
1 us | 182
2 us | 39
3 us | 2
4 us | 4
5 us | 1
14 us | 1
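Since the question is whether a much lower limit than 30us would still cover typical critical sections, one way to read a per-task histogram is the fraction of cancelled (non-expired) timers whose duration stayed below a candidate limit. This sketch assumes bucket b holds cancels with floor(duration in us) == b, as the script computes, and leaves EXPIRED out because those durations are unknown beyond hitting the configured limit. covered_fraction is a hypothetical name.

```python
# Hypothetical helper: what fraction of cancelled slice timers in a per-task
# histogram finished in less than a candidate (integer) limit in us?
# Assumption: key b counts cancels with floor(duration_us) == b; key -1 is
# EXPIRED (the timer fired at the configured limit, 30us in this run).

def covered_fraction(buckets, limit_us):
    """Fraction of non-expired samples with duration < limit_us."""
    cancelled = {b: n for b, n in buckets.items() if b >= 0}
    total = sum(cancelled.values())
    if total == 0:
        return 0.0
    # For integer limits, floor(d) < L is equivalent to d < L.
    within = sum(n for b, n in cancelled.items() if b < limit_us)
    return within / total


# Counts for ora_dbwb_lmb-666192 from the histogram above (-1 = EXPIRED).
dbwb = {-1: 10, 0: 24, 1: 182, 2: 39, 3: 2, 4: 4, 5: 1, 14: 1}

for limit in (5, 10, 15):
    print(f"{limit:>2} us: {covered_fraction(dbwb, limit):.3f}")
```

Samples that would exceed a lower limit would have expired instead of being cancelled, so this only bounds how many of the observed cancels a smaller value would still accommodate.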
———
#!/usr/bin/python3
#
# trace-cmd record -e hrtimer_start -e hrtimer_cancel -e hrtimer_expire_entry -- $cmd
# trace-cmd report -t >trace.report

# pending[timer_ptr] = {'ts': timestamp, 'comm': comm}
pending = {}
# histograms[comm][bucket] = count
histograms = {}


class OnlineHarmonicMean:
    def __init__(self):
        self.n = 0      # Count of elements
        self.S = 0.0    # Cumulative sum of reciprocals

    def update(self, x):
        if x == 0:
            raise ValueError("Harmonic mean is undefined for zero.")
        self.n += 1
        self.S += 1.0 / x
        return self.n / self.S

    @property
    def mean(self):
        return self.n / self.S if self.n > 0 else 0


# ohms[comm] = harmonic mean of cancelled slice durations (ns)
ohms = {}


def handle_start(comm, ts, timer_ptr, func):
    # Only track the rseq slice extension timer
    if "rseq_slice_expired" in func:
        pending[timer_ptr] = {
            'ts': ts.replace(':', ''),
            'comm': comm
        }


def handle_cancel(ts, timer_ptr):
    if timer_ptr not in pending:
        return
    start_data = pending.pop(timer_ptr)
    ts = ts.replace(':', '')
    # Timestamps are in seconds; convert the difference to ns, then floor to us
    duration_ns = int((float(ts) - float(start_data['ts'])) * 1000000000)
    duration_us = duration_ns // 1000
    comm = start_data['comm']
    if comm not in ohms:
        ohms[comm] = OnlineHarmonicMean()
    # Only durations >= 1us feed the mean (this also avoids zero samples)
    if duration_us > 0:
        ohms[comm].update(duration_ns)
    if comm not in histograms:
        histograms[comm] = {}
    histograms[comm][duration_us] = histograms[comm].get(duration_us, 0) + 1


def handle_expire(timer_ptr):
    if timer_ptr not in pending:
        return
    start_data = pending.pop(timer_ptr)
    comm = start_data['comm']
    if comm not in histograms:
        histograms[comm] = {}
    # Record -1 bucket for expired (failed to cancel)
    histograms[comm][-1] = histograms[comm].get(-1, 0) + 1


if __name__ == "__main__":
    file_path = "./trace.report"
    try:
        with open(file_path, 'r') as f:
            for line in f:
                # Assumed trace-cmd report -t line layout:
                #   comm-pid [cpu] timestamp: event: hrtimer=<ptr> [function=<func> ...]
                parts = line.split()
                if len(parts) < 5:
                    continue
                if "hrtimer_cancel" in parts[3]:
                    handle_cancel(parts[2], parts[4])
                    continue
                if len(parts) < 6:
                    continue
                if "hrtimer_start" in parts[3]:
                    handle_start(parts[0], parts[2], parts[4], parts[5])
                    continue
                if "hrtimer_expire_entry" in parts[3]:
                    handle_expire(parts[4])
    except PermissionError:
        print(f"Error: Permission denied reading {file_path}")
    except FileNotFoundError:
        print(f"Error: {file_path} not found.")

    print("\n" + "=" * 40)
    print("RSEQ SLICE HISTOGRAM (us)")
    print("=" * 40)
    for comm, buckets in histograms.items():
        # A task whose timers only ever expired has no entry in ohms
        mean = ohms[comm].mean if comm in ohms else 0.0
        print(f"\nTask: {comm} Mean: {mean:.3f} ns")
        print(f" {'Latency (us)':<15} | {'Count'}")
        print(f" {'-' * 30}")
        # Sort buckets numerically, putting -1 at the top
        for bucket in sorted(buckets.keys()):
            label = "EXPIRED" if bucket == -1 else f"{bucket} us"
            print(f" {label:<15} | {buckets[bucket]}")
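The Mean printed by the script is a harmonic mean (OnlineHarmonicMean), which weights short durations more heavily than an arithmetic mean, so a few long outliers barely move it. As a quick sanity check, the online update can be compared with Python's statistics.harmonic_mean; the sample durations below are made up for illustration and are not taken from the trace.

```python
import math
import statistics


class OnlineHarmonicMean:
    """Same update rule as the script: n / sum(1/x) over the samples seen."""

    def __init__(self):
        self.n = 0      # Count of elements
        self.S = 0.0    # Cumulative sum of reciprocals

    def update(self, x):
        if x == 0:
            raise ValueError("Harmonic mean is undefined for zero.")
        self.n += 1
        self.S += 1.0 / x
        return self.n / self.S

    @property
    def mean(self):
        return self.n / self.S if self.n > 0 else 0


# Made-up durations in ns, for illustration only (not trace data).
samples = [1000, 1200, 1500, 2000, 14000]

ohm = OnlineHarmonicMean()
for x in samples:
    ohm.update(x)

# The online value matches the standard library's harmonic mean.
assert math.isclose(ohm.mean, statistics.harmonic_mean(samples))
print(f"harmonic:   {ohm.mean:.1f} ns")
print(f"arithmetic: {statistics.mean(samples):.1f} ns")
```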