Message-ID: <1C89BAF2-A95C-44FF-9F5A-7078BCC8A571@oracle.com>
Date: Fri, 23 Jan 2026 17:38:29 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Arnd Bergmann <arnd@...db.de>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        Randy Dunlap <rdunlap@...radead.org>, Ron Geva <rongevarg@...il.com>,
        Waiman Long <longman@...hat.com>
Subject: Re: [patch V6 07/11] rseq: Implement time slice extension enforcement timer



> On Jan 17, 2026, at 1:57 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Mon, Dec 15, 2025 at 05:52:22PM +0100, Thomas Gleixner wrote:
> 
>> +rseq_slice_extension_nsec
>> +=========================
>> +
>> +A task can request to delay its scheduling if it is in a critical section
>> +via the prctl(PR_RSEQ_SLICE_EXTENSION_SET) mechanism. This sets the maximum
>> +allowed extension in nanoseconds before scheduling of the task is enforced.
>> +Default value is 30000ns (30us). The possible range is 10000ns (10us) to
>> +50000ns (50us).
> +
> +This value has a direct correlation to the worst case scheduling latency;
> +increment at your own risk.
> 
> 
>> +unsigned int rseq_slice_ext_nsecs __read_mostly = 30 * NSEC_PER_USEC;
> 
> Changed default to 10us
> 
> Also, given the results of that slice_test thing, we might possibly get
> away with a much lower value still. 
> 
> Prakash, could you possibly capture a trace of hrtimer_start,
> hrtimer_cancel and hrtimer_expire_entry for your Oracle workload and run
> that python thing on it?



The database setup runs on an older environment, where the provided Python script does not run as is.
So I modified it to gather the stats by parsing the trace-cmd report output.
The modified Python script is below.

Here are the rseq stats from the benchmark run:
# cat /sys/kernel/debug/rseq/stats
[…]
sgrant:          707530
sexpir:             19717
srevok:            26548
syield:           680982
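The counters above are easier to compare as fractions of sgrant. A minimal sketch follows; it assumes, from the counter names only (not from anything stated in this thread), that sgrant counts granted slice extensions, sexpir those whose enforcement timer expired, srevok revoked ones and syield those ended by the task yielding. The helper parse_stats() is hypothetical.

```python
# Sketch: turn the debugfs rseq counters quoted above into ratios.
# Assumption: counter meanings are inferred from their names only.

STATS = """\
[…]
sgrant:          707530
sexpir:             19717
srevok:            26548
syield:           680982
"""


def parse_stats(text):
    """Parse 'name: value' lines into a dict of ints (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip lines without a colon or without a plain integer value
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


stats = parse_stats(STATS)
granted = stats["sgrant"]
for key in ("sexpir", "srevok", "syield"):
    print(f"{key}: {stats[key] / granted:.2%} of sgrant")

# In this sample, yielded + revoked add up exactly to granted.
print("syield + srevok == sgrant:", stats["syield"] + stats["srevok"] == granted)
```

On this sample it reports about 2.79% expired, 3.75% revoked and 96.25% yielded. That syield + srevok equals sgrant exactly is an observation about these particular numbers, not a documented invariant.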

Here is a snippet of the histogram data, showing typical usage.
It was gathered from 10-second trace-cmd samples collected during the benchmark run:
# trace-cmd record -e hrtimer_start -e hrtimer_cancel -e hrtimer_expire_entry -- sleep 10
# trace-cmd report -t > trace.report

This is with a slice size of 30us.
The kernel includes the following fix:

> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1742,7 +1742,7 @@ static void __run_hrtimer(struct hrtimer
> 
> lockdep_assert_held(&cpu_base->lock);
> 
> - debug_deactivate(timer);
> + debug_hrtimer_deactivate(timer);
> base->running = timer;

========================================
RSEQ SLICE HISTOGRAM (us)
========================================

Task: ora_dbwr_lmb-666257    Mean: 1593.058 ns
  Latency (us)    | Count
  ------------------------------
  EXPIRED         | 3
  0 us            | 8
  1 us            | 116
  2 us            | 25
  3 us            | 2
  4 us            | 1
  7 us            | 1

Task: ora_dbwd_lmb-666196    Mean: 1548.641 ns
  Latency (us)    | Count
  ------------------------------
  EXPIRED         | 6
  0 us            | 19
  1 us            | 203
  2 us            | 33
  4 us            | 2
  5 us            | 1


[..]

Task: oracle_668951_l-668951    Mean: 1797.796 ns
  Latency (us)    | Count
  ------------------------------
  1 us            | 5
  2 us            | 1
  3 us            | 2

Task: oracle_671571_l-671571    Mean: 3285.425 ns
  Latency (us)    | Count
  ------------------------------
  1 us            | 1
  3 us            | 1
  5 us            | 1
  6 us            | 1
  10 us           | 1

Task: oracle_672277_l-672277    Mean: 2361.600 ns
  Latency (us)    | Count
  ------------------------------
  1 us            | 4
  2 us            | 3
  5 us            | 1
  7 us            | 1
  9 us            | 1
  11 us           | 1

Task: ora_dbwb_lmb-666192    Mean: 1548.157 ns
  Latency (us)    | Count
  ------------------------------
  EXPIRED         | 10
  0 us            | 24
  1 us            | 182
  2 us            | 39
  3 us            | 2
  4 us            | 4
  5 us            | 1
  14 us           | 1
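These histograms can also be read against a shorter limit, which is what the change of the default to 10us is weighing. A rough sketch, assuming that the time from hrtimer_start to hrtimer_cancel approximates how long the task stayed in its extended section, and that those durations would not change under a shorter limit: since the script files each cancelled timer under duration_ns // 1000, any entry in a bucket at or above the candidate limit (in us) would have hit the timer instead, and entries already EXPIRED at 30us stay expired. The data is copied from the ora_dbwb_lmb-666192 histogram above; would_expire() is a hypothetical helper.

```python
# Sketch: estimate how a histogram taken with a 30us slice would look under
# a shorter limit. Bucket b holds cancelled timers with b <= duration < b+1 us;
# bucket -1 holds timers that already expired at 30us.

# ora_dbwb_lmb-666192, copied from the histogram above.
HIST = {-1: 10, 0: 24, 1: 182, 2: 39, 3: 2, 4: 4, 5: 1, 14: 1}


def would_expire(hist, limit_us):
    """Return (expired, total) if the limit were limit_us (hypothetical model)."""
    total = sum(hist.values())
    already = hist.get(-1, 0)
    # A cancelled timer in bucket b lasted at least b us, so it would have
    # reached a limit_us timer whenever b >= limit_us.
    newly = sum(n for b, n in hist.items() if b >= limit_us)
    return already + newly, total


for limit in (30, 10, 5):
    expired, total = would_expire(HIST, limit)
    print(f"limit {limit:>2} us: {expired}/{total} expired ({expired / total:.1%})")
```

For this task the model gives 10/263 (3.8%) at 30us, 11/263 (4.2%) at 10us and 12/263 (4.6%) at 5us. It ignores timer slack and any change in task behaviour under a shorter limit.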

———
#!/usr/bin/python3
#
# Usage:
# trace-cmd record -e hrtimer_start -e hrtimer_cancel -e hrtimer_expire_entry -- $cmd
# trace-cmd report -t >trace.report
#
# Builds, per task, a histogram of how long the rseq slice extension timer
# (function rseq_slice_expired) stayed armed before it was cancelled, plus a
# count of timers that expired instead.

import sys

# pending[timer_ptr] = {'ts': timestamp, 'comm': comm}
pending = {}

# histograms[comm][bucket] = count; bucket -1 means the timer expired
histograms = {}


class OnlineHarmonicMean:
    def __init__(self):
        self.n = 0          # Count of elements
        self.S = 0.0        # Cumulative sum of reciprocals

    def update(self, x):
        if x == 0:
            raise ValueError("Harmonic mean is undefined for zero.")

        self.n += 1
        self.S += 1.0 / x
        return self.n / self.S

    @property
    def mean(self):
        return self.n / self.S if self.n > 0 else 0


# ohms[comm] = running harmonic mean of cancelled durations >= 1us, in ns
ohms = {}


def handle_start(comm, ts, timer_ptr, func):
    # Only track the rseq slice extension timer
    if "rseq_slice_expired" in func:
        pending[timer_ptr] = {
            'ts': ts.replace(':', ''),
            'comm': comm
        }


def handle_cancel(ts, timer_ptr):
    if timer_ptr not in pending:
        return
    start_data = pending.pop(timer_ptr)
    ts = ts.replace(':', '')
    # Timestamps are seconds with ns resolution (trace-cmd report -t);
    # round() avoids truncating e.g. 1592.9999 ns down to 1592 ns.
    duration_ns = round((float(ts) - float(start_data['ts'])) * 1000000000)
    duration_us = duration_ns // 1000

    comm = start_data['comm']

    if comm not in ohms:
        ohms[comm] = OnlineHarmonicMean()

    # Sub-microsecond durations are left out of the mean
    if duration_us > 0:
        ohms[comm].update(duration_ns)

    if comm not in histograms:
        histograms[comm] = {}

    histograms[comm][duration_us] = histograms[comm].get(duration_us, 0) + 1


def handle_expire(timer_ptr):
    if timer_ptr not in pending:
        return
    start_data = pending.pop(timer_ptr)
    comm = start_data['comm']
    if comm not in histograms:
        histograms[comm] = {}

    # Record -1 bucket for expired (failed to cancel)
    histograms[comm][-1] = histograms[comm].get(-1, 0) + 1


if __name__ == "__main__":
    file_path = "./trace.report"
    try:
        with open(file_path, 'r') as f:
            for line in f:
                # trace-cmd report lines look like:
                #   comm-pid [cpu] timestamp: event: hrtimer=... [function=...] ...
                # so parts[0] is the task, parts[2] the timestamp, parts[3] the
                # event name, parts[4] the timer pointer and, for hrtimer_start,
                # parts[5] the timer function. Task names containing spaces
                # are not handled.
                parts = line.split()

                if len(parts) < 5:
                    continue
                if "hrtimer_cancel" in parts[3]:
                    handle_cancel(parts[2], parts[4])
                    continue
                if len(parts) < 6:
                    continue
                if "hrtimer_start" in parts[3]:
                    handle_start(parts[0], parts[2], parts[4], parts[5])
                    continue
                if "hrtimer_expire_entry" in parts[3]:
                    handle_expire(parts[4])

    except PermissionError:
        sys.exit(f"Error: Permission denied reading {file_path}")
    except FileNotFoundError:
        sys.exit(f"Error: {file_path} not found.")

    print("\n" + "=" * 40)
    print("RSEQ SLICE HISTOGRAM (us)")
    print("=" * 40)
    for comm, buckets in histograms.items():
        # "Mean" is the harmonic mean; a task whose timers all expired has none
        mean = ohms[comm].mean if comm in ohms else 0
        print(f"\nTask: {comm}    Mean: {mean:.3f} ns")
        print(f"  {'Latency (us)':<15} | {'Count'}")
        print(f"  {'-'*30}")
        # Sort buckets numerically, putting -1 (EXPIRED) at the top
        for bucket in sorted(buckets.keys()):
            label = "EXPIRED" if bucket == -1 else f"{bucket} us"
            print(f"  {label:<15} | {buckets[bucket]}")
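Note that the per-task "Mean" printed by this script is the harmonic mean kept by OnlineHarmonicMean, taken only over cancelled timers of at least 1us; expired timers and sub-microsecond cancels are excluded. The harmonic mean weights short durations more heavily and never exceeds the arithmetic mean. A small standalone check of the same running computation (count plus sum of reciprocals) against Python's statistics module, using made-up durations rather than trace data:

```python
import statistics

# Illustrative durations in ns (made up, not taken from the trace).
durations_ns = [1000, 2000, 4000]

# Running form, as OnlineHarmonicMean.update() does it:
# keep a count and a sum of reciprocals, mean = n / sum(1/x).
n, recip_sum = 0, 0.0
for d in durations_ns:
    n += 1
    recip_sum += 1.0 / d
running_hm = n / recip_sum

print(f"running harmonic mean:    {running_hm:.3f} ns")
print(f"statistics.harmonic_mean: {statistics.harmonic_mean(durations_ns):.3f} ns")
print(f"arithmetic mean:          {statistics.mean(durations_ns):.3f} ns")
```

Here the harmonic mean is about 1714 ns against an arithmetic mean of about 2333 ns, so the "Mean" figures in the histograms lean towards the short, common extensions rather than averaging in the tail.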