Message-ID: <20230831104508.7619-1-kprateek.nayak@amd.com>
Date: Thu, 31 Aug 2023 16:15:05 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: <void@...ifault.com>
CC: <linux-kernel@...r.kernel.org>, <peterz@...radead.org>,
<mingo@...hat.com>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
<bristot@...hat.com>, <vschneid@...hat.com>, <tj@...nel.org>,
<roman.gushchin@...ux.dev>, <gautham.shenoy@....com>,
<aaron.lu@...el.com>, <wuyun.abel@...edance.com>,
<kernel-team@...a.com>, <kprateek.nayak@....com>
Subject: [RFC PATCH 0/3] DO NOT MERGE: Breaking down the experimental diff
Since the diff is a concoction of a bunch of things that somehow work,
this series tries to clean it up. I've dropped a bunch of things based on
David's suggestions [1], [2] and added some new logic on top that is
covered in Patch 3.
The breakdown is as follows:
- Patch 1 moves struct definition to sched.h
- Patch 2 is the above diff but more palatable with changes based on
David's comments.
- Patch 3 adds a bailout mechanism on top, since I saw the same amount
  of regression with just Patch 2 applied (an illustrative sketch of
  such a bailout follows this list).
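For context, below is a rough sketch of what a scan bailout of this kind
could look like. This is purely illustrative and not taken from the actual
patch: the struct shared_runq_shard layout, the shared_runq_node list
member in task_struct, and the SHARED_RUNQ_BAILOUT_LIMIT cap are all
assumptions made up for the example.

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

/* Assumed cap on how many tasks the pull path inspects before giving up. */
#define SHARED_RUNQ_BAILOUT_LIMIT	4

/* Hypothetical shard layout, for illustration only. */
struct shared_runq_shard {
	struct list_head	list;
	raw_spinlock_t		lock;
};

static struct task_struct *
shared_runq_try_pull(struct shared_runq_shard *shard, int dst_cpu)
{
	struct task_struct *p;
	int inspected = 0;

	raw_spin_lock(&shard->lock);
	list_for_each_entry(p, &shard->list, shared_runq_node) {
		if (inspected++ >= SHARED_RUNQ_BAILOUT_LIMIT)
			break;	/* bail out: not worth holding the lock any longer */

		if (cpumask_test_cpu(dst_cpu, p->cpus_ptr)) {
			list_del_init(&p->shared_runq_node);
			raw_spin_unlock(&shard->lock);
			return p;
		}
	}
	raw_spin_unlock(&shard->lock);
	return NULL;
}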
With these changes, the following are the results for tbench 128-clients:
tip                            : 1.00 (var: 1.00%)
tip + v3 + series till patch 2 : 0.41 (var: 1.15%) (diff: -58.81%)
tip + v3 + full series         : 1.01 (var: 0.36%) (diff: +00.92%)
Disclaimer: All the testing was done hyper-focused on the tbench
128-clients case on a dual-socket 3rd Generation EPYC system
(2 x 64C/128T). The series should apply cleanly on top of tip at commit
88c56cfeaec4 ("sched/fair: Block nohz tick_stop when cfs bandwidth in
use") + v3 of the shared_runq series (this series).
SHARED_RUNQ_SHARD_SZ was set to 16 throughout the testing since that
matches the sd_llc_size on the system.
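For reference, a shard size equal to sd_llc_size means each LLC collapses
to exactly one shard. A tiny sketch of that relationship is below; the
DIV_ROUND_UP()-based shard-count derivation is an assumption made for
illustration, not a quote from the series.

#include <linux/kernel.h>	/* DIV_ROUND_UP() */

/*
 * Illustration only: on this system sd_llc_size is 16 (one 8C/16T CCX
 * per LLC on 3rd Generation EPYC), so a shard size of 16 yields exactly
 * one shared_runq shard per LLC.
 */
static inline int shared_runq_nr_shards(int sd_llc_size, int shard_sz)
{
	return DIV_ROUND_UP(sd_llc_size, shard_sz);	/* 16 / 16 == 1 */
}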
P.S. I finally got around to enabling lockdep and saw the following
splat early during boot, but nothing after (so I think everything is
alright?):
================================
WARNING: inconsistent lock state
6.5.0-rc2-shared-wq-v3-fix+ #681 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff95f6bb24d818 (&rq->__lock){?.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x15/0x30
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0xcc/0x2c0
_raw_spin_lock_nested+0x2e/0x40
scheduler_tick+0x5c/0x350
update_process_times+0x83/0x90
tick_periodic+0x27/0xe0
tick_handle_periodic+0x24/0x70
timer_interrupt+0x18/0x30
__handle_irq_event_percpu+0x8b/0x240
handle_irq_event+0x38/0x80
handle_level_irq+0x90/0x170
__common_interrupt+0x4f/0x110
common_interrupt+0x7f/0xa0
asm_common_interrupt+0x26/0x40
__x86_return_thunk+0x0/0x40
console_flush_all+0x2e3/0x590
console_unlock+0x56/0x100
vprintk_emit+0x153/0x350
_printk+0x5c/0x80
apic_intr_mode_init+0x85/0x110
x86_late_time_init+0x24/0x40
start_kernel+0x5e1/0x7a0
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x92/0xa0
secondary_startup_64_no_verify+0x17e/0x18b
irq event stamp: 65081
hardirqs last enabled at (65081): [<ffffffff857723c1>] _raw_spin_unlock_irqrestore+0x31/0x60
hardirqs last disabled at (65080): [<ffffffff857720d3>] _raw_spin_lock_irqsave+0x63/0x70
softirqs last enabled at (64284): [<ffffffff848ccb7b>] __irq_exit_rcu+0x7b/0xa0
softirqs last disabled at (64269): [<ffffffff848ccb7b>] __irq_exit_rcu+0x7b/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0
       ----
  lock(&rq->__lock);
  <Interrupt>
    lock(&rq->__lock);

 *** DEADLOCK ***
1 lock held by swapper/0/1:
#0: ffffffff8627eec8 (sched_domains_mutex){+.+.}-{4:4}, at: sched_init_smp+0x3f/0xd0
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2-shared-wq-v3-fix+ #681
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Call Trace:
<TASK>
dump_stack_lvl+0x5c/0x90
mark_lock.part.0+0x755/0x930
? __lock_acquire+0x3e7/0x21d0
? __lock_acquire+0x2f0/0x21d0
__lock_acquire+0x3ab/0x21d0
? lock_is_held_type+0xaa/0x130
lock_acquire+0xcc/0x2c0
? raw_spin_rq_lock_nested+0x15/0x30
? free_percpu+0x245/0x4a0
_raw_spin_lock_nested+0x2e/0x40
? raw_spin_rq_lock_nested+0x15/0x30
raw_spin_rq_lock_nested+0x15/0x30
update_domains_fair+0xf1/0x220
sched_update_domains+0x32/0x50
sched_init_domains+0xd9/0x100
sched_init_smp+0x4b/0xd0
? stop_machine+0x32/0x40
kernel_init_freeable+0x2d3/0x540
? __pfx_kernel_init+0x10/0x10
kernel_init+0x1a/0x1c0
ret_from_fork+0x34/0x50
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1b/0x30
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
References:
[1] https://lore.kernel.org/all/20230831013435.GB506447@maniforge/
[2] https://lore.kernel.org/all/20230831023254.GC506447@maniforge/
--
Thanks and Regards,
Prateek