linux-kernel - Re: [PATCH v4 00/13] Generalized Priority Inheritance via Proxy Execution v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <206791ae-db73-dd00-9fb3-4c642638dd85@amd.com>
Date:   Tue, 13 Jun 2023 08:41:44 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     John Stultz <jstultz@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Joel Fernandes <joelaf@...gle.com>,
        Qais Yousef <qyousef@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Zimuzo Ezeozue <zezeozue@...gle.com>,
        Youssef Esmat <youssefesmat@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>, kernel-team@...roid.com
Subject: Re: [PATCH v4 00/13] Generalized Priority Inheritance via Proxy
 Execution v3

Hello John,

On 6/13/2023 12:22 AM, John Stultz wrote:
> On Mon, Jun 12, 2023 at 10:21 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
>> On 6/1/2023 11:28 AM, John Stultz wrote:
>>> [..snip..]
>>>
>>> Issues still to address:
>>> —----------
>>> * Occasional getting null scheduler entities from pick_next_entity() in
>>>   CFS. I’m a little stumped as to where this is going awry just yet, and
>>>   delayed sending this out, but figured it was worth getting it out for
>>>   review on the other issues while I chase this down.
>>
>> I'm consistently hitting the above issue early during the boot when I was
>> trying to test the series on a dual socket 3rd Generation EPYC platform
>> (2 x 64C/128T). Sharing the trace below:
>>
>>     [   20.821808] ------------[ cut here ]------------
>>     [   20.826432] kernel BUG at kernel/sched/core.c:7462!
>>     [   20.831322] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>     [   20.836545] CPU: 250 PID: 0 Comm: swapper/250 Not tainted 6.4.0-rc4-proxy-execution-v4+ #474
>>     [   20.844976] Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
>>     [   20.852544] RIP: 0010:__schedule+0x18b6/0x20a0
>>     [   20.856998] Code: 0f 85 51 04 00 00 83 ad 50 ff ff ff 01 0f 85 05 e9 ff ff f3 0f 1e fa 48 8b 35 0e 0c fe 00 48 c7 c7 33 a1 c1 85 e8 ca 37 23 ff <0f> 0b 4c 89 ff 4c 8b 6d 98 e8 1c 82 00 00 4c 89 f7 e8 14 82 00 00
>>     [   20.875744] RSP: 0018:ffffbd1e4d1d7dd0 EFLAGS: 00010082
>>     [   20.880970] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000005
>>     [   20.888104] RDX: ffff9d4d0006b000 RSI: 0000000000000200 RDI: ffff9d4d0004d400
>>     [   20.895235] RBP: ffffbd1e4d1d7e98 R08: 0000000000000024 R09: ffffffffff7edbd0
>>     [   20.902369] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d4d12e25a20
>>     [   20.909501] R13: ffff9dcbffab3840 R14: ffffbd1e4d1d7e50 R15: ffff9dcbff2b3840
>>     [   20.916632] FS:  0000000000000000(0000) GS:ffff9dcbffa80000(0000) knlGS:0000000000000000
>>     [   20.924709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>     [   20.930449] CR2: 00007f92120c4800 CR3: 000000011477a002 CR4: 0000000000770ee0
>>     [   20.937581] PKRU: 55555554
>>     [   20.940292] Call Trace:
>>     [   20.942741]  <TASK>
>>     [   20.944845]  ? show_regs+0x6e/0x80
>>     [   20.948259]  ? die+0x3c/0xa0
>>     [   20.951146]  ? do_trap+0xd4/0xf0
>>     [   20.954377]  ? do_error_trap+0x75/0xa0
>>     [   20.958129]  ? __schedule+0x18b6/0x20a0
>>     [   20.961971]  ? exc_invalid_op+0x57/0x80
>>     [   20.965808]  ? __schedule+0x18b6/0x20a0
>>     [   20.969648]  ? asm_exc_invalid_op+0x1f/0x30
>>     [   20.973835]  ? __schedule+0x18b6/0x20a0
>>     [   20.977672]  ? cpuidle_enter_state+0xde/0x710
>>     [   20.982033]  schedule_idle+0x2e/0x50
>>     [   20.985614]  do_idle+0x15d/0x240
>>     [   20.988845]  cpu_startup_entry+0x24/0x30
>>     [   20.992772]  start_secondary+0x126/0x160
>>     [   20.996695]  secondary_startup_64_no_verify+0x10b/0x10b
>>     [   21.001924]  </TASK>
>>     [   21.004117] Modules linked in: sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore
>>     ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200
>>     i2c_algo_bit drm_shmem_helper drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect aesni_intel sysimgblt crypto_simd crc32_pclmul cryptd crct10dif_pclmul sha512_ssse3 xhci_pci tg3 drm
>>     xhci_pci_renesas megaraid_sas wmi
>>     [   21.055707] Dumping ftrace buffer:
>>     [   21.059291] ---------------------------------
>>     [   21.063697]   <idle>-0       250dn.2. 21175635us : __schedule: JDB: BUG!!! pick next retry_count > 50
>>     [   21.072915] ---------------------------------
>>     [   21.077282] ---[ end trace 0000000000000000 ]---
>>
>>     $ sed -n 7460,7462p kernel/sched/core.c
>>       if (retry_count++ > 50) {
>>               trace_printk("JDB: BUG!!! pick next retry_count > 50\n");
>>               BUG();
>>
>> Hope it helps during the debug. If you have a fix in mind that you
>> would like me to test, please do let me know.
> 
> Thank you for the testing and feedback here! I really appreciate it!
> And my apologies that you're hitting trouble here!

No worries! Better to hit the snags now than later :)

> 
>>> * Better deadlock handling in proxy(): With the ww_mutex issues
>>>   resolved, we shouldn’t see circular blocked_on references, but a
>>>   number of the bugs I’ve been chasing recently come from getting stuck
>>>   with proxy() returning null forcing a reselection over and over. These
>>>   are still bugs to address, but my current thinking is that if we get
>>>   stuck like this, we can start to remove the selected mutex blocked
>>>   tasks from the rq, and let them be woken from the mutex waiters list
>>>   as is done currently? Thoughts here would be appreciated.
>>>
>>> * More work on migration correctness (RT/DL load balancing,etc). I’m
>>>   still seeing occasional trouble as cpu counts go up which seems to be
>>>   due to a bunch of tasks being proxy migrated to a cpu, then having to
>>>   migrate them all away at once (seeing lots of pick again iterations).
>>>   This may actually be correct, due to chain migration, but it ends up
>>>   looking similar to a deadlock.
> 
> So I suspect what you're seeing is a combination of the two items
> above. With 128 threads, my deadlock detection BUG() at 50 is probably
> far too low, as migration chains can get pretty long.
> Clearly BUG'ing at a fixed count is the wrong approach (but was
> helpful for quickly catching problems and debugging in my
> environment).

Ah! I see. Thank you for clarifying. Let me check if going to commit
259a8134aa27 ("sched: Potential fixup, not sure why rq_selected() is used
here") helps.

> 
> My main priority is trying to get the null se crashes resolved (almost
> have my hands around it, but took a little break from it end of last
> week), and once I have something there I'll update and share my WIP
> tree:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP

I'll keep an eye out for any updates on the branch.

> 
> To include some extra trace logging and I'll reach out to see if you
> can capture the issue again.
> 
> Thanks so much again for your interest and help in testing!
> -john

 
--
Thanks and Regards,
Prateek