Message-ID: <e1ee6187-77a7-dbf2-3e14-adba48460f5b@oracle.com>
Date: Tue, 23 Feb 2021 14:25:16 -0500
From: Chris Hyser <chris.hyser@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Joel Fernandes (Google)" <joel@...lfernandes.org>,
Nishanth Aravamudan <naravamudan@...italocean.com>,
Julien Desfossez <jdesfossez@...italocean.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Vineeth Pillai <viremana@...ux.microsoft.com>,
Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>, tglx@...utronix.de,
linux-kernel@...r.kernel.org, mingo@...nel.org,
torvalds@...ux-foundation.org, fweisbec@...il.com,
keescook@...omium.org, kerrnel@...gle.com,
Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>, vineeth@...byteword.org,
Chen Yu <yu.c.chen@...el.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Agata Gruza <agata.gruza@...el.com>,
Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
graf@...zon.com, konrad.wilk@...cle.com, dfaggioli@...e.com,
pjt@...gle.com, rostedt@...dmis.org, derkling@...gle.com,
benbjiang@...cent.com,
Alexandre Chartre <alexandre.chartre@...cle.com>,
James.Bottomley@...senpartnership.com, OWeisse@...ch.edu,
Dhaval Giani <dhaval.giani@...cle.com>,
Junaid Shahid <junaids@...gle.com>, jsbarnes@...gle.com,
Ben Segall <bsegall@...gle.com>, Josh Don <joshdon@...gle.com>,
Hao Luo <haoluo@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [PATCH v10 2/5] sched: CGroup tagging interface for core
scheduling
On 2/23/21 4:05 AM, Peter Zijlstra wrote:
> On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
>> On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
>> While trying to test the new prctl() code I'm working on, I ran into a bug I
>> chased back into this v10 code. Under a fair amount of stress, when the
>> function __sched_core_update_cookie() is ultimately called from
>> sched_core_fork(), the system deadlocks or otherwise non-visibly crashes.
>> I've not had much success figuring out why/what. I'm running with LOCKDEP on
>> and seeing no complaints. Duplicating it only requires setting a cookie on a
>> task and forking a bunch of threads ... all of which then want to update
>> their cookie.
>
> Can you share the code and reproducer?
Attached is a tarball with C source and scripts. Run ./setup_bug, which compiles the source and starts a
bash with a cs cookie. Then run ./show_bug, which dumps the cookie and then fires off some processes and threads. Note
that the cs_clone command does not issue any core sched prctls for this test (they are not needed, and it is currently
coded for a different prctl interface). It just creates processes and threads. I see this hang almost instantly.
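For anyone reading without the tarball, the kind of load described above (a few processes, each creating a batch of
threads) can be sketched roughly as follows. This is an illustrative stand-in, not the attached cs_clone source: the
names spawn_tree, run_threads and worker are hypothetical, the counts are arbitrary, and like the test above it makes
no core sched prctl calls. It is assumed to be called from a program started in a shell that already has a cookie set.

```c
/* Hypothetical process/thread spawner; NOT the cs_clone code from bug.tar.xz. */
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: returns immediately; only the thread creation matters here. */
static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

/* Create and join nthreads threads. Returns 0 on success, 1 on any failure. */
static int run_threads(int nthreads)
{
	pthread_t *tids;
	int created = 0, rc = 0;

	if (nthreads <= 0)
		return 0;
	tids = calloc((size_t)nthreads, sizeof(*tids));
	if (!tids)
		return 1;

	for (; created < nthreads; created++) {
		if (pthread_create(&tids[created], NULL, worker, NULL) != 0) {
			rc = 1;
			break;
		}
	}
	/* Join only the threads that were actually created. */
	for (int i = 0; i < created; i++)
		if (pthread_join(tids[i], NULL) != 0)
			rc = 1;

	free(tids);
	return rc;
}

/*
 * Fork nprocs children; each child creates nthreads threads and exits.
 * Returns the number of children that exited with status 0.
 */
int spawn_tree(int nprocs, int nthreads)
{
	int forked = 0, ok = 0;

	for (; forked < nprocs; forked++) {
		pid_t pid = fork();

		if (pid < 0)
			break;			/* stop forking, reap what we have */
		if (pid == 0)
			_exit(run_threads(nthreads));
	}
	for (int i = 0; i < forked; i++) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Built with -pthread and called as, say, spawn_tree(4, 8) from main(), a return value equal to the first argument means
every child created and joined its threads and exited cleanly. On an affected kernel the call would presumably never
return, matching the hang described above.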
Josh, I verified that this occurs on Joel's coresched tree both with and without the kprot patch; that tree should
correspond exactly to these patches.
-chrish
Download attachment "bug.tar.xz" of type "application/x-xz" (2784 bytes)