Message-ID: <20200622154840.GA13945@casper.infradead.org>
Date: Mon, 22 Jun 2020 16:48:40 +0100
From: willy@...per.infradead.org
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Junxiao Bi <junxiao.bi@...cle.com>,
Matthew Wilcox <willy@...radead.org>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Matthew Wilcox <matthew.wilcox@...cle.com>,
Srinivas Eeda <SRINIVAS.EEDA@...cle.com>,
"joe.jin@...cle.com" <joe.jin@...cle.com>,
Wengang Wang <wen.gang.wang@...cle.com>
Subject: Re: [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries

On Mon, Jun 22, 2020 at 10:20:40AM -0500, Eric W. Biederman wrote:
> Junxiao Bi <junxiao.bi@...cle.com> writes:
> > On 6/20/20 9:27 AM, Matthew Wilcox wrote:
> >> On Fri, Jun 19, 2020 at 05:42:45PM -0500, Eric W. Biederman wrote:
> >>> Junxiao Bi <junxiao.bi@...cle.com> writes:
> >>>> Still high lock contention. Collect the following hot path.
> >>> A different location this time.
> >>>
> >>> I know of at least exit_signal and exit_notify that take thread wide
> >>> locks, and it looks like exit_mm is another. Those don't use the same
> >>> locks as flushing proc.
> >>>
> >>>
> >>> So I think you are simply seeing a result of the thundering herd of
> >>> threads shutting down at once. Given that thread shutdown is fundamentally
> >>> a slow path there is only so much that can be done.
> >>>
> >>> If you are up for a project of working through this thundering herd, I
> >>> expect I can help some. It will be a long process of cleaning up
> >>> the entire thread exit process with an eye to performance.
> >> Wengang had some tests which produced wall-clock values for this problem,
> >> which I agree is more informative.
> >>
> >> I'm not entirely sure what the customer workload is that requires a
> >> highly threaded workload to also shut down quickly. To my mind, an
> >> overall workload is normally composed of highly-threaded tasks that run
> >> for a long time and only shut down rarely (thus performance of shutdown
> >> is not important) and single-threaded tasks that run for a short time.
> >
> > The real workload is a Java application working in server-agent mode; the
> > issue happened on the agent side. All the agent does is wait for work
> > dispatched from the server and execute it. To execute one piece of work,
> > the agent starts lots of short-lived threads, so a lot of threads can exit
> > at the same time when there is a lot of work to execute. The contention on
> > the exit path caused a high %sys time, which impacted other workloads.
>
> If I understand correctly, the Java VM is not exiting. Just some of
> its threads.
>
> That is a very different problem to deal with. There are many
> optimizations that are possible when _all_ of the threads are exiting
> that are not possible when _many_ threads are exiting.

Ah! Now I get it. This explains why the dput() lock contention was
so important. A new thread starting would block on that lock as it
tried to create its new /proc/$pid/task/ directory.

Terminating thousands of threads but not the entire process isn't going
to hit many of the locks (eg exit_signal() and exit_mm() aren't going
to be called). So we need a more sophisticated micro benchmark that is
continually starting threads and asking dozens-to-thousands of them to
stop at the same time. Otherwise we'll try to fix lots of scalability
problems that our customer doesn't care about.
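
Something like the sketch below is what I have in mind (untested, and
the BATCH/ROUNDS numbers are arbitrary ones I made up). The process
stays alive the whole time; the threads in each batch rendezvous on a
barrier and then all exit at once, and the next batch is started before
the previous one is reaped, so thread creation overlaps with the mass
exit:

/* herd.c - untested sketch: keep a process alive while batches of its
 * threads exit at (roughly) the same time.
 *
 * Build with: gcc -O2 -pthread herd.c -o herd
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define BATCH	512	/* threads exiting together; tune to taste */
#define ROUNDS	100	/* how many batches to run */

static void *worker(void *arg)
{
	/* Every thread in the batch waits here, so the whole batch
	 * returns (and exits) at almost the same moment. */
	pthread_barrier_wait(arg);
	return NULL;
}

static void start_batch(pthread_t *tids, pthread_barrier_t *barrier)
{
	int i;

	pthread_barrier_init(barrier, NULL, BATCH);
	for (i = 0; i < BATCH; i++) {
		if (pthread_create(&tids[i], NULL, worker, barrier)) {
			fprintf(stderr, "pthread_create failed\n");
			exit(1);
		}
	}
}

static void reap_batch(pthread_t *tids, pthread_barrier_t *barrier)
{
	int i;

	for (i = 0; i < BATCH; i++)
		pthread_join(tids[i], NULL);
	pthread_barrier_destroy(barrier);
}

int main(void)
{
	pthread_t tids[2][BATCH];
	pthread_barrier_t barriers[2];
	int round;

	start_batch(tids[0], &barriers[0]);
	for (round = 1; round < ROUNDS; round++) {
		int cur = round & 1;

		/* Start the next batch before reaping the previous one,
		 * so new threads are creating /proc/$pid/task/ entries
		 * while the old batch is tearing its entries down. */
		start_batch(tids[cur], &barriers[cur]);
		reap_batch(tids[!cur], &barriers[!cur]);
	}
	reap_batch(tids[(ROUNDS - 1) & 1], &barriers[(ROUNDS - 1) & 1]);
	return 0;
}

Running that while watching %sys (and where the time goes in perf)
should tell us whether we're reproducing the contention the customer
actually sees.
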
> Do you know if it is simply the cpu time or if it is the lock contention
> that is the problem? If it is simply the cpu time we should consider if
> some of the locks that can be highly contended should become mutexes.
> Or perhaps something like Matthew's cpu pinning idea.
If we're not trying to optimise for the entire process going down, then
we definitely don't want my CPU pinning idea.