Message-ID: <deddb7b4-02f6-46aa-a075-cf9b7083ffd8@amd.com>
Date: Mon, 22 Jul 2024 09:47:01 +0530
From: Bharata B Rao <bharata@....com>
To: Mateusz Guzik <mjguzik@...il.com>, Yu Zhao <yuzhao@...gle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, nikunj@....com,
"Upadhyay, Neeraj" <Neeraj.Upadhyay@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, willy@...radead.org, vbabka@...e.cz,
kinseyho@...gle.com, Mel Gorman <mgorman@...e.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
On 20-Jul-24 1:27 PM, Mateusz Guzik wrote:
> On Fri, Jul 19, 2024 at 10:21 PM Yu Zhao <yuzhao@...gle.com> wrote:
>> I can't come up with any reasonable band-aid at this moment, i.e.,
>> something not too ugly to work around a more fundamental scalability
>> problem.
>>
>> Before I give up: what type of dirty data was written back to the nvme
>> device? Was it page cache or swap?
>>
>
> With my corporate employee hat on, I would like to note a couple of
> three things.
>
> 1. there are definitely bugs here and someone(tm) should sort them out(R)
>
> however....
>
> 2. the real goal is presumably to beat the kernel into shape where
> production kernels no longer suffer lockups running this workload on
> this hardware
> 3. the flamegraph (to be found in [1]) shows expensive debug enabled,
> notably for preemption count (search for preempt_count_sub to see)
> 4. I'm told the lruvec problem is being worked on (but no ETA) and I
> don't think the above justifies considering any hacks or otherwise
> putting more pressure on it
>
> It is plausible eliminating the aforementioned debug will be good enough.
>
> Apart from that I note percpu_counter_add_batch (+ irq debug) accounts
> for 5.8% cpu time. This will of course go down if irq tracing is
> disabled, but so happens I optimized this routine to be faster
> single-threaded (in particular by dodging the interrupt trip). The
> patch is hanging out in the mm tree [2] and is trivially applicable
> for testing.
>
> Even if none of the debug opts can get modified, this should drop
> percpu_counter_add_batch to 1.5% or so, which may or may not have a
> side effect of avoiding the lockup problem.
Thanks. A few debug options were turned ON to gather debug data. I will do
a full run with them turned OFF and with the above
percpu_counter_add_batch patch applied.
Regards,
Bharata.