linux-kernel - Re: Hard and soft lockups with FIO and LTP runs on a large system

Message-ID: <deddb7b4-02f6-46aa-a075-cf9b7083ffd8@amd.com>
Date: Mon, 22 Jul 2024 09:47:01 +0530
From: Bharata B Rao <bharata@....com>
To: Mateusz Guzik <mjguzik@...il.com>, Yu Zhao <yuzhao@...gle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, nikunj@....com,
 "Upadhyay, Neeraj" <Neeraj.Upadhyay@....com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>, willy@...radead.org, vbabka@...e.cz,
 kinseyho@...gle.com, Mel Gorman <mgorman@...e.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system

On 20-Jul-24 1:27 PM, Mateusz Guzik wrote:
> On Fri, Jul 19, 2024 at 10:21 PM Yu Zhao <yuzhao@...gle.com> wrote:
>> I can't come up with any reasonable band-aid at this moment, i.e.,
>> something not too ugly to work around a more fundamental scalability
>> problem.
>>
>> Before I give up: what type of dirty data was written back to the nvme
>> device? Was it page cache or swap?
>>
> 
> With my corporate employee hat on, I would like to note a couple of
> three things.
> 
> 1. there are definitely bugs here and someone(tm) should sort them out(R)
> 
> however....
> 
> 2. the real goal is presumably to beat the kernel into shape where
> production kernels no longer suffer lockups running this workload on
> this hardware
> 3. the flamegraph (to be found in [1]) shows expensive debug enabled,
> notably for preemption count (search for preempt_count_sub to see)
> 4. I'm told the lruvec problem is being worked on (but no ETA) and I
> don't think the above justifies considering any hacks or otherwise
> putting more pressure on it
> 
> It is plausible eliminating the aforementioned debug will be good enough.
> 
> Apart from that I note percpu_counter_add_batch (+ irq debug) accounts
> for 5.8% cpu time. This will of course go down if irq tracing is
> disabled, but it so happens that I optimized this routine to be faster
> single-threaded (in particular by dodging the interrupt trip). The
> patch is hanging out in the mm tree [2] and is trivially applicable
> for testing.
> 
> Even if none of the debug opts can get modified, this should drop
> percpu_counter_add_batch to 1.5% or so, which may or may not have a
> side effect of avoiding the lockup problem.
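The batching idea behind percpu_counter_add_batch can be illustrated with a small userspace sketch. It is not the kernel implementation and not the patch referenced in [2]. Everything here is hypothetical and labeled as such: struct batched_counter, bc_add_batch, bc_read, bc_sum and NSLOTS are illustrative names, a caller-chosen slot index stands in for "the current CPU", and the lock-free compare-and-swap fast path is one assumed way of avoiding a costly step on the common path, loosely analogous to skipping the interrupt save/restore.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace analogue of a batched per-CPU counter.
 * NSLOTS stands in for the number of CPUs; callers pass a slot index
 * instead of using the kernel's this_cpu_* accessors. */
#define NSLOTS 4

struct batched_counter {
	pthread_mutex_t lock;          /* protects count (slow path only) */
	int64_t count;                 /* folded global total */
	_Atomic int64_t local[NSLOTS]; /* per-slot deltas not yet folded */
};

static void bc_init(struct batched_counter *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->count = 0;
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&c->local[i], 0);
}

/* Fast path: compare-and-swap on the slot's local delta, no lock taken.
 * Slow path: once |local + amount| reaches batch, move the pending local
 * delta plus amount into the shared total under the lock. */
static void bc_add_batch(struct batched_counter *c, int slot,
			 int64_t amount, int64_t batch)
{
	int64_t old = atomic_load(&c->local[slot]);

	do {
		int64_t v = old + amount;
		if (v >= batch || v <= -batch) {
			pthread_mutex_lock(&c->lock);
			/* Take whatever delta is pending right now and zero the slot. */
			int64_t pending = atomic_exchange(&c->local[slot], 0);
			c->count += pending + amount;
			pthread_mutex_unlock(&c->lock);
			return;
		}
		/* On failure, old is reloaded with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(&c->local[slot], &old, old + amount));
}

/* Cheap, approximate read: ignores deltas still sitting in the slots. */
static int64_t bc_read(struct batched_counter *c)
{
	pthread_mutex_lock(&c->lock);
	int64_t v = c->count;
	pthread_mutex_unlock(&c->lock);
	return v;
}

/* Precise read: folded total plus every slot's pending delta. */
static int64_t bc_sum(struct batched_counter *c)
{
	pthread_mutex_lock(&c->lock);
	int64_t v = c->count;
	for (int i = 0; i < NSLOTS; i++)
		v += atomic_load(&c->local[i]);
	pthread_mutex_unlock(&c->lock);
	return v;
}
```

With a batch of 32, the first 31 increments of +1 on one slot touch only that slot; the 32nd folds everything into the shared total. The trade-off is the same one the kernel counter makes: the shared lock is taken roughly once per batch, at the cost of the cheap read lagging by up to about batch per slot.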

Thanks. A few debug options were turned on to gather debug data. I will do
a full run with them turned off and with the above
percpu_counter_add_batch patch applied.

Regards,
Bharata.