Message-ID: <5660bd90-3094-418a-8a05-58e222dacfb5@amd.com>
Date: Mon, 29 Jul 2024 10:19:57 +0530
From: Bharata B Rao <bharata@....com>
To: Zhaoyang Huang <huangzhaoyang@...il.com>,
"zhaoyang.huang" <zhaoyang.huang@...soc.com>
Cc: Neeraj.Upadhyay@....com, akpm@...ux-foundation.org, david@...hat.com,
kinseyho@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
mgorman@...e.de, mjguzik@...il.com, nikunj@....com, vbabka@...e.cz,
willy@...radead.org, yuzhao@...gle.com, steve.kang@...soc.com
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
On 26-Jul-24 8:56 AM, Zhaoyang Huang wrote:
> On Thu, Jul 25, 2024 at 6:00 PM zhaoyang.huang
> <zhaoyang.huang@...soc.com> wrote:
<snip>
>> From the callstack of the lock holder, this looks like a scalability issue rather than a deadlock. Unlike legacy LRU management, there is no throttling mechanism for global reclaim under MGLRU so far. Could we apply a similar method to throttle reclaim when it is too aggressive? I am wondering if this patch, which is a rough version, could help here?
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 2e34de9cd0d4..827036e21f24 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -4520,6 +4520,50 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
>> return scanned;
>> }
>>
>> +static void lru_gen_throttle(pg_data_t *pgdat, struct scan_control *sc)
>> +{
>> + struct lruvec *target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
>> +
>> + if (current_is_kswapd()) {
>> + if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
>> + set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>> +
>> + /* Allow kswapd to start writing pages during reclaim. */
>> + if (sc->nr.unqueued_dirty == sc->nr.file_taken)
>> + set_bit(PGDAT_DIRTY, &pgdat->flags);
>> +
>> + if (sc->nr.immediate)
>> + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
>> + }
>> +
>> + /*
>> + * Tag a node/memcg as congested if more than half of the dirty
>> + * pages were marked for writeback and immediate reclaim (counted in nr.congested).
>> + *
>> + * Legacy memcg will stall in page writeback so avoid forcibly
>> + * stalling in reclaim_throttle().
>> + */
>> + if (sc->nr.dirty && (sc->nr.dirty / 2 < sc->nr.congested)) {
>> + if (cgroup_reclaim(sc) && writeback_throttling_sane(sc))
>> + set_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags);
>> +
>> + if (current_is_kswapd())
>> + set_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags);
>> + }
>> +
>> + /*
>> + * Stall direct reclaim for IO completions if the lruvec or
>> + * node is congested. Allow kswapd to continue until it
>> + * starts encountering unqueued dirty pages or cycling through
>> + * the LRU too quickly.
>> + */
>> + if (!current_is_kswapd() && current_may_throttle() &&
>> + !sc->hibernation_mode &&
>> + (test_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags) ||
>> + test_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags)))
>> + reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
>> +}
>> +
>> static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
>> {
>> int type;
>> @@ -4552,6 +4596,16 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>> retry:
>> reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
>> sc->nr_reclaimed += reclaimed;
>> + sc->nr.dirty += stat.nr_dirty;
>> + sc->nr.congested += stat.nr_congested;
>> + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
>> + sc->nr.writeback += stat.nr_writeback;
>> + sc->nr.immediate += stat.nr_immediate;
>> + sc->nr.taken += scanned;
>> +
>> + if (type)
>> + sc->nr.file_taken += scanned;
>> +
>> trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
>> scanned, reclaimed, &stat, sc->priority,
>> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>> @@ -5908,6 +5962,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>>
>> if (lru_gen_enabled() && root_reclaim(sc)) {
>> lru_gen_shrink_node(pgdat, sc);
>> + lru_gen_throttle(pgdat, sc);
>> return;
>> }
> Hi Bharata,
> This patch arose from an Android regression test failure: the test
> allocates 1GB of virtual memory in each of 8 threads on a 5.5GB RAM
> system. It passes under legacy LRU management but fails under MGLRU,
> where a watchdog monitor detects an abnormal system-wide scheduling
> state (the watchdog cannot be scheduled within 60 seconds). With the
> slight change below, the patch makes the test pass, though I have not
> investigated deeply why it helps. Theoretically, it introduces the
> same reclaim throttling mechanism that legacy reclaim uses, which
> could reduce contention on lruvec->lru_lock. I think the patch is
> quite naive for now, but I hope it can help you, since your case
> looks like a scalability issue under memory pressure rather than a
> deadlock. Thank you! (A rough userspace sketch of the reproducer
> follows the diff below.)
>
> The change in the applied version (throttle the reclaim before
> shrinking instead of after):
> if (lru_gen_enabled() && root_reclaim(sc)) {
> + lru_gen_throttle(pgdat, sc);
> lru_gen_shrink_node(pgdat, sc);
> - lru_gen_throttle(pgdat, sc);
> return;
> }
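>
> For reference, the failing test does roughly what the sketch below
> does. This is a minimal approximation only, not the actual Android
> test code; the names and the use of malloc()/memset() here are my
> own assumptions about how the pressure is generated:
>
> #include <pthread.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define NR_THREADS	8
> #define ALLOC_SIZE	(1UL << 30)	/* 1GB per thread */
>
> static void *alloc_and_touch(void *arg)
> {
> 	char *buf = malloc(ALLOC_SIZE);
>
> 	if (!buf)
> 		return NULL;
> 	/*
> 	 * Touch every page so the allocation is actually backed by
> 	 * physical memory, driving the ~5.5GB system into sustained
> 	 * global reclaim.
> 	 */
> 	memset(buf, 1, ALLOC_SIZE);
> 	pause();	/* hold on to the memory */
> 	return NULL;
> }
>
> int main(void)
> {
> 	pthread_t threads[NR_THREADS];
> 	int i;
>
> 	for (i = 0; i < NR_THREADS; i++)
> 		pthread_create(&threads[i], NULL, alloc_and_touch, NULL);
> 	for (i = 0; i < NR_THREADS; i++)
> 		pthread_join(threads[i], NULL);
> 	return 0;
> }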
Thanks Zhaoyang Huang for the patch; I will give it a test and report back.
Regards,
Bharata.