Message-ID: <5660bd90-3094-418a-8a05-58e222dacfb5@amd.com>
Date: Mon, 29 Jul 2024 10:19:57 +0530
From: Bharata B Rao <bharata@....com>
To: Zhaoyang Huang <huangzhaoyang@...il.com>,
"zhaoyang.huang" <zhaoyang.huang@...soc.com>
Cc: Neeraj.Upadhyay@....com, akpm@...ux-foundation.org, david@...hat.com,
kinseyho@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
mgorman@...e.de, mjguzik@...il.com, nikunj@....com, vbabka@...e.cz,
willy@...radead.org, yuzhao@...gle.com, steve.kang@...soc.com
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
On 26-Jul-24 8:56 AM, Zhaoyang Huang wrote:
> On Thu, Jul 25, 2024 at 6:00 PM zhaoyang.huang
> <zhaoyang.huang@...soc.com> wrote:
<snip>
>> From the callstack of the lock holder, this looks like a scalability issue rather than a deadlock. Unlike legacy LRU management, there is no throttling mechanism for global reclaim under MGLRU so far. Could we apply a similar method to throttle reclaim when it is too aggressive? I am wondering if this patch, which is a rough version, could help here?
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 2e34de9cd0d4..827036e21f24 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -4520,6 +4520,50 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
>> return scanned;
>> }
>>
>> +static void lru_gen_throttle(pg_data_t *pgdat, struct scan_control *sc)
>> +{
>> + struct lruvec *target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
>> +
>> + if (current_is_kswapd()) {
>> + if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
>> + set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>> +
>> + /* Allow kswapd to start writing pages during reclaim. */
>> + if (sc->nr.unqueued_dirty == sc->nr.file_taken)
>> + set_bit(PGDAT_DIRTY, &pgdat->flags);
>> +
>> + if (sc->nr.immediate)
>> + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
>> + }
>> +
>> + /*
>> + * Tag a node/memcg as congested if more than half of the dirty
>> + * pages were marked for writeback and immediate reclaim (counted in nr.congested).
>> + *
>> + * Legacy memcg will stall in page writeback so avoid forcibly
>> + * stalling in reclaim_throttle().
>> + */
>> + if (sc->nr.dirty && (sc->nr.dirty / 2 < sc->nr.congested)) {
>> + if (cgroup_reclaim(sc) && writeback_throttling_sane(sc))
>> + set_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags);
>> +
>> + if (current_is_kswapd())
>> + set_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags);
>> + }
>> +
>> + /*
>> + * Stall direct reclaim for IO completions if the lruvec or
>> + * node is congested. Allow kswapd to continue until it
>> + * starts encountering unqueued dirty pages or cycling through
>> + * the LRU too quickly.
>> + */
>> + if (!current_is_kswapd() && current_may_throttle() &&
>> + !sc->hibernation_mode &&
>> + (test_bit(LRUVEC_CGROUP_CONGESTED, &target_lruvec->flags) ||
>> + test_bit(LRUVEC_NODE_CONGESTED, &target_lruvec->flags)))
>> + reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
>> +}
>> +
>> static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
>> {
>> int type;
>> @@ -4552,6 +4596,16 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>> retry:
>> reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
>> sc->nr_reclaimed += reclaimed;
>> + sc->nr.dirty += stat.nr_dirty;
>> + sc->nr.congested += stat.nr_congested;
>> + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
>> + sc->nr.writeback += stat.nr_writeback;
>> + sc->nr.immediate += stat.nr_immediate;
>> + sc->nr.taken += scanned;
>> +
>> + if (type)
>> + sc->nr.file_taken += scanned;
>> +
>> trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
>> scanned, reclaimed, &stat, sc->priority,
>> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>> @@ -5908,6 +5962,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>>
>> if (lru_gen_enabled() && root_reclaim(sc)) {
>> lru_gen_shrink_node(pgdat, sc);
>> + lru_gen_throttle(pgdat, sc);
>> return;
>> }
> Hi Bharata,
> This patch arose from an Android regression test failure: the test
> allocates 1GB of virtual memory in each of 8 threads on a 5.5GB RAM
> system. It passes under legacy LRU management but fails under MGLRU,
> where a watchdog monitor detects an abnormal system-wide scheduling
> state (the watchdog cannot be scheduled within 60 seconds). With the
> slight change below, the patch makes the test pass, though I have not
> investigated deeply why it helps. Theoretically, it introduces the
> same reclaim throttling mechanism that legacy reclaim uses, which
> could reduce contention on lruvec->lru_lock. I think the patch is
> quite naive for now, but I hope it can help you, since your case
> looks like a scalability issue under memory pressure rather than a
> deadlock. Thank you! (A rough userspace sketch of the reproducer
> follows the diff below.)
>
> The change in the applied version (throttle the reclaim before
> shrinking instead of after):
> if (lru_gen_enabled() && root_reclaim(sc)) {
> + lru_gen_throttle(pgdat, sc);
> lru_gen_shrink_node(pgdat, sc);
> - lru_gen_throttle(pgdat, sc);
> return;
> }
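>
> For reference, the failing test does roughly what the sketch below
> does. This is a minimal approximation only, not the actual Android
> test code; the names and the use of malloc()/memset() here are my
> own assumptions about how the pressure is generated:
>
> #include <pthread.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define NR_THREADS	8
> #define ALLOC_SIZE	(1UL << 30)	/* 1GB per thread */
>
> static void *alloc_and_touch(void *arg)
> {
> 	char *buf = malloc(ALLOC_SIZE);
>
> 	if (!buf)
> 		return NULL;
> 	/*
> 	 * Touch every page so the allocation is actually backed by
> 	 * physical memory, driving the ~5.5GB system into sustained
> 	 * global reclaim.
> 	 */
> 	memset(buf, 1, ALLOC_SIZE);
> 	pause();	/* hold on to the memory */
> 	return NULL;
> }
>
> int main(void)
> {
> 	pthread_t threads[NR_THREADS];
> 	int i;
>
> 	for (i = 0; i < NR_THREADS; i++)
> 		pthread_create(&threads[i], NULL, alloc_and_touch, NULL);
> 	for (i = 0; i < NR_THREADS; i++)
> 		pthread_join(threads[i], NULL);
> 	return 0;
> }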
Thanks Zhaoyang Huang for the patch; I will give it a test and report back.
Regards,
Bharata.