linux-kernel - Re: [PATCH RESEND] mm: drop lruvec->lru_lock if contended when skipping folio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGWkznFLUcyyqW9yGA-Sy6+Y22yqt0C5cWkxZnpq9VarPiH7gQ@mail.gmail.com>
Date: Wed, 21 Aug 2024 08:51:46 +0800
From: Zhaoyang Huang <huangzhaoyang@...il.com>
To: Usama Arif <usamaarif642@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, yuzhao@...gle.com, david@...hat.com, 
	leitao@...ian.org, bharata@....com, willy@...radead.org, vbabka@...e.cz, 
	linux-kernel@...r.kernel.org, kernel-team@...a.com, 
	Johannes Weiner <hannes@...xchg.org>, zhaoyang.huang@...soc.com, 
	Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH RESEND] mm: drop lruvec->lru_lock if contended when
 skipping folio

On Tue, Aug 20, 2024 at 11:45 PM Usama Arif <usamaarif642@...il.com> wrote:
>
>
>
> On 20/08/2024 02:17, Andrew Morton wrote:
> > On Mon, 19 Aug 2024 19:46:48 +0100 Usama Arif <usamaarif642@...il.com> wrote:
> >
> >> lruvec->lru_lock is highly contended and is held when calling
> >> isolate_lru_folios. If the lru has a large number of CMA folios
> >> consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
> >> isolate_lru_folios can hold the lock for a very long time while it
> >> skips those. vmscan_lru_isolate tracepoint showed that skipped can go
> >> above 70k in production and lockstat shows that waittime-max is x1000
> >> higher without this patch.
> >> This can cause lockups [1] and high memory pressure for extended periods of
> >> time [2]. Hence release the lock if its contended when skipping a folio to
> >> give other tasks a chance to acquire it and not stall.
> >>
> >> ...
> >>
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1695,8 +1695,14 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >>              if (folio_zonenum(folio) > sc->reclaim_idx ||
> >>                              skip_cma(folio, sc)) {
> >>                      nr_skipped[folio_zonenum(folio)] += nr_pages;
> >> -                    move_to = &folios_skipped;
> >> -                    goto move;
> >> +                    list_move(&folio->lru, &folios_skipped);
> >> +                    if (!spin_is_contended(&lruvec->lru_lock))
> >> +                            continue;
> >> +                    if (!list_empty(dst))
> >> +                            break;
> >> +                    spin_unlock_irq(&lruvec->lru_lock);
> >> +                    cond_resched();
> >> +                    spin_lock_irq(&lruvec->lru_lock);
> >>              }
> >
> > Oh geeze ugly thing.  Must we do this?
> >
> > The games that function plays with src, dst and move_to are a bit hard
> > to follow.  Some tasteful comments explaining what's going on would
> > help.
> >
> > Also that test of !list_empty(dst).  It would be helpful to comment the
> > dynamics which are happening in this case - why we're testing dst here.
> >
> >
>
> So Johannes pointed out to me that this is not going to properly fix the problem of holding the lru_lock for a long time introduced in [1] because of 2 reasons:
> - the task that is doing lock break is hoarding folios on folios_skipped and making the lru shorter, I didn't see it in the usecase I was trying, but it could be that yielding the lock to the other task is not of much use as it is going to go through a much shorter lru list or even an empty lru list and would OOM, while the folio it is looking for is on folios_skipped. We would be substituting one OOM problem for another with this patch.
Other tasks can not get folios either if they can not use CMA after
the reclaiming even without the original patch. I am ok with your fix
patch except the 'if (!list_empty(dst))' puzzled me.
> - Compaction code goes through pages by pfn and not using the list, as this patch does not clear lru flag, compaction could claim this folio.
>
> The patch in [1] is severely breaking production at Meta and its not a proper fix to the problem that the commit was trying to be solved. It results in holding the lru_lock for a very significant amount of time, stalling all other processes trying to claim memory, creating very high memory pressure for large periods of time and causing OOM.
>
> The way forward would be to revert it and try to come up with a longer term solution that the original commit tried to solve. If no one is opposed to it, I will wait a couple of days for comments and send a revert patch.
I mainly focus on android systems which have no such scenarios as
server encountered. I agree with reverting it if this fix can not be
accepted.
>
> [1] https://lore.kernel.org/all/1685501461-19290-1-git-send-email-zhaoyang.huang@unisoc.com/
>
> Thanks,
> Usama