Message-ID: <CAJD7tkbgPo1_Gij+EL==tedRh=nJe_etuZors-6Y-obYu44FMQ@mail.gmail.com>
Date: Mon, 27 Nov 2023 21:41:54 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Minchan Kim <minchan@...nel.org>, Chris Li <chriscli@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Liu Shixin <liushixin2@...wei.com>,
Yu Zhao <yuzhao@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Sachin Sant <sachinp@...ux.ibm.com>,
Johannes Weiner <hannes@...xchg.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
On Mon, Nov 27, 2023 at 9:39 PM Huang, Ying <ying.huang@...el.com> wrote:
>
> Yosry Ahmed <yosryahmed@...gle.com> writes:
>
> > On Mon, Nov 27, 2023 at 8:05 PM Huang, Ying <ying.huang@...el.com> wrote:
> >>
> >> Yosry Ahmed <yosryahmed@...gle.com> writes:
> >>
> >> > On Mon, Nov 27, 2023 at 7:21 PM Huang, Ying <ying.huang@...el.com> wrote:
> >> >>
> >> >> Yosry Ahmed <yosryahmed@...gle.com> writes:
> >> >>
> >> >> > On Mon, Nov 27, 2023 at 1:32 PM Minchan Kim <minchan@...nel.org> wrote:
> >> >> >>
> >> >> >> On Mon, Nov 27, 2023 at 12:22:59AM -0800, Chris Li wrote:
> >> >> >> > On Mon, Nov 27, 2023 at 12:14 AM Huang, Ying <ying.huang@...el.com> wrote:
> >> >> >> > > > I agree with Ying that anonymous pages typically have different page
> >> >> >> > > > access patterns than file pages, so we might want to treat them
> >> >> >> > > > differently to reclaim them effectively.
> >> >> >> > > > One random idea:
> >> >> >> > > > How about we put the anonymous page in a swap cache in a different LRU
> >> >> >> > > > than the rest of the anonymous pages. Then shrinking against those
> >> >> >> > > > pages in the swap cache would be more effective. Instead of having
> >> >> >> > > > [anon, file] LRU, now we have [anon not in swap cache, anon in swap
> >> >> >> > > > cache, file] LRU
> >> >> >> > >
> >> >> >> > > I don't think that it is necessary. The patch is only for a special use
> >> >> >> > > case. Where the swap device is used up while some pages are in swap
> >> >> >> > > cache. The patch will kill performance, but it is used to avoid OOM
> >> >> >> > > only, not to improve performance. Per my understanding, we will not use
> >> >> >> > > up swap device space in most cases. This may be true for ZRAM, but will
> >> >> >> > > we keep pages in swap cache for long when we use ZRAM?
> >> >> >> >
> >> >> >> > I ask the question regarding how many pages can be freed by this patch
> >> >> >> > in this email thread as well, but haven't got the answer from the
> >> >> >> > author yet. That is one important aspect to evaluate how valuable is
> >> >> >> > that patch.
> >> >> >>
> >> >> >> Exactly. Since the swap cache has a different lifetime from the page cache,
> >> >> >> its pages are usually dropped when they are unmapped (unless they are shared
> >> >> >> with others, but anon is usually exclusive and private), so I wonder how much
> >> >> >> memory we can save.
> >> >> >
> >> >> > I think the point of this patch is not saving memory, but rather
> >> >> > avoiding an OOM condition that will happen if we have no swap space
> >> >> > left, but some pages left in the swap cache. Of course, the OOM
> >> >> > avoidance will come at the cost of extra work in reclaim to swap those
> >> >> > pages out.
> >> >> >
> >> >> > The only case where I think this might be harmful is if there's plenty
> >> >> > of pages to reclaim on the file LRU, and instead we opt to chase down
> >> >> > the few swap cache pages. So perhaps we can add a check to only set
> >> >> > sc->swapcache_only if the number of pages in the swap cache is more
> >> >> > than the number of pages on the file LRU or similar? Just make sure we
> >> >> > don't chase the swapcache pages down if there's plenty to scan on the
> >> >> > file LRU?
> >> >>
> >> >> The swap cache pages can be divided into 3 groups.
> >> >>
> >> >> - group 1: pages have been written out, at the tail of inactive LRU, but
> >> >> not reclaimed yet.
> >> >>
> >> >> - group 2: pages have been written out, but failed to be reclaimed
> >> >> (e.g., were accessed before reclaiming)
> >> >>
> >> >> - group 3: pages have been swapped in, but were kept in swap cache. The
> >> >> pages may be in active LRU.
> >> >>
> >> >> The main target of the original patch should be group 1. And the pages
> >> >> may be cheaper to reclaim than file pages.
> >> >>
> >> >> Group 2 pages are hard to reclaim if swap_count() isn't 0.
> >> >>
> >> >> Group 3 should be reclaimed in theory, but the overhead may be high.
> >> >> And we may need to reclaim the swap entries instead of pages if the pages
> >> >> are hot. But we can start to reclaim the swap entries before the swap
> >> >> space runs out.
> >> >>
> >> >> So, if we can count group 1, we may use that as indicator to scan anon
> >> >> pages. And we may add code to reclaim group 3 earlier.
> >> >>
> >> >
> >> > My point was not that reclaiming the pages in the swap cache is more
> >> > expensive than reclaiming the pages in the file LRU. In a lot of
> >> > cases, as you point out, the pages in the swap cache can just be
> >> > dropped, so they may be as cheap or cheaper to reclaim than the pages
> >> > in the file LRU.
> >> >
> >> > My point was that scanning the anon LRU when swap space is exhausted
> >> > to get to the pages in the swap cache may be much more expensive,
> >> > because there may be a lot of pages on the anon LRU that are not in
> >> > the swap cache, and hence are not reclaimable, unlike pages in the
> >> > file LRU, which should mostly be reclaimable.
> >> >
> >> > So what I am saying is that maybe we should not do the effort of
> >> > scanning the anon LRU in the swapcache_only case unless there aren't a
> >> > lot of pages to reclaim on the file LRU (relatively). For example, if
> >> > we have 100 pages in the swap cache out of 10000 pages in the anon
> >> > LRU, and there are 10000 pages in the file LRU, it's probably not
> >> > worth scanning the anon LRU.
> >>
> >> For group 1 pages, they are at the tail of the anon inactive LRU, so the
> >> scan overhead is low too. For example, if number of group 1 pages is
> >> 100, we just need to scan 100 pages to reclaim them. We can choose to
> >> stop scanning when the number of the non-group-1 pages reached some
> >> threshold.
> >>
> >
> > We should still try to reclaim pages in groups 2 & 3 before OOMing
> > though. Maybe the motivation for this patch is group 1, but I don't
> > see why we should special case them. Pages in groups 2 & 3 should be
> > roughly equally cheap to reclaim. They may have higher refault cost,
> > but IIUC we should still try to reclaim them before OOMing.
>
> The scan cost of group 3 may be high, you may need to scan all anonymous
> pages to identify them. The reclaim cost of group 2 may be high, it may
> just cause thrashing (shared pages that are accessed by just one
> process). So I think that we can allow reclaim group 1 in all cases.
> Try to reclaim swap entries for group 3 during normal LRU scanning after
> more than half of swap space of limit is used. As a last resort before
> OOM, try to reclaim group 2 and group 3. Or, limit scan count for group
> 2 and group 3.
It would be nice if this can be done auto-magically without having to
keep track of the groups separately.
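
As a rough illustration of the bounded tail scan suggested earlier in the
thread for group 1, here is a minimal userspace sketch. It is not kernel
code: struct model_page, is_group1() and scan_tail_for_group1() are
hypothetical names, and the three flags are simplified stand-ins for the real
page state. The group is derived from the flags at scan time, so nothing is
tracked separately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a page; not the kernel's struct folio. */
struct model_page {
	bool in_swapcache;	/* page still has a swap cache entry */
	bool written_out;	/* swap-out I/O has completed */
	bool referenced;	/* page was accessed again after write-out */
};

/* "Group 1" in the thread's terms: written out, not yet reclaimed, cold. */
static bool is_group1(const struct model_page *p)
{
	return p->in_swapcache && p->written_out && !p->referenced;
}

/*
 * Walk a modelled inactive anon LRU from the tail (index n - 1) towards the
 * head (index 0). Group 1 pages are counted as reclaimed. The walk stops once
 * max_misses pages outside group 1 have been seen, so the scan cost stays
 * bounded when most of the list is not reclaimable.
 *
 * Returns the number of reclaimed pages; *scanned (if non-NULL) receives the
 * number of pages looked at.
 */
static size_t scan_tail_for_group1(const struct model_page *lru, size_t n,
				   size_t max_misses, size_t *scanned)
{
	size_t reclaimed = 0, misses = 0, seen = 0;

	for (size_t i = n; i > 0; i--) {
		if (misses >= max_misses)
			break;
		seen++;
		if (is_group1(&lru[i - 1]))
			reclaimed++;
		else
			misses++;
	}

	if (scanned)
		*scanned = seen;
	return reclaimed;
}
```

The max_misses cutoff is the "some threshold" from the earlier mail; what
value would be sensible in practice is an open question here.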
>
> BTW, in some situations, OOM is not the worst outcome. For example,
> thrashing may kill interaction latency, while killing the memory hog (may
> be caused by memory leak) saves system response time.
I agree that in some situations OOMs are better than thrashing; it's
not an easy problem.
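
For reference, the file LRU check I floated earlier in the thread could be
sketched roughly as below. This is only a userspace model under stated
assumptions: worth_scanning_anon_for_swapcache() is a hypothetical helper, the
counters are passed in by the caller, and the plain "more swap cache pages
than file pages" comparison is just one possible form of the "or similar"
condition.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether chasing swap cache pages on the anon
 * LRU is worthwhile once swap space is exhausted.
 *
 * nr_swapcache: pages currently in the swap cache (assumed to be known)
 * nr_file:      pages on the file LRU
 *
 * The anon LRU is only scanned when the swap cache holds more pages than the
 * file LRU, so a handful of swap cache pages is not chased down while there
 * is still plenty to reclaim from the file LRU.
 */
static bool worth_scanning_anon_for_swapcache(unsigned long nr_swapcache,
					      unsigned long nr_file)
{
	if (nr_swapcache == 0)
		return false;

	return nr_swapcache > nr_file;
}
```

With the numbers from my earlier example (100 pages in the swap cache, 10000
pages on the file LRU), this returns false and the anon LRU is left alone.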