lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZZ-q0PZ-XCDwA4oG@tiehlicka>
Date: Thu, 11 Jan 2024 09:46:08 +0100
From: Michal Hocko <mhocko@...e.com>
To: Jianfeng Wang <jianfeng.w.wang@...cle.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm, oom: Add lru_add_drain() in __oom_reap_task_mm()

On Wed 10-01-24 11:02:03, Jianfeng Wang wrote:
> On 1/10/24 12:46 AM, Michal Hocko wrote:
> > On Tue 09-01-24 01:15:11, Jianfeng Wang wrote:
> >> The oom_reaper tries to reclaim additional memory owned by the oom
> >> victim. In __oom_reap_task_mm(), it uses mmu_gather for batched page
> >> free. After oom_reaper was added, mmu_gather feature introduced
> >> CONFIG_MMU_GATHER_NO_GATHER (in 'commit 952a31c9e6fa ("asm-generic/tlb:
> >> Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y")', an option to skip batched
> >> page free. If set, tlb_batch_pages_flush(), which is responsible for
> >> calling lru_add_drain(), is skipped during tlb_finish_mmu(). Without it,
> >> pages could still be held by per-cpu fbatches rather than be freed.
> >>
> >> This fix adds lru_add_drain() prior to mmu_gather. This makes the code
> >> consistent with other cases where mmu_gather is used for freeing pages.
> > 
> > Does this fix any actual problem or is this pure code consistency thing?
> > I am asking because it doesn't make much sense to me TBH, LRU cache
> > draining is usually important when we want to ensure that cached pages
> > are put to LRU to be dealt with because otherwise the MM code wouldn't
> > be able to deal with them. OOM reaper doesn't necessarily run on the
> > same CPU as the oom victim so draining on a local CPU doesn't
> > necessarily do anything for the victim's pages.
> > 
> > While this patch is not harmful I really do not see much point in adding
> > the local draining here. Could you clarify please?
> > 
> It targets the case described in the patch's commit message: oom_killer
> thinks that it 'reclaims' pages while pages are still held by per-cpu
> fbatches with a ref count.
> 
> I admit that pages may sit on a different core(s). Given that
> doing remote calls to all CPUs with lru_add_drain_all() is expensive,
> this line of code can be helpful if it happens to give back a few pages
> to the system right away without the overhead, especially when oom is
> involved. Plus, it also makes the code consistent with other places
> using mmu_gather feature to free pages in batch.

I would argue that consistency the biggest problem of this patch. It
tries to follow a pattern that is just not really correct. First it
operates on a random CPU from the oom victim perspective and second it
doesn't really block any unmapping operation and that is the main
purpose of the reaper. Sure it frees a lot of unmapped memory but if
there are couple of pages that cannot be freed imeediately because they
are sitting on a per-cpu LRU caches then this is not a deal breaker. As
you have noted those pages might be sitting on any per-cpu cache.

So I do not really see that as a good justification. People will follow
that pattern even more and spread lru_add_drain to other random places.

Unless you can show any actual runtime effect of this patch then I think
it shouldn't be merged.

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ