Message-ID: <20090427084347.GJ4454@balbir.in.ibm.com>
Date:	Mon, 27 Apr 2009 14:13:47 +0530
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hugh@...itas.com" <hugh@...itas.com>
Subject: Re: [RFC][PATCH] fix swap entries is not reclaimed in proper way
	for memg v3.

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> [2009-04-27 17:21:19]:

> On Mon, 27 Apr 2009 13:42:06 +0530
> Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> [2009-04-24 16:28:40]:
> > 
> > > This is new one. (using new logic.) Maybe enough light-weight and caches all cases.
> > 
> > You sure mean catches above :)
> > 
> > 
> > > 
> > > Thanks,
> > > -Kame
> > > ==
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > > 
> > > Because free_swap_and_cache() function is called under spinlocks,
> > > it can't sleep and uses trylock_page() instead of lock_page().
> > > Because of this, a swp_entry which is not used after zap_xx can exist as
> > > SwapCache, which will never be used.
> > > This kind of SwapCache is reclaimed by global LRU when it's found
> > > at LRU rotation.
> > > 
> > > When memory cgroup is used,  the global LRU will not be kicked and
> > > stale Swap Caches will not be reclaimed. This is problematic because
> > > memcg's swap entry accounting is leaked and memcg can't know it.
> > > To catch this stale SwapCache, we have to chase it and check the
> > > swap is alive or not again.
> > > 
> > > This patch adds a function to chase stale swap cache and reclaim it
> > > in a moderate way. When zap_xxx fails to remove a swap entry, it will be
> > > recorded into a buffer and memcg's "work" will reclaim it later.
> > > No sleep, no memory allocation under free_swap_and_cache().
> > > 
> > > This patch also adds stale-swap-cache-congestion logic and tries to avoid having
> > > too many stale swap caches at the same time.
> > > 
> > > Implementation is naive but maybe the cost meets trade-off.
> > > 
> > > How to test:
> > >   1. set limit of memory to very small (1-2M?). 
> > >   2. run a number of programs and trigger page reclaim/swap-in.
> > >   3. kill programs by SIGKILL etc....then, Stale Swap Cache will
> > >      be increased. After this patch, stale swap caches are reclaimed
> > >      and mem+swap controller will not go to OOM.
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > 
> > Quick comment on the design
> > 
> > 1. I like the marking of swap cache entries as stale
> 
> I'd like to. But there is no space to record it as stale. And "race" makes
> that difficult even if we have enough space. If you read the whole thread,
> you know there are many patterns of race.

There have been several iterations of this discussion; summarizing it
would be nice. Let me find the thread.

> 
> > 2. Can't we reclaim stale entries during memcg LRU reclaim? Why write
> > a GC for it?
> > 
> Because they are not on memcg LRU. We can't reclaim it by memcg LRU.
> (See the first mail from Nishimura of this thread. It explains well.)
>

Hmm.. I can't find it, let me do a more exhaustive search on the web.
If the entry is stale and not on memcg LRU, is it still accounted to
the memcg?
 
> One easy case is here.
> 
>   - CPU0 calls zap_pte()->free_swap_and_cache()
>   - CPU1 tries to swap it in.
>   In this case, free_swap_and_cache() doesn't free swp_entry and swp_entry
>   is read into the memory. But it will never be added to memcg's LRU until
>   it's mapped.

That is strange.. not even added to the LRU as a cached page?

>   (What we have to consider here is swapin-readahead. It can swap-in memory
>    even if it's not accessed. Then, this race window is larger than expected.)
> 
> We can't use memcg's LRU then... what we can do is:
> 
>  - scan the whole global LRU
>  or
>  - use some trick to reclaim them in a lazy way.
>

Thanks for being patient. Some of these questions have been discussed
before, I suppose. Let me dig out the thread.
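
To make the deferred-reclaim idea from the quoted patch description concrete
(record the swap entry in a fixed buffer when the trylock path fails, let a
worker recheck and reclaim it later, and flag congestion when too many are
pending), here is a rough userspace model in Python. It is only a sketch, not
kernel code and not the patch itself: the class name, the method names other
than free_swap_and_cache(), the capacity and the threshold are all made up
for illustration.

```python
from collections import deque


class StaleSwapTracker:
    """Toy model of deferred stale-swap-cache reclaim (hypothetical names)."""

    def __init__(self, capacity=8, congestion_threshold=6):
        self.capacity = capacity                    # fixed buffer, no allocation later
        self.congestion_threshold = congestion_threshold
        self.pending = deque()                      # entries whose reclaim was deferred
        self.swap_cache = {}                        # entry -> True if still in use

    def free_swap_and_cache(self, entry, page_lock_available):
        """Model of the non-sleeping path: free now, or record for later."""
        if page_lock_available:
            # trylock succeeded: the swap cache page is dropped immediately.
            self.swap_cache.pop(entry, None)
            return True
        # trylock failed: we may not sleep, so only remember the entry.
        if len(self.pending) < self.capacity:
            self.pending.append(entry)
        return False

    def congested(self):
        """True when too many stale candidates are waiting for the worker."""
        return len(self.pending) >= self.congestion_threshold

    def run_work(self):
        """Deferred worker: recheck each entry and reclaim only stale ones."""
        reclaimed = 0
        while self.pending:
            entry = self.pending.popleft()
            # Recheck: the entry may have been swapped in and mapped meanwhile.
            if self.swap_cache.get(entry) is False:
                del self.swap_cache[entry]
                reclaimed += 1
        return reclaimed
```

The recheck in run_work() stands in for "check the swap is alive or not
again": an entry that was swapped in and mapped in the meantime is skipped
rather than freed.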

-- 
	Balbir