Message-Id: <20090427172119.d84aaa68.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Mon, 27 Apr 2009 17:21:19 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	balbir@...ux.vnet.ibm.com
Cc:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hugh@...itas.com" <hugh@...itas.com>
Subject: Re: [RFC][PATCH] fix swap entries is not reclaimed in proper way
 for memg v3.

On Mon, 27 Apr 2009 13:42:06 +0530
Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> [2009-04-24 16:28:40]:
> 
> > This is a new version (using new logic). It's probably light-weight enough and caches all cases.
> 
> Surely you mean "catches" above :)
> 
> 
> > 
> > Thanks,
> > -Kame
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > 
> > Because free_swap_and_cache() is called under spinlocks, it can't
> > sleep, so it uses trylock_page() instead of lock_page().
> > As a result, a swp_entry that is no longer used after zap_xx can
> > remain as SwapCache that will never be used again.
> > This kind of SwapCache is reclaimed by the global LRU when it is
> > found during LRU rotation.
> > 
> > When a memory cgroup is used, the global LRU will not be kicked, so
> > stale swap caches will not be reclaimed. This is a problem because
> > memcg's swap entry accounting leaks and memcg cannot detect it.
> > To catch such stale SwapCache, we have to chase it and check again
> > whether the swap entry is still alive.
> > 
> > This patch adds a function that chases stale swap cache and reclaims
> > it in a moderate way. When zap_xxx fails to remove a swap entry, the
> > entry is recorded into a buffer and memcg's "work" reclaims it later.
> > There is no sleeping and no memory allocation under free_swap_and_cache().
> > 
> > This patch also adds stale-swap-cache congestion logic and tries to avoid
> > having too many stale swap caches at the same time.
> > 
> > The implementation is naive, but the cost is probably an acceptable trade-off.
> > 
> > How to test:
> >   1. Set the memory limit to something very small (1-2M?).
> >   2. Run several programs so that page reclaim and swap-in happen.
> >   3. Kill the programs with SIGKILL etc. Stale swap cache then
> >      accumulates. With this patch, stale swap caches are reclaimed
> >      and the mem+swap controller does not go OOM.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> 
> Quick comments on the design:
> 
> 1. I like the marking of swap cache entries as stale

I'd like to, but there is no space to record it as stale. And races make
that difficult even if we had enough space. If you read the whole thread,
you'll see there are many race patterns.

> 2. Can't we reclaim stale entries during memcg LRU reclaim? Why write
> a GC for it?
> 
Because they are not on the memcg LRU, we can't reclaim them via the memcg LRU.
(See Nishimura's first mail in this thread; it explains this well.)

Here is one simple case:

  - CPU0 calls zap_pte()->free_swap_and_cache().
  - CPU1 tries to swap it in.
  In this case, free_swap_and_cache() doesn't free the swp_entry, and the
  swp_entry is read into memory. But the page will never be added to
  memcg's LRU until it's mapped.
  (We also have to consider swapin readahead, which can swap in memory
   even if it's never accessed, so this race window is larger than expected.)
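The race above can be modelled in a few lines. The following is a simplified userspace sketch in C11, not kernel code: the names (`struct sketch_page`, `zap_pte_sketch()`, `free_swap_and_cache_sketch()`) are hypothetical, an `atomic_flag` stands in for both the page-table spinlock and the page lock, and CPU1's swap-in is reduced to "someone already holds the page lock". It only illustrates why a caller that may not sleep can do no more than try-lock the page, and how a failed try-lock leaves the swap cache page behind with no remaining references.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of one swap-cache page (not kernel code). */
struct sketch_page {
    atomic_flag locked;   /* stands in for the page lock */
    bool in_swap_cache;   /* page still occupies the swap cache */
    int swap_refs;        /* page-table references to the swap entry */
};

/* Stands in for the page-table spinlock held by the zap path. */
static atomic_flag pte_lock = ATOMIC_FLAG_INIT;

static void sketch_page_init(struct sketch_page *p, int refs)
{
    atomic_flag_clear(&p->locked);
    p->in_swap_cache = true;
    p->swap_refs = refs;
}

/* Non-blocking, like trylock_page(): true if we now own the lock. */
static bool try_lock_page_sketch(struct sketch_page *p)
{
    return !atomic_flag_test_and_set(&p->locked);
}

static void unlock_page_sketch(struct sketch_page *p)
{
    atomic_flag_clear(&p->locked);
}

/* Models free_swap_and_cache(): it may not sleep, so it only try-locks.
 * Returns true if it leaves a stale swap cache page behind. */
static bool free_swap_and_cache_sketch(struct sketch_page *p)
{
    assert(p->swap_refs > 0);
    p->swap_refs--;                 /* drop the zapped pte's reference */
    if (p->swap_refs > 0 || !p->in_swap_cache)
        return false;               /* still referenced, or nothing cached */
    if (!try_lock_page_sketch(p))
        return true;                /* page busy (e.g. swap-in): cannot wait */
    p->in_swap_cache = false;       /* drop the page from the swap cache */
    unlock_page_sketch(p);
    return false;
}

/* Models zap_pte() calling free_swap_and_cache() under a spinlock. */
static bool zap_pte_sketch(struct sketch_page *p)
{
    while (atomic_flag_test_and_set(&pte_lock))
        ;                           /* spin: sleeping is not allowed here */
    bool stale = free_swap_and_cache_sketch(p);
    atomic_flag_clear(&pte_lock);
    return stale;
}
```

For example, if CPU1's swap-in already holds the page lock (its `try_lock_page_sketch()` succeeded first), `zap_pte_sketch()` returns true and the page stays in the swap cache with `swap_refs == 0`; nothing on the zap side will look at it again.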

So we can't use memcg's LRU. What we can do is either:

 - scan the whole global LRU,
 or
 - use some trick to reclaim them lazily.
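One way to picture the lazy approach described in the patch (record the entries free_swap_and_cache() could not drop into a buffer, reclaim them later from a work item, and watch for congestion) is the C11 sketch below. Everything here is an assumption for illustration, not the code from the patch: the buffer size, the congestion mark, and the names `stale_buf_record()`, `stale_buf_drain()` and `stale_buf_congested()` are hypothetical, and a plain `page_locked` flag stands in for the real page lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STALE_BUF_SIZE 64        /* hypothetical capacity */
#define STALE_CONGESTION_MARK 48 /* hypothetical "too many stale" level */

/* Hypothetical model of a swap entry left behind by the zap path. */
struct stale_ent {
    bool page_locked;   /* e.g. swap-in I/O still in flight */
    bool in_swap_cache; /* still holding a swap slot and a cache page */
};

struct stale_buf {
    atomic_flag lock;                    /* spinlock: recording never sleeps */
    struct stale_ent *ent[STALE_BUF_SIZE];
    size_t n;
};

static void stale_buf_init(struct stale_buf *b)
{
    atomic_flag_clear(&b->lock);
    b->n = 0;
}

static void stale_lock(struct stale_buf *b)
{
    while (atomic_flag_test_and_set(&b->lock))
        ;
}

static void stale_unlock(struct stale_buf *b)
{
    atomic_flag_clear(&b->lock);
}

/* Called where the zap path could not drop the swap cache. Fixed-size
 * array, no allocation: if full, the entry is simply not tracked. */
static bool stale_buf_record(struct stale_buf *b, struct stale_ent *e)
{
    bool ok = false;

    stale_lock(b);
    if (b->n < STALE_BUF_SIZE) {
        b->ent[b->n++] = e;
        ok = true;
    }
    stale_unlock(b);
    return ok;
}

/* Congestion check: callers could throttle while this returns true. */
static bool stale_buf_congested(struct stale_buf *b)
{
    stale_lock(b);
    bool congested = b->n >= STALE_CONGESTION_MARK;
    stale_unlock(b);
    return congested;
}

/* Deferred "work": re-check each entry outside the spinlock. Entries whose
 * page is still locked are re-queued. Returns the number reclaimed. */
static size_t stale_buf_drain(struct stale_buf *b)
{
    struct stale_ent *batch[STALE_BUF_SIZE];
    size_t n, reclaimed = 0;

    stale_lock(b);                       /* snapshot and empty the buffer */
    n = b->n;
    for (size_t i = 0; i < n; i++)
        batch[i] = b->ent[i];
    b->n = 0;
    stale_unlock(b);

    for (size_t i = 0; i < n; i++) {
        struct stale_ent *e = batch[i];
        if (!e->page_locked) {
            e->in_swap_cache = false;    /* reclaim: free cache and slot */
            reclaimed++;
        } else {
            stale_buf_record(b, e);      /* still busy: retry next pass */
        }
    }
    assert(reclaimed <= n);
    return reclaimed;
}
```

The recording side only takes a spinlock and writes into a fixed array, so it stays usable from the non-sleeping zap path; the price of this choice is that entries are silently left untracked when the buffer is full, and only the later drain (which may sleep in a real work item) does the actual checking.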


Thanks,
-Kame

