lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20111013073321.GA2784@barrios-desktop>
Date:	Thu, 13 Oct 2011 16:33:21 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Satoru Moriya <satoru.moriya@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Randy Dunlap <rdunlap@...otime.net>,
	Satoru Moriya <smoriya@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"lwoodman@...hat.com" <lwoodman@...hat.com>,
	Seiji Aguchi <saguchi@...hat.com>,
	"hughd@...gle.com" <hughd@...gle.com>,
	"hannes@...xchg.org" <hannes@...xchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH -v2 -mm] add extra free kbytes tunable

On Fri, Sep 02, 2011 at 12:31:14PM -0400, Satoru Moriya wrote:
> On 09/01/2011 05:58 PM, Andrew Morton wrote:
> > On Thu, 1 Sep 2011 15:26:50 -0400
> > Rik van Riel <riel@...hat.com> wrote:
> > 
> >> Add a userspace visible knob
> > 
> > argh.  Fear and hostility at new knobs which need to be maintained for 
> > ever, even if the underlying implementation changes.
> > 
> > Unfortunately, this one makes sense.
> > 
> >> to tell the VM to keep an extra amount of memory free, by increasing 
> >> the gap between each zone's min and low watermarks.
> >>
> >> This is useful for realtime applications that call system calls and 
> >> have a bound on the number of allocations that happen in any short 
> >> time period.  In this application, extra_free_kbytes would be left at 
> >> an amount equal to or larger than the maximum number of 
> >> allocations that happen in any burst.
> > 
> > _is_ it useful?  Proof?
> > 
> > Who is requesting this?  Have they tested it?  Results?
> 
> This is interesting for me.
> 
> Some of our customers have realtime applications and they are concerned 
> the fact that Linux uses free memory as pagecache. It means that
> when their application allocate memory, Linux kernel tries to reclaim
> memory at first and then allocate it. This may make memory allocation
> latency bigger.
> 
> In many cases this is not a big issue because Linux has kswapd for
> background reclaim and it is fast enough not to enter direct reclaim
> path if there are a lot of clean cache. But under some situations -
> e.g. Application allocates a lot of memory which is larger than delta
> between watermark_low and watermark_min in a short time and kswapd
> can't reclaim fast enough due to dirty page reclaim, direct reclaim
> is executed and causes big latency.
> 
> We can avoid the issue above by using preallocation and mlock.
> But it can't cover kmalloc used in systemcall. So I'd like to use
> this patch with mlock to avoid memory allocation latency issue as
> low as possible. It may not be a perfect solution but it is important
> for customers in enterprise area to configure the amount of free
> memory at their own risk.

I agree needs for such feature but don't like such primitive interface
exporting to user.

As Satoru said, we can reserve free pages for user through preallocation and mlocking.
The thing is free pages for kernel itself.
Most desirable thing is we have to avoid syscall in critical realtime section.
But if we can't avoid, my crazy idea is to use memcg for kernel pages.
Of course, we should implement it and not simple stuff but AFAIK, memcg people
always consider it and finally will do it. :)
Recently, Glauber try "Basic kernel memory functionality" but I don't have reviewed
it yet. I am not sure we can reuse it, anyway. Kame?

My simple idea is as follows,

We can assign basic revered page pool and/or size of user-determined pages pool
for each task registred at memcg-slab.
The application have to notify start of RT section to memcg before it goes to
RT section. So, memcg could fill up page pool if it is short. In this case,
application can stuck but it's okay as it doesn't go to RT section yet.
The applicatoin have to notify end of RT section to memcg, too so that memcg
could try to fill up reserved page pool in case of shortage.

Why we need such notification is kswapd high prioiry, new knob and others never
can meet application's deadline requirement in some situations(ex,
there are so many dirty pages in LRU or fill up anon pages in non-swap case and so on)
so that application might end up stuck at some point. The somepoint must be out of RT
section of the task.

For implemenation, we might need new watermark setting for each memcg or/and
kswapd prioirity promotion like thing for hurry reclaiming.
Anyway, they are just implementaions and we could enhance/add further more through
various techniques as time goes by.

Personally, I think it could a valuable featue.

-- 
Kinds regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ