lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F57C610.8050101@openvz.org>
Date:	Thu, 08 Mar 2012 00:33:20 +0400
From:	Konstantin Khlebnikov <khlebnikov@...nvz.org>
To:	Zheng Liu <gnehzuil.liu@...il.com>
CC:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Fine granularity page reclaim

Zheng Liu wrote:
>
>
> On Monday, February 20, 2012, Konstantin Khlebnikov <khlebnikov@...nvz.org <mailto:khlebnikov@...nvz.org>> wrote:
>  > Zheng Liu wrote:
>  >>
>  >> Cc linux-kernel mailing list.
>  >>
>  >> On Sat, Feb 18, 2012 at 12:20:05AM +0400, Konstantin Khlebnikov wrote:
>  >>>
>  >>> Zheng Liu wrote:
>  >>>>
>  >>>> Hi all,
>  >>>>
>  >>>> Currently, we encounter a problem about page reclaim. In our product system,
>  >>>> there is a lot of applictions that manipulate a number of files. In these
>  >>>> files, they can be divided into two categories. One is index file, another is
>  >>>> block file. The number of index files is about 15,000, and the number of
>  >>>> block files is about 23,000 in a 2TB disk. The application accesses index
>  >>>> file using mmap(2), and read/write block file using pread(2)/pwrite(2). We hope
>  >>>> to hold index file in memory as much as possible, and it works well in Redhat
>  >>>> 2.6.18-164. It is about 60-70% of index files that can be hold in memory.
>  >>>> However, it doesn't work well in Redhat 2.6.32-133. I know in 2.6.18 that the
>  >>>> linux uses an active list and an inactive list to handle page reclaim, and in
>  >>>> 2.6.32 that they are divided into anonymous list and file list. So I am
>  >>>> curious about why most of index files can be hold in 2.6.18? The index file
>  >>>> should be replaced because mmap doesn't impact the lru list.
>  >>>
>  >>> There was my patch for fixing similar problem with shared/executable mapped pages
>  >>> "vmscan: promote shared file mapped pages" commit 34dbc67a644f and commit c909e99364c
>  >>> maybe it will help in your case.
>  >>
>  >> Hi Konstantin,
>  >>
>  >> Thank you for your reply.  I have tested it in upstream kernel.  These
>  >> patches are useful for multi-processes applications.  But, in our product
>  >> system, there are some applications that are multi-thread.  So
>  >> 'references_ptes>  1' cannot help these applications to hold the data in
>  >> memory.
>  >
>  > Ok, what if you mmap you data as executable, just to test.
>  > Then these pages will be activated after first touch.
>  > In attachment patch with per-mm flag with the same effect.
>  >
>
> Hi Konstantin,
>
> Sorry for the delay reply.  Last two weeks I was trying these two solutions
> and evaluating the impacts for the performance in our product system.
> Good news is that these two solutions both work well. They can keep
> mapped files in memory under mult-thread.  But I have a question for
> the first solution (map the file with PROT_EXEC flag).  I think this way is
> too tricky.  As I said previously, these files that needs to be mapped only
> are normal index file, and they shouldn't be mapped with PROT_EXEC flag
> from the view of an application programmer.  So actually the key issue is
> that we should provide a mechanism, which lets different file sets can be
> reclaimed separately.  I am not sure whether this idea is useful or not.  So
> any feedbacks are welcomed.:-).  Thank you.
>

Sounds good. Yes, PROT_EXEC isn't very usable and secure, per-mm flag not
very flexible too. I prefer setting some kind of memory pressure priorities
for each vma and inode. Probably we can sort vma and inodes into different
cgroup-like sets and balance memory pressure between them.
Maybe someone was thought about it...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ