linux-kernel - Re: [RFC][PATCH 1/2] Show quicklist at meminfo

Message-Id: <20080821212847.f7fc936b.akpm@linux-foundation.org>
Date:	Thu, 21 Aug 2008 21:28:47 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cl@...ux-foundation.org, tokunaga.keiich@...fujitsu.com
Subject: Re: [RFC][PATCH 1/2] Show quicklist at meminfo

On Fri, 22 Aug 2008 10:05:45 +0900 KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com> wrote:

> > > quicklist_total_size() is racy against cpu hotplug.  That's OK for
> > > /proc/meminfo purposes (occasional transient inaccuracy?), but will it
> > > crash?  Not in the current implementation of per_cpu() afaict, but it
> > > might crash if we ever teach cpu hotunplug to free up the percpu
> > > resources.
> > 
> > First, quicklist doesn't handle cpu hotplug at all;
> > that is a separate quicklist problem.
> > 
> > Next, I don't think it causes a crash, but I haven't tested it.
> > So I'll run cpu hotplug/unplug tests today.
> > 
> > I'll report the results tomorrow.
> 
> OK.
> I ran a continuous cpu hotplug/unplug workload for over 12 hours,
> and the system did not crash.
> 
> So I believe my patch is safe against cpu unplug.

err, which patch?

I presently have:

mm-show-quicklist-memory-usage-in-proc-meminfo.patch
mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch
mm-quicklist-shouldnt-be-proportional-to-number-of-cpus.patch
mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix.patch

Is that what you have?

I'll consolidate them into two patches and will append them here.  Please check.


From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>

At present, the quicklists keep some pages for each CPU as a cache (each
CPU may hold up to node_free_pages/16 pages).

They are used as a page table cache: exit() adds pages to the cache, while
fork() consumes them.

So, when Apache-style middleware (one parent, many children) runs, one CPU
runs fork() while the other CPUs do the middleware work and exit().

In that case, the forking CPU has no page table cache at all, while the
others hold their maximum caches.
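To make the imbalance concrete, here is a toy userspace model (not kernel code). It assumes each CPU keeps a plain counter of cached page-table pages capped at a hypothetical MAX_CACHED; fork_on(), exit_on() and run_apache_like() are illustrative names only. The parent always forks on CPU 0 and the children exit on the remaining CPUs in turn.

```c
#include <stdio.h>

/*
 * Toy model: fork() on a CPU takes page-table pages from that CPU's
 * cache (falling back to the page allocator when it is empty); exit()
 * on a CPU puts the pages back into that CPU's cache, up to a cap.
 */
#define NR_CPUS           4    /* hypothetical */
#define MAX_CACHED        100  /* hypothetical per-CPU cap, in pages */
#define PAGES_PER_PROCESS 10   /* hypothetical page tables per child */

static long cached[NR_CPUS];   /* cached pages per CPU */
static long from_allocator;    /* pages the forking CPU had to allocate */

static void fork_on(int cpu)
{
	for (int i = 0; i < PAGES_PER_PROCESS; i++) {
		if (cached[cpu] > 0)
			cached[cpu]--;
		else
			from_allocator++;
	}
}

static void exit_on(int cpu)
{
	for (int i = 0; i < PAGES_PER_PROCESS; i++) {
		if (cached[cpu] < MAX_CACHED)
			cached[cpu]++;
		/* else: the page would go back to the page allocator */
	}
}

/* Parent forks on CPU 0; children exit on CPUs 1..NR_CPUS-1 round-robin. */
void run_apache_like(int nr_children)
{
	for (int n = 0; n < nr_children; n++) {
		fork_on(0);
		exit_on(1 + n % (NR_CPUS - 1));
	}
}

void print_caches(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d: %ld cached pages\n", cpu, cached[cpu]);
}
```

After run_apache_like(100), CPU 0 holds nothing and every other CPU sits at MAX_CACHED, so the total cached is (NR_CPUS - 1) x MAX_CACHED, which is the shape of the QList_max formula that follows.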

	QList_max = (#ofCPUs - 1) x Free / 16
	=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)

So how much memory can the quicklists consume in the worst case?  It is
proportional to the number of CPUs, because the cache is per CPU but the
cache size calculation doesn't take the number of CPUs into account.

The above calculation gives:

	 Number of CPUs per node            2    4    8   16
	 ==============================  ====================
	 QList_max / (Free + QList_max)   5.8%  16%  30%  48%
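The table can be reproduced from the formula (n - 1) / (16 + n - 1), where n is the number of CPUs per node. A minimal sketch, assuming the divisor 16 from the changelog; the function names are hypothetical.

```c
#include <stdio.h>

#define FRACTION_OF_NODE_MEM 16  /* the "/16" in the formula above */

/*
 * Worst-case share of (free + cached) memory held by quicklists when
 * all but one CPU on a node sit at the per-CPU cap of Free / 16:
 *   QList_max / (Free + QList_max) = (n - 1) / (16 + n - 1)
 */
static double qlist_max_share(int cpus_per_node)
{
	double n1 = cpus_per_node - 1;

	return n1 / (FRACTION_OF_NODE_MEM + n1);
}

void print_qlist_table(void)
{
	static const int cpus[] = { 2, 4, 8, 16 };

	for (size_t i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("%2d CPUs/node: %4.1f%%\n", cpus[i],
		       100.0 * qlist_max_share(cpus[i]));
}
```

Rounded to one decimal this gives 5.9%, 15.8%, 30.4% and 48.4%; the 2-CPU entry in the table above is 5.88% truncated rather than rounded.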


Wow!  In the worst case, the quicklists can consume about 50% of memory.
Worse, they have no cache shrinking mechanism, which causes the following
problems:

1. End users may mistake it for a memory leak.
	=> /proc/meminfo should display the quicklist amount.

2. It can trigger the OOM killer.
	=> The amount of quicklists shouldn't be proportional to the number of CPUs.



This patch:

Quicklists can consume several GB of memory.  If end users cannot see how
much memory they use, they may wrongly conclude that a memory leak has
happened.

After this patch is applied, /proc/meminfo shows the following:

% cat /proc/meminfo

MemTotal:        7701504 kB
MemFree:         5159040 kB
Buffers:          112960 kB
Cached:           337536 kB
SwapCached:            0 kB
Active:           218944 kB
Inactive:         350848 kB
Active(anon):     120832 kB
Inactive(anon):        0 kB
Active(file):      98112 kB
Inactive(file):   350848 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2031488 kB
SwapFree:        2031488 kB
Dirty:               320 kB
Writeback:             0 kB
AnonPages:        119488 kB
Mapped:            38528 kB
Slab:            1595712 kB
SReclaimable:      23744 kB
SUnreclaim:      1571968 kB
PageTables:        14336 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5882240 kB
Committed_AS:     356672 kB
VmallocTotal:   17592177655808 kB
VmallocUsed:       29056 kB
VmallocChunk:   17592177626304 kB
Quicklists:       283776 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:    262144 kB
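A monitoring tool could pick the new field out of /proc/meminfo. A minimal sketch, assuming the caller has already read the file into a NUL-terminated buffer; meminfo_quicklists_kb() is a hypothetical helper, not an existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the "Quicklists:" value (in kB) from a buffer holding the
 * contents of /proc/meminfo, or -1 if the line is absent (for example
 * on a kernel without this patch).
 */
long meminfo_quicklists_kb(const char *meminfo)
{
	const char *line = meminfo;

	while (line != NULL && *line != '\0') {
		long kb;

		/* The literal must match at the start of the line. */
		if (sscanf(line, "Quicklists: %ld kB", &kb) == 1)
			return kb;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;          /* skip to the start of the next line */
	}
	return -1;
}
```

On the output above this returns 283776. Returning -1 lets a caller distinguish "field missing" from a genuine 0 kB, which the stub in quicklist.h reports on configurations without quicklists.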

[akpm@...ux-foundation.org: build fix]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Christoph Lameter <cl@...ux-foundation.org>
Cc: <stable@...nel.org>		[2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
---

 fs/proc/proc_misc.c       |    7 +++++--
 include/linux/quicklist.h |    7 +++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff -puN fs/proc/proc_misc.c~mm-show-quicklist-memory-usage-in-proc-meminfo fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c~mm-show-quicklist-memory-usage-in-proc-meminfo
+++ a/fs/proc/proc_misc.c
@@ -24,6 +24,7 @@
 #include <linux/tty.h>
 #include <linux/string.h>
 #include <linux/mman.h>
+#include <linux/quicklist.h>
 #include <linux/proc_fs.h>
 #include <linux/ioport.h>
 #include <linux/mm.h>
@@ -189,7 +190,8 @@ static int meminfo_read_proc(char *page,
 		"Committed_AS: %8lu kB\n"
 		"VmallocTotal: %8lu kB\n"
 		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"VmallocChunk:   %8lu kB\n"
+		"Quicklists:     %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
@@ -221,7 +223,8 @@ static int meminfo_read_proc(char *page,
 		K(committed),
 		(unsigned long)VMALLOC_TOTAL >> 10,
 		vmi.used >> 10,
-		vmi.largest_chunk >> 10
+		vmi.largest_chunk >> 10,
+		K(quicklist_total_size())
 		);
 
 		len += hugetlb_report_meminfo(page + len);
diff -puN include/linux/quicklist.h~mm-show-quicklist-memory-usage-in-proc-meminfo include/linux/quicklist.h
--- a/include/linux/quicklist.h~mm-show-quicklist-memory-usage-in-proc-meminfo
+++ a/include/linux/quicklist.h
@@ -80,6 +80,13 @@ void quicklist_trim(int nr, void (*dtor)
 
 unsigned long quicklist_total_size(void);
 
+#else
+
+static inline unsigned long quicklist_total_size(void)
+{
+	return 0;
+}
+
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
_
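For readers unfamiliar with quicklist_total_size(), the sketch below is a userspace model of what it is assumed to compute: the sum of cached pages over every CPU's quicklists, which meminfo then converts from pages to kB with K(). The MODEL_* names, the sizes and the 4 KiB page size are hypothetical, and the model ignores the cpu hotplug race discussed at the top of this thread.

```c
#define MODEL_NR_CPUS    4   /* hypothetical number of CPUs */
#define MODEL_NR_QUICK   2   /* hypothetical quicklists per CPU */
#define MODEL_PAGE_SHIFT 12  /* assumes 4 KiB pages */

struct model_quicklist {
	unsigned long nr_pages;  /* pages currently cached on this list */
};

/* One array of quicklists per CPU, mirroring the per-CPU variable. */
static struct model_quicklist model_ql[MODEL_NR_CPUS][MODEL_NR_QUICK];

/* Sum cached pages over every CPU and every quicklist on that CPU. */
unsigned long model_quicklist_total_size(void)
{
	unsigned long count = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		for (int q = 0; q < MODEL_NR_QUICK; q++)
			count += model_ql[cpu][q].nr_pages;
	return count;
}

/* Pages -> kB, in the style of the K() macro in meminfo_read_proc(). */
#define MODEL_K(x) ((x) << (MODEL_PAGE_SHIFT - 10))
```

With 10 pages cached on one CPU and 5 on another, the model reports 15 pages, i.e. 60 kB with 4 KiB pages.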



From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>

When a test program which does task migration runs, my 8GB box spends
800MB of memory on quicklists.  This is not a memory leak, but it doesn't
seem good.

% cat /proc/meminfo

MemTotal:        7701568 kB
MemFree:         4724672 kB
(snip)
Quicklists:       844800 kB

This is because:

- My machine's specs are:
	number of NUMA nodes: 2
	number of CPUs:       8 (4 CPUs x 2 nodes)
	total memory:         8GB (4GB x 2 nodes)
	free memory:          about 5GB

- The maximum quicklist usage is:

	 Number of CPUs per node            2    4    8   16
	 ==============================  ====================
	 QList_max / (Free + QList_max)   5.8%  16%  30%  48%

- Then QList_max ~= 16% x (Free + QList_max) ~= 16% x (4.7GB + 0.8GB) ~= 880MB.
  So the observed ~800MB of quicklist usage is consistent with this maximum.
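A quick check of this estimate using the meminfo figures above, assuming the 4-CPUs-per-node ratio (n - 1) / (16 + n - 1) = 3/19 applies to MemFree plus Quicklists (both nodes are symmetric, so the per-node ratio is used for the whole box). qlist_max_kb() is a hypothetical helper.

```c
/*
 * Worst-case quicklist size in kB, given free + quicklist memory in kB:
 *   QList_max = (Free + QList_max) * (n - 1) / (16 + n - 1)
 */
unsigned long qlist_max_kb(unsigned long free_plus_ql_kb,
			   unsigned int cpus_per_node)
{
	unsigned long n1;

	if (cpus_per_node == 0)
		return 0;        /* no CPUs on the node: nothing cached */
	n1 = cpus_per_node - 1;
	return free_plus_ql_kb * n1 / (16 + n1);
}
```

With MemFree = 4724672 kB and Quicklists = 844800 kB, this gives 879390 kB (about 880MB), slightly above the 844800 kB actually observed.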

So, if a machine with the following specs ran that program:

   CPUs: 64 (8 CPUs x 8 nodes)
   Mem:  1TB (128GB x 8 nodes)

then the quicklists could waste about 300GB (= 1TB x 30%), which is far too much.

So I don't like cache policies whose size is proportional to the number of CPUs.

My patch changes the per-CPU cache limit
from:
   per-cpu-cache-amount = memory_on_node / 16
to
   per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.
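The effect of the change on the 64-CPU, 1TB example can be sketched as below. The helper names are hypothetical, and machine_bound() is a naive upper bound that assumes every CPU fills its cache and that node memory stays free; it is therefore looser than the 300GB estimate above, which accounts for free memory shrinking as the caches grow.

```c
/* Per-CPU quicklist cap, in bytes, before the patch. */
unsigned long long cap_before(unsigned long long node_mem)
{
	return node_mem / 16;
}

/* Per-CPU quicklist cap, in bytes, after the patch. */
unsigned long long cap_after(unsigned long long node_mem,
			     unsigned int cpus_on_node)
{
	return node_mem / 16 / cpus_on_node;
}

/* Naive machine-wide bound if every CPU on every node hit its cap. */
unsigned long long machine_bound(unsigned long long per_cpu_cap,
				 unsigned int nodes,
				 unsigned int cpus_per_node)
{
	return per_cpu_cap * nodes * cpus_per_node;
}
```

For 128GB nodes with 8 CPUs each, the per-CPU cap drops from 8GB to 1GB, and the machine-wide bound drops from 512GB to 64GB (= 1TB / 16), matching the figure given below.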

I think this is reasonable, but even with this patch applied, the quicklists
can still cache a lot of memory on a big machine.

(With this patch applied, the quicklists can still waste 64GB on a 1TB server
(= 1TB / 16); is that still too much?)

The test program is below.
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get the highest logical cpu number from /proc/cpuinfo */
{
  FILE *fd;
  char *ret, buffer[BUFFSIZE];
  int cpu = 1;

  fd = fopen("/proc/cpuinfo", "r");
  if (fd == NULL) {
    perror("fopen(/proc/cpuinfo)");
    exit(EXIT_FAILURE);
  }
  while (1) {
    ret = fgets(buffer, BUFFSIZE, fd);
    if (ret == NULL)
      break;
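    if (!strncmp(buffer, "processor", 9)) {
      char *colon = strchr(buffer, ':');	/* "processor\t: N" */
      if (colon != NULL)
        cpu = atoi(colon + 1);	/* atoi() skips leading blanks */
    }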
  }
  fclose(fd);
  return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
  cpu_set_t mask;
  int ret;

  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  ret = sched_setaffinity(0, sizeof(mask), &mask);
  if (ret == -1) {
    perror("sched_setaffinity()");
    exit(EXIT_FAILURE);
  }
  sched_yield();	/* not necessary */
}

#define MMAP_SIZE (10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL 1	/* 1 second */

int main(int argc, char *argv[])
{
  int cpu_max, nextcpu;
  long pagesize;
  pid_t pid;

  /* set max number of logical cpu */
  if (argc > 1)
    cpu_max = atoi(argv[1]) - 1;
  else
    cpu_max = max_cpu();

  /* get the page size */
  pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize == -1) {
    perror("sysconf(_SC_PAGESIZE)");
    exit(EXIT_FAILURE);
  }

  /* prepare parent process */
  cpu_bind(0);
  nextcpu = cpu_max;

loop:

  /* select destination cpu for child process by round-robin rule */
  if (++nextcpu > cpu_max)
    nextcpu = 1;

  pid = fork();

  if (pid == 0) { /* child action */

    char *p;
    int i;

    /* consume page tables */
    p = mmap(NULL, MMAP_SIZE, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      perror("mmap()");
      exit(EXIT_FAILURE);
    }
    i = MMAP_SIZE / pagesize;
    while (i-- > 0) {
      *p = 1;
      p += pagesize;
    }

    /* move to other cpu */
    cpu_bind(nextcpu);
/*
    printf("a child moved to cpu%d after mmap().\n", nextcpu);
    fflush(stdout);
 */

    /* exit on the new cpu: freed page tables go to that cpu's quicklist */
    exit(0);

  } else if (pid > 0) { /* parent action */

    sleep(FORK_INTERVAL);
    waitpid(pid, NULL, WNOHANG);

  } else { /* fork() failed: back off, then retry */
    perror("fork()");
    sleep(FORK_INTERVAL);
  }

  goto loop;
}

[akpm@...ux-foundation.org: fix build on sparc64]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Christoph Lameter <cl@...ux-foundation.org>
Cc: <stable@...nel.org>		[2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
---

 mm/quicklist.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff -puN mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus mm/quicklist.c
--- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus
+++ a/mm/quicklist.c
@@ -26,7 +26,9 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
-	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+	int node = numa_node_id();
+	struct zone *zones = NODE_DATA(node)->node_zones;
+	cpumask_t node_cpumask;
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -38,6 +40,10 @@ static unsigned long max_pages(unsigned 
 		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
+
+	node_cpumask = node_to_cpumask(node);
+	max /= cpus_weight_nr(node_cpumask);
+
 	return max(max, min_pages);
 }
 
_
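The patched max_pages() logic can be modelled in userspace as below. This is a sketch, assuming FRACTION_OF_NODE_MEM is 16 (the "/16" in the changelog) and that the running CPU's node always has at least one CPU; model_max_pages() is a hypothetical name, and the node's free pages and CPU count are passed in rather than read from zone counters and node_to_cpumask().

```c
#include <assert.h>

#define FRACTION_OF_NODE_MEM 16  /* assumed, per the changelog */

/*
 * Model of the patched max_pages(): the node's free pages divided by
 * FRACTION_OF_NODE_MEM, then split evenly among the CPUs on that node,
 * but never below min_pages.
 */
unsigned long model_max_pages(unsigned long node_free_pages,
			      unsigned int cpus_on_node,
			      unsigned long min_pages)
{
	unsigned long max;

	assert(cpus_on_node > 0);  /* the running CPU is on this node */
	max = node_free_pages / FRACTION_OF_NODE_MEM;
	max /= cpus_on_node;
	return max > min_pages ? max : min_pages;
}
```

Because the per-node total is now node_free_pages / 16 regardless of how many CPUs share it, adding CPUs to a node no longer raises the worst-case quicklist footprint; min_pages still sets a floor on each CPU.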
