[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F968B9B.3080807@dlh.net>
Date: Tue, 24 Apr 2012 13:16:43 +0200
From: Peter Lieven <pl@....net>
To: Richard Davies <richard@...chsys.com>
CC: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Minchan Kim <minchan.kim@...il.com>,
Wu Fengguang <fengguang.wu@...el.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux-foundation.org>,
Lee Schermerhorn <lee.schermerhorn@...com>,
Chris Webb <chris@...chsys.com>
Subject: Re: Over-eager swapping
On 23.04.2012 11:27, Richard Davies wrote:
> We run a number of relatively large x86-64 hosts with twenty or so qemu-kvm
> virtual machines on each of them, and I'm have some trouble with over-eager
> swapping on some of the machines. This is resulting in load spikes during the
> swapping and customer reports of very poor response latency from the virtual
> machines which have been swapped out, despite the hosts apparently having
> large amounts of free memory, and running fine if swap is turned off.
>
>
> All of the hosts are currently running a 3.1.4 or 3.2.2 kernel and have ksm
> enabled with 64GB of RAM and 2x eight-core AMD Opteron 6128 processors.
> However, we have seen this same problem since 2010 on a 2.6.32.7 kernel and
> older hardware - see http://marc.info/?l=linux-mm&m=128075337008943
> (previous helpful contributors cc:ed here - thanks).
>
> We have /proc/sys/vm/swappiness set to 0. The kernel config is here:
> http://users.org.uk/config-3.1.4
>
>
> The rrd graphs at http://imgur.com/a/Fklxr show a typical incident.
>
> We estimate memory used from /proc/meminfo as:
>
> = MemTotal - MemFree - Buffers + SwapTotal - SwapFree
>
> The first rrd shows memory used increasing as a VM starts, but not getting
> near the 64GB of physical RAM.
>
> The second rrd shows the heavy swapping this VM start caused.
>
> The third rrd shows a multi-gigabyte jump in swap used = SwapTotal - SwapFree
>
> The fourth rrd shows the large load spike (from 1 to 15) caused by this swap
> storm.
>
>
> It is obviously hard to capture all of the relevant data actually during an
> incident. However, as of this morning, the relevant stats are as below.
>
> Any help much appreciated! Our strong belief is that there is unnecessary
> swapping going on here, and causing these load spikes. We would like to run
> with swap for real out-of-memory situations, but at present it is causing
> these kind of load spikes on machines which run completely happily with swap
> disabled.
I wonder why a lot of buffers are allocated at all. Can you describe
whats your
storage backend and provide your qemu-kvm command line. Anyhow,
qemu-devel@...gnu.org
might be the better list to discuss this.
Peter
> Thanks,
>
> Richard.
>
>
> # cat /proc/meminfo
> MemTotal: 65915384 kB
> MemFree: 271104 kB
> Buffers: 36274368 kB
> Cached: 31048 kB
> SwapCached: 1830860 kB
> Active: 30594144 kB
> Inactive: 32295972 kB
> Active(anon): 21883428 kB
> Inactive(anon): 4695308 kB
> Active(file): 8710716 kB
> Inactive(file): 27600664 kB
> Unevictable: 6740 kB
> Mlocked: 6740 kB
> SwapTotal: 33054708 kB
> SwapFree: 30067948 kB
> Dirty: 1044 kB
> Writeback: 0 kB
> AnonPages: 24962708 kB
> Mapped: 7320 kB
> Shmem: 48 kB
> Slab: 2210964 kB
> SReclaimable: 1013272 kB
> SUnreclaim: 1197692 kB
> KernelStack: 6816 kB
> PageTables: 129248 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 66012400 kB
> Committed_AS: 67375852 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 259380 kB
> VmallocChunk: 34308695568 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 155648 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 576 kB
> DirectMap2M: 2095104 kB
> DirectMap1G: 65011712 kB
>
> # cat /proc/sys/vm/zone_reclaim_mode
> 0
>
> # cat /proc/sys/vm/min_unmapped_ratio
> 1
>
> # cat /proc/slabinfo
> slabinfo - version: 2.1
> # name<active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables<limit> <batchcount> <sharedfactor> : slabdata<active_slabs> <num_slabs> <sharedavail>
> ext4_groupinfo_1k 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
> RAWv6 34 34 960 34 8 : tunables 0 0 0 : slabdata 1 1 0
> UDPLITEv6 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0
> UDPv6 544 544 960 34 8 : tunables 0 0 0 : slabdata 16 16 0
> tw_sock_TCPv6 0 0 320 25 2 : tunables 0 0 0 : slabdata 0 0 0
> TCPv6 72 72 1728 18 8 : tunables 0 0 0 : slabdata 4 4 0
> nf_conntrack_expect 592 592 216 37 2 : tunables 0 0 0 : slabdata 16 16 0
> nf_conntrack_ffffffff8199a280 933 1856 280 29 2 : tunables 0 0 0 : slabdata 64 64 0
> dm_raid1_read_record 0 0 1064 30 8 : tunables 0 0 0 : slabdata 0 0 0
> dm_snap_pending_exception 0 0 104 39 1 : tunables 0 0 0 : slabdata 0 0 0
> dm_crypt_io 1811 2574 152 26 1 : tunables 0 0 0 : slabdata 99 99 0
> kcopyd_job 0 0 3240 10 8 : tunables 0 0 0 : slabdata 0 0 0
> dm_uevent 0 0 2608 12 8 : tunables 0 0 0 : slabdata 0 0 0
> cfq_queue 0 0 232 35 2 : tunables 0 0 0 : slabdata 0 0 0
> bsg_cmd 0 0 312 26 2 : tunables 0 0 0 : slabdata 0 0 0
> mqueue_inode_cache 36 36 896 36 8 : tunables 0 0 0 : slabdata 1 1 0
> udf_inode_cache 0 0 656 24 4 : tunables 0 0 0 : slabdata 0 0 0
> fuse_request 0 0 608 26 4 : tunables 0 0 0 : slabdata 0 0 0
> fuse_inode 0 0 704 46 8 : tunables 0 0 0 : slabdata 0 0 0
> ntfs_big_inode_cache 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
> ntfs_inode_cache 0 0 280 29 2 : tunables 0 0 0 : slabdata 0 0 0
> isofs_inode_cache 0 0 600 27 4 : tunables 0 0 0 : slabdata 0 0 0
> fat_inode_cache 0 0 664 24 4 : tunables 0 0 0 : slabdata 0 0 0
> fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
> hugetlbfs_inode_cache 28 28 568 28 4 : tunables 0 0 0 : slabdata 1 1 0
> squashfs_inode_cache 0 0 640 25 4 : tunables 0 0 0 : slabdata 0 0 0
> jbd2_journal_handle 2720 2720 24 170 1 : tunables 0 0 0 : slabdata 16 16 0
> jbd2_journal_head 818 1620 112 36 1 : tunables 0 0 0 : slabdata 45 45 0
> jbd2_revoke_record 2048 4096 32 128 1 : tunables 0 0 0 : slabdata 32 32 0
> ext4_inode_cache 2754 5328 864 37 8 : tunables 0 0 0 : slabdata 144 144 0
> ext4_xattr 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0
> ext4_free_data 1168 2628 56 73 1 : tunables 0 0 0 : slabdata 36 36 0
> ext4_allocation_context 540 540 136 30 1 : tunables 0 0 0 : slabdata 18 18 0
> ext4_io_end 0 0 1128 29 8 : tunables 0 0 0 : slabdata 0 0 0
> ext4_io_page 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
> configfs_dir_cache 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0
> kioctx 0 0 384 42 4 : tunables 0 0 0 : slabdata 0 0 0
> inotify_inode_mark 30 30 136 30 1 : tunables 0 0 0 : slabdata 1 1 0
> kvm_async_pf 448 448 144 28 1 : tunables 0 0 0 : slabdata 16 16 0
> kvm_vcpu 64 94 13856 2 8 : tunables 0 0 0 : slabdata 47 47 0
> UDP-Lite 0 0 768 42 8 : tunables 0 0 0 : slabdata 0 0 0
> xfrm_dst_cache 0 0 448 36 4 : tunables 0 0 0 : slabdata 0 0 0
> ip_fib_trie 219 219 56 73 1 : tunables 0 0 0 : slabdata 3 3 0
> arp_cache 417 500 320 25 2 : tunables 0 0 0 : slabdata 20 20 0
> RAW 672 672 768 42 8 : tunables 0 0 0 : slabdata 16 16 0
> UDP 672 672 768 42 8 : tunables 0 0 0 : slabdata 16 16 0
> tw_sock_TCP 512 1088 256 32 2 : tunables 0 0 0 : slabdata 34 34 0
> TCP 345 357 1536 21 8 : tunables 0 0 0 : slabdata 17 17 0
> blkdev_queue 414 440 1616 20 8 : tunables 0 0 0 : slabdata 22 22 0
> blkdev_requests 945 2209 344 47 4 : tunables 0 0 0 : slabdata 47 47 0
> sock_inode_cache 456 475 640 25 4 : tunables 0 0 0 : slabdata 19 19 0
> shmem_inode_cache 2063 2375 632 25 4 : tunables 0 0 0 : slabdata 95 95 0
> Acpi-ParseExt 3848 3864 72 56 1 : tunables 0 0 0 : slabdata 69 69 0
> Acpi-Namespace 633667 1059270 40 102 1 : tunables 0 0 0 : slabdata 10385 10385 0
> task_delay_info 1238 1584 112 36 1 : tunables 0 0 0 : slabdata 44 44 0
> taskstats 384 384 328 24 2 : tunables 0 0 0 : slabdata 16 16 0
> proc_inode_cache 2460 3250 616 26 4 : tunables 0 0 0 : slabdata 125 125 0
> sigqueue 400 400 160 25 1 : tunables 0 0 0 : slabdata 16 16 0
> bdev_cache 701 714 768 42 8 : tunables 0 0 0 : slabdata 17 17 0
> sysfs_dir_cache 31662 34425 80 51 1 : tunables 0 0 0 : slabdata 675 675 0
> inode_cache 2546 3886 552 29 4 : tunables 0 0 0 : slabdata 134 134 0
> dentry 9452 14868 192 42 2 : tunables 0 0 0 : slabdata 354 354 0
> buffer_head 8175114 8360937 104 39 1 : tunables 0 0 0 : slabdata 214383 214383 0
> vm_area_struct 35344 35834 176 46 2 : tunables 0 0 0 : slabdata 782 782 0
> files_cache 736 874 704 46 8 : tunables 0 0 0 : slabdata 19 19 0
> signal_cache 1011 1296 896 36 8 : tunables 0 0 0 : slabdata 36 36 0
> sighand_cache 682 945 2112 15 8 : tunables 0 0 0 : slabdata 63 63 0
> task_struct 1057 1386 1520 21 8 : tunables 0 0 0 : slabdata 66 66 0
> anon_vma 2417 2856 72 56 1 : tunables 0 0 0 : slabdata 51 51 0
> shared_policy_node 4877 6800 48 85 1 : tunables 0 0 0 : slabdata 80 80 0
> numa_policy 45589 48450 24 170 1 : tunables 0 0 0 : slabdata 285 285 0
> radix_tree_node 227192 248388 568 28 4 : tunables 0 0 0 : slabdata 9174 9174 0
> idr_layer_cache 603 660 544 30 4 : tunables 0 0 0 : slabdata 22 22 0
> dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
> dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
> kmalloc-8192 88 100 8192 4 8 : tunables 0 0 0 : slabdata 25 25 0
> kmalloc-4096 3567 3704 4096 8 8 : tunables 0 0 0 : slabdata 463 463 0
> kmalloc-2048 55140 55936 2048 16 8 : tunables 0 0 0 : slabdata 3496 3496 0
> kmalloc-1024 5960 6496 1024 32 8 : tunables 0 0 0 : slabdata 203 203 0
> kmalloc-512 12185 12704 512 32 4 : tunables 0 0 0 : slabdata 397 397 0
> kmalloc-256 195078 199040 256 32 2 : tunables 0 0 0 : slabdata 6220 6220 0
> kmalloc-128 45645 47328 128 32 1 : tunables 0 0 0 : slabdata 1479 1479 0
> kmalloc-64 14647251 14776576 64 64 1 : tunables 0 0 0 : slabdata 230884 230884 0
> kmalloc-32 5573 7552 32 128 1 : tunables 0 0 0 : slabdata 59 59 0
> kmalloc-16 7550 10752 16 256 1 : tunables 0 0 0 : slabdata 42 42 0
> kmalloc-8 13805 14848 8 512 1 : tunables 0 0 0 : slabdata 29 29 0
> kmalloc-192 47641 50883 192 42 2 : tunables 0 0 0 : slabdata 1214 1214 0
> kmalloc-96 3673 6006 96 42 1 : tunables 0 0 0 : slabdata 143 143 0
> kmem_cache 32 32 256 32 2 : tunables 0 0 0 : slabdata 1 1 0
> kmem_cache_node 495 576 64 64 1 : tunables 0 0 0 : slabdata 9 9 0
>
> # cat /proc/buddyinfo
> Node 0, zone DMA 0 0 1 0 2 1 1 0 1 1 3
> Node 0, zone DMA32 9148 1941 657 673 131 53 18 2 0 0 0
> Node 0, zone Normal 8080 13 0 2 0 2 1 0 1 0 0
> Node 1, zone Normal 19071 3239 675 200 413 37 4 1 2 0 0
> Node 2, zone Normal 37716 3924 154 9 3 1 2 0 1 0 0
> Node 3, zone Normal 20015 4590 1768 996 334 20 1 1 1 0 0
>
> # grep MemFree /sys/devices/system/node/node*/meminfo
> /sys/devices/system/node/node0/meminfo:Node 0 MemFree: 201460 kB
> /sys/devices/system/node/node1/meminfo:Node 1 MemFree: 283224 kB
> /sys/devices/system/node/node2/meminfo:Node 2 MemFree: 287060 kB
> /sys/devices/system/node/node3/meminfo:Node 3 MemFree: 316928 kB
>
> # cat /proc/vmstat
> nr_free_pages 224933
> nr_inactive_anon 1173838
> nr_active_anon 5209232
> nr_inactive_file 6998686
> nr_active_file 2180311
> nr_unevictable 1685
> nr_mlock 1685
> nr_anon_pages 5940145
> nr_mapped 1836
> nr_file_pages 9635092
> nr_dirty 603
> nr_writeback 0
> nr_slab_reclaimable 253121
> nr_slab_unreclaimable 299440
> nr_page_table_pages 32311
> nr_kernel_stack 854
> nr_unstable 0
> nr_bounce 0
> nr_vmscan_write 50485772
> nr_writeback_temp 0
> nr_isolated_anon 0
> nr_isolated_file 0
> nr_shmem 12
> nr_dirtied 5630347228
> nr_written 5625041387
> numa_hit 28372623283
> numa_miss 4761673976
> numa_foreign 4761673976
> numa_interleave 30490
> numa_local 28372334279
> numa_other 4761962980
> nr_anon_transparent_hugepages 76
> nr_dirty_threshold 8192
> nr_dirty_background_threshold 4096
> pgpgin 9523143630
> pgpgout 23124688920
> pswpin 57978726
> pswpout 50121412
> pgalloc_dma 0
> pgalloc_dma32 1132547190
> pgalloc_normal 32421613044
> pgalloc_movable 0
> pgfree 39379011152
> pgactivate 751722445
> pgdeactivate 591205976
> pgfault 41103638391
> pgmajfault 11853858
> pgrefill_dma 0
> pgrefill_dma32 24124080
> pgrefill_normal 540719764
> pgrefill_movable 0
> pgsteal_dma 0
> pgsteal_dma32 297677595
> pgsteal_normal 4784595717
> pgsteal_movable 0
> pgscan_kswapd_dma 0
> pgscan_kswapd_dma32 241277864
> pgscan_kswapd_normal 4004618399
> pgscan_kswapd_movable 0
> pgscan_direct_dma 0
> pgscan_direct_dma32 65729843
> pgscan_direct_normal 1012932822
> pgscan_direct_movable 0
> zone_reclaim_failed 0
> pginodesteal 66
> slabs_scanned 668153728
> kswapd_steal 4063341017
> kswapd_inodesteal 2063
> kswapd_low_wmark_hit_quickly 9834
> kswapd_high_wmark_hit_quickly 488468
> kswapd_skip_congestion_wait 580150
> pageoutrun 22006623
> allocstall 926752
> pgrotated 28467920
> compact_blocks_moved 522323130
> compact_pages_moved 5774251432
> compact_pagemigrate_failed 5267247
> compact_stall 121045
> compact_fail 68349
> compact_success 52696
> htlb_buddy_alloc_success 0
> htlb_buddy_alloc_fail 0
> unevictable_pgs_culled 19976952
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 33137561
> unevictable_pgs_mlocked 35042070
> unevictable_pgs_munlocked 33138335
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 1024
> thp_fault_alloc 263176
> thp_fault_fallback 717335
> thp_collapse_alloc 21307
> thp_collapse_alloc_failed 91103
> thp_split 90328
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@...ck.org"> email@...ck.org</a>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists