[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <555d1f95-7c9e-2691-b14f-0260f90d23a9@wiesinger.com>
Date: Sun, 19 Mar 2017 17:02:44 +0100
From: Gerhard Wiesinger <lists@...singer.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: lkml@...garu.com, Minchan Kim <minchan@...nel.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Still OOM problems with 4.9er/4.10er kernels
On 19.03.2017 16:18, Michal Hocko wrote:
> On Fri 17-03-17 21:08:31, Gerhard Wiesinger wrote:
>> On 17.03.2017 18:13, Michal Hocko wrote:
>>> On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
>>> [...]
>>>> Why does the kernel prefer to swapin/out and not use
>>>>
>>>> a.) the free memory?
>>> It will use all the free memory up to min watermark which is set up
>>> based on min_free_kbytes.
>> Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?
> See init_per_zone_wmark_min
>
>>>> b.) the buffer/cache?
>>> the memory reclaim is strongly biased towards page cache and we try to
>>> avoid swapout as much as possible (see get_scan_count).
>> If I understand it correctly, swapping is preferred over dropping the
>> cache, right. Can this behaviour be changed to prefer dropping the
>> cache to some minimum amount? Is this also configurable in a way?
> No, we enforce swapping if the amount of free + file pages are below the
> cumulative high watermark.
>
>> (As far as I remember e.g. kernel 2.4 dropped the caches well).
>>
>>>> There is ~100M memory available but kernel swaps all the time ...
>>>>
>>>> Any ideas?
>>>>
>>>> Kernel: 4.9.14-200.fc25.x86_64
>>>>
>>>> top - 17:33:43 up 28 min, 3 users, load average: 3.58, 1.67, 0.89
>>>> Tasks: 145 total, 4 running, 141 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 19.1 us, 56.2 sy, 0.0 ni, 4.3 id, 13.4 wa, 2.0 hi, 0.3 si, 4.7
>>>> st
>>>> KiB Mem : 230076 total, 61508 free, 123472 used, 45096 buff/cache
>>>>
>>>> procs -----------memory---------- ---swap-- -----io---- -system--
>>>> ------cpu-----
>>>> r b swpd free buff cache si so bi bo in cs us sy id wa st
>>>> 3 5 303916 60372 328 43864 27828 200 41420 236 6984 11138 11 47 6 23 14
>>> I am really surprised to see any reclaim at all. 26% of free memory
>>> doesn't sound as if we should do a reclaim at all. Do you have an
>>> unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
>>> anything running inside a memory cgroup with a small limit?
>> nothing special set regarding /proc/sys/vm/min_free_kbytes (default values),
>> detailed config below. Regarding cgroups, none of I know. How to check (I
>> guess nothing is set because cg* commands are not available)?
> be careful because systemd started to use some controllers. You can
> easily check cgroup mount points.
See below.
>
>> /proc/sys/vm/min_free_kbytes
>> 45056
> So at least 45M will be kept reserved for the system. Your data
> indicated you had more memory. How does /proc/zoneinfo look like?
> Btw. you seem to be using fc kernel, are there any patches applied on
> top of Linus tree? Could you try to retest vanilla kernel?
System looks normally now, FYI (e.g. now permanent swapping)
free
total used free shared buff/cache
available
Mem: 349076 154112 41560 184 153404 148716
Swap: 2064380 831844 1232536
cat /proc/zoneinfo
Node 0, zone DMA
per-node stats
nr_inactive_anon 9543
nr_active_anon 22105
nr_inactive_file 9877
nr_active_file 13416
nr_unevictable 0
nr_isolated_anon 0
nr_isolated_file 0
nr_pages_scanned 0
workingset_refault 1926013
workingset_activate 707166
workingset_nodereclaim 187276
nr_anon_pages 11429
nr_mapped 6852
nr_file_pages 46772
nr_dirty 1
nr_writeback 0
nr_writeback_temp 0
nr_shmem 46
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 3319047
nr_vmscan_immediate_reclaim 32363
nr_dirtied 222115
nr_written 3537529
pages free 3110
min 27
low 33
high 39
node_scanned 0
spanned 4095
present 3998
managed 3977
nr_free_pages 3110
nr_zone_inactive_anon 18
nr_zone_active_anon 3
nr_zone_inactive_file 51
nr_zone_active_file 75
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_slab_reclaimable 214
nr_slab_unreclaimable 289
nr_page_table_pages 185
nr_kernel_stack 16
nr_bounce 0
nr_zspages 0
numa_hit 1214071
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 1214071
numa_other 0
nr_free_cma 0
protection: (0, 306, 306, 306, 306)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
node_unreclaimable: 0
start_pfn: 1
node_inactive_ratio: 0
Node 0, zone DMA32
pages free 7921
min 546
low 682
high 818
node_scanned 0
spanned 94172
present 94172
managed 83292
nr_free_pages 7921
nr_zone_inactive_anon 9525
nr_zone_active_anon 22102
nr_zone_inactive_file 9826
nr_zone_active_file 13341
nr_zone_unevictable 0
nr_zone_write_pending 1
nr_mlock 0
nr_slab_reclaimable 5829
nr_slab_unreclaimable 8622
nr_page_table_pages 2638
nr_kernel_stack 2208
nr_bounce 0
nr_zspages 0
numa_hit 23125334
numa_miss 0
numa_foreign 0
numa_interleave 14307
numa_local 23125334
numa_other 0
nr_free_cma 0
protection: (0, 0, 0, 0, 0)
pagesets
cpu: 0
count: 17
high: 90
batch: 15
vm stats threshold: 12
cpu: 1
count: 55
high: 90
batch: 15
vm stats threshold: 12
node_unreclaimable: 0
start_pfn: 4096
node_inactive_ratio: 0
mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup
(rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/blkio type cgroup
(rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
(rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup
(rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup
(rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup
(rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup
(rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup
(rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup
(rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup
(rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup
(rw,nosuid,nodev,noexec,relatime,freezer)
There are patches (see below), but as far as I saw nothing regarding the
issues which happen.
BTW: Does it make sense to reduce lower limit for low mem VMs? e.g.
echo "10000" > /proc/sys/vm/min_free_kbytes
Thnx.
Ciao,
Gerhard
https://koji.fedoraproject.org/koji/buildinfo?buildID=870215
## Patches needed for building this package
# build tweak for build ID magic, even for -vanilla
Patch001: kbuild-AFTER_LINK.patch
## compile fixes
# ongoing complaint, full discussion delayed until ksummit/plumbers
Patch002: 0001-iio-Use-event-header-from-kernel-tree.patch
%if !%{nopatches}
# Git trees.
# Standalone patches
# a tempory patch for QCOM hardware enablement. Will be gone by end of
2016/F-26 GA
Patch420: qcom-QDF2432-tmp-errata.patch
# http://www.spinics.net/lists/arm-kernel/msg490981.html
Patch421: geekbox-v4-device-tree-support.patch
# http://www.spinics.net/lists/linux-tegra/msg26029.html
Patch422: usb-phy-tegra-Add-38.4MHz-clock-table-entry.patch
# Fix OMAP4 (pandaboard)
Patch423: arm-revert-mmc-omap_hsmmc-Use-dma_request_chan-for-reque.patch
# Not particularly happy we don't yet have a proper upstream resolution
this is the right direction
# https://www.spinics.net/lists/arm-kernel/msg535191.html
Patch424: arm64-mm-Fix-memmap-to-be-initialized-for-the-entire-section.patch
# http://patchwork.ozlabs.org/patch/587554/
Patch425: ARM-tegra-usb-no-reset.patch
Patch426: AllWinner-net-emac.patch
# http://www.spinics.net/lists/devicetree/msg163238.html
Patch430: bcm2837-initial-support.patch
# http://www.spinics.net/lists/dri-devel/msg132235.html
Patch433:
drm-vc4-Fix-OOPSes-from-trying-to-cache-a-partially-constructed-BO..patch
# bcm283x mmc for wifi
http://www.spinics.net/lists/arm-kernel/msg567077.html
Patch434: bcm283x-mmc-bcm2835.patch
# Upstream fixes for i2c/serial/ethernet MAC addresses
Patch435: bcm283x-fixes.patch
# https://lists.freedesktop.org/archives/dri-devel/2017-February/133823.html
Patch436: vc4-fix-vblank-cursor-update-issue.patch
# http://www.spinics.net/lists/arm-kernel/msg552554.html
Patch438: arm-imx6-hummingboard2.patch
Patch460: lib-cpumask-Make-CPUMASK_OFFSTACK-usable-without-deb.patch
Patch466: input-kill-stupid-messages.patch
Patch467: die-floppy-die.patch
Patch468: no-pcspkr-modalias.patch
Patch470: silence-fbcon-logo.patch
Patch471: Kbuild-Add-an-option-to-enable-GCC-VTA.patch
Patch472: crash-driver.patch
Patch473: efi-lockdown.patch
Patch487: Add-EFI-signature-data-types.patch
Patch488: Add-an-EFI-signature-blob-parser-and-key-loader.patch
# This doesn't apply. It seems like it could be replaced by
#
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5ac7eace2d00eab5ae0e9fdee63e38aee6001f7c
# which has an explicit line about blacklisting
Patch489: KEYS-Add-a-system-blacklist-keyring.patch
Patch490: MODSIGN-Import-certificates-from-UEFI-Secure-Boot.patch
Patch491: MODSIGN-Support-not-importing-certs-from-db.patch
Patch493: drm-i915-hush-check-crtc-state.patch
Patch494: disable-i8042-check-on-apple-mac.patch
Patch495: lis3-improve-handling-of-null-rate.patch
Patch497: scsi-sd_revalidate_disk-prevent-NULL-ptr-deref.patch
Patch498: criu-no-expert.patch
Patch499: ath9k-rx-dma-stop-check.patch
Patch500: xen-pciback-Don-t-disable-PCI_COMMAND-on-PCI-device-.patch
Patch501: Input-synaptics-pin-3-touches-when-the-firmware-repo.patch
Patch502: firmware-Drop-WARN-from-usermodehelper_read_trylock-.patch
# Patch503: drm-i915-turn-off-wc-mmaps.patch
Patch509: MODSIGN-Don-t-try-secure-boot-if-EFI-runtime-is-disa.patch
#CVE-2016-3134 rhbz 1317383 1317384
Patch665: netfilter-x_tables-deal-with-bogus-nextoffset-values.patch
# grabbed from mailing list
Patch667:
v3-Revert-tty-serial-pl011-add-ttyAMA-for-matching-pl011-console.patch
# END OF PATCH DEFINITIONS
Powered by blists - more mailing lists