Date:   Sun, 19 Mar 2017 17:02:44 +0100
From:   Gerhard Wiesinger <lists@...singer.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     lkml@...garu.com, Minchan Kim <minchan@...nel.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Still OOM problems with 4.9er/4.10er kernels

On 19.03.2017 16:18, Michal Hocko wrote:
> On Fri 17-03-17 21:08:31, Gerhard Wiesinger wrote:
>> On 17.03.2017 18:13, Michal Hocko wrote:
>>> On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
>>> [...]
>>>> Why does the kernel prefer to swapin/out and not use
>>>>
>>>> a.) the free memory?
>>> It will use all the free memory up to min watermark which is set up
>>> based on min_free_kbytes.
>> Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?
> See init_per_zone_wmark_min
>
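For reference, my reading of init_per_zone_wmark_min is that the default is the integer square root of (lowmem in kB * 16), clamped to the range 128..65536 kB. Below is a minimal sketch of that arithmetic only. The function name and the lowmem_kbytes parameter are mine, and the kernel derives lowmem from nr_free_buffer_pages(), which this does not model. Other kernel code (for example khugepaged when transparent hugepages are enabled) may raise the value later, so the result need not match what a running system reports.

```python
import math


def default_min_free_kbytes(lowmem_kbytes):
    """Approximate the boot-time default for vm.min_free_kbytes.

    lowmem_kbytes is an assumed input: the amount of low memory in kB.
    """
    # Integer square root of lowmem * 16, as in init_per_zone_wmark_min.
    value = math.isqrt(lowmem_kbytes * 16)
    # The kernel clamps the result to the range [128, 65536] kB.
    return max(128, min(value, 65536))


if __name__ == "__main__":
    for label, kbytes in [("16 MB", 16 * 1024), ("16 GB", 16 * 1024 * 1024)]:
        print(label, default_min_free_kbytes(kbytes))
```

The two sample sizes are arbitrary and only illustrate how slowly the default grows with memory size.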
>>>> b.) the buffer/cache?
>>> the memory reclaim is strongly biased towards page cache and we try to
>>> avoid swapout as much as possible (see get_scan_count).
>> If I understand it correctly, swapping is preferred over dropping the
>> cache, right. Can this behaviour be changed to prefer dropping the
>> cache to some minimum amount?  Is this also configurable in a way?
> No, we enforce swapping if the amount of free + file pages are below the
> cumulative high watermark.
>
>> (As far as I remember e.g. kernel 2.4 dropped the caches well).
>>
>>>> There is ~100M memory available but kernel swaps all the time ...
>>>>
>>>> Any ideas?
>>>>
>>>> Kernel: 4.9.14-200.fc25.x86_64
>>>>
>>>> top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
>>>> Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
>>>> %Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7 st
>>>> KiB Mem :   230076 total,    61508 free,   123472 used,    45096 buff/cache
>>>>
>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>>>   r  b   swpd   free   buff  cache   si   so    bi    bo in   cs us sy id wa st
>>>>   3  5 303916  60372    328  43864 27828  200 41420   236 6984 11138 11 47  6 23 14
>>> I am really surprised to see any reclaim at all. 26% of free memory
>>> doesn't sound as if we should do a reclaim at all. Do you have an
>>> unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
>>> anything running inside a memory cgroup with a small limit?
>> nothing special set regarding /proc/sys/vm/min_free_kbytes (default values),
>> detailed config below. Regarding cgroups, none that I know of. How can I check (I
>> guess nothing is set because the cg* commands are not available)?
> be careful because systemd started to use some controllers. You can
> easily check cgroup mount points.

See below.

>
>> /proc/sys/vm/min_free_kbytes
>> 45056
> So at least 45M will be kept reserved for the system. Your data
> indicated you had more memory. What does /proc/zoneinfo look like?
> Btw. you seem to be using fc kernel, are there any patches applied on
> top of Linus tree? Could you try to retest vanilla kernel?


The system looks normal now, FYI (e.g. no permanent swapping):


free
               total        used        free      shared buff/cache   available
Mem:         349076      154112       41560         184 153404      148716
Swap:       2064380      831844     1232536

cat /proc/zoneinfo

Node 0, zone      DMA
   per-node stats
       nr_inactive_anon 9543
       nr_active_anon 22105
       nr_inactive_file 9877
       nr_active_file 13416
       nr_unevictable 0
       nr_isolated_anon 0
       nr_isolated_file 0
       nr_pages_scanned 0
       workingset_refault 1926013
       workingset_activate 707166
       workingset_nodereclaim 187276
       nr_anon_pages 11429
       nr_mapped    6852
       nr_file_pages 46772
       nr_dirty     1
       nr_writeback 0
       nr_writeback_temp 0
       nr_shmem     46
       nr_shmem_hugepages 0
       nr_shmem_pmdmapped 0
       nr_anon_transparent_hugepages 0
       nr_unstable  0
       nr_vmscan_write 3319047
       nr_vmscan_immediate_reclaim 32363
       nr_dirtied   222115
       nr_written   3537529
   pages free     3110
         min      27
         low      33
         high     39
    node_scanned  0
         spanned  4095
         present  3998
         managed  3977
       nr_free_pages 3110
       nr_zone_inactive_anon 18
       nr_zone_active_anon 3
       nr_zone_inactive_file 51
       nr_zone_active_file 75
       nr_zone_unevictable 0
       nr_zone_write_pending 0
       nr_mlock     0
       nr_slab_reclaimable 214
       nr_slab_unreclaimable 289
       nr_page_table_pages 185
       nr_kernel_stack 16
       nr_bounce    0
       nr_zspages   0
       numa_hit     1214071
       numa_miss    0
       numa_foreign 0
       numa_interleave 0
       numa_local   1214071
       numa_other   0
       nr_free_cma  0
         protection: (0, 306, 306, 306, 306)
   pagesets
     cpu: 0
               count: 0
               high:  0
               batch: 1
   vm stats threshold: 4
     cpu: 1
               count: 0
               high:  0
               batch: 1
   vm stats threshold: 4
   node_unreclaimable:  0
   start_pfn:           1
   node_inactive_ratio: 0
Node 0, zone    DMA32
   pages free     7921
         min      546
         low      682
         high     818
    node_scanned  0
         spanned  94172
         present  94172
         managed  83292
       nr_free_pages 7921
       nr_zone_inactive_anon 9525
       nr_zone_active_anon 22102
       nr_zone_inactive_file 9826
       nr_zone_active_file 13341
       nr_zone_unevictable 0
       nr_zone_write_pending 1
       nr_mlock     0
       nr_slab_reclaimable 5829
       nr_slab_unreclaimable 8622
       nr_page_table_pages 2638
       nr_kernel_stack 2208
       nr_bounce    0
       nr_zspages   0
       numa_hit     23125334
       numa_miss    0
       numa_foreign 0
       numa_interleave 14307
       numa_local   23125334
       numa_other   0
       nr_free_cma  0
         protection: (0, 0, 0, 0, 0)
   pagesets
     cpu: 0
               count: 17
               high:  90
               batch: 15
   vm stats threshold: 12
     cpu: 1
               count: 55
               high:  90
               batch: 15
   vm stats threshold: 12
   node_unreclaimable:  0
   start_pfn:           4096
   node_inactive_ratio: 0
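
To relate this dump to the condition Michal describes (swapping is enforced when free + file pages fall below the cumulative high watermark), here is a rough sketch of that comparison. It is a simplified model of the check, not the kernel code in get_scan_count; the function and parameter names are hypothetical, and the figures are copied by hand from the dump above.

```python
def anon_scan_forced(nr_free, nr_active_file, nr_inactive_file, zone_high_wmarks):
    """Return True if free + file pages do not exceed the summed high watermarks.

    All arguments are page counts. This mirrors only the comparison described
    in the thread and ignores every other input to reclaim.
    """
    total_high = sum(zone_high_wmarks)
    return nr_free + nr_active_file + nr_inactive_file <= total_high


if __name__ == "__main__":
    # Figures from the /proc/zoneinfo dump above (pages).
    free_pages = 3110 + 7921          # DMA + DMA32 "pages free"
    forced = anon_scan_forced(
        nr_free=free_pages,
        nr_active_file=13416,         # per-node nr_active_file
        nr_inactive_file=9877,        # per-node nr_inactive_file
        zone_high_wmarks=[39, 818],   # DMA and DMA32 "high"
    )
    print(forced)  # False: 34324 pages is well above the 857-page threshold
```

With these numbers the condition is not met, which fits the system not swapping permanently at the time of the dump.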

mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
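
Since the memory controller is mounted, one way to see whether any group actually has a small limit would be to walk the hierarchy and read each memory.limit_in_bytes file. This is only a sketch, assuming the cgroup v1 layout shown in the mount output; the helper name and the "unlimited" threshold are my own choices, not anything the kernel defines.

```python
import os

# cgroup v1 reports "no limit" as a very large number; treating anything at or
# above 2**60 bytes as unlimited is an assumption made for this sketch.
UNLIMITED_THRESHOLD = 1 << 60


def find_memory_limits(root="/sys/fs/cgroup/memory"):
    """Return {cgroup_directory: limit_in_bytes} for groups with a finite limit."""
    limits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.limit_in_bytes" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "memory.limit_in_bytes")) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            # Skip groups that vanish or cannot be read.
            continue
        if value < UNLIMITED_THRESHOLD:
            limits[dirpath] = value
    return limits


if __name__ == "__main__":
    for path, limit in sorted(find_memory_limits().items()):
        print(f"{path}: {limit // 1024} KiB")
```

If this prints nothing, no group under the memory hierarchy has a finite limit set.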

There are patches (see below), but as far as I can see none of them
relate to the issues I am seeing.


BTW: Does it make sense to reduce the lower limit for low-memory VMs? E.g.:

echo "10000" > /proc/sys/vm/min_free_kbytes
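
To get a feeling for what such a value would mean per zone, here is a rough sketch of how I understand the watermarks to be derived: the min pages are split across zones in proportion to their managed pages, and low/high sit above min by the larger of min/4 and managed * watermark_scale_factor / 10000. The function names are hypothetical, a 4 kB page size and the default watermark_scale_factor of 10 are assumed, and highmem handling is ignored (not relevant on x86_64).

```python
def zone_watermarks(zone_min, managed_pages, watermark_scale_factor=10):
    """Return (min, low, high) in pages for one zone, given its min watermark."""
    step = max(zone_min // 4, managed_pages * watermark_scale_factor // 10000)
    return zone_min, zone_min + step, zone_min + 2 * step


def split_min_free_kbytes(min_free_kbytes, managed_by_zone, page_kb=4):
    """Distribute min_free_kbytes over zones in proportion to managed pages."""
    pages_min = min_free_kbytes // page_kb
    total_managed = sum(managed_by_zone.values())
    return {
        zone: pages_min * managed // total_managed
        for zone, managed in managed_by_zone.items()
    }


if __name__ == "__main__":
    # "managed" values from the /proc/zoneinfo dump above.
    managed = {"DMA": 3977, "DMA32": 83292}
    # The low/high values in the dump follow from its min values:
    print(zone_watermarks(546, managed["DMA32"]))  # (546, 682, 818)
    # Hypothetical per-zone watermarks for min_free_kbytes = 10000:
    for zone, zone_min in split_min_free_kbytes(10000, managed).items():
        print(zone, zone_watermarks(zone_min, managed[zone]))
```

The first print reproduces the DMA32 low/high values from the dump; the figures for 10000 are only an estimate under the assumptions above.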


Thanks.

Ciao,

Gerhard

https://koji.fedoraproject.org/koji/buildinfo?buildID=870215

## Patches needed for building this package

# build tweak for build ID magic, even for -vanilla
Patch001: kbuild-AFTER_LINK.patch

## compile fixes

# ongoing complaint, full discussion delayed until ksummit/plumbers
Patch002: 0001-iio-Use-event-header-from-kernel-tree.patch

%if !%{nopatches}

# Git trees.

# Standalone patches

# a tempory patch for QCOM hardware enablement. Will be gone by end of 2016/F-26 GA
Patch420: qcom-QDF2432-tmp-errata.patch

# http://www.spinics.net/lists/arm-kernel/msg490981.html
Patch421: geekbox-v4-device-tree-support.patch

# http://www.spinics.net/lists/linux-tegra/msg26029.html
Patch422: usb-phy-tegra-Add-38.4MHz-clock-table-entry.patch

# Fix OMAP4 (pandaboard)
Patch423: arm-revert-mmc-omap_hsmmc-Use-dma_request_chan-for-reque.patch

# Not particularly happy we don't yet have a proper upstream resolution this is the right direction
# https://www.spinics.net/lists/arm-kernel/msg535191.html
Patch424: arm64-mm-Fix-memmap-to-be-initialized-for-the-entire-section.patch

# http://patchwork.ozlabs.org/patch/587554/
Patch425: ARM-tegra-usb-no-reset.patch

Patch426: AllWinner-net-emac.patch

# http://www.spinics.net/lists/devicetree/msg163238.html
Patch430: bcm2837-initial-support.patch

# http://www.spinics.net/lists/dri-devel/msg132235.html
Patch433: drm-vc4-Fix-OOPSes-from-trying-to-cache-a-partially-constructed-BO..patch

# bcm283x mmc for wifi http://www.spinics.net/lists/arm-kernel/msg567077.html
Patch434: bcm283x-mmc-bcm2835.patch

# Upstream fixes for i2c/serial/ethernet MAC addresses
Patch435: bcm283x-fixes.patch

# https://lists.freedesktop.org/archives/dri-devel/2017-February/133823.html
Patch436: vc4-fix-vblank-cursor-update-issue.patch

# http://www.spinics.net/lists/arm-kernel/msg552554.html
Patch438: arm-imx6-hummingboard2.patch

Patch460: lib-cpumask-Make-CPUMASK_OFFSTACK-usable-without-deb.patch

Patch466: input-kill-stupid-messages.patch

Patch467: die-floppy-die.patch

Patch468: no-pcspkr-modalias.patch

Patch470: silence-fbcon-logo.patch

Patch471: Kbuild-Add-an-option-to-enable-GCC-VTA.patch

Patch472: crash-driver.patch

Patch473: efi-lockdown.patch

Patch487: Add-EFI-signature-data-types.patch

Patch488: Add-an-EFI-signature-blob-parser-and-key-loader.patch

# This doesn't apply. It seems like it could be replaced by
# https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5ac7eace2d00eab5ae0e9fdee63e38aee6001f7c
# which has an explicit line about blacklisting
Patch489: KEYS-Add-a-system-blacklist-keyring.patch

Patch490: MODSIGN-Import-certificates-from-UEFI-Secure-Boot.patch

Patch491: MODSIGN-Support-not-importing-certs-from-db.patch

Patch493: drm-i915-hush-check-crtc-state.patch

Patch494: disable-i8042-check-on-apple-mac.patch

Patch495: lis3-improve-handling-of-null-rate.patch

Patch497: scsi-sd_revalidate_disk-prevent-NULL-ptr-deref.patch

Patch498: criu-no-expert.patch

Patch499: ath9k-rx-dma-stop-check.patch

Patch500: xen-pciback-Don-t-disable-PCI_COMMAND-on-PCI-device-.patch

Patch501: Input-synaptics-pin-3-touches-when-the-firmware-repo.patch

Patch502: firmware-Drop-WARN-from-usermodehelper_read_trylock-.patch

# Patch503: drm-i915-turn-off-wc-mmaps.patch

Patch509: MODSIGN-Don-t-try-secure-boot-if-EFI-runtime-is-disa.patch

#CVE-2016-3134 rhbz 1317383 1317384
Patch665: netfilter-x_tables-deal-with-bogus-nextoffset-values.patch

# grabbed from mailing list
Patch667: v3-Revert-tty-serial-pl011-add-ttyAMA-for-matching-pl011-console.patch

# END OF PATCH DEFINITIONS
