Message-ID: <485137E8.4020606@ltu.se>
Date:	Thu, 12 Jun 2008 16:51:20 +0200
From:	Staffan Hämälä <sh@....se>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: Problems with the oom-killer

Hi,

I have had a lot of problems with the oom-killer during periods of heavy disk
activity. We have two identical machines running TSM (Tivoli Storage Manager)
on Red Hat Enterprise Linux 4. The kernel is 2.6.9 (2.6.9-42.0.10.ELsmp).

Both machines have a lot of disks connected through HBA interfaces, so perhaps the
disk buffers grow out of proportion. The file systems are formatted with ext3.

It always seems to happen when there is a lot of disk activity, either during
automatic maintenance tasks or when I have manually started jobs that access
the disk heavily (e.g. formatting disk files for TSM).

When this happens, there always seems to be lots of free memory, and the swap
is unused.

I have tried logging the memory usage, but I can see no significant change
around the times when the oom-killer has surfaced. It happens very irregularly.
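
The logging is just periodic snapshots of free and /proc/meminfo; a loop along
these lines is enough (the log path and interval are arbitrary examples):

while true; do
    date
    free -m
    grep -E 'Buffers|Cached|Dirty|Writeback' /proc/meminfo
    sleep 60
done >> /var/log/memusage.log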

A few weeks ago, however, it happened several times on the same day, while we
were having some disk problems.

I have read all I can about this problem and have tried setting
vm.overcommit_memory to 2, but it doesn't seem to have helped.

The settings right now:
vm.overcommit_ratio = 50
vm.overcommit_memory = 2
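
For reference, these can be checked and changed on the fly with sysctl (and made
persistent in /etc/sysctl.conf), e.g.:

# sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
# sysctl -w vm.overcommit_memory=2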

free usually reports figures like this:

# free -m
              total       used       free     shared    buffers     cached
Mem:          4050       4009         40          0        220       3008
-/+ buffers/cache:        780       3269
Swap:        10236         11      10225


The lines from /var/log/messages (they look very similar each time this happens;
dsmserv gets killed every time):

Jun 12 07:07:10 papyrus kernel: oom-killer: gfp_mask=0xd0
Jun 12 07:07:10 papyrus kernel: Mem-info:
Jun 12 07:07:10 papyrus kernel: DMA per-cpu:
Jun 12 07:07:10 papyrus kernel: cpu 0 hot: low 2, high 6, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 0 cold: low 0, high 2, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 1 hot: low 2, high 6, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 1 cold: low 0, high 2, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 2 hot: low 2, high 6, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 2 cold: low 0, high 2, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 3 hot: low 2, high 6, batch 1
Jun 12 07:07:10 papyrus kernel: cpu 3 cold: low 0, high 2, batch 1
Jun 12 07:07:10 papyrus kernel: Normal per-cpu:
Jun 12 07:07:10 papyrus kernel: cpu 0 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 0 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 1 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 1 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 2 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 2 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 3 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 3 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: HighMem per-cpu:
Jun 12 07:07:12 papyrus kernel: cpu 0 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 0 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 1 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 1 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 2 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 2 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 3 hot: low 32, high 96, batch 16
Jun 12 07:07:12 papyrus kernel: cpu 3 cold: low 0, high 32, batch 16
Jun 12 07:07:12 papyrus kernel:
Jun 12 07:07:12 papyrus kernel: Free pages:       15104kB (1664kB HighMem)
Jun 12 07:07:12 papyrus kernel: Active:195212 inactive:800523 dirty:291150 writeback:43473 unstable:0 free:3776 slab:30090 mapped:189285 pagetables:888
Jun 12 07:07:12 papyrus kernel: DMA free:12520kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:401 all_unreclaimable? yes
Jun 12 07:07:12 papyrus kernel: protections[]: 0 0 0
Jun 12 07:07:12 papyrus kernel: Normal free:920kB min:928kB low:1856kB high:2784kB active:9812kB inactive:713164kB present:901120kB pages_scanned:816915 all_unreclaimable? yes
Jun 12 07:07:13 papyrus kernel: protections[]: 0 0 0
Jun 12 07:07:13 papyrus kernel: HighMem free:1664kB min:512kB low:1024kB high:1536kB active:771036kB inactive:2488928kB present:4325376kB pages_scanned:0 all_unreclaimable? no
Jun 12 07:07:13 papyrus kernel: protections[]: 0 0 0
Jun 12 07:07:13 papyrus kernel: DMA: 2*4kB 2*8kB 1*16kB 2*32kB 2*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12520kB
Jun 12 07:07:13 papyrus kernel: Normal: 62*4kB 26*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 920kB
Jun 12 07:07:13 papyrus kernel: HighMem: 2*4kB 1*8kB 1*16kB 1*32kB 5*64kB 6*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1664kB
Jun 12 07:07:13 papyrus kernel: Swap cache: add 13134, delete 11406, find 19575/20970, race 0+0
Jun 12 07:07:13 papyrus kernel: 0 bounce buffer pages
Jun 12 07:07:13 papyrus kernel: Free swap:       10467444kB
Jun 12 07:07:13 papyrus kernel: 1310720 pages of RAM
Jun 12 07:07:13 papyrus kernel: 819147 pages of HIGHMEM
Jun 12 07:07:13 papyrus kernel: 273918 reserved pages
Jun 12 07:07:13 papyrus kernel: 821382 pages shared
Jun 12 07:07:13 papyrus kernel: 1728 pages swap cached
Jun 12 07:07:13 papyrus kernel: Out of Memory: Killed process 20524 (dsmserv).
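
For what it's worth, the kill events are easy to find and count in the log with
a plain grep, e.g.:

# grep -c 'Out of Memory' /var/log/messages
# grep oom-killer /var/log/messages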

I hope someone has a clue about this.

Thanks
Staffan Hamala

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
