Date:	Mon, 13 Jul 2009 23:59:52 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Pierre Ossman <drzeus-list@...eus.cx>
Cc:	avi@...hat.com, kvm@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
	Wu Fengguang <fengguang.wu@...el.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>
Subject: Re: Page allocation failures in guest

On Mon, Jul 13, 2009 at 6:51 PM, Pierre Ossman<drzeus-list@...eus.cx> wrote:
> I upgraded my Fedora 10 host to 2.6.29 a few days ago and since then
> one of the guests keeps getting page allocation failures after a few
> hours. I've upgraded the kernel in the guest from 2.6.27 to 2.6.29
> without any change. There are also a few other guests running on the
> machine that aren't having any issues.
>
> The only noticable thing that dies for me is the network. The machine
> still logs properly and I can attach to the local console and reboot it.
>
> This is what I see in dmesg/logs:
>
> Jul 12 23:04:54 loki kernel: sshd: page allocation failure. order:0, mode:0x4020

GFP_ATOMIC.
The allocation cannot sleep, so there is little we can do to reclaim memory.
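
As a rough illustration, the mode value from the log line can be decoded into flag names. The bit values in the table are an assumption, taken from include/linux/gfp.h in 2.6.29 as I recall it, and the helper name decode_gfp is made up for this sketch. GFP_ATOMIC in that tree is just __GFP_HIGH, and the absence of __GFP_WAIT is what makes the allocation unable to sleep.

```python
# Assumed flag bits (2.6.29 include/linux/gfp.h, from memory; only a subset).
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",
}


def decode_gfp(mode):
    """Return the known flag names set in mode and any bits not in the table."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_BITS)  # sum() over a dict adds up its keys
    return names, unknown


names, unknown = decode_gfp(0x4020)
print(names, hex(unknown))
print("can sleep:", "__GFP_WAIT" in names)
```

Under these assumed bit values, 0x4020 carries __GFP_HIGH but none of __GFP_WAIT, __GFP_IO or __GFP_FS, which is why reclaim has so few options here.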

> Jul 12 23:04:54 loki kernel: Pid: 1682, comm: sshd Not tainted 2.6.29.5-84.fc10.x86_64 #1
> Jul 12 23:04:54 loki kernel: Call Trace:
> Jul 12 23:04:54 loki kernel: <IRQ>  [<ffffffff810a1896>] __alloc_pages_internal+0x42f/0x451
> Jul 12 23:04:54 loki kernel: [<ffffffff810c52f8>] alloc_pages_current+0xb9/0xc2
> Jul 12 23:04:54 loki kernel: [<ffffffff810c926c>] alloc_slab_page+0x19/0x69
> Jul 12 23:04:54 loki kernel: [<ffffffff810c931f>] new_slab+0x63/0x1cb
> Jul 12 23:04:54 loki kernel: [<ffffffff810c99fd>] __slab_alloc+0x23d/0x3ac
> Jul 12 23:04:54 loki kernel: [<ffffffff812d49f2>] ? __netdev_alloc_skb+0x31/0x4d
> Jul 12 23:04:54 loki kernel: [<ffffffff810cac1b>] __kmalloc_node_track_caller+0xbb/0x11f
> Jul 12 23:04:54 loki kernel: [<ffffffff812d49f2>] ? __netdev_alloc_skb+0x31/0x4d
> Jul 12 23:04:54 loki kernel: [<ffffffff812d3dfc>] __alloc_skb+0x6f/0x130
> Jul 12 23:04:54 loki kernel: [<ffffffff812d49f2>] __netdev_alloc_skb+0x31/0x4d
> Jul 12 23:04:54 loki kernel: [<ffffffffa002e668>] try_fill_recv_maxbufs+0x5a/0x20d [virtio_net]
> Jul 12 23:04:54 loki kernel: [<ffffffffa002e83d>] try_fill_recv+0x22/0x17e [virtio_net]
> Jul 12 23:04:54 loki kernel: [<ffffffff812d9c74>] ? netif_receive_skb+0x40a/0x42f
> Jul 12 23:04:54 loki kernel: [<ffffffffa002f4b9>] virtnet_poll+0x57f/0x5ee [virtio_net]
> Jul 12 23:04:54 loki kernel: [<ffffffff81374b45>] ? _spin_lock_irq+0x21/0x26
> Jul 12 23:04:54 loki kernel: [<ffffffff812d8372>] net_rx_action+0xb3/0x1af
> Jul 12 23:04:54 loki kernel: [<ffffffff8104d9f0>] __do_softirq+0x94/0x150
> Jul 12 23:04:54 loki kernel: [<ffffffff8101274c>] call_softirq+0x1c/0x30
> Jul 12 23:04:54 loki kernel: <EOI>  [<ffffffff81013869>] do_softirq+0x4d/0xb4
> Jul 12 23:04:54 loki kernel: [<ffffffff812cf149>] ? release_sock+0xb0/0xbb
> Jul 12 23:04:54 loki kernel: [<ffffffff8104d86f>] _local_bh_enable_ip+0xc5/0xe5
> Jul 12 23:04:54 loki kernel: [<ffffffff8104d898>] local_bh_enable_ip+0x9/0xb
> Jul 12 23:04:54 loki kernel: [<ffffffff81374954>] _spin_unlock_bh+0x13/0x15
> Jul 12 23:04:54 loki kernel: [<ffffffff812cf149>] release_sock+0xb0/0xbb
> Jul 12 23:04:54 loki kernel: [<ffffffff812d2f38>] ? __kfree_skb+0x82/0x86
> Jul 12 23:04:54 loki kernel: [<ffffffff8130f088>] tcp_recvmsg+0x974/0xa99
> Jul 12 23:04:54 loki kernel: [<ffffffff812ce566>] sock_common_recvmsg+0x32/0x47
> Jul 12 23:04:54 loki kernel: [<ffffffff812cc5a1>] __sock_recvmsg+0x6d/0x7a
> Jul 12 23:04:54 loki kernel: [<ffffffff812cc69c>] sock_aio_read+0xee/0xfe
> Jul 12 23:04:54 loki kernel: [<ffffffff810d1ecb>] do_sync_read+0xe7/0x12d
> Jul 12 23:04:54 loki kernel: [<ffffffff811867ba>] ? rb_erase+0x278/0x2a0
> Jul 12 23:04:54 loki kernel: [<ffffffff8105bdc8>] ? autoremove_wake_function+0x0/0x38
> Jul 12 23:04:54 loki kernel: [<ffffffff81374845>] ? _spin_lock+0x9/0xc
> Jul 12 23:04:54 loki kernel: [<ffffffff811502e8>] ? security_file_permission+0x11/0x13
> Jul 12 23:04:54 loki kernel: [<ffffffff810d2884>] vfs_read+0xbb/0x102
> Jul 12 23:04:54 loki kernel: [<ffffffff810d298f>] sys_read+0x47/0x6e
> Jul 12 23:04:54 loki kernel: [<ffffffff8101133a>] system_call_fastpath+0x16/0x1b
> Jul 12 23:04:54 loki kernel: Mem-Info:
> Jul 12 23:04:54 loki kernel: Node 0 DMA per-cpu:
> Jul 12 23:04:54 loki kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jul 12 23:04:54 loki kernel: Node 0 DMA32 per-cpu:
> Jul 12 23:04:54 loki kernel: CPU    0: hi:  186, btch:  31 usd: 119
> Jul 12 23:04:54 loki kernel: Active_anon:14065 active_file:87384 inactive_anon:37480
> Jul 12 23:04:54 loki kernel: inactive_file:95821 unevictable:4 dirty:8 writeback:0 unstable:0
> Jul 12 23:04:54 loki kernel: free:1344 slab:7113 mapped:4283 pagetables:5656 bounce:0
> Jul 12 23:04:54 loki kernel: Node 0 DMA free:3988kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:3532kB inactive_file:1032kB unevictable:0kB present:6840kB pages_scanned:0 all_un

I don't know why present is bigger than free + [in]active anon.
Does anyone know?

There are 258 pages on the inactive file list.
Unfortunately, it seems none of them can simply be discarded.
The reclaimer can't write back dirty pages in order to reclaim them, either.
That's because this is a GFP_ATOMIC allocation, as I mentioned.
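
For reference, the 258 figure comes from the inactive_file value on the DMA zone line. A small sketch of that conversion follows; the 4 kB page size is an assumption (the usual x86_64 base page size), the helper name zone_kb is made up, and the log line is cut at the same point where the quoted mail breaks off.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages on x86_64

# DMA zone line from the log, without the trailing fields that are cut off.
line = ("Node 0 DMA free:3988kB min:24kB low:28kB high:36kB "
        "active_anon:0kB inactive_anon:0kB active_file:3532kB "
        "inactive_file:1032kB unevictable:0kB present:6840kB")


def zone_kb(text):
    """Map every 'name:NNNkB' field in a zone line to its value in kB."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+):(\d+)kB", text)}


zone = zone_kb(line)
inactive_file_pages = zone["inactive_file"] // PAGE_KB
print(inactive_file_pages)
```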

> reclaimable? no
> Jul 12 23:04:54 loki kernel: lowmem_reserve[]: 0 994 994 994
> Jul 12 23:04:54 loki kernel: Node 0 DMA32 free:1388kB min:4020kB low:5024kB high:6028kB active_anon:56260kB inactive_anon:149920kB active_file:346004kB inactive_file:382252kB unevictable:16kB present:1018016


free: 1388kB, min: 4020kB. In addition, __GFP_HIGH is set, so the
calculation in zone_watermark_ok is as follows.

1388 < (4020 / 2)

So the allocation fails in zone_watermark_ok.
AFAIU, this is essentially an OOM problem.
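
The watermark check above can be sketched roughly as below. This is a simplified model of the order-0 path only, written from my reading of zone_watermark_ok in 2.6.29, not the kernel code itself. It works in kB to match the figures in the log, whereas the kernel compares page counts. The extra quarter reduction for allocations that cannot wait (ALLOC_HARDER) is not part of the calculation above; it is included as an assumption to show that the check would still fail with it. The function and parameter names are made up for the sketch.

```python
def effective_min(min_kb, high=False, harder=False):
    """Lower the zone minimum the way the allocator relaxes it for atomic requests."""
    limit = min_kb
    if high:      # __GFP_HIGH: allowed to dip halfway into the reserve
        limit -= limit // 2
    if harder:    # assumption: non-waiting allocations get a further quarter off
        limit -= limit // 4
    return limit


def watermark_ok(free_kb, min_kb, reserve_kb=0, high=False, harder=False):
    """Order-0 check: free memory must exceed the relaxed minimum plus reserve."""
    return free_kb > effective_min(min_kb, high, harder) + reserve_kb


# DMA32 figures from the log: free 1388 kB, min 4020 kB, lowmem_reserve 0.
print(effective_min(4020, high=True))
print(watermark_ok(1388, 4020, high=True))
print(effective_min(4020, high=True, harder=True))
print(watermark_ok(1388, 4020, high=True, harder=True))
```

Either way the DMA32 zone is below its relaxed minimum, so the order-0 request is refused there.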

> kB pages_scanned:96 all_unreclaimable? no
> Jul 12 23:04:54 loki kernel: lowmem_reserve[]: 0 0 0 0
> Jul 12 23:04:54 loki kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3988kB
> Jul 12 23:04:54 loki kernel: Node 0 DMA32: 4*4kB 77*8kB 3*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1384kB
> Jul 12 23:04:54 loki kernel: 183936 total pagecache pages
> Jul 12 23:04:54 loki kernel: 0 pages in swap cache
> Jul 12 23:04:54 loki kernel: Swap cache stats: add 0, delete 0, find 0/0
> Jul 12 23:04:54 loki kernel: Free swap  = 1015800kB
> Jul 12 23:04:54 loki kernel: Total swap = 1015800kB
> Jul 12 23:04:54 loki kernel: 262128 pages RAM
> Jul 12 23:04:54 loki kernel: 8339 pages reserved
> Jul 12 23:04:54 loki kernel: 34783 pages shared
> Jul 12 23:04:54 loki kernel: 245277 pages non-shared
>
> It doesn't look like it's out of memory to me, so I'm not sure what is
> going on.
>
> Rgds
> --
>     -- Pierre Ossman
>
>  Linux kernel, MMC maintainer        http://www.kernel.org
>  rdesktop, core developer          http://www.rdesktop.org
>  TigerVNC, core developer          http://www.tigervnc.org
>
>  WARNING: This correspondence is being monitored by the
>  Swedish government. Make sure your server uses encryption
>  for SMTP traffic and consider using PGP for end-to-end
>  encryption.
>



-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
