[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAED-sicdkasoUwtbE5kg_UuRBswgKqTxhMR2qi-jdYokxZWJcg@mail.gmail.com>
Date: Thu, 9 Apr 2020 19:38:51 +0200
From: Jesper Krogh <jesper.krogh@...il.com>
To: Jeff Layton <jlayton@...nel.org>, netdev@...r.kernel.org
Subject: Page allocation error - CephFS kernel client hang - 5.4.20 kernel.
Hi.
Jeff suggested me to post it here to se if some of you can figure this out.
The system is VM running 16GB memory 16 v-cores KVM/QEMU client is 5.4.20
and the workload is the "bacula" filedaemon essentially reading data
off a cephfs
mountpoint only with the purpose of shipping it to the network to the
tape-library.
Concurrency is 3-5 threads reading catalogs sequentially -- nothing
else (except
standard distro stuff) on the host.
I have indications from a physical machine that I can reproduce it
there - although kernel 5.2.8 and a mixed workload but same end
result.
Attached is the full kernel.log from bootup until the "crash" where the CephFS
mountpoint goes stale and cannot be recovered without a hard reboot of the VM.
Thread on Ceph-devel is here:
https://www.spinics.net/lists/ceph-devel/msg48034.html
When this happens the VM has "very high" iowait - because reading
from Ceph essentially "only" is waiting for data to come from the
network. But it can be as high as 150+ on a machine with 16 cores and
5 active threads reading from CephFS mountpoint - It feels excessive
to me - but if it worked for me I wouldn't care about it.
Apr 5 14:20:45 wombat kernel: [10956.183074] kworker/7:0: page
allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0
Apr 5 14:20:45 wombat kernel: [10956.183089] CPU: 7 PID: 7499 Comm:
kworker/7:0 Not tainted 5.4.30 #5
Apr 5 14:20:45 wombat kernel: [10956.183090] Hardware name: Bochs
Bochs, BIOS Bochs 01/01/2011
Apr 5 14:20:45 wombat kernel: [10956.183141] Workqueue: ceph-msgr
ceph_con_workfn [libceph]
Apr 5 14:20:45 wombat kernel: [10956.183143] Call Trace:
Apr 5 14:20:45 wombat kernel: [10956.183146] <IRQ>
Apr 5 14:20:45 wombat kernel: [10956.183279] dump_stack+0x6d/0x95
Apr 5 14:20:45 wombat kernel: [10956.183284] warn_alloc+0x10c/0x170
Apr 5 14:20:45 wombat kernel: [10956.183288]
__alloc_pages_slowpath+0xe6c/0xef0
Apr 5 14:20:45 wombat kernel: [10956.183290]
__alloc_pages_nodemask+0x2f3/0x360
Apr 5 14:20:45 wombat kernel: [10956.183295] alloc_pages_current+0x6a/0xe0
Apr 5 14:20:45 wombat kernel: [10956.183301] skb_page_frag_refill+0xda/0x100
Apr 5 14:20:45 wombat kernel: [10956.183310]
try_fill_recv+0x285/0x6f0 [virtio_net]
Apr 5 14:20:45 wombat kernel: [10956.183314]
virtnet_poll+0x32d/0x364 [virtio_net]
Apr 5 14:20:45 wombat kernel: [10956.183321] net_rx_action+0x265/0x3e0
Apr 5 14:20:45 wombat kernel: [10956.183328] __do_softirq+0xf9/0x2aa
Apr 5 14:20:45 wombat kernel: [10956.183332] irq_exit+0xae/0xb0
Apr 5 14:20:45 wombat kernel: [10956.183335] do_IRQ+0x59/0xe0
Apr 5 14:20:45 wombat kernel: [10956.183337] common_interrupt+0xf/0xf
Apr 5 14:20:45 wombat kernel: [10956.183338] </IRQ>
Apr 5 14:20:45 wombat kernel: [10956.183341] RIP:
0010:kvm_clock_get_cycles+0x1/0x20
Apr 5 14:20:45 wombat kernel: [10956.183343] Code: 89 04 10 75 d6 b8
f4 ff ff ff 5d c3 48 c1 e1 06 31 c0 48 81 c1 00 10 60 89 49 89 0c 10
eb be 66 2e 0f 1f 84 00 00 00 00 00 55 <48> 89 e5 65 48 8b 3d e4 8f 7a
78 e8 ff 0f 00 00 5d c3 0f 1f 00 66
Apr 5 14:20:45 wombat kernel: [10956.183344] RSP:
0018:ffffaf6881cb7ab8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
Apr 5 14:20:45 wombat kernel: [10956.183346] RAX: ffffffff8786e040
RBX: 000009f6c5a66e6c RCX: 0000000000001000
Apr 5 14:20:45 wombat kernel: [10956.183347] RDX: 0000000000000000
RSI: 0000000000001000 RDI: ffffffff88e3c0e0
Apr 5 14:20:45 wombat kernel: [10956.183348] RBP: ffffaf6881cb7ae0
R08: 000000000002241a R09: 0000000000001000
Apr 5 14:20:45 wombat kernel: [10956.183348] R10: 0000000000000000
R11: 0000000000001000 R12: 0000000000000000
Apr 5 14:20:45 wombat kernel: [10956.183349] R13: 0000000000176864
R14: ffff9fa900931fb4 R15: 0000000000000000
Apr 5 14:20:45 wombat kernel: [10956.183351] ? kvmclock_setup_percpu+0x80/0x80
Apr 5 14:20:45 wombat kernel: [10956.183354] ? ktime_get+0x3e/0xa0
Apr 5 14:20:45 wombat kernel: [10956.183365] tcp_mstamp_refresh+0x12/0x40
Apr 5 14:20:45 wombat kernel: [10956.183368] tcp_rcv_space_adjust+0x22/0x1d0
Apr 5 14:20:45 wombat kernel: [10956.183369] tcp_recvmsg+0x28b/0xbc0
Apr 5 14:20:45 wombat kernel: [10956.183374] ? aa_sk_perm+0x43/0x190
Apr 5 14:20:45 wombat kernel: [10956.183380] inet_recvmsg+0x64/0xf0
Apr 5 14:20:45 wombat kernel: [10956.183384] sock_recvmsg+0x66/0x70
Apr 5 14:20:45 wombat kernel: [10956.183392]
ceph_tcp_recvpage+0x79/0xb0 [libceph]
Apr 5 14:20:45 wombat kernel: [10956.183400]
read_partial_message+0x3c3/0x7c0 [libceph]
Apr 5 14:20:45 wombat kernel: [10956.183407]
ceph_con_workfn+0xa6a/0x23d0 [libceph]
Apr 5 14:20:45 wombat kernel: [10956.183409] ? __switch_to_asm+0x40/0x70
Apr 5 14:20:45 wombat kernel: [10956.183410] ? __switch_to_asm+0x34/0x70
Apr 5 14:20:45 wombat kernel: [10956.183411] ? __switch_to_asm+0x40/0x70
Apr 5 14:20:45 wombat kernel: [10956.183413] ? __switch_to_asm+0x34/0x70
Apr 5 14:20:45 wombat kernel: [10956.183415] process_one_work+0x167/0x400
Apr 5 14:20:45 wombat kernel: [10956.183416] worker_thread+0x4d/0x460
Apr 5 14:20:45 wombat kernel: [10956.183419] kthread+0x105/0x140
Apr 5 14:20:45 wombat kernel: [10956.183420] ? rescuer_thread+0x370/0x370
Apr 5 14:20:45 wombat kernel: [10956.183421] ?
kthread_destroy_worker+0x50/0x50
Apr 5 14:20:45 wombat kernel: [10956.183423] ret_from_fork+0x35/0x40
Apr 5 14:20:45 wombat kernel: [10956.183425] Mem-Info:
Apr 5 14:20:45 wombat kernel: [10956.183430] active_anon:39163
inactive_anon:39973 isolated_anon:0
Apr 5 14:20:45 wombat kernel: [10956.183430] active_file:85386
inactive_file:3756171 isolated_file:32
Apr 5 14:21:02 wombat kernel: [10956.183430] unevictable:0 dirty:0
writeback:0 unstable:0
Apr 5 14:21:02 wombat kernel: [10956.183430] slab_reclaimable:81555
slab_unreclaimable:45634
Apr 5 14:21:02 wombat kernel: [10956.183430] mapped:4433 shmem:560
pagetables:980 bounce:0
Apr 5 14:21:02 wombat kernel: [10956.183430] free:33869 free_pcp:103
free_cma:0
Apr 5 14:21:02 wombat kernel: [10956.183433] Node 0
active_anon:156652kB inactive_anon:159892kB active_file:341544kB
inactive_file:15024684kB unevictable:0kB isolated(anon):0kB
isolated(file):128kB mapped:17732kB dirty:0kB writeback:0kB
shmem:2240kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Apr 5 14:21:02 wombat kernel: [10956.183434] Node 0 DMA free:15908kB
min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB
pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 5 14:21:02 wombat kernel: [10956.183437] lowmem_reserve[]: 0 3444
15930 15930 15930
Apr 5 14:21:02 wombat kernel: [10956.183439] Node 0 DMA32
free:55284kB min:14596kB low:18244kB high:21892kB active_anon:4452kB
inactive_anon:9072kB active_file:5560kB inactive_file:3333524kB
unevictable:0kB writepending:0kB present:3653608kB managed:3588072kB
mlocked:0kB kernel_stack:740kB pagetables:32kB bounce:0kB
free_pcp:164kB local_pcp:164kB free_cma:0kB
Apr 5 14:21:02 wombat kernel: [10956.183442] lowmem_reserve[]: 0 0
12485 12485 12485
Apr 5 14:21:02 wombat kernel: [10956.183444] Node 0 Normal
free:64284kB min:171972kB low:185200kB high:198428kB
active_anon:152200kB inactive_anon:150820kB active_file:335984kB
inactive_file:11691288kB unevictable:0kB writepending:0kB
present:13107200kB managed:12793540kB mlocked:0kB kernel_stack:6684kB
pagetables:3888kB bounce:0kB free_pcp:248kB local_pcp:248kB
free_cma:0kB
Apr 5 14:21:02 wombat kernel: [10956.183447] lowmem_reserve[]: 0 0 0 0 0
Apr 5 14:21:02 wombat kernel: [10956.183448] Node 0 DMA: 1*4kB (U)
0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB
1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Apr 5 14:21:02 wombat kernel: [10956.183455] Node 0 DMA32: 72*4kB
(MEH) 172*8kB (UMEH) 310*16kB (UH) 155*32kB (UEH) 265*64kB (UMH)
113*128kB (UMEH) 34*256kB (UE) 3*512kB (UM) 2*1024kB (M) 0*2048kB
0*4096kB = 55296kB
Apr 5 14:21:02 wombat kernel: [10956.183462] Node 0 Normal: 392*4kB
(UMEH) 792*8kB (UMEH) 258*16kB (UMH) 318*32kB (MEH) 648*64kB (MH)
10*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64960kB
Apr 5 14:21:02 wombat kernel: [10956.183469] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 5 14:21:02 wombat kernel: [10956.183470] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 5 14:21:02 wombat kernel: [10956.183471] 3843456 total pagecache pages
Apr 5 14:21:02 wombat kernel: [10956.183473] 1281 pages in swap cache
Apr 5 14:21:02 wombat kernel: [10956.183475] Swap cache stats: add
13816, delete 12535, find 3296/6045
Apr 5 14:21:02 wombat kernel: [10956.183475] Free swap = 4004652kB
Apr 5 14:21:02 wombat kernel: [10956.183476] Total swap = 4038652kB
Apr 5 14:21:02 wombat kernel: [10956.183476] 4194200 pages RAM
Apr 5 14:21:02 wombat kernel: [10956.183477] 0 pages HighMem/MovableOnly
Apr 5 14:21:02 wombat kernel: [10956.183477] 94820 pages reserved
Apr 5 14:21:02 wombat kernel: [10956.183478] 0 pages cma reserved
Apr 5 14:21:02 wombat kernel: [10956.183478] 0 pages hwpoisoned
Apr 5 14:25:15 wombat kernel: [11225.534971] warn_alloc: 43 callbacks
suppressed
Download attachment "kern.log" of type "application/octet-stream" (674761 bytes)
Powered by blists - more mailing lists