[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200604005916.niy2mejjcsx4sv6t@alap3.anarazel.de>
Date: Wed, 3 Jun 2020 17:59:16 -0700
From: Andres Freund <andres@...razel.de>
To: Jens Axboe <axboe@...nel.dk>
Cc: io-uring@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
akpm@...ux-foundation.org
Subject: Re: [PATCHSET v5 0/12] Add support for async buffered reads
Hi,
I was trying to benchmark the benefits of this for the io_uring using
postgres I am working on. The initial results where quite promising
(reducing cpu usage significantly, to lower than non-uring sync io). But
unfortunately trying another workload triggered both panics and before
that seemingly returned wrong data.
I first saw that problem with b360d424ce02, which was
linux-block/async-buffered.6 at the time. After hitting the issue, I
updated to the current linux-block/async-buffered.6, but the problem
persists.
The workload that triggers the bug within a few seconds is postgres
doing a parallel sequential scan of a large table (and aggregating the
data, but that shouldn't matter). In the triggering case that boils down
to 9 processes sequentially reading a number of 1GB files (we chunk
tables internally into smaller files). Each process will read a 512kB
chunk of the file on its own, and then claim the next 512kB from a
shared memory location. Most of the IO will be READV requests, reading
16 * 8kB into postgres' buffer pool (which may or may not be neighboring
8kB pages).
The io submissions in this case will all be done to the same io_uring,
targeting MAP_HUGETLB | MAP_SHARED | MAP_ANONYMOUS memory. The data is
on xfs.
The benchmark starts with dropping caches, which helpfully is visible in
dmesg... That takes a few seconds, so the actual buffered io_uring load
starts soon after.
[ 125.526092] tee (7775): drop_caches: 3
[ 146.264327] kill_fasync: bad magic number in fasync_struct!
After this, but before the oops, I see some userspace memory or
non-io-uring pipe reads being corrupted.
[ 146.264327] kill_fasync: bad magic number in fasync_struct!
[ 146.285175] kill_fasync: bad magic number in fasync_struct!
[ 146.290793] kill_fasync: bad magic number in fasync_struct!
[ 146.285175] kill_fasync: bad magic number in fasync_struct!
[ 146.290793] kill_fasync: bad magic number in fasync_struct!
[ 157.071979] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 157.078945] #PF: supervisor read access in kernel mode
[ 157.084082] #PF: error_code(0x0000) - not-present page
[ 157.089225] PGD 0 P4D 0
[ 157.091763] Oops: 0000 [#1] SMP NOPTI
[ 157.095429] CPU: 3 PID: 7756 Comm: postgres Not tainted 5.7.0-rc7-andres-00133-gc8707bf69395 #41
[ 157.104208] Hardware name: Supermicro SYS-7049A-T/X11DAi-N, BIOS 3.2 11/13/2019
[ 157.111518] RIP: 0010:xfs_log_commit_cil+0x356/0x7f0
[ 157.116478] Code: 00 00 48 89 7e 08 49 89 76 30 48 89 11 48 89 4a 08 8b 4c 24 2c 48 89 85 c0 00 00 00 48 89 85 c8 00 00 00 49 8b 46 20 45 31 c9 <8b> 70 20 85 f6 0f 84 d9 02 00 00 45 31 c0 85 c9 7e 4f 41 8b 46 2c
[ 157.135226] RSP: 0018:ffffc90021fefb00 EFLAGS: 00010246
[ 157.140452] RAX: 0000000000000000 RBX: ffff8897a9ea80b0 RCX: 0000000000000458
[ 157.147582] RDX: ffff8897a9ea80c0 RSI: ffff88afc39a1200 RDI: ffff88afb1ae2c18
[ 157.154715] RBP: ffff8897a9ea8000 R08: ffffc90021fefb30 R09: 0000000000000000
[ 157.161848] R10: 0000000000000084 R11: 0000000000000012 R12: ffff8897721bd400
[ 157.168983] R13: ffff88afb1ae2c00 R14: ffff88afb19860c0 R15: ffff8897a9ea80a0
[ 157.176115] FS: 00007ffbbe9a1980(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
[ 157.184199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 157.189946] CR2: 0000000000000020 CR3: 000000179c0ac005 CR4: 00000000007606e0
[ 157.197081] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 157.204212] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 157.211343] PKRU: 55555554
[ 157.214048] Call Trace:
[ 157.216498] __xfs_trans_commit+0xa1/0x340
[ 157.220596] xfs_create+0x4cc/0x5e0
[ 157.224087] xfs_generic_create+0x241/0x310
[ 157.228274] path_openat+0xdb7/0x1020
[ 157.231940] do_filp_open+0x91/0x100
[ 157.235518] ? __sys_recvfrom+0xe4/0x170
[ 157.239444] ? _cond_resched+0x19/0x30
[ 157.243196] do_sys_openat2+0x215/0x2d0
[ 157.247037] do_sys_open+0x44/0x80
[ 157.250443] do_syscall_64+0x48/0x130
[ 157.254108] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 157.259161] RIP: 0033:0x7ffbc0ded307
[ 157.262738] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[ 157.281485] RSP: 002b:00007fff520a4940 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 157.289052] RAX: ffffffffffffffda RBX: 00005561dc02eca0 RCX: 00007ffbc0ded307
[ 157.296185] RDX: 0000000000000241 RSI: 00005561dc013470 RDI: 00000000ffffff9c
[ 157.303317] RBP: 00005561dc013470 R08: 0000000000000004 R09: 0000000000000001
[ 157.310447] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000241
[ 157.317575] R13: 00005561dc02eca0 R14: 0000000000000001 R15: 0000000000000000
[ 157.324705] Modules linked in: isst_if_common squashfs nls_iso8859_1 nls_cp437 snd_hda_codec_realtek vfat snd_hda_codec_generic fat iwlmvm ledtrig_audio snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_intel_dspcfg iwlwifi efi_pstore btusb snd_hda_codec btrtl btbcm efivars snd_hwdep btintel snd_hda_core iTCO_wdt iTCO_vendor_support mei_me mei loop coretemp hid_logitech_hidpp hid_logitech_dj hid_lenovo amdgpu gpu_sched i40e ixgbe ast drm_vram_helper drm_ttm_helper ttm mdio xhci_pci xhci_hcd
[ 157.370249] CR2: 0000000000000020
There's a lot more of these later, some interspersed (possibly related
to grabbing the output via serial->bmc->network->ipmitool). I've
attached the whole dmesg and .config.
What would be helpful for debugging? Should I try the v5 branch instead?
I'll try setting up a VM + passing through NVME to be able to test this
without fear... In one instance I did see some minor corruption on
another device & fs (ext4 on dm-crypt on nvme). It's all backed up,
but...
Greetings,
Andres Freund
View attachment "dmesg" of type "text/plain" (234796 bytes)
View attachment ".config" of type "text/plain" (176818 bytes)
Powered by blists - more mailing lists