[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9550725a-2d3f-fa35-1410-cae912e128b9@tessares.net>
Date: Tue, 15 Sep 2020 17:34:50 +0200
From: Matthieu Baerts <matthieu.baerts@...sares.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Michael Larabel <Michael@...haellarabel.com>,
Matthew Wilcox <willy@...radead.org>,
Amir Goldstein <amir73il@...il.com>,
Ted Ts'o <tytso@...gle.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Jan Kara <jack@...e.cz>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking
Hi Linus,
Thank you very much for your reply, with very clear explanations and
instructions!
On 14/09/2020 22:53, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:21 PM Matthieu Baerts
> <matthieu.baerts@...sares.net> wrote:
>>
>> Recently, some of these packetdrill tests have been failing after 2
>> minutes (timeout) instead of being executed in a few seconds (~6
>> seconds). No packets are even exchanged during these two minutes.
>
> Hmm.
>
> That sounds like a deadlock to me, and sounds like it's a latent bug
> waiting to happen.
Yesterday evening, I wanted to get confirmation about that using
PROVE_LOCKING but just like today, each time I enable this kconfig, I
cannot reproduce the issue.
Anyway I am sure you are right and this bug has been there for sometime
but is too hard to reproduce.
>> I would be glad to help by validating new modifications or providing new
>> info. My setup is also easy to put in place: a Docker image is built
>> with all required tools to start the same VM just like the one I have.
>
> I'm not familiar enough with packetdrill or any of that infrastructure
> - does it do its own kernel modules etc for the packet latency
> testing?
No, Packetdrill doesn't load any kernel module.
Here is a short description of the execution model of Packetdrill from a
paper the authors wrote:
packetdrill parses an entire test script, and then executes each
timestamped line in real time -- at the pace described by the
timestamps -- to replay and verify the scenario.
- For each system call line, packetdrill executes the system call
and verifies that it returns the expected result.
- For each command line, packetdrill executes the shell command.
- For each incoming packet (denoted by a leading < on the line),
packetdrill constructs a packet and injects it into the kernel.
- For each outgoing packet (denoted by a leading > on the line),
packetdrill sniffs the next outgoing packet and verifies that the
packet's timing and contents match the script.
Source: https://research.google/pubs/pub41316/
> But it sounds like it's 100% repeatable with the fair page lock, which
> is actually a good thing. It means that if you do a "sysrq-w" while
> it's blocking, you should see exactly what is waiting for what.
>
> (Except since it times out nicely eventually, probably at least part
> of the waiting is interruptible, and then you need to do "sysrq-t"
> instead and it's going to be _very_ verbose and much harder to
> pinpoint things, and you'll probably need to have a very big printk
> buffer).
Thank you for this idea! I was focused on using lockdep and I forgot
about this simple method. It is not (yet) a reflex for me to use it!
I think I got an interesting trace I took 20 seconds after having
started packetdrill:
------------------- 8< -------------------
[ 25.507563] sysrq: Show Blocked State
[ 25.510695] task:packetdrill state:D stack:13848 pid: 188 ppid:
155 flags:0x00004000
[ 25.517841] Call Trace:
[ 25.520103] __schedule+0x3eb/0x680
[ 25.523197] schedule+0x45/0xb0
[ 25.526013] io_schedule+0xd/0x30
[ 25.528964] __lock_page_killable+0x13e/0x280
[ 25.532794] ? file_fdatawait_range+0x20/0x20
[ 25.536605] filemap_fault+0x6b4/0x970
[ 25.539911] ? filemap_map_pages+0x195/0x330
[ 25.543682] __do_fault+0x32/0x90
[ 25.546620] handle_mm_fault+0x8c1/0xe50
[ 25.550050] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 25.554637] __get_user_pages+0x25c/0x750
[ 25.558101] populate_vma_page_range+0x57/0x60
[ 25.561968] __mm_populate+0xa9/0x150
[ 25.565125] __x64_sys_mlockall+0x151/0x180
[ 25.568787] do_syscall_64+0x33/0x40
[ 25.571915] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.576230] RIP: 0033:0x7f21bee46b3b
[ 25.579357] Code: Bad RIP value.
[ 25.582199] RSP: 002b:00007ffcb5f8ad38 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 25.588588] RAX: ffffffffffffffda RBX: 000055c9762f1450 RCX:
00007f21bee46b3b
[ 25.594627] RDX: 00007ffcb5f8ad28 RSI: 0000000000000002 RDI:
0000000000000003
[ 25.600637] RBP: 00007ffcb5f8ad40 R08: 0000000000000001 R09:
0000000000000000
[ 25.606701] R10: 00007f21beec9ac0 R11: 0000000000000246 R12:
000055c9762b30a0
[ 25.612738] R13: 00007ffcb5f8b180 R14: 0000000000000000 R15:
0000000000000000
[ 25.618762] task:packetdrill state:D stack:13952 pid: 190 ppid:
153 flags:0x00004000
[ 25.625781] Call Trace:
[ 25.627987] __schedule+0x3eb/0x680
[ 25.631046] schedule+0x45/0xb0
[ 25.633796] io_schedule+0xd/0x30
[ 25.636726] ? wake_up_page_bit+0xd1/0x100
[ 25.640271] ? file_fdatawait_range+0x20/0x20
[ 25.644022] ? filemap_fault+0x6b4/0x970
[ 25.647427] ? filemap_map_pages+0x195/0x330
[ 25.651146] ? __do_fault+0x32/0x90
[ 25.654227] ? handle_mm_fault+0x8c1/0xe50
[ 25.657752] ? __get_user_pages+0x25c/0x750
[ 25.661368] ? populate_vma_page_range+0x57/0x60
[ 25.665338] ? __mm_populate+0xa9/0x150
[ 25.668707] ? __x64_sys_mlockall+0x151/0x180
[ 25.672467] ? do_syscall_64+0x33/0x40
[ 25.675751] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.680213] task:packetdrill state:D stack:13952 pid: 193 ppid:
160 flags:0x00004000
[ 25.687285] Call Trace:
[ 25.689472] __schedule+0x3eb/0x680
[ 25.692547] schedule+0x45/0xb0
[ 25.695314] io_schedule+0xd/0x30
[ 25.698216] __lock_page_killable+0x13e/0x280
[ 25.702013] ? file_fdatawait_range+0x20/0x20
[ 25.705752] filemap_fault+0x6b4/0x970
[ 25.709010] ? filemap_map_pages+0x195/0x330
[ 25.712691] __do_fault+0x32/0x90
[ 25.715620] handle_mm_fault+0x8c1/0xe50
[ 25.719013] __get_user_pages+0x25c/0x750
[ 25.722485] populate_vma_page_range+0x57/0x60
[ 25.726326] __mm_populate+0xa9/0x150
[ 25.729528] __x64_sys_mlockall+0x151/0x180
[ 25.733138] do_syscall_64+0x33/0x40
[ 25.736263] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.740587] RIP: 0033:0x7feb59c16b3b
[ 25.743716] Code: Bad RIP value.
[ 25.746653] RSP: 002b:00007ffd75ef7f38 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 25.753019] RAX: ffffffffffffffda RBX: 0000562a49acc450 RCX:
00007feb59c16b3b
[ 25.759077] RDX: 00007ffd75ef7f28 RSI: 0000000000000002 RDI:
0000000000000003
[ 25.765127] RBP: 00007ffd75ef7f40 R08: 0000000000000001 R09:
0000000000000000
[ 25.771231] R10: 00007feb59c99ac0 R11: 0000000000000246 R12:
0000562a49a8e0a0
[ 25.777442] R13: 00007ffd75ef8380 R14: 0000000000000000 R15:
0000000000000000
[ 25.783496] task:packetdrill state:D stack:13952 pid: 194 ppid:
157 flags:0x00004000
[ 25.790536] Call Trace:
[ 25.792726] __schedule+0x3eb/0x680
[ 25.795777] schedule+0x45/0xb0
[ 25.798582] io_schedule+0xd/0x30
[ 25.801473] __lock_page_killable+0x13e/0x280
[ 25.805246] ? file_fdatawait_range+0x20/0x20
[ 25.809015] filemap_fault+0x6b4/0x970
[ 25.812279] ? filemap_map_pages+0x195/0x330
[ 25.815981] __do_fault+0x32/0x90
[ 25.818909] handle_mm_fault+0x8c1/0xe50
[ 25.822458] __get_user_pages+0x25c/0x750
[ 25.825947] populate_vma_page_range+0x57/0x60
[ 25.829775] __mm_populate+0xa9/0x150
[ 25.832973] __x64_sys_mlockall+0x151/0x180
[ 25.836591] do_syscall_64+0x33/0x40
[ 25.839715] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.844089] RIP: 0033:0x7f1bdd340b3b
[ 25.847219] Code: Bad RIP value.
[ 25.850079] RSP: 002b:00007fff992f49e8 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 25.856446] RAX: ffffffffffffffda RBX: 0000557ddd3b8450 RCX:
00007f1bdd340b3b
[ 25.862481] RDX: 00007fff992f49d8 RSI: 0000000000000002 RDI:
0000000000000003
[ 25.868455] RBP: 00007fff992f49f0 R08: 0000000000000001 R09:
0000000000000000
[ 25.874528] R10: 00007f1bdd3c3ac0 R11: 0000000000000246 R12:
0000557ddd37a0a0
[ 25.880541] R13: 00007fff992f4e30 R14: 0000000000000000 R15:
0000000000000000
[ 25.886556] task:packetdrill state:D stack:13952 pid: 200 ppid:
162 flags:0x00004000
[ 25.893568] Call Trace:
[ 25.895776] __schedule+0x3eb/0x680
[ 25.898833] schedule+0x45/0xb0
[ 25.901578] io_schedule+0xd/0x30
[ 25.904495] __lock_page_killable+0x13e/0x280
[ 25.908246] ? file_fdatawait_range+0x20/0x20
[ 25.912012] filemap_fault+0x6b4/0x970
[ 25.915270] ? filemap_map_pages+0x195/0x330
[ 25.918964] __do_fault+0x32/0x90
[ 25.921853] handle_mm_fault+0x8c1/0xe50
[ 25.925245] __get_user_pages+0x25c/0x750
[ 25.928720] populate_vma_page_range+0x57/0x60
[ 25.932543] __mm_populate+0xa9/0x150
[ 25.935727] __x64_sys_mlockall+0x151/0x180
[ 25.939348] do_syscall_64+0x33/0x40
[ 25.942466] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.946906] RIP: 0033:0x7fb34860bb3b
[ 25.950026] Code: Bad RIP value.
[ 25.952846] RSP: 002b:00007ffea61b7668 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 25.959289] RAX: ffffffffffffffda RBX: 000055c6f01c2450 RCX:
00007fb34860bb3b
[ 25.965453] RDX: 00007ffea61b7658 RSI: 0000000000000002 RDI:
0000000000000003
[ 25.971504] RBP: 00007ffea61b7670 R08: 0000000000000001 R09:
0000000000000000
[ 25.977505] R10: 00007fb34868eac0 R11: 0000000000000246 R12:
000055c6f01840a0
[ 25.983596] R13: 00007ffea61b7ab0 R14: 0000000000000000 R15:
0000000000000000
[ 25.989598] task:packetdrill state:D stack:13952 pid: 203 ppid:
169 flags:0x00004000
[ 25.996611] Call Trace:
[ 25.998823] __schedule+0x3eb/0x680
[ 26.001863] schedule+0x45/0xb0
[ 26.004645] io_schedule+0xd/0x30
[ 26.007576] ? wake_up_page_bit+0xd1/0x100
[ 26.011133] ? file_fdatawait_range+0x20/0x20
[ 26.014900] ? filemap_fault+0x6b4/0x970
[ 26.018282] ? filemap_map_pages+0x195/0x330
[ 26.021973] ? __do_fault+0x32/0x90
[ 26.025017] ? handle_mm_fault+0x8c1/0xe50
[ 26.028551] ? __get_user_pages+0x25c/0x750
[ 26.032163] ? populate_vma_page_range+0x57/0x60
[ 26.036134] ? __mm_populate+0xa9/0x150
[ 26.039487] ? __x64_sys_mlockall+0x151/0x180
[ 26.043260] ? do_syscall_64+0x33/0x40
[ 26.046528] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 26.050968] task:packetdrill state:D stack:13904 pid: 207 ppid:
173 flags:0x00004000
[ 26.058008] Call Trace:
[ 26.060192] __schedule+0x3eb/0x680
[ 26.063248] schedule+0x45/0xb0
[ 26.066032] io_schedule+0xd/0x30
[ 26.068924] __lock_page_killable+0x13e/0x280
[ 26.072677] ? file_fdatawait_range+0x20/0x20
[ 26.076429] filemap_fault+0x6b4/0x970
[ 26.079704] ? filemap_map_pages+0x195/0x330
[ 26.083424] __do_fault+0x32/0x90
[ 26.086342] handle_mm_fault+0x8c1/0xe50
[ 26.089722] __get_user_pages+0x25c/0x750
[ 26.093209] populate_vma_page_range+0x57/0x60
[ 26.097040] __mm_populate+0xa9/0x150
[ 26.100218] __x64_sys_mlockall+0x151/0x180
[ 26.103837] do_syscall_64+0x33/0x40
[ 26.106948] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 26.111256] RIP: 0033:0x7f90fb829b3b
[ 26.114383] Code: Bad RIP value.
[ 26.117183] RSP: 002b:00007ffc3ae07ea8 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 26.123589] RAX: ffffffffffffffda RBX: 0000560bf837d450 RCX:
00007f90fb829b3b
[ 26.129614] RDX: 00007ffc3ae07e98 RSI: 0000000000000002 RDI:
0000000000000003
[ 26.135641] RBP: 00007ffc3ae07eb0 R08: 0000000000000001 R09:
0000000000000000
[ 26.141660] R10: 00007f90fb8acac0 R11: 0000000000000246 R12:
0000560bf833f0a0
[ 26.147675] R13: 00007ffc3ae082f0 R14: 0000000000000000 R15:
0000000000000000
[ 26.153693] task:packetdrill state:D stack:13952 pid: 210 ppid:
179 flags:0x00004000
[ 26.160728] Call Trace:
[ 26.162923] ? sched_clock_cpu+0x95/0xa0
[ 26.166326] ? ttwu_do_wakeup.isra.0+0x34/0xe0
[ 26.170172] ? __schedule+0x3eb/0x680
[ 26.173349] ? schedule+0x45/0xb0
[ 26.176271] ? io_schedule+0xd/0x30
[ 26.179320] ? __lock_page_killable+0x13e/0x280
[ 26.183216] ? file_fdatawait_range+0x20/0x20
[ 26.187031] ? filemap_fault+0x6b4/0x970
[ 26.190451] ? filemap_map_pages+0x195/0x330
[ 26.194128] ? __do_fault+0x32/0x90
[ 26.197161] ? handle_mm_fault+0x8c1/0xe50
[ 26.200692] ? push_rt_tasks+0xc/0x20
[ 26.203866] ? __get_user_pages+0x25c/0x750
[ 26.207474] ? populate_vma_page_range+0x57/0x60
[ 26.211423] ? __mm_populate+0xa9/0x150
[ 26.214763] ? __x64_sys_mlockall+0x151/0x180
[ 26.218506] ? do_syscall_64+0x33/0x40
[ 26.221757] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 26.226216] task:packetdrill state:D stack:13952 pid: 212 ppid:
185 flags:0x00004000
[ 26.233223] Call Trace:
[ 26.235435] __schedule+0x3eb/0x680
[ 26.238487] schedule+0x45/0xb0
[ 26.241234] io_schedule+0xd/0x30
[ 26.244133] __lock_page_killable+0x13e/0x280
[ 26.247890] ? file_fdatawait_range+0x20/0x20
[ 26.251647] filemap_fault+0x6b4/0x970
[ 26.254906] ? filemap_map_pages+0x195/0x330
[ 26.258590] __do_fault+0x32/0x90
[ 26.261462] handle_mm_fault+0x8c1/0xe50
[ 26.264869] __get_user_pages+0x25c/0x750
[ 26.268327] populate_vma_page_range+0x57/0x60
[ 26.272162] __mm_populate+0xa9/0x150
[ 26.275347] __x64_sys_mlockall+0x151/0x180
[ 26.278970] do_syscall_64+0x33/0x40
[ 26.282082] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 26.286466] RIP: 0033:0x7f00e8863b3b
[ 26.289574] Code: Bad RIP value.
[ 26.292420] RSP: 002b:00007fff5b28f378 EFLAGS: 00000246 ORIG_RAX:
0000000000000097
[ 26.298797] RAX: ffffffffffffffda RBX: 000055ea3bc97450 RCX:
00007f00e8863b3b
[ 26.304787] RDX: 00007fff5b28f368 RSI: 0000000000000002 RDI:
0000000000000003
[ 26.310867] RBP: 00007fff5b28f380 R08: 0000000000000001 R09:
0000000000000000
[ 26.316697] R10: 00007f00e88e6ac0 R11: 0000000000000246 R12:
000055ea3bc590a0
[ 26.322525] R13: 00007fff5b28f7c0 R14: 0000000000000000 R15:
0000000000000000
------------------- 8< -------------------
A version from "decode_stacktrace.sh" is also attached to this email, I
was not sure it would be readable here.
Please tell me if anything else is needed.
One more thing, only when I have the issue, I can also see this kernel
message that seems clearly linked:
[ 7.198259] sched: RT throttling activated
> There are obviously other ways to do it too - kgdb or whatever - which
> you may or may not be more used to.
I never tried to use kgdb with this setup but it is clearly a good
occasion to start! I guess I will be able to easily reproduce the issue
and then generate the crash dump.
> But sysrq is very traditional and often particularly easy if it's a
> very repeatable "things are hung". Not nearly as good as lockdep, of
> course. But if the machine is otherwise working, you can just do
>
> echo 'w' > /proc/sysrq-trigger
>
> in another terminal (and again, maybe you need 't', but then you
> really want to do it *without* having a full GUI setup or anythign
> like that, to at least make it somewhat less verbose).
Please tell me if the trace I shared above is helpful. If not I can
easily share the long output from sysrq-t -- no GUI nor any other
services are running in the background -- and if needed I can prioritise
the generation of a crash dump + analysis.
> Aside: a quick google shows that Nick Piggin did try to extend lockdep
> to the page lock many many years ago. I don't think it ever went
> anywhere. To quote Avril Lavigne: "It's complicated".
:-)
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
View attachment "sysrq_w_1_analysed.txt" of type "text/plain" (13513 bytes)
Powered by blists - more mailing lists