lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9550725a-2d3f-fa35-1410-cae912e128b9@tessares.net>
Date:   Tue, 15 Sep 2020 17:34:50 +0200
From:   Matthieu Baerts <matthieu.baerts@...sares.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Michael Larabel <Michael@...haellarabel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

Hi Linus,

Thank you very much for your reply, with very clear explanations and 
instructions!

On 14/09/2020 22:53, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:21 PM Matthieu Baerts
> <matthieu.baerts@...sares.net> wrote:
>>
>> Recently, some of these packetdrill tests have been failing after 2
>> minutes (timeout) instead of being executed in a few seconds (~6
>> seconds). No packets are even exchanged during these two minutes.
> 
> Hmm.
> 
> That sounds like a deadlock to me, and sounds like it's a latent bug
> waiting to happen.

Yesterday evening, I wanted to get confirmation about that using 
PROVE_LOCKING but just like today, each time I enable this kconfig, I 
cannot reproduce the issue.

Anyway I am sure you are right and this bug has been there for sometime 
but is too hard to reproduce.
>> I would be glad to help by validating new modifications or providing new
>> info. My setup is also easy to put in place: a Docker image is built
>> with all required tools to start the same VM just like the one I have.
> 
> I'm not familiar enough with packetdrill or any of that infrastructure
> - does it do its own kernel modules etc for the packet latency
> testing?

No, Packetdrill doesn't load any kernel module.

Here is a short description of the execution model of Packetdrill from a 
paper the authors wrote:

     packetdrill parses an entire test script, and then executes each
     timestamped line in real time -- at the pace described by the
     timestamps -- to replay and verify the scenario.
     - For each system call line, packetdrill executes the system call
       and verifies that it returns the expected result.
     - For each command line, packetdrill executes the shell command.
     - For each incoming packet (denoted by a leading < on the line),
       packetdrill constructs a packet and injects it into the kernel.
     - For each outgoing packet (denoted by a leading > on the line),
       packetdrill sniffs the next outgoing packet and verifies that the
       packet's timing and contents match the script.

Source: https://research.google/pubs/pub41316/

> But it sounds like it's 100% repeatable with the fair page lock, which
> is actually a good thing. It means that if you do a "sysrq-w" while
> it's blocking, you should see exactly what is waiting for what.
> 
> (Except since it times out nicely eventually, probably at least part
> of the waiting is interruptible, and then you need to do "sysrq-t"
> instead and it's going to be _very_ verbose and much harder to
> pinpoint things, and you'll probably need to have a very big printk
> buffer).

Thank you for this idea! I was focused on using lockdep and I forgot 
about this simple method. It is not (yet) a reflex for me to use it!

I think I got an interesting trace I took 20 seconds after having 
started packetdrill:


------------------- 8< -------------------
[   25.507563] sysrq: Show Blocked State
[   25.510695] task:packetdrill     state:D stack:13848 pid:  188 ppid: 
   155 flags:0x00004000
[   25.517841] Call Trace:
[   25.520103]  __schedule+0x3eb/0x680
[   25.523197]  schedule+0x45/0xb0
[   25.526013]  io_schedule+0xd/0x30
[   25.528964]  __lock_page_killable+0x13e/0x280
[   25.532794]  ? file_fdatawait_range+0x20/0x20
[   25.536605]  filemap_fault+0x6b4/0x970
[   25.539911]  ? filemap_map_pages+0x195/0x330
[   25.543682]  __do_fault+0x32/0x90
[   25.546620]  handle_mm_fault+0x8c1/0xe50
[   25.550050]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   25.554637]  __get_user_pages+0x25c/0x750
[   25.558101]  populate_vma_page_range+0x57/0x60
[   25.561968]  __mm_populate+0xa9/0x150
[   25.565125]  __x64_sys_mlockall+0x151/0x180
[   25.568787]  do_syscall_64+0x33/0x40
[   25.571915]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.576230] RIP: 0033:0x7f21bee46b3b
[   25.579357] Code: Bad RIP value.
[   25.582199] RSP: 002b:00007ffcb5f8ad38 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   25.588588] RAX: ffffffffffffffda RBX: 000055c9762f1450 RCX: 
00007f21bee46b3b
[   25.594627] RDX: 00007ffcb5f8ad28 RSI: 0000000000000002 RDI: 
0000000000000003
[   25.600637] RBP: 00007ffcb5f8ad40 R08: 0000000000000001 R09: 
0000000000000000
[   25.606701] R10: 00007f21beec9ac0 R11: 0000000000000246 R12: 
000055c9762b30a0
[   25.612738] R13: 00007ffcb5f8b180 R14: 0000000000000000 R15: 
0000000000000000
[   25.618762] task:packetdrill     state:D stack:13952 pid:  190 ppid: 
   153 flags:0x00004000
[   25.625781] Call Trace:
[   25.627987]  __schedule+0x3eb/0x680
[   25.631046]  schedule+0x45/0xb0
[   25.633796]  io_schedule+0xd/0x30
[   25.636726]  ? wake_up_page_bit+0xd1/0x100
[   25.640271]  ? file_fdatawait_range+0x20/0x20
[   25.644022]  ? filemap_fault+0x6b4/0x970
[   25.647427]  ? filemap_map_pages+0x195/0x330
[   25.651146]  ? __do_fault+0x32/0x90
[   25.654227]  ? handle_mm_fault+0x8c1/0xe50
[   25.657752]  ? __get_user_pages+0x25c/0x750
[   25.661368]  ? populate_vma_page_range+0x57/0x60
[   25.665338]  ? __mm_populate+0xa9/0x150
[   25.668707]  ? __x64_sys_mlockall+0x151/0x180
[   25.672467]  ? do_syscall_64+0x33/0x40
[   25.675751]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.680213] task:packetdrill     state:D stack:13952 pid:  193 ppid: 
   160 flags:0x00004000
[   25.687285] Call Trace:
[   25.689472]  __schedule+0x3eb/0x680
[   25.692547]  schedule+0x45/0xb0
[   25.695314]  io_schedule+0xd/0x30
[   25.698216]  __lock_page_killable+0x13e/0x280
[   25.702013]  ? file_fdatawait_range+0x20/0x20
[   25.705752]  filemap_fault+0x6b4/0x970
[   25.709010]  ? filemap_map_pages+0x195/0x330
[   25.712691]  __do_fault+0x32/0x90
[   25.715620]  handle_mm_fault+0x8c1/0xe50
[   25.719013]  __get_user_pages+0x25c/0x750
[   25.722485]  populate_vma_page_range+0x57/0x60
[   25.726326]  __mm_populate+0xa9/0x150
[   25.729528]  __x64_sys_mlockall+0x151/0x180
[   25.733138]  do_syscall_64+0x33/0x40
[   25.736263]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.740587] RIP: 0033:0x7feb59c16b3b
[   25.743716] Code: Bad RIP value.
[   25.746653] RSP: 002b:00007ffd75ef7f38 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   25.753019] RAX: ffffffffffffffda RBX: 0000562a49acc450 RCX: 
00007feb59c16b3b
[   25.759077] RDX: 00007ffd75ef7f28 RSI: 0000000000000002 RDI: 
0000000000000003
[   25.765127] RBP: 00007ffd75ef7f40 R08: 0000000000000001 R09: 
0000000000000000
[   25.771231] R10: 00007feb59c99ac0 R11: 0000000000000246 R12: 
0000562a49a8e0a0
[   25.777442] R13: 00007ffd75ef8380 R14: 0000000000000000 R15: 
0000000000000000
[   25.783496] task:packetdrill     state:D stack:13952 pid:  194 ppid: 
   157 flags:0x00004000
[   25.790536] Call Trace:
[   25.792726]  __schedule+0x3eb/0x680
[   25.795777]  schedule+0x45/0xb0
[   25.798582]  io_schedule+0xd/0x30
[   25.801473]  __lock_page_killable+0x13e/0x280
[   25.805246]  ? file_fdatawait_range+0x20/0x20
[   25.809015]  filemap_fault+0x6b4/0x970
[   25.812279]  ? filemap_map_pages+0x195/0x330
[   25.815981]  __do_fault+0x32/0x90
[   25.818909]  handle_mm_fault+0x8c1/0xe50
[   25.822458]  __get_user_pages+0x25c/0x750
[   25.825947]  populate_vma_page_range+0x57/0x60
[   25.829775]  __mm_populate+0xa9/0x150
[   25.832973]  __x64_sys_mlockall+0x151/0x180
[   25.836591]  do_syscall_64+0x33/0x40
[   25.839715]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.844089] RIP: 0033:0x7f1bdd340b3b
[   25.847219] Code: Bad RIP value.
[   25.850079] RSP: 002b:00007fff992f49e8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   25.856446] RAX: ffffffffffffffda RBX: 0000557ddd3b8450 RCX: 
00007f1bdd340b3b
[   25.862481] RDX: 00007fff992f49d8 RSI: 0000000000000002 RDI: 
0000000000000003
[   25.868455] RBP: 00007fff992f49f0 R08: 0000000000000001 R09: 
0000000000000000
[   25.874528] R10: 00007f1bdd3c3ac0 R11: 0000000000000246 R12: 
0000557ddd37a0a0
[   25.880541] R13: 00007fff992f4e30 R14: 0000000000000000 R15: 
0000000000000000
[   25.886556] task:packetdrill     state:D stack:13952 pid:  200 ppid: 
   162 flags:0x00004000
[   25.893568] Call Trace:
[   25.895776]  __schedule+0x3eb/0x680
[   25.898833]  schedule+0x45/0xb0
[   25.901578]  io_schedule+0xd/0x30
[   25.904495]  __lock_page_killable+0x13e/0x280
[   25.908246]  ? file_fdatawait_range+0x20/0x20
[   25.912012]  filemap_fault+0x6b4/0x970
[   25.915270]  ? filemap_map_pages+0x195/0x330
[   25.918964]  __do_fault+0x32/0x90
[   25.921853]  handle_mm_fault+0x8c1/0xe50
[   25.925245]  __get_user_pages+0x25c/0x750
[   25.928720]  populate_vma_page_range+0x57/0x60
[   25.932543]  __mm_populate+0xa9/0x150
[   25.935727]  __x64_sys_mlockall+0x151/0x180
[   25.939348]  do_syscall_64+0x33/0x40
[   25.942466]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.946906] RIP: 0033:0x7fb34860bb3b
[   25.950026] Code: Bad RIP value.
[   25.952846] RSP: 002b:00007ffea61b7668 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   25.959289] RAX: ffffffffffffffda RBX: 000055c6f01c2450 RCX: 
00007fb34860bb3b
[   25.965453] RDX: 00007ffea61b7658 RSI: 0000000000000002 RDI: 
0000000000000003
[   25.971504] RBP: 00007ffea61b7670 R08: 0000000000000001 R09: 
0000000000000000
[   25.977505] R10: 00007fb34868eac0 R11: 0000000000000246 R12: 
000055c6f01840a0
[   25.983596] R13: 00007ffea61b7ab0 R14: 0000000000000000 R15: 
0000000000000000
[   25.989598] task:packetdrill     state:D stack:13952 pid:  203 ppid: 
   169 flags:0x00004000
[   25.996611] Call Trace:
[   25.998823]  __schedule+0x3eb/0x680
[   26.001863]  schedule+0x45/0xb0
[   26.004645]  io_schedule+0xd/0x30
[   26.007576]  ? wake_up_page_bit+0xd1/0x100
[   26.011133]  ? file_fdatawait_range+0x20/0x20
[   26.014900]  ? filemap_fault+0x6b4/0x970
[   26.018282]  ? filemap_map_pages+0x195/0x330
[   26.021973]  ? __do_fault+0x32/0x90
[   26.025017]  ? handle_mm_fault+0x8c1/0xe50
[   26.028551]  ? __get_user_pages+0x25c/0x750
[   26.032163]  ? populate_vma_page_range+0x57/0x60
[   26.036134]  ? __mm_populate+0xa9/0x150
[   26.039487]  ? __x64_sys_mlockall+0x151/0x180
[   26.043260]  ? do_syscall_64+0x33/0x40
[   26.046528]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   26.050968] task:packetdrill     state:D stack:13904 pid:  207 ppid: 
   173 flags:0x00004000
[   26.058008] Call Trace:
[   26.060192]  __schedule+0x3eb/0x680
[   26.063248]  schedule+0x45/0xb0
[   26.066032]  io_schedule+0xd/0x30
[   26.068924]  __lock_page_killable+0x13e/0x280
[   26.072677]  ? file_fdatawait_range+0x20/0x20
[   26.076429]  filemap_fault+0x6b4/0x970
[   26.079704]  ? filemap_map_pages+0x195/0x330
[   26.083424]  __do_fault+0x32/0x90
[   26.086342]  handle_mm_fault+0x8c1/0xe50
[   26.089722]  __get_user_pages+0x25c/0x750
[   26.093209]  populate_vma_page_range+0x57/0x60
[   26.097040]  __mm_populate+0xa9/0x150
[   26.100218]  __x64_sys_mlockall+0x151/0x180
[   26.103837]  do_syscall_64+0x33/0x40
[   26.106948]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   26.111256] RIP: 0033:0x7f90fb829b3b
[   26.114383] Code: Bad RIP value.
[   26.117183] RSP: 002b:00007ffc3ae07ea8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   26.123589] RAX: ffffffffffffffda RBX: 0000560bf837d450 RCX: 
00007f90fb829b3b
[   26.129614] RDX: 00007ffc3ae07e98 RSI: 0000000000000002 RDI: 
0000000000000003
[   26.135641] RBP: 00007ffc3ae07eb0 R08: 0000000000000001 R09: 
0000000000000000
[   26.141660] R10: 00007f90fb8acac0 R11: 0000000000000246 R12: 
0000560bf833f0a0
[   26.147675] R13: 00007ffc3ae082f0 R14: 0000000000000000 R15: 
0000000000000000
[   26.153693] task:packetdrill     state:D stack:13952 pid:  210 ppid: 
   179 flags:0x00004000
[   26.160728] Call Trace:
[   26.162923]  ? sched_clock_cpu+0x95/0xa0
[   26.166326]  ? ttwu_do_wakeup.isra.0+0x34/0xe0
[   26.170172]  ? __schedule+0x3eb/0x680
[   26.173349]  ? schedule+0x45/0xb0
[   26.176271]  ? io_schedule+0xd/0x30
[   26.179320]  ? __lock_page_killable+0x13e/0x280
[   26.183216]  ? file_fdatawait_range+0x20/0x20
[   26.187031]  ? filemap_fault+0x6b4/0x970
[   26.190451]  ? filemap_map_pages+0x195/0x330
[   26.194128]  ? __do_fault+0x32/0x90
[   26.197161]  ? handle_mm_fault+0x8c1/0xe50
[   26.200692]  ? push_rt_tasks+0xc/0x20
[   26.203866]  ? __get_user_pages+0x25c/0x750
[   26.207474]  ? populate_vma_page_range+0x57/0x60
[   26.211423]  ? __mm_populate+0xa9/0x150
[   26.214763]  ? __x64_sys_mlockall+0x151/0x180
[   26.218506]  ? do_syscall_64+0x33/0x40
[   26.221757]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   26.226216] task:packetdrill     state:D stack:13952 pid:  212 ppid: 
   185 flags:0x00004000
[   26.233223] Call Trace:
[   26.235435]  __schedule+0x3eb/0x680
[   26.238487]  schedule+0x45/0xb0
[   26.241234]  io_schedule+0xd/0x30
[   26.244133]  __lock_page_killable+0x13e/0x280
[   26.247890]  ? file_fdatawait_range+0x20/0x20
[   26.251647]  filemap_fault+0x6b4/0x970
[   26.254906]  ? filemap_map_pages+0x195/0x330
[   26.258590]  __do_fault+0x32/0x90
[   26.261462]  handle_mm_fault+0x8c1/0xe50
[   26.264869]  __get_user_pages+0x25c/0x750
[   26.268327]  populate_vma_page_range+0x57/0x60
[   26.272162]  __mm_populate+0xa9/0x150
[   26.275347]  __x64_sys_mlockall+0x151/0x180
[   26.278970]  do_syscall_64+0x33/0x40
[   26.282082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   26.286466] RIP: 0033:0x7f00e8863b3b
[   26.289574] Code: Bad RIP value.
[   26.292420] RSP: 002b:00007fff5b28f378 EFLAGS: 00000246 ORIG_RAX: 
0000000000000097
[   26.298797] RAX: ffffffffffffffda RBX: 000055ea3bc97450 RCX: 
00007f00e8863b3b
[   26.304787] RDX: 00007fff5b28f368 RSI: 0000000000000002 RDI: 
0000000000000003
[   26.310867] RBP: 00007fff5b28f380 R08: 0000000000000001 R09: 
0000000000000000
[   26.316697] R10: 00007f00e88e6ac0 R11: 0000000000000246 R12: 
000055ea3bc590a0
[   26.322525] R13: 00007fff5b28f7c0 R14: 0000000000000000 R15: 
0000000000000000
------------------- 8< -------------------


A version from "decode_stacktrace.sh" is also attached to this email, I 
was not sure it would be readable here.
Please tell me if anything else is needed.

One more thing, only when I have the issue, I can also see this kernel 
message that seems clearly linked:

   [    7.198259] sched: RT throttling activated

> There are obviously other ways to do it too - kgdb or whatever - which
> you may or may not be more used to.

I never tried to use kgdb with this setup but it is clearly a good 
occasion to start! I guess I will be able to easily reproduce the issue 
and then generate the crash dump.

> But sysrq is very traditional and often particularly easy if it's a
> very repeatable "things are hung". Not nearly as good as lockdep, of
> course. But if the machine is otherwise working, you can just do
> 
>      echo 'w' > /proc/sysrq-trigger
> 
> in another terminal (and again, maybe you need 't', but then you
> really want to do it *without* having a full GUI setup or anythign
> like that, to at least make it somewhat less verbose).

Please tell me if the trace I shared above is helpful. If not I can 
easily share the long output from sysrq-t -- no GUI nor any other 
services are running in the background -- and if needed I can prioritise 
the generation of a crash dump + analysis.

> Aside: a quick google shows that Nick Piggin did try to extend lockdep
> to the page lock many many years ago. I don't think it ever went
> anywhere. To quote Avril Lavigne: "It's complicated".

:-)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

View attachment "sysrq_w_1_analysed.txt" of type "text/plain" (13513 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ