linux-ext4 - Re: Kernel Benchmarking

Message-ID: <CAHk-=wimdSWe+GVBKwB0_=ZKX2ZN5JEqK5yA99toab4MAoYAsg@mail.gmail.com>
Date:   Tue, 15 Sep 2020 11:27:19 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Matthieu Baerts <matthieu.baerts@...sares.net>
Cc:     Michael Larabel <Michael@...haellarabel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Amir Goldstein <amir73il@...il.com>,
        "Ted Ts'o" <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Tue, Sep 15, 2020 at 8:34 AM Matthieu Baerts
<matthieu.baerts@...sares.net> wrote:
>
> > But it sounds like it's 100% repeatable with the fair page lock, which
> > is actually a good thing. It means that if you do a "sysrq-w" while
> > it's blocking, you should see exactly what is waiting for what.
> >
> > (Except since it times out nicely eventually, probably at least part
> > of the waiting is interruptible, and then you need to do "sysrq-t"
> > instead and it's going to be _very_ verbose and much harder to
> > pinpoint things, and you'll probably need to have a very big printk
> > buffer).
>
> Thank you for this idea! I was focused on using lockdep and I forgot
> about this simple method. It is not (yet) a reflex for me to use it!
>
> I think I got an interesting trace I took 20 seconds after having
> started packetdrill:

Ok, so everybody there is basically in the same identical situation,
they all seem to be doing mlockall(), which does __mm_populate() ->
populate_vma_page_range() -> __get_user_pages() -> handle_mm_fault()
and then actually tries to fault in the missing pages.

And that does do a lot of "lock_page()" (and, of course, as a result,
a lot of "unlock_page()" too).

Every one of them is in the "io_schedule()" in the filemap_fault()
path, although two of them seem to be in file_fdatawait_range() rather
than in the lock_page() code itself (so they are also waiting on a
page bit, but they are waiting for the writeback bit to clear).

And all of them do it under the mmap_read_lock().

I'm not seeing what else they'd be blocking on, though.

As mentioned, the thing they are blocking on might be something
interruptible that holds the lock, and might not be in 'D' state. Then
it wouldn't show up in sysrq-w, and you'd have to do sysrq-t to see
it.
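
As a rough userspace cross-check of which tasks are in 'D' state and which
are only in interruptible sleep ('S'), the state letter can be read from
/proc/<pid>/task/<tid>/stat. This is a minimal sketch and not a replacement
for the sysrq dumps, since it shows no kernel stack traces; the helper names
parse_stat and tasks_by_state are made up for illustration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of a /proc stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_by_state(proc="/proc"):
    """Group every thread visible under `proc` by its state letter."""
    groups = {}
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            stat_path = os.path.join(task_dir, tid, "stat")
            try:
                with open(stat_path, errors="replace") as f:
                    _, comm, state = parse_stat(f.read())
            except OSError:
                continue  # the thread exited while we were scanning
            groups.setdefault(state, []).append((int(tid), comm))
    return groups


if __name__ == "__main__":
    groups = tasks_by_state()
    # 'D' is uninterruptible sleep, 'S' is interruptible sleep.
    for tid, comm in sorted(groups.get("D", [])):
        print(f"D {tid} {comm}")
    print(f"{len(groups.get('S', []))} tasks in interruptible sleep")
```

Anything listed under 'D' here is what sysrq-w would be expected to dump. A
lock holder sleeping interruptibly would be counted under 'S' instead, which
is why the full task dump would be needed to find it.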

From past experience, that tends to be a _lot_ of data, though, and it
easily overflows the printk buffers etc.

lockdep has made these kinds of sysrq hacks mostly a thing of the
past, and the few non-lockdep locks (and the page lock is definitely
the biggest of them) are an annoying blast from the past.

                    Linus