linux-ext4 - Re: Kernel Benchmarking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200917185049.GV5449@casper.infradead.org>
Date:   Thu, 17 Sep 2020 19:50:49 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Michael Larabel <Michael@...haellarabel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Thu, Sep 17, 2020 at 11:30:00AM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 11:23 AM Matthew Wilcox <willy@...radead.org> wrote:
> >
> >             Something like taking
> > the i_mmap_lock_read(file->f_mapping) in filemap_fault, then adding a
> > new VM_FAULT_I_MMAP_LOCKED bit so that do_read_fault() and friends add:
> >
> >         if (ret & VM_FAULT_I_MMAP_LOCKED)
> >                 i_mmap_unlock_read(vmf->vma->vm_file->f_mapping);
> >         else
> >                 unlock_page(page);
> >
> > ... want me to turn that into a real patch?
> 
> I can't guarantee it's the right model - it does worry me how many
> places we might get that i_mmap_rwlock, and how long we migth hold it
> for writing, and what deadlocks it might cause when we take it for
> reading in the page fault path.
> 
> But I think it might be very interesting as a benchmark patch and a
> trial balloon. Maybe it "just works".

Ahh.  Here's a race this doesn't close:

int truncate_inode_page(struct address_space *mapping, struct page *page)
{
        VM_BUG_ON_PAGE(PageTail(page), page);

        if (page->mapping != mapping)
                return -EIO;

        truncate_cleanup_page(mapping, page);
        delete_from_page_cache(page);
        return 0;
}

truncate_cleanup_page() does
        if (page_mapped(page)) {
                pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
                unmap_mapping_pages(mapping, page->index, nr, false);
        }

but ->mapping isn't cleared until delete_from_page_cache() many
instructions later.  So we can get the lock and have a page which appears
to be not-truncated, only for it to get truncated on us later.

> I would _love_ for the page lock itself to be only (or at least
> _mainly_) about the actual IO synchronization on the page.
> 
> That was the origin of it, the whole "protect all the complex state of
> a page" behavior kind of grew over time, since it was the only
> per-page lock we had.

Yes, I think that's a noble goal.