lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200915160553.GJ1221970@ziepe.ca>
Date:   Tue, 15 Sep 2020 13:05:53 -0300
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Peter Xu <peterx@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Leon Romanovsky <leonro@...dia.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Maya B . Gokhale" <gokhale2@...l.gov>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Marty Mcfadden <mcfadden8@...l.gov>,
        Kirill Shutemov <kirill@...temov.name>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

On Tue, Sep 15, 2020 at 10:50:40AM -0400, Peter Xu wrote:
> On Mon, Sep 14, 2020 at 08:28:51PM -0300, Jason Gunthorpe wrote:
> > Yes, this stuff does pin_user_pages_fast() and MADV_DONTFORK
> > together. It sets FOLL_FORCE and FOLL_WRITE to get an exclusive copy
> > of the page and MADV_DONTFORK was needed to ensure that a future fork
> > doesn't establish a COW that would break the DMA by moving the
> > physical page over to the fork. DMA should stay with the process that
> > called pin_user_pages_fast() (Is MADV_DONTFORK still needed with
> > recent years work to GUP/etc? It is a pretty terrible ancient thing)
> 
> ... Now I'm more confused on what has happened.

I'm going to try to confirm that the MADV_DONTFORK is actually being
done by userspace properly, more later.

> It means, as long as the rdma region has VM_WRITE set (which I think of no
> reason on why it shouldn't...), then it should have the write bit in the COWed
> page entry.  If so, the page should be stable and I don't undersdand why
> another COW could even trigger and how the code path in the "trial cow" patch
> is triggered.

All the regions the test are doing DMA to will be simple process
writable anonymous VMA's from malloc()
 
> Or, the VMA is without VM_WRITE due to some reason?  Sorry I probably know
> nothing about RDMA, more information on that side might help too. E.g., is the
> hardware going to walk the software process page table too when doing RDMA (or
> is IOMMU page table used, or none)?

It does pin_user_pages_fast(), gets a list of DMA addresses for the
pages and then programs the hardware. The pin remains for a very long
time and the HW does DMA to those pages independently.

Userspace will write to the memory and trigger DMA reads and HW will
do DMA writes and trigger something close to an eventfd to let
userspace know to check the DMA'd data.

Very similar to how an in-kernel driver works. It is similar to VFIO
in how it uses pin_user_pages_fast().

Symptoms look to be like the DMA's are not arriving.

As before, the requirement is that once a process as done
pin_user_pages() the physical page stays with the process. If there is
a fork() and a COW then the current memory stays with the original
process and the fork'd child gets the copy. MADV_DONTFORK is expected
to ensure this..

We haven't been able to narrow to a reproduction that doesn't require
alot of hardware unfortunately. It seems oddly sensitive, maybe due to
memory layout triggering a COW..

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ