lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181119214110.GJ4890@ziepe.ca>
Date:   Mon, 19 Nov 2018 14:41:10 -0700
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Jerome Glisse <jglisse@...hat.com>
Cc:     Tim Sell <timothy.sell@...sys.com>, linux-doc@...r.kernel.org,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Zaibo Xu <xuzaibo@...wei.com>, zhangfei.gao@...mail.com,
        linuxarm@...wei.com, haojian.zhuang@...aro.org,
        Christoph Lameter <cl@...ux.com>,
        Hao Fang <fanghao11@...wei.com>,
        Gavin Schenk <g.schenk@...elmann.de>,
        Leon Romanovsky <leon@...nel.org>,
        RDMA mailing list <linux-rdma@...r.kernel.org>,
        Vinod Koul <vkoul@...nel.org>,
        Doug Ledford <dledford@...hat.com>,
        Uwe Kleine-König 
        <u.kleine-koenig@...gutronix.de>,
        David Kershner <david.kershner@...sys.com>,
        Kenneth Lee <nek.in.cn@...il.com>,
        Johan Hovold <johan@...nel.org>,
        Cyrille Pitchen <cyrille.pitchen@...e-electrons.com>,
        Sagar Dharia <sdharia@...eaurora.org>,
        Jens Axboe <axboe@...nel.dk>, guodong.xu@...aro.org,
        linux-netdev <netdev@...r.kernel.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        linux-kernel@...r.kernel.org, Zhou Wang <wangzhou1@...ilicon.com>,
        linux-crypto@...r.kernel.org,
        Philippe Ombredanne <pombredanne@...b.com>,
        Sanyog Kale <sanyog.r.kale@...el.com>,
        Kenneth Lee <liguozhu@...ilicon.com>,
        "David S. Miller" <davem@...emloft.net>,
        linux-accelerators@...ts.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 04:33:20PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> > > On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > > > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> > > > 
> > > > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > > > be fine too?
> > > > > > 
> > > > > > AFAIK the only difference is the length of the race window. You'd have
> > > > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > > > open.
> > > > > 
> > > > > Well in O_DIRECT case there is only one page table, the CPU
> > > > > page table and it gets updated during fork() so there is an
> > > > > ordering there and the race window is small.
> > > > 
> > > > Not really, in O_DIRECT case there is another 'page table', we just
> > > > call it a DMA scatter/gather list and it is sent directly to the block
> > > > device's DMA HW. The sgl plays exactly the same role as the various HW
> > > > page list data structures that underly RDMA MRs.
> > > > 
> > > > It is not a page table that matters here, it is if the DMA address of
> > > > the page is active for DMA on HW.
> > > > 
> > > > Like you say, the only difference is that the race is hopefully small
> > > > with O_DIRECT (though that is not really small, NVMeof for instance
> > > > has windows as large as connection timeouts, if you try hard enough)
> > > > 
> > > > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > > > I would call it a bug :(
> > > 
> > > I can not think of any scenario that would be a bug with O_DIRECT.
> > > Do you have one in mind ? When you fork() and do other syscall that
> > > affect the memory of your process in another thread you should
> > > expect non consistant results. Kernel is not here to provide a fully
> > > safe environement to user, user can shoot itself in the foot and
> > > that's fine as long as it only affect the process itself and no one
> > > else. We should not be in the business of making everything baby
> > > proof :)
> > 
> > Sure, I setup AIO with O_DIRECT and launch a read.
> > 
> > Then I fork and dirty the READ target memory using the CPU in the
> > child.
> > 
> > As you described in this case the fork will retain the physical page
> > that is undergoing O_DIRECT DMA, and the parent gets a new copy'd page.
> > 
> > The DMA completes, and the child gets the DMA'd to page. The parent
> > gets an unchanged copy'd page.
> > 
> > The parent gets the AIO completion, but can't see the data.
> > 
> > I'd call that a bug with O_DIRECT. The only correct outcome is that
> > the parent will always see the O_DIRECT data. Fork should not cause
> > the *parent* to malfunction. I agree the child cannot make any
> > prediction what memory it will see.
> > 
> > I assume the same flow is possible using threads and read()..
> > 
> > It is really no different than the RDMA bug with fork.
> > 
> 
> Yes and that's expected behavior :) If you fork() and have anything
> still in flight at time of fork that can change your process address
> space (including data in it) then all bets are of.
> 
> At least this is my reading of fork() syscall.

Not mine.. I can't think of anything else that would have this
behavior.

All traditional syscalls, will properly dirty the pages of the
parent. ie if I call read() in a thread and do fork in another thread,
then not seeing the data after read() completes is clearly a bug. All
other syscalls are the same.

It is bonkers that opening the file with O_DIRECT would change this
basic behavior. I'm calling it a bug :)

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ