lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200429013955.2b59bd99@p-imbrenda>
Date:   Wed, 29 Apr 2020 01:39:55 +0200
From:   Claudio Imbrenda <imbrenda@...ux.ibm.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>, linux-next@...r.kernel.org,
        akpm@...ux-foundation.org, jack@...e.cz, kirill@...temov.name,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        borntraeger@...ibm.com, david@...hat.com, aarcange@...hat.com,
        linux-mm@...ck.org, frankja@...ux.ibm.com, sfr@...b.auug.org.au,
        jhubbard@...dia.com, linux-kernel@...r.kernel.org,
        linux-s390@...r.kernel.org, Will Deacon <will@...nel.org>,
        "Williams, Dan J" <dan.j.williams@...el.com>, pasic@...ux.ibm.com
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible
 pages

On Tue, 28 Apr 2020 12:43:45 -0700
Dave Hansen <dave.hansen@...el.com> wrote:

> On 4/21/20 2:31 PM, Dave Hansen wrote:
> > On 4/16/20 12:02 PM, Dave Hansen wrote:  
> >> On 4/16/20 9:34 AM, Claudio Imbrenda wrote:  
> >>>> Ahh, so this is *just* intended to precede I/O done on the page,
> >>>> when a non-host entity is touching the memory?  
> >>> yep  
> >> OK, so we've got to do an action that precedes *all* I/O to a page.
> >> That's not too bad.
> >>
> >> I still don't understand how this could work generally, though
> >> There are lots of places where I/O is done to a page without
> >> either going through __test_set_page_writeback() or gup() with
> >> FOLL_PIN set.
> >>
> >> sendfile() is probably the best example of this:
> >>
> >> 	fd = open("/normal/ext4/file", O_RDONLY);
> >> 	sendfile(socket_fd, fd, &off, count);
> >>
> >> There's no gup in sight since the file doesn't have an address and
> >> it's not being written to so there's no writeback.
> >>
> >> How does sendfile work?  
> > 
> > Did you manage to see if sendfile works (or any other operation that
> > DMAs file-backed data without being preceded by a gup)?  
> 
> It's been a couple of weeks with no response on this.

sorry, I've been busy with things

> From where I'm standing, we have a hook in the core VM that can't
> possibly work with some existing kernel functionality and has
> virtually no chance of getting used on a second architecture.

it seems to work at least for us, so it does possibly work :)

regarding second architectures: when we started sending these patches
around, there has been interest from some other architectures, so
just because nobody else needs them now, it doesn't mean nobody will
use them ever. Moreover this is the only way for us to reasonably
implement this (see below).

> It sounds like there may need to be some additional work here, but
> should these hooks stay in for 5.7?  Or, should we revert this patch
> and try again for 5.8?

I don't see why we should revert a patch that works as intended and
poses no overhead for non-users, whereas reverting it would break
functionality.


Now let me elaborate a little on the DMA API. There are some issues
with some of the bus types used on s390 when it comes to the DMA API.
Most I/O instructions on s390 need to allocate some small control blocks
for each operation, and those need to be under 2GB. Those control blocks
will be accessed directly by the hardware. The rest of the actual I/O
buffers have no restriction and can be anywhere (64 bits). 
Setting the DMA mask to 2GB means that all other buffers will be
almost always bounced, which is unacceptable. Especially since there are
no bounce buffers set up for s390x hosts anyway (they are set up only in
protected guests (and not in normal guests), so this was also introduced
quite recently).

Also notice that, until now, there has been no actual need to use the
DMA API on most s390 device drivers, hence why it's not being used
there. I know that you are used to need the DMA API for DMA operations
otherwise Bad Thingsā„¢ happen, but this is not the case on s390 (for
non-PCI devices at least).

So until the DMA API is fixed, there is no way to convert all the
drivers to the DMA API (which would be quite a lot of work in itself
already, but that's not the point here). A fix for the DMA API was
actually proposed by my colleague Halil several months ago now, but it
did not go through. His proposal was to allow architectures to override
the GFP flags for DMA allocations, to allow allocating some buffers
from some areas and some other buffers from other areas.


I hope this clarifies the matter a little :)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ