Message-ID: <165f5d5b-34f2-40de-b0ec-8c1ca36babe8@lunn.ch>
Date: Fri, 2 May 2025 16:21:55 +0200
From: Andrew Lunn <andrew@...n.ch>
To: David Howells <dhowells@...hat.com>
Cc: David Hildenbrand <david@...hat.com>,
John Hubbard <jhubbard@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, willy@...radead.org,
netdev@...r.kernel.org, linux-mm@...ck.org
Subject: Re: MSG_ZEROCOPY and the O_DIRECT vs fork() race
On Fri, May 02, 2025 at 02:41:46PM +0100, David Howells wrote:
> Andrew Lunn <andrew@...n.ch> wrote:
>
> > > I'm looking into making the sendmsg() code properly handle the 'DIO vs
> > > fork' issue (where pages need pinning rather than refs taken) and also
> > > getting rid of the taking of refs entirely as the page refcount is going
> > > to go away in the relatively near future.
> >
> > Sorry, new to this conversation, and i don't know what you mean by DIO
> > vs fork.
>
> As I understand it, there's a race between O_DIRECT I/O and fork whereby if
> you, say, start a DIO read operation on a page and then fork, the target page
> gets attached to the child and a copy is made for the parent (because the
> refcount is elevated by the I/O) - and so only the child sees the result.
> This is made more interesting by things such as AIO, where the parent gets
> the completion notification, but not the data.
>
> Further, a DIO write is then alterable by the child if the DMA has not yet
> happened.
>
> One of the things mm/gup.c does is to work around this issue... However, I
> don't think that MSG_ZEROCOPY handles this - and so zerocopy sendmsg is, I
> think, subject to the same race.

For zerocopy, you should probably be talking to Eric Dumazet and David Wei.
I don't know too much about this, but from the Ethernet driver's
perspective, I _think_ it has no idea about zerocopy. It is just
passed an skbuf containing data, with nothing special about it. Once the
interface says the packet is on the wire, the driver tells the netdev
core it has finished with the skbuf.

So I guess your question about CRC is to do with CoW? If the driver
does not touch the data and just DMAs it out, the page could be shared
between the processes. If something needs to modify it, for example to
put CRCs into the packet, does that write mean the page cannot be
shared? If you have scatter/gather, you can place the headers in kernel
memory and do the writes that set the CRCs without touching the
userspace data. I don't know, but I suspect this is how it is done.
There is also an skbuf operation to linearize a packet, which will
allocate a new skbuf big enough to contain the whole packet in a single
segment, and do a memcpy of the fragments. That is not what you want for
zerocopy, but if your interface does not have the needed support, there
is not much choice.
Andrew