Message-ID: <165f5d5b-34f2-40de-b0ec-8c1ca36babe8@lunn.ch>
Date: Fri, 2 May 2025 16:21:55 +0200
From: Andrew Lunn <andrew@...n.ch>
To: David Howells <dhowells@...hat.com>
Cc: David Hildenbrand <david@...hat.com>,
	John Hubbard <jhubbard@...dia.com>,
	"David S. Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, willy@...radead.org,
	netdev@...r.kernel.org, linux-mm@...ck.org
Subject: Re: MSG_ZEROCOPY and the O_DIRECT vs fork() race

On Fri, May 02, 2025 at 02:41:46PM +0100, David Howells wrote:
> Andrew Lunn <andrew@...n.ch> wrote:
> 
> > > I'm looking into making the sendmsg() code properly handle the 'DIO vs
> > > fork' issue (where pages need pinning rather than refs taken) and also
> > > getting rid of the taking of refs entirely as the page refcount is going
> > > to go away in the relatively near future.
> > 
> > Sorry, new to this conversation, and i don't know what you mean by DIO
> > vs fork.
> 
> As I understand it, there's a race between O_DIRECT I/O and fork whereby if
> you, say, start a DIO read operation on a page and then fork, the target page
> gets attached to the child and a copy is made for the parent (because the
> refcount is elevated by the I/O) - and so only the child sees the result.
> This is made more interesting by things such as AIO, where the parent gets
> the completion notification, but not the data.
> 
> Further, a DIO write is then alterable by the child if the DMA has not yet
> happened.
> 
> One of the things mm/gup.c does is to work around this issue...  However, I
> don't think that MSG_ZEROCOPY handles this - and so zerocopy sendmsg is, I
> think, subject to the same race.

For zerocopy, you should probably be talking to Eric Dumazet and David Wei.

I don't know too much about this, but from the Ethernet driver's
perspective, I _think_ it has no idea about zero copy. It is just
passed an skb containing data, nothing special about it. Once the
interface says it is on the wire, the driver tells the netdev core it
has finished with the skb.

So, I guess your question about CRC is to do with CoW? If the driver
does not touch the data and just DMAs it out, the page could be shared
between the processes. If it needs to modify it, to put CRCs into the
packet, does that write mean the page cannot be shared? If you have
scatter/gather, you can place the headers in kernel memory and do
writes to set the CRCs without touching the userspace data. I don't
know, but I suspect this is how it is done. There is also an skb
operation to linearize a packet, which will allocate a buffer big
enough to contain the whole packet in a single segment, and do a
memcpy of the fragments. Not what you want for zerocopy, but if your
interface does not have the needed support, there is not much choice.
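
To make the scatter/gather point concrete, here is a minimal userspace
sketch in C. It is a model only, not kernel code: struct pkt, struct
frag, pkt_set_csum() and pkt_linearize() are hypothetical names made up
for illustration, and the 16-bit ones' complement sum merely stands in
for whatever checksum a real protocol would use. The header lives in
memory owned by the packet, the payload fragments are read-only
pointers into "user" buffers, and the checksum write lands in the
header only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a scatter/gather packet; NOT struct sk_buff. */
#define HDR_LEN   8
#define CSUM_OFF  6   /* checksum stored in hdr[6..7], big-endian */
#define MAX_FRAGS 4

struct frag {
	const uint8_t *data;  /* read-only view of user memory */
	size_t len;
};

struct pkt {
	uint8_t hdr[HDR_LEN];          /* stands in for kernel-owned memory */
	struct frag frags[MAX_FRAGS];
	size_t nr_frags;
};

/* 16-bit ones' complement sum over all payload fragments (reads only). */
static uint16_t pkt_payload_csum(const struct pkt *p)
{
	uint64_t sum = 0;
	size_t pos = 0;  /* byte position across fragment boundaries */

	for (size_t i = 0; i < p->nr_frags; i++)
		for (size_t j = 0; j < p->frags[i].len; j++, pos++)
			sum += (pos & 1) ? p->frags[i].data[j]
					 : (uint64_t)p->frags[i].data[j] << 8;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Write the checksum into the header; the user buffers are not touched. */
static void pkt_set_csum(struct pkt *p)
{
	uint16_t c = pkt_payload_csum(p);

	p->hdr[CSUM_OFF] = (uint8_t)(c >> 8);
	p->hdr[CSUM_OFF + 1] = (uint8_t)(c & 0xff);
}

/*
 * Fallback for hardware without scatter/gather: copy header and all
 * fragments into one contiguous buffer. Caller frees the result.
 */
static uint8_t *pkt_linearize(const struct pkt *p, size_t *out_len)
{
	size_t total = HDR_LEN;
	uint8_t *buf, *dst;

	for (size_t i = 0; i < p->nr_frags; i++)
		total += p->frags[i].len;

	buf = malloc(total);
	if (!buf)
		return NULL;

	memcpy(buf, p->hdr, HDR_LEN);
	dst = buf + HDR_LEN;
	for (size_t i = 0; i < p->nr_frags; i++) {
		memcpy(dst, p->frags[i].data, p->frags[i].len);
		dst += p->frags[i].len;
	}
	*out_len = total;
	return buf;
}
```

In this model, a caller fills frags[] with pointers into its own
buffers, calls pkt_set_csum(), and can check that those buffers are
byte-for-byte unchanged afterwards. pkt_linearize() shows the cost of
the fallback path: one allocation plus a memcpy per fragment, which is
exactly the copy that zerocopy is trying to avoid.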

	Andrew
