Message-ID: <aday6q1o5re.fsf@cisco.com>
Date:	Sun, 02 Aug 2009 21:55:49 -0700
From:	Roland Dreier <rdreier@...co.com>
To:	Brice Goglin <Brice.Goglin@...ia.fr>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, jsquyres@...co.com,
	rostedt@...dmis.org
Subject: Re: [PATCH v3] ummunotify: Userspace support for MMU notifications


 > I like the interface but I have a couple questions:

Thanks.

 > 1) Why does userspace have to register these address ranges? I would
 > have just reported all invalidation events and let userspace check which
 > ones are interesting. My feeling is that the number of invalidation
 > events will usually be lower than the number of registered ranges, so
 > you'll report more events through the file descriptor, but userspace
 > will do a lot fewer ioctls.

A couple of reasons.  First, MMU notifier events may be delivered (in
the kernel) in interrupt context, so the amount of allocation we can do
in a notifier hook is limited (and any allocation will fail sometimes).
So if we just want to report all events to userspace, then I don't see
any way around having to sometimes deliver an event like "uh, some
events got lost" and force userspace to flush everything.
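
To make that concrete, here is roughly what the consumer side of a
report-everything interface would have to look like.  The event layout
and names below are made up for illustration (the real definitions are
in the patch); the point is the unavoidable "lost events" branch:

    #include <stdint.h>
    #include <unistd.h>

    /* Illustrative event layout only -- not the patch's actual ABI. */
    enum { UMN_EVENT_INVAL, UMN_EVENT_LOST };

    struct umn_event {
            uint32_t type;          /* UMN_EVENT_INVAL or UMN_EVENT_LOST */
            uint32_t reserved;
            uint64_t start, end;    /* invalidated range, valid for INVAL */
    };

    /* Hypothetical hooks into the MPI library's registration cache. */
    extern void invalidate_cached_range(uint64_t start, uint64_t end);
    extern void flush_all_registrations(void);

    static void drain_events(int fd)
    {
            struct umn_event ev;

            while (read(fd, &ev, sizeof ev) == (ssize_t) sizeof ev) {
                    if (ev.type == UMN_EVENT_LOST)
                            /* Kernel queue overflowed; any cached
                             * translation may now be stale. */
                            flush_all_registrations();
                    else
                            invalidate_cached_range(ev.start, ev.end);
            }
    }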

I suspect that MPI workloads will hit the overflow case in practice,
since they probably want to run as close to out-of-memory as possible,
and the application may not enter the MPI library often enough to keep
the queue of ummunotify events short -- I can imagine some codes that do
a lot of memory management, enter MPI infrequently, and end up
overflowing the queue and flushing all registrations over and over.
Having userspace register ranges means I can preallocate a landing area
for each event and make the MMU notifier hook pretty simple.
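
With registration, the userspace side instead looks something like the
sketch below.  The ioctl number and struct are again placeholders, not
the patch's actual ABI; what matters is that the registration call runs
in process context, where the kernel is free to sleep and preallocate
the per-range landing slot:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Placeholder ABI, for illustration only. */
    struct umn_register {
            uint64_t start, end;    /* address range to watch */
            uint64_t user_cookie;   /* echoed back in matching events */
    };

    #define UMN_REGISTER_RANGE _IOW('U', 0, struct umn_register)

    static int watch_range(int fd, void *buf, size_t len, uint64_t cookie)
    {
            struct umn_register r = {
                    .start       = (uintptr_t) buf,
                    .end         = (uintptr_t) buf + len,
                    .user_cookie = cookie,
            };

            /* The kernel can preallocate this range's event slot here,
             * so the MMU notifier hook never needs to allocate. */
            return ioctl(fd, UMN_REGISTER_RANGE, &r);
    }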

Second, it turns out that having the filter does cut down quite a bit on
the events.  From running some Open MPI tests that Jeff provided, I saw
that there were often several times as many MMU notifier events
delivered in the kernel as ended up being reported to userspace.

 > 2) What happens in case of fork? If parent and child keep reading from
 > the previously opened /dev/ummunotify, each event will be delivered only
 > to the first reader, right? Fork is always a mess in HPC, but maybe
 > there's something to do here.

It works just like any other file where fork results in two file
descriptors in two processes... as you point out, the two processes can
step on each other.  (And in the ummunotify case the file remains
associated with the original mm.)  However, this is the case for simpler
stuff like sockets etc. too, and I think uniformity of interface and
least surprise say that ummunotify should follow the same model.

 > 3) What's userspace supposed to do if 2 libraries need such events in
 > the same process? Should each of them open /dev/ummunotify separately?
 > Doesn't matter much for performance, just wondering.

I guess the libraries could work out some way to share things, but that
would require one library to pass events to the other or something like
that.  It should work fine for two libraries to have independent
ummunotify files open, though (I've not tested, but what could go wrong?).
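
Something like this, say (assuming, as intended, that each open file
carries its own registration set and its own event queue):

    #include <fcntl.h>

    /* Two libraries in one process, each with its own instance. */
    static void libs_init(void)
    {
            int mpi_fd   = open("/dev/ummunotify", O_RDONLY);
            int other_fd = open("/dev/ummunotify", O_RDONLY);

            /* Each library registers its own ranges on its own fd and
             * reads its own events; nothing needs to be shared. */
    }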

Thanks,
  Roland

