Date:	Thu, 14 Feb 2008 12:17:21 -0800
From:	"Caitlin Bestler" <caitlin.bestler@...il.com>
To:	"Christoph Lameter" <clameter@....com>
Cc:	"Steve Wise" <swise@...ngridcomputing.com>,
	"Rik van Riel" <riel@...hat.com>, steiner@....com,
	"Andrea Arcangeli" <andrea@...ranet.com>, a.p.zijlstra@...llo.nl,
	izike@...ranet.com, "Roland Dreier" <rdreier@...co.com>,
	linux-kernel@...r.kernel.org, avi@...ranet.com, linux-mm@...ck.org,
	daniel.blueman@...drics.com, "Robin Holt" <holt@....com>,
	general@...ts.openfabrics.org,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	kvm-devel@...ts.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions

On Thu, Feb 14, 2008 at 11:39 AM, Christoph Lameter <clameter@....com> wrote:
> On Thu, 14 Feb 2008, Steve Wise wrote:
>
>  > Note that for T3, this involves suspending _all_ rdma connections that are in
>  > the same PD as the MR being remapped.  This is because the driver doesn't know
>  > who the application advertised the rkey/stag to.  So without that knowledge,
>  > all connections that _might_ rdma into the MR must be suspended.  If the MR
>  > was only setup for local access, then the driver could track the connections
>  > with references to the MR and only quiesce those connections.
>  >
>  > Point being, it will stop probably all connections that an application is
>  > using (assuming the application uses a single PD).
>
>  Right but if the system starts reclaiming pages of the application then we
>  have a memory shortage. So the user should address that by not running
>  other apps concurrently. The stopping of all connections is still better
>  than the VM getting into major trouble. And the stopping of connections in
>  order to move the process memory into a more advantageous memory location
>  (f.e. using page migration) or stopping of connections in order to be able
>  to move the process memory out of a range of failing memory is certainly
>  good.
>

In that spirit, there are two important aspects of a suspend/resume API that
would enable the memory manager to solve problems most effectively:

1) The device should be allowed flexibility to extend the scope of the suspend
    to what it is capable of implementing -- rather than being forced to say
    that it does not support suspend/resume merely because it does so at a
    different granularity.

2) It is very important that users of this API understand that it is only the
   RDMA device's handling of incoming packets and WQEs that is being suspended.
   The peers are not suspended by this API, or even told that this end is
   suspending. Unless the suspend is kept *extremely* short there will be
   adverse impacts. And "short" here is measured in network terms, not human
   terms. The blink of an eye is *way* too long. Any external dependencies
   between "suspend" and "resume" will probably mean that things will not
   work, especially if the external entities involve a disk drive.
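
To make the two points concrete, here is a rough sketch in C of what such an
interface could look like. Everything in it is hypothetical: the names
suspend_scope, rdma_dev_caps, effective_scope and fits_suspend_budget do not
exist in any driver or in the verbs API, and the time budget is something the
caller would have to choose. The sketch only shows a device widening a
requested scope to the narrowest one it can implement (point 1), and a caller
checking its planned work against a short budget before suspending (point 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical suspend granularities, ordered from narrowest to widest. */
enum suspend_scope {
    SCOPE_MR = 0,     /* only connections using one memory region */
    SCOPE_PD = 1,     /* every connection in the protection domain */
    SCOPE_DEVICE = 2  /* the whole RDMA device */
};

/* Hypothetical capability record: the narrowest scope a device can suspend. */
struct rdma_dev_caps {
    enum suspend_scope min_scope;
};

/*
 * Point 1: the device does not refuse the request. It returns the scope it
 * will actually suspend, which is the requested scope widened to the
 * device's own minimum granularity when the request is narrower than that.
 */
static inline enum suspend_scope
effective_scope(const struct rdma_dev_caps *caps, enum suspend_scope requested)
{
    assert(caps != NULL);
    return requested < caps->min_scope ? caps->min_scope : requested;
}

/*
 * Point 2: the peers keep sending while this end is suspended, so the caller
 * checks that the work planned between suspend and resume fits a budget
 * measured in network terms. Both values are in nanoseconds and are supplied
 * by the caller; no particular budget is implied here.
 */
static inline bool
fits_suspend_budget(uint64_t planned_ns, uint64_t budget_ns)
{
    return planned_ns <= budget_ns;
}
```

With a device that behaves as Steve describes for T3, min_scope would be
SCOPE_PD, so a request for SCOPE_MR comes back as SCOPE_PD and the caller
knows up front how many connections it is about to stall. Anything that waits
on a disk between suspend and resume should fail the budget check for any
realistic budget.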

So suspend/resume to re-arrange pages is one thing. Suspend/resume to cover
swapping out pages so they can be reallocated is an exercise in futility. By the
time you resume, the connections will be broken or at a minimum damaged.