lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6632d503f3ae5_e1f58294df@iweiny-mobl.notmuch>
Date: Wed, 1 May 2024 16:49:24 -0700
From: Ira Weiny <ira.weiny@...el.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>, <ira.weiny@...el.com>
CC: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>, "Navneet
 Singh" <navneet.singh@...el.com>, Dan Williams <dan.j.williams@...el.com>,
	Davidlohr Bueso <dave@...olabs.net>, Alison Schofield
	<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>,
	<linux-btrfs@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Chris Mason <clm@...com>, Josef Bacik
	<josef@...icpanda.com>, David Sterba <dsterba@...e.com>
Subject: Re: [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD)

Jonathan Cameron wrote:
> 
> > 
> > Fan Ni's latest v5 of Qemu DCD was used for testing.[2]
> Hi Ira, Navneet.
> > 
> > Remaining work:
> > 
> > 	1) Integrate the QoS work from Dave Jiang
> > 	2) Interleave support
> 
> 
> More flag.  This one I think is potentially important and don't
> see any handling in here.

Nope I admit I missed the spec requirement.

> 
> Whilst an FM could in theory be careful to avoid sending a
> sparse set of extents, if the device is managing the memory range
> (which is possible all it supports) and the FM issues an Initiate Dynamic
> Capacity Add with Free (again may be all device supports) then we
> can't stop the device issuing a bunch of sparse extents.
> 
> Now it won't be broken as such without this, but every time we
> accept the first extent that will implicitly reject the rest.
> That will look very ugly to an FM which has to poke potentially many
> times to successfully allocate memory to a host.

This helps me to see see why the more bit is useful.

> 
> I also don't think it will be that hard to support, but maybe I'm
> missing something? 

Just a bunch of code and refactoring busy work.  ;-)  It's not rocket
science but does fundamentally change the arch again.

> 
> My first thought is it's just a loop in cxl_handle_dcd_add_extent()
> over a list of extents passed in then slightly more complex response
> generation.

Not exactly 'just a loop'.  No matter how I work this out there is the
possibility that some extents get surfaced and then the kernel tries to
remove them because it should not have.

To be most safe the cxl core is going to have to make 2 round trips to the
cxl region layer for each extent.  The first determines if the extent is
valid and creates the extent as much as possible.  The second actually
surfaces the extents.  However, if the surface fails then you might not
get the extents back.  So now we are in an invalid state.  :-/  WARN and
continue I guess?!??!

I think the safest way to handle this is add a new kernel notify event
called 'extent create' which stops short of surfacing the extent.  [I'm
not 100% sure how this is going to affect interleave.]

I think the safest logic for add is something like:

	cxl_handle_dcd_add_event()
		add_extent(squirl_list, extent);

		if (more bit) /* wait for more */
			return;

		/* Create extents to hedge the bets against failure */
		for_each(squirl_list)
			if (notify 'extent create' != ok)
				send_response(fail);
				return;
		
		for_each(squirl_list)
			if (notify 'surface' != ok)
				/*
				 * If the more bit was set, some extents
				 * have been surfaced and now need to be
				 * removed...
				 *
				 * Try to remove them and hope...
				 */
				WARN_ON('surface extents failed');
				for_each(squirl_list)
					notify 'remove without response'
				send_response(fail);
				return;

		send_response(squirl_list, accept);

The logic for remove is not changed AFAICS because the device must allow
for memory to be released at any time so the host is free to release each
of the extents individually despite the 'more' bit???

> 
> I don't want this to block getting initial DCD support in but it
> will be a bit ugly if we quickly support the more flag and then end
> up with just one kernel that an FM has to be careful with...

I'm not sure which is worse.  Given your use case above it seems like the
more bit may be more important for 'dumb' devices which want to add
extents in blocks before responding to the FM.  Thus complicating the FM.

It seems 'smarter' devices which could figure this out (not requiring the
more bit) are the ones which will be developed later.  So it seems the use
case time line is the opposite of what we need right now.

For that reason I'm inclined to try and get this in.

Ira

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ