lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66351a55a38db_e1f58294ce@iweiny-mobl.notmuch>
Date: Fri, 3 May 2024 10:09:41 -0700
From: Ira Weiny <ira.weiny@...el.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>, <ira.weiny@...el.com>
CC: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>, "Navneet
 Singh" <navneet.singh@...el.com>, Dan Williams <dan.j.williams@...el.com>,
	Davidlohr Bueso <dave@...olabs.net>, Alison Schofield
	<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>,
	<linux-btrfs@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 07/26] cxl/port: Add dynamic capacity size support to
 endpoint decoders

Jonathan Cameron wrote:
> On Sun, 24 Mar 2024 16:18:10 -0700
> ira.weiny@...el.com wrote:
> 
> > From: Navneet Singh <navneet.singh@...el.com>
> > 
> > To support Dynamic Capacity Devices (DCD) endpoint decoders will need to
> > map DC partitions (regions).  In addition to assigning the size of the
> > DC partition, the decoder must assign any skip value from the previous
> > decoder.  This must be done within a contiguous DPA space.
> > 
> > Two complications arise with Dynamic Capacity regions which did not
> > exist with Ram and PMEM partitions.  First, gaps in the DPA space can
> > exist between and around the DC Regions.  Second, the Linux resource
> > tree does not allow a resource to be marked across existing nodes within
> > a tree.
> > 
> > For clarity, below is an example of an 60GB device with 10GB of RAM,
> > 10GB of PMEM and 10GB for each of 2 DC Regions.  The desired CXL mapping
> > is 5GB of RAM, 5GB of PMEM, and all 10GB of DC1.
> > 
> >      DPA RANGE
> >      (dpa_res)
> > 0GB        10GB       20GB       30GB       40GB       50GB       60GB
> > |----------|----------|----------|----------|----------|----------|
> > 
> > RAM         PMEM                  DC0                   DC1
> >  (ram_res)  (pmem_res)            (dc_res[0])           (dc_res[1])
> > |----------|----------|   <gap>  |----------|   <gap>  |----------|
> > 
> >  RAM        PMEM                                        DC1
> > |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
> > 0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB       60GB
> 
> 
> To add another corner to the example, maybe map only part of DC1?

Maybe.  See below.


[snip]

> > @@ -500,6 +617,21 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> >  	else
> >  		free_pmem_start = cxlds->pmem_res.start;
> >  
> > +	/*
> > +	 * Limit each decoder to a single DC region to map memory with
> > +	 * different DSMAS entry.

This prevents more than 1 region per DC partition (region).

> > +	 */
> > +	dc_index = dc_mode_to_region_index(cxled->mode);
> > +	if (dc_index >= 0) {
> > +		if (cxlds->dc_res[dc_index].child) {
> > +			dev_err(dev, "Cannot allocate DPA from DC Region: %d\n",
> > +				dc_index);
> > +			rc = -EINVAL;
> > +			goto out;
> > +		}
> > +		free_dc_start = cxlds->dc_res[dc_index].start;
> > +	}
> > +
> >  	if (cxled->mode == CXL_DECODER_RAM) {
> >  		start = free_ram_start;
> >  		avail = cxlds->ram_res.end - start + 1;
> > @@ -521,12 +653,38 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> >  		else
> >  			skip_end = start - 1;
> >  		skip = skip_end - skip_start + 1;
> > +	} else if (cxl_decoder_mode_is_dc(cxled->mode)) {
> > +		resource_size_t skip_start, skip_end;
> > +
> > +		start = free_dc_start;
> > +		avail = cxlds->dc_res[dc_index].end - start + 1;
> > +		if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child)
> > +			skip_start = free_ram_start;
> > +		else
> > +			skip_start = free_pmem_start;
> > +		/*
> > +		 * If any dc region is already mapped, then that allocation
> > +		 * already handled the RAM and PMEM skip.  Check for DC region
> > +		 * skip.
> > +		 */
> > +		for (int i = dc_index - 1; i >= 0 ; i--) {
> > +			if (cxlds->dc_res[i].child) {
> > +				skip_start = cxlds->dc_res[i].child->end + 1;
> > +				break;
> > +			}
> > +		}
> > +
> > +		skip_end = start - 1;
> > +		skip = skip_end - skip_start + 1;
> 
> I notice in the pmem equivalent there is a case for part of the region already mapped.
> Can that not happen for a DC region as well?

See above check.  Each DC region (partition) was to be associated with a
single DSMAS entry.  I'm unclear now why that decision was made.

It does not seem hard to add this though.  Do we really need that ability
considering dax devices are likely going to be the main boundry for users
of a DC region?

Ira

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ