Message-ID: <2024042708-outscore-dreadful-2c21@gregkh>
Date: Sat, 27 Apr 2024 13:12:33 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Harold Johnson <harold.johnson@...adcom.com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Dan Williams <dan.j.williams@...el.com>, linux-cxl@...r.kernel.org,
Sreenivas Bagalkote <sreenivas.bagalkote@...adcom.com>,
Brett Henning <brett.henning@...adcom.com>,
Sumanesh Samanta <sumanesh.samanta@...adcom.com>,
linux-kernel@...r.kernel.org, Davidlohr Bueso <dave@...olabs.net>,
Dave Jiang <dave.jiang@...el.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Ira Weiny <ira.weiny@...el.com>, linuxarm@...wei.com,
linux-api@...r.kernel.org,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
"Natu, Mahesh" <mahesh.natu@...el.com>
Subject: Re: RFC: Restricting userspace interfaces for CXL fabric management

On Fri, Apr 26, 2024 at 02:25:29PM -0500, Harold Johnson wrote:
> A few examples:
> a) Temperature monitoring of a component or internal chip die
> temperatures. Could CXL define a standard OpCode to gather temperatures?
> Yes, it could; but is this really part of CXL? Then how many temperature
> elements, and what does each element mean? This enters into the
> implementation and is therefore vendor specific, unless the CXL spec
> starts to define the implementation, something along the lines of "thou
> shall have an average die temperature, rather than specific temperatures
> across a die", etc.
>
> b) Error counters, metrics, internal counters, etc. Could CXL define a
> set of common error counters? Absolutely; PCIe has done some of this.
> However, a specific implementation may have counters and error reporting
> that are meaningful only to a specific design and a specific
> implementation, rather than the "least common denominator" approach of a
> standards body.
>
> c) Performance counters, metrics, indicators, etc. Performance can be very
> implementation specific, and tweaking performance is likely to be
> implementation specific as well. Yes, generic, least-common-denominator
> elements could be created, but they are likely to be limiting in realizing
> the maximum performance of an implementation.
>
> d) Logs, errors and debug information. In addition to spec-defined
> logging of CXL topology errors, specific designs will have logs, crash
> dumps, and debug data that are very specific to an implementation. There
> are likely to be cases where a product that conforms to a specification
> like CXL may have features that don't directly have anything to do with
> CXL, but where a standards-based management interface can be used to
> configure, manage, and collect data for a non-CXL feature.

All of the above should be able to be handled by vendor-specific KERNEL
drivers that feed the needed information to the proper user/kernel apis
that the kernel already provides.

So while innovating at the hardware level is fine, follow the ways that
everyone has done this for other specification types (USB, PCI, etc.)
and just allow vendor drivers to provide the information. Don't do this
in crazy userspace drivers which will circumvent the whole reason we
have standard kernel/user apis in the first place for these types of
things.

> e) Innovation. I believe that innovation should be encouraged. There may
> be designs that support CXL, but that also incorporate unique and
> innovative features or functions that might serve a niche market. The
> AI space is ripe for innovation and perhaps specialized features that may
> not make sense for the overall CXL specification.
>
> I think that in most cases vendor-specific opcodes are not used to
> circumvent the standards, but are used when the standards group has no
> interest in bringing into the standard certain features that are clearly
> either implementation specific or vendor-specific additions that have
> a specific appeal to a select class of customer, but that are not
> relevant to the standard as a whole.

Then fight this out in the specification groups, which are highly
political, and do not push that into the kernel space please. Again,
this is nothing new, we have all done this for specs for decades now,
allow vendor additions to the spec and handle that in the kernel and all
should be ok, right?

Or am I missing something obvious here where we would NOT want to do
what all other specs have done?

thanks,

greg k-h