[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4gdbjHem9zwoCjj+6a2KJ5dyYro3OwPvNLSPTKpjeF0hg@mail.gmail.com>
Date: Tue, 15 Mar 2022 09:09:18 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: "Luck, Tony" <tony.luck@...el.com>,
"Joseph, Jithu" <jithu.joseph@...el.com>,
"hdegoede@...hat.com" <hdegoede@...hat.com>,
"markgross@...nel.org" <markgross@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"corbet@....net" <corbet@....net>,
"andriy.shevchenko@...ux.intel.com"
<andriy.shevchenko@...ux.intel.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"platform-driver-x86@...r.kernel.org"
<platform-driver-x86@...r.kernel.org>,
"patches@...ts.linux.dev" <patches@...ts.linux.dev>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>
Subject: Re: [RFC 00/10] Introduce In Field Scan driver
On Tue, Mar 15, 2022 at 9:04 AM Dan Williams <dan.j.williams@...el.com> wrote:
>
> On Tue, Mar 15, 2022 at 8:27 AM Greg KH <gregkh@...uxfoundation.org> wrote:
> >
> > On Tue, Mar 15, 2022 at 02:59:03PM +0000, Luck, Tony wrote:
> > > >> This seems a novel use of uevent ... is it OK, or is is abuse?
> > > >
> > > > Don't create "novel" uses of uevents. They are there to express a
> > > > change in state of a device so that userspace can then go and do
> > > > something with that information. If that pattern fits here, wonderful.
> > >
> > > Maybe Dan will chime in here to better explain his idea. I think for
> > > the case where the core test fails, there is a good match with uevent.
> > > The device (one CPU core) has changed state from "working" to
> > > "untrustworthy". Userspace can do things like: take the logical CPUs
> > > on that core offline, initiate a service call, or in a VMM cluster environment
> > > migrate work to a different node.
> >
> > Again, I have no idea what you are doing at all with this driver, nor
> > what you want to do with it.
> >
> > Start over please.
> >
> > What is the hardware you have to support?
> >
> > What is the expectation from userspace with regards to using the
> > hardware?
>
> Here is what I have learned about this driver since engaging on this
> patch set. Cores go bad at run time. Datacenters can detect them at
> scale.
Tony pointed me to this video if you have not seen it:
https://www.youtube.com/watch?v=QMF3rqhjYuM
Powered by blists - more mailing lists