[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20170407162224.c81f50838be6add33a419adc@arm.com>
Date: Fri, 7 Apr 2017 16:22:24 +0100
From: Kim Phillips <kim.phillips@....com>
To: Mark Rutland <mark.rutland@....com>
Cc: Will Deacon <will.deacon@....com>,
<linux-arm-kernel@...ts.infradead.org>, <marc.zyngier@....com>,
<alex.bennee@...aro.org>, <christoffer.dall@...aro.org>,
<tglx@...utronix.de>, <peterz@...radead.org>,
<alexander.shishkin@...ux.intel.com>, <robh@...nel.org>,
<suzuki.poulose@....com>, <pawel.moll@....com>,
<mathieu.poirier@...aro.org>, <mingo@...hat.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/6] drivers/perf: Add support for ARMv8.2
Statistical Profiling Extension
On Fri, 7 Apr 2017 12:31:07 +0100
Mark Rutland <mark.rutland@....com> wrote:
> Hi Kim,
Hi Mark,
> On Fri, Apr 07, 2017 at 11:52:41AM +0100, Kim Phillips wrote:
> > On Thu, 6 Apr 2017 19:45:04 +0100
> > Mark Rutland <mark.rutland@....com> wrote:
> > > On Thu, Apr 06, 2017 at 07:33:07PM +0100, Kim Phillips wrote:
> > > > On Thu, 6 Apr 2017 17:18:15 +0100
> > > > Will Deacon <will.deacon@....com> wrote:
>
> > > > > + if ((reg & BIT(PMSFCR_EL1_FL_SHIFT)) &&
> > > > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
> > > > > + return -EOPNOTSUPP;
>
> > > > Can you please explain why we're not emitting messages to dmesg here?:
> > > >
> > > > https://patchwork.kernel.org/patch/9545979/
> > >
> > > The above cases are not (system) errors, and using dev_err (even
> > > ratelimited) is certainly not appropriate. These are pr_debug() at best.
> >
> > They are driver errors: the input parameters to the driver were bad,
> > and it failed to execute.
>
> This is not a driver error; the user requested something that the driver
> does not or cannot support, and the driver responded appropriately and
> correctly.
>
> A *_err print is for when something is truly wrong at the system level,
> for example if the HW behaves unexpectedly, FW reports something
> invalid, or some kernel code invariant has been violated. It is not for
> every case where an error code is returned.
The driver is trying to report an error: in the above example, it's
reporting that it cannot support an operation by returning
-*E*OPNOTSUPP: an ERROR because it was unable to complete the request:
the request failed. Unlike e.g., a warning where something may not
have been quite right, but went along with executing the operation
anyway.
> > pr_debug - to me at least - should indicate progress during normal
> > operation.
>
> Quite frankly, the above is the normal operation of the driver. We don't
> pr_err in every syscall path when validating arguments provided by the
> user, and this is no different.
It is different because this particular system call's error reporting
is notoriously bad, vis-a-vis this discussion.
debug-level messages such as pr_debug, dev_dbg can easily be hidden
from the user, depending on the distro's default dmesg level
preferences.
> > > The dmesg is not always the appropriate place to dump such information,
> > > even if it happens to be convenient. As part of usual operation, dmesg
> > > should be very quiet, and we don't log messages elsewhere where the user
> > > passes some parameter the kernel does not like.
> > >
> > > These messages are really only useful to those developing tools (such as
> > > yourself).
> >
> > We had a customer come back with a simple usage failure because they
> > were running a kernel with a different VHE configuration, blindly
> > failing the hv exclusion check above. They had to manually modify the
> > driver to find the cause. So it affects all users, not just me.
>
> I agree that we can and should do something better for regular users. I
> disagree that dmesg is the solution.
>
> What we need to do is expose information such that the tool can provide
> useful messages at the point of use, which are guaranteed to correspond
> to a particular action.
>
> A user may not be aware of dmesg (e.g. if they're SSH'd in they're
> unlikely to see it), and cannot match messages to particular actions
> when there are multiple applications and/or users. So this doesn't solve
> the general case.
dmesg isn't the ultimate solution, no, but it's currently what we
have.
Meanwhile, perf occasionally says things like:
"/bin/dmesg may provide additional information"
so people know to look there, for now at least.
> > Unless you're implying the above code be duplicated in the perf tool
> > somehow?
>
> Some feature probing could be done by the tool. We already do that today
> for CPU PMUs. If you take a look at perf_evsel__open, you can see it
> automatically determines whether the kernel supports guest exclusion for
> CPU PMU events.
>
> We might be able to do something similar by default to cater for the VHE
> case you mentioned above.
>
> We could also expose information under sysfs to explicitly tell the tool
> what is and is not supported by the driver.
This is essentially the gist of the code in question: that's exactly
what it's doing, except it's not doing it via sysfs.
I think the best solution is what Will originally pointed me to during
an earlier review round: adding capability to perf syscalls to return
error strings:
https://lkml.org/lkml/2015/8/24/506
but we're not there yet.
> > > There are some cases where they're actively harmful (e.g.
> > > when fuzzing).
> >
> > I'd expect fuzzer users to be more amenable to manually modifying the
> > driver rather than regular users of the driver.
>
> When fuzzing, I take a mainline, defconfig kernel, and run it through
> its paces. I don't touch each and every driver.
>
> As above, prints are not the solution for regular users.
In the context of this patch review, dmesg is all we have for now:
let's use it please.
Kim
Powered by blists - more mailing lists