[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a513868f-addf-3774-2a43-b65262b7db4e@linux.intel.com>
Date: Tue, 28 Jan 2020 15:19:43 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: "Sudarikov, Roman" <roman.sudarikov@...ux.intel.com>,
Greg KH <gregkh@...uxfoundation.org>
Cc: Andi Kleen <ak@...ux.intel.com>, peterz@...radead.org,
mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...hat.com,
namhyung@...nel.org, linux-kernel@...r.kernel.org,
eranian@...gle.com, bgregg@...flix.com, alexander.antonov@...el.com
Subject: Re: [PATCH v4 2/2] perf x86: Exposing an Uncore unit to PMON for Intel Xeon® server platform
On 1/28/2020 9:55 AM, Sudarikov, Roman wrote:
> On 21.01.2020 20:15, Greg KH wrote:
>> On Tue, Jan 21, 2020 at 07:15:56PM +0300, Sudarikov, Roman wrote:
>>> On 17.01.2020 19:54, Greg KH wrote:
>>>> On Fri, Jan 17, 2020 at 08:23:57AM -0800, Andi Kleen wrote:
>>>>>> I thought I was nice and gentle last time and said that this was a
>>>>>> really bad idea and you would fix it up. That didn't happen, so I am
>>>>>> being explicit here, THIS IS NOT AN ACCEPTABLE FILE OUTPUT FOR A
>>>>>> SYSFS
>>>>>> FILE.
>>>>> Could you suggest how such a 1:N mapping should be expressed
>>>>> instead in
>>>>> sysfs?
>>>> I have yet to figure out what it is you all are trying to express here
>>>> given a lack of Documentation/ABI/ file :)
>>>>
>>>> But again, sysfs is ONE VALUE PER FILE. You have a list of items here,
>>>> that is bounded only by the number of devices in the system at the
>>>> moment. That number will go up in time, as we all know. So this is
>>>> just not going to work at all as-is.
>>>>
>>>> greg k-h
>>> Hi Greg,
>>>
>>> Technically, the motivation behind this patch is to enable Linux perf
>>> tool
>>> to attribute IO traffic to IO device.
>>>
>>> Currently, perf tool provides interface to configure IO PMUs only
>>> without
>>> any
>>> context.
>>>
>>> Understanding IIO stack concept to find which IIO stack that particular
>>> IO device is connected to, or to identify an IIO PMON block to program
>>> for monitoring specific IIO stack assumes a lot of implicit knowledge
>>> about given Intel server platform architecture.
>> Is "IIO" being used here the same way that drivers/iio/ is in the
>> kernel, or is this some other term? If it is the same, why isn't the
>> iio developers involved in this? If it is some other term, please
>> always define it and perhaps pick a different name :)
> The term "IIO" (Integrated IO) in that context refers to set of PMUs
> which are
> responsible for monitoring traffic crossing PCIe domain boundaries. It's
> specific
> for Intel Xeon server line and supported by Linux kernel perf tool
> starting v4.9.
> So I'm just referring to what's already in the kernel :)
>>> Please consider the following mapping schema:
>>>
>>> 1. new "mapping" directory is to be added under each uncore_iio_N
>>> directory
>> What is uncore_iio_N? A struct device? Or something else?
> It's interface to corresponding IIO PMU, should be struct device
>>> 2. that "mapping" directory is supposed to contain symlinks named "dieN"
>>> which are pointed to corresponding root bus.
>>> Below is how it looks like for 2S machine:
>>>
>>> # ll uncore_iio_0/mapping/
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die0 ->
>>> ../../pci0000:00/pci_bus/0000:00
>> Where did "pci_bus" come from in there? I don't see under /sys/devices/
>> for my pci bridges.
>>
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die1 ->
>>> ../../pci0000:80/pci_bus/0000:80
>>>
>>> # ll uncore_iio_1/mapping/
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die0 ->
>>> ../../pci0000:17/pci_bus/0000:17
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die1 ->
>>> ../../pci0000:85/pci_bus/0000:85
>>>
>>> # ll uncore_iio_2/mapping/
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die0 ->
>>> ../../pci0000:3a/pci_bus/0000:3a
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die1 ->
>>> ../../pci0000:ae/pci_bus/0000:ae
>>>
>>> # ll uncore_iio_3/mapping/
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die0 ->
>>> ../../pci0000:5d/pci_bus/0000:5d
>>> lrwxrwxrwx 1 root root 0 Jan 20 23:55 die1 ->
>>> ../../pci0000:d7/pci_bus/0000:d7
>> Why have a subdir here?
> Just for convenience. I can put it the same level as other attributes
> (cpumask etc).
> Please let me know which layout to choose.
>> Anyway, yes, that would make sense, if userspace can actually do
>> something with that, can it?
> Sure! The linux perf tool will use it to attribute IO traffic to devices.
> Initially the feature was sent for review containing both kernel[1] and
> user space[2] parts, but later it was decided to finalize kernel part first
> and then proceed with user space.
>
> [1]
> https://lore.kernel.org/lkml/20191126163630.17300-2-roman.sudarikov@linux.intel.com/
>
> [2]
> https://lore.kernel.org/lkml/20191126163630.17300-5-roman.sudarikov@linux.intel.com/
>
>>
>> Also, what tears those symlinks down when you remove those pci devices
>> from the system? Shouldn't you have an entry in the pci device itself
>> for this type of thing? And if so, isn't this really just a "normal"
>> class type driver you are writing? That should handle all of the
>> symlinks and stuff for you automatically, right?
> The IIO PMUs by design monitors traffic crossing integrated pci root ports.
> For each IIO PMU the feature creates symlinks to its pci root port on
> each node.
Can we just simply assign the BUS# to it as below?
# cat uncore_iio_1/mapping/die0
0000:00
I'm not sure why we need a symlink here.
Also, if the BUS is removed, I think we may want to update mapping as well.
Thanks,
Kan
> Those pci devices, by its nature, can not be "just removed". If the SOC is
> designed the way that some integrated root port is not present
> then the case will be correctly handled by the feature.
>> thanks,
>>
>> greg k-h
>
>
Powered by blists - more mailing lists