[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4fae49c3-427a-90f3-faa0-033e8c32e336@linux.ibm.com>
Date: Thu, 10 Jun 2021 13:31:02 +0530
From: Pratik Sampat <psampat@...ux.ibm.com>
To: Fabiano Rosas <farosas@...ux.ibm.com>, mpe@...erman.id.au,
benh@...nel.crashing.org, paulus@...ba.org,
linuxppc-dev@...ts.ozlabs.org, kvm-ppc@...r.kernel.org,
linux-kernel@...r.kernel.org, pratik.r.sampat@...il.com
Subject: Re: [RFC] powerpc/pseries: Interface to represent PAPR firmware
attributes
On 10/06/21 5:33 am, Fabiano Rosas wrote:
> Pratik Sampat <psampat@...ux.ibm.com> writes:
>
>>>> 3. version info - 1 byte
>>>> 4. A data array of size num attributes, which contains the following:
>>>> a. attribute ID - 8 bytes
>>>> b. attribute value in number - 8 bytes
>>>> c. attribute name in string - 64 bytes
>>>> d. attribute value in string - 64 bytes
>>> Is this new hypercall already present in the spec? These seem a bit
>>> underspecified to me.
>> Yes, it is present in the spec. I probably summarized a little more than needed
>> here and I could expand upon below.
>>
>> The input buffer recives the following data:
>>
>> 1. “flags”:
>> a. Bit 0: singleAttribute
>> If set to 1, only return the single attribute matching firstAttributeId.
>> b. Bits 1-63: Reserved
>> 2. “firstAttributeId”: The first attribute to retrieve
>> 3. “bufferAddress”: The logical real address of the start of the output buffer
>> 4. “bufferSize”: The size in bytes of the output buffer
>>
>>
>> From the document, the format of the output buffer is as follows:
>>
>> Table 1 --> output buffer
>> ================================================================================
>> | Field Name | Byte | Length | Description
>> | | Offset | in Bytes |
>> ================================================================================
>> | NumberOf | | | Number of Attributes in Buffer
>> | AttributesInBuffer | 0x000 | 0x08 |
>> --------------------------------------------------------------------------------
>> | AttributeArrayOffset | 0x008 | 0x08 | Byte offset to start of Array
>> | | | | of Attributes
>> | | | |
>> --------------------------------------------------------------------------------
>> | OutputBufferData | | | Version of the Header.
>> | HeaderVersion | 0x010 | 0x01 | The header will be always
>> | AttributesInBuffer | | | backward compatible, and changes
>> | | | | will not impact the Array of
>> | | | | attributes.
>> | | | | Current version = 0x01
> This is not clear to me. In the event of a header version change, is the
> total set of attributes guaranteed to remain the same? Or only the array
> layout? We might not need to expose the version information after all.
I believe, the way versioning currently works is that if any new attribute is
added/modified to the list, this will entail a new version.
Regardless, the older attributes and their ids will not change and will still
be backwards compatible.
If the versioning does change, this patch does introduce a version check and
will fail to populate the sysfs and, a tool like powerpc-utils will not read
incorrect/non-coherent information.
So I'm inclined also believe now that verisoning information may not be needed
to expose to userspace.
>> --------------------------------------------------------------------------------
>> | ArrayOfAttributes | | | The array will contain
>> | | | | "NumberOfAttributesInBuffer"
>> | | | | array elements not to exceed
>> | | | | the size of the buffer.
>> | | | | Layout of the array is
>> | | | | detailed in Table 2.
>> --------------------------------------------------------------------------------
>>
>>
>> Table 2 --> Array of attributes
>> ================================================================================
>> | Field Name | Byte | Length | Description
>> | | Offset | in Bytes |
>> ================================================================================
>> | 1st AttributeId | 0x000 | 0x08 | The ID of the Attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue | 0x008 | 0x08 | The numerical value of
>> | | | | the attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeString | 0x010 | 0x40 | The ASCII string
>> | Description | | | description of the
>> | | | | attribute, up to 63
>> | | | | characters plus a NULL
>> | | | | terminator.
> There is a slight disconnect in that this is called "description" by the
> spec, which makes me think they could eventually have something more
> verbose than what you'd expect from "name".
>
> So they could give us either: "Frequency" or "The Frequency in GigaHertz".
Yes, the description can be more verbose, like I can see attributes with the
description as "Minimum Frequency (MHz)". That's probably why parsing based on
IDs is a better approach.
>
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue | 0x050 | 0x40 | The ASCII string
>> | StringDescription | | | description of the
>> | | | | attribute value, up to 63
>> | | | | characters plus a NULL
>> | | | | terminator. If this
>> | | | | contains only a NULL
>> | | | | terminator, then there is
>> | | | | no ASCII string
>> | | | | associated with AttributeValue.
>> --------------------------------------------------------------------------------
>> | .... | | |
>>
>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in /sys/firmware/papr to export
>>> Hm.. Maybe this should be something less generic than "papr"?
>> The interface naming was inspired from /sys/firmware/opal's naming convention.
>> We believed the name PAPR could serve as more generic name to be used by both
>> Linux running on PHYP and linux on KVM.
> Right, I agree with that rationale, but /opal has identifiable elements
> in it whereas /papr would have the generic "attr_X_name", which does not
> give much hint about what they are.
>
> We also expect people to iterate the "attr_X_*" files, so if we decide
> to add something else under /papr in the future, that would potentially
> cause issues with any tool that just lists the content of the directory.
>
> So maybe we should be proactive and put the hcall stuff inside a
> subdirectory already. /papr/energy_scale_attrs comes to mind, but I
> don't have a strong opinion on the particular name.
Encapsulating it within another directory like energy_scale_attrs does make
sense and keeps the PAPR directory open to more such information going forward.
>> If you have something more concrete in mind, please let me know. I'm open to
>> suggestions.
>>
>>>> this information to userspace in an extensible pass-through format.
>>>> The H_CALL returns the name, numeric value and string value. As string
>>>> values are in human readable format, therefore if the string value
>>>> exists then that is given precedence over the numeric value.
>>> So the hypervisor could simply not send the string representation? How
>>> will the userspace tell the difference since they are reading everything
>>> from a file?
>>>
>>> Overall I'd say we should give the data in a more structured way and let
>>> the user-facing tool do the formatting and presentation.
>> That's a valid concern, the design for this was inspired from hwmon's interface
>> to housing the sensor information.
>>
>> One alternative to add more structure to this format could be to introduce:
>> attr_X_name, attr_X_num_val, attr_X_str_val
>>
>> However, in some cases like min/max frequency the string value is empty. In
>> that case the file attr_X_str_val will also be empty.
>> Is that an acceptable format of having empty files that in some cases will
>> never be populated?
> I'm thinking yes, but I'm not sure. Let's see if someone else has a say
> in this.
Sure, if we can have empty sysfs files, then this presents a coherent interface.
@mpe, can you weigh in here, can we have an interface where we have the following structure:
/sys/firmware/papr/energy_scale_attrs/
|-- <id>/
|-- desc
|-- value
|-- value_desc
where value_desc can be empty in some case?
If so, can we leave them empty or do we need to have them populated with a
string "NULL"/"NONE"?
>
>> We also went ahead to confirm with the SPEC team that if a string value exists
>> in their buffer, that must be given precedence.
> Huh.. That must be a recommendation only. The hypervisor has no control
> over how people present the information in userspace.
>
>> Another alternative format could to keep attr_X_name, attr_X_val intact but
>> change what X means. Currently X is just an iteratively increasing number. But
>> X can also serve as an ID which we get from H_CALL output buffer.
> This seems like a good idea. It makes it easier to correlate the
> attribute with what is in PAPR.
>
>> In this case, we should also include some versioning so that the tool now also
>> has cognizance of contents of each file.
>>
>>>> The format of exposing the sysfs information is as follows:
>>>> /sys/firmware/papr/
>>>> |-- attr_0_name
>>>> |-- attr_0_val
>>>> |-- attr_1_name
>>>> |-- attr_1_val
>>>> ...
>>> How do we keep a stable interface with userspace? Say the hypervisor
>>> decides to add or remove attributes, change their order, string
>>> representation, etc? It will inform us via the version field, but that
>>> is lost when we output this to sysfs.
>>>
>>> I get that if the userspace just iterate over the contents of the
>>> directory then nothing breaks, but there is not much else it could do it
>>> seems.
>> Fair point, having the version exposed to the sysfs does seem crucial.
>>
>> Currently in ppc-utils we iterate over all the information, however as you
>> rightly pointed out there may be other tools needing just specific information.
>> The alternative I suggested a few sentences above to include ID based attribute
>> naming and versioning maybe a more elegant way of solving this problem.
>>
>> What are your thoughts on a design like this?
>>
> Based on all the new information you provided, I'd say present all the
> data and group it under the ID:
>
> /sys/firmware/papr/energy_scale_attrs/
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc
>
> Is that workable?
If we can confirm if value descriptions can be empty, then I too think this is
a good interface to introduce for energy attributes.
Thanks for your feedback.
Pratik
>>>> The energy information that is exported is useful for userspace tools
>>>> such as powerpc-utils. Currently these tools infer the
>>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>>> On future platforms, such userspace utilities will have to look at the
>>>> data returned from the new H_CALL being populated in this new sysfs
>>>> interface and report this information directly without the need of
>>>> interpretation.
>>>>
>>>> Signed-off-by: Pratik R. Sampat <psampat@...ux.ibm.com>
>> Thanks
>> Pratik
Powered by blists - more mailing lists