linux-kernel - Re: [RFC PATCH 05/14] PCI: add access functions for PCIe capabilities to hide PCIe spec differences

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 16 Jul 2012 14:57:58 -0400
From:	Don Dutile <ddutile@...hat.com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
CC:	Jiang Liu <liuj97@...il.com>, Jiang Liu <jiang.liu@...wei.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Taku Izumi <izumi.taku@...fujitsu.com>,
	"Rafael J . Wysocki" <rjw@...k.pl>,
	Kenji Kaneshige <kaneshige.kenji@...fujitsu.com>,
	Yijing Wang <wangyijing@...wei.com>,
	Keping Chen <chenkeping@...wei.com>,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [RFC PATCH 05/14] PCI: add access functions for PCIe capabilities
 to hide PCIe spec differences

On 07/16/2012 01:29 PM, Bjorn Helgaas wrote:
> On Sun, Jul 15, 2012 at 10:47 AM, Jiang Liu<liuj97@...il.com>  wrote:
>> On 07/13/2012 04:49 AM, Bjorn Helgaas wrote:
>>>> Hi Bjorn,
>>>>          It's a little risk to change these PCIe capabilities access
>>>> functions as void. On some platform with hardware error detecting/correcting
>>>> capabilities, such as EEH on Power, it would be better to return
>>>> error code if hardware error happens during accessing configuration registers.
>>>>          As I know, coming Intel Xeon processor may provide PCIe hardware
>>>> error detecting capability similar to EEH on power.
>>>
>>> I guess I'm playing devil's advocate here.  As a general rule, people
>>> don't check the return value of pci_read_config_*() or
>>> pci_write_config_*().  Unless you change them all, most callers of
>>> pci_pcie_capability_read_*() and _write_*() won't check the returns
>>> either.  So I'm not sure return values are an effective way to detect
>>> those hardware errors.
>>>
>>> How do these EEH errors get detected or reported today?  Do the
>>> drivers check every config access for success?  Adding those checks
>>> and figuring out how to handle errors at every possible point doesn't
>>> seem like a recipe for success.
>>
>> Hi Bjorn,
>>          Sorry for later reply, on travel these days.
>>          Yeah, it's true that most driver doesn't check return values of configuration
>> access functions, but there are still some drivers which do check return value of
>> pci_read_config_xxx(). For example, pciehp driver checks return value of CFG access
>> functions.
>>
>>          It's not realistic to enhance all drivers, but we may focus on a small set of
>> drivers for hardwares on specific high-end servers. For RAS features, we can never provide
>> perfect solutions, so we prefer some improvements. After all a small improvement is still
>> an improvement:)
>>
>>          I'm only familiar with PCI on IA64 and x86. For PowerPC, I just know that the OS
>> may query firmware whether there's some hardware faults if pci_cfg_read_xxx() returns
>> all 1s. For PCI on IA64, SAL may handle PCI hardware errors and return error code to
>> pci_cfg_read_xxx(). For x86, I think it will have some mechanisms to report hardware faults
>> like SAL on IA64.
>>
>>          So how about keeping consistence with pci_cfg_read_xxx() and pci_user_cfg_read_xxx()?
>
> My goal is "the caller should never have to know whether this is a v1
> or v2 capability."  Returning any error other than one passed along
> from pci_read/write_config_xxx() means we miss that goal.  Perhaps the
> goal is unattainable, but I haven't been convinced yet.
>
> I think hardware error detection is irrelevant to this discussion.
> After reading Documentation/PCI/pci-error-recovery.txt, I'm even less
> convinced that checking return values from pci_read/write_config_xxx()
> or pci_pcie_capability_read/write_xxx() is a useful way to detect
> hardware errors.
>
> Having drivers detect hardware failures by checking for config access
> errors is neither necessary nor sufficient.  It's not necessary
> because a platform can implement a config accessor that checks *every*
> access and reports failures to the driver via the pci_error_handler
> framework.  It's not sufficient because config accesses are rare
> (usually only at init-time), and hardware failures may happen at
> arbitrary other times.
>
> In my opinion, the only relevant question is whether a caller of
> pci_pcie_capability_read/write_xxx() needs to know whether a register
> is implemented (i.e., we have a v2 capability) or not.  For reads, I
> don't think there's a case where fabricating a value of zero when
> reading an unimplemented register is a problem.
>
> Writes are obviously more interesting, but I'm still not sure there's
> a case where silently dropping a write to an unimplemented register is
> a problem.  The "capability" registers are read-only, so there's no
> problem if we drop writes to them.  The "status" registers are
> generally RO or RW1C, where it's only meaningful to write a non-zero
> value if you're previously *read* a non-zero value.  The "control"
> registers are often RW, of course, but generally it's only meaningful
> to write a non-zero value when a non-zero bit in the "capability"
> register has previously told you that something is supported.
>
> Bjorn
+1
Returning 0 on capability reads -- due to unimplemented
features/register or due to failures,
should translate into the (core) code doing no writes.
Thus, the reason I suggested returning 0 on failure in original posting.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/