lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <50C05466.705@numascale-asia.com> Date: Thu, 06 Dec 2012 16:16:38 +0800 From: Daniel J Blueman <daniel@...ascale-asia.com> To: Bjorn Helgaas <bhelgaas@...gle.com> CC: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, Steffen Persvold <sp@...ascale.com> Subject: Re: [PATCH v2 RESEND] Add NumaChip remote PCI support On 01/12/2012 00:45, Bjorn Helgaas wrote: > On Thu, Nov 29, 2012 at 10:28 PM, Daniel J Blueman >> On 29/11/2012 07:08, Bjorn Helgaas wrote: >>> On Wed, Nov 21, 2012 at 1:39 AM, Daniel J Blueman >>> <daniel@...ascale-asia.com> wrote: >>>> >>>> Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but >>>> preventing access to AMD Northbridges which shouldn't respond. >>>> >>>> v2: Use PCI_DEVFN in precomputed constant limit; drop unneeded includes >>>> >>>> Signed-off-by: Daniel J Blueman <daniel@...ascale-asia.com> >>>> --- >>>> arch/x86/include/asm/numachip/numachip.h | 20 +++++ >>>> arch/x86/kernel/apic/apic_numachip.c | 2 + >>>> arch/x86/pci/Makefile | 1 + >>>> arch/x86/pci/numachip.c | 134 >>>> ++++++++++++++++++++++++++++++ >>>> 4 files changed, 157 insertions(+) >>>> create mode 100644 arch/x86/include/asm/numachip/numachip.h >>>> create mode 100644 arch/x86/pci/numachip.c >>>> >>>> diff --git a/arch/x86/include/asm/numachip/numachip.h >>>> b/arch/x86/include/asm/numachip/numachip.h >>>> new file mode 100644 >>>> index 0000000..d35e71a >>>> --- /dev/null >>>> +++ b/arch/x86/include/asm/numachip/numachip.h >>>> @@ -0,0 +1,20 @@ >>>> +/* >>>> + * This file is subject to the terms and conditions of the GNU General >>>> Public >>>> + * License. See the file "COPYING" in the main directory of this >>>> archive >>>> + * for more details. >>>> + * >>>> + * Numascale NumaConnect-specific header file >>>> + * >>>> + * Copyright (C) 2012 Numascale AS. All rights reserved. >>>> + * >>>> + * Send feedback to <support@...ascale.com> >>>> + * >>>> + */ >>>> + >>>> +#ifndef _ASM_X86_NUMACHIP_NUMACHIP_H >>>> +#define _ASM_X86_NUMACHIP_NUMACHIP_H >>>> + >>>> +extern int __init pci_numachip_init(void); >>>> + >>>> +#endif /* _ASM_X86_NUMACHIP_NUMACHIP_H */ >>>> + >>>> diff --git a/arch/x86/kernel/apic/apic_numachip.c >>>> b/arch/x86/kernel/apic/apic_numachip.c >>>> index a65829a..9c2aa89 100644 >>>> --- a/arch/x86/kernel/apic/apic_numachip.c >>>> +++ b/arch/x86/kernel/apic/apic_numachip.c >>>> @@ -22,6 +22,7 @@ >>>> #include <linux/hardirq.h> >>>> #include <linux/delay.h> >>>> >>>> +#include <asm/numachip/numachip.h> >>>> #include <asm/numachip/numachip_csr.h> >>>> #include <asm/smp.h> >>>> #include <asm/apic.h> >>>> @@ -179,6 +180,7 @@ static int __init numachip_system_init(void) >>>> return 0; >>>> >>>> x86_cpuinit.fixup_cpu_id = fixup_cpu_id; >>>> + x86_init.pci.arch_init = pci_numachip_init; >>>> >>>> map_csrs(); >>>> >>>> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile >>>> index 3af5a1e..ee0af58 100644 >>>> --- a/arch/x86/pci/Makefile >>>> +++ b/arch/x86/pci/Makefile >>>> @@ -16,6 +16,7 @@ obj-$(CONFIG_STA2X11) += sta2x11-fixup.o >>>> obj-$(CONFIG_X86_VISWS) += visws.o >>>> >>>> obj-$(CONFIG_X86_NUMAQ) += numaq_32.o >>>> +obj-$(CONFIG_X86_NUMACHIP) += numachip.o >>> >>> >>> It looks like this depends on CONFIG_PCI_MMCONFIG for >>> pci_mmconfig_lookup(). Are there config constraints that force >>> CONFIG_PCI_MMCONFIG=y when CONFIG_X86_NUMACHIP=y? >> >> >> I'll revise the patch with this constraint after we work out the best >> approach for below. >> >> >>>> obj-$(CONFIG_X86_INTEL_MID) += mrst.o >>>> >>>> diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c >>>> new file mode 100644 >>>> index 0000000..3773e05 >>>> --- /dev/null >>>> +++ b/arch/x86/pci/numachip.c >>>> @@ -0,0 +1,129 @@ >>>> +/* >>>> + * This file is subject to the terms and conditions of the GNU General >>>> Public >>>> + * License. See the file "COPYING" in the main directory of this >>>> archive >>>> + * for more details. >>>> + * >>>> + * Numascale NumaConnect-specific PCI code >>>> + * >>>> + * Copyright (C) 2012 Numascale AS. All rights reserved. >>>> + * >>>> + * Send feedback to <support@...ascale.com> >>>> + * >>>> + * PCI accessor functions derived from mmconfig_64.c >>>> + * >>>> + */ >>>> + >>>> +#include <linux/pci.h> >>>> +#include <asm/pci_x86.h> >>>> + >>>> +static u8 limit __read_mostly; >>>> + >>>> +static inline char __iomem *pci_dev_base(unsigned int seg, unsigned int >>>> bus, unsigned int devfn) >>>> +{ >>>> + struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus); >>>> + >>>> + if (cfg && cfg->virt) >>>> + return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << >>>> 12)); >>>> + return NULL; >>>> +} >>> >>> >>> Most of this file is copied directly from mmconfig_64.c (as you >>> mentioned above). I wonder if we could avoid the code duplication by >>> making the pci_dev_base() implementation in mmconfig_64.c a weak >>> definition. Then you could just supply a non-weak pci_dev_base() here >>> that would override that default version. Your version would look >>> something like: >>> >>> char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, >>> unsigned int devfn) >>> { >>> struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus); >>> >>> if (cfg && cfg->virt && devfn < limit) >>> return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12)); >>> return NULL; >>> } >>> >>> That would be different from what you have in this patch because reads >>> & writes to devices above "limit" would return -EINVAL rather than 0 >>> as you do here. Would that be a problem? >> >> >> That would work nicely (pointer lookup and inlining etc aside) if there was >> the runtime ability to override pci_dev_base only if the NumaChip signature >> was detected. >> >> We could expose pci_dev_base via struct x86_init_pci; the extra complexity >> and performance tradeoff may not be worth it for a single case perhaps? > > Oh, right, I forgot that you can't decide this at build-time. This is > PCI config access, which is not a performance path, so I'm not really > concerned about it from that angle, but you make a good point about > the complexity. > > The reason I'm interested in this is because MMCONFIG is a generic > PCIe feature but is currently done via several arch-specific > implementations, so I'm starting to think about how we can make parts > of it more generic. From that perspective, it's nicer to parameterize > an existing implementation than to clone it because it makes > refactoring opportunities more obvious. > > Backing up a bit, I'm curious about exactly why you need to check for > the limit to begin with. The comment says "Ensure AMD Northbridges > don't decode reads to other devices," but that doesn't seem strictly > accurate. You're not changing anything in the hardware to prevent it > from *decoding* a read, so it seems like you're actually just > preventing the read in the first place. > > What happens without the limit check? Do you get a response timeout > and a machine check? Read from the wrong device? > > As far as I can tell, you still describe your MMCONFIG area with an > MCFG table (since you use pci_mmconfig_lookup() to find the region). > That table only includes the starting and ending bus numbers, so the > assumption is that the MMCONFIG space is valid for every possible > device on those buses. So it seems like your system is not really > compatible with the spec here. > > Because the MCFG table can't describe finer granularity than start/end > bus numbers, we manage MMCONFIG regions as (segment, start_bus, > end_bus, address) tuples. Maybe if we tracked it with slightly finer > granularity, e.g., (segment, start_bus, end_bus, end_bus_device, > address), you could have some sort of MCFG-parsing quirk that reduces > the size of the MMCONFIG region you register for bus 0. > > Just brainstorming here; it's not obvious to me yet what the best solution is. The main intent with the approach I chose was to ensure zero additional code/overhead when CONFIG_NUMACHIP isn't defined, and as a side-effect, all changes are scoped within CONFIG_NUMACHIP, so there's no potential for side-effects. It's possible to add a function pointer to struct x86_init_pci to abstract the base address calculation, but it's a pity to do this just for one need (ie NumaChip) and add indirection even when CONFIG_NUMACHIP is not defined. I revised the patch with the constraints a few days back, so hope it looks good otherwise. Let me know if abstracting the base address calculation via struct x86_init_pci sounds like a better plan (albeit missing the 3.8 merge window). Thanks Bjorn, Daniel -- Daniel J Blueman Principal Software Engineer, Numascale Asia -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists