linux-kernel - Re: [PATCH 2/2] pci: host: Add Broadcom STB PCIE RC controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANCKTBuNFQFLsab2WS+BRGUOuUOTLYuoaWNuM5T9HPGihvwYCg@mail.gmail.com>
Date:	Thu, 5 May 2016 13:48:53 -0400
From:	Jim Quinlan <jim2101024@...il.com>
To:	Florian Fainelli <f.fainelli@...il.com>
Cc:	Arnd Bergmann <arnd@...db.de>, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	bcm-kernel-feedback-list@...adcom.com, bhelgaas@...gle.com
Subject: Re: [PATCH 2/2] pci: host: Add Broadcom STB PCIE RC controller

On Wed, May 4, 2016 at 3:36 PM, Florian Fainelli <f.fainelli@...il.com> wrote:
> On 03/05/16 02:57, Arnd Bergmann wrote:
>>> +static const struct pcie_cfg_data generic_cfg = {
>>> +    .offsets        = pcie_offsets,
>>> +    .type           = GENERIC,
>>> +};
>>
>> The way you access the config space here seems very indirect. I'd
>> suggest instead writing two sets of pci_ops, with hardcoded registers
>> offsets in them, and a function pointer to access the RGR1_SW_INIT_1
>> register.
>
> How about introducing helper functions but keeping the same set of
> read/write pci_ops instead, would that seem more acceptable? I agree
> that the macros are not the friendliest thing.
>
>>
>>> +struct brcm_msi {
>>> +    struct irq_domain *domain;
>>> +    struct irq_chip irq_chip;
>>> +    struct msi_controller chip;
>>> +    struct mutex lock;
>>> +    int irq;
>>> +    /* intr_base is the base pointer for interrupt status/set/clr regs */
>>> +    void __iomem *intr_base;
>>> +    /* intr_legacy_mask indicates how many bits are MSI interrupts */
>>> +    u32 intr_legacy_mask;
>>> +    /* intr_legacy_offset indicates bit position of MSI_01 */
>>> +    u32 intr_legacy_offset;
>>> +    /* used indicates which MSI interrupts have been alloc'd */
>>> +    unsigned long used;
>>> +    /* working indicates that on boot we have brought up MSI */
>>> +    bool working;
>>> +};
>>
>> I'd move the MSI controller stuff into a separate file, and add a way to
>> override it. It's likely that at some point the same PCIe implementation
>> will be used with a modern GICv3 or GICv2m, so you should be able to
>> look up the msi controller from a property.
>
> Good point, let's do that, though all controllers actually do support
> MSI, so I wonder.
>
>>
>>> +struct brcm_window {
>>> +    u64 size;
>>> +    u64 cpu_addr;
>>> +    struct resource pcie_iomem_res;
>>> +};
>>
>> This appears to duplicate things we already have. Try to get rid of
>> the structure and use what is already there.
>
> OK, the resource is probably good enough here.
>
>>
>>> +struct brcm_dev_pwr_supply {
>>> +    struct list_head node;
>>> +    char name[32];
>>> +    struct regulator *regulator;
>>> +};
>>
>> Same here: Just get all the regulators you know about by name
>> and put them into the main structure.
>
> OK, I will drop the regulator support for now, see below for a more
> elaborate answer.
>
>>
>>> +/* Internal Bus Controller Information.*/
>>> +struct brcm_pcie {
>>> +    struct list_head        list;
>>> +    void __iomem            *base;
>>> +    char                    name[8];
>>> +    bool                    suspended;
>>> +    struct clk              *clk;
>>> +    struct device_node      *dn;
>>> +    int                     pcie_irq[4];
>>> +    int                     irq;
>>> +    int                     num_out_wins;
>>> +    bool                    ssc;
>>> +    int                     gen;
>>> +    int                     scb_size_vals[BRCM_MAX_SCB];
>>> +    struct brcm_window      out_wins[BRCM_NUM_PCI_OUT_WINS];
>>> +    struct pci_bus          *bus;
>>> +    struct device           *dev;
>>> +    struct list_head        resource;
>>> +    struct list_head        pwr_supplies;
>>> +    struct brcm_msi         msi;
>>> +    unsigned int            rev;
>>> +    unsigned int            num;
>>> +    bool                    bridge_setup_done;
>>> +    const int               *reg_offsets;
>>> +    enum pcie_type          type;
>>> +};
>>> +
>>> +static int brcm_num_pci_controllers;
>>> +static int num_memc;
>>
>> Remove the globals.
>
> OK.
>
>>
>>> +
>>> +/*
>>> + * MIPS endianness is configured by boot strap, which also reverses all
>>> + * bus endianness (i.e., big-endian CPU + big endian bus ==> native
>>> + * endian I/O).
>>> + *
>>> + * Other architectures (e.g., ARM) either do not support big endian, or
>>> + * else leave I/O in little endian mode.
>>> + */
>>> +static inline u32 bpcie_readl(void __iomem *base)
>>> +{
>>> +    if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>>> +            return __raw_readl(base);
>>> +    else
>>> +            return readl_relaxed(base);
>>> +}
>>
>> I think it would make more sense to only make this depend on the
>> architecture, not on endianess: If the __raw_ version works on MIPS
>> in big-endian mode, you should be able to also use it in little-endian
>> mode.
>>
>> For the default (ARM) version, please use the non-relaxed accessor
>> by default, unless you can show a difference in performance and prove
>> that you don't need the implied barriers.
>
> Our busing makes it so that the __raw_readl() (or _relaxed) I/O access
> is going to have the same guarantees as if we were adding a barrier,
> that is, writes are not going to be re-ordered with each other by the
> bus, nor the CPU (since it maps the space as device memory), and on
> MIPS, well, it's all through unmapped, uncached space, but I can
> definitively switch that for readl(), should not make a huge performance
> difference, if noticeable.
>
>>
>>> +/* negative return value indicates error */
>>> +static int mdio_read(void __iomem *base, u8 phyad, u8 regad)
>>> +{
>>> +    u32 data = ((phyad & 0xf) << 16)
>>> +            | (regad & 0x1f)
>>> +            | 0x100000;
>>> +
>>> +    bpcie_writel(data, base + PCIE_RC_DL_MDIO_ADDR);
>>> +    bpcie_readl(base + PCIE_RC_DL_MDIO_ADDR);
>>> +
>>> +    data = bpcie_readl(base + PCIE_RC_DL_MDIO_RD_DATA);
>>> +    if (!(data & 0x80000000)) {
>>> +            mdelay(1);
>>
>> Try to restructure the code so this is never called with spinlocks
>> held and then replace the mdelay() with an msleep().
>
> I cannot really think of a reason why we did not use msleep() all along,
> so thanks for spotting that.

This is an artifact of when we used to do our PCIe suspend/resume via
syscore calls and we could not sleep.

>
>>> +
>>> +    /* set up 4GB PCIE->SCB memory window on BAR2 */
>>> +    bpcie_writel(0x00000011, base + PCIE_MISC_RC_BAR2_CONFIG_LO);
>>> +    bpcie_writel(0x00000000, base + PCIE_MISC_RC_BAR2_CONFIG_HI);
>>
>> Where does this window come from? It's not in the DT as far as I can see.
>
> This is an architectural maximum value which is baked into the PCIE Root
> Complex hardware, on all generations that this driver supports. We
> configure the largest address match rang size here, just to be safe,
> AFAICT there is no point in looking at the inbound or outbound window
> sizes to re-program that differently.

Florian, I have a subsequent commit that passes the inbound
information in the device tree; it's designed for when we have to map
the multiple inbound regions differently using dma-ranges.  I will
make it available to you offline.

>
>
>>> +    INIT_LIST_HEAD(&pcie->pwr_supplies);
>>> +    INIT_LIST_HEAD(&pcie->resource);
>>> +
>>> +    supplies = of_property_count_strings(dn, "supply-names");
>>> +    if (supplies <= 0)
>>> +            supplies = 0;
>>> +
>>> +    for (i = 0; i < supplies; i++) {
>>> +            if (of_property_read_string_index(dn, "supply-names", i,
>>> +                                              &name))
>>> +                    continue;
>>> +            supply = devm_kzalloc(&pdev->dev, sizeof(*supply), GFP_KERNEL);
>>> +            if (!supply)
>>> +                    return -ENOMEM;
>>> +
>>> +            strncpy(supply->name, name, sizeof(supply->name));
>>> +            supply->name[sizeof(supply->name) - 1] = '\0';
>>> +            supply->regulator = devm_regulator_get(&pdev->dev, name);
>>> +            if (IS_ERR(supply->regulator)) {
>>> +                    dev_err(&pdev->dev, "Unable to get %s supply, err=%d\n",
>>> +                            name, (int)PTR_ERR(supply->regulator));
>>> +                    continue;
>>> +            }
>>> +
>>> +            if (regulator_enable(supply->regulator))
>>> +                    dev_err(&pdev->dev, "Unable to enable %s supply.\n",
>>> +                            name);
>>> +            list_add_tail(&supply->node, &pcie->pwr_supplies);
>>> +    }
>>
>> Don't parse the regulator properties yourself here, use the proper APIs.
>
> I will probably drop this for now, and  add it later, there are only a
> handful of boards which requires this, and ultimately, I think we should
> be seaking for a more generic solutions, we can't be the only ones doing
> that.
>
>>
>>> +    /* 'num_memc' will be set only by the first controller, and all
>>> +     * other controllers will use the value set by the first.
>>> +     */
>>> +    if (num_memc == 0)
>>> +            for_each_compatible_node(mdn, NULL, "brcm,brcmstb-memc")
>>> +                    if (of_device_is_available(mdn))
>>> +                            num_memc++;
>>
>> Why is this?
>
> This is so we do not end-up programming the PCIe RC which is agnostic of
> the number of

I believe this code is still around for folks passing us a device tree
with lacking information.  It should be removed.

>
>>
>>> +    resource_list_for_each_entry(win, &res) {
>>> +            struct brcm_window *w = &pcie->out_wins[i];
>>> +
>>> +            r = win->res;
>>> +
>>> +            if (!r->flags)
>>> +                    continue;
>>> +
>>> +            switch (resource_type(r)) {
>>> +            case IORESOURCE_MEM:
>>> +                    w->cpu_addr = r->start;
>>> +                    w->size = resource_size(r);
>>> +                    w->pcie_iomem_res.name  = "External PCIe MEM";
>>> +                    w->pcie_iomem_res.flags = r->flags;
>>> +                    w->pcie_iomem_res.start = r->start;
>>> +                    w->pcie_iomem_res.end   = r->end;
>>> +                    pcie->num_out_wins++;
>>> +                    i++;
>>> +                    /* Request memory region resources. */
>>> +                    ret = devm_request_resource(&pdev->dev,
>>> +                                                &iomem_resource,
>>> +                                                &w->pcie_iomem_res);
>>> +                    if (ret) {
>>> +                            dev_err(&pdev->dev,
>>> +                                    "request PCIe memory resource failed\n");
>>> +                            goto out_err_clk;
>>> +                    }
>>> +                    break;
>>> +
>>> +            default:
>>> +                    continue;
>>> +            }
>>> +    }
>>
>> What about IORESOURCE_IO?
>
> We do not support I/O space on this controller AFAIR. Our downstream
> driver does insert a fake bogus I/O range, but I cannot really remember
> why that was needed now, Jim do you remember?
> --
> Florian

We added a bogus IO region because there was no other way to proceed
w/o getting an error.  Or should I say, I knew of no other way to
proceed...

Thanks,
Jim Quinlan