[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <52b8f850-af8c-1971-9729-c5de37875bf9@amd.com>
Date: Fri, 2 Jun 2023 17:09:38 +0530
From: Kishon Vijay Abraham I <kvijayab@....com>
To: Damien Le Moal <dlemoal@...nel.org>,
Kishon Vijay Abraham I <kvijayab@....com>,
Shunsuke Mie <mie@...l.co.jp>,
Jingoo Han <jingoohan1@...il.com>
Cc: Gustavo Pimentel <gustavo.pimentel@...opsys.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Rob Herring <robh@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Manivannan Sadhasivam <mani@...nel.org>,
Kishon Vijay Abraham I <kishon@...nel.org>,
Kunihiko Hayashi <hayashi.kunihiko@...ionext.com>,
Hou Zhiqiang <Zhiqiang.Hou@....com>,
Frank Li <Frank.Li@....com>, Li Chen <lchen@...arella.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/3] PCI: endpoint: support an alignment aware
map/unmaping
On 6/2/2023 5:13 AM, Damien Le Moal wrote:
> On 6/2/23 00:06, Kishon Vijay Abraham I wrote:
>> Hi Shunsuke,
>>
>> On 1/13/2023 2:33 PM, Shunsuke Mie wrote:
>>> Add an align_mem operation to the EPC ops, which function is used to
>>> pci_epc_map/unmap_addr(). These change to enable mapping for any alignment
>>> restriction of EPC. The map function maps an aligned memory to include a
>>> requested memory region.
>>
>> I'd prefer all the PCIe address alignment restriction be handled in the
>> endpoint function drivers and not inside the core layer (esp in map and
>> unmap calls).
>
> That is a really *bad* idea ! Most function drivers should be able to work with
> any EP controller hardware. Asking these drivers to support all the alignment
> peculiarities of every possible EP controller is impossible.
Function drivers already work with various restrictions of EP controller
hardware. pci_epc_features was added to provide such restrictions to
function drivers. Not sure why it has to be different here.
>
>> IMO, get the pci address alignment restriction using pci_epc_features.
>> And use a bigger size (based on alignment restriction) in
>> pci_epc_mem_alloc_addr() and access the allocated window using an offset
>> (based on alignment value). You can add separate helpers if required.
>
> That is too simplistic and not enough. Example: Rick and I working on an nvme
> function driver are facing a lot of issues with the EPC API for mem & mapping
> management because we have 0 control over the PCI address that the host will
> use. Alignment is all over the place, and the current EPC memory API
> restrictions (window size limitations) make it impossible to transparently
> handle all cases. We endup with NVMe command failures simply because of the API
> limitations.
You mean restrictions w.r.t OB window address and not PCIe address?
>
> And sure, we can modify that driver to better support the EP controller we are
> using (rockchip). But we need to support other EP controllers as well. So API
Every EP controller can provide it's restrictions in pci_epc_features.
Unless the alignment is going to change dynamically, don't see a need
for adding new epc ops.
Not sure why the following cannot be handled from function driver?
From
A A + S
┌────────────────────────┐
│ │
│ OB WIN │
├────────────────────────┤
mapping │ │
▼ B + S ▼
B ┌────────────────────────┐
│ │
│ PCI Address │
└────────────────────────┘
To
A A'│ A + S A+S+alignment
┌────┼───────────────────┬──────┐
│ │ │ │
│ │ OB WIN │ │
├────┴───────────────────┴──────┤
│ |
│ |
B' ▼ B B + S ▼
┌────┬──────────────────────────┐
│ │ │
│ │ PCI Address │
└────┴──────────────────────────┘
So the changes in function driver will be
1) Get alignment value in epc_features
2) pci_epc_mem_alloc_addr()/pci_epc_map_addr() will take into account
the alignment value (change in size parameter)
3) Access host memory from an offset in the provided
pci_epc_mem_alloc_addr().
> changes are definitely needed. Working on that. That is not easy as the mapping
> API and its semantic impacts data transfers (memcpy_from|toio and DMA).
>
> I do have a patch that does something similar as this one, but at a much higher
> level with a helper function that gives the function driver the offset into the
> allocated memory region to use for mapping a particular PCI address. And then
> this helper is then in turn used into a new pci_epc_map() function which does
> mem alloc + mapping in one go based on the EPC constraints. That hides all
pci_epc_map() was added only to perform mapping functionality. I'd
prefer it stays that way instead of adding bunch of other things into it.
Thanks,
Kishon
> alignment details to the function drivers, which greatlyu simplyfies the code.
> But that is not enough as alignment also implies that we have to deal with
> boundaries (due to limited window size) and so sometimes endpu failing a mapping
> that is too large because the host used a PCI address close to the boundary.
> More work is needed to have pci_epc_map() also hide that with tricks like
> allowing the allocation and mapping of multiple contiguous windows. So EPC ops
> API changes are also needed.
>
>
>>
>> Thanks,
>> Kishon
>>
>>>
>>> Signed-off-by: Shunsuke Mie <mie@...l.co.jp>
>>> ---
>>> drivers/pci/endpoint/pci-epc-core.c | 57 ++++++++++++++++++++++++-----
>>> include/linux/pci-epc.h | 10 +++--
>>> 2 files changed, 53 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
>>> index 2542196e8c3d..60d586e05e7d 100644
>>> --- a/drivers/pci/endpoint/pci-epc-core.c
>>> +++ b/drivers/pci/endpoint/pci-epc-core.c
>>> @@ -430,8 +430,12 @@ EXPORT_SYMBOL_GPL(pci_epc_set_msix);
>>> * Invoke to unmap the CPU address from PCI address.
>>> */
>>> void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> - phys_addr_t phys_addr)
>>> + phys_addr_t phys_addr, void __iomem *virt_addr, size_t size)
>>> {
>>> + u64 aligned_phys;
>>> + void __iomem *aligned_virt;
>>> + size_t offset;
>>> +
>>> if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
>>> return;
>>>
>>> @@ -441,9 +445,22 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> if (!epc->ops->unmap_addr)
>>> return;
>>>
>>> + if (epc->ops->align_mem) {
>>> + mutex_lock(&epc->lock);
>>> + aligned_phys = epc->ops->align_mem(epc, phys_addr, &size);
>>> + mutex_unlock(&epc->lock);
>>> + } else {
>>> + aligned_phys = phys_addr;
>>> + }
>>> +
>>> + offset = phys_addr - aligned_phys;
>>> + aligned_virt = virt_addr - offset;
>>> +
>>> mutex_lock(&epc->lock);
>>> - epc->ops->unmap_addr(epc, func_no, vfunc_no, phys_addr);
>>> + epc->ops->unmap_addr(epc, func_no, vfunc_no, aligned_phys);
>>> mutex_unlock(&epc->lock);
>>> +
>>> + pci_epc_mem_free_addr(epc, aligned_phys, aligned_virt, size);
>>> }
>>> EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>>
>>> @@ -458,26 +475,46 @@ EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>> *
>>> * Invoke to map CPU address with PCI address.
>>> */
>>> -int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> - phys_addr_t phys_addr, u64 pci_addr, size_t size)
>>> +void __iomem *pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> + u64 pci_addr, phys_addr_t *phys_addr, size_t size)
>>> {
>>> int ret;
>>> + u64 aligned_addr;
>>> + size_t offset;
>>> + void __iomem *virt_addr;
>>>
>>> if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
>>> - return -EINVAL;
>>> + return ERR_PTR(-EINVAL);
>>>
>>> if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
>>> - return -EINVAL;
>>> + return ERR_PTR(-EINVAL);
>>>
>>> if (!epc->ops->map_addr)
>>> - return 0;
>>> + return ERR_PTR(-ENOPTSUPP);
>>> +
>>> + if (epc->ops->align_mem) {
>>> + mutex_lock(&epc->lock);
>>> + aligned_addr = epc->ops->align_mem(epc, pci_addr, &size);
>>> + mutex_unlock(&epc->lock);
>>> + } else {
>>> + aligned_addr = pci_addr;
>>> + }
>>> +
>>> + offset = pci_addr - aligned_addr;
>>> +
>>> + virt_addr = pci_epc_mem_alloc_addr(epc, phys_addr, size);
>>> + if (!virt_addr)
>>> + return ERR_PTR(-ENOMEM);
>>>
>>> mutex_lock(&epc->lock);
>>> - ret = epc->ops->map_addr(epc, func_no, vfunc_no, phys_addr, pci_addr,
>>> - size);
>>> + ret = epc->ops->map_addr(epc, func_no, vfunc_no, *phys_addr, aligned_addr, size);
>>> mutex_unlock(&epc->lock);
>>> + if (ret)
>>> + return ERR_PTR(ret);
>>>
>>> - return ret;
>>> + *phys_addr += offset;
>>> +
>>> + return virt_addr + offset;
>>> }
>>> EXPORT_SYMBOL_GPL(pci_epc_map_addr);
>>>
>>> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
>>> index a48778e1a4ee..8f29161bce80 100644
>>> --- a/include/linux/pci-epc.h
>>> +++ b/include/linux/pci-epc.h
>>> @@ -84,6 +84,7 @@ struct pci_epc_ops {
>>> phys_addr_t phys_addr, u8 interrupt_num,
>>> u32 entry_size, u32 *msi_data,
>>> u32 *msi_addr_offset);
>>> + u64 (*align_mem)(struct pci_epc *epc, u64 addr, size_t *size);
>>> int (*start)(struct pci_epc *epc);
>>> void (*stop)(struct pci_epc *epc);
>>> const struct pci_epc_features* (*get_features)(struct pci_epc *epc,
>>> @@ -218,11 +219,12 @@ int pci_epc_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> struct pci_epf_bar *epf_bar);
>>> void pci_epc_clear_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> struct pci_epf_bar *epf_bar);
>>> -int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> - phys_addr_t phys_addr,
>>> - u64 pci_addr, size_t size);
>>> +void __iomem *pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> + u64 pci_addr, phys_addr_t *phys_addr,
>>> + size_t size);
>>> void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> - phys_addr_t phys_addr);
>>> + phys_addr_t phys_addr, void __iomem *virt_addr,
>>> + size_t size);
>>> int pci_epc_set_msi(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>> u8 interrupts);
>>> int pci_epc_get_msi(struct pci_epc *epc, u8 func_no, u8 vfunc_no);
>
Powered by blists - more mailing lists