Message-ID: <ECC7645D-082A-4590-9339-C45949E10C4D@gmail.com>
Date: Thu, 14 Nov 2019 13:04:26 -0800
From: "Jonathan Lemon" <jonathan.lemon@...il.com>
To: "Ilias Apalodimas" <ilias.apalodimas@...aro.org>
Cc: "Lorenzo Bianconi" <lorenzo@...nel.org>, netdev@...r.kernel.org,
lorenzo.bianconi@...hat.com, davem@...emloft.net,
thomas.petazzoni@...tlin.com, brouer@...hat.com,
matteo.croce@...hat.com
Subject: Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync
DMA memory for non-coherent devices
On 14 Nov 2019, at 12:42, Ilias Apalodimas wrote:
> Hi Jonathan,
>
> On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:
>>
>>
>> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
>>
>>> [...]
>>>>> index 2cbcdbdec254..defbfd90ab46 100644
>>>>> --- a/include/net/page_pool.h
>>>>> +++ b/include/net/page_pool.h
>>>>> @@ -65,6 +65,9 @@ struct page_pool_params {
>>>>>  	int nid; /* Numa node id to allocate from pages from */
>>>>>  	struct device *dev; /* device, for DMA pre-mapping purposes */
>>>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
>>>>> +	unsigned int max_len; /* max DMA sync memory size */
>>>>> +	unsigned int offset; /* DMA addr offset */
>>>>> +	u8 sync;
>>>>>  };
>>>>
>>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
>>>> (then it can also be gated on having DMA_MAP enabled)
>>>
>>> You mean instead of the u8?
>>> As you pointed out in your comment on V2, some cards don't sync back
>>> to the device. Since the API tries to be generic, a u8 was chosen
>>> instead of a flag to cover these use cases. So in time we'll change
>>> the semantics of this to 'always sync', 'don't sync if it's an
>>> skb-only queue', etc.
>>> The first case Lorenzo covered is syncing only the required length
>>> instead of the full buffer.
>>
>> Yes, I meant instead of:
>> + .sync = 1,
>>
>> Something like:
>> .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
>>
>> Since .sync alone doesn't make sense if the page pool isn't
>> performing any DMA mapping, right?
>
> Correct. If the sync happens regardless of the page pool mapping
> capabilities, this will affect performance negatively as well (on
> non-coherent architectures).
>
>> Then existing drivers, if they're converted, can just add the SYNC
>> flag.
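Something along these lines for a converted driver (untested sketch;
PP_FLAG_DMA_SYNC is hypothetical, only PP_FLAG_DMA_MAP exists today,
and priv, ring_size, xdp_prog and RX_BUF_SIZE are stand-ins for the
driver's own specifics):

    struct page_pool_params pp_params = {
            .order     = 0,
            .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC,
            .pool_size = ring_size,
            .nid       = NUMA_NO_NODE,
            .dev       = priv->dev,
            .dma_dir   = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
            .max_len   = RX_BUF_SIZE, /* sync only the buffer area */
            .offset    = NET_SKB_PAD, /* skip syncing the headroom */
    };
    struct page_pool *pool = page_pool_create(&pp_params);

    if (IS_ERR(pool))
            return PTR_ERR(pool);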
>>
>> I did see the initial case where only the RX_BUF_SIZE (1536) is
>> sync'd instead of the full page.
>>
>> Could you expand on your 'skb-only queue' comment? I'm currently
>> running a variant of your patch where iommu-mapped pages are attached
>> to skbs and sent up the stack, then reclaimed on release. I imagine
>> that with this change, they would have the full RX_BUF_SIZE sync'd
>> before returning to the driver, since the upper layers could
>> basically do anything with the buffer area.
>
> The idea was that a page_pool lives per device queue. Usually some
> queues are reserved for XDP only. Since eBPF progs can change the
> packet, we have to sync for the device before we fill in the device
> descriptors.
And some devices (mlx4) run XDP on the normal RX queue; if the verdict
is PASS, an skb is constructed and sent up the stack.
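(For concreteness, I'm picturing the recycle-path sync as roughly the
sketch below, built on the generic DMA API; the helper name and exact
placement are my guesses at this patch's internals:

    static void pool_sync_for_device(struct page_pool *pool,
                                     struct page *page, u32 size)
    {
            /* never sync past the area configured at pool creation */
            size = min(size, pool->p.max_len);
            dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
                                             pool->p.offset, size,
                                             pool->p.dma_dir);
    }

so whether that call can be skipped hinges entirely on knowing what
happened to the buffer since the last sync.)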
> For the skb-reserved queues, this depends on the 'anything'. If the
> rest of the layers touch (or rather write into) that area, then we'll
> again have to sync. If we know that the data has not been altered,
> though, we can hand the pages back to the device, skipping that sync,
> right?
Sure, but this is also true for eBPF programs. How would the driver
know that the data has not been altered / compacted by the upper
layers?
--
Jonathan