Message-ID: <1713510661.7868748-2-xuanzhuo@linux.alibaba.com>
Date: Fri, 19 Apr 2024 15:11:01 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: virtualization@...ts.linux.dev,
"Michael S. Tsirkin" <mst@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org,
"Linux-MM" <linux-mm@...ck.org>,
Matthew Wilcox <willy@...radead.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Jason Wang <jasowang@...hat.com>
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
>
> On 17/04/2024 10.20, Xuan Zhuo wrote:
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>
> >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>>>
> >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>>>>>
> >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@...hat.com> wrote:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Currently, we chain the pages of big mode via the page's private
> >>>>>>>>>>>>> field. But a subsequent patch aims to make big mode support
> >>>>>>>>>>>>> premapped mode, which requires additional space to store the DMA
> >>>>>>>>>>>>> address.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Within the sub-struct that contains 'private', there is no
> >>>>>>>>>>>>> suitable field for storing the DMA address.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> struct { /* Page cache and anonymous pages */
> >>>>>>>>>>>>> /**
> >>>>>>>>>>>>> * @lru: Pageout list, eg. active_list protected by
> >>>>>>>>>>>>> * lruvec->lru_lock. Sometimes used as a generic list
> >>>>>>>>>>>>> * by the page owner.
> >>>>>>>>>>>>> */
> >>>>>>>>>>>>> union {
> >>>>>>>>>>>>> struct list_head lru;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> /* Or, for the Unevictable "LRU list" slot */
> >>>>>>>>>>>>> struct {
> >>>>>>>>>>>>> /* Always even, to negate PageTail */
> >>>>>>>>>>>>> void *__filler;
> >>>>>>>>>>>>> /* Count page's or folio's mlocks */
> >>>>>>>>>>>>> unsigned int mlock_count;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> /* Or, free page */
> >>>>>>>>>>>>> struct list_head buddy_list;
> >>>>>>>>>>>>> struct list_head pcp_list;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>> /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >>>>>>>>>>>>> struct address_space *mapping;
> >>>>>>>>>>>>> union {
> >>>>>>>>>>>>> pgoff_t index; /* Our offset within mapping. */
> >>>>>>>>>>>>> unsigned long share; /* share count for fsdax */
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>> /**
> >>>>>>>>>>>>> * @private: Mapping-private opaque data.
> >>>>>>>>>>>>> * Usually used for buffer_heads if PagePrivate.
> >>>>>>>>>>>>> * Used for swp_entry_t if PageSwapCache.
> >>>>>>>>>>>>> * Indicates order in the buddy system if PageBuddy.
> >>>>>>>>>>>>> */
> >>>>>>>>>>>>> unsigned long private;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But within the page_pool sub-struct, there is a field called
> >>>>>>>>>>>>> dma_addr that is appropriate for storing the DMA address. And
> >>>>>>>>>>>>> that sub-struct is used by the netstack. That works to our
> >>>>>>>>>>>>> advantage.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> struct { /* page_pool used by netstack */
> >>>>>>>>>>>>> /**
> >>>>>>>>>>>>> * @pp_magic: magic value to avoid recycling non
> >>>>>>>>>>>>> * page_pool allocated pages.
> >>>>>>>>>>>>> */
> >>>>>>>>>>>>> unsigned long pp_magic;
> >>>>>>>>>>>>> struct page_pool *pp;
> >>>>>>>>>>>>> unsigned long _pp_mapping_pad;
> >>>>>>>>>>>>> unsigned long dma_addr;
> >>>>>>>>>>>>> atomic_long_t pp_ref_count;
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other hand, we should only use fields from the same
> >>>>>>>>>>>>> sub-struct. So this patch replaces "private" with "pp".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
> >>>>>>>>>>>> switch to using page pool for big mode instead? Then we don't need
> >>>>>>>>>>>> to bother with the DMA stuff.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The page pool does its mapping through the generic DMA APIs,
> >>>>>>>>>>> so we cannot use the page pool directly.
> >>>>>>>>>>
> >>>>>>>>>> I found this:
> >>>>>>>>>>
> >>>>>>>>>> define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
> >>>>>>>>>> * map/unmap
> >>>>>>>>>>
> >>>>>>>>>> It seems to work here?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have studied the page pool mechanism and believe that we cannot use
> >>>>>>>>> it directly. We can make the page pool bypass the DMA operations,
> >>>>>>>>> which allows us to handle DMA within virtio-net for pages allocated
> >>>>>>>>> from the page pool. Furthermore, we can use the page pool helpers to
> >>>>>>>>> associate the DMA address with the page.
> >>>>>>>>>
> >>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
> >>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
> >>>>>>>>> unmapping and remapping steps.
> >>>>>>>>>
> >>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it
> >>>>>>>>> disconnects and releases the pages. Pages relinquished that way never
> >>>>>>>>> get a chance to be unmapped.
>
> Could Jakub's memory provider for PP help your use-case?
>
> See: [1]
> https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
> Or: [2]
> https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/
It cannot. The memory provider only changes how the PP obtains pages
(via callbacks). What we are discussing here is map/unmap:
virtio-net has its own DMA APIs.
dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, size_t size,
                                          enum dma_data_direction dir, unsigned long attrs);
void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                      size_t size, enum dma_data_direction dir,
                                      unsigned long attrs);
dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
                                        size_t offset, size_t size,
                                        enum dma_data_direction dir,
                                        unsigned long attrs);
void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                    size_t size, enum dma_data_direction dir,
                                    unsigned long attrs);
int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr,
                                             unsigned long offset, size_t size,
                                             enum dma_data_direction dir);
void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr,
                                                unsigned long offset, size_t size,
                                                enum dma_data_direction dir);
Thanks.
>
>
> [...]
> >>>>>>
> >>>>>> Adding Jesper for some comments.
> >>>>>>
> >>>>>>>
> >>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
> >>>>>>> the pages.
> >>>>>>>
>
> For context the patch:
> [3]
> https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/
>
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> I might be wrong, but I think we need to either
> >>>>>>
> >>>>>> 1) seek a way to manage the pages by yourself but not touching page
> >>>>>> pool metadata (or Jesper is fine with this)
> >>>>>
> >>>>> Do you mean working with page pool or not?
> >>>>>
> >>>>
> >>>> I meant if Jesper is fine with reusing page pool metadata like this
> >>>> patch does.
> >>>>
> >>>>> If we manage the pages ourselves (no page pool), we do not care whether
> >>>>> the metadata is for page pool or not. We just use the space in the
> >>>>> pages, like "private".
> >>>>
> >>>> That's also fine.
> >>>>
>
> I'm not sure it is "fine" to explicitly choose not to use page pool,
> and then (ab)use the `struct page` member (pp) that is intended for
> page_pool for other stuff (in this case, creating a linked list of
> pages).
>
> +#define page_chain_next(p) ((struct page *)((p)->pp))
> +#define page_chain_add(p, n) ((p)->pp = (void *)n)
>
> I'm not sure that I (as PP maintainer) can actually make this call, as
> I think this area belongs with the MM "page" maintainers (Cc MM-list +
> people) to judge.
>
> Just inventing new ways to use struct page fields, without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. to make future changes).
>
> --Jesper
>
>