lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ad98cb14-cc1b-4a01-aacc-8fb53445049e@kernel.org>
Date: Thu, 18 Apr 2024 22:19:33 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Jason Wang <jasowang@...hat.com>
Cc: virtualization@...ts.linux.dev, "Michael S. Tsirkin" <mst@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 netdev@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
 Matthew Wilcox <willy@...radead.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside
 page



On 17/04/2024 10.20, Xuan Zhuo wrote:
> On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>
>>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>>>
>>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable.
>>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support
>>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable
>>>>>>>>>>>>> variable for storing the DMA addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* Page cache and anonymous pages */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @lru: Pageout list, eg. active_list protected by
>>>>>>>>>>>>>                           * lruvec->lru_lock.  Sometimes used as a generic list
>>>>>>>>>>>>>                           * by the page owner.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  struct list_head lru;
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, for the Unevictable "LRU list" slot */
>>>>>>>>>>>>>                                  struct {
>>>>>>>>>>>>>                                          /* Always even, to negate PageTail */
>>>>>>>>>>>>>                                          void *__filler;
>>>>>>>>>>>>>                                          /* Count page's or folio's mlocks */
>>>>>>>>>>>>>                                          unsigned int mlock_count;
>>>>>>>>>>>>>                                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, free page */
>>>>>>>>>>>>>                                  struct list_head buddy_list;
>>>>>>>>>>>>>                                  struct list_head pcp_list;
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /* See page-flags.h for PAGE_MAPPING_FLAGS */
>>>>>>>>>>>>>                          struct address_space *mapping;
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  pgoff_t index;          /* Our offset within mapping. */
>>>>>>>>>>>>>                                  unsigned long share;    /* share count for fsdax */
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @private: Mapping-private opaque data.
>>>>>>>>>>>>>                           * Usually used for buffer_heads if PagePrivate.
>>>>>>>>>>>>>                           * Used for swp_entry_t if PageSwapCache.
>>>>>>>>>>>>>                           * Indicates order in the buddy system if PageBuddy.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long private;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> But within the page pool struct, we have a variable called
>>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr.
>>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* page_pool used by netstack */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @pp_magic: magic value to avoid recycling non
>>>>>>>>>>>>>                           * page_pool allocated pages.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long pp_magic;
>>>>>>>>>>>>>                          struct page_pool *pp;
>>>>>>>>>>>>>                          unsigned long _pp_mapping_pad;
>>>>>>>>>>>>>                          unsigned long dma_addr;
>>>>>>>>>>>>>                          atomic_long_t pp_ref_count;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct.
>>>>>>>>>>>>> So this patch replaces the "private" with "pp".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
>>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to
>>>>>>>>>>>> bother the dma stuffs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The page pool needs to do the dma by the DMA APIs.
>>>>>>>>>>> So we can not use the page pool directly.
>>>>>>>>>>
>>>>>>>>>> I found this:
>>>>>>>>>>
>>>>>>>>>> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>>>>>>>>>>                                          * map/unmap
>>>>>>>>>>
>>>>>>>>>> It seems to work here?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it
>>>>>>>>> directly. We can make the page pool to bypass the DMA operations.
>>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page
>>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
>>>>>>>>> to the page.
>>>>>>>>>
>>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
>>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
>>>>>>>>> unmapping and remapping steps.
>>>>>>>>>
>>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects
>>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished
>>>>>>>>> without a chance for unmapping.

Could Jakub's memory provider for PP help your use-case?

See: [1] 
https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
Or: [2]
https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/


[...]
>>>>>>
>>>>>> Adding Jesper for some comments.
>>>>>>
>>>>>>>
>>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
>>>>>>> the pages.
>>>>>>>

For context the patch:
  [3] 
https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/

>>>>>>> What do you think?
>>>>>>
>>>>>> I might be wrong, but I think if we need to either
>>>>>>
>>>>>> 1) seek a way to manage the pages by yourself but not touching page
>>>>>> pool metadata (or Jesper is fine with this)
>>>>>
>>>>> Do you mean working with page pool or not?
>>>>>
>>>>
>>>> I meant if Jesper is fine with reusing page pool metadata like this patch.
>>>>
>>>>> If we manage the pages by self(no page pool), we do not care the metadata is for
>>>>> page pool or not. We just use the space of pages like the "private".
>>>>
>>>> That's also fine.
>>>>

I'm not sure it is "fine" to, explicitly choosing not to use page pool,
and then (ab)use `struct page` member (pp) that intended for page_pool
for other stuff. (In this case create a linked list of pages).

  +#define page_chain_next(p)	((struct page *)((p)->pp))
  +#define page_chain_add(p, n)	((p)->pp = (void *)n)

I'm not sure that I (as PP maintainer) can make this call actually, as I
think this area belong with the MM "page" maintainers (Cc MM-list +
people) to judge.

Just invention new ways to use struct page fields without adding your
use-case to struct page, will make it harder for MM people to maintain
(e.g. make future change).

--Jesper



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ