lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Fri,  7 Jul 2023 11:39:23 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: netdev@...r.kernel.org
Cc: almasrymina@...gle.com,
	hawk@...nel.org,
	ilias.apalodimas@...aro.org,
	edumazet@...gle.com,
	dsahern@...il.com,
	michael.chan@...adcom.com,
	willemb@...gle.com,
	Jakub Kicinski <kuba@...nel.org>
Subject: [RFC 00/12] net: huge page backed page_pool

Hi!

This is an "early PoC" at best. It seems to work for a basic
traffic test but there's no uAPI and a lot more general polish
is needed.

The problem we're seeing is that performance of some older NICs
degrades quite a bit when IOMMU is used (in non-passthru mode).
There is a long tail of old NICs deployed, especially in PoPs/
/on edge. From a conversation I had with Eric a few months
ago it sounded like others may have similar issues. So I thought
I'd take a swing at getting page pool to feed drivers huge pages.
1G pages require hooking into early init via CMA but it works
just fine.

I haven't tested this with a real workload, because I'm still
waiting to get my hands on the right machine. But the experiment
with bnxt shows a ~90% reduction in IOTLB misses (670k -> 70k).

In terms of the missing parts - uAPI is definitely needed.
The rough plan would be to add memory config via the netdev
genl family. Should fit nicely there. Have the config stored
in struct netdevice. When page pool is created get to the netdev
and automatically select the provider without the driver even
knowing. Two problems with that are - 1) if the driver follows
the recommended flow of allocating new queues before freeing
old ones we will have page pools created before the old ones
are gone, which means we'd need to reserve 2x the number of
1G pages; 2) there's no callback to the driver to say "I did
something behind your back, don't worry about it, but recreate
your queues, please" so the change will not take effect until
some unrelated change like installing XDP. Which may be fine
in practice but is a bit odd.

Then we get into hand-wavy stuff like - if we can link page
pools to netdevs, we should also be able to export the page pool
stats via the netdev family instead doing it the ethtool -S.. ekhm..
"way". And if we start storing configs behind driver's back why
don't we also store other params, like ring size and queue count...
A lot of potential improvements as we iron out a new API...

Live tree: https://github.com/kuba-moo/linux/tree/pp-providers

Jakub Kicinski (12):
  net: hack together some page sharing
  net: create a 1G-huge-page-backed allocator
  net: page_pool: hide page_pool_release_page()
  net: page_pool: merge page_pool_release_page() with
    page_pool_return_page()
  net: page_pool: factor out releasing DMA from releasing the page
  net: page_pool: create hooks for custom page providers
  net: page_pool: add huge page backed memory providers
  eth: bnxt: let the page pool manage the DMA mapping
  eth: bnxt: use the page pool for data pages
  eth: bnxt: make sure we make for recycle skbs before freeing them
  eth: bnxt: wrap coherent allocations into helpers
  eth: bnxt: hack in the use of MEP

 Documentation/networking/page_pool.rst        |  10 +-
 arch/x86/kernel/setup.c                       |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 154 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   5 +
 drivers/net/ethernet/engleder/tsnep_main.c    |   2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |   4 +-
 include/net/dcalloc.h                         |  28 +
 include/net/page_pool.h                       |  36 +-
 net/core/Makefile                             |   2 +-
 net/core/dcalloc.c                            | 615 +++++++++++++++++
 net/core/dcalloc.h                            |  96 +++
 net/core/page_pool.c                          | 625 +++++++++++++++++-
 12 files changed, 1478 insertions(+), 105 deletions(-)
 create mode 100644 include/net/dcalloc.h
 create mode 100644 net/core/dcalloc.c
 create mode 100644 net/core/dcalloc.h

-- 
2.41.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ