Message-ID: <83d3f5c1-1f3f-a08e-1632-df8bc7b8ab7b@intel.com>
Date:   Mon, 30 Jan 2023 11:03:18 +0100
From:   Alexander Lobakin <alexandr.lobakin@...el.com>
To:     Jacob Keller <jacob.e.keller@...el.com>
CC:     Intel Wired LAN <intel-wired-lan@...ts.osuosl.org>,
        Anthony Nguyen <anthony.l.nguyen@...el.com>,
        Pavan Kumar Linga <pavan.kumar.linga@...el.com>,
        <netdev@...r.kernel.org>
Subject: Re: [Intel-wired-lan] [PATCH net-next v2 05/13] ice: Fix RDMA latency
 issue by allowing write-combining

From: Jacob Keller <jacob.e.keller@...el.com>
Date: Wed, 18 Jan 2023 17:16:45 -0800

> The current method of mapping the entire BAR region as a single uncacheable
> region does not allow RDMA to use write combining (WC). This results in
> increased latency with RDMA.
> 
> To fix this, we initially planned to reduce the size of the map made by the
> PF driver to include only up to the beginning of the RDMA space.
> Unfortunately this will not work in the future as there are some hardware
> features which use registers beyond the RDMA area. This includes Scalable
> IOV, a virtualization feature being worked on currently.
> 
> Instead of simply reducing the size of the map, we need a solution which
> will allow access to all areas of the address space while leaving the RDMA
> area open to be mapped with write combining.
> 
> To allow for this, and fix the RDMA latency issue without blocking the
> higher areas of the BAR, we need to create multiple separate memory maps.
> Doing so will create a sparse mapping rather than a contiguous single area.
> 
> Replace the void *hw_addr with a special ice_hw_addr structure which
> represents the multiple mappings as a flexible array.
> 
> Based on the available BAR size, map up to 3 regions:
> 
>  * The space before the RDMA section
>  * The RDMA section which wants write combining behavior
>  * The space after the RDMA section

Please don't.

You have[0]:

* io_mapping_init_wc() (+ io_mapping_fini());
* io_mapping_create_wc() (+ io_mapping_free());

^ they do the same thing (the second one just allocates the struct ad hoc,
  while with the first one it can be allocated manually or embedded into a
  driver structure);

* arch_phys_wc_add() (+ arch_phys_wc_del())[1];

^ optional, to keep MTRRs happy

-- precisely for the case where you need to remap *a part* of a BAR in a
different mode.

Splitting BARs, dropping pcim_iomap_regions() and so on is very wrong.
Not to mention that it's the PCI driver that must own and map all the
memory the device advertises in its PCI config space, and in the case of
ice, the PCI driver is combined with the Ethernet driver, so it's ice
that must own and map all the memory.
Besides that, using a structure with a flex array and creating a static
inline to calculate the pointer each time you need to read/write a
register hurts performance and looks rather ugly.

The interfaces above must be used by the RDMA driver, right before
mapping its part in WC mode. The PCI driver has no idea that someone else
wants to remap its memory differently, so the code doesn't belong here.
I'd drop the patch and let the RDMA team fix/improve their driver.

> 
> Add an ice_get_hw_addr function which converts a register offset into the
> appropriate kernel address based on which chunk it falls into. This does
> cost us slightly more computation overhead for register access as we now
> must check the table on each access. However, we can pre-compute the
> addresses where this would be most problematic.
> 
> With this change, the RDMA driver is now free to map the RDMA register
> section as write-combined without impacting access to other device
> registers used by the main PF driver.
> 
> Reported-by: Dave Ertman <david.m.ertman@...el.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@...el.com>
> ---
> Changes since v1:
> * Export ice_get_hw_addr
> * Use ice_get_hw_addr in iRDMA driver
> * Fix the WARN_ON to use %pa instead of %llx for printing a resource_size_t
> 
>  drivers/infiniband/hw/irdma/main.c           |   2 +-
>  drivers/net/ethernet/intel/ice/ice.h         |   4 +-
>  drivers/net/ethernet/intel/ice/ice_base.c    |   5 +-
>  drivers/net/ethernet/intel/ice/ice_ethtool.c |   3 +-
>  drivers/net/ethernet/intel/ice/ice_main.c    | 177 +++++++++++++++++--
>  drivers/net/ethernet/intel/ice/ice_osdep.h   |  48 ++++-
>  drivers/net/ethernet/intel/ice/ice_txrx.h    |   2 +-
>  drivers/net/ethernet/intel/ice/ice_type.h    |   2 +-
>  8 files changed, 219 insertions(+), 24 deletions(-)
[0] https://elixir.bootlin.com/linux/v6.2-rc6/source/include/linux/io-mapping.h#L42
[1] https://elixir.bootlin.com/linux/v6.2-rc6/source/arch/x86/include/asm/io.h#L339

Thanks,
Olek
