Message-Id: <20210111062300.28541-1-saiprakash.ranjan@codeaurora.org>
Date:   Mon, 11 Jan 2021 11:52:59 +0530
From:   Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
To:     isaacm@...eaurora.org
Cc:     iommu@...ts.linux-foundation.org, joro@...tes.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        pdaly@...eaurora.org, pratikp@...eaurora.org, robin.murphy@....com,
        will@...nel.org,
        Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>
Subject: Re: [PATCH 0/5] Optimize iommu_map_sg() performance

Hi Isaac,

On 2021-01-09 07:20, Isaac J. Manjarres wrote:
> The iommu_map_sg() code currently iterates through the given
> scatter-gather list, and in the worst case, invokes iommu_map()
> for each element in the scatter-gather list, which calls into
> the IOMMU driver through an indirect call. For an IOMMU driver
> that uses a format supported by the io-pgtable code, the IOMMU
> driver will then call into the io-pgtable code to map the chunk.
> 
> Jumping between the IOMMU core code, the IOMMU driver, and the
> io-pgtable code and back for each element in a scatter-gather list
> is not efficient.
> 
> Instead, add a map_sg() hook in both the IOMMU driver ops and the
> io-pgtable ops. iommu_map_sg() can then call into the IOMMU driver's
> map_sg() hook with the entire scatter-gather list, which can call
> into the io-pgtable map_sg() hook, which can process the entire
> scatter-gather list, significantly reducing the number of indirect
> calls, and jumps between these layers, boosting performance.
> 
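To illustrate the batching idea described above, here is a minimal user-space sketch. It is not the kernel code from the series: `struct sg_elem`, `struct pgtable_ops`, the `demo_*` functions and the `indirect_calls` counter are all hypothetical stand-ins, and the real path crosses two ops tables (IOMMU driver and io-pgtable) rather than the single one modelled here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scatter-gather element. */
struct sg_elem {
	unsigned long phys;
	size_t len;
};

/* Hypothetical ops table; the real kernel structures differ. */
struct pgtable_ops {
	/* Map one chunk; returns 0 on success. */
	int (*map)(unsigned long iova, unsigned long phys, size_t len);
	/* Map a whole list in one call; returns the number of bytes mapped. */
	size_t (*map_sg)(unsigned long iova, const struct sg_elem *sg,
			 size_t nents);
};

/* Counts calls made through the ops table (illustration only). */
static unsigned long indirect_calls;

static int demo_map(unsigned long iova, unsigned long phys, size_t len)
{
	(void)iova; (void)phys; (void)len;
	return 0; /* pretend the chunk was mapped */
}

static size_t demo_map_sg(unsigned long iova, const struct sg_elem *sg,
			  size_t nents)
{
	size_t mapped = 0;

	for (size_t i = 0; i < nents; i++) {
		/* Direct call: the loop stays inside one layer. */
		if (demo_map(iova + mapped, sg[i].phys, sg[i].len))
			break;
		mapped += sg[i].len;
	}
	return mapped;
}

static const struct pgtable_ops demo_ops = {
	.map = demo_map,
	.map_sg = demo_map_sg,
};

/* Per-element path: one indirect call for every list element. */
static size_t map_sg_per_element(const struct pgtable_ops *ops,
				 unsigned long iova,
				 const struct sg_elem *sg, size_t nents)
{
	size_t mapped = 0;

	for (size_t i = 0; i < nents; i++) {
		indirect_calls++;
		if (ops->map(iova + mapped, sg[i].phys, sg[i].len))
			break;
		mapped += sg[i].len;
	}
	return mapped;
}

/* Batched path: a single indirect call hands over the whole list. */
static size_t map_sg_batched(const struct pgtable_ops *ops,
			     unsigned long iova,
			     const struct sg_elem *sg, size_t nents)
{
	indirect_calls++;
	return ops->map_sg(iova, sg, nents);
}
```

With a three-element list, `map_sg_per_element()` goes through the ops table three times while `map_sg_batched()` goes through it once, and both report the same number of bytes mapped.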
> On a system that uses the ARM SMMU driver, and the ARM LPAE format,
> the current implementation of iommu_map_sg() yields the following
> latencies for mapping scatter-gather lists of various sizes. These
> latencies are calculated by repeating the mapping operation 10 times:
> 
>     size        iommu_map_sg latency
>       4K            0.624 us
>      64K            9.468 us
>       1M          122.557 us
>       2M          239.807 us
>      12M         1435.979 us
>      24M         2884.968 us
>      32M         3832.979 us
> 
> On the same system, the proposed modifications yield the following
> results:
> 
>     size        iommu_map_sg latency
>       4K            3.645 us
>      64K            4.198 us
>       1M           11.010 us
>       2M           17.125 us
>      12M           82.416 us
>      24M          158.677 us
>      32M          210.468 us
> 
> The procedure for collecting the iommu_map_sg latencies is
> the same in both experiments. Clearly, reducing the jumps
> between the different layers in the IOMMU code offers a
> significant performance boost in iommu_map_sg() latency.
> 
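For reference, the ratio between the two quoted tables can be worked out with a small helper like the one below. The latencies are copied from the tables above; the `struct latency_row`, `speedup()` and `print_speedups()` names are made up for this sketch. Going by the quoted numbers, the ratio is below 1 for the 4K case (0.624 us before, 3.645 us after) and grows with size for everything from 64K up.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per size: latencies copied from the two quoted tables. */
struct latency_row {
	const char *size;
	double before_us; /* current iommu_map_sg() */
	double after_us;  /* with the proposed map_sg() hooks */
};

static const struct latency_row rows[] = {
	{ "4K",     0.624,   3.645 },
	{ "64K",    9.468,   4.198 },
	{ "1M",   122.557,  11.010 },
	{ "2M",   239.807,  17.125 },
	{ "12M", 1435.979,  82.416 },
	{ "24M", 2884.968, 158.677 },
	{ "32M", 3832.979, 210.468 },
};

/* Ratio > 1 means the proposed code is faster for that size. */
static double speedup(const struct latency_row *r)
{
	return r->before_us / r->after_us;
}

/* Print one line per size with the before/after ratio. */
static void print_speedups(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%4s  %.2fx\n", rows[i].size, speedup(&rows[i]));
}
```

Calling `print_speedups()` from a `main()` prints the ratio for each size in the tables.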

I gave this series a go on a Chromebook and saw the warnings below,
along with several device probe failures (logs attached below).

The WARN corresponds to this check in arm_lpae_map_by_pgsize():

	if (WARN_ON(iaext || (paddr + size) >> cfg->oas))
		return -ERANGE;
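
As a rough user-space model of what that condition tests (a sketch only, not the code from the series): `check_map_range()` is a hypothetical helper, `ias`/`oas` stand for the input and output address sizes in bits, and `iaext` is assumed here to be the IOVA bits above `ias`, computed with an unsigned shift.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the range check quoted above.
 * ias/oas are address sizes in bits and must be smaller than 64.
 * iaext is assumed to be "IOVA bits above ias"; the kernel may
 * compute it differently (e.g. with a signed shift).
 */
static int check_map_range(uint64_t iova, uint64_t paddr, uint64_t size,
			   unsigned int ias, unsigned int oas)
{
	uint64_t iaext = iova >> ias;

	/* Reject if the IOVA or the end of the physical range is too large. */
	if (iaext || ((paddr + size) >> oas))
		return -ERANGE;
	return 0;
}
```

One property of the expression as written: `paddr + size` is the exclusive end of the chunk, so a chunk that ends exactly at `1 << oas` is rejected even though its last byte is still within range. For example, with the assumed values `ias = 32` and `oas = 36`, `check_map_range(0x1000, 0xFFFFFF000, 0x1000, 32, 36)` returns `-ERANGE`, while the same call with `paddr = 0xFFFFFE000` returns 0.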

Logs:

[    2.411391] ------------[ cut here ]------------
[    2.416149] WARNING: CPU: 6 PID: 56 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[    2.425606] Modules linked in:
[    2.428749] CPU: 6 PID: 56 Comm: kworker/6:1 Not tainted 5.10.5 #970
[    2.440287] Workqueue: events deferred_probe_work_func
[    2.445563] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[    2.451726] pc : arm_lpae_map_sg+0x234/0x248
[    2.456112] lr : arm_lpae_map_sg+0xe0/0x248
[    2.460410] sp : ffffffc010513750
[    2.463820] x29: ffffffc010513790 x28: ffffffb943332000 
[    2.469281] x27: 0000000ffffff000 x26: ffffffb943d14900 
[    2.474738] x25: 0000000000001000 x24: 0000000103465000 
[    2.480196] x23: 0000000000000001 x22: 0000000103466000 
[    2.485645] x21: 0000000000000003 x20: 0000000000000a20 
[    2.491103] x19: ffffffc010513850 x18: 0000000000000001 
[    2.496562] x17: 0000000000000002 x16: 00000000ffffffff 
[    2.502021] x15: 0000000000000000 x14: 0000000000000000 
[    2.507479] x13: 0000000000000001 x12: 0000000000000000 
[    2.512928] x11: 0000001000000000 x10: 0000000000000000 
[    2.518385] x9 : 0000000000000001 x8 : 0000000040201000 
[    2.523844] x7 : 0000000000000a20 x6 : ffffffb943463000 
[    2.529302] x5 : 0000000000000003 x4 : 0000000000001000 
[    2.534760] x3 : 0000000000000001 x2 : ffffffb941f605a0 
[    2.540219] x1 : 0000000000000003 x0 : 0000000000000e40 
[    2.545679] Call trace:
[    2.548196]  arm_lpae_map_sg+0x234/0x248
[    2.552225]  arm_smmu_map_sg+0x80/0xc4
[    2.556078]  __iommu_map_sg+0x6c/0x188
[    2.559931]  iommu_map_sg_atomic+0x18/0x20
[    2.564144]  iommu_dma_alloc_remap+0x26c/0x34c
[    2.568703]  iommu_dma_alloc+0x9c/0x268
[    2.572647]  dma_alloc_attrs+0x88/0xfc
[    2.576503]  gsi_ring_alloc+0x50/0x144
[    2.580356]  gsi_init+0x2c4/0x5c4
[    2.583766]  ipa_probe+0x14c/0x2b4
[    2.587263]  platform_drv_probe+0x94/0xb4
[    2.591377]  really_probe+0x138/0x348
[    2.595145]  driver_probe_device+0x80/0xb8
[    2.599358]  __device_attach_driver+0x90/0xa8
[    2.603829]  bus_for_each_drv+0x84/0xcc
[    2.607772]  __device_attach+0xc0/0x148
[    2.611713]  device_initial_probe+0x18/0x20
[    2.616012]  bus_probe_device+0x38/0x94
[    2.619953]  deferred_probe_work_func+0x78/0xb0
[    2.624611]  process_one_work+0x210/0x3dc
[    2.628726]  worker_thread+0x284/0x3e0
[    2.632578]  kthread+0x148/0x1a8
[    2.635891]  ret_from_fork+0x10/0x18
[    2.639562] ---[ end trace 9bac18cad6a9862e ]---
[    2.644414] ipa 1e40000.ipa: error -12 allocating channel 0 event ring
[    2.651656] ipa: probe of 1e40000.ipa failed with error -12
[    2.660072] dwc3 a600000.dwc3: Adding to iommu group 8
[    2.668632] xhci-hcd xhci-hcd.13.auto: xHCI Host Controller
[    2.674680] xhci-hcd xhci-hcd.13.auto: new USB bus registered, assigned bus number 1
[    2.683638] ------------[ cut here ]------------
[    2.688391] WARNING: CPU: 6 PID: 56 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[    2.697846] Modules linked in:
[    2.700988] CPU: 6 PID: 56 Comm: kworker/6:1 Tainted: G        W         5.10.5 #970
[    2.713954] Workqueue: events deferred_probe_work_func
[    2.719228] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[    2.725390] pc : arm_lpae_map_sg+0x234/0x248
[    2.729775] lr : arm_lpae_map_sg+0xe0/0x248
[    2.734073] sp : ffffffc010512e20
[    2.737483] x29: ffffffc010512e60 x28: ffffffb94345e000 
[    2.742942] x27: 0000000ffffff000 x26: ffffffb941fa1500 
[    2.748400] x25: 0000000000001000 x24: 0000000103468000 
[    2.753858] x23: 0000000000000001 x22: 0000000103469000 
[    2.759318] x21: 0000000000000003 x20: 0000000000000a20 
[    2.764777] x19: ffffffc010512f20 x18: 0000000000000001 
[    2.770235] x17: 0000000000000001 x16: 00000000ffffffff 
[    2.775694] x15: 0000000000000000 x14: 0000000000000001 
[    2.781154] x13: 0000000000000001 x12: 0000000000000000 
[    2.786613] x11: 0000001000000000 x10: 0000000000000000 
[    2.792071] x9 : 0000000000000001 x8 : 0000000040201000 
[    2.797529] x7 : 0000000000000000 x6 : ffffffc010512f20 
[    2.802988] x5 : 0000000000000a20 x4 : 0000000000001000 
[    2.808438] x3 : 0000000000000001 x2 : ffffffb9457d6100 
[    2.813896] x1 : 0000000000000003 x0 : 0000000000000e40 
[    2.819356] Call trace:
[    2.821878]  arm_lpae_map_sg+0x234/0x248
[    2.825907]  arm_smmu_map_sg+0x80/0xc4
[    2.829761]  __iommu_map_sg+0x6c/0x188
[    2.833615]  iommu_map_sg_atomic+0x18/0x20
[    2.837827]  iommu_dma_alloc_remap+0x26c/0x34c
[    2.842386]  iommu_dma_alloc+0x9c/0x268
[    2.846329]  dma_alloc_attrs+0x88/0xfc
[    2.850184]  xhci_mem_init+0x200/0x930
[    2.854037]  xhci_init+0xc0/0xec
[    2.857350]  xhci_gen_setup+0x270/0x348
[    2.861292]  xhci_plat_setup+0x4c/0x58
[    2.865146]  usb_add_hcd+0x288/0x430
[    2.868815]  xhci_plat_probe+0x3f8/0x568
[    2.872844]  platform_drv_probe+0x94/0xb4
[    2.876957]  really_probe+0x138/0x348
[    2.880725]  driver_probe_device+0x80/0xb8
[    2.884936]  __device_attach_driver+0x90/0xa8
[    2.889407]  bus_for_each_drv+0x84/0xcc
[    2.893348]  __device_attach+0xc0/0x148
[    2.897289]  device_initial_probe+0x18/0x20
[    2.901587]  bus_probe_device+0x38/0x94
[    2.905529]  device_add+0x214/0x3c4
[    2.909112]  platform_device_add+0x198/0x208
[    2.913497]  dwc3_host_init+0x228/0x2bc
[    2.917437]  dwc3_core_init_mode+0xfc/0x18c
[    2.921735]  dwc3_probe+0x978/0xac8
[    2.925318]  platform_drv_probe+0x94/0xb4
[    2.929432]  really_probe+0x138/0x348
[    2.933200]  driver_probe_device+0x80/0xb8
[    2.937411]  __device_attach_driver+0x90/0xa8
[    2.941881]  bus_for_each_drv+0x84/0xcc
[    2.945823]  __device_attach+0xc0/0x148
[    2.949765]  device_initial_probe+0x18/0x20
[    2.954064]  bus_probe_device+0x38/0x94
[    2.958005]  device_add+0x214/0x3c4
[    2.961590]  of_device_add+0x3c/0x48
[    2.965260]  of_platform_device_create_pdata+0xac/0xec
[    2.970535]  of_platform_bus_create+0x1cc/0x348
[    2.975191]  of_platform_populate+0x78/0xc8
[    2.979490]  dwc3_qcom_probe+0x4e0/0xa88
[    2.983518]  platform_drv_probe+0x94/0xb4
[    2.987632]  really_probe+0x138/0x348
[    2.991400]  driver_probe_device+0x80/0xb8
[    2.995612]  __device_attach_driver+0x90/0xa8
[    3.000084]  bus_for_each_drv+0x84/0xcc
[    3.004025]  __device_attach+0xc0/0x148
[    3.007966]  device_initial_probe+0x18/0x20
[    3.012264]  bus_probe_device+0x38/0x94
[    3.016206]  deferred_probe_work_func+0x78/0xb0
[    3.020864]  process_one_work+0x210/0x3dc
[    3.024979]  worker_thread+0x284/0x3e0
[    3.028833]  kthread+0x148/0x1a8
[    3.032147]  ret_from_fork+0x10/0x18
[    3.035817] ---[ end trace 9bac18cad6a9862f ]---
[    3.041583] xhci-hcd xhci-hcd.13.auto: can't setup: -12
[    3.046950] xhci-hcd xhci-hcd.13.auto: USB bus 1 deregistered
[    3.053107] xhci-hcd: probe of xhci-hcd.13.auto failed with error -12
[    3.062208] coresight etm0: CPU0: ETM v4.2 initialized
[    3.063345] sdhci_msm 7c4000.sdhci: TCXO clk not present (-2)
[    3.067862] coresight etm1: CPU1: ETM v4.2 initialized
[    3.078829] ------------[ cut here ]------------
[    3.079076] coresight etm2: CPU2: ETM v4.2 initialized
[    3.083587] WARNING: CPU: 5 PID: 7 at drivers/iommu/io-pgtable-arm.c:492 arm_lpae_map_sg+0x234/0x248
[    3.083589] Modules linked in:
[    3.089200] coresight etm3: CPU3: ETM v4.2 initialized
[    3.098235] 
[    3.098241] CPU: 5 PID: 7 Comm: kworker/u16:0 Tainted: G        W         5.10.5 #970
[    3.101722] coresight etm4: CPU4: ETM v4.2 initialized
[    3.106672] Workqueue: events_unbound async_run_entry_fn
[    3.131984] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[    3.138161] pc : arm_lpae_map_sg+0x234/0x248
[    3.142557] lr : arm_lpae_map_sg+0xe0/0x248
[    3.146864] sp : ffffffc0100c3800
[    3.150272] x29: ffffffc0100c3840 x28: ffffffb942394000 
[    3.155736] x27: 0000000ffffff000 x26: ffffffb944010100 
[    3.161198] x25: 0000000000001000 x24: 0000000102326000 
[    3.166660] x23: 0000000000000001 x22: 0000000102327000 
[    3.172122] x21: 0000000000000003 x20: 0000000000000a20 
[    3.177586] x19: ffffffc0100c3900 x18: 0000000000000001 
[    3.183048] x17: 0000000000000001 x16: 00000000ffffffff 
[    3.188512] x15: 0000000000000000 x14: 0000000000000001 
[    3.193973] x13: 0000000000000001 x12: 0000000000000000 
[    3.199434] x11: 0000001000000000 x10: 0000000000000000 
[    3.204895] x9 : 0000000000000001 x8 : 0000000040201000 
[    3.210356] x7 : 0000000000000000 x6 : ffffffc0100c3900 
[    3.215818] x5 : 0000000000000a20 x4 : 0000000000001000 
[    3.221279] x3 : 0000000000000001 x2 : ffffffb94581a100 
[    3.226741] x1 : 0000000000000003 x0 : 0000000000000e40 
[    3.232203] Call trace:
[    3.234722]  arm_lpae_map_sg+0x234/0x248
[    3.238761]  arm_smmu_map_sg+0x80/0xc4
[    3.242615]  __iommu_map_sg+0x6c/0x188
[    3.246468]  iommu_map_sg_atomic+0x18/0x20
[    3.250680]  iommu_dma_alloc_remap+0x26c/0x34c
[    3.255250]  iommu_dma_alloc+0x9c/0x268
[    3.259204]  dma_alloc_attrs+0x88/0xfc
[    3.263058]  sdhci_setup_host+0x250/0xc8c
[    3.267185]  sdhci_msm_cqe_add_host+0x38/0x188
[    3.271756]  sdhci_msm_probe+0x540/0x628
[    3.275795]  platform_drv_probe+0x94/0xb4
[    3.279921]  really_probe+0x138/0x348
[    3.283687]  driver_probe_device+0x80/0xb8
[    3.287899]  __device_attach_driver+0x90/0xa8
[    3.292381]  bus_for_each_drv+0x84/0xcc
[    3.296332]  __device_attach_async_helper+0x80/0xd4
[    3.301347]  async_run_entry_fn+0x4c/0x100
[    3.305559]  process_one_work+0x210/0x3dc
[    3.309684]  worker_thread+0x234/0x3e0
[    3.313537]  kthread+0x148/0x1a8
[    3.316861]  ret_from_fork+0x10/0x18
[    3.320541] ---[ end trace 9bac18cad6a98630 ]---
[    3.325372] mmc1: Unable to allocate ADMA buffers - falling back to standard DMA
<snip>...
[    3.587535] mmc1: running CQE recovery
[    3.591419] mmc1: Got command interrupt 0x00010001 even though no command operation was in progress.
[    3.600796] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[    3.607417] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00007202
[    3.614038] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000108

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
