lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4d02978c-03c0-48fe-84eb-0f3fa0c54fea@os.amperecomputing.com>
Date: Mon, 14 Apr 2025 14:24:41 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, will@...nel.org,
 catalin.marinas@....com, Miko.Lenczewski@....com,
 scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block
 mapping when rodata=full



On 4/14/25 6:03 AM, Ryan Roberts wrote:
> On 10/04/2025 23:00, Yang Shi wrote:
>> Hi Ryan,
>>
>> I know you may have a lot of things to follow up after LSF/MM. Just gently ping,
>> hopefully we can resume the review soon.
> Hi, I'm out on holiday at the moment, returning on the 22nd April. But I'm very
> keen to move this series forward so will come back to you next week. (although
> TBH, I thought I was waiting for you to respond to me... :-| )
>
> FWIW, having thought about it a bit more, I think some of the suggestions I
> previously made may not have been quite right, but I'll elaborate next week. I'm
> keen to build a pgtable splitting primitive here that we can reuse with vmalloc
> as well to enable huge mappings by default with vmalloc too.

Sounds good. I think the patches can support splitting vmalloc page 
table too. Anyway we can discuss more after you are back. Enjoy your 
holiday.

Thanks,
Yang

>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>
>> On 3/13/25 10:40 AM, Yang Shi wrote:
>>>
>>> On 3/13/25 10:36 AM, Ryan Roberts wrote:
>>>> On 13/03/2025 17:28, Yang Shi wrote:
>>>>> Hi Ryan,
>>>>>
>>>>> I saw Miko posted a new spin of his patches. There are some slight changes that
>>>>> have impact to my patches (basically check the new boot parameter). Do you
>>>>> prefer I rebase my patches on top of his new spin right now then restart review
>>>>> from the new spin or review the current patches then solve the new review
>>>>> comments and rebase to Miko's new spin together?
>>>> Hi Yang,
>>>>
>>>> Sorry I haven't got to reviewing this version yet, it's in my queue!
>>>>
>>>> I'm happy to review against v3 as it is. I'm familiar with Miko's series and am
>>>> not too bothered about the integration with that; I think it's pretty straight
>>>> forward. I'm more interested in how you are handling the splitting, which I
>>>> think is the bulk of the effort.
>>> Yeah, sure, thank you.
>>>
>>>> I'm hoping to get to this next week before heading out to LSF/MM the following
>>>> week (might I see you there?)
>>> Unfortunately I can't make it this year. Have a fun!
>>>
>>> Thanks,
>>> Yang
>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>>> Thanks,
>>>>> Yang
>>>>>
>>>>>
>>>>> On 3/4/25 2:19 PM, Yang Shi wrote:
>>>>>> Changelog
>>>>>> =========
>>>>>> v3:
>>>>>>      * Rebased to v6.14-rc4.
>>>>>>      * Based on Miko's BBML2 cpufeature patch (https://lore.kernel.org/linux-
>>>>>> arm-kernel/20250228182403.6269-3-miko.lenczewski@....com/).
>>>>>>        Also included in this series in order to have the complete patchset.
>>>>>>      * Enhanced __create_pgd_mapping() to handle split as well per Ryan.
>>>>>>      * Supported CONT mappings per Ryan.
>>>>>>      * Supported asymmetric system by splitting kernel linear mapping if such
>>>>>>        system is detected per Ryan. I don't have such system to test, so the
>>>>>>        testing is done by hacking kernel to call linear mapping repainting
>>>>>>        unconditionally. The linear mapping doesn't have any block and cont
>>>>>>        mappings after booting.
>>>>>>
>>>>>> RFC v2:
>>>>>>      * Used allowlist to advertise BBM lv2 on the CPUs which can handle TLB
>>>>>>        conflict gracefully per Will Deacon
>>>>>>      * Rebased onto v6.13-rc5
>>>>>>      * https://lore.kernel.org/linux-arm-kernel/20250103011822.1257189-1-
>>>>>> yang@...amperecomputing.com/
>>>>>>
>>>>>> RFC v1: https://lore.kernel.org/lkml/20241118181711.962576-1-
>>>>>> yang@...amperecomputing.com/
>>>>>>
>>>>>> Description
>>>>>> ===========
>>>>>> When rodata=full kernel linear mapping is mapped by PTE due to arm's
>>>>>> break-before-make rule.
>>>>>>
>>>>>> A number of performance issues arise when the kernel linear map is using
>>>>>> PTE entries due to arm's break-before-make rule:
>>>>>>      - performance degradation
>>>>>>      - more TLB pressure
>>>>>>      - memory waste for kernel page table
>>>>>>
>>>>>> These issues can be avoided by specifying rodata=on the kernel command
>>>>>> line but this disables the alias checks on page table permissions and
>>>>>> therefore compromises security somewhat.
>>>>>>
>>>>>> With FEAT_BBM level 2 support it is no longer necessary to invalidate the
>>>>>> page table entry when changing page sizes.  This allows the kernel to
>>>>>> split large mappings after boot is complete.
>>>>>>
>>>>>> This patch adds support for splitting large mappings when FEAT_BBM level 2
>>>>>> is available and rodata=full is used. This functionality will be used
>>>>>> when modifying page permissions for individual page frames.
>>>>>>
>>>>>> Without FEAT_BBM level 2 we will keep the kernel linear map using PTEs
>>>>>> only.
>>>>>>
>>>>>> If the system is asymmetric, the kernel linear mapping may be repainted once
>>>>>> the BBML2 capability is finalized on all CPUs.  See patch #6 for more details.
>>>>>>
>>>>>> We saw significant performance increases in some benchmarks with
>>>>>> rodata=full without compromising the security features of the kernel.
>>>>>>
>>>>>> Testing
>>>>>> =======
>>>>>> The test was done on AmpereOne machine (192 cores, 1P) with 256GB memory and
>>>>>> 4K page size + 48 bit VA.
>>>>>>
>>>>>> Function test (4K/16K/64K page size)
>>>>>>      - Kernel boot.  Kernel needs change kernel linear mapping permission at
>>>>>>        boot stage, if the patch didn't work, kernel typically didn't boot.
>>>>>>      - Module stress from stress-ng. Kernel module load change permission for
>>>>>>        linear mapping.
>>>>>>      - A test kernel module which allocates 80% of total memory via vmalloc(),
>>>>>>        then change the vmalloc area permission to RO, this also change linear
>>>>>>        mapping permission to RO, then change it back before vfree(). Then
>>>>>> launch
>>>>>>        a VM which consumes almost all physical memory.
>>>>>>      - VM with the patchset applied in guest kernel too.
>>>>>>      - Kernel build in VM with guest kernel which has this series applied.
>>>>>>      - rodata=on. Make sure other rodata mode is not broken.
>>>>>>      - Boot on the machine which doesn't support BBML2.
>>>>>>
>>>>>> Performance
>>>>>> ===========
>>>>>> Memory consumption
>>>>>> Before:
>>>>>> MemTotal:       258988984 kB
>>>>>> MemFree:        254821700 kB
>>>>>>
>>>>>> After:
>>>>>> MemTotal:       259505132 kB
>>>>>> MemFree:        255410264 kB
>>>>>>
>>>>>> Around 500MB more memory are free to use.  The larger the machine, the
>>>>>> more memory saved.
>>>>>>
>>>>>> Performance benchmarking
>>>>>> * Memcached
>>>>>> We saw performance degradation when running Memcached benchmark with
>>>>>> rodata=full vs rodata=on.  Our profiling pointed to kernel TLB pressure.
>>>>>> With this patchset we saw ops/sec is increased by around 3.5%, P99
>>>>>> latency is reduced by around 9.6%.
>>>>>> The gain mainly came from reduced kernel TLB misses.  The kernel TLB
>>>>>> MPKI is reduced by 28.5%.
>>>>>>
>>>>>> The benchmark data is now on par with rodata=on too.
>>>>>>
>>>>>> * Disk encryption (dm-crypt) benchmark
>>>>>> Ran fio benchmark with the below command on a 128G ramdisk (ext4) with disk
>>>>>> encryption (by dm-crypt).
>>>>>> fio --directory=/data --random_generator=lfsr --norandommap --randrepeat 1 \
>>>>>>        --status-interval=999 --rw=write --bs=4k --loops=1 --ioengine=sync \
>>>>>>        --iodepth=1 --numjobs=1 --fsync_on_close=1 --group_reporting --thread \
>>>>>>        --name=iops-test-job --eta-newline=1 --size 100G
>>>>>>
>>>>>> The IOPS is increased by 90% - 150% (the variance is high, but the worst
>>>>>> number of good case is around 90% more than the best number of bad case).
>>>>>> The bandwidth is increased and the avg clat is reduced proportionally.
>>>>>>
>>>>>> * Sequential file read
>>>>>> Read 100G file sequentially on XFS (xfs_io read with page cache populated).
>>>>>> The bandwidth is increased by 150%.
>>>>>>
>>>>>>
>>>>>> Mikołaj Lenczewski (1):
>>>>>>          arm64: Add BBM Level 2 cpu feature
>>>>>>
>>>>>> Yang Shi (5):
>>>>>>          arm64: cpufeature: add AmpereOne to BBML2 allow list
>>>>>>          arm64: mm: make __create_pgd_mapping() and helpers non-void
>>>>>>          arm64: mm: support large block mapping when rodata=full
>>>>>>          arm64: mm: support split CONT mappings
>>>>>>          arm64: mm: split linear mapping if BBML2 is not supported on secondary
>>>>>> CPUs
>>>>>>
>>>>>>     arch/arm64/Kconfig                  |  11 +++++
>>>>>>     arch/arm64/include/asm/cpucaps.h    |   2 +
>>>>>>     arch/arm64/include/asm/cpufeature.h |  15 ++++++
>>>>>>     arch/arm64/include/asm/mmu.h        |   4 ++
>>>>>>     arch/arm64/include/asm/pgtable.h    |  12 ++++-
>>>>>>     arch/arm64/kernel/cpufeature.c      |  95 ++++++++++++++++++++++++++++++
>>>>>> +++++++
>>>>>>     arch/arm64/mm/mmu.c                 | 397 ++++++++++++++++++++++++++++++
>>>>>> ++++
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> ++++++++++++++++++++++-------------------
>>>>>>     arch/arm64/mm/pageattr.c            |  37 ++++++++++++---
>>>>>>     arch/arm64/tools/cpucaps            |   1 +
>>>>>>     9 files changed, 518 insertions(+), 56 deletions(-)
>>>>>>
>>>>>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ