linux-kernel - Re: [PATCH v3 1/2] selinux: Make avc cache slot size configurable during boot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251017081050.1171969-1-zhanghongru@xiaomi.com>
Date: Fri, 17 Oct 2025 16:10:50 +0800
From: Hongru Zhang <zhanghongru06@...il.com>
To: paul@...l-moore.com
Cc: linux-kernel@...r.kernel.org,
	omosnace@...hat.com,
	selinux@...r.kernel.org,
	stephen.smalley.work@...il.com,
	zhanghongru06@...il.com,
	zhanghongru@...omi.com
Subject: Re: [PATCH v3 1/2] selinux: Make avc cache slot size configurable during boot

> On Sep 26, 2025 Hongru Zhang <zhanghongru06@...il.com> wrote:
> > 
> > On mobile device high-load situations, permission check can happen
> > more than 90,000/s (8 core system). With default 512 cache nodes
> > configuration, avc cache miss happens more often and occasionally
> > leads to long time (>2ms) irqs off on both big and little cores,
> > which decreases system real-time capability.
> > 
> > An actual call stack is as follows:
> >  => avc_compute_av
> >  => avc_perm_nonode
> >  => avc_has_perm_noaudit
> >  => selinux_capable
> >  => security_capable
> >  => capable
> >  => __sched_setscheduler
> >  => do_sched_setscheduler
> >  => __arm64_sys_sched_setscheduler
> >  => invoke_syscall
> >  => el0_svc_common
> >  => do_el0_svc
> >  => el0_svc
> >  => el0t_64_sync_handler
> >  => el0t_64_sync
> > 
> > Although we can expand avc nodes through /sys/fs/selinux/cache_threshold
> > to mitigate long time irqs off, hash conflicts make the bucket average
> > length longer because of the fixed size of cache slots, leading to
> > avc_search_node latency increase.
> > 
> > Make avc cache slot size also configurable, and with fine tuning, we can
> > mitigate long time irqs off with slightly avc_search_node performance
> > regression.
> > 
> > Theoretically, the main overhead is memory consumption.
> > 
> > avc_search_node avg latency test results (about 100,000,000 times) on
> > Qcom SM8750, 6.6.30-android15-8:
> > 
> > Case 1:
> > +---------+---------------------+------------------------+
> > |         | no-patch (512/512)  | with-patch (512/512)   |
> > +---------+---------------------+------------------------+
> > | latency |        85 ns        |         87 ns          |
> > +---------+---------------------+------------------------+
> > 
> > Case 2:
> > +---------+---------------------+------------------------+
> > |         | no-patch (8192/512) | with-patch (8192/8192) |
> > +---------+---------------------+------------------------+
> > | latency |        277 ns       |         106 ns         |
> > +---------+---------------------+------------------------+
> > 
> > Case 1 shows 512 nodes configuration has ~2% performance regression
> > with patch.
> > Case 2 shows 8192 nodes configuration has ~61% latency benifit with
> > patch.
> > 
> > Signed-off-by: Hongru Zhang <zhanghongru@...omi.com>
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@...il.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  4 ++
> >  security/selinux/avc.c                        | 68 +++++++++++++------
> >  2 files changed, 50 insertions(+), 22 deletions(-)
> 
> I would expect the number of active AVC nodes, and AVC churn in general,
> to be very policy dependent; some policies and use cases simply result in
> more AVC nodes than others.  With that in mind, I'm wondering if instead
> of using a kernel command line parameter to specify the number of AVC
> buckets, we should instead include an AVC size "hint" in the policy that
> we can use to size the AVC when loading a new policy.
> 
> Thoughts?

I previously considered supporting dynamic adjustment of slot size during
runtime, but this seems to introduce code complexity and overhead. Every
time avc_lookup() or avc_insert() happens, we would need to check if the
table exists. Adjusting slot size and accessing a specific slot might
occur simultaneously, potentially requiring additional lock protection.

When increasing slot size, we could directly copy the contents from the
old table. When decreasing slot size, nodes exceeding the new slot size
would need to be re-hashed and attached to appropriate positions.

On my Android device, policies are fixed before system image release and
don't change or load dynamically during system running. Using kernel
parameters for adjustment ensures no additional locks or checks are needed
during runtime table access, maintaining simplicity and efficiency of the
lookup code.

If this is necessary, I would appreciate some assistance and need time
to think and develop it. I'm not a senior developer in this field and
lack sufficient knowledge in this area.

To support this, I think I need to:
1. Update the relevant specifications to add descriptions for this new
    field;
2. Modify the compiler to support this new field while ensuring
    compatibility in the generated policy file;
3. Modify the kernel to parse this field and adjust the slot size
    accordingly based on its value.

I'm unsure if this is correct and comprehensive. I would be very grateful
for any guidance.

> I think it would be important to consider it strictly as a "hint" as
> that would make life easier, e.g. if the previous policy hinted at a
> larger AVC we may not want to bother with reducing the number of buckets.
> I would suggest starting with an implementation that uses the hint as a
> power of two for the number of AVC slots/buckets, with a value of '0'
> indicating a default value (512 slots, e.g. '2^9').

Would this approach lead to slot size monotonically increasing, always 
remaining at the maximum size? This could potentially cause some memory
waste, though the total amount might not be very large.