Date:   Thu, 29 Sep 2022 13:19:57 -0700
From:   Ankur Arora <ankur.a.arora@...cle.com>
To:     Paul Moore <paul@...l-moore.com>
Cc:     Ankur Arora <ankur.a.arora@...cle.com>, linux-audit@...hat.com,
        eparis@...hat.com, linux-kernel@...r.kernel.org,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()


Paul Moore <paul@...l-moore.com> writes:

> On Tue, Sep 27, 2022 at 7:00 PM Ankur Arora <ankur.a.arora@...cle.com> wrote:
>>
>> With sane audit rules, audit logging would only be triggered
>> infrequently. Keeping this in mind, annotate audit_in_mask() as
>> unlikely() to allow the compiler to pessimize the call to
>> audit_filter_rules().
>>
>> This allows GCC to invert the branch direction for the audit_filter_rules()
>> basic block in this loop:
>>
>>         list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
>>                 if (audit_in_mask(&e->rule, major) &&
>>                     audit_filter_rules(tsk, &e->rule, ctx, NULL,
>>                                        &state, false)) {
>>                         ...
>>
>> such that it executes the common case in a straight line fashion.
>>
>> On a Skylakex system, the change in getpid() latency (all results
>> aggregated across 12 boot cycles):
>>
>>          Min     Mean    Median   Max      pstdev
>>          (ns)    (ns)    (ns)     (ns)
>>
>>  -    196.63   207.86  206.60  230.98      (+- 3.92%)
>>  +    173.11   182.51  179.65  202.09      (+- 4.34%)
>>
>> Performance counter stats for 'bin/getpid' (3 runs) go from:
>>     cycles               805.58  (  +-  4.11% )
>>     instructions        1654.11  (  +-   .05% )
>>     IPC                    2.06  (  +-  3.39% )
>>     branches             430.02  (  +-   .05% )
>>     branch-misses          1.55  (  +-  7.09% )
>>     L1-dcache-loads      440.01  (  +-   .09% )
>>     L1-dcache-load-misses  9.05  (  +- 74.03% )
>>
>>  to:
>>     cycles               706.13  (  +-  4.13% )
>>     instructions        1654.70  (  +-   .06% )
>>     IPC                    2.35  (  +-  4.25% )
>>     branches             430.99  (  +-   .06% )
>>     branch-misses          0.50  (  +-  2.00% )
>>     L1-dcache-loads      440.02  (  +-   .07% )
>>     L1-dcache-load-misses  5.22  (  +- 82.75% )
>>
>> (Both aggregated over 12 boot cycles.)
>>
>> cycles: performance improves on average by ~100 cycles/call. IPC
>> improves commensurately. Two reasons for this improvement:
>>
>>   * one fewer branch mispred: no obvious reason for this
>>     branch-miss reduction. There is no significant change in
>>     basic-block structure (apart from the branch inversion.)
>>
>>   * the direction of the branch for the call is now inverted, so it
>>     chooses the not-taken direction more often. The issue latency for
>>     not-taken branches is often lower.
>>
>> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
>> ---
>>  kernel/auditsc.c | 15 ++++++++-------
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> I generally dislike merging likely()/unlikely() additions to code
> paths that can have varying levels of performance depending on runtime
> configuration.

I think that's fair, and in this particular case the benchmark is quite
contrived.

But just to elaborate a bit more on why that unlikely() annotation made
sense to me: audit would typically only be triggered for control
syscalls, and the ratio of control to non-control syscalls would be
fairly lopsided.

Let me see if I can rewrite the conditional in a different way to get a
similar effect, though I suspect that might be even more
compiler-dependent.
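
For example, something roughly like the following (an untested sketch
only; whether GCC keeps the audit_filter_rules() block off the
straight-line path without the explicit hint is exactly the
compiler-dependent part):

	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
		/* Skip rules whose mask doesn't cover this syscall. */
		if (!audit_in_mask(&e->rule, major))
			continue;

		/* Rare path: a rule covers this syscall number. */
		if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
				       &state, false)) {
			rcu_read_unlock();
			ctx->current_state = state;
			return;
		}
	}

The idea being that, with the early continue, the common no-match case
stays on the fall-through path, though nothing forces the compiler to
lay it out that way.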

Also, let me run the audit-testsuite this time. Is there a test in there
that you would recommend as a more representative workload?


Thanks
Ankur

> While I appreciate the work you are doing to improve
> audit performance, I don't think this is something I want to merge,
> I'm sorry.



>
>> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> index 533b087c3c02..bf26f47b5226 100644
>> --- a/kernel/auditsc.c
>> +++ b/kernel/auditsc.c
>> @@ -789,7 +789,7 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
>>         return AUDIT_STATE_BUILD;
>>  }
>>
>> -static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
>> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
>>  {
>>         int word, bit;
>>
>> @@ -850,12 +850,13 @@ static void audit_filter_syscall(struct task_struct *tsk,
>>
>>         rcu_read_lock();
>>         list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
>> -               if (audit_in_mask(&e->rule, major) &&
>> -                   audit_filter_rules(tsk, &e->rule, ctx, NULL,
>> -                                      &state, false)) {
>> -                       rcu_read_unlock();
>> -                       ctx->current_state = state;
>> -                       return;
>> +               if (unlikely(audit_in_mask(&e->rule, major))) {
>> +                       if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
>> +                                              &state, false)) {
>> +                               rcu_read_unlock();
>> +                               ctx->current_state = state;
>> +                               return;
>> +                       }
>>                 }
>>         }
>>         rcu_read_unlock();
>> --
>> 2.31.1


--
ankur
