Message-ID: <19d3cb0b-e5ec-4a35-9ec5-06522903a80c@linux.microsoft.com>
Date:   Fri, 13 Oct 2023 16:06:15 -0400
From:   Dan Clash <daclash@...ux.microsoft.com>
To:     Paul Moore <paul@...l-moore.com>, Jens Axboe <axboe@...nel.dk>
Cc:     Christian Brauner <brauner@...nel.org>, audit@...r.kernel.org,
        io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, dan.clash@...rosoft.com
Subject: Re: [PATCH] audit,io_uring: io_uring openat triggers audit reference
 count underflow



On 2023-10-13 11:43, Paul Moore wrote:
> On Fri, Oct 13, 2023 at 10:21 AM Jens Axboe <axboe@...nel.dk> wrote:
>> On 10/13/23 2:24 AM, Christian Brauner wrote:
>>> On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote:
>>>> An io_uring openat operation can update an audit reference count
>>>> from multiple threads resulting in the call trace below.
>>>>
>>>> A call to io_uring_submit() with a single openat op with a flag of
>>>> IOSQE_ASYNC results in the following reference count updates.
>>>>
>>>> The first part of the system call performs two increments that do not race.
>>>>
>>>> do_syscall_64()
>>>>    __do_sys_io_uring_enter()
>>>>      io_submit_sqes()
>>>>        io_openat_prep()
>>>>          __io_openat_prep()
>>>>            getname()
>>>>              getname_flags()       /* update 1 (increment) */
>>>>                __audit_getname()   /* update 2 (increment) */
>>>>
>>>> The openat op is queued to an io_uring worker thread which starts the
>>>> opportunity for a race.  The system call exit performs one decrement.
>>>>
>>>> do_syscall_64()
>>>>    syscall_exit_to_user_mode()
>>>>      syscall_exit_to_user_mode_prepare()
>>>>        __audit_syscall_exit()
>>>>          audit_reset_context()
>>>>             putname()              /* update 3 (decrement) */
>>>>
>>>> The io_uring worker thread performs one increment and two decrements.
>>>> These updates can race with the system call decrement.
>>>>
>>>> io_wqe_worker()
>>>>    io_worker_handle_work()
>>>>      io_wq_submit_work()
>>>>        io_issue_sqe()
>>>>          io_openat()
>>>>            io_openat2()
>>>>              do_filp_open()
>>>>                path_openat()
>>>>                  __audit_inode()   /* update 4 (increment) */
>>>>              putname()             /* update 5 (decrement) */
>>>>          __audit_uring_exit()
>>>>            audit_reset_context()
>>>>              putname()             /* update 6 (decrement) */
>>>>
>>>> The fix is to change the refcnt member of struct audit_names
>>>> from int to atomic_t.
>>>>
>>>> kernel BUG at fs/namei.c:262!
>>>> Call Trace:
>>>> ...
>>>>   ? putname+0x68/0x70
>>>>   audit_reset_context.part.0.constprop.0+0xe1/0x300
>>>>   __audit_uring_exit+0xda/0x1c0
>>>>   io_issue_sqe+0x1f3/0x450
>>>>   ? lock_timer_base+0x3b/0xd0
>>>>   io_wq_submit_work+0x8d/0x2b0
>>>>   ? __try_to_del_timer_sync+0x67/0xa0
>>>>   io_worker_handle_work+0x17c/0x2b0
>>>>   io_wqe_worker+0x10a/0x350
>>>>
>>>> Cc: <stable@...r.kernel.org>
>>>> Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
>>>> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
>>>> Signed-off-by: Dan Clash <daclash@...ux.microsoft.com>
>>>> ---
>>>>   fs/namei.c         | 9 +++++----
>>>>   include/linux/fs.h | 2 +-
>>>>   kernel/auditsc.c   | 8 ++++----
>>>>   3 files changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/namei.c b/fs/namei.c
>>>> index 567ee547492b..94565bd7e73f 100644
>>>> --- a/fs/namei.c
>>>> +++ b/fs/namei.c
>>>> @@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty)
>>>>               }
>>>>       }
>>>>
>>>> -    result->refcnt = 1;
>>>> +    atomic_set(&result->refcnt, 1);
>>>>       /* The empty path is special. */
>>>>       if (unlikely(!len)) {
>>>>               if (empty)
>>>> @@ -249,7 +249,7 @@ getname_kernel(const char * filename)
>>>>       memcpy((char *)result->name, filename, len);
>>>>       result->uptr = NULL;
>>>>       result->aname = NULL;
>>>> -    result->refcnt = 1;
>>>> +    atomic_set(&result->refcnt, 1);
>>>>       audit_getname(result);
>>>>
>>>>       return result;
>>>> @@ -261,9 +261,10 @@ void putname(struct filename *name)
>>>>       if (IS_ERR(name))
>>>>               return;
>>>>
>>>> -    BUG_ON(name->refcnt <= 0);
>>>> +    if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
>>>> +            return;
>>>>
>>>> -    if (--name->refcnt > 0)
>>>> +    if (!atomic_dec_and_test(&name->refcnt))
>>>>               return;
>>>
>>> Fine by me. I'd write this as:
>>>
>>> count = atomic_dec_if_positive(&name->refcnt);
>>> if (WARN_ON_ONCE(unlikely(count < 0)))
>>>        return;
>>> if (count > 0)
>>>        return;
>>
>> Would be fine too, my suspicion was that most archs don't implement a
>> primitive for that, and hence it might be more expensive than
>> atomic_read()/atomic_dec_and_test() which do. But I haven't looked at
>> the code generation. The dec_if_positive degenerates to an atomic cmpxchg
>> for most cases.
> 
> I'm not too concerned, either approach works for me, the important bit
> is moving to an atomic_t/refcount_t so we can protect ourselves
> against the race.  The patch looks good to me and I'd like to get this
> fix merged.
> 
> Dan, barring any further back-and-forth on the putname() change, I
> would say to go ahead and make the change Christian suggested and
> repost the patch.  Based on Jens' comment above it seems safe to
> preserve his 'Reviewed-by:' tag on the next revision.  Assuming there
> are no objections posted in the meantime, I'll plan to merge the next
> revision into the audit/stable-6.6 branch and get that up to Linus
> (likely next week since it's Friday).

I did not see many arch implementations of atomic_dec_if_positive.
The x86_64 generated code looks like raw_atomic_dec_if_positive()
in atomic-arch-fallback.h: a loop around lock cmpxchg.
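
For readers following along outside the kernel tree, the loop described
above can be approximated in userspace with C11 atomics. This is only a
sketch under stated assumptions: dec_if_positive() is a hypothetical
helper name, atomic_int stands in for the kernel's atomic_t, and
atomic_compare_exchange_weak() stands in for the kernel's try_cmpxchg
primitive. It is not the kernel source.

```c
#include <stdatomic.h>

/*
 * Userspace sketch (assumption: not kernel code).
 * Decrement *v only if the result would stay >= 0.
 * Returns the old value minus one, whether or not the store happened,
 * so a negative return means "was already zero or below, not changed".
 */
static int dec_if_positive(atomic_int *v)
{
    int c = atomic_load(v);
    int dec;

    do {
        dec = c - 1;
        if (dec < 0)
            break;          /* would underflow: leave *v untouched */
        /* On failure, c is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(v, &c, dec));

    return dec;
}
```

The retry after a failed compare-and-exchange corresponds to the jne back
to the top of the loop in the disassembly further down.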

I did not want to compound the email race so I did not send patch v2,
but I can if desired.


devvm2 ~/linux $ sysctl kernel.arch
kernel.arch = x86_64

devvm2 ~/linux $ cat -n ./fs/namei.c | grep -B 7 -A 4 atomic_dec_if_positive
    259  void putname(struct filename *name)
    260  {
    261          int count;
    262
    263          if (IS_ERR(name))
    264                  return;
    265
    266          count = atomic_dec_if_positive(&name->refcnt);
    267          if (WARN_ON_ONCE(unlikely(count < 0)))
    268                  return;
    269          if (count > 0)
    270                  return;
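
For comparison with the listing above, the v1 form of the check
(atomic_read() followed by atomic_dec_and_test()) can be sketched in
userspace C11 as well. The names struct name_ref, dec_and_test() and
name_put_v1() are hypothetical stand-ins chosen for illustration, and
the struct carries only the counter. Note that the read and the
decrement are two separate atomic operations here, so the zero check is
best effort rather than a single atomic step.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct filename. */
struct name_ref {
    atomic_int refcnt;
};

/* Userspace analogue of atomic_dec_and_test(): true when the count hits 0. */
static bool dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Returns true when the caller dropped the last reference and should free. */
static bool name_put_v1(struct name_ref *n)
{
    if (atomic_load(&n->refcnt) == 0) {
        /* Plays the role of WARN_ON_ONCE(): report and bail out. */
        fprintf(stderr, "name_put_v1: refcount already zero\n");
        return false;
    }
    return dec_and_test(&n->refcnt);
}
```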

devvm2 ~/linux $ objdump --disassemble --line-numbers ./fs/namei.o | \
grep -B 8 -A 12 atomic_dec_if_positive
/home/daclash/linux/fs/namei.c:260
      22e:       55                      push   %rbp
      22f:       48 89 e5                mov    %rsp,%rbp
      232:       41 54                   push   %r12
arch_atomic_read():
/home/daclash/linux/./arch/x86/include/asm/atomic.h:23
      234:       8b 47 10                mov    0x10(%rdi),%eax
      237:       49 89 fc                mov    %rdi,%r12
raw_atomic_dec_if_positive():
/home/daclash/linux/./include/linux/atomic/atomic-arch-fallback.h:2535
      23a:       89 c2                   mov    %eax,%edx
      23c:       83 ea 01                sub    $0x1,%edx
      23f:       78 50                   js     291 <putname+0x71>
arch_atomic_try_cmpxchg():
/home/daclash/linux/./arch/x86/include/asm/atomic.h:115
      241:       f0 41 0f b1 54 24 10    lock cmpxchg %edx,0x10(%r12)
      248:       75 f0                   jne    23a <putname+0x1a>
putname():
/home/daclash/linux/fs/namei.c:269
      24a:       85 d2                   test   %edx,%edx
      24c:       75 22                   jne    270 <putname+0x50>
