linux-kernel - Re: [BUG] Race between policy reload sidtab conversion and live conversion

Message-ID: <20210224143651.GE6000@sequoia>
Date:   Wed, 24 Feb 2021 08:36:51 -0600
From:   Tyler Hicks <tyhicks@...ux.microsoft.com>
To:     Ondrej Mosnacek <omosnace@...hat.com>
Cc:     Paul Moore <paul@...l-moore.com>,
        Stephen Smalley <stephen.smalley.work@...il.com>,
        SElinux list <selinux@...r.kernel.org>,
        Linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Race between policy reload sidtab conversion and live
 conversion

On 2021-02-24 10:33:46, Ondrej Mosnacek wrote:
> On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks
> <tyhicks@...ux.microsoft.com> wrote:
> > On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > > I'm seeing a race during policy load while the "regular" sidtab
> > > > conversion is happening and a live conversion starts to take place in
> > > > sidtab_context_to_sid().
> > > >
> > > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > > then another policy gets loaded ~2-3s into boot. That second policy load
> > > > is what hits the race condition situation because the sidtab is only
> > > > partially populated and there's a decent amount of filesystem operations
> > > > happening, at the same time, which are triggering live conversions.
> >
> > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> > change here:
> >
> >  https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/
> >
> > I'll put these changes through a validation run (the only place that I
> > can seem to reproduce this crash) and see how it looks.
> 
> Hm... I think there is actually another race condition introduced by
> the switch from rwlock to RCU [1]... Judging from the call trace you
> may be hitting that.

I believe your patches above fixed the race I was seeing. I was able to
make it through a full validation run without any crashes. Without those
patches applied, I would see several crashes resulting from this race
over the course of a validation run.

I'll continue to test with your changes and let you know if I end up
running into the other race you spotted.

Tyler

> 
> Basically, before the switch the sidtab swapover worked like this:
> 1. Start live conversion of new entries.
> 2. Convert existing entries.
> [Still only the old sidtab is visible to readers here.]
> 3. Swap sidtab under write lock.
> 4. Now only the new sidtab is visible to readers, so the old one can
> be destroyed.
> 
> After the switch to RCU, we now have:
> 1. Start live conversion of new entries.
> 2. Convert existing entries.
> 3. RCU-assign the new policy pointer to selinux_state.
> [!!! Now actually both old and new sidtab may be referenced by
> readers, since there is no synchronization barrier previously provided
> by the write lock.]
> 4. Wait for synchronize_rcu() to return.
> 5. Now only the new sidtab is visible to readers, so the old one can
> be destroyed.
> 
> So the race can happen between 3. and 5., if one thread already sees
> the new sidtab and adds a new entry there, and a second thread still
> has the reference to the old sidtab and also tries to add a new entry,
> live-converting to the new sidtab, which it doesn't expect to change
> by itself. Unfortunately I failed to realize this when reviewing the
> patch :/
> 
> I think the only two options to fix it are A) switching back to
> read-write lock (the easy and safe way; undoing the performance
> benefits of [1]), or B) implementing a safe two-way live conversion of
> new sidtab entries, so that both tables are kept in sync while they
> are both available (more complicated and with possible tricky
> implications of different interpretations of contexts by the two
> policies).
> 
> [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")
> 
> --
> Ondrej Mosnacek
> Software Engineer, Linux Security - SELinux kernel
> Red Hat, Inc.
>
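
The interleaving between steps 3 and 5 described above can be sketched with a small, single-threaded toy model. This is only an illustration under stated assumptions, not the kernel's sidtab code: struct toy_sidtab, toy_add() and the context strings are hypothetical names, and the real conversion also reinterprets each context under the new policy, which is omitted here. The two "readers" are replayed in a fixed order so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8
#define TOY_CTX_LEN 32

/* Hypothetical toy table; it only mimics "index == SID" and live conversion. */
struct toy_sidtab {
    char ctx[TOY_MAX][TOY_CTX_LEN];
    size_t count;
    struct toy_sidtab *convert_target; /* non-NULL while live conversion is on */
};

static void toy_store(struct toy_sidtab *s, size_t idx, const char *ctx)
{
    strncpy(s->ctx[idx], ctx, TOY_CTX_LEN - 1);
    s->ctx[idx][TOY_CTX_LEN - 1] = '\0';
}

/*
 * Append a context and return its index (the toy SID), or -1 if full.
 * While live conversion is active, the entry is mirrored to the same
 * index in the target table, on the assumption that the target does
 * not change by itself.
 */
static int toy_add(struct toy_sidtab *s, const char *ctx)
{
    size_t idx = s->count;

    if (idx >= TOY_MAX)
        return -1;
    toy_store(s, idx, ctx);
    if (s->convert_target) {
        toy_store(s->convert_target, idx, ctx);
        s->convert_target->count = idx + 1;
    }
    s->count = idx + 1;
    return (int)idx;
}

/* Replays the racy interleaving; returns true if the new table is corrupted. */
static bool toy_race_corrupts_new_table(void)
{
    struct toy_sidtab old_tab = {0}, new_tab = {0};
    int sid_x, sid_y;

    toy_add(&old_tab, "ctx_a");                 /* pre-existing entry */
    old_tab.convert_target = &new_tab;          /* 1. start live conversion */
    for (size_t i = 0; i < old_tab.count; i++)  /* 2. convert existing entries */
        toy_store(&new_tab, i, old_tab.ctx[i]);
    new_tab.count = old_tab.count;

    /* 3. new table is published, but the old one is still referenced */
    sid_x = toy_add(&new_tab, "ctx_from_new_reader"); /* reader on new table */
    sid_y = toy_add(&old_tab, "ctx_from_old_reader"); /* reader on old table */
    assert(sid_x >= 0 && sid_y >= 0);

    /* Both readers got the same SID, and the first reader's entry is gone. */
    return sid_x == sid_y &&
           strcmp(new_tab.ctx[sid_x], "ctx_from_new_reader") != 0;
}
```

In this model both readers end up with the same index, and the live-converted entry from the old table silently overwrites the one added directly to the new table. Under the old rwlock scheme the swap happened under the write lock, so the second call could not run against the old table after the first had seen the new one.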