linux-kernel - Re: [GIT PULL] sigqueue cache fix

Message-ID: <YNlapAKObfeVPoQO@gmail.com>
Date:   Mon, 28 Jun 2021 07:14:12 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Christian Brauner <christian.brauner@...ntu.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] sigqueue cache fix


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Sun, Jun 27, 2021 at 11:52 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Ok, I may have confused myself looking at all this, but it does all
> > make me think this is dodgy.
> 
> I also couldn't convince myself that the memory ordering is correct
> for the _contents_ of the sigqueue entry that had its pointer cached,
> although I suspect that is purely a theoretical concern (certainly a
> non-issue on x86).
> 
> So I've reverted the sigqueue cache code, in that I haven't heard
> anything back and I'm not going to delay 5.13 over something small and
> easily undone like this.

I concur that reverting was the safest option, given how close this was to 
the final release.

I think the code is safe, but only by accident. The most critical data race 
isn't well-documented, unless I missed something.

The most fundamental race we can have is this:

      CPU#0

      __sigqueue_alloc()

      [ holds sighand->siglock ]
      [ IRQs off. ]

      q = READ_ONCE(t->sigqueue_cache);
      if (!q || sigqueue_flags)
            q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
      else
            WRITE_ONCE(t->sigqueue_cache, NULL);


                                CPU#1  

                                __sigqueue_free()

                                [ IRQs off. ]

                                if (!READ_ONCE(current->sigqueue_cache))
                                      WRITE_ONCE(current->sigqueue_cache, q);
                                else
                                      kmem_cache_free(sigqueue_cachep, q);

( Let's assume exit_task_sigqueue_cache() happens while there's no new 
  signal sending going on, so that angle is safe. )

Somewhat confusingly, *alloc() is the consumer and *free() is the producer 
of the sigqueue_cache.

Here's how I see the 3 fundamental races these two pieces of code may have:

 - Producer <-> producer: The producer cannot race with itself, because it 
   only ever produces into current->sigqueue_cache and has interrupts 
   disabled. We don't send signals from NMI context.

 - Consumer <-> consumer: multiple consumers cannot race with each other, 
   because they serialize on sighand->siglock.

 - Producer <-> consumer: this is the most interesting race, and I think 
   it's unsafe in theory, because the producer doesn't make sure that any 
   previous writes to the actual queue entry (struct sigqueue *q) have 
   reached storage before the new 'free' entry is advertised to consumers.

   So in principle CPU#0 could see a new sigqueue entry and use it, before 
   it's fully freed.

   In *practice* it's probably safe by accident (or by undocumented 
   intent), because __sigqueue_free() performs an atomic op shortly before 
   it puts the queue entry into the sigqueue_cache:

         if (atomic_dec_and_test(&q->user->sigpending))
                free_uid(q->user);

   And atomic_dec_and_test() implies a full barrier - although I haven't 
   found the place where we document it and 
   Documentation/memory-ordering.txt is silent on it. We should probably 
   fix that too.
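To make the producer <-> consumer ordering concrete, here is a minimal 
user-space sketch using C11 atomics. It is not the kernel code: struct entry, 
struct slot_cache, cache_put() and cache_take() are hypothetical names made 
up for illustration, and the single-producer / serialized-consumer rules 
described above are assumed to be enforced by the caller. The point is only 
that the producer publishes the pointer with release semantics and the 
consumer picks it up with acquire semantics, so the ordering no longer 
depends on an unrelated atomic op happening to sit nearby. In kernel code 
the analogous primitives would presumably be smp_store_release() and 
smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sigqueue; not the kernel definition. */
struct entry {
	int payload;
};

/* Hypothetical one-slot cache, playing the role of t->sigqueue_cache. */
struct slot_cache {
	_Atomic(struct entry *) slot;
};

/*
 * Producer side (the role __sigqueue_free() plays above).
 * Assumes a single producer per cache.
 * Returns 1 if the entry was cached, 0 if the caller must free it.
 */
static int cache_put(struct slot_cache *c, struct entry *q)
{
	assert(q != NULL);

	if (atomic_load_explicit(&c->slot, memory_order_relaxed) != NULL)
		return 0;

	q->payload = 0;	/* last plain write to the entry before publishing */

	/* Release: all earlier writes to *q are ordered before the pointer store. */
	atomic_store_explicit(&c->slot, q, memory_order_release);
	return 1;
}

/*
 * Consumer side (the role __sigqueue_alloc() plays above).
 * Assumes consumers are serialized by an external lock.
 * Returns the cached entry, or NULL if the caller must allocate.
 */
static struct entry *cache_take(struct slot_cache *c)
{
	/* Acquire: pairs with the release store in cache_put(). */
	struct entry *q = atomic_load_explicit(&c->slot, memory_order_acquire);

	if (q != NULL)
		atomic_store_explicit(&c->slot, NULL, memory_order_relaxed);
	return q;
}
```

With this pairing, a consumer that observes the non-NULL pointer is also 
guaranteed to observe every write the producer made to the entry before 
publishing it.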

At minimum, the patch adding ->sigqueue_cache should include a race 
analysis that explicitly documents the reliance on the implicit barrier 
provided by atomic_dec_and_test().

Anyway, I agree with the revert.

Thanks,

	Ingo