netdev - Re: kernel BUG at net/unix/garbage.c:149!"

Message-ID: <CAOssrKc5Us_ci7gUQAHZyNH4EiXp=jApw_ucAR1KNOLwZmt6Gw@mail.gmail.com>
Date:   Tue, 30 Aug 2016 11:39:58 +0200
From:   Miklos Szeredi <mszeredi@...hat.com>
To:     Nikolay Borisov <kernel@...p.com>
Cc:     Hannes Frederic Sowa <hannes@...essinduktion.org>,
        "Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
        netdev@...r.kernel.org
Subject: Re: kernel BUG at net/unix/garbage.c:149!"

On Tue, Aug 30, 2016 at 11:31 AM, Nikolay Borisov <kernel@...p.com> wrote:
>
>
> On 08/30/2016 12:18 PM, Miklos Szeredi wrote:
>> On Tue, Aug 30, 2016 at 12:37 AM, Miklos Szeredi <mszeredi@...hat.com> wrote:
>>> On Sat, Aug 27, 2016 at 11:55 AM, Miklos Szeredi <mszeredi@...hat.com> wrote:
>>
>>> crash> list -H gc_inflight_list unix_sock.link -s unix_sock.inflight |
>>> grep counter | cut -d= -f2 | awk '{s+=$1} END {print s}'
>>> 130
>>> crash> p unix_tot_inflight
>>> unix_tot_inflight = $2 = 135
>>>
>>> We've lost track of a total of five inflight sockets, so it's not a
>>> one-off thing.  Really weird...  Now off to sleep, maybe I'll dream of
>>> the solution.
>>
>> Okay, found one bug: gc assumes that in-flight sockets that don't have
>> an external ref can't gain one while unix_gc_lock is held.  That holds
>> for the normal receive path because unix_notinflight(), which takes
>> unix_gc_lock, is called before the fds are detached.  Only MSG_PEEK was
>> somehow overlooked.  It also clones the fds but leaves them in the skb,
>> so through MSG_PEEK an external reference can definitely be gained
>> without ever touching unix_gc_lock.
>>
>> Not sure whether the reported bug can be explained by this.  Can you
>> confirm that MSG_PEEK was used in the setup?
>>
>> Does someone want to write a stress test for SCM_RIGHTS + MSG_PEEK?
>>
>> Anyway, attaching a fix that works by acquiring unix_gc_lock in case
>> of MSG_PEEK also.  It is trivially correct, but I haven't tested it.
>
> I have no way of being 100% sure, but looking through nginx's source code
> it seems they do use MSG_PEEK on several occasions. This issue has
> apparently been very hard to reproduce, since I have 100s of servers
> running a lot of nginx processes and this has been triggered only once.
>
> On a different note - if I inspect a live node without this patch,
> should I expect to see the discrepancy between the gc_inflight_list
> total and unix_tot_inflight, and to see none with this patch applied?

May well be, since in the vmcore 4 in-flight sockets were "lost"
before triggering the bug.  I guess the best way to check is with a
systemtap script that walks the list while holding the gc lock.
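
For anyone who wants a starting point for the SCM_RIGHTS + MSG_PEEK
test asked about above, here is a minimal userspace sketch of the
mechanism only.  It is not the attached kernel fix and it is not a
reproducer for the race: it just shows that peeking a message that
carries an AF_UNIX socket installs a new fd in the receiver while the
message (and the in-flight socket) stays queued.  The helper names
send_fd(), recv_fd() and peek_scm_rights_demo() are made up for this
example, and it assumes Linux semantics for MSG_PEEK on ancillary data.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one byte plus one fd as SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive (or peek, if flags has MSG_PEEK) one
 * message and return the fd it carried, or -1 on any failure. */
static int recv_fd(int sock, int flags)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd = -1;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, flags) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Returns 0 if the peek and the real receive each installed a distinct,
 * valid fd for the same in-flight socket; -1 otherwise. */
int peek_scm_rights_demo(void)
{
	int sv[2], pair[2];
	int peeked = -1, received = -1, ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Put pair[0] in flight, then drop the sender's own reference so
	 * the queued message holds the only reference to that socket. */
	if (send_fd(sv[0], pair[0]) == 0) {
		close(pair[0]);
		pair[0] = -1;

		/* The peek hands out a new reference; the message stays queued. */
		peeked = recv_fd(sv[1], MSG_PEEK);
		/* The normal receive then dequeues it and hands out another fd. */
		received = recv_fd(sv[1], 0);
		if (peeked >= 0 && received >= 0 && peeked != received)
			ret = 0;
	}

	if (peeked >= 0)
		close(peeked);
	if (received >= 0)
		close(received);
	if (pair[0] >= 0)
		close(pair[0]);
	close(pair[1]);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling peek_scm_rights_demo() from main() and checking for 0 is enough
to see the effect.  A real stress test would presumably have to run the
peek in a loop from one thread while other threads keep sending and
closing sockets so that the garbage collector runs concurrently.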

Thanks,
Miklos