linux-kernel - Re: memory leak in inotify_update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACT4Y+YSP8+Oy0Hm4ss8sH-eoas3ZHgUe18rVwDif8uba+qTxA@mail.gmail.com>
Date:   Wed, 8 Jul 2020 13:15:00 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Jan Kara <jack@...e.cz>,
        syzbot <syzbot+dec34b033b3479b9ef13@...kaller.appspotmail.com>,
        Amir Goldstein <amir73il@...il.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        syzkaller <syzkaller@...glegroups.com>
Subject: Re: memory leak in inotify_update_watch

On Wed, Jul 8, 2020 at 1:08 PM Catalin Marinas <catalin.marinas@....com> wrote:
>
> On Wed, Jul 08, 2020 at 09:17:37AM +0200, Dmitry Vyukov wrote:
> > On Tue, Jul 7, 2020 at 8:17 PM Catalin Marinas <catalin.marinas@....com> wrote:
> > > Kmemleak never performs well under heavy load. Normally you'd need to
> > > let the system settle for a bit before checking whether the leaks are
> > > still reported. The issue is caused by the memory scanning not stopping
> > > the whole machine, so pointers may be hidden in registers on different
> > > CPUs (list insertion/deletion for example causes transient kmemleak
> > > confusion).
> > >
> > > I think the syzkaller guys tried a year or so ago to run it in parallel
> > > with kmemleak and gave up shortly. The proposal was to add a "stopscan"
> > > command to kmemleak which would do this under stop_machine(). However,
> > > no-one got to implementing it.
> > >
> > > So, in this case, does the leak still appear with the reproducer, once
> > > the system went idle?
> >
> > This report came from syzbot, so obviously we did not give up :)
>
> That's good to know ;).
>
> > We don't run scanning in parallel with fuzzing and do a very intricate
> > multi-step dance to overcome false positives:
> > https://github.com/google/syzkaller/blob/5962a2dc88f6511b77100acdf687c1088f253f6b/executor/common_linux.h#L3407-L3478
> > and only report leaks that are reproducible.
> > So far I have not seen any noticable amount of false positives, and
> > you can see 70 already fixed leaks here:
> > https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-gce-leak
> > https://syzkaller.appspot.com/upstream?manager=ci-upstream-gce-leak
>
> Thanks for the information and the good work here. If you have time, you
> could implement the stop_machine() kmemleak scan as well ;).

stop_machine will only help with pointers stored in registers/jumping
in memory. But there may be other sources of false positives like
hidden pointers via some hashing, offsets, reused low/high bits. Doing
several scans and crc checksum of object contents helps with these as
well and is orthogonal to stop_machine.
So now I wonder if using stop_machine will actually solve all
problems... because if not, then doing this work but then having to do
several scans and checksums anyway is kinda pointless...