linux-kernel - Re: [GIT PULL] perf fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191101203048.GA6622@gmail.com>
Date:   Fri, 1 Nov 2019 21:30:48 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Arnaldo Carvalho de Melo <acme@...radead.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Mark Rutland <mark.rutland@....com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [GIT PULL] perf fixes


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Fri, Nov 1, 2019 at 10:48 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> > Alexander Shishkin (1):
> >       perf/core: Start rejecting the syscall with attr.__reserved_2 set
> 
> This seems to quite possibly break existing apps. Is there any reason 
> to believe that existing users have actually cleared that field?

Yes, there's good reason to believe that the field ends up being zero 
within the kernel even if old user-space uses the old, smaller structure:

> It's suspect for another reason too: the commit that added that field
> just added it to the end of the structure, with the argument that
> "aux_watermark will only matter for new AUX-aware code, so the old
> code should still be fine".
> 
> So by *definition* those old kinds of users would never have cleared
> that field, because that field didn't exist.

Firstly, non-existent fields are initialized to zero by default in the 
perf ABI: *even if user-space was built well before that new field was 
introduced in the kernel*.

This is done in perf_copy_attr():

        /* Zero the full structure, so that a short copy will be nice. */
        memset(attr, 0, sizeof(*attr));

The user-space structure that is passed in ('uattr' within that function) 
might indeed be short and not contain the new field - but we handle this 
via uattr->size, which is set by user-space - for example 'perf' sets it 
in event_attr_init() in tools/perf/util/util.c:

        /* to capture ABI version */
        attr->size = sizeof(*attr);

Second, the kernel syscall then checks this size against the kernel's 
size of attr:

 - if uattr->size < kattr_size: then old ABI user-space binary is 
   running on new kernel, and we zero out residual fields.

 - if uattr->size = kattr_size, this is the normal case of matching ABI (sizes)

 - if uattr->size > kattr_size, then new ABI user-space binary is running 
   on an old kernel - in this case the kernel copies in a shortened 
   structure from the new tool and ignores the remaining fields. To not 
   surprise new user-space the kernel copies back the recognized size to 
   uattr->size and user-space can decide whether they want to run on this 
   kernel or not:

             put_user(sizeof(*attr), &uattr->size);

This way the perf ABI is *both* backwards and forward compatible, even 
across multiple ABI changes. We seemingly didn't "care" about the zeroing 
of the new field in old binaries in that changelog, because it's 
guaranteed by the syscall implementation.

[ Note that there *is* still potential for more subtle ABI breakage, see 
  further below. ]

This actually works fine in practice. I just tried out a brand new kernel 
with really old perf tooling in the v3.19 kernel, from 4.5 years ago:

  $ git checkout v3.19
  $ cd tools/perf/
  $ make install WERROR=0

  $ perf version
  perf version 3.19.gbfa76d49

  $ perf top

This ancient version of 'perf top', 'perf record', 'perf report' works 
just fine on the new kernel, despite passing in a smaller 'attr':

  [  558.408907] perf kABI size: 112
  [  558.412855] perf uABI size: 104

when running new tooling on a new kernel the two sizes match:

[  331.598089] perf kABI size: 112
[  331.602050] perf uABI size: 112

and everything works as usual.

> And the people involved should stop claiming this "fixes" anything,
> and should look hard at their random ABI expansions and "fixes".
> 
> The original code that said "old users would not be impacted" is
> correct. But this "fix" is very very very questionable indeed, and I
> get the feeling that somebody doesn't understand what ABI is, when
> they claim that this "fixes" anything.

So if your primary worry is about old user-space that didn't know about 
the new field passing in the new field with some random value, then if 
the perf ABI handling logic I outlined above is implemented correctly 
(and I believe it's correct), then this should not happen.

But arguably there's still a more subtle ABI breakage that this change 
might trigger: if due to some bug user-space *does* initialize this 
reserved field to non-zero, and the kernel ignored that non-zero field 
for 4 years and new kernels suddenly start rejecting these syscalls, 
that's an ABI bug and grounds for an immediate revert.

I don't expect this to happen though, due to the combination of how 
user-space and kernel-space handles the extension of the perf ABI - but 
if it happens then it's an insta-revert.

I'd also argue that because we didn't check this field for years, the 
next ABI extension should probably not use the __reserved_2 field - which 
is a good idea anyway as it has an unnatural __u16 size.

> Honestly, this all shows a worrying complete disregard for backwards
> compatibility. Calling this a "fix" is questionable, when it is much
> more likely to break some old user.
> 
> I've pulled it, but I need people to be aware that this is utter
> garbage, and that if anybody ever reports it, this needs to be
> immediately reverted.

In light of the above logic I'd like to push back against this 
characterization: this kind of ABI compatibility is actually one of the 
most important aspects of the perf ABI, and we very much take it 
seriously.

In fact we are proud of it: the perf ABI is one of the most iterated, yet 
also one of the most compatible and tooling-friendly ABIs in the kernel. :-)

The reason it wasn't even mentioned in the changelog is because we tried 
to design out this kind of mishap from the ABI, and all the main perf 
developers are aware of that design. Will try to include a more careful 
explanation next time.

Thanks,

	Ingo