[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191101203048.GA6622@gmail.com>
Date: Fri, 1 Nov 2019 21:30:48 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Jiri Olsa <jolsa@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Mark Rutland <mark.rutland@....com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [GIT PULL] perf fixes
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Fri, Nov 1, 2019 at 10:48 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> > Alexander Shishkin (1):
> > perf/core: Start rejecting the syscall with attr.__reserved_2 set
>
> This seems to quite possibly break existing apps. Is there any reason
> to believe that existing users have actually cleared that field?
Yes, there's good reason to believe that the field ends up being zero
within the kernel even if old user-space uses the old, smaller structure:
> It's suspect for another reason too: the commit that added that field
> just added it to the end of the structure, with the argument that
> "aux_watermark will only matter for new AUX-aware code, so the old
> code should still be fine".
>
> So by *definition* those old kinds of users would never have cleared
> that field, because that field didn't exist.
Firstly, non-existent fields are initialized to zero by default in the
perf ABI: *even if user-space was built well before that new field was
introduced in the kernel*.
This is done in perf_copy_attr():
/* Zero the full structure, so that a short copy will be nice. */
memset(attr, 0, sizeof(*attr));
The user-space structure that is passed in ('uattr' within that function)
might indeed be short and not contain the new field - but we handle this
via uattr->size, which is set by user-space - for example 'perf' sets it
in event_attr_init() in tools/perf/util/util.c:
/* to capture ABI version */
attr->size = sizeof(*attr);
Second, the kernel syscall then checks this size against the kernel's
size of attr:
- if uattr->size < kattr_size: then old ABI user-space binary is
running on new kernel, and we zero out residual fields.
- if uattr->size = kattr_size, this is the normal case of matching ABI (sizes)
- if uattr->size > kattr_size, then new ABI user-space binary is running
on an old kernel - in this case the kernel copies in a shortened
structure from the new tool and ignores the remaining fields. To not
surprise new user-space the kernel copies back the recognized size to
uattr->size and user-space can decide whether they want to run on this
kernel or not:
put_user(sizeof(*attr), &uattr->size);
This way the perf ABI is *both* backwards and forward compatible, even
across multiple ABI changes. We seemingly didn't "care" about the zeroing
of the new field in old binaries in that changelog, because it's
guaranteed by the syscall implementation.
[ Note that there *is* still potential for more subtle ABI breakage, see
further below. ]
This actually works fine in practice. I just tried out a brand new kernel
with really old perf tooling in the v3.19 kernel, from 4.5 years ago:
$ git checkout v3.19
$ cd tools/perf/
$ make install WERROR=0
$ perf version
perf version 3.19.gbfa76d49
$ perf top
This ancient version of 'perf top', 'perf record', 'perf report' works
just fine on the new kernel, despite passing in a smaller 'attr':
[ 558.408907] perf kABI size: 112
[ 558.412855] perf uABI size: 104
when running new tooling on a new kernel the two sizes match:
[ 331.598089] perf kABI size: 112
[ 331.602050] perf uABI size: 112
and everything works as usual.
> And the people involved should stop claiming this "fixes" anything,
> and should look hard at their random ABI expansions and "fixes".
>
> The original code that said "old users would not be impacted" is
> correct. But this "fix" is very very very questionable indeed, and I
> get the feeling that somebody doesn't understand what ABI is, when
> they claim that this "fixes" anything.
So if your primary worry is about old user-space that didn't know about
the new field passing in the new field with some random value, then if
the perf ABI handling logic I outlined above is implemented correctly
(and I believe it's correct), then this should not happen.
But arguably there's still a more subtle ABI breakage that this change
might trigger: if due to some bug user-space *does* initialize this
reserved field to non-zero, and the kernel ignored that non-zero field
for 4 years and new kernels suddenly start rejecting these syscalls,
that's an ABI bug and grounds for an immediate revert.
I don't expect this to happen though, due to the combination of how
user-space and kernel-space handles the extension of the perf ABI - but
if it happens then it's an insta-revert.
I'd also argue that because we didn't check this field for years, the
next ABI extension should probably not use the __reserved_2 field - which
is a good idea anyway as it has an unnatural __u16 size.
> Honestly, this all shows a worrying complete disregard for backwards
> compatibility. Calling this a "fix" is questionable, when it is much
> more likely to break some old user.
>
> I've pulled it, but I need people to be aware that this is utter
> garbage, and that if anybody ever reports it, this needs to be
> immediately reverted.
In light of the above logic I'd like to push back against this
characterization: this kind of ABI compatibility is actually one of the
most important aspects of the perf ABI, and we very much take it
seriously.
In fact we are proud of it: the perf ABI is one of the most iterated, yet
also one of the most compatible and tooling-friendly ABIs in the kernel. :-)
The reason it wasn't even mentioned in the changelog is because we tried
to design out this kind of mishap from the ABI, and all the main perf
developers are aware of that design. Will try to include a more careful
explanation next time.
Thanks,
Ingo
Powered by blists - more mailing lists