linux-kernel - Re: KRETPROBES are broken since kernel 5.8


Message-Id: <20201211104424.e9e422f8be00e12b5d90260c@kernel.org>
Date:   Fri, 11 Dec 2020 10:44:24 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Adam Zabrocki <pi3@....com.pl>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        linux-kernel@...r.kernel.org,
        "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
        Anil S Keshavamurthy <anil.s.keshavamurthy@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Solar Designer <solar@...nwall.com>
Subject: Re: KRETPROBES are broken since kernel 5.8

On Thu, 10 Dec 2020 18:14:30 +0100
Adam Zabrocki <pi3@....com.pl> wrote:

> Hi,
> 
> > > However, there might be another issue which I wanted to bring up / discuss - 
> > > a problem with the optimizer. Until kernel 5.9, a KRETPROBE on e.g. the 
> > > 'ftrace_enable_sysctl' function was correctly optimized without any problems. 
> > 
> > Did you check it on other functions? Did you see it only on the "ftrace_enable_sysctl"?
> > 
> 
> Yes, I see it in most of the functions with padding.

Thanks for the confirmation.

> 
> > > Since 5.9 it can't be optimized anymore. I didn't see any changes in the 
> > > sources regarding the optimizer, nor in the function itself.
> > > When I looked at the generated vmlinux binary, I can see that GCC generated 
> > > padding at the end of this function using INT3 opcode:
> > > 
> > > ...
> > > ffffffff8130528b:       41 bd f0 ff ff ff       mov    $0xfffffff0,%r13d
> > > ffffffff81305291:       e9 fe fe ff ff          jmpq   ffffffff81305194 <ftrace_enable_sysctl+0x114>
> > > ffffffff81305296:       cc                      int3
> > > ffffffff81305297:       cc                      int3
> > > ffffffff81305298:       cc                      int3
> > > ffffffff81305299:       cc                      int3
> > > ffffffff8130529a:       cc                      int3
> > > ffffffff8130529b:       cc                      int3
> > > ffffffff8130529c:       cc                      int3
> > > ffffffff8130529d:       cc                      int3
> > > ffffffff8130529e:       cc                      int3
> > > ffffffff8130529f:       cc                      int3
> > 
> > So these int3s are generated by GCC for padding, right?
> > 
> 
> I've just browsed a few commits and I've found that one:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7705dc8557973d8ad8f10840f61d8ec805695e9e
> 
> It looks like INT3 is now a default padding used by linker.

Thanks for the information! OK, I will add a Fixes: tag and backport it.

> 
> > > However, that's not the case here. INT3_INSN_OPCODE is placed at the end of 
> > > the function as padding (and protect from NOP-padding problems).
> > > 
> > > I wonder, if optimizer should take this special case into account? Is it worth 
> > > to still optimize such functions when they are padded with INT3?
> > 
> > Indeed. I expected int3 is used from other subsystems (e.g. kgdb) and,
> > in that case the optimization can confuse them.
> 
> Right. The same can happen when text section is being actively modified. 
> However, this case could be covered by running the optimizer logic under 
> text_mutex.

No, this check is needed because of the instruction decoding. Usually,
the int3 will be put at the first byte of an existing instruction whose
length is usually 1-6 bytes. If the instruction's opcode is overwritten
by the int3, kprobes cannot get the original opcode, which means it
cannot get the original length of the instruction.
To optimize the probe, kprobes has to ensure that no other jump instruction
jumps into the instructions which will be overwritten by the optimized
jump instruction. This is why can_optimize() decodes all instructions
in the function (note that ksyms has no information about padding bytes; it
returns the function size including the padding bytes.)
Thus, when kprobes detects an int3 in the function, it gives up the
decoding and the optimization.

> 
> > But if the gcc uses int3 to pad the room between functions, it should be
> > reconsidered. 
> > 
> 
> Looks like it's a default behavior now.

OK, let me fix that. If the int3 is only used for the padding between functions,
those int3 bytes should continue to the end of the function. So kprobes can
distinguish whether an int3 comes from another subsystem or from the linker.
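
To illustrate the idea only: the following is a minimal user-space sketch,
not the kernel code. The helper name int3_is_trailing_padding() and its
byte-buffer interface are hypothetical; a real kernel implementation would
read kernel text safely and work with addresses rather than a plain array.
It assumes that an int3 counts as linker padding only if every byte from
that int3 up to the end of the function (as sized by the symbol table) is
also int3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xcc /* x86 one-byte breakpoint instruction */

/*
 * Hypothetical helper (illustration only).
 *
 * func: bytes of the function, including any trailing padding
 * size: function size as reported by the symbol table
 * off:  offset at which the decoder found an int3
 *
 * Returns true when every byte in [off, size) is int3, i.e. the int3
 * run reaches the end of the function and can be treated as padding.
 * Returns false when some other byte follows, which suggests the int3
 * was planted inside live code by another user (e.g. a breakpoint).
 */
bool int3_is_trailing_padding(const uint8_t *func, size_t size, size_t off)
{
	if (off >= size)
		return false;

	for (size_t i = off; i < size; i++) {
		if (func[i] != INT3_OPCODE)
			return false;
	}

	return true;
}
```

With a check like this, a decoder loop could stop at the first int3 and
still allow optimization when the helper returns true, while continuing
to give up when it returns false.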

Thank you,

> 
> > Thank you,
> >
> > > If it is OK, we should backport those to stable tree.
> > 
> > Agreed.
> 
> It is also important to make sure that distro kernels would pick-up such 
> backported fix.
> 
> Thanks,
> Adam
> 
> -- 
> pi3 (pi3ki31ny) - pi3 (at) itsec pl
> http://pi3.com.pl
> 


-- 
Masami Hiramatsu <mhiramat@...nel.org>