lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 20 Jun 2020 07:28:20 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Ming Lei <tom.leiming@...il.com>,
        "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
        Anil S Keshavamurthy <anil.s.keshavamurthy@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        linux-block <linux-block@...r.kernel.org>
Subject: Re: kprobe: __blkdev_put probe is missed

Hi Masami,

On Sat, Jun 20, 2020 at 12:35:09AM +0900, Masami Hiramatsu wrote:
> Hi Ming,
> 
> On Fri, 19 Jun 2020 21:32:40 +0800
> Ming Lei <ming.lei@...hat.com> wrote:
> 
> > On Fri, Jun 19, 2020 at 08:19:54AM -0400, Steven Rostedt wrote:
> > > On Fri, 19 Jun 2020 15:28:59 +0800
> > > Ming Lei <ming.lei@...hat.com> wrote:
> > > 
> > > > > 
> > > > > OK, then let's make events (for sure)
> > > > > 
> > > > > root@...note2:/sys/kernel/debug/tracing# echo p __blkdev_put >> kprobe_events 
> > > > > root@...note2:/sys/kernel/debug/tracing# echo r __blkdev_put >> kprobe_events 
> > > > > root@...note2:/sys/kernel/debug/tracing# echo p blkdev_put >> kprobe_events 
> > > 
> > > Hi Ming,
> > > 
> > > Do you have the kprobe_events file?
> > > 
> > > > > root@...note2:/sys/kernel/debug/tracing# echo 1 > events/kprobes/enable   
> > > > 
> > > > I can't find 'events/kprobes' in my VM with upstream kernel, also not found
> > > > the dir under fedora31(5.5.15-200) & rhel8(v4.18 based).
> > > 
> > > The events/kprobes directly will be created when you create a
> > > kprobe_event. It wont exist until then.
> > 
> > Hi Steven and Masami,
> > 
> > Got it, thanks for your help, now I can run the test, follows the steps
> > and results, and there is still one __blkdev_put probed.
> 
> Hmm, strange...
> 
> > And it is observed in my VM reliably with 5.7+ or Fedora kernel reliably,
> > kernel config is attached.
> 
> Thanks for sharing it.
> 
> > 
> > [root@...st-01 tracing]# uname -a
> > Linux ktest-01 5.7.0+ #1900 SMP Fri Jun 19 16:26:47 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
> > [root@...st-01 tracing]#
> > [root@...st-01 tracing]# cat kprobe_events
> > [root@...st-01 tracing]#
> > [root@...st-01 tracing]# echo p blkdev_put >> kprobe_events
> > [root@...st-01 tracing]# echo p __blkdev_put >> kprobe_events
> > [root@...st-01 tracing]# echo r __blkdev_put >> kprobe_events
> > [root@...st-01 tracing]#
> > [root@...st-01 tracing]# echo 1 > events/kprobes/enable
> > [root@...st-01 tracing]# blockdev --getbsz /dev/sda1
> > 4096
> > [root@...st-01 tracing]# echo 0 > events/kprobes/enable
> > [root@...st-01 tracing]# cat trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 3/3   #P:8
> > #
> > #                              _-----=> irqs-off
> > #                             / _----=> need-resched
> > #                            | / _---=> hardirq/softirq
> > #                            || / _--=> preempt-depth
> > #                            ||| /     delay
> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> > #              | |       |   ||||       |         |
> >         blockdev-970   [005] .... 17603.447236: p_blkdev_put_0: (blkdev_put+0x0/0xb4)
> >         blockdev-970   [005] .... 17603.447244: p___blkdev_put_0: (__blkdev_put+0x0/0x19d)
> >         blockdev-970   [005] d... 17603.447251: r___blkdev_put_0: (blkdev_close+0x22/0x25 <- __blkdev_put)
> 
> This shows __blkdev_put() is a tail-call. It is possible that the
> internal (nested) __blkdev_put() call becomes a goto inside the
> function by the gcc optimization.
> 
> Ah, after all it is as expected. With your kconfig, the kernel is
> very agressively optimized.
> 
> $ objdump -dS vmlinux | less
> ...
> ffffffff81256dc3 <__blkdev_put>:
> {
> ffffffff81256dc3:       e8 98 85 df ff          callq  ffffffff8104f360 <__fentry__>
> ffffffff81256dc8:       41 57                   push   %r15
> ffffffff81256dca:       41 56                   push   %r14
> ffffffff81256dcc:       41 55                   push   %r13
> ...
> ffffffff81256f05:       75 02                   jne    ffffffff81256f09 <__blkdev_put+0x146>
>         struct block_device *victim = NULL;
> ffffffff81256f07:       31 db                   xor    %ebx,%ebx
>                 bdev->bd_contains = NULL;
> ffffffff81256f09:       48 c7 45 60 00 00 00    movq   $0x0,0x60(%rbp)
> ffffffff81256f10:       00 
>                 put_disk_and_module(disk);
> ffffffff81256f11:       4c 89 f7                mov    %r14,%rdi
> ffffffff81256f14:       e8 c6 3d 11 00          callq  ffffffff8136acdf <put_disk_and_module>
>         mutex_unlock(&bdev->bd_mutex);
> ffffffff81256f19:       4c 89 ff                mov    %r15,%rdi
>                 __blkdev_put(victim, mode, 1);
> ffffffff81256f1c:       41 bc 01 00 00 00       mov    $0x1,%r12d
>         mutex_unlock(&bdev->bd_mutex);
> ffffffff81256f22:       e8 8d d7 48 00          callq  ffffffff816e46b4 <mutex_unlock>
>         bdput(bdev);
> ffffffff81256f27:       48 89 ef                mov    %rbp,%rdi
> ffffffff81256f2a:       e8 f0 e9 ff ff          callq  ffffffff8125591f <bdput>
>         if (victim)
> ffffffff81256f2f:       48 85 db                test   %rbx,%rbx
> ffffffff81256f32:       74 08                   je     ffffffff81256f3c <__blkdev_put+0x179>
> ffffffff81256f34:       48 89 dd                mov    %rbx,%rbp
> ffffffff81256f37:       e9 b4 fe ff ff          jmpq   ffffffff81256df0 <__blkdev_put+0x2d> <<-----THIS!!
> }
> ffffffff81256f3c:       48 8b 44 24 28          mov    0x28(%rsp),%rax
> ffffffff81256f41:       65 48 33 04 25 28 00    xor    %gs:0x28,%rax
> ffffffff81256f48:       00 00 
> ffffffff81256f4a:       74 05                   je     ffffffff81256f51 <__blkdev_put+0x18e>
> ffffffff81256f4c:       e8 5a 4e 48 00          callq  ffffffff816dbdab <__stack_chk_fail>
> ffffffff81256f51:       48 83 c4 30             add    $0x30,%rsp
> ffffffff81256f55:       5b                      pop    %rbx
> ffffffff81256f56:       5d                      pop    %rbp
> ffffffff81256f57:       41 5c                   pop    %r12
> ffffffff81256f59:       41 5d                   pop    %r13
> ffffffff81256f5b:       41 5e                   pop    %r14
> ffffffff81256f5d:       41 5f                   pop    %r15
> ffffffff81256f5f:       c3                      retq   
> 
> 
> As you can see, the nested __blkdev_put() is coverted to a loop.
> If you put kprobe on __blkdev_put+0x2d, you'll see the event twice.

Thanks for your investigation.

Some trace tools can just trace on function entry, such as bcc, and some
user script always trace on function entry.

I guess the issue should belong to kprobe implementation:

1) __blkdev_put() is capable of being kprobed, so from user view, the
probe on entry of __blkdev_put() should be triggered

2) from implementation view, I understand exception should be trapped
on the entry of __blkdev_put(), looks it isn't done.

Correct me if the above is wrong, and is it possible to fix it in kprobe?


Thanks,
Ming

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ