linux-kernel - Re: [PATCH] tracing: Fix race when concurrently splice_read trace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b5dbdbeb-be3a-3434-0909-0697d8cb15bf@huawei.com>
Date:   Sat, 12 Aug 2023 15:38:12 +0800
From:   Zheng Yejian <zhengyejian1@...wei.com>
To:     Steven Rostedt <rostedt@...dmis.org>
CC:     <mhiramat@...nel.org>, <laijs@...fujitsu.com>,
        <linux-kernel@...r.kernel.org>,
        <linux-trace-kernel@...r.kernel.org>
Subject: Re: [PATCH] tracing: Fix race when concurrently splice_read
 trace_pipe

On 2023/8/12 03:25, Steven Rostedt wrote:
> On Thu, 10 Aug 2023 20:39:05 +0800
> Zheng Yejian <zhengyejian1@...wei.com> wrote:
> 
>> When concurrently splice_read file trace_pipe and per_cpu/cpu*/trace_pipe,
>> there are more data being read out than expected.
> 
> Honestly the real fix is to prevent that use case. We should probably have
> access to trace_pipe lock all the per_cpu trace_pipes too.
> 
> -- Steve
> 

Hi~

Reproduction testcase is show as below, it can always reproduce the
issue in v5.10, and after this patch, the testcase passed.

In v5.10, when run `cat trace_pipe > /tmp/myfile &`, it call
sendfile() to transmit data from trace_pipe into /tmp/myfile. And in
kernel, .splice_read() of trace_pipe is called then the issue is
reproduced.

However in the newest v6.5, this reproduction case didn't run into the
.splice_read() of trace_pipe, because after commit 97ef77c52b78 ("fs:
check FMODE_LSEEK to control internal pipe splicing"), non-seekable
trace_pipe cannot be sendfile-ed.

``` repro.sh
#!/bin/bash


do_test()
{
         local trace_dir=/sys/kernel/tracing
         local trace=${trace_dir}/trace
         local old_trace_lines
         local new_trace_lines
         local tempfiles
         local testlog="trace pipe concurrency issue"
         local pipe_pids
         local i
         local write_cnt=1000
         local read_cnt=0
         local nr_cpu=`nproc`

         # 1. At first, clear all ring buffer
         echo > ${trace}

         # 2. Count how many lines in trace file now
         old_trace_lines=`cat ${trace} | wc -l`

         # 3. Close water mark so that reader can read as event comes
         echo 0 > ${trace_dir}/buffer_percent

         # 4. Read percpu trace_pipes into local file on background.
         #    Splice read must be used under command 'cat' so that the racy
         #    issue can be reproduced !!!
         i=0
         while [ ${i} -lt ${nr_cpu} ]; do
                 tempfiles[${i}]=/tmp/percpu_trace_pipe_${i}
                 cat ${trace_dir}/per_cpu/cpu${i}/trace_pipe > 
${tempfiles[${i}]} &
                 pipe_pids[${i}]=$!
                 let i=i+1
         done

         # 5. Read main trace_pipe into local file on background.
         #    The same, splice read must be used to reproduce the issue !!!
         tempfiles[${i}]=/tmp/main_trace_pipe
         cat ${trace_dir}/trace_pipe > ${tempfiles[${i}]} &
         pipe_pids[${i}]=$!

         echo "Take a break, let readers run."
         sleep 3

         # 6. Write events into ring buffer through trace_marker, so that
         #    hungry readers start racing these events.
         i=0
         while [ ${i} -lt ${write_cnt} ]; do
                 echo "${testlog} <${i}>" > ${trace_dir}/trace_marker
                 let i=i+1
         done

         # 7. Wait until all events being consumed
         new_trace_lines=`cat ${trace} | wc -l`
         while [ "${new_trace_lines}" != "${old_trace_lines}" ]; do
                 new_trace_lines=`cat ${trace} | wc -l`
                 sleep 1
         done
         echo "All written events have been consumed."

         # 8. Kill all readers and count the events readed
         i=0
         while [ ${i} -lt ${#pipe_pids[*]} ]; do
                 local num

                 kill -9 ${pipe_pids[${i}]}
                 wait ${pipe_pids[${i}]}
                 num=`cat ${tempfiles[${i}]} | grep "${testlog}" | wc -l`
                 let read_cnt=read_cnt+num
                 let i=i+1
         done

         # 9. Expect to read events as much as write
         if [ "${read_cnt}" != "${write_cnt}" ]; then
                 echo "Test fail: write ${write_cnt} but read 
${read_cnt} !!!"
                 return 1
         fi

         # 10. Clean temp files if test success
         i=0
         while [ ${i} -lt ${#tempfiles[*]} ]; do
                 rm ${tempfiles[${i}]}
                 let i=i+1
         done
         return 0
}

do_test
```

-- Zheng Yejian