Message-ID: <5ed4bc3d-9c30-8cae-e826-b2d0c37c2f6e@huawei.com>
Date:   Mon, 11 Oct 2021 16:14:16 +0800
From:   Yongqiang Liu <liuyongqiang13@...wei.com>
To:     <vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
        <xuyang2018.jy@...itsu.com>, <liwang@...hat.com>
CC:     <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        <vincent.guittot@...aro.org>, <mingo@...hat.com>,
        <peterz@...radead.org>, <mgorman@...e.de>,
        <akpm@...ux-foundation.org>, <vbabka@...e.cz>,
        "David Hildenbrand" <david@...hat.com>, <willy@...radead.org>,
        "Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@...wei.com>
Subject: [QUESTION] ltp: madvise06 failed when the task is scheduled to
 another CPU

Hi,

When running this case on a 5.10-lts kernel, it triggers the following
failure:

  ......

     madvise06.c:74: TINFO:  memory.kmem.usage_in_bytes: 1752 Kb
     madvise06.c:208: TPASS: more than 102400 Kb were moved to the swap 
cache
     madvise06.c:217: TINFO: PageFault(madvice / no mem access): 102401
     madvise06.c:221: TINFO: PageFault(madvice / mem access): 102417
     madvise06.c:82: TINFO: After page access
     madvise06.c:84: TINFO:  Swap: 307372 Kb
     madvise06.c:86: TINFO:  SwapCached: 101820 Kb
     madvise06.c:88: TINFO:  Cached: 103004Kb
     madvise06.c:74: TINFO:  memory.kmem.usage_in_bytes: 0Kb
     madvise06.c:225: TFAIL: 16 pages were faulted out of 2 max

We found that when calling madvise() the task was scheduled to another
CPU:

......

tst_res(TINFO, "before madvise MEMLIMIT CPU:%d", sched_getcpu());        ---> cpu0

TEST(madvise(target, MEM_LIMIT, MADV_WILLNEED));

tst_res(TINFO, "after madvise MEMLIMIT CPU:%d", sched_getcpu());         ---> cpu1

......

tst_res(TINFO, "before madvise PASS_THRESHOLD CPU:%d", sched_getcpu());  ---> cpu1

TEST(madvise(target, PASS_THRESHOLD, MADV_WILLNEED));

tst_res(TINFO, "after madvise PASS_THRESHOLD CPU:%d", sched_getcpu());   ---> cpu0

.....

Is the per-CPU swap_slot data not being handled correctly when the task
migrates?


Applying the following patches almost fixes the error:

e9b9734b7465 sched/fair: Reduce cases for active balance

8a41dfcda7a3 sched/fair: Don't set LBF_ALL_PINNED unnecessarily

fc488ffd4297 sched/fair: Skip idle cfs_rq

Binding the task to a single CPU also solves the problem.

Kind regards,
