linux-kernel - Re: linux-next test error: BUG: using __this_cpu_read() in preemptible code in __mod_memcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5b1196be-09ce-51f7-f5e7-63f2e597f91e@linux.alibaba.com>
Date:   Mon, 9 Mar 2020 17:56:04 +0800
From:   Alex Shi <alex.shi@...ux.alibaba.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>,
        syzbot <syzbot+826543256ed3b8c37f62@...kaller.appspotmail.com>
Cc:     akpm@...ux-foundation.org, cgroups@...r.kernel.org,
        hannes@...xchg.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, mhocko@...nel.org,
        syzkaller-bugs@...glegroups.com, vdavydov.dev@...il.com
Subject: Re: linux-next test error: BUG: using __this_cpu_read() in
 preemptible code in __mod_memcg_state



在 2020/3/9 下午5:24, Kirill A. Shutemov 写道:
>> check_preemption_disabled: 3 callbacks suppressed
>> BUG: using __this_cpu_read() in preemptible [00000000] code: syz-fuzzer/9432
>> caller is __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>> CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x188/0x20d lib/dump_stack.c:118
>>  check_preemption_disabled lib/smp_processor_id.c:47 [inline]
>>  __this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
>>  __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>>  __split_huge_page mm/huge_memory.c:2575 [inline]
>>  split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
>>  split_huge_page include/linux/huge_mm.h:167 [inline]
> It looks like a regression due to c8cba0cc2a80 ("mm/thp: narrow lru
> locking").

yes, I guess so.

In this patch, I am very bold to move the lru unlock from before 
'remap_page(head);' up to before 'ClearPageCompound(head);' which is
often checked in lrulock. I want to know which part that real should
stay in lru_lock. 

So revert this patch or move it back or move after ClearPageCompound
should fix this problem. 

In the weekend and today, I tried a lot to reproduce this bug on my 2
machines, but still can't. :~(

Many thanks to give a try!

Thank
Alex 

line 2605 mm/huge_memory.c:
        spin_unlock_irqrestore(&pgdat->lru_lock, flags);

        ClearPageCompound(head);

        split_page_owner(head, HPAGE_PMD_ORDER);

        /* See comment in __split_huge_page_tail() */
        if (PageAnon(head)) {
                /* Additional pin to swap cache */
                if (PageSwapCache(head)) {
                        page_ref_add(head, 2);
                        xa_unlock(&swap_cache->i_pages);
                } else {
                        page_ref_inc(head);
                }
        } else {
                /* Additional pin to page cache */
                page_ref_add(head, 2);
                xa_unlock(&head->mapping->i_pages);
        }

        remap_page(head);