lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be323425-2465-423a-a6f4-affbaa1efe09@bytedance.com>
Date: Tue, 11 Feb 2025 20:41:14 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: David Hildenbrand <david@...hat.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>,
 Ezra Buehler <ezra@...yb.ch>, linux-mm@...ck.org,
 Andrew Morton <akpm@...ux-foundation.org>,
 "Mike Rapoport (Microsoft)" <rppt@...nel.org>,
 Muchun Song <muchun.song@...ux.dev>, Vlastimil Babka <vbabka@...e.cz>,
 Ryan Roberts <ryan.roberts@....com>,
 "Vishal Moola (Oracle)" <vishal.moola@...il.com>,
 Hugh Dickins <hughd@...gle.com>, Matthew Wilcox <willy@...radead.org>,
 Peter Xu <peterx@...hat.com>, Nicolas Ferre <nicolas.ferre@...rochip.com>,
 Alexandre Belloni <alexandre.belloni@...tlin.com>,
 Claudiu Beznea <claudiu.beznea@...on.dev>,
 open list <linux-kernel@...r.kernel.org>,
 linux-arm-kernel@...ts.infradead.org
Subject: Re: [REGRESSION] NULL pointer dereference on ARM (AT91SAM9G25) during
 compaction



On 2025/2/11 20:09, David Hildenbrand wrote:
> On 11.02.25 10:43, Qi Zheng wrote:
>>
>>
>> On 2025/2/11 17:37, David Hildenbrand wrote:
>>> On 11.02.25 10:29, Qi Zheng wrote:
>>>>
>>>>
>>>> On 2025/2/11 17:14, David Hildenbrand wrote:
>>>>> On 11.02.25 04:45, Qi Zheng wrote:
>>>>>> Hi Russell,
>>>>>>
>>>>>> On 2025/2/11 01:03, Russell King (Oracle) wrote:
>>>>>>> On Mon, Feb 10, 2025 at 05:49:38PM +0100, Ezra Buehler wrote:
>>>>>>>> When running vanilla Linux 6.13 or newer (6.14-rc2) on the
>>>>>>>> AT91SAM9G25-based GARDENA smart Gateway, we are seeing a NULL 
>>>>>>>> pointer
>>>>>>>> dereference resulting in a kernel panic. The culprit seems to be
>>>>>>>> commit
>>>>>>>> fc9c45b71f43 ("arm: adjust_pte() usepte_offset_map_rw_nolock()").
>>>>>>>> Reverting the commit apparently fixes the issue.
>>>>>>>
>>>>>>> The blamed commit is buggy:
>>>>>>>
>>>>>>> arch/arm/include/asm/tlbflush.h:
>>>>>>> #define update_mmu_cache(vma, addr, ptep) \
>>>>>>>             update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>>>>>>>
>>>>>>> So vmf can be NULL. This didn't used to matter before this commit,
>>>>>>> because vmf was not used by ARM's update_mmu_cache_range(). However,
>>>>>>> the commit introduced a dereference of it, which now causes a NULL
>>>>>>> point dereference.
>>>>>>>
>>>>>>> Not sure what the correct solution is, but at a guess, both:
>>>>>>>
>>>>>>>       if (ptl != vmf->ptl)
>>>>>>>
>>>>>>> need to become:
>>>>>>>
>>>>>>>       if (!vmf || ptl != vmf->ptl)
>>>>>>
>>>>>> No, we can't do that, because without using split PTE locks, we would
>>>>>> use shared mm->page_table_lock, which would create a deadlock.
>>>>>
>>>>> Maybe we can simply special-case on CONFIG_SPLIT_PTE_PTLOCKS ?
>>>>>
>>>>> if (IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS)) {
>>>>
>>>> In this case, if two vmas map the same PTE page, then the same PTE lock
>>>> will be held repeatedly. Right?
>>>
>>> Hmm, the comment says:
>>>
>>>           /*
>>>            * This is called while another page table is mapped, so we
>>>            * must use the nested version.  This also means we need to
>>>            * open-code the spin-locking.
>>>            */
>>>
>>> "another page table" implies that it cannot be the same. But maybe that
>>> comment was also wrong?
>>
>> I don't see make_coherent() ensuring this when traversing vma.
> 
> Right, we could just have the same file range mapped MAP_SHARED into the 
> same page table using two VMAs ... I suspect writing a reproducer for 
> the deadlock should be easy.

I guess so, but I don't have an arm32 test environment yet. :(

[...]

>>           vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, 
>> pgoff) {
>> +               unsigned long mpnt_addr;
>> +
>>                   /*
>>                    * If this VMA is not in our MM, we can ignore it.
>>                    * Note that we intentionally mask out the VMA
>> @@ -151,7 +178,14 @@ make_coherent(struct address_space *mapping, struct
>> vm_area_struct *vma,
>>                   if (!(mpnt->vm_flags & VM_MAYSHARE))
>>                           continue;
>>                   offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
>> -               aliases += adjust_pte(mpnt, mpnt->vm_start + offset,
>> pfn, vmf);
>> +               mpnt_addr = mpnt->vm_start + offset;
>> +               /*
>> +                * If mpnt_addr and addr are mapped to the same PTE page,
>> +                * also skip this vma.
>> +                */
>> +               if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
>> +                       continue;
> 
> Hmm, but is skipping the right thing to do? Maybe you would want to 

Ah, your concern is that in this case we may also need to maintain cache
coherency. I'm not sure about this.

> indicate to adjust_pte() whether it must take the PTL or not.

Indeed, this is a better approach. Will do it, and hope arm people
can help test it.

Thanks!

> 
> I did not study that code in detail, though ...
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ