linux-kernel - Re: [PATCH v1] mm: fix unmap_mapping

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADOw95fss1AY_xuQV+4iOLZOA0ofYAaK7uCJHPiuVVLZDZBa6A@mail.gmail.com>
Date: Thu, 21 Dec 2023 13:40:11 +0800
From: Jiajun Xie <jiajun.xie.sh@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug

On Thu, Dec 21, 2023 at 1:53 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Wed, 20 Dec 2023 13:28:39 +0800 "jiajun.xie" <jiajun.xie.sh@...il.com> wrote:
>
> > From: Jiajun Xie <jiajun.xie.sh@...il.com>
> >
> > The bug happens when highest bit of holebegin is 1, suppose
> > holebign is 0x8000000111111000, after shift, hba would be
> > 0xfff8000000111111, then vma_interval_tree_foreach would look
> > it up fail or leads to the wrong result.
> >
> > error call seq e.g.:
> > - mmap(..., offset=0x8000000111111000)
> >   |- syscall(mmap, ... unsigned long, off):
> >      |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
> >
> >   here pgoff is correctly shifted to 0x8000000111111,
> >   but pass 0x8000000111111000 as holebegin to unmap
> >   would then cause terrible result, as shown below:
> >
> > - unmap_mapping_range(..., loff_t const holebegin)
> >   |- pgoff_t hba = holebegin >> PAGE_SHIFT;
> >           /* hba = 0xfff8000000111111 unexpectedly */
> >
> > turn holebegin to be unsigned first would fix the bug.
> >
>
> Thanks.  Are you able to describe the runtime effects of this
> (obviously bad, but it's good to spell it out) and under what
> circumstances it occurs?

Thanks for the quick reply.

The issue happens in Heterogeneous computing, where the
device(e.g. gpu) and host share the same virtual address space.

A simple workflow pattern which hit the issue is:
        /* host */
    1. userspace first mmap a file backed VA range with specified offset.
                        e.g. (offset=0x800..., mmap return: va_a)
    2. write some data to the corresponding sys page
                         e.g. (va_a = 0xAABB)
        /* device */
    3. gpu workload touches VA, triggers gpu fault and notify the host.
        /* host */
    4. reviced gpu fault notification, then it will:
            4.1 unmap host pages and also takes care of cpu tlb
                  (use unmap_mapping_range with offset=0x800...)
            4.2 migrate sys page to device
            4.3 setup device page table and resolve device fault.
        /* device */
    5. gpu workload continued, it accessed va_a and got 0xAABB.
    6. gpu workload continued, it wrote 0xBBCC to va_a.
        /* host */
    7. userspace access va_a, as expected, it will:
            7.1 trigger cpu vm fault.
            7.2 driver handling fault to migrate gpu local page to host.
    8. userspace then could correctly get 0xBBCC from va_a
    9. done

But in step 4.1, if we hitted the bug this patch mentioned, then user space
would never trigger cpu fault, and still get the old value: 0xAABB.