lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpGJqzuJqfH42=f5LXxkf4uCNmUUh2zXyzodczP29O6_hQ@mail.gmail.com>
Date:   Thu, 19 Oct 2023 14:55:31 -0700
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     kernel test robot <lkp@...el.com>
Cc:     akpm@...ux-foundation.org, oe-kbuild-all@...ts.linux.dev,
        viro@...iv.linux.org.uk, brauner@...nel.org, shuah@...nel.org,
        aarcange@...hat.com, lokeshgidra@...gle.com, peterx@...hat.com,
        david@...hat.com, hughd@...gle.com, mhocko@...e.com,
        axelrasmussen@...gle.com, rppt@...nel.org, willy@...radead.org,
        Liam.Howlett@...cle.com, jannh@...gle.com, zhangpeng362@...wei.com,
        bgeffon@...gle.com, kaleshsingh@...gle.com, ngeoffray@...gle.com,
        jdduke@...gle.com, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI

On Tue, Oct 17, 2023 at 12:40 PM kernel test robot <lkp@...el.com> wrote:
>
> Hi Suren,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
> [also build test WARNING on next-20231017]
> [cannot apply to linus/master v6.6-rc6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Suren-Baghdasaryan/mm-rmap-support-move-to-different-root-anon_vma-in-folio_move_anon_rmap/20231009-144552
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20231009064230.2952396-3-surenb%40google.com
> patch subject: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI
> config: i386-randconfig-141-20231017 (https://download.01.org/0day-ci/archive/20231018/202310180338.zTpcYECK-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20231018/202310180338.zTpcYECK-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@...el.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202310180338.zTpcYECK-lkp@intel.com/
>
> smatch warnings:
> mm/userfaultfd.c:1380 remap_pages() warn: unsigned 'src_start + len - src_addr' is never less than zero.

Hmm. I think this warning is correct only when
CONFIG_TRANSPARENT_HUGEPAGE=n. I guess I'll have to add an "ifdef
CONFIG_TRANSPARENT_HUGEPAGE" here after all, which lets us move these
checks before locking PTL.

>
> vim +1380 mm/userfaultfd.c
>
>   1195
>   1196  /**
>   1197   * remap_pages - remap arbitrary anonymous pages of an existing vma
>   1198   * @dst_start: start of the destination virtual memory range
>   1199   * @src_start: start of the source virtual memory range
>   1200   * @len: length of the virtual memory range
>   1201   *
>   1202   * remap_pages() remaps arbitrary anonymous pages atomically in zero
>   1203   * copy. It only works on non shared anonymous pages because those can
>   1204   * be relocated without generating non linear anon_vmas in the rmap
>   1205   * code.
>   1206   *
>   1207   * It provides a zero copy mechanism to handle userspace page faults.
>   1208   * The source vma pages should have mapcount == 1, which can be
>   1209   * enforced by using madvise(MADV_DONTFORK) on src vma.
>   1210   *
>   1211   * The thread receiving the page during the userland page fault
>   1212   * will receive the faulting page in the source vma through the network,
>   1213   * storage or any other I/O device (MADV_DONTFORK in the source vma
>   1214   * avoids remap_pages() to fail with -EBUSY if the process forks before
>   1215   * remap_pages() is called), then it will call remap_pages() to map the
>   1216   * page in the faulting address in the destination vma.
>   1217   *
>   1218   * This userfaultfd command works purely via pagetables, so it's the
>   1219   * most efficient way to move physical non shared anonymous pages
>   1220   * across different virtual addresses. Unlike mremap()/mmap()/munmap()
>   1221   * it does not create any new vmas. The mapping in the destination
>   1222   * address is atomic.
>   1223   *
>   1224   * It only works if the vma protection bits are identical from the
>   1225   * source and destination vma.
>   1226   *
>   1227   * It can remap non shared anonymous pages within the same vma too.
>   1228   *
>   1229   * If the source virtual memory range has any unmapped holes, or if
>   1230   * the destination virtual memory range is not a whole unmapped hole,
>   1231   * remap_pages() will fail respectively with -ENOENT or -EEXIST. This
>   1232   * provides a very strict behavior to avoid any chance of memory
>   1233   * corruption going unnoticed if there are userland race conditions.
>   1234   * Only one thread should resolve the userland page fault at any given
>   1235   * time for any given faulting address. This means that if two threads
>   1236   * try to both call remap_pages() on the same destination address at the
>   1237   * same time, the second thread will get an explicit error from this
>   1238   * command.
>   1239   *
>   1240   * The command retval will return "len" is successful. The command
>   1241   * however can be interrupted by fatal signals or errors. If
>   1242   * interrupted it will return the number of bytes successfully
>   1243   * remapped before the interruption if any, or the negative error if
>   1244   * none. It will never return zero. Either it will return an error or
>   1245   * an amount of bytes successfully moved. If the retval reports a
>   1246   * "short" remap, the remap_pages() command should be repeated by
>   1247   * userland with src+retval, dst+reval, len-retval if it wants to know
>   1248   * about the error that interrupted it.
>   1249   *
>   1250   * The UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES flag can be specified to
>   1251   * prevent -ENOENT errors to materialize if there are holes in the
>   1252   * source virtual range that is being remapped. The holes will be
>   1253   * accounted as successfully remapped in the retval of the
>   1254   * command. This is mostly useful to remap hugepage naturally aligned
>   1255   * virtual regions without knowing if there are transparent hugepage
>   1256   * in the regions or not, but preventing the risk of having to split
>   1257   * the hugepmd during the remap.
>   1258   *
>   1259   * If there's any rmap walk that is taking the anon_vma locks without
>   1260   * first obtaining the folio lock (the only current instance is
>   1261   * folio_referenced), they will have to verify if the folio->mapping
>   1262   * has changed after taking the anon_vma lock. If it changed they
>   1263   * should release the lock and retry obtaining a new anon_vma, because
>   1264   * it means the anon_vma was changed by remap_pages() before the lock
>   1265   * could be obtained. This is the only additional complexity added to
>   1266   * the rmap code to provide this anonymous page remapping functionality.
>   1267   */
>   1268  ssize_t remap_pages(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>   1269                      unsigned long dst_start, unsigned long src_start,
>   1270                      unsigned long len, __u64 mode)
>   1271  {
>   1272          struct vm_area_struct *src_vma, *dst_vma;
>   1273          unsigned long src_addr, dst_addr;
>   1274          pmd_t *src_pmd, *dst_pmd;
>   1275          long err = -EINVAL;
>   1276          ssize_t moved = 0;
>   1277
>   1278          /*
>   1279           * Sanitize the command parameters:
>   1280           */
>   1281          BUG_ON(src_start & ~PAGE_MASK);
>   1282          BUG_ON(dst_start & ~PAGE_MASK);
>   1283          BUG_ON(len & ~PAGE_MASK);
>   1284
>   1285          /* Does the address range wrap, or is the span zero-sized? */
>   1286          BUG_ON(src_start + len <= src_start);
>   1287          BUG_ON(dst_start + len <= dst_start);
>   1288
>   1289          /*
>   1290           * Because these are read sempahores there's no risk of lock
>   1291           * inversion.
>   1292           */
>   1293          mmap_read_lock(dst_mm);
>   1294          if (dst_mm != src_mm)
>   1295                  mmap_read_lock(src_mm);
>   1296
>   1297          /*
>   1298           * Make sure the vma is not shared, that the src and dst remap
>   1299           * ranges are both valid and fully within a single existing
>   1300           * vma.
>   1301           */
>   1302          src_vma = find_vma(src_mm, src_start);
>   1303          if (!src_vma || (src_vma->vm_flags & VM_SHARED))
>   1304                  goto out;
>   1305          if (src_start < src_vma->vm_start ||
>   1306              src_start + len > src_vma->vm_end)
>   1307                  goto out;
>   1308
>   1309          dst_vma = find_vma(dst_mm, dst_start);
>   1310          if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
>   1311                  goto out;
>   1312          if (dst_start < dst_vma->vm_start ||
>   1313              dst_start + len > dst_vma->vm_end)
>   1314                  goto out;
>   1315
>   1316          err = validate_remap_areas(src_vma, dst_vma);
>   1317          if (err)
>   1318                  goto out;
>   1319
>   1320          for (src_addr = src_start, dst_addr = dst_start;
>   1321               src_addr < src_start + len;) {
>   1322                  spinlock_t *ptl;
>   1323                  pmd_t dst_pmdval;
>   1324                  unsigned long step_size;
>   1325
>   1326                  BUG_ON(dst_addr >= dst_start + len);
>   1327                  /*
>   1328                   * Below works because anonymous area would not have a
>   1329                   * transparent huge PUD. If file-backed support is added,
>   1330                   * that case would need to be handled here.
>   1331                   */
>   1332                  src_pmd = mm_find_pmd(src_mm, src_addr);
>   1333                  if (unlikely(!src_pmd)) {
>   1334                          if (!(mode & UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES)) {
>   1335                                  err = -ENOENT;
>   1336                                  break;
>   1337                          }
>   1338                          src_pmd = mm_alloc_pmd(src_mm, src_addr);
>   1339                          if (unlikely(!src_pmd)) {
>   1340                                  err = -ENOMEM;
>   1341                                  break;
>   1342                          }
>   1343                  }
>   1344                  dst_pmd = mm_alloc_pmd(dst_mm, dst_addr);
>   1345                  if (unlikely(!dst_pmd)) {
>   1346                          err = -ENOMEM;
>   1347                          break;
>   1348                  }
>   1349
>   1350                  dst_pmdval = pmdp_get_lockless(dst_pmd);
>   1351                  /*
>   1352                   * If the dst_pmd is mapped as THP don't override it and just
>   1353                   * be strict. If dst_pmd changes into TPH after this check, the
>   1354                   * remap_pages_huge_pmd() will detect the change and retry
>   1355                   * while remap_pages_pte() will detect the change and fail.
>   1356                   */
>   1357                  if (unlikely(pmd_trans_huge(dst_pmdval))) {
>   1358                          err = -EEXIST;
>   1359                          break;
>   1360                  }
>   1361
>   1362                  ptl = pmd_trans_huge_lock(src_pmd, src_vma);
>   1363                  if (ptl) {
>   1364                          if (pmd_devmap(*src_pmd)) {
>   1365                                  spin_unlock(ptl);
>   1366                                  err = -ENOENT;
>   1367                                  break;
>   1368                          }
>   1369
>   1370                          /*
>   1371                           * Check if we can move the pmd without
>   1372                           * splitting it. First check the address
>   1373                           * alignment to be the same in src/dst.  These
>   1374                           * checks don't actually need the PT lock but
>   1375                           * it's good to do it here to optimize this
>   1376                           * block away at build time if
>   1377                           * CONFIG_TRANSPARENT_HUGEPAGE is not set.
>   1378                           */
>   1379                          if ((src_addr & ~HPAGE_PMD_MASK) || (dst_addr & ~HPAGE_PMD_MASK) ||
> > 1380                              src_start + len - src_addr < HPAGE_PMD_SIZE || !pmd_none(dst_pmdval)) {
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@...roid.com.
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ