Message-Id: <201503182136.EJC90660.QSFOVJFOLHFOtM@I-love.SAKURA.ne.jp>
Date:	Wed, 18 Mar 2015 21:36:33 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	vbabka@...e.cz, mhocko@...e.cz, hannes@...xchg.org
Cc:	akpm@...ux-foundation.org, david@...morbit.com, mgorman@...e.de,
	riel@...hat.com, fengguang.wu@...el.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2 v2] mm: Allow small allocations to fail

Vlastimil Babka wrote:
> I'll add that I think if we do improve the reclaim etc, and make 
> allocations failures rarer, then the whole testing effort will have much 
> lower chance of finding the places where allocation failures are not 
> handled properly. Also Michal says that catching those depend on running 
> all "their loads which we never dreamed of". In that case, if our goal 
> is to fix all broken allocation sites with some quantifiable 
> probability, I'm afraid we might be really better off with some form of 
> fault injection, which will trigger the failures with the probability we 
> set, and not depend on corner case low memory conditions manifesting
> just at the time the workload is at one of the broken allocation sites.
> 

I think we can use SystemTap-based fault injection, which injects a failure
only once per unique backtrace and does not require putting the system under
an OOM condition. I demonstrated this at https://lkml.org/lkml/2014/12/25/64 .

Since SystemTap can generate backtraces without garbage lines, we can
uniquely identify each backtrace and inject a failure only once per
backtrace, making it possible to test every caller of the memory allocators.
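
The once-per-backtrace idea can be modeled outside the kernel. The Python
sketch below is only an illustration of the bookkeeping, not SystemTap code;
the class name and the frame names are hypothetical. The counter plays the
role that the traces_bt array plays in the scripts later in this message.

```python
from collections import Counter


class OncePerBacktraceInjector:
    """Toy model: fail an allocation only the first time a call path is seen."""

    def __init__(self):
        # Maps a backtrace to the number of times it has been observed.
        self.seen = Counter()

    def should_fail(self, backtrace):
        key = tuple(backtrace)
        first_time = self.seen[key] == 0
        self.seen[key] += 1
        return first_time


injector = OncePerBacktraceInjector()
path_a = ("__kmalloc", "foo_alloc", "sys_foo")  # hypothetical frames
path_b = ("__kmalloc", "bar_alloc", "sys_bar")  # hypothetical frames

# path_a is seen three times but fails only on its first occurrence.
results = [injector.should_fail(p) for p in (path_a, path_a, path_b, path_a)]
print(results)  # [True, False, True, False]
```

Each distinct call path is failed exactly once, so repeated hits on a hot
path do not keep the system from making progress.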

Steps for installation and testing are described below.

---------- installation start ----------
wget https://sourceware.org/systemtap/ftp/releases/systemtap-2.7.tar.gz
echo 'e0c3c36955323ae59be07a26a9563474  systemtap-2.7.tar.gz' | md5sum --check -
tar -zxf systemtap-2.7.tar.gz
cd systemtap-2.7
./configure --prefix=$HOME/systemtap.tmp
make -s
make -s install
---------- installation end ----------

---------- preparation (optional) start ----------
Start the kdump service and set /proc/sys/kernel/panic_on_oops to 1
as the root user so that a vmcore can be obtained upon a kernel oops.
---------- preparation (optional) end ----------

---------- testing start ----------
Run

$HOME/systemtap.tmp/bin/staprun fault_injection.ko

and use the system as you like, then see whether it survives.
---------- testing end ----------

The fault_injection.ko module is generated by the commands shown below.
The scripts shown below check only sleepable allocations. If you
replace %{ __GFP_WAIT %} with 0, you can check atomic allocations.

---------- For testing __kmalloc() failure ----------
$HOME/systemtap.tmp/bin/stap -p4 -m fault_injection -g -DSTP_NO_OVERLOAD -e '
global traces_bt[65536];
probe begin { printf("Probe start!\n"); }
probe kernel.function("__kmalloc") {
  if (($flags & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} && execname() != "stapio") {
    bt = backtrace();
    if (traces_bt[bt]++ == 0) {
      printf("%s (%u) size:%u gfp:0x%x\n", execname(), tid(), $size, $flags);
      print_stack(bt);
      printf("\n\n");
      // Ask for an impossibly large size so that this allocation fails.
      $size = 1 << 30;
    }
  }
}
probe end { delete traces_bt; }'
---------- For testing __kmalloc() failure ----------
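
The flag test in the script above can be tried out in isolation. The Python
sketch below is a model only: the bit values are placeholders chosen for
illustration (the real ones come from the kernel headers and depend on the
kernel version), and it assumes that "replace with 0" refers to the value the
masked flags are compared against.

```python
# Illustrative bit values only (assumption); the real ones are defined in
# the kernel headers and depend on the kernel version.
GFP_WAIT = 0x10
GFP_NOFAIL = 0x800


def is_target(flags, expected=GFP_WAIT):
    """Model of ($flags & (__GFP_NOFAIL | __GFP_WAIT)) == expected."""
    return (flags & (GFP_NOFAIL | GFP_WAIT)) == expected


sleepable = GFP_WAIT | 0x40     # may sleep, allowed to fail
atomic = 0x20                   # must not sleep, allowed to fail
nofail = GFP_WAIT | GFP_NOFAIL  # may sleep, must not fail

# Default comparison value: only the sleepable, failable request matches.
print([is_target(f) for f in (sleepable, atomic, nofail)])
# [True, False, False]

# Comparison value 0: only the atomic, failable request matches.
print([is_target(f, expected=0) for f in (sleepable, atomic, nofail)])
# [False, True, False]
```

In both cases a request carrying the no-fail bit is never selected, which
matches the intent of not injecting failures into allocations that are not
allowed to fail.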

As the example shown below demonstrates, we will be able to selectively
test specific subsystems by setting a per-task_struct marker.

---------- For testing __alloc_pages_nodemask() failure except page fault ----------
$HOME/systemtap.tmp/bin/stap -p4 -m fault_injection -g -DSTP_NO_OVERLOAD -e '
global traces_bt[65536];
global in_page_fault%;
probe begin { printf("Probe start!\n"); }
probe kernel.function("__alloc_pages_nodemask") {
  if (($gfp_mask & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} &&
      in_page_fault[tid()] == 0 && execname() != "stapio") {
    bt = backtrace();
    if (traces_bt[bt]++ == 0) {
      printf("%s (%u) order:%u gfp:0x%x\n", execname(), tid(), $order, $gfp_mask);
      print_stack(bt);
      printf("\n\n");
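      // Ask for an impossibly large order so that this allocation fails,
      // and set __GFP_NORETRY so that the allocator does not keep retrying.
      $order = 1 << 30;
      $gfp_mask = $gfp_mask | %{ __GFP_NORETRY %};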
    }
  }
}
probe kernel.function("handle_mm_fault") {
  in_page_fault[tid()]++;
}
probe kernel.function("handle_mm_fault").return {
  in_page_fault[tid()]--;
}
probe end { delete traces_bt; delete in_page_fault; }'
---------- For testing __alloc_pages_nodemask() failure except page fault ----------
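
The per-thread marker used in the script above amounts to a nesting counter
keyed by thread ID. The Python sketch below models only that bookkeeping; the
function names and thread IDs are hypothetical, and it is not a substitute
for the SystemTap probes.

```python
from collections import defaultdict

# Maps a thread ID to its current page-fault nesting depth,
# like in_page_fault[tid()] in the script.
in_page_fault = defaultdict(int)


def enter_fault(tid):
    """Corresponds to the probe on entry to handle_mm_fault."""
    in_page_fault[tid] += 1


def leave_fault(tid):
    """Corresponds to the .return probe on handle_mm_fault."""
    in_page_fault[tid] -= 1


def may_inject(tid):
    """Injection is allowed only while the thread is outside a page fault."""
    return in_page_fault[tid] == 0


enter_fault(100)
inside = (may_inject(100), may_inject(200))  # thread 200 is unaffected
leave_fault(100)
after = may_inject(100)
print(inside, after)  # (False, True) True
```

Keeping a counter rather than a boolean means the marker stays set until the
outermost call returns if the marked function is entered more than once by
the same thread.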