| Commit message (Collapse) | Author |
|
Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.
On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet, so
these savings may be reduced if there are bugs in the link.
text data bss dec filename
11169741 1180744 1923176 14273661 vmlinux
10445269 1004127 1919707 13369103 vmlinux.dce
~700K text, ~170K data, 6% removed from kernel image size.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
(cherry-pick from b67067f1176df6ee727450546b58704e4b588563)
Change-Id: I81b63489605bc2f146498d0bb0e1cc5b7adab8a0
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
|
|
Make the anon_inodes facility unconditional so that it can be used by core
VFS code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit dadd2299ab61fc2b55b95b7b3a8f674cdd3b69c9)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I2f97bda4f360d8d05bbb603de839717b3d8067ae
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
|
* This has been added twice who knows when by CAF, most likely during an important merge
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
|
|
commit 08389d888287c3823f80b0216766b71e17f0aba5 upstream.
Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.
We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.
Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
[fllinden@amazon.com: backported to 4.9]
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:
warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)
I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.
Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: Change-Id: Ib41ef78fba02eb9e592558ddbf06f9ec0aa337b6
("UPSTREAM: cgroup: Fix CGROUP_BPF config")
(cherry picked from commit 73b351473547e543e9c8166dd67fd99c64c15b0b)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
|
|
Cherry-pick from commit 483c4933ea09b7aa625b9d64af286fc22ec7e419
CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually
enabled, making it rather challenging to turn CGROUP_BPF on.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: Ib41ef78fba02eb9e592558ddbf06f9ec0aa337b6
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
|
|
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
|
|
Cherry-pick from commit 3007098494bec614fb55dee7bc0410bb7db5ad18
This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.
To illustrate the logic behind that, assume the following example
cgroup hierarchy.
A - B - C
\ D - E
If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
|
|
This reverts commit 28c486744e6de4d882a1d853aa63d99fcba4b7a6.
Change-Id: I0e65abce457093a6c05d971b750228d99d5de131
|
|
commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream.
During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid. Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the
new pid via get_pid(), and will decrement the refcount on the old pid
via put_pid(). As we never called get_pid() when we initialized
`cad_pid`, we decrement a reference we never incremented, can therefore
free the init task's struct pid early. As there can be dangling
references to the struct pid, we can later encounter a use-after-free
(e.g. when delivering signals).
This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
have been around since the conversion of `cad_pid` to struct pid in
commit 9ec52099e4b8 ("[PATCH] replace cad_pid by a struct pid") from the
pre-KASAN stone age of v2.6.19.
Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.
Full KASAN splat below.
==================================================================
BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
ns_of_pid include/linux/pid.h:153 [inline]
task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
do_notify_parent+0x308/0xe60 kernel/signal.c:1950
exit_notify kernel/exit.c:682 [inline]
do_exit+0x2334/0x2bd0 kernel/exit.c:845
do_group_exit+0x108/0x2c8 kernel/exit.c:922
get_signal+0x4e4/0x2a88 kernel/signal.c:2781
do_signal arch/arm64/kernel/signal.c:882 [inline]
do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
work_pending+0xc/0x2dc
Allocated by task 0:
slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
slab_alloc_node mm/slub.c:2907 [inline]
slab_alloc mm/slub.c:2915 [inline]
kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
alloc_pid+0xdc/0xc00 kernel/pid.c:180
copy_process+0x2794/0x5e18 kernel/fork.c:2129
kernel_clone+0x194/0x13c8 kernel/fork.c:2500
kernel_thread+0xd4/0x110 kernel/fork.c:2552
rest_init+0x44/0x4a0 init/main.c:687
arch_call_rest_init+0x1c/0x28
start_kernel+0x520/0x554 init/main.c:1064
0x0
Freed by task 270:
slab_free_hook mm/slub.c:1562 [inline]
slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
slab_free mm/slub.c:3161 [inline]
kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
put_pid+0x30/0x48 kernel/pid.c:109
proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
call_write_iter include/linux/fs.h:1977 [inline]
new_sync_write+0x3ac/0x510 fs/read_write.c:518
vfs_write fs/read_write.c:605 [inline]
vfs_write+0x9c4/0x1018 fs/read_write.c:585
ksys_write+0x124/0x240 fs/read_write.c:658
__do_sys_write fs/read_write.c:670 [inline]
__se_sys_write fs/read_write.c:667 [inline]
__arm64_sys_write+0x78/0xb0 fs/read_write.c:667
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
The buggy address belongs to the object at ffff23794dda0000
which belongs to the cache pid of size 224
The buggy address is located 4 bytes inside of
224-byte region [ffff23794dda0000, ffff23794dda00e0)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
head:(____ptrval____) order:1 compound_mapcount:0
flags: 0x3fffc0000010200(slab|head)
raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================
Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea29b20a828511de3348334e529a3d046a180416 upstream.
I read the commit log of the following two:
- bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML")
- 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390")
Both are talking about HAS_IOMEM dependency missing in many drivers.
So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.
This does not change the behavior of UML. UML still cannot enable
COMPILE_TEST because it does not provide HAS_IOMEM.
The current dependency for S390 is too strong. Under the condition of
CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST.
I also removed the meaningless 'default n'.
Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KP Singh <kpsingh@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 334ef6ed06fa1a54e35296b77b693bcf6d63ee9e upstream.
While allmodconfig and allyesconfig build for s390 there are also
various bots running compile tests with randconfig, where PCI is
disabled. This reveals that a lot of drivers should actually depend on
HAS_IOMEM.
Adding this to each device driver would be a never ending story,
therefore just disable COMPILE_TEST for s390.
The reasoning is more or less the same as described in
commit bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML").
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bc083a64b6c035135c0f80718f9e9192cc0867c6 upstream.
UML is a bit special since it does not have iomem nor dma. That means a
lot of drivers will not build if they miss a dependency on HAS_IOMEM.
s390 used to have the same issues but since it gained PCI support UML is
the only stranger.
We are tired of patching dozens of new drivers after every merge window
just to un-break allmod/yesconfig UML builds. One could argue that a
decent driver has to know on what it depends and therefore a missing
HAS_IOMEM dependency is a clear driver bug. But the dependency not
obvious and not everyone does UML builds with COMPILE_TEST enabled when
developing a device driver.
A possible solution to make these builds succeed on UML would be
providing stub functions for ioremap() and friends which fail upon
runtime. Another one is simply disabling COMPILE_TEST for UML. Since
it is the least hassle and does not force use to fake iomem support
let's do the latter.
Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since we need to change the implementation, stop exposing internals.
Provide KREF_INIT() to allow static initialization of struct kref.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bug: 173789633
Change-Id: I5fa30eb173b15bcb5d291ccbb61294f28ba27ea9
(cherry picked from commit 1e24edca0557dba6486d39d3c24c288475432bcf)
Signed-off-by: Giuliano Procida <gprocida@google.com>
|
|
[ Upstream commit 550c10d28d21bd82a8bb48debbb27e6ed53262f6 ]
The .bss section for the h8300 is relatively small. A value of
CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static
printk ringbuffer that is too large. Limit the range appropriately
for the H8300.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream.
... or the odyssey of trying to disable the stack protector for the
function which generates the stack canary value.
The whole story started with Sergei reporting a boot crash with a kernel
built with gcc-10:
Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
Call Trace:
dump_stack
panic
? start_secondary
__stack_chk_fail
start_secondary
secondary_startup_64
-—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary
This happens because gcc-10 tail-call optimizes the last function call
in start_secondary() - cpu_startup_entry() - and thus emits a stack
canary check which fails because the canary value changes after the
boot_init_stack_canary() call.
To fix that, the initial attempt was to mark the one function which
generates the stack canary with:
__attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
however, using the optimize attribute doesn't work cumulatively
as the attribute does not add to but rather replaces previously
supplied optimization options - roughly all -fxxx options.
The key one among them being -fno-omit-frame-pointer and thus leading to
not present frame pointer - frame pointer which the kernel needs.
The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), shy of carving out
start_secondary() into a separate compilation unit and building it with
-fno-stack-protector, was to add an empty asm("").
This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.
That should hold for a long time, but hey we said that about the other
two solutions too so...
Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 78a5255ffb6a1af189a83e493d916ba1c54d8c75 upstream.
We have some rather random rules about when we accept the
"maybe-initialized" warnings, and when we don't.
For example, we consider it unreliable for gcc versions < 4.9, but also
if -O3 is enabled, or if optimizing for size. And then various kernel
config options disabled it, because they know that they trigger that
warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).
And now gcc-10 seems to be introducing a lot of those warnings too, so
it falls under the same heading as 4.9 did.
At the same time, we have a very straightforward way to _enable_ that
warning when wanted: use "W=2" to enable more warnings.
So stop playing these ad-hoc games, and just disable that warning by
default, with the known and straight-forward "if you want to work on the
extra compiler warnings, use W=123".
Would it be great to have code that is always so obvious that it never
confuses the compiler whether a variable is used initialized or not?
Yes, it would. In a perfect world, the compilers would be smarter, and
our source code would be simpler.
That's currently not the world we live in, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b303c6df80c9f8f13785aa83a0471fca7e38b24d upstream.
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:
- commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
with -Os") turned off this option for -Os.
- commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
CONFIG_PROFILE_ALL_BRANCHES
- commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
for "make W=1"") turned off this option for GCC < 4.9
Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903
I think this looks better by shifting the logic from Makefile to Kconfig.
Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6c314a613cd7d03fb97713e0d642b493de42e69 upstream.
There are a few places in the kernel that access stack memory
belonging to a different task. Before we can start freeing task
stacks before the task_struct is freed, we need a way for those code
paths to pin the stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c65eacbe290b8141554c71b2c94489e73ade8c8d upstream.
If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
then thread_info is defined as a single 'u32 flags' and is the first
entry of task_struct. thread_info::task is removed (it serves no
purpose if thread_info is embedded in task_struct), and
thread_info::cpu gets its own slot in task_struct.
This is heavily based on a patch written by Linus.
Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The WALT time accounting breaks when CFS tasks are throttled by the CPU
bandwidth control mechanism of the CPU cgroups controller. This can
result in a negative cumulative_runnable_avg, which can then lead to a
kernel panic, and the device crashing.
Although the right fix would be add support for throttled CFS tasks to
WALT, the common kernel is now in stable maintenance mode and will not
get new features which could cause issues for partners downstream.
To work around the issue, make the CFS_BANDWIDTH Kconfig option depend
on SCHED_WALT=n, hence preventing these two things from being enabled
simultaneously. This should not be an issue for most partners (nobody
had noticed the breakage for years), and those who do need the better
fix can apply it in their device kernel.
Bug: 139071966
Bug: 120440300
Change-Id: Ieb3c367ae7893ac93fb5b38c1580dc59151aacce
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
|
The WALT time accounting breaks when CFS tasks are throttled by the CPU
bandwidth control mechanism of the CPU cgroups controller. This can
result in a negative cumulative_runnable_avg, which can then lead to a
kernel panic, and the device crashing.
Although the right fix would be add support for throttled CFS tasks to
WALT, the common kernel is now in stable maintenance mode and will not
get new features which could cause issues for partners downstream.
To work around the issue, make the CFS_BANDWIDTH Kconfig option depend
on SCHED_WALT=n, hence preventing these two things from being enabled
simultaneously. This should not be an issue for most partners (nobody
had noticed the breakage for years), and those who do need the better
fix can apply it in their device kernel.
Bug: 139071966
Bug: 120440300
Change-Id: Ieb3c367ae7893ac93fb5b38c1580dc59151aacce
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
|
|
[ Upstream commit 6041186a32585fc7a1d0f6cfe2f138b05fdc3c82 ]
When a module option, or core kernel argument, toggles a static-key it
requires jump labels to be initialized early. While x86, PowerPC, and
ARM64 arrange for jump_label_init() to be called before parse_args(),
ARM does not.
Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
[<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
[<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
[<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
[<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
(page_alloc_shuffle+0x12c/0x1ac)
[<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
[<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
[<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
Move the fallback call to jump_label_init() to occur before
parse_args().
The redundant calls to jump_label_init() in other archs are left intact
in case they have static key toggling use cases that are even earlier
than option parsing.
Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Guenter Roeck <groeck@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
The thread which initiates the hot plug can get scheduled
out, while trying to acquire the console lock,
thus increasing the hot plug latency. This option
allows to selectively disable the console flush and
in turn reduce the hot plug latency.
Change-Id: I42507804d321b29b7761146a6c175d959bf79925
Signed-off-by: Mohammed Khajapasha <mkhaja@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
|
|
commit 877417e6ffb9578e8580abf76a71e15732473456 upstream.
CC_OPTIMIZE_FOR_SIZE disables the often useful -Wmaybe-unused warning,
because that causes a ridiculous amount of false positives when combined
with -Os.
This means a lot of warnings don't show up in testing by the developers
that should see them with an 'allmodconfig' kernel that has
CC_OPTIMIZE_FOR_SIZE enabled, but only later in randconfig builds
that don't.
This changes the Kconfig logic around CC_OPTIMIZE_FOR_SIZE to make
it a 'choice' statement defaulting to CC_OPTIMIZE_FOR_PERFORMANCE
that gets added for this purpose. The allmodconfig and allyesconfig
kernels now default to -O2 with the maybe-unused warning enabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Put kernel end place_marker for all targets.
This saves the kernel end time for targets which
enable MSM_BOOT_TIME_MARKER.
Change-Id: Iad635e971bdd341328d40681b7acf8a6f43f288d
Signed-off-by: Vivek Kumar <vivekuma@codeaurora.org>
|
|
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.
More information about the patch can be found on:
https://github.com/IAIK/KAISER
From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5
To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>
After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).
With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.
If there are any questions we would love to answer them.
We also appreciate any comments!
Cheers,
Daniel (+ the KAISER team from Graz University of Technology)
[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf
[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]
Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Memory allocated for initrd would not be reclaimed if initializing ramfs
was skipped.
Bug: 69901741
Test: "grep MemTotal /proc/meminfo" increases by a few MB on an Android
device with a/b boot.
Change-Id: Ifbe094d303ed12cfd6de6aa004a8a19137a2f58a
Signed-off-by: Nick Bray <ncbray@google.com>
|
|
There are a few places in the kernel that access stack memory
belonging to a different task. Before we can start freeing task
stacks before the task_struct is freed, we need a way for those code
paths to pin the stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bug: 38331309
Change-Id: I414853e9b72ecb0967d5e1cbfc77b4929bf3f4f5
(cherry picked from commit c6c314a613cd7d03fb97713e0d642b493de42e69)
Signed-off-by: Zubin Mithra <zsm@google.com>
|
|
If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
then thread_info is defined as a single 'u32 flags' and is the first
entry of task_struct. thread_info::task is removed (it serves no
purpose if thread_info is embedded in task_struct), and
thread_info::cpu gets its own slot in task_struct.
This is heavily based on a patch written by Linus.
Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bug: 38331309
Change-Id: I25e5a830f2ada5e74fa93661e97e5e701b1b70d2
(cherry picked from commit c65eacbe290b8141554c71b2c94489e73ade8c8d)
Signed-off-by: Zubin Mithra <zsm@google.com>
|
|
We've had the thread info allocated together with the thread stack for
most architectures for a long time (since the thread_info was split off
from the task struct), but that is about to change.
But the patches that move the thread info to be off-stack (and a part of
the task struct instead) made it clear how confused the allocator and
freeing functions are.
Because the common case was that we share an allocation with the thread
stack and the thread_info, the two pointers were identical. That
identity then meant that we would have things like
ti = alloc_thread_info_node(tsk, node);
...
tsk->stack = ti;
which certainly _worked_ (since stack and thread_info have the same
value), but is rather confusing: why are we assigning a thread_info to
the stack? And if we move the thread_info away, the "confusing" code
just gets to be entirely bogus.
So remove all this confusion, and make it clear that we are doing the
stack allocation by renaming and clarifying the function names to be
about the stack. The fact that the thread_info then shares the
allocation is an implementation detail, and not really about the
allocation itself.
This is a pure renaming and type fix: we pass in the same pointer, it's
just that we clarify what the pointer means.
The ia64 code that actually only has one single allocation (for all of
task_struct, thread_info and kernel thread stack) now looks a bit odd,
but since "tsk->stack" is actually not even used there, that oddity
doesn't matter. It would be a separate thing to clean that up, I
intentionally left the ia64 changes as a pure brute-force renaming and
type change.
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 38331309
Change-Id: I870b5476fc900c9145134f9dd3ed18a32a490162
(cherry picked from commit b235beea9e996a4d36fed6cfef4801a3e7d7a9a5)
Signed-off-by: Zubin Mithra <zsm@google.com>
|
|
This is needed for AVB integration work.
Bug: 31796270
Test: Manually tested (other arch).
Change-Id: I32fd37c1578c6414e3e6ff277d16ad94df7886b8
Signed-off-by: David Zeuthen <zeuthen@google.com>
Git-commit: 6a6a7657c231e947233c43ae0522bbd4edf0139e
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
This is needed for AVB integration work.
Bug: 31796270
Test: Manually tested (other arch).
Change-Id: I32fd37c1578c6414e3e6ff277d16ad94df7886b8
Signed-off-by: David Zeuthen <zeuthen@google.com>
|
|
Fix SCHED_WALT dependency on FAIR_GROUP_SCHED otherwise we run
into following build failure:
CC kernel/sched/walt.o
kernel/sched/walt.c: In function 'walt_inc_cfs_cumulative_runnable_avg':
kernel/sched/walt.c:148:8: error: 'struct cfs_rq' has no member named 'cumulative_runnable_avg'
cfs_rq->cumulative_runnable_avg += p->ravg.demand;
^
kernel/sched/walt.c: In function 'walt_dec_cfs_cumulative_runnable_avg':
kernel/sched/walt.c:154:8: error: 'struct cfs_rq' has no member named 'cumulative_runnable_avg'
cfs_rq->cumulative_runnable_avg -= p->ravg.demand;
^
Reported-at: https://bugs.linaro.org/show_bug.cgi?id=2793
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
|
|
Fix SCHED_WALT dependency on FAIR_GROUP_SCHED otherwise we run
into following build failure:
CC kernel/sched/walt.o
kernel/sched/walt.c: In function 'walt_inc_cfs_cumulative_runnable_avg':
kernel/sched/walt.c:148:8: error: 'struct cfs_rq' has no member named 'cumulative_runnable_avg'
cfs_rq->cumulative_runnable_avg += p->ravg.demand;
^
kernel/sched/walt.c: In function 'walt_dec_cfs_cumulative_runnable_avg':
kernel/sched/walt.c:154:8: error: 'struct cfs_rq' has no member named 'cumulative_runnable_avg'
cfs_rq->cumulative_runnable_avg -= p->ravg.demand;
^
Reported-at: https://bugs.linaro.org/show_bug.cgi?id=2793
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
|
|
We no longer require dm_table_put. Upstream fixed the dm_mounts
driver to remove this symbol.
Change-Id: I4ba1043965d25ec444a833283392ac2394c845f3
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
1. "dm: optimize use SRCU and RCU" removes the use of dm_table_put.
2. "dm: remove request-based logic from make_request_fn wrapper" necessitates
calling dm_setup_md_queue or else the request_queue's make_request_fn
pointer ends being unset.
[ 7.711600] Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
[ 7.717519] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.1.15-02273-gb057d16-dirty #33
[ 7.726559] Hardware name: HiKey Development Board (DT)
[ 7.731779] task: ffffffc005f8acc0 ti: ffffffc005f8c000 task.ti: ffffffc005f8c000
[ 7.739257] PC is at 0x0
[ 7.741787] LR is at generic_make_request+0x8c/0x108
....
[ 9.082931] Call trace:
[ 9.085372] [< (null)>] (null)
[ 9.090074] [<ffffffc0003f4ac0>] submit_bio+0x98/0x1e0
[ 9.095212] [<ffffffc0001e2618>] _submit_bh+0x120/0x1f0
[ 9.096165] cfg80211: Calling CRDA to update world regulatory domain
[ 9.106781] [<ffffffc0001e5450>] __bread_gfp+0x94/0x114
[ 9.112004] [<ffffffc00024a748>] ext4_fill_super+0x18c/0x2d64
[ 9.117750] [<ffffffc0001b275c>] mount_bdev+0x194/0x1c0
[ 9.122973] [<ffffffc0002450dc>] ext4_mount+0x14/0x1c
[ 9.128021] [<ffffffc0001b29a0>] mount_fs+0x3c/0x194
[ 9.132985] [<ffffffc0001d059c>] vfs_kern_mount+0x4c/0x134
[ 9.138467] [<ffffffc0001d2168>] do_mount+0x204/0xbbc
[ 9.143514] [<ffffffc0001d2ec4>] SyS_mount+0x94/0xe8
[ 9.148479] [<ffffffc000c54074>] mount_block_root+0x120/0x24c
[ 9.154222] [<ffffffc000c543e8>] mount_root+0x110/0x12c
[ 9.159443] [<ffffffc000c54574>] prepare_namespace+0x170/0x1b8
[ 9.165273] [<ffffffc000c53d98>] kernel_init_freeable+0x23c/0x260
[ 9.171365] [<ffffffc0009b1748>] kernel_init+0x10/0x118
[ 9.176589] Code: bad PC value
[ 9.179807] ---[ end trace 75e1bc52ba364d13 ]---
Bug: 27175947
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I952d86fd1475f0825f9be1386e3497b36127abd0
Git-commit: 5a77db78398a7247ab6bb6ff9720bae19d1f9ff4
Git-repo: https://android.googlesource.com/kernel/common
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.
1. dm: allow a dm-fs-style device to be shared via dm-ioctl
Integrates feedback from Alisdair, Mike, and Kiyoshi.
Two main changes occur here:
- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table. This binds
the device to a name and, optional, uuid which is needed by udev and
allows for userspace management of the mapped device.
- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.
2. init: boot to device-mapper targets without an initr*
Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md. It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise). It also replaces /dev/XXX calls with major:minor opportunistically.
The format is dm="name uuid ro,table line 1,table line 2,...". The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space. Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).
A mapped device created during boot will be assigned a minor of 0 and
may be access via /dev/dm-0.
An example dm-linear root with no uuid may look like:
root=/dev/dm-0 dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0"
Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.
Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2
http://marc.info/?l=dm-devel&m=127429499422096&w=2
http://marc.info/?l=dm-devel&m=127429493922000&w=2
Latest upstream threads:
https://patchwork.kernel.org/patch/104859/
https://patchwork.kernel.org/patch/104860/
https://patchwork.kernel.org/patch/104861/
Bug: 27175947
Signed-off-by: Will Drewry <wad@chromium.org>
Review URL: http://codereview.chromium.org/2020011
Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
Git-commit: 96b0434c25be8c73822a679be35b29ca5263baea
Git-repo: https://android.googlesource.com/kernel/common
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
To temporarily allow compilation of an upcoming dm change,
add a dummy dm_table_put definition.
Change-Id: Iceca2eb6daa55f0acb936eafe1d59f65f7cfcd55
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
feature flag
The ENERGY_AWARE sched feature flag cannot be set unless
CONFIG_SCHED_DEBUG is enabled.
So this patch allows the flag to default to true at build time
if the config is set.
Change-Id: I8835a571fdb7a8f8ee6a54af1e11a69f3b5ce8e6
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
To support task performance boosting, the usage of a single knob has the
advantage to be a simple solution, both from the implementation and the
usability standpoint. However, on a real system it can be difficult to
identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services should be better managed the "standard" way while we
still want to be able to boost the performance of specific workloads.
In order to improve the flexibility of the task boosting mechanism this
patch is the first of a small series which extends the previous
implementation to introduce a "per task group" support.
This first patch introduces just the basic CGroups support, a new
"schedtune" CGroups controller is added which allows to configure
different boost value for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
1. allows only a two layer hierarchy
2. supports only a limited number of boost groups
A two layer hierarchy allows to place each task either:
a) in the root control group
thus being subject to a system-wide boosting value
b) in a child of the root group
thus being subject to the specific boost value defined by that
"boost group"
The limited number of "boost groups" supported is mainly motivated by
the observation that in a real system it could be useful to have only
few classes of tasks which deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups allows also
to have a simpler implementation especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.
Change-Id: I1304e33a8440bfdad9c8bcf8129ff390216f2e32
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 13001f47c9a610705219700af4636386b647e231
Git-repo: https://android.googlesource.com/kernel/common
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
Stack canary initialization involves getting a random number.
Getting this random number may involve accessing caches or other
architectural specific features which are not available until
after the architecture is setup. Move the stack canary initialization
later to accommodate this.
Change-Id: I00b564a2c3172229a44339c061fa380c17fe7d8e
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[ohaugan@codeaurora.org: Fix trivial merge conflict]
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
|
|
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.
To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance.
This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
/proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
- 0% boost requires to operate in "standard" mode by scheduling
tasks at the minimum capacities required by the workload demand
- 100% boost requires to push at maximum the task performances,
"regardless" of the incurred energy consumption
A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.
Change-Id: I59a41725e2d8f9238a61dfb0c909071b53560fc0
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 63c8fad2b06805ef88f1220551289f0a3c3529f1
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.4
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
Move core control from out-of-tree module into the kernel proper.
Core control monitors load on CPUs and controls how many CPUs are
available for the system to use at any point in time. This can help save
power. Core control can be configured through sysfs interface.
Change-Id: Ia78e701468ea3828195c2a15c9cf9fafd099804a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
|
|
read-only kernel mappings
It may be useful to debug writes to the readonly sections of memory,
so provide a cmdline "rodata=off" to allow for this. This can be
expanded in the future to support "log" and "write" modes, but that
will need to be architecture-specific.
This also makes KDB software breakpoints more usable, as read-only
mappings can now be disabled on any kernel.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bug: 31660652
Change-Id: I67b818ca390afdd42ab1c27cb4f8ac64bbdb3b65
(cherry picked from commit d2aa1acad22f1bdd0cfa67b3861800e392254454)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|
|
feature flag
The ENERGY_AWARE sched feature flag cannot be set unless
CONFIG_SCHED_DEBUG is enabled.
So this patch allows the flag to default to true at build time
if the config is set.
Change-Id: I8835a571fdb7a8f8ee6a54af1e11a69f3b5ce8e6
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
use a window based view of time in order to track task
demand and CPU utilization in the scheduler.
Window Assisted Load Tracking (WALT) implementation credits:
Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
Pavan Kumar Kondeti, Olav Haugan
2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
and Todd Kjos
Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708
Includes fixes for issues:
eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a
race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb
Handle walt accounting anomoly during resume
During resume, there is a corner case where on wakeup, a task's
prev_runnable_sum can go negative. This is a workaround that
fixes the condition and warns (instead of crashing).
BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
Currently the build for a single-core (e.g. user-mode) Linux is broken
and this configuration is required (at least) to run some network tests.
The main issues for the current code support on single-core systems are:
1. {se,rq}::sched_avg is not available nor maintained for !SMP systems
This means that load and utilisation signals are NOT available in single
core systems. All the EAS code depends on these signals.
2. sched_group_energy is also SMP dependant. Again this means that all the
EAS setup and preparation code (energyn model initialization) has to be
properly guarded/disabled for !SMP systems.
3. SchedFreq depends on utilization signal, which is not available on
!SMP systems.
4. SchedTune is useless on unicore systems if SchedFreq is not available.
5. WALT machinery is not required on single-core systems.
This patch addresses all these issues by enforcing some constraints for
single-core systems:
a) WALT, SchedTune and SchedTune are now dependant on SMP
b) The default governor for !SMP systems is INTERACTIVE
c) The energy model initialisation/build functions are
d) Other minor code re-arrangements and CONFIG_SMP guarding to enable
single core builds.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
|