path: root/kernel
Commit message | Author | Age
...
| * | | | | | | | sched/cgroup: Fix/cleanup cgroup teardown/init (Peter Zijlstra, 2016-10-05)

    The CPU controller hasn't kept up with the various changes in the whole
    cgroup initialization / destruction sequence, and commit:

      2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups")

    caused it to explode.

    The reason for this is that zombies do not inhibit css_offline() from
    being called, but do stall css_released(). Now we tear down the cfs_rq
    structures on css_offline(), but zombies can run after that, leading to
    use-after-free issues.

    The solution is to move the tear-down to css_released(), which
    guarantees nobody (including no zombies) is still using our cgroup.

    Furthermore, a few simple cleanups are possible too. There doesn't
    appear to be any point to us using css_online() (anymore?) so fold that
    into css_alloc(). And since cgroup code guarantees an RCU grace period
    between css_released() and css_free() we can forgo using call_rcu() and
    free the stuff immediately.

    Change-Id: I51af3d4f0e5dd1c9df6375cce4bb933f67f1022e
    Suggested-by: Tejun Heo <tj@kernel.org>
    Reported-by: Kazuki Yamaguchi <k@rhe.jp>
    Reported-by: Niklas Cassel <niklas.cassel@axis.com>
    Tested-by: Niklas Cassel <niklas.cassel@axis.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups")
    Link: http://lkml.kernel.org/r/20160316152245.GY6344@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Git-commit: 2f5177f0fd7e531b26d54633be62d1d4cb94621c
    Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>

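    A minimal sketch of the callback wiring this change arrives at (based
    on the commit text; function bodies abbreviated from kernel/sched/core.c):

        static struct cgroup_subsys_state *
        cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
        {
                struct task_group *parent = css_tg(parent_css);
                struct task_group *tg;

                if (!parent)
                        return &root_task_group.css;    /* root cgroup: static group */

                tg = sched_create_group(parent);
                if (IS_ERR(tg))
                        return ERR_PTR(-ENOMEM);

                sched_online_group(tg, parent);         /* folded in from ->css_online() */
                return &tg->css;
        }

        static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
        {
                /* No user, zombie or otherwise, can still be running here. */
                sched_offline_group(css_tg(css));
        }

        static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
        {
                /* An RCU grace period separates css_released() from here,
                 * so no call_rcu() is needed before freeing. */
                sched_free_group(css_tg(css));
        }
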
| * | | | | | | | sched/cgroup: Fix cgroup entity load tracking tear-down (Peter Zijlstra, 2016-10-05)

    When a cgroup's CPU runqueue is destroyed, it should remove its
    remaining load accounting from its parent cgroup.

    The current site for doing so is unsuited because it's far too late and
    unordered against other cgroup removal (->css_free() will be, but we're
    also in an RCU callback).

    Put it in the ->css_offline() callback, which is the start of cgroup
    destruction, right after the group has been made unavailable to
    userspace. The ->css_offline() callbacks are called in hierarchical
    order after the following v4.4 commit:

      aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children")

    Change-Id: Ice7cbd71d9e545da84d61686aa46c7213607bb9d
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Li Zefan <lizefan@huawei.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Git-commit: 6fe1f348b3dd1f700f9630562b7d38afd6949568
    Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>

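    The shape of the resulting tear-down helper, called from the
    ->css_offline() path (abridged, per the upstream commit this change is
    based on):

        void unregister_fair_sched_group(struct task_group *tg)
        {
                unsigned long flags;
                struct rq *rq;
                int cpu;

                for_each_possible_cpu(cpu) {
                        /* Fold the group's remaining load back into its parent. */
                        if (tg->se[cpu])
                                remove_entity_load_avg(tg->se[cpu]);

                        /* Only empty task groups can be destroyed, so on_list
                         * cannot flip back to 1 under us. */
                        if (!tg->cfs_rq[cpu]->on_list)
                                continue;

                        rq = cpu_rq(cpu);
                        raw_spin_lock_irqsave(&rq->lock, flags);
                        list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
                        raw_spin_unlock_irqrestore(&rq->lock, flags);
                }
        }
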
* | | | | | | | | Merge "sched: bucketize CPU c-state levels" (Linux Build Service Account, 2016-10-13)
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | sched: bucketize CPU c-state levels (Joonwoo Park, 2016-10-12)

    The c-state aware scheduler takes note of the wakeup latency of each
    c-state level to determine whether to pack tasks or wake up an LPM CPU.
    But it doesn't distinguish small from large deltas, as it's inefficient
    for the scheduler to do so on its critical path.

    Disregard wakeup latency differences of less than 64 us between
    c-state levels. This reduces unnecessary task packing.

    CRs-fixed: 1074879
    Change-Id: Ib0cadbd390d1a0b6da3e39c98010cedb43e5bf60
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>

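    One cheap way to implement such bucketing is a shift, so the comparison
    stays trivial on the wakeup fast path; a hypothetical sketch (the names
    are illustrative, not the exact msm-4.4 code):

        #define CSTATE_LATENCY_BUCKET_SHIFT 6   /* 64 us buckets */

        static inline unsigned int cstate_latency_bucket(unsigned int latency_us)
        {
                return latency_us >> CSTATE_LATENCY_BUCKET_SHIFT;
        }

        /* Two c-state levels are treated as equivalent when their wakeup
         * latencies land in the same 64 us bucket, so small deltas no
         * longer drive packing decisions. */
        static inline bool cstate_latency_equivalent(unsigned int a_us,
                                                     unsigned int b_us)
        {
                return cstate_latency_bucket(a_us) == cstate_latency_bucket(b_us);
        }
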
* | | | | | | | | | Merge "sched: use wakeup latency as c-state determinant" (Linux Build Service Account, 2016-10-13)
|\| | | | | | | | |
| * | | | | | | | | sched: use wakeup latency as c-state determinant (Joonwoo Park, 2016-10-12)

    At present, the c-state aware scheduler uses the raw c-state index
    number as its determinant and avoids task placement on CPUs in deeper
    c-states at the cost of latency. However, some CPUs offer comparable
    wake-up latency at different c-state levels, and the wake-up latency
    of each c-state level is already being fed to the scheduler. Hence use
    wakeup_latency as the c-state determinant instead of the raw c-state
    index, to avoid unnecessary task packing where it's avoidable.

    CRs-fixed: 1074879
    Change-Id: If927f84f6c8ba719716d99669e5d1f1b19aaacbe
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>

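    A hedged illustration of the determinant change (the struct and helper
    are invented for the example):

        struct cstate_level {
                int index;                      /* raw c-state index */
                unsigned int wakeup_latency;    /* us, per level, known to the scheduler */
        };

        /* Old policy: the shallower index always wins.
         * New policy: the lower wakeup latency wins, so two levels with
         * equal latency become interchangeable for task placement. */
        static inline bool cheaper_to_wake(const struct cstate_level *a,
                                           const struct cstate_level *b)
        {
                return a->wakeup_latency < b->wakeup_latency;
        }
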
* | | | | | | | | sched/tune: Remove redundant checks for NULL css (Syed Rameez Mustafa, 2016-10-12)
|/ / / / / / / / /

    The check for NULL css is redundant as upper layers are already making
    sure that css cannot be NULL. Remove this check. It helps to silence
    static analysis errors as well.

    Change-Id: I64585ff8cceb307904e20ff788e52eb05c000e1f
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

* | | | | | | | | sched: Add cgroup attach functionality to the tune controller (Syed Rameez Mustafa, 2016-10-10)

    This is required to allow tasks to freely move between cgroups
    associated with the tune controller.

    Change-Id: I1f39b957462034586edc2fdc0a35488b314e9c8c
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

* | | | | | | | | sched: Update the number of tune groups to 5 (Syed Rameez Mustafa, 2016-10-10)

    The schedtune controller will mimic the cpusets controller
    configuration for now. For that we need to make 4 groups in addition
    to the root group present by default.

    Change-Id: I082f1e4e4ebf863e623cf66ee127eac70a3e2716
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

* | | | | | | | | sched/tune: add initial support for CGroups based boosting (Patrick Bellasi, 2016-10-10)

    To support task performance boosting, the usage of a single knob has
    the advantage of being a simple solution, both from the implementation
    and the usability standpoint. However, on a real system it can be
    difficult to identify a single value for the knob which fits the needs
    of multiple different tasks. For example, some kernel threads and/or
    user-space background services should be better managed the "standard"
    way while we still want to be able to boost the performance of
    specific workloads.

    In order to improve the flexibility of the task boosting mechanism,
    this patch is the first of a small series which extends the previous
    implementation to introduce "per task group" support. This first patch
    introduces just the basic CGroups support: a new "schedtune" CGroups
    controller is added, which allows a different boost value to be
    configured for each group of tasks.

    To keep the implementation simple but still effective for a boosting
    strategy, the new controller:

    1. allows only a two-layer hierarchy
    2. supports only a limited number of boost groups

    A two-layer hierarchy allows each task to be placed either:

    a) in the root control group, thus being subject to a system-wide
       boosting value
    b) in a child of the root group, thus being subject to the specific
       boost value defined by that "boost group"

    The limited number of "boost groups" supported is mainly motivated by
    the observation that in a real system it could be useful to have only
    a few classes of tasks which deserve different treatment, for example
    background vs foreground or interactive vs low-priority. As an
    additional benefit, a limited number of boost groups allows for a
    simpler implementation, especially for the code required to compute
    the boost value for CPUs which have runnable tasks belonging to
    different boost groups.

    Change-Id: I1304e33a8440bfdad9c8bcf8129ff390216f2e32
    cc: Tejun Heo <tj@kernel.org>
    cc: Li Zefan <lizefan@huawei.com>
    cc: Johannes Weiner <hannes@cmpxchg.org>
    cc: Ingo Molnar <mingo@redhat.com>
    cc: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
    Git-commit: 13001f47c9a610705219700af4636386b647e231
    Git-repo: https://android.googlesource.com/kernel/common
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>

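    A sketch of how both constraints can be enforced at group creation,
    shaped after the schedtune series (the struct layout, array name and
    error codes are assumptions):

        #define BOOSTGROUPS_COUNT 5     /* root + 4 configurable groups */

        static struct schedtune *allocated_group[BOOSTGROUPS_COUNT];

        static struct cgroup_subsys_state *
        schedtune_css_alloc(struct cgroup_subsys_state *parent_css)
        {
                struct schedtune *st;
                int idx;

                if (!parent_css)
                        return &root_schedtune.css;

                /* Constraint 1: only a two-layer hierarchy. */
                if (parent_css != &root_schedtune.css)
                        return ERR_PTR(-ENOMEM);

                /* Constraint 2: a limited number of boost groups. */
                for (idx = 1; idx < BOOSTGROUPS_COUNT; ++idx)
                        if (!allocated_group[idx])
                                break;
                if (idx == BOOSTGROUPS_COUNT)
                        return ERR_PTR(-ENOSPC);

                st = kzalloc(sizeof(*st), GFP_KERNEL);
                if (!st)
                        return ERR_PTR(-ENOMEM);

                st->idx = idx;
                allocated_group[idx] = st;
                return &st->css;
        }
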
* | | | | | | | | genirq: Avoid race between cpu hot plug and irq_desc() allocation paths (Prasad Sodagudi, 2016-10-07)

    One core might have just allocated an irq_desc while another core is
    doing IRQ migration in the hotplug path. During IRQ migration in the
    hotplug path, the for_each_active_irq macro walks the irqs whose bits
    are set in the allocated_irqs bitmap, but there is no check of the
    irq_to_desc() return value for desc validity.

    [ 24.566381] msm_thermal:do_core_control Set Offline: CPU4 Temp: 73
    [ 24.568821] Unable to handle kernel NULL pointer dereference at virtual address 000000a4
    [ 24.568931] pgd = ffffffc002184000
    [ 24.568995] [000000a4] *pgd=0000000178df5003, *pud=0000000178df5003, *pmd=0000000178df6003, *pte=0060000017a00707
    [ 24.569153] ------------[ cut here ]------------
    [ 24.569228] Kernel BUG at ffffffc0000f3060 [verbose debug info unavailable]
    [ 24.569334] Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
    [ 24.569422] Modules linked in:
    [ 24.569486] CPU: 4 PID: 28 Comm: migration/4 Tainted: G W 4.4.8-perf-9407222-eng #1
    [ 24.569684] task: ffffffc0f28f0e80 ti: ffffffc0f2a84000 task.ti: ffffffc0f2a84000
    [ 24.569785] PC is at do_raw_spin_lock+0x20/0x160
    [ 24.569859] LR is at _raw_spin_lock+0x34/0x40
    [ 24.569931] pc : [<ffffffc0000f3060>] lr : [<ffffffc001023dec>] pstate: 200001c5
    [ 24.570029] sp : ffffffc0f2a87bc0
    [ 24.570091] x29: ffffffc0f2a87bc0 x28: ffffffc001033988
    [ 24.570174] x27: ffffffc001adebb0 x26: 0000000000000000
    [ 24.570257] x25: 00000000000000a0 x24: 0000000000000020
    [ 24.570339] x23: ffffffc001033978 x22: ffffffc001adeb80
    [ 24.570421] x21: 000000000000027e x20: 0000000000000000
    [ 24.570502] x19: 00000000000000a0 x18: 000000000000000d
    [ 24.570584] x17: 0000000000005f00 x16: 0000000000000003
    [ 24.570666] x15: 000000000000bd39 x14: 0ffffffffffffffe
    [ 24.570748] x13: 0000000000000000 x12: 0000000000000018
    [ 24.570829] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
    [ 24.570911] x9 : fefefefeff332e6d x8 : 7f7f7f7f7f7f7f7f
    [ 24.570993] x7 : ffffffc00344f6a0 x6 : 0000000000000000
    [ 24.571075] x5 : 0000000000000001 x4 : ffffffc00344f488
    [ 24.571157] x3 : 0000000000000000 x2 : 0000000000000000
    [ 24.571238] x1 : ffffffc0f2a84000 x0 : 0000000000004ead
    ... ... ...
    [ 24.581324] Call trace:
    [ 24.581379] [<ffffffc0000f3060>] do_raw_spin_lock+0x20/0x160
    [ 24.581463] [<ffffffc001023dec>] _raw_spin_lock+0x34/0x40
    [ 24.581546] [<ffffffc000103f10>] irq_migrate_all_off_this_cpu+0x84/0x1c4
    [ 24.581641] [<ffffffc00008ec84>] __cpu_disable+0x54/0x74
    [ 24.581722] [<ffffffc0000a3368>] take_cpu_down+0x1c/0x58
    [ 24.581803] [<ffffffc00013ac08>] multi_cpu_stop+0xb0/0x10c
    [ 24.581885] [<ffffffc00013ad60>] cpu_stopper_thread+0xb8/0x184
    [ 24.581972] [<ffffffc0000c4920>] smpboot_thread_fn+0x26c/0x290
    [ 24.582057] [<ffffffc0000c0f84>] kthread+0x100/0x108
    [ 24.582135] Code: aa0003f3 aa1e03e0 d503201f 5289d5a0 (b9400661)
    [ 24.582224] ---[ end trace 609f38584306f5d9 ]---

    CRs-Fixed: 1074611
    Change-Id: I6cc5399e511b6d62ec7fbc4cac21f4f41023520e
    Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
    Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
    Signed-off-by: Runmin Wang <runminw@codeaurora.org>

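    The fix boils down to validating the descriptor inside the migration
    loop; a sketch shaped after the upstream irq_migrate_all_off_this_cpu()
    (the msm-4.4 variant may differ in details):

        void irq_migrate_all_off_this_cpu(void)
        {
                unsigned int irq;
                struct irq_desc *desc;
                unsigned long flags;

                local_irq_save(flags);
                for_each_active_irq(irq) {
                        bool affinity_broken;

                        desc = irq_to_desc(irq);
                        if (!desc)      /* lost the race with sparse irq_desc allocation */
                                continue;

                        raw_spin_lock(&desc->lock);
                        affinity_broken = migrate_one_irq(desc);
                        raw_spin_unlock(&desc->lock);

                        if (affinity_broken)
                                pr_warn_ratelimited("IRQ %u: no longer affine to CPU%u\n",
                                                    irq, smp_processor_id());
                }
                local_irq_restore(flags);
        }
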
* | | | | | | | | Merge "sched/tune: add sysctl interface to define a boost value" (Linux Build Service Account, 2016-10-06)
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | sched/tune: add sysctl interface to define a boost value (Patrick Bellasi, 2016-10-05)

    The current (CFS) scheduler implementation does not allow tasks'
    performance to be "boosted" by running them at a higher OPP than the
    minimum required to meet their workload demands.

    To support task performance boosting, the scheduler should provide a
    "knob" which allows tuning how much the system is optimised for energy
    efficiency vs performance.

    This patch is the first of a series which provides a simple interface
    to define a tuning knob. One system-wide "boost" tunable is exposed
    via:

      /proc/sys/kernel/sched_cfs_boost

    which can be configured in the range [0..100], to define a percentage
    where:

    - 0% boost requires operating in "standard" mode, scheduling tasks at
      the minimum capacities required by the workload demand
    - 100% boost requires pushing task performance to the maximum,
      "regardless" of the incurred energy consumption

    A boost value in between these two boundaries is used to bias the
    power/performance trade-off; the higher the boost value, the more the
    scheduler is biased toward performance boosting instead of energy
    efficiency.

    Change-Id: I59a41725e2d8f9238a61dfb0c909071b53560fc0
    cc: Ingo Molnar <mingo@redhat.com>
    cc: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
    Git-commit: 63c8fad2b06805ef88f1220551289f0a3c3529f1
    Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.4
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>

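    A plausible shape for the sysctl table entry (the actual patch wires a
    dedicated handler; proc_dointvec_minmax is shown as a stand-in, and the
    bounds variables are assumptions):

        static int zero;
        static int one_hundred = 100;

        static struct ctl_table kern_table[] = {
                /* ... */
                {
                        .procname       = "sched_cfs_boost",
                        .data           = &sysctl_sched_cfs_boost,
                        .maxlen         = sizeof(sysctl_sched_cfs_boost),
                        .mode           = 0644,
                        .proc_handler   = proc_dointvec_minmax,
                        .extra1         = &zero,        /* clamp to [0..100] */
                        .extra2         = &one_hundred,
                },
                /* ... */
        };
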
| * | | | | | | | | sched: Initialize HMP stats inside init_sd_lb_stats() (Syed Rameez Mustafa, 2016-10-05)
| |/ / / / / / / /

    This ensures that the load balancer always works correctly even
    without compiler optimizations.

    Change-Id: I36408ae65833b624401e60edfb50c19cc061d7bf
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

* | | | | | | | | Merge "sched: Fix a division by zero bug in scale_exec_time()" (Linux Build Service Account, 2016-10-06)
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | sched: Fix a division by zero bug in scale_exec_time() (Pavankumar Kondeti, 2016-10-01)

    When cycle_counter is used to estimate the frequency, calling
    update_task_ravg() twice on the same task without refreshing the
    wallclock results in a division by zero bug. Add a safety check in
    update_task_ravg() to prevent this.

    The above bug is hit from __schedule() when next == prev. There is no
    need to call update_task_ravg() twice for PUT_PREV_TASK and
    PICK_NEXT_TASK events for the same task. Calling update_task_ravg()
    with the TASK_UPDATE event is sufficient.

    Change-Id: Ib3af9004f2462618c535b8195377bedb584d0261
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

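    An illustrative guard (names follow the HMP window-stats code; the
    exact condition used by the fix may differ):

        static void update_task_ravg(struct task_struct *p, struct rq *rq,
                                     int event, u64 wallclock, u64 irqtime)
        {
                /* Safety check: if the wallclock was not refreshed since
                 * the last update, elapsed time is zero and the frequency
                 * estimate (cycles / elapsed) would divide by zero. */
                if (p->ravg.mark_start == wallclock)
                        return;

                /* ... window rollover and demand accounting ... */
        }
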
* | | | | | | | | | Merge "sched: Fix integer overflow in sched_update_nr_prod()" (Linux Build Service Account, 2016-10-05)
|\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | sched: Fix integer overflow in sched_update_nr_prod() (Pavankumar Kondeti, 2016-10-04)
| | |/ / / / / / / /
| |/| | | | | | | |

    An "int" type is used to hold the time difference between successive
    updates to nr_run in sched_update_nr_prod(). This can overflow if the
    function is called more than ~2.15 seconds after its previous call.
    The most probable scenarios are when a CPU is idle and hotplugged. But
    as we update the last_time of all possible CPUs in
    sched_get_nr_running_avg() periodically from a deferrable timer
    context (the core_ctl module), this overflow is observed only when the
    system is completely idle for a long time. When this overflow happens
    we hit a BUG_ON() in sched_get_nr_running_avg().

    Use a "u64" type instead of "int" for holding the time difference, and
    add an additional BUG_ON() to catch the instances where sched_clock()
    returns a backward value.

    Change-Id: I284abb5889ceb8cf9cc689c79ed69422a0e74986
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

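    The type change in sketch form (the surrounding function is
    abbreviated and its signature assumed):

        void sched_update_nr_prod(int cpu, long delta, bool inc)
        {
                u64 diff;       /* was: int, which wraps once the gap exceeds ~2.15 s */
                u64 curr_time = sched_clock();

                /* Catch a sched_clock() that went backwards. */
                BUG_ON(curr_time < per_cpu(last_time, cpu));

                diff = curr_time - per_cpu(last_time, cpu);
                per_cpu(last_time, cpu) = curr_time;
                /* ... fold diff into the running nr_running average ... */
        }
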
* | | | | | | | | | Merge "sched: Add a device tree property to specify the sched boost type" (Linux Build Service Account, 2016-10-05)
|\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | sched: Add a device tree property to specify the sched boost type (Pavankumar Kondeti, 2016-10-02)

    The HMP scheduler has two types of task placement boost policies.

    (1) The boost-on-big policy makes use of all big CPUs up to their full
        capacity before using the little CPUs. This improves performance
        on true b.L systems, where the big CPUs have higher efficiency
        compared to the little CPUs.

    (2) The boost-on-all policy places tasks on the CPU having the highest
        spare capacity. This policy is optimal for SMP-like systems.

    The scheduler sets the boost policy to boost-on-big on systems which
    have CPUs of different efficiencies. However, it is possible for CPUs
    of the same micro-architecture to have slight differences in
    efficiency due to other factors, like cache size. Selecting the
    boost-on-big policy based on the relative difference in efficiency is
    not optimal on such systems. The boost-policy device tree property is
    introduced to specify the required boost type; it overrides the
    default selection of the boost type in the scheduler. The possible
    values for this property are "boost-on-big" and "boost-on-all".

    Change-Id: Iac19183fa7d4bfd9e5746b02a02b2b19cf64b78d
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

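    A sketch of reading such a property at boot (the node path, variable
    and enum names are assumptions for illustration):

        static void __init sched_boost_parse_dt(void)
        {
                struct device_node *sn = of_find_node_by_path("/sched-hmp");
                const char *policy;

                if (!sn)
                        return;         /* keep the efficiency-based default */

                if (!of_property_read_string(sn, "boost-policy", &policy)) {
                        if (!strcmp(policy, "boost-on-big"))
                                boost_policy_dt = SCHED_BOOST_ON_BIG;
                        else if (!strcmp(policy, "boost-on-all"))
                                boost_policy_dt = SCHED_BOOST_ON_ALL;
                }
        }
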
* | | | | | | | | | | Merge "RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork" (Linux Build Service Account, 2016-10-05)
|\ \ \ \ \ \ \ \ \ \ \
| |_|/ / / / / / / / /
|/| | | | | | | | | |
| * | | | | | | | | | RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork (Balbir Singh, 2016-08-29)

    cgroup_threadgroup_rwsem is acquired in read mode during process exit
    and fork. It is also grabbed in write mode during
    __cgroup_procs_write(). I've recently run into a scenario with lots of
    memory pressure and OOM, and I am beginning to see

    systemd
    __switch_to+0x1f8/0x350
    __schedule+0x30c/0x990
    schedule+0x48/0xc0
    percpu_down_write+0x114/0x170
    __cgroup_procs_write.isra.12+0xb8/0x3c0
    cgroup_file_write+0x74/0x1a0
    kernfs_fop_write+0x188/0x200
    __vfs_write+0x6c/0xe0
    vfs_write+0xc0/0x230
    SyS_write+0x6c/0x110
    system_call+0x38/0xb4

    This thread is waiting on the reader of cgroup_threadgroup_rwsem to
    exit. The reader itself is under memory pressure and has gone into
    reclaim after fork. There are times the reader also ends up waiting on
    oom_lock as well.

    __switch_to+0x1f8/0x350
    __schedule+0x30c/0x990
    schedule+0x48/0xc0
    jbd2_log_wait_commit+0xd4/0x180
    ext4_evict_inode+0x88/0x5c0
    evict+0xf8/0x2a0
    dispose_list+0x50/0x80
    prune_icache_sb+0x6c/0x90
    super_cache_scan+0x190/0x210
    shrink_slab.part.15+0x22c/0x4c0
    shrink_zone+0x288/0x3c0
    do_try_to_free_pages+0x1dc/0x590
    try_to_free_pages+0xdc/0x260
    __alloc_pages_nodemask+0x72c/0xc90
    alloc_pages_current+0xb4/0x1a0
    page_table_alloc+0xc0/0x170
    __pte_alloc+0x58/0x1f0
    copy_page_range+0x4ec/0x950
    copy_process.isra.5+0x15a0/0x1870
    _do_fork+0xa8/0x4b0
    ppc_clone+0x8/0xc

    In the meanwhile, all processes exiting/forking are blocked, almost
    stalling the system.

    This patch moves the threadgroup_change_begin from before
    cgroup_fork() to just before cgroup_canfork(). There is no need to
    worry about threadgroup changes till the task is actually added to the
    threadgroup. This avoids having to call reclaim with
    cgroup_threadgroup_rwsem held.

    tj: Subject and description edits.

    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    Acked-by: Zefan Li <lizefan@huawei.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: stable@vger.kernel.org # v4.2+
    Signed-off-by: Tejun Heo <tj@kernel.org>
    [jstultz: Cherry-picked from:
     git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
    Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Git-commit: e91f1799ff2cc3883907b5f3e141507f9716ff0e
    Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
    Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>

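    The reordering in copy_process(), in excerpt-style sketch form (the
    surrounding error paths are elided and the label names are
    illustrative):

        /* before: threadgroup_change_begin(current) was taken up here,
         * ahead of cgroup_fork() and all the allocating copy_*() calls */

        retval = copy_mm(clone_flags, p);       /* may enter reclaim: no rwsem held now */
        if (retval)
                goto bad_fork_cleanup_signal;
        /* ... remaining copy_*() steps ... */

        threadgroup_change_begin(current);      /* moved: just before cgroup_can_fork() */
        retval = cgroup_can_fork(p, cgrp_ss_priv);
        if (retval)
                goto bad_fork_cgroup_threadgroup;
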
| * | | | | | | | | | RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write() (Peter Zijlstra, 2016-08-29)

    The current percpu-rwsem read side is entirely free of serializing
    insns at the cost of having a synchronize_sched() in the write path.
    The latency of the synchronize_sched() is too high for cgroups. The
    commit 1ed1328792ff talks about the write path being a fairly cold
    path, but this is not the case for Android, which moves tasks to the
    foreground cgroup and back around binder IPC calls from foreground
    processes to background processes, so it is significantly hotter than
    human-initiated operations.

    Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid
    the problem; hopefully it should not be that slow after another
    commit:

      80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global impact")

    We could just add rcu_sync_enter() into cgroup_init() but we do not
    want another synchronize_sched() at boot time, so this patch adds the
    new helper which doesn't block but currently can only be called before
    the first use.

    Cc: Tejun Heo <tj@kernel.org>
    Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
    Reported-by: John Stultz <john.stultz@linaro.org>
    Reported-by: Dmitry Shmidt <dimitrysh@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    [jstultz: backported to 4.4]
    Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
    Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Git-commit: 0c3240a1ef2e840aaa17f593326e3642bc857aa7
    Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
    Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>

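    The new helper described above, as it landed upstream as
    rcu_sync_enter_start() (it marks a writer as present without blocking,
    and is only valid before first use), plus a call site of the kind the
    patch needs:

        void rcu_sync_enter_start(struct rcu_sync *rsp)
        {
                rsp->gp_count++;
                rsp->gp_state = GP_PASSED;
        }

        /* e.g. from cgroup_init(), to pin the rwsem in slow mode without
         * a boot-time synchronize_sched(): */
        rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);
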
| * | | | | | | | | | RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact (Peter Zijlstra, 2016-08-29)

    Currently the percpu-rwsem switches to (global) atomic ops while a
    writer is waiting; which could be quite a while and slows down
    releasing the readers.

    This patch cures this problem by ordering the reader-state vs
    reader-count (see the comments in __percpu_down_read() and
    percpu_down_write()). This changes a global atomic op into a full
    memory barrier, which doesn't have the global cacheline contention.

    This also enables using the percpu-rwsem with rcu_sync disabled in
    order to bias the implementation differently, reducing the writer
    latency by adding some cost to readers.

    Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
    Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    [jstultz: Backported to 4.4]
    Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Git-commit: 3228c5eb7af2b4cb981706b88ed3c3e81ab8e80a
    Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
    Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>

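    The reader fast path after this change looks roughly like the
    following (abridged from the upstream patch; lockdep annotations
    dropped):

        static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
        {
                might_sleep();

                preempt_disable();
                /* Per-cpu counter instead of a global atomic op. */
                __this_cpu_inc(*sem->read_count);
                if (unlikely(!rcu_sync_is_idle(&sem->rss)))
                        __percpu_down_read(sem, false); /* slow path: writer active */
                /* The barrier pairing with the writer lives in the slow
                 * paths, so the common case stays contention-free. */
                preempt_enable();
        }
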
* | | | | | | | | | | Merge "sched: Fix CPU selection when all online CPUs are isolated" (Linux Build Service Account, 2016-10-03)
|\ \ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | sched: Fix CPU selection when all online CPUs are isolated (Syed Rameez Mustafa, 2016-09-30)
| | |_|/ / / / / / / /
| |/| | | | | | | | |

    After the introduction of "33c24b sched: add cpu isolation support",
    select_fallback_rq() might sometimes be unable to find any CPU to
    place a task on. This happens when all online CPUs are isolated and
    the allow-isolated flag is set to false. In such cases, we have little
    choice but to use an isolated CPU and wait for core control to
    eventually un-isolate one or more online CPUs.

    Change-Id: Id8738bd8493c11731c5491efcc99eb90f051233e
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

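    A hypothetical shape for this fallback (the helper name is invented;
    cpu_isolated() is the isolation predicate this series adds):

        static int select_fallback_cpu_allow_isolated(struct task_struct *p)
        {
                int cpu;

                /* Preferred: an online, non-isolated CPU the task may run on. */
                for_each_cpu_and(cpu, cpu_online_mask, tsk_cpus_allowed(p)) {
                        if (!cpu_isolated(cpu))
                                return cpu;
                }

                /* All online CPUs are isolated: pick one anyway and rely on
                 * core control to un-isolate an online CPU later. */
                return cpumask_any_and(cpu_online_mask, tsk_cpus_allowed(p));
        }
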
* | | | | | | | | | | Merge "sched: Add a stub function for init_clusters()" (Linux Build Service Account, 2016-10-03)
|\ \ \ \ \ \ \ \ \ \ \
| | |_|/ / / / / / / /
| |/| | | | | | | | |
| * | | | | | | | | | sched: Add a stub function for init_clusters() (Pavankumar Kondeti, 2016-10-02)

    Add a stub function for init_clusters() and remove an #ifdef block for
    SCHED_HMP in sched_init().

    Change-Id: I6745485152d735436d8398818f7fb5e70ce5ee65
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

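    The usual stub pattern this implies (a sketch; the header placement is
    an assumption):

        #ifdef CONFIG_SCHED_HMP
        extern void init_clusters(void);
        #else
        static inline void init_clusters(void) { }
        #endif

        /* sched_init() can now call init_clusters() unconditionally,
         * without an #ifdef CONFIG_SCHED_HMP block around the call. */
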
* | | | | | | | | | | Merge "sched: add a knob to prefer the waker CPU for sync wakeups" (Linux Build Service Account, 2016-10-03)
|\| | | | | | | | | |
| * | | | | | | | | | sched: add a knob to prefer the waker CPU for sync wakeups (Pavankumar Kondeti, 2016-10-02)
| |/ / / / / / / / /

    The current policy prefers to select an idle CPU in the waker cluster
    over the waker CPU running only 1 task. By selecting an idle CPU, it
    eliminates the chance of the waker migrating to a different CPU after
    the wakee preempts it. This policy is also not susceptible to
    incorrect "sync" usage, i.e. when the waker does not go to sleep after
    waking up the wakee.

    However, the LPM exit latency associated with an idle CPU outweighs
    the above benefits on some targets. So add a knob to prefer the waker
    CPU having only 1 runnable task over idle CPUs in the waker cluster.

    Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

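    A hedged sketch of the wakeup-path check such a knob gates (the sysctl
    name and surrounding variables are assumptions):

        static int prefer_waker_cpu(struct task_struct *p, int waker_cpu, bool sync)
        {
                if (sysctl_sched_prefer_sync_wakee_to_waker && sync &&
                    cpu_rq(waker_cpu)->nr_running == 1 &&
                    cpumask_test_cpu(waker_cpu, tsk_cpus_allowed(p)))
                        return waker_cpu;   /* skip the LPM exit latency of an idle CPU */

                return -1;      /* fall through to the normal idle-CPU search */
        }
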
* | | | | | | | | | Merge "hrtimer: Ensure timer is not running before migrating" (Linux Build Service Account, 2016-10-03)
|\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | hrtimer: Ensure timer is not running before migrating (Olav Haugan, 2016-09-30)
| |/ / / / / / / / /

    A timer might be running when we are trying to move the timer to
    another CPU, so ensure that we wait for the timer to finish before
    migrating.

    Change-Id: I4c9ee39c715baebfbdb8a50476a475e38b092f70
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

* | | | | | | | | | Merge "sched: don't assume higher capacity means higher power in lb" (Linux Build Service Account, 2016-09-30)
|\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | sched: don't assume higher capacity means higher power in lb (Pavankumar Kondeti, 2016-09-29)

    The load balancer restrictions are in place to control task migration
    from the lower capacity cluster to the higher capacity cluster to save
    power. The assumption here is that the higher capacity cluster has a
    higher power cost, which may not necessarily be true for all
    platforms. Use power-cost based checks instead of capacity based
    checks while applying the inter-cluster migration restrictions.

    Change-Id: Id9519eb8f7b183a2e9fca87a23cf95e951aa4005
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>

* | | | | | | | | | | Merge "sched: panic on corrupted stack end" (Linux Build Service Account, 2016-09-30)
|\ \ \ \ \ \ \ \ \ \ \ \
| |_|/ / / / / / / / /
|/| | | | | | | | | |
| * | | | | | | | | | sched: panic on corrupted stack end (Jann Horn, 2016-09-14)

    Until now, hitting this BUG_ON caused a recursive oops (because oops
    handling involves do_exit(), which calls into the scheduler, which in
    turn raises an oops), which caused stuff below the stack to be
    overwritten until a panic happened (e.g. via an oops in interrupt
    context, caused by the overwritten CPU index in the thread_info).

    Just panic directly.

    Change-Id: I73409be3e4cfba82bae36a487227eb5260cd6e37
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Git-repo: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
    Git-commit: 29d6455178a09e1dc340380c582b13356227e8df
    Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>

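    The change itself is a one-liner in schedule_debug(), per the upstream
    commit this is picked from:

        static inline void schedule_debug(struct task_struct *prev)
        {
        #ifdef CONFIG_SCHED_STACK_END_CHECK
                /* was: BUG_ON(task_stack_end_corrupted(prev)); which oopsed
                 * recursively through do_exit() -> schedule() */
                if (task_stack_end_corrupted(prev))
                        panic("corrupted stack end detected inside scheduler\n");
        #endif
                /* ... */
        }
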
* | | | | | | | | | | Merge "sched: constrain HMP scheduler tunable range with in better way" (Linux Build Service Account, 2016-09-29)
|\ \ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | sched: constrain HMP scheduler tunable range with in better way (Joonwoo Park, 2016-09-21)

    HMP scheduler tunables can be constrained via the extra1 and extra2
    fields of ctl_table. Having a valid range in the sysctl table gives a
    clearer view of each tunable's range. Also add a range for
    sched_select_prev_cpu_us so we can avoid invalid configuration of that
    tunable.

    CRs-fixed: 1056910
    Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
    Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>

* | | | | | | | | | | | Merge "sched/core_ctl: Integrate core control with cpu isolation" (Linux Build Service Account, 2016-09-29)
|\ \ \ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | | sched/core_ctl: Integrate core control with cpu isolation (Olav Haugan, 2016-09-24)

    Replace the hotplug functionality in core control with cpu isolation
    and integrate it into the scheduler.

    Change-Id: I4f1514ba5bac2e259a1105fcafb31d6a92ddd249
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

* | | | | | | | | | | | | Merge "sched/core_ctl: Refactor cpu data" (Linux Build Service Account, 2016-09-29)
|\| | | | | | | | | | | |
| * | | | | | | | | | | | sched/core_ctl: Refactor cpu data (Olav Haugan, 2016-09-24)

    Refactor cpu data into cpu data and cluster data to improve
    readability and ease of understanding the code.

    Change-Id: I96505aeb9d07a6fa3a2c28648ffa299e0cfa2e41
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | | trace: Move core control trace events to scheduler (Olav Haugan, 2016-09-24)

    Move the core control trace events to the scheduler trace event file.

    Change-Id: I65943d8e4a9eac1f9f5a40ad5aaf166679215f48
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

* | | | | | | | | | | | | Merge "core_ctrl: Move core control into kernel" (Linux Build Service Account, 2016-09-29)
|\| | | | | | | | | | | |
| |_|_|/ / / / / / / / /
|/| | | | | | | | | | |
| * | | | | | | | | | | core_ctrl: Move core control into kernel (Olav Haugan, 2016-09-24)

    Move core control from an out-of-tree module into the kernel proper.

    Core control monitors the load on CPUs and controls how many CPUs are
    available for the system to use at any point in time. This can help
    save power. Core control can be configured through a sysfs interface.

    Change-Id: Ia78e701468ea3828195c2a15c9cf9fafd099804a
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | core_ctl_helper: Remove code since it is not used anymore (Olav Haugan, 2016-09-24)

    Remove the core control helper code since it is no longer needed with
    the subsequent patches that move core control into the kernel.

    Change-Id: I62acddeb707fc7d5626580166b3466e63f45fd89
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | perf: Add cpu isolation awareness (Olav Haugan, 2016-09-24)

    Ensure perf events do not wake up idle cores when a core is isolated.

    Change-Id: Ifefb2f1cf6c24af7bc46fc62797955b8c8ad5815
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | smp: Do not wake up all idle CPUs (Olav Haugan, 2016-09-24)

    Do not wake up cpus that are isolated.

    Change-Id: I07702bb5b738c1c75c49a2ca4cb08be0231ccb12
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | pmqos: Enable cpu isolation awareness (Olav Haugan, 2016-09-24)

    Set a long latency requirement for isolated cores to ensure the LPM
    logic will select a deep sleep state.

    Change-Id: I83e9fbb800df259616a145d311b50627dc42a5ff
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

| * | | | | | | | | | | irq: Make irq affinity function cpu isolation aware (Olav Haugan, 2016-09-24)

    Prohibit setting the affinity of an IRQ to an isolated core.

    Change-Id: I7b50778615541a64f9956573757c7f28748c4f69
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>

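    A hypothetical guard for the affinity path (cpu_isolated_mask is the
    mask this series maintains; the helper name is invented):

        static int irq_affinity_check_isolation(const struct cpumask *mask)
        {
                cpumask_t avail;

                /* Refuse a mask that would leave the IRQ only on isolated cores. */
                cpumask_andnot(&avail, mask, cpu_isolated_mask);
                if (cpumask_empty(&avail))
                        return -EINVAL;

                return 0;
        }
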