path: root/kernel/sched
Commit message | Author | Age
...
* sched/walt: Drop arch-specific timer access (Chris Redpath, 2016-10-12)

    On at least one platform, the timer providing the wallclock was
    occasionally observed to be reset or to go backwards for some time
    after wakeup. Accept that this might happen and warn the first time,
    but otherwise just carry on.

    Change-Id: Id3164477ba79049561af7f0889cbeebc199ead4e
    Signed-off-by: Chris Redpath <chris.redpath@arm.com>
* eas/sched/fair: Fixing comments in find_best_target. (Srinath Sridharan, 2016-10-12)

    Change-Id: I83f5b9887e98f9fdb81318cde45408e7ebfc4b13
    Signed-off-by: Srinath Sridharan <srinathsr@google.com>
* Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android (Alex Shi, 2016-10-11)

    Conflicts:
        kernel/cpuset.c
* sched/core: Fix a race between try_to_wake_up() and a woken up task (Balbir Singh, 2016-09-24)

    commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.

    The origin of the issue I've seen is related to a missing memory
    barrier between the check for task->state and the check for
    task->on_rq.

    The task being woken up is already awake from a schedule() and is
    doing the following:

        do {
            schedule()
            set_current_state(TASK_(UN)INTERRUPTIBLE);
        } while (!cond);

    The waker actually gets stuck doing the following in
    try_to_wake_up():

        while (p->on_cpu)
            cpu_relax();

    Analysis:

    The instance I've seen involves the following race:

        CPU1                                  CPU2

        while () {
          if (cond)
            break;
          do {
            schedule();
            set_current_state(TASK_UN..)
          } while (!cond);
                                              wakeup_routine()
                                                spin_lock_irqsave(wait_lock)
          raw_spin_lock_irqsave(wait_lock)      wake_up_process()
        }                                       try_to_wake_up()
        set_current_state(TASK_RUNNING);        ..
        list_del(&waiter.list);

    CPU2 wakes up CPU1, but before it can get the wait_lock and set
    current state to TASK_RUNNING the following occurs:

        CPU3
        wakeup_routine()
        raw_spin_lock_irqsave(wait_lock)
        if (!list_empty)
          wake_up_process()
        try_to_wake_up()
        raw_spin_lock_irqsave(p->pi_lock)
        ..
        if (p->on_rq && ttwu_wakeup())
        ..
        while (p->on_cpu)
          cpu_relax()
        ..

    CPU3 tries to wake up the task on CPU1 again since it finds it on
    the wait_queue. CPU1 is spinning on wait_lock, but immediately
    after CPU2, CPU3 got it.

    CPU3 checks the state of p on CPU1; it is TASK_UNINTERRUPTIBLE and
    the task is spinning on the wait_lock. Interestingly, since p->on_rq
    is checked under pi_lock, I've noticed that try_to_wake_up() finds
    p->on_rq to be 0. This was the most confusing bit of the analysis,
    but p->on_rq is changed under the runqueue lock, rq_lock; the
    p->on_rq check is not reliable without this fix IMHO. The race is
    visible (based on the analysis) only when ttwu_queue() does a remote
    wakeup via ttwu_queue_remote, in which case the p->on_rq change is
    not done under the pi_lock.

    The result is that after a while the entire system locks up on
    raw_spin_lock_irqsave(wait_lock) and the holder spins infinitely.

    Reproduction of the issue:

    The issue can be reproduced after a long run on my system with 80
    threads, tweaking available memory to very low and running the
    memory stress-ng mmapfork test. It usually takes a long time to
    reproduce. I am trying to work on a test case that can reproduce
    the issue faster, but that's work in progress. I am still testing
    the changes in a loop and the tests seem OK thus far.

    Big thanks to Benjamin and Nick for helping debug this as well.
    Ben helped catch the missing barrier, Nick caught every missing bit
    in my theory.

    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    [ Updated comment to clarify matching barriers. Many architectures do
      not have a full barrier in switch_to() so that cannot be relied
      upon. ]
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge "sched/rt: Fix PI handling vs. sched_setscheduler()" (Linux Build Service Account, 2017-03-03)
* sched/rt: Fix PI handling vs. sched_setscheduler() (Peter Zijlstra, 2017-02-22)

    Andrea Parri reported:

    > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y)
    > is not handled correctly:
    >
    >     T1 (prio = 20)
    >        lock(rtmutex);
    >
    >     T2 (prio = 20)
    >        blocks on rtmutex  (rt_nr_boosted = 0 on T1's rq)
    >
    >     T1 (prio = 20)
    >        sys_set_scheduler(prio = 0)
    >           [new_effective_prio == oldprio]
    >           T1 prio = 20   (rt_nr_boosted = 0 on T1's rq)
    >
    > The last step is incorrect as T1 is now boosted (c.f.,
    > rt_se_boosted()); in particular, if we continue with
    >
    >     T1 (prio = 20)
    >        unlock(rtmutex)
    >           wakeup(T2)
    >           adjust_prio(T1)
    >              [prio != rt_mutex_getprio(T1)]
    >              dequeue(T1)
    >                 rt_nr_boosted = (unsigned long)(-1)
    >              ...
    >                 T1 prio = 0
    >
    > then we end up leaving rt_nr_boosted in an "inconsistent" state.
    >
    > The simple program attached could reproduce the previous scenario;
    > note that, as a consequence of the presence of this state, the
    > "assertion"
    >
    >     WARN_ON(!rt_nr_running && rt_nr_boosted)
    >
    > from dec_rt_group() may trigger.

    So normally we dequeue/enqueue tasks in sched_setscheduler(), which
    would ensure the accounting stays correct. However in the early PI
    path we fail to do so.

    So this was introduced at around v3.14, by:

        c365c292d059 ("sched: Consider pi boosting in setscheduler()")

    which fixed another problem exactly because of that
    dequeue/enqueue, joy.

    Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have
    it preserve runqueue location with that option. This requires
    decoupling the on_rt_rq() state from being on the list.

    In order to allow for explicit movement during the SAVE/RESTORE,
    introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in
    these cases to preserve other invariants.

    Respecting the SAVE/RESTORE flags also has the (nice) side-effect
    that things like sys_nice()/sys_sched_setaffinity() also do not
    reorder FIFO tasks (whereas they used to before this patch).

    Change-Id: I1450923252f55dba19f450008db813113eb06c76
    Reported-by: Andrea Parri <parri.andrea@gmail.com>
    Tested-by: Andrea Parri <parri.andrea@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Juri Lelli <juri.lelli@arm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    [pkondeti@codeaurora.org: Fix trivial merge conflict]
    Git-commit: ff77e468535987b3d21b7bd4da15608ea3ce7d0b
    Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: Print aggregation status in sched_get_busy trace event (Pavankumar Kondeti, 2017-02-27)

    Aggregation for frequency is not enabled all the time. The
    aggregated load is attached to the most busy CPU only when the
    group load is above a certain threshold. Print the aggregation
    status in the sched_get_busy trace event to make debugging and
    testing easier.

    Change-Id: Icb916f362ea0fa8b5dc7d23cb384168d86159687
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* core_ctl: fix bug in assignment of not_preferred tunable values (Pavankumar Kondeti, 2017-02-17)

    The cluster's LRU list is iterated while storing the not_preferred
    tunable values. The position of a CPU in this list changes when it
    is isolated, which results in an incorrect assignment of the user
    input. Fix this by iterating the CPUs serially in a cluster.

    Change-Id: I7235ca981b0fd82488034ab8d1880bb7498c9a72
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "core_ctl: redo offline_delay_ms tunable implementation" (Linux Build Service Account, 2017-02-15)
* core_ctl: redo offline_delay_ms tunable implementation (Pavankumar Kondeti, 2017-02-11)

    The offline_delay_ms tunable is supposed to give a hysteresis
    effect by delaying CPU isolation. The current implementation does
    not enforce this correctly, so a CPU can get isolated immediately
    in the next evaluation cycle. Allow isolating a CPU only if
    offline_delay_ms has elapsed since the last time we
    isolated/unisolated/evaluated without changing the need CPUs.

    Change-Id: I9681a11dea1ffa07b2fda6cc9a40af9b453bf553
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: don't assume higher capacity means higher power in tick migration" (Linux Build Service Account, 2017-02-15)
* sched: don't assume higher capacity means higher power in tick migration (Pavankumar Kondeti, 2017-02-15)

    When an upmigrate-ineligible task is running on the maximum
    capacity CPU, we check in the tick path whether it can be migrated
    to a lower capacity CPU. Add a power-cost based check there to
    prevent the task from being migrated off a power efficient CPU.

    Change-Id: I291c62d7dbf169d5123faba5f5246ad44a7a40dd
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: optimize cpumask operations during task placement (Pavankumar Kondeti, 2017-02-15)

    Compute the CPU search mask once, taking task affinity,
    cpu_active_mask and cpu_isolated_mask into account, and cache it in
    cpu_selection_env. This prevents doing the same cpumask operations
    multiple times.

    Change-Id: I78f35c59e6ee9437b3a522ac7ad856c0251f81ec
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: don't select an inactive/isolated CPU in sbc() (Pavankumar Kondeti, 2017-02-09)

    In select_best_cpu(), if no CPU is selected from the candidate
    cluster, the search is expanded to the backup cluster. The current
    code may select an inactive/isolated CPU in the backup cluster.
    Fix this.

    Change-Id: Id1e8a2b2f84ea274cdeda408957490ca05ef5fdb
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: remove sched_new_task_windows tunable" (Linux Build Service Account, 2017-02-09)
* sched: remove sched_new_task_windows tunable (Pavankumar Kondeti, 2017-02-08)

    The sched_new_task_windows tunable is set to 5 in the scheduler and
    is not changed from user space. Remove this unused tunable.

    Change-Id: I771e12b44876efe75ce87a90e4e9d69c22168b64
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: fix bug in auto adjustment of group upmigrate/downmigrate (Pavankumar Kondeti, 2017-02-08)

    The sched_group_upmigrate tunable can accept values greater than
    100%. Don't limit it to 100% while doing the auto adjustment.

    Change-Id: I3d1c1e84f2f4dec688235feb1536b9261a3e808b
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: fix argument type in update_task_burst()" (Linux Build Service Account, 2017-02-07)
* sched: fix argument type in update_task_burst() (Pavankumar Kondeti, 2017-02-02)

    update_task_burst()'s runtime argument type should be u64, not int.
    Fix this to avoid potential overflow.

    Change-Id: I33757b7b42f142138c1a099bb8be18c2a3bed331
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sysctl: define upper limit for sched_freq_reporting_policy" (Linux Build Service Account, 2017-02-07)
* sysctl: define upper limit for sched_freq_reporting_policy (Pavankumar Kondeti, 2017-02-03)

    Setting the sched_freq_reporting_policy tunable to an unsupported
    value results in a warning from the scheduler, and the previous
    policy setting is also lost. As sched_freq_reporting_policy can no
    longer be set to an incorrect value, remove the WARN_ON_ONCE from
    the scheduler.

    Change-Id: I58d7e5dfefb7d11d2309bc05a1dd66acdc11b766
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: Remove sched_enable_hmp flag" (Linux Build Service Account, 2017-02-03)
* sched: Remove sched_enable_hmp flag (Olav Haugan, 2017-02-02)

    Clean up the code and make it more maintainable by removing the
    dependency on the sched_enable_hmp flag. We do not support enabling
    the HMP scheduler without recompiling; enabling the HMP scheduler
    is done through the CONFIG_SCHED_HMP config option.

    Change-Id: I246c1b1889f8dcbc8f0a0805077c0ce5d4f083b0
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* sched: maintain group busy time counters in runqueue (Pavankumar Kondeti, 2017-02-01)

    There is no advantage in tracking busy time counters per related
    thread group. We need busy time across all groups for either a CPU
    or a frequency domain, hence maintain the group busy time counters
    in the runqueue itself. When the CPU window is rolled over, the
    group busy counters are also rolled over. This eliminates the
    overhead of maintaining each group's window_start individually. As
    we are preallocating related thread groups now, this patch saves
    40 * nr_cpu_ids * (nr_grp - 1) bytes of memory.

    Change-Id: Ieaaccea483b377f54ea1761e6939ee23a78a5e9c
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: set LBF_IGNORE_PREFERRED_CLUSTER_TASKS correctly (Pavankumar Kondeti, 2017-01-27)

    The LBF_IGNORE_PREFERRED_CLUSTER_TASKS flag needs to be set for all
    types of inter-cluster load balance. Currently it is set only when
    a higher capacity CPU is pulling tasks from a lower capacity CPU.
    This can result in the migration of grouped tasks from a higher
    capacity cluster to a lower capacity cluster.

    Change-Id: Ib0476c5c85781804798ef49268e1b193859ff5ef
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: Update capacity and load scale factor for all clusters at boot (Syed Rameez Mustafa, 2017-01-20)

    Cluster capacities should reflect differences in efficiency between
    clusters even in the absence of cpufreq. Currently capacity is
    updated only when a cpufreq policy notifier is received, so
    placement is suboptimal when cpufreq is turned off. Fix this by
    updating capacities and load scaling factors during cluster
    detection.

    Change-Id: I47f63c1e374bbfd247a4302525afb37d55334bad
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* Merge "sched: kill sync_cpu maintenance" (Linux Build Service Account, 2017-01-19)
* sched: kill sync_cpu maintenance (Pavankumar Kondeti, 2017-01-19)

    We assume the boot CPU is the sync CPU and initialize its
    window_start to sched_ktime_clock(). As windows are synchronized
    across all CPUs, the secondary CPUs' window_start values are
    initialized from the sync CPU's window_start. A CPU's window_start
    is never reset, so this synchronization happens only once for a
    given CPU. Given this, there is no need to reassign the sync_cpu
    role to another CPU when the boot CPU goes offline. Remove this
    unnecessary maintenance of sync_cpu and use any online CPU's
    window_start as the reference.

    Change-Id: I169a8e80573c6dbcb1edeab0659c07c17102f4c9
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: hmp: Remove the global sysctl_sched_enable_colocation tunable" (Linux Build Service Account, 2017-01-18)
* sched: hmp: Remove the global sysctl_sched_enable_colocation tunable (Vikram Mulukutla, 2017-01-18)

    Colocation in HMP includes a tunable that turns the feature on or
    off globally across all colocation groups. Supporting this tunable
    correctly would result in complexity that outweighs any foreseeable
    benefit. For example, disabling the feature globally would involve
    deleting all colocation groups one by one while ensuring no
    placement decisions are made during the process. Remove the
    tunable. Adding or removing a task from a colocation group is still
    possible, so we are not losing functionality.

    Change-Id: I4cb8bcdbee98d3bdd168baacbac345eca9ea8879
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* sched: hmp: Ensure that best_cluster() never returns NULL (Vikram Mulukutla, 2017-01-18)

    There are certain conditions under which group_will_fit() may
    return 0 for all clusters in the system, especially under changing
    thermal conditions. This may result in crashes such as this one:

        CPU 0                          | CPU 1
        ===============================|===============================
        select_best_cpu()              |
        -> env.rtg = rtgA              |
           rtgA.pref_cluster=C_big     |
                                       | set_pref_cluster() for rtgA
                                       | -> best_cluster()
                                       |    C_little doesn't fit
        IRQ: thermal mitigation        |
        C_big capacity now less        |
        than C_little capacity         |
                                       | -> best_cluster() continues
                                       |    C_big doesn't fit
                                       |    set_pref_cluster() sets
                                       |    rtgA.pref_cluster = NULL
        select_least_power_cluster()   |
        -> cluster_first_cpu()         |
           -> BUG()                    |

    Adding lock protection around accesses to the group's preferred
    cluster would be expensive and defeat the point of using RCU to
    protect access to the related_thread_group structure. Therefore,
    ensure that best_cluster() can never return NULL. In the worst
    case, we'll select the wrong cluster for a related_thread_group's
    demand, but this should be fixed in the next tick or wakeup etc.
    Locking would still have led to the momentary wrong decision, with
    the additional expense!

    Also, don't set the preferred cluster to NULL when colocation is
    disabled.

    Change-Id: Id3f514b149add9b3ed33d104fa6a9bd57bec27e2
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* Merge "sched: Initialize variables" (Linux Build Service Account, 2017-01-16)
* sched: Initialize variables (Olav Haugan, 2017-01-13)

    Initialize variables at definition to avoid compiler warnings when
    compiling with CONFIG_OPTIMIZE_FOR_SIZE=n.

    Change-Id: Ibd201877b2274c70ced9d7240d0e527bc77402f3
    Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
* Merge "sched: Fix compilation errors when CFS_BANDWIDTH && !SCHED_HMP" (Linux Build Service Account, 2017-01-16)
* sched: Fix compilation errors when CFS_BANDWIDTH && !SCHED_HMP (Pavankumar Kondeti, 2017-01-12)

    There are a few compiler errors and warnings when the CFS_BANDWIDTH
    config is enabled but SCHED_HMP is not.

    Change-Id: Idaf4a7364564b6faf56df2eb3a1a74eeb242d57e
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: fix compiler errors with !SCHED_HMP (Pavankumar Kondeti, 2017-01-12)

    HMP scheduler boost feature related functions are referenced in the
    SMP load balancer. Add nop versions of these functions to fix the
    compiler errors with !SCHED_HMP.

    Change-Id: I1cbcf67f728c2cbc7c0f47e8eaf1f4165649dce8
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: fix a bug in handling top task table rollover" (Linux Build Service Account, 2017-01-14)
* sched: fix a bug in handling top task table rollover (Pavankumar Kondeti, 2017-01-07)

    When frequency aggregation is enabled, there is a possibility of
    rolling over the top task table multiple times in a single window.
    For example:

    - utra() is called with PUT_PREV_TASK for task 'A', which does not
      belong to any related thread grp. Let's say window rollover
      happens: the rq counters and top task table rollover is done.

    - utra() is called with PICK_NEXT_TASK/TASK_WAKE for task 'B',
      which belongs to a related thread grp. Let's say this happens
      before the grp's cpu_time->window_start is in sync with
      rq->window_start. In this case, the grp's cpu_time counters are
      rolled over and the top task table is also rolled over again.

    Roll over the top task table in the context of the current running
    task to fix this.

    Change-Id: Iea3075e0ea460a9279a01ba42725890c46edd713
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* sched: Convert the global wake_up_idle flag to a per cluster flag (Syed Rameez Mustafa, 2017-01-10)

    Since clusters can vary significantly in their power and
    performance characteristics, there may be a need for different CPU
    selection policies based on which cluster a task is being placed
    on. For example, the placement policy can be more aggressive in
    using idle CPUs on clusters that are power efficient and less
    aggressive on clusters that are geared towards performance. Add
    support for a per cluster wake_up_idle flag to allow greater
    flexibility in placement policies.

    Change-Id: I18cd3d907cd965db03a13f4655870dc10c07acfe
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: fix stale predicted load in trace_sched_get_busy() (Pavankumar Kondeti, 2017-01-07)

    When an early detection notification is pending, we skip
    calculating the predicted load. Initialize it to 0 so that a stale
    value does not get printed in trace_sched_get_busy().

    Change-Id: I36287c0081f6c12191235104666172b7cae2a583
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: Delete heavy task heuristics in prediction code" (Linux Build Service Account, 2017-01-05)
* sched: Delete heavy task heuristics in prediction code (Rohit Gupta, 2017-01-04)

    The heavy task prediction code needs further tuning to avoid any
    negative power impact. Delete the code for now instead of adding
    tunables, to avoid inefficiencies in the scheduler path.

    Change-Id: I71e3b37a5c99e24bc5be93cc825d7e171e8ff7ce
    Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
* Merge "sched: Fix new task accounting bug in transfer_busy_time()" (Linux Build Service Account, 2017-01-05)
* sched: Fix new task accounting bug in transfer_busy_time() (Syed Rameez Mustafa, 2017-01-03)

    In transfer_busy_time(), the new_task flag is set based on the
    active window count prior to the call to update_task_ravg().
    update_task_ravg(), however, can then increment the active window
    count, and consequently the new_task flag becomes stale. This in
    turn leads to inaccurate accounting whereby update_task_ravg() does
    accounting based on the fact that the task is not new, whereas
    transfer_busy_time() then continues to do further accounting
    assuming that the task is new. The accounting discrepancies are
    sometimes caught by some of the scheduler BUGs. Fix the described
    problem by moving the is_new_task() check after the call to
    update_task_ravg(). Also add two missing BUGs that would catch the
    problem sooner rather than later.

    Change-Id: I8dc4822e97cc03ebf2ca1ee2de95eb4e5851f459
    Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* sched: Fix deadlock between cpu hotplug and upmigrate change (Pavankumar Kondeti, 2016-12-30)

    There is a circular dependency between cpu_hotplug.lock and the HMP
    scheduler policy mutex. Prevent this by enforcing the same lock
    order. Here CPU0 and CPU4 are governed by different cpufreq
    policies.

        --------------------------------  ------------------------------
        CPU 0                             CPU 4
        --------------------------------  ------------------------------
        proc_sys_call_handler()           cpu_up()
                                          --> acquired cpu_hotplug.lock
        sched_hmp_proc_update_handler()   cpufreq_cpu_callback()
        --> acquired policy_mutex         cpufreq_governor_interactive()
        get_online_cpus()                 sched_set_window()
        --> waiting for cpu_hotplug.lock  --> waiting for policy_mutex

    Change-Id: I39efc394f4f00815b72adc975021fdb16fe6e30a
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched: Fix out of bounds array access in sched_reset_all_window_stats()" (Linux Build Service Account, 2016-12-21)
* sched: Fix out of bounds array access in sched_reset_all_window_stats() (Pavankumar Kondeti, 2016-11-29)

    A new reset reason code "FREQ_AGGREGATE_CHANGE" was added to the
    reset_reason_code enum, but the corresponding string array was not
    updated. Fix this.

    Change-Id: I2a17d95328bef91c4a5dd4dde418296efca44431
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* Merge "sched/tune: remove duplicate allow_attach in schedtune_cgrp_subsys" (Linux Build Service Account, 2016-12-21)
* sched/tune: remove duplicate allow_attach in schedtune_cgrp_subsys (Runmin Wang, 2016-12-20)

    Remove the extra allow_attach entry and its function definition.

    Change-Id: I530f9f5098d7d2cd6bb343e44c2b8b808af69414
    Signed-off-by: Runmin Wang <runminw@codeaurora.org>
* sched: Avoid packing tasks with low sleep time (Srivatsa Vaddagiri, 2016-12-20)

    Low sleep time can be an indication that a waking task will not
    receive any vruntime bonus and hence would suffer latency when
    packed. Short-burst tasks that sleep, on average, more than
    sched_short_sleep_ns are not eligible for packing. This policy
    covers the case where a task runs in short bursts and sleeps for
    small durations in between.

    Change-Id: Ib81fa37809b85c267949cd433bc6115dd89f100e
    Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
    Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>