path: root/kernel/sched/rt.c
Date  Commit message  Author
2024-10-17  Merge remote-tracking branch 'msm8998/lineage-20' into lineage-20  [Raghuram Subramani]
Change-Id: I126075a330f305c85f8fe1b8c9d408f368be95d1
2023-07-27  softirq, sched: reduce softirq conflicts with RT  [John Dias]
joshuous: Adapted to work with CAF's "softirq: defer softirq processing to ksoftirqd if CPU is busy with RT" commit. ajaivasudeve: adapted for the commit "softirq: Don't defer all softirq during RT task" We're finding audio glitches caused by audio-producing RT tasks that are either interrupted to handle softirq's or that are scheduled onto cpu's that are handling softirq's. In a previous patch, we attempted to catch many cases of the latter problem, but it's clear that we are still losing significant numbers of races in some apps. This patch attempts to address both problems: 1. It prohibits handling softirq's when interrupting an RT task, by delaying the softirq to the ksoftirqd thread. 2. It attempts to reduce the most common windows in which we lose the race between scheduling an RT task on a remote core and starting to handle softirq's on that core. We still lose some races, but we lose significantly fewer. (And we don't want to introduce any heavyweight forms of synchronization on these paths.) Bug: 64912585 Change-Id: Ida89a903be0f1965552dd0e84e67ef1d3158c7d8 Signed-off-by: John Dias <joaodias@google.com> Signed-off-by: joshuous <joshuous@gmail.com> Signed-off-by: ajaivasudeve <ajaivasudeve@gmail.com>
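For illustration, the mechanism described above boils down to two checks; the following is a minimal userspace sketch, not the kernel code, and the struct fields and function names are illustrative:

```c
#include <stdbool.h>

/* Illustrative per-CPU state; the real kernel uses preempt/softirq
 * counters and the current task's scheduling class. */
struct cpu_state {
	bool handling_softirq;	/* CPU is currently running softirq handlers */
	bool current_is_rt;	/* the interrupted task is SCHED_FIFO/SCHED_RR */
};

/* Idea 1: when returning from an interrupt, punt pending softirqs to
 * ksoftirqd instead of running them inline if that would delay an RT task. */
bool defer_softirq_to_ksoftirqd(const struct cpu_state *cpu)
{
	return cpu->current_is_rt;
}

/* Idea 2: when picking a CPU for a waking RT task, prefer CPUs that are
 * not mid-softirq, shrinking the race window described above. */
bool cpu_ok_for_rt_wakeup(const struct cpu_state *cpu)
{
	return !cpu->handling_softirq;
}
```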
2023-07-27  ANDROID: sched/rt: rt cpu selection integration with EAS.  [Srinath Sridharan]
For effective interplay between RT and fair tasks. Enables sched_fifo for UI and Render tasks. Critical for improving user experience. bug: 24503801 bug: 30377696 Change-Id: I2a210c567c3f5c7edbdd7674244822f848e6d0cf Signed-off-by: Srinath Sridharan <srinathsr@google.com> (cherry picked from commit dfe0f16b6fd3a694173c5c62cf825643eef184a3)
2023-07-16  ANDROID: Move schedtune en/dequeue before schedutil update triggers  [Chris Redpath]
CPU rq util updates happen when rq signals are updated as part of enqueue and dequeue operations. Doing these updates triggers a call to the registered util update handler, which takes schedtune boosting into account. Enqueueing the task in the correct schedtune group after this happens means that we will potentially not see the boost for an entire throttle period. Move the enqueue/dequeue operations for schedtune before the signal updates which can trigger OPP changes. Change-Id: I4236e6b194bc5daad32ff33067d4be1987996780 Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2023-07-16  sched/walt: Re-add code to allow WALT to function  [Ethan Chen]
Change-Id: Ieb1067c5e276f872ed4c722b7d1fabecbdad87e7
2020-07-09  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds  [Shile Zhang]
[ Upstream commit 975e155ed8732cb81f55c021c441ae662dd040b5 ] We added the 'sched_rr_timeslice_ms' SCHED_RR tuning knob in this commit: ce0dbbbb30ae ("sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice") ... whose name suggests to users that it's in milliseconds, while in reality it's being set in milliseconds but the result is shown in jiffies. This is obviously confusing when HZ is not 1000; it makes it appear like the value set failed, such as with HZ=100: root# echo 100 > /proc/sys/kernel/sched_rr_timeslice_ms root# cat /proc/sys/kernel/sched_rr_timeslice_ms 10 Fix this to be milliseconds all around. Signed-off-by: Shile Zhang <shile.zhang@nokia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485612049-20923-1-git-send-email-shile.zhang@nokia.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
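A tiny standalone illustration (not the kernel code) of the HZ=100 confusion described above and of the milliseconds-everywhere fix:

```c
#include <stdio.h>

#define HZ 100	/* the confusing configuration from the changelog */

int main(void)
{
	int written_ms = 100;

	/* Before the fix: the sysctl stored jiffies and printed them back raw. */
	int stored_jiffies = written_ms * HZ / 1000;	/* 100 ms -> 10 jiffies */
	printf("echo 100 -> cat shows %d (jiffies, looks like the write failed)\n",
	       stored_jiffies);

	/* After the fix: convert back to milliseconds when reading. */
	printf("after the fix: cat shows %d ms\n", stored_jiffies * 1000 / HZ);
	return 0;
}
```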
2018-05-30  sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning  [Davidlohr Bueso]
[ Upstream commit d29a20645d5e929aa7e8616f28e5d8e1c49263ec ] While running rt-tests' pi_stress program I got the following splat: rq->clock_update_flags < RQCF_ACT_SKIP WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20 [...] <IRQ> enqueue_top_rt_rq+0xf4/0x150 ? cpufreq_dbs_governor_start+0x170/0x170 sched_rt_rq_enqueue+0x65/0x80 sched_rt_period_timer+0x156/0x360 ? sched_rt_rq_enqueue+0x80/0x80 __hrtimer_run_queues+0xfa/0x260 hrtimer_interrupt+0xcb/0x220 smp_apic_timer_interrupt+0x62/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> [...] do_idle+0x183/0x1e0 cpu_startup_entry+0x5f/0x70 start_secondary+0x192/0x1d0 secondary_startup_64+0xa5/0xb0 We can get rid of it be the "traditional" means of adding an update_rq_clock() call after acquiring the rq->lock in do_sched_rt_period_timer(). The case for the RT task throttling (which this workload also hits) can be ignored in that the skip_update call is actually bogus and quite the contrary (the request bits are removed/reverted). By setting RQCF_UPDATED we really don't care if the skip is happening or not and will therefore make the assert_clock_updated() check happy. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: linux-kernel@vger.kernel.org Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22  sched: Stop switched_to_rt() from sending IPIs to offline CPUs  [Paul E. McKenney]
[ Upstream commit 2fe2582649aa2355f79acddb86bd4d6c5363eb63 ] The rcutorture test suite occasionally provokes a splat due to invoking rt_mutex_lock() which needs to boost the priority of a task currently sitting on a runqueue that belongs to an offline CPU: WARNING: CPU: 0 PID: 12 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40 Modules linked in: CPU: 0 PID: 12 Comm: rcub/7 Not tainted 4.14.0-rc4+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff9ed3de5f8cc0 task.stack: ffffbbf80012c000 RIP: 0010:native_smp_send_reschedule+0x37/0x40 RSP: 0018:ffffbbf80012fd10 EFLAGS: 00010082 RAX: 000000000000002f RBX: ffff9ed3dd9cb300 RCX: 0000000000000004 RDX: 0000000080000004 RSI: 0000000000000086 RDI: 00000000ffffffff RBP: ffffbbf80012fd10 R08: 000000000009da7a R09: 0000000000007b9d R10: 0000000000000001 R11: ffffffffbb57c2cd R12: 000000000000000d R13: ffff9ed3de5f8cc0 R14: 0000000000000061 R15: ffff9ed3ded59200 FS: 0000000000000000(0000) GS:ffff9ed3dea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000080686f0 CR3: 000000001b9e0000 CR4: 00000000000006f0 Call Trace: resched_curr+0x61/0xd0 switched_to_rt+0x8f/0xa0 rt_mutex_setprio+0x25c/0x410 task_blocks_on_rt_mutex+0x1b3/0x1f0 rt_mutex_slowlock+0xa9/0x1e0 rt_mutex_lock+0x29/0x30 rcu_boost_kthread+0x127/0x3c0 kthread+0x104/0x140 ? rcu_report_unblock_qs_rnp+0x90/0x90 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x22/0x30 Code: f0 00 0f 92 c0 84 c0 74 14 48 8b 05 34 74 c5 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 a0 c6 fc b9 e8 d5 b5 06 00 <0f> ff 5d c3 0f 1f 44 00 00 8b 05 a2 d1 13 02 85 c0 75 38 55 48 But the target task's priority has already been adjusted, so the only purpose of switched_to_rt() invoking resched_curr() is to wake up the CPU running some task that needs to be preempted by the boosted task. But the CPU is offline, which presumably means that the task must be migrated to some other CPU, and that this other CPU will undertake any needed preemption at the time of migration. Because the runqueue lock is held when resched_curr() is invoked, we know that the boosted task cannot go anywhere, so it is not necessary to invoke resched_curr() in this particular case. This commit therefore makes switched_to_rt() refrain from invoking resched_curr() when the target CPU is offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16  sched/rt: Up the root domain ref count when passing it around via IPIs  [Steven Rostedt (VMware)]
commit 364f56653708ba8bcdefd4f0da2a42904baa8eeb upstream. When issuing an IPI RT push, where an IPI is sent to each CPU that has more than one RT task scheduled on it, it references the root domain's rto_mask, which contains all the CPUs within the root domain that have more than one RT task in the runnable state. The problem is, after the IPIs are initiated, the rq->lock is released. This means that the root domain that is associated with the run queue could be freed while the IPIs are going around. Add a sched_get_rd() and a sched_put_rd() that will increment and decrement the root domain's ref count respectively. This way, when initiating the IPIs, the scheduler will up the root domain's ref count before releasing the rq->lock, ensuring that the root domain does not go away until the IPI round is complete. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
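The get/put pair added here is plain reference counting on the root domain; a userspace sketch of the pattern follows (the struct is simplified and the free path is illustrative; the kernel frees via RCU):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Illustrative stand-in for struct root_domain; only the ref count matters. */
struct root_domain {
	atomic_int refcount;
	/* ... rto_mask, rto_push_work, etc. in the real structure ... */
};

/* Take a reference before dropping rq->lock and starting the IPI round. */
void sched_get_rd(struct root_domain *rd)
{
	atomic_fetch_add(&rd->refcount, 1);
}

/* Drop the reference once the IPI round is complete; free on the last put. */
void sched_put_rd(struct root_domain *rd)
{
	if (atomic_fetch_sub(&rd->refcount, 1) == 1)
		free(rd);	/* illustrative; not the kernel's RCU-deferred free */
}
```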
2018-02-16  sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()  [Steven Rostedt (VMware)]
commit ad0f1d9d65938aec72a698116cd73a980916895e upstream. When the rto_push_irq_work_func() is called, it looks at the RT overloaded bitmask in the root domain via the runqueue (rq->rd). The problem is that during CPU up and down, nothing here stops rq->rd from changing between taking the rq->rd->rto_lock and releasing it. That means the lock that is released is not the same lock that was taken. Instead of using this_rq()->rd to get the root domain, as the irq work is part of the root domain, we can simply get the root domain from the irq work that is passed to the routine: container_of(work, struct root_domain, rto_push_work) This keeps the root domain consistent. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
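The fix rests on the standard container_of() idiom; a self-contained sketch with simplified stand-in types (only the fields named in the changelog) is shown below:

```c
#include <stddef.h>

/* The container_of() idiom: recover the outer object from a pointer to
 * one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct irq_work { int pending; };

/* Simplified stand-in for the scheduler's root_domain. */
struct root_domain {
	unsigned long rto_mask;
	struct irq_work rto_push_work;
};

/* The fix: derive the root domain from the irq_work itself rather than
 * from this_rq()->rd, which can change during CPU hotplug. */
void rto_push_irq_work_func(struct irq_work *work)
{
	struct root_domain *rd =
		container_of(work, struct root_domain, rto_push_work);

	(void)rd;	/* ...take rd->rto_lock and walk rd->rto_mask here... */
}
```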
2018-02-01  ANDROID: sched/rt: schedtune: Add boost retention to RT  [Joel Fernandes]
Boosted RT tasks can be deboosted quickly; this makes boost useless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait long enough such that the next enqueue happens after a threshold. While this can be solved in the governor, there are the following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequent sleepers/wakers Note: Fixed build breakage due to the schedfreq dependency, which isn't used for RT anymore. Bug: 30210506 Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00 Signed-off-by: Joel Fernandes <joelaf@google.com>
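A userspace-flavoured sketch of the boost-retention idea (all names, the timer handling and the threshold are illustrative, not the kernel implementation):

```c
#include <stdbool.h>

struct rt_boost {
	bool boosted;
	bool deboost_pending;
	unsigned long deboost_deadline;
};

/* On sleep/dequeue: arm a deferred deboost instead of dropping the boost now. */
void rt_task_sleeps(struct rt_boost *b, unsigned long now, unsigned long threshold)
{
	if (b->boosted && !b->deboost_pending) {
		b->deboost_deadline = now + threshold;
		b->deboost_pending = true;	/* a timer is started in the kernel */
	}
}

/* On wakeup/enqueue before the deadline: keep the boost, cancel the deboost. */
void rt_task_wakes(struct rt_boost *b)
{
	b->deboost_pending = false;
}

/* Timer expiry: the task stayed asleep long enough, so deboost now. */
void deboost_timer_fires(struct rt_boost *b)
{
	if (b->deboost_pending) {
		b->boosted = false;
		b->deboost_pending = false;
	}
}
```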
2017-12-20  sched/rt: Do not pull from current CPU if only one CPU to pull  [Steven Rostedt]
commit f73c52a5bcd1710994e53fbccc378c42b97a06b6 upstream. Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. Reported-by: Daniel Wagner <wagi@monom.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
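A simplified version of the check described above, using a plain bitmask where the kernel consults rd->rto_count and rd->rto_mask (the function name and representation are illustrative):

```c
#include <stdbool.h>

bool worth_pulling(unsigned long rto_mask, int this_cpu)
{
	/* No RT-overloaded CPUs at all: nothing to pull. */
	if (rto_mask == 0)
		return false;

	/* The only overloaded CPU is this one: pulling from ourselves is
	 * pointless, and on a single-CPU system it would mean IPI'ing
	 * ourselves, which is exactly the crash described above. */
	if (rto_mask == (1UL << this_cpu))
		return false;

	return true;
}
```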
2017-11-30  sched/rt: Simplify the IPI based RT balancing logic  [Steven Rostedt (Red Hat)]
commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream. When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. 
When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
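A compressed sketch of the single-IPI-loop state machine described above; the field names follow the changelog, everything else (types, locking, the irq_work/IPI plumbing) is omitted or illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct rto_state {
	atomic_int rto_loop_start;	/* 0/1: is an IPI loop already in flight? */
	atomic_int rto_loop_next;	/* bumped by every CPU that lowers its prio */
	int rto_loop;			/* snapshot compared against rto_loop_next */
	int rto_cpu;			/* next CPU to IPI, -1 when no loop running */
};

/* Equivalent of rto_start_trylock(): only the first CPU to lower its
 * priority kicks off the loop; everyone else just bumps rto_loop_next
 * and relies on the in-flight loop noticing the change. */
bool rto_start_trylock(struct rto_state *rd)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&rd->rto_loop_start, &expected, 1);
}

void rto_start_unlock(struct rto_state *rd)
{
	atomic_store(&rd->rto_loop_start, 0);
}
```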
2017-11-08  Revert "ANDROID: sched/rt: schedtune: Add boost retention to RT"  [Todd Kjos]
This reverts commit d194ba5d712f051ff6c025f3484bb72f219764e3. Reason for revert: Broke some builds. Will fix and resubmit. Change-Id: I4e6fa1562346eda1bbf058f1d5ace5ba6256ce07
2017-11-07  cpufreq: Drop schedfreq governor  [Viresh Kumar]
We all should be using (and improving) the schedutil governor now. Get rid of the non-upstream governor. Tested on Hikey. Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57 Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-11-07  ANDROID: sched/rt: schedtune: Add boost retention to RT  [Joel Fernandes]
Boosted RT tasks can be deboosted quickly; this makes boost useless for RT tasks and causes lots of glitching. Use timers to prevent de-boost too soon and wait long enough such that the next enqueue happens after a threshold. While this can be solved in the governor, there are the following advantages: - The approach used is governor-independent - Reduces boost group lock contention for frequent sleepers/wakers - Works with schedfreq without any other schedfreq hacks. Bug: 30210506 Change-Id: I41788b235586988be446505deb7c0529758a9898 Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-11-07  ANDROID: sched/rt: add schedtune accounting  [Joel Fernandes]
This patch adds schedtune enqueue/dequeue to RT scheduling class. Change-Id: If416e64319d62191f3aedd675d3e9a21fe2102fb Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-27  FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide  [Brendan Jackman]
(from https://patchwork.kernel.org/patch/9895261/) This patch adds a parameter to select_task_rq, sibling_count_hint allowing the caller, where it has this information, to inform the sched_class the number of tasks that are being woken up as part of the same event. The wake_q mechanism is one case where this information is available. select_task_rq_fair can then use the information to detect that it needs to widen the search space for task placement in order to avoid overloading the last-level cache domain's CPUs. * * * The reason I am investigating this change is the following use case on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which all repeatedly do X amount of work then pthread_barrier_wait (i.e. sleep until the last task finishes its X and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU finish faster, and then those CPUs pull over the tasks that are still running: v CPU v ->time-> ------------- 0 (big) 11111 /333 ------------- 1 (big) 22222 /444| ------------- 2 (LITTLE) 333333/ ------------- 3 (LITTLE) 444444/ ------------- Now when task 4 hits the barrier (at |) and wakes the others up, there are 4 tasks with prev_cpu=<big> and 0 tasks with prev_cpu=<little>. want_affine therefore means that we'll only look in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled on the bigs until the next load balance, something like this: v CPU v ->time-> ------------------------ 0 (big) 11111 /333 31313\33333 ------------------------ 1 (big) 22222 /444|424\4444444 ------------------------ 2 (LITTLE) 333333/ \222222 ------------------------ 3 (LITTLE) 444444/ \1111 ------------------------ ^^^ underutilization So, I'm trying to get want_affine = 0 for these tasks. I don't _think_ any incarnation of the wakee_flips mechanism can help us here because which task is waker and which tasks are wakees generally changes with each iteration. However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the nice property that we know exactly how many tasks are being woken, so we can cheat. It might be a disadvantage that we "widen" _every_ task that's woken in an event, while select_idle_sibling would work fine for the first sd_llc_size - 1 tasks. IIUC, if wake_affine() behaves correctly this trick wouldn't be necessary on SMP systems, so it might be best guarded by the presence of SD_ASYM_CPUCAPACITY? * * * Final note.. In order to observe "perfect" behaviour for this use case, I also had to disable the TTWU_QUEUE sched feature. Suppose during the wakeup above we are working through the work queue and have placed tasks 3 and 2, and are about to place task 1: v CPU v ->time-> -------------- 0 (big) 11111 /333 3 -------------- 1 (big) 22222 /444|4 -------------- 2 (LITTLE) 333333/ 2 -------------- 3 (LITTLE) 444444/ <- Task 1 should go here -------------- If TTWU_QUEUE is enabled, we will not yet have enqueued task 2 (having instead sent a reschedule IPI) or attached its load to CPU 2. So we are likely to also place task 1 on cpu 2. Disabling TTWU_QUEUE means that we enqueue task 2 before placing task 1, solving this issue. TTWU_QUEUE is there to minimise rq lock contention, and I guess that this contention is less of an issue on big.LITTLE systems since they have relatively few CPUs, which suggests the trade-off makes sense here. 
Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Matt Fleming <matt@codeblueprint.co.uk>
2017-10-16  sched: Update task->on_rq when tasks are moving between runqueues  [Olav Haugan]
Task->on_rq has three states: 0 - Task is not on runqueue (rq) 1 (TASK_ON_RQ_QUEUED) - Task is on rq 2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the process of being migrated to another rq. When a task is moving between rqs, task->on_rq state should be TASK_ON_RQ_MIGRATING in order for WALT to account the rq's cumulative runnable average correctly. Without such state marking for all the classes, WALT's update_history() would try to fix up a task's demand that was never contributed to any CPU during migration. Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [joonwoop: Reinforced changelog to explain why this is needed by WALT. Fixed conflicts in deadline.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
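The three states as a self-contained sketch (the constants match mainline; the helpers and struct are illustrative stand-ins for what the scheduler does around a cross-runqueue move):

```c
/* The three task->on_rq states described above. */
#define TASK_ON_RQ_QUEUED	1
#define TASK_ON_RQ_MIGRATING	2

struct task_sketch { int on_rq; };

/* While a task moves between runqueues it is marked MIGRATING, so that
 * WALT's window accounting knows its demand is not currently charged to
 * either CPU and skips the bogus fixup mentioned above. */
void start_migration(struct task_sketch *p)
{
	p->on_rq = TASK_ON_RQ_MIGRATING;
}

void finish_migration(struct task_sketch *p)
{
	p->on_rq = TASK_ON_RQ_QUEUED;
}
```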
2017-07-14  sched: avoid RT tasks contention during sched boost  [Pavankumar Kondeti]
When placement boost is active, we are currently considering only the highest capacity cluster. If all of the active CPUs in this cluster are busy with RT tasks, the waking task is placed on its previous CPU, which may be running an RT task. This results in suboptimal performance. Fix this by expanding the search to the other clusters when there is no eligible CPU found in the highest capacity cluster. Change-Id: Iaab2e397b994c2b219dc086c7a6fa91ca26a5128 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-09  sched: avoid migrating when softint on tgt cpu should be short  [John Dias]
The scheduling change (bug 31501544) to avoid putting RT threads on cores that are handling softint's was catching cases where there was no reason to believe the softint would take a long time, resulting in unnecessary migration overhead. This patch reduces the migration to cases where the core has a softint that is actually likely to take a long time, as opposed to the RCU, SCHED, and TIMER softints that are rather quick. Bug: 31752786 Change-Id: Ib4e179f1e15c736b2fdba31070494e357e9fbbe2 Git-commit: ce05770bd37b8065b61ef650108ecef2b97b148b Git-repo: https://android.googlesource.com/kernel/msm [pkondeti@codeaurora.org: resolved minor merge conflicts] Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-09  sched: avoid scheduling RT threads on cores currently handling softirqs  [John Dias]
Bug: 31501544 Change-Id: I99dd7aaa12c11270b28dbabea484bcc8fb8ba0c1 Git-commit: 080ea011fd9f47315e1fc53185872ef813b59d00 Git-repo: https://android.googlesource.com/kernel/msm [pkondeti@codeaurora.org: resolved minor merge conflicts and fixed checkpatch warnings] Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-06-02  sched: backport cpufreq hooks from 4.9-rc4  [Steve Muckle]
The scheduler cpufreq hooks are required by the schedutil cpufreq governor. Change-Id: Ied6c46262bb33b7e81bbb3d3d2761124e0c676b7 Signed-off-by: Steve Muckle <smuckle@linaro.org> [trivial cherry-picking fixes] Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-04-06  Merge 4.4.59 into android-4.4  [Greg Kroah-Hartman]
Changes in 4.4.59: xfrm: policy: init locks early xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder virtio_balloon: init 1st buffer in stats vq pinctrl: qcom: Don't clear status bit on irq_unmask c6x/ptrace: Remove useless PTRACE_SETREGSET implementation h8300/ptrace: Fix incorrect register transfer count mips/ptrace: Preserve previous registers for short regset write sparc/ptrace: Preserve previous registers for short regset write metag/ptrace: Preserve previous registers for short regset write metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS metag/ptrace: Reject partial NT_METAG_RPIPE writes fscrypt: remove broken support for detecting keyring key revocation sched/rt: Add a missing rescheduling point Linux 4.4.59 Change-Id: Ifa35307b133cbf29d0a0084bb78a7b0436182b53 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-03-31  sched/rt: Add a missing rescheduling point  [Sebastian Andrzej Siewior]
commit 619bd4a71874a8fd78eb6ccf9f272c5e98bcc7b7 upstream. Since the change in commit: fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks") ... we don't reschedule a task under certain circumstances: Lets say task-A, SCHED_OTHER, is running on CPU0 (and it may run only on CPU0) and holds a PI lock. This task is removed from the CPU because it used up its time slice and another SCHED_OTHER task is running. Task-B on CPU1 runs at RT priority and asks for the lock owned by task-A. This results in a priority boost for task-A. Task-B goes to sleep until the lock has been made available. Task-A is already runnable (but not active), so it receives no wake up. The reality now is that task-A gets on the CPU once the scheduler decides to remove the current task despite the fact that a high priority task is enqueued and waiting. This may take a long time. The desired behaviour is that CPU0 immediately reschedules after the priority boost which made task-A the task with the lowest priority. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: fd7a4bed1835 ("sched, rt: Convert switched_{from, to}_rt() prio_changed_rt() to balance callbacks") Link: http://lkml.kernel.org/r/20170124144006.29821-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-22  sched/rt: Fix PI handling vs. sched_setscheduler()  [Peter Zijlstra]
Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292d059 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Change-Id: I1450923252f55dba19f450008db813113eb06c76 Reported-by: Andrea Parri <parri.andrea@gmail.com> Tested-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> [pkondeti@codeaurora.org: Fix trivial merge conflict] Git-commit: ff77e468535987b3d21b7bd4da15608ea3ce7d0b Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2017-02-02  sched: Remove sched_enable_hmp flag  [Olav Haugan]
Clean up the code and make it more maintainable by removing dependency on the sched_enable_hmp flag. We do not support HMP scheduler without recompiling. Enabling the HMP scheduler is done through enabling the CONFIG_SCHED_HMP config. Change-Id: I246c1b1889f8dcbc8f0a0805077c0ce5d4f083b0 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-12-16  sched: Avoid waking idle cpu for short-burst tasks  [Srivatsa Vaddagiri]
Introduce sched_short_burst tunable to classify "short-burst" tasks. These tasks are eligible for packing to avoid overhead associated with waking up an idle CPU. select_best_cpu() ignores power-cost and selects the CPU with least wakeup latency which is not loaded with IRQs and can accommodate this task without exceeding spill limits. The ties are broken with load followed by previous CPU. This policy does not affect cluster selection but only CPU selection in the selected cluster. The tasks eligible for "wakeup-up-idle" and "boost" are not considered for packing. This policy is applied for both "fair" and "rt" scheduling class tasks. Change-Id: I2a05493fde93f58636725f18d0ce8dbce4418a30 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-12-09  sched: Ensure proper task migration when a CPU is isolated  [Syed Rameez Mustafa]
migrate_tasks() migrates all tasks of a CPU by using pick_next_task(). This works in the hotplug case as we force migrate every single task allowing pick_next_task() to return a new task on every loop iteration. In the case of isolation, however, task migration is not guaranteed which causes pick_next_task() to keep returning the same task over and over again until we terminate the loop without having migrated all the tasks that were supposed to migrated. Fix the above problem by temporarily dequeuing tasks that are pinned and marking them with TASK_ON_RQ_MIGRATING. This not only allows pick_next_task() to properly walk the runqueue but also prevents any migrations or changes in affinity for the dequeued tasks. Once we are done with migrating all possible tasks, we re-enqueue all the dequeued tasks. While at it, ensure consistent ordering between task de-activation and setting the TASK_ON_RQ_MIGRATING flag across all scheduling classes. Change-Id: Id06151a8e34edab49ac76b4bffd50c132f0b792f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-16  sched/hmp: Enhance co-location and scheduler boost features  [Syed Rameez Mustafa]
The recent introduction of the schedtune cgroup controller has provided the scheduler with added flexibility in terms of some of it's placement features. In particular each cgroup under the schedtune controller can now specify: 1) Whether it needs co-location along with other cgroups 2) Whether it is eligible for scheduler boost (sched_boost_enabled) 3) Whether the kernel can override the boost eligibility when necessary (sched_boost_no_override) The scheduler now creates a reserved co-location group at boot. This group is used to co-locate all tasks that form part of any one of the cgroups that have co-location enabled. This reserved group can neither be destroyed nor reused for other purposes. Furthermore, cgroups are only allowed to indicate their co-location preference once at boot. Further updates are disallowed. Since we are now creating co-location groups for an extended period of time, there are a few other factors to consider when determining the preferred cluster for the group. We first exclude any tasks in the group that have not been observed to be running for a significant amount of time. Secondly we introduce the notion of group up and down migrate tunables to allow different migration policies than individual tasks. Lastly we break co-location if a single task in a group exceeds up-migrate but the total load of the group does not exceed group up-migrate. In terms of sched_boost, the scheduler now supports multiple types of boost. These are: 1) FULL_THROTTLE : Force up-migrate tasks belonging any cgroup that has the sched_boost_enabled flag turned on. Little CPUs will only be used when big CPUs can no longer accommodate tasks. Also up-migrate all RT tasks. 2) CONSERVATIVE : Override the sched_boost_enabled flag for all cgroups except those that have the sched_boost_no_override flag set. Force up-migrate all tasks belonging to only those cgroups that still remain eligible for boost. RT tasks do not get force up migrated. 3) RESTRAINED : Start frequency aggregation for co-located tasks. This type of boost does not force up-migrate any task. Finally the boost API removes ref-counting. This means that there can only be a single entity using boost at any given time. If multiple entities are managing boost, they are required to be well behaved so that they don't interfere with one another. Even for a single client, it is not possible to switch directly from one boost type to another. Boost must be first turned off before switching over to a new type. Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-24  sched: add cpu isolation support  [Olav Haugan]
This adds cpu isolation APIs to the scheduler to isolate and unisolate CPUs. Isolating and unisolating a CPU can be used in place of hotplug. Isolating and unisolating a CPU is faster than hotplug and can thus be used to optimize the performance and power of multi-core CPUs. Isolating works by migrating non-pinned IRQs and tasks to other CPUs and marking the CPU as not available to the scheduler and load balancer. Pinned tasks and IRQs are still allowed to run, but it is expected that this would be minimal. Unisolation works by just marking the CPU available to the scheduler and load balancer. Change-Id: I0bbddb56238c2958c5987877c5bfc3e79afa67cc Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-09-14  sched/rt: Add Kconfig option to enable panicking for RT throttling  [Matt Wagantall]
This may be useful for detecting and debugging RT throttling issues. Change-Id: I5807a897d11997d76421c1fcaa2918aad988c6c9 Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [jstultz: forwardported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14  sched/rt: print RT tasks when RT throttling is activated  [Matt Wagantall]
Existing debug prints do not provide any clues about which tasks may have triggered RT throttling. Print the names and PIDs of all tasks on the throttled rt_rq to help narrow down the source of the problem. Change-Id: I180534c8a647254ed38e89d0c981a8f8bccd741c Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-14  sched: Introduce Window Assisted Load Tracking (WALT)  [Srivatsa Vaddagiri]
Use a window-based view of time in order to track task demand and CPU utilization in the scheduler. Window Assisted Load Tracking (WALT) implementation credits: Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park, Pavan Kumar Kondeti, Olav Haugan. 2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla and Todd Kjos. Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708 Includes fixes for issues: eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a race resulting in watchdog resets. BUG: 29353986 Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb Handle WALT accounting anomaly during resume: during resume, there is a corner case where on wakeup a task's prev_runnable_sum can go negative. This is a workaround that fixes the condition and warns (instead of crashing). BUG: 29464099 Change-Id: I173e7874324b31a3584435530281708145773508 Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Srinath Sridharan <srinathsr@google.com> Signed-off-by: Juri Lelli <juri.lelli@arm.com> [jstultz: fwdported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14  sched: rt scheduler sets capacity requirement  [Vincent Guittot]
RT tasks don't provide any running constraints, unlike deadline ones, except their running priority. The only currently usable input to estimate the capacity needed by RT tasks is the rt_avg metric. We use it to estimate the CPU capacity needed for the RT scheduler class. In order to monitor the evolution of RT task load, we must periodically check it during the tick. Then, we use the estimated capacity of the last activity to estimate the next one, which cannot be that accurate but is a good starting point without any impact on the wake-up path of RT tasks. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Steve Muckle <smuckle@linaro.org>
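A rough sketch of the estimate described above: the share of the averaging period spent running RT work is taken as the capacity the RT class needs. The function name, parameters and scaling are illustrative, not the kernel's:

```c
unsigned long rt_capacity_request(unsigned long rt_avg,
				  unsigned long period,
				  unsigned long max_capacity)
{
	if (period == 0)
		return 0;

	/* Fraction of the period consumed by RT, scaled to capacity units. */
	return max_capacity * rt_avg / period;
}
```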
2016-08-11  sched/rt: Add Kconfig option to enable panicking for RT throttling  [Matt Wagantall]
This may be useful for detecting and debugging RT throttling issues. Change-Id: I5807a897d11997d76421c1fcaa2918aad988c6c9 Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [jstultz: forwardported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-11  sched/rt: print RT tasks when RT throttling is activated  [Matt Wagantall]
Existing debug prints do not provide any clues about which tasks may have triggered RT throttling. Print the names and PIDs of all tasks on the throttled rt_rq to help narrow down the source of the problem. Change-Id: I180534c8a647254ed38e89d0c981a8f8bccd741c Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-11  sched: Introduce Window Assisted Load Tracking (WALT)  [Srivatsa Vaddagiri]
Use a window-based view of time in order to track task demand and CPU utilization in the scheduler. Window Assisted Load Tracking (WALT) implementation credits: Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park, Pavan Kumar Kondeti, Olav Haugan. 2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla and Todd Kjos. Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708 Includes fixes for issues: eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a race resulting in watchdog resets. BUG: 29353986 Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb Handle WALT accounting anomaly during resume: during resume, there is a corner case where on wakeup a task's prev_runnable_sum can go negative. This is a workaround that fixes the condition and warns (instead of crashing). BUG: 29464099 Change-Id: I173e7874324b31a3584435530281708145773508 Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Srinath Sridharan <srinathsr@google.com> Signed-off-by: Juri Lelli <juri.lelli@arm.com> [jstultz: fwdported to 4.4] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-05-10  sched: rt scheduler sets capacity requirement  [Vincent Guittot]
RT tasks don't provide any running constraints, unlike deadline ones, except their running priority. The only currently usable input to estimate the capacity needed by RT tasks is the rt_avg metric. We use it to estimate the CPU capacity needed for the RT scheduler class. In order to monitor the evolution of RT task load, we must periodically check it during the tick. Then, we use the estimated capacity of the last activity to estimate the next one, which cannot be that accurate but is a good starting point without any impact on the wake-up path of RT tasks. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Steve Muckle <smuckle@linaro.org>
2016-03-23  sched: let sched_boost take precedence over sched_restrict_cluster_spill  [Pavankumar Kondeti]
When sched_restrict_cluster_spill knob is enabled, RT tasks are restricted to lower power cluster. This knob also restricts inter cluster no-hz kicks. Ignore this knob setting when sched_boost is enabled so that tasks are placed on CPUs with highest spare capacity. CRs-Fixed: 968852 Change-Id: I01b3fc10b39dc834a733d64c2ee29c308d7ff730 Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23  sched: Add separate load tracking histogram to predict loads  [Pavankumar Kondeti]
Current window based load tracking only saves history for five windows. A historically heavy task's heavy load will be completely forgotten after five windows of light load. Even before the five window expires, a heavy task wakes up on same CPU it used to run won't trigger any frequency change until end of the window. It would starve for the entire window. It also adds one "small" load window to history because it's accumulating load at a low frequency, further reducing the tracked load for this heavy task. Ideally, scheduler should be able to identify such tasks and notify governor to increase frequency immediately after it wakes up. Add a histogram for each task to track a much longer load history. A prediction will be made based on runtime of previous or current window, histogram data and load tracked in recent windows. Prediction of all tasks that is currently running or runnable on a CPU is aggregated and reported to CPUFreq governor in sched_get_cpus_busy(). sched_get_cpus_busy() now returns predicted busy time in addition to previous window busy time and new task busy time, scaled to the CPU maximum possible frequency. Tunables: - /proc/sys/kernel/sched_gov_alert_freq (KHz) This tunable can be used to further filter the notifications. Frequency alert notification is sent only when the predicted load exceeds previous window load by sched_gov_alert_freq converted to load. Change-Id: If29098cd2c5499163ceaff18668639db76ee8504 Suggested-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Junjie Wu <junjiew@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task() and removed changes for CONFIG_SCHED_QHMP.]
2016-03-23  sched: Provide a facility to restrict RT tasks to lower power cluster  [Pavankumar Kondeti]
The current CPU selection algorithm for RT tasks looks for the least loaded CPU in all clusters. Stop the search at the lowest possible power cluster based on "sched_restrict_cluster_spill" sysctl tunable. Change-Id: I34fdaefea56e0d1b7e7178d800f1bb86aa0ec01c Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23  sched: Update fair and rt placement logic to use scheduler clusters  [Srivatsa Vaddagiri]
Make use of clusters in the fair and rt scheduling classes. This is needed as the freq domain mask can no longer be used to do correct task placement. The freq domain mask was being used to demarcate clusters. Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes for CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Optimize scheduler trace events to reduce trace buffer usage  [Syed Rameez Mustafa]
Scheduler ftrace events currently generate a lot of data when turned on. The excessive log messages often end up overflowing trace buffers for long use cases or crowding out other events. Optimize scheduler events so that the log spew is less and more manageable. To that end change the variable type for some event fields; introduce variants of sched_cpu_load that can be turned on/off for separate code paths and remove unused fields from various events. Change-Id: I2b313542b39ad5e09a01ad1303b5dfe2c4883b8a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in rt.c due to CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: add preference for prev and sibling CPU in RT task placement  [Joonwoo Park]
Add a bias towards the RT task's previous CPU and sibling CPUs in order to avoid cache bouncing and migrations. CRs-fixed: 927903 Change-Id: I45d79d774e65efcb38282130b6692b4c3b03c2f0 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: precompute required frequency for CPU load  [Joonwoo Park]
At present, in order to estimate the power cost of CPU load, the HMP scheduler converts CPU load to the corresponding frequency on the fly, which can be avoided. Optimize and reduce the execution time of select_best_cpu() by precomputing the CPU load to frequency conversion. This optimization reduces the execution time of select_best_cpu() by about 20% on average. Change-Id: I385c57f2ea9a50883b76ba6ca3deb673b827217f [joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h. stripped out code for CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
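A sketch of the optimization: build the load-to-frequency table once and index it on the hot path instead of converting on every select_best_cpu() call. The table size and the conversion itself are illustrative, not the HMP scheduler's:

```c
#define NR_LOAD_BUCKETS 128

static unsigned int load_to_freq[NR_LOAD_BUCKETS];

/* One-time setup: precompute the bucket -> frequency mapping. */
void precompute_load_to_freq(unsigned int max_freq_khz)
{
	for (int i = 0; i < NR_LOAD_BUCKETS; i++)
		load_to_freq[i] = (unsigned int)
			((unsigned long long)max_freq_khz * i / (NR_LOAD_BUCKETS - 1));
}

/* Hot path: O(1) lookup, no per-call conversion arithmetic. */
unsigned int required_freq(unsigned int load_bucket)
{
	return load_to_freq[load_bucket < NR_LOAD_BUCKETS ? load_bucket
							  : NR_LOAD_BUCKETS - 1];
}
```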
2016-03-23  sched: clean up fixup_hmp_sched_stats()  [Joonwoo Park]
The commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg HMP statistics") introduced the callback function fixup_hmp_sched_stats() so that update_history() can avoid a decrement/increment pair of the HMP stats. However, the commit also made the fixup function do an obscure p->ravg.demand update, which isn't the cleanest way. Revise the function fixup_hmp_sched_stats() so the caller can update p->ravg.demand directly. Change-Id: Id54667d306495d2109c26362813f80f08a1385ad [joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Update task->on_rq when tasks are moving between runqueues  [Olav Haugan]
Task->on_rq has three states: 0 - Task is not on runqueue (rq) 1 (TASK_ON_RQ_QUEUED) - Task is on rq 2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the process of being migrated to another rq. When a task is moving between rqs, task->on_rq state should be TASK_ON_RQ_MIGRATING. CRs-fixed: 884720 Change-Id: I1572aba00a0273d4ad5bc9a3dd60fb68e2f0b895 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23  sched: Update the wakeup placement logic for fair and rt tasks  [Syed Rameez Mustafa]
For the fair sched class, update the select_best_cpu() policy to do power based placement. The hope is to minimize the voltage at which the CPU runs. While RT tasks already do power based placement, their placement preference has to now take into account the power cost of all tasks on a given CPU. Also remove the check for sched_boost since sched_boost no longer intends to elevate all tasks to the highest capacity cluster. Change-Id: Ic6a7625c97d567254d93b94cec3174a91727cb87 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: avoid stale cumulative_runnable_avg HMP statistics  [Joonwoo Park]
When a new window starts for a task and the task is on a rq, the scheduler momentarily decreases the rq's cumulative_runnable_avg, re-accounts the task's demand and then increases the rq's cumulative_runnable_avg with the newly accounted demand. Therefore there is a short period during which the rq's cumulative_runnable_avg is less than what it's supposed to be. Meanwhile, there is a chance that another CPU is searching for the best CPU to place a task on and makes a suboptimal decision with the momentarily stale cumulative_runnable_avg. Fix this issue by adding or subtracting the delta between the task's old and new demand instead of decrementing and incrementing the entire task's load. Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
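A sketch of the fix: apply the demand change as a single delta instead of subtracting the old demand and later adding the new one, which leaves a window where the runqueue sum is understated. The types below are illustrative stand-ins for the HMP/WALT structures:

```c
struct rq_stats   { long long cumulative_runnable_avg; };
struct task_stats { long long demand; };

void fixup_cumulative_runnable_avg(struct rq_stats *rq,
				   struct task_stats *p,
				   long long new_demand)
{
	/* One atomic-looking adjustment from the rq's point of view:
	 * the sum never dips below its correct value. */
	rq->cumulative_runnable_avg += new_demand - p->demand;
	p->demand = new_demand;
}
```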