android_kernel_zuk_msm8996.git

	Commit message (Collapse)	Author	Age
...
* \| \|	ARC: add/fix some comments in code - no functional change	Vineet Gupta	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: change some branchs to jumps to resolve linkage errors	Yuriy Kolerov	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When kernel's binary becomes large enough (32M and more) errors may occur during the final linkage stage. It happens because the build system uses short relocations for ARC by default. This problem may be easily resolved by passing -mlong-calls option to GCC to use long absolute jumps (j) instead of short relative branchs (b). But there are fragments of pure assembler code exist which use branchs in inappropriate places and cause a linkage error because of relocations overflow. First of these fragments is .fixup insertion in futex.h and unaligned.c. It inserts a code in the separate section (.fixup) with branch instruction. It leads to the linkage error when kernel becomes large. Second of these fragments is calling scheduler's functions (common kernel code) from entry.S of ARC's code. When kernel's binary becomes large it may lead to the linkage error because scheduler may occur far enough from ARC's code in the final binary. Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: ensure futex ops are atomic in !LLSC config	Vineet Gupta	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	W/o hardware assisted atomic r-m-w the best we can do is to disable preemption. Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: make futex_atomic_cmpxchg_inatomic() return bimodal	Vineet Gupta	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Callers of cmpxchg_futex_value_locked() in futex code expect bimodal return value: !0 (essentially -EFAULT as failure) 0 (success) Before this patch, the success return value was old value of futex, which could very well be non zero, causing caller to possibly take the failure path erroneously. Fix that by returning 0 for success (This fix was done back in 2011 for all upstream arches, which ARC obviously missed) Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: futex cosmetics	Vineet Gupta	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: add barriers to futex code	Vineet Gupta	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The atomic ops on futex need to provide the full barrier just like regular atomics in kernel. Also remove pagefault_enable/disable in futex_atomic_cmpxchg_inatomic() as core code already does that Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARCv2: Support IO Coherency and permutations involving L1 and L2 caches	Alexey Brodkin	2015-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of ARCv2 CPU there're could be following configurations that affect cache handling for data exchanged with peripherals via DMA: [1] Only L1 cache exists [2] Both L1 and L2 exist, but no IO coherency unit [3] L1, L2 caches and IO coherency unit exist Current implementation takes care of [1] and [2]. Moreover support of [2] is implemented with run-time check for SLC existence which is not super optimal. This patch introduces support of [3] and rework of DMA ops usage. Instead of doing run-time check every time a particular DMA op is executed we'll have 3 different implementations of DMA ops and select appropriate one during init. As for IOC support for it we need: [a] Implement empty DMA ops because IOC takes care of cache coherency with DMAed data [b] Route dma_alloc_coherent() via dma_alloc_noncoherent() This is required to make IOC work in first place and also serves as optimization as LD/ST to coherent buffers can be srviced from caches w/o going all the way to memory Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> [vgupta: -Added some comments about IOC gains -Marked dma ops as static, -Massaged changelog a bit] Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff	Vineet Gupta	2015-08-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The increment of delay counter was 2 instructions: Arithmatic Shfit Left (ASL) + set to 1 on overflow This can be done in 1 using ROtate Left (ROL) Suggested-by: Nigel Topham <ntopham@synopsys.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: Make pt_regs regs unsigned	Vineet Gupta	2015-08-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KGDB fails to build after f51e2f191112 ("ARC: make sure instruction_pointer() returns unsigned value") The hack to force one specific reg to unsigned backfired. There's no reason to keep the regs signed after all. \| CC arch/arc/kernel/kgdb.o \|../arch/arc/kernel/kgdb.c: In function 'kgdb_trap': \| ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment \| instruction_pointer(regs) -= BREAK_INSTR_SIZE; Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Fixes: f51e2f191112 ("ARC: make sure instruction_pointer() returns unsigned value") Cc: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous commit for delayed retry of SCOND needs some fine tuning for spin locks. The backoff from delayed retry in conjunction with spin looping of lock itself can potentially cause the delay counter to reach high values. So to provide fairness to any lock operation, after a lock "seems" available (i.e. just before first SCOND try0, reset the delay counter back to starting value of 1 Essentially reset delay to 1 for a new spin-wait-loop-acquire cycle. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with ↵	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	exponential backoff This is to workaround the llock/scond livelock HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping coherency transactions in the SCU. The exclusive line state keeps rotating among contenting cores leading to a never ending cycle. So break the cycle by deferring the retry of failed exclusive access (SCOND). The actual delay needed is function of number of contending cores as well as the unrelated coherency traffic from other cores. To keep the code simple, start off with small delay of 1 which would suffice most cases and in case of contention double the delay. Eventually the delay is sufficient such that the coherency pipeline is drained, thus a subsequent exclusive access would succeed. Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: LLOCK/SCOND based rwlock	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need for a guarding spin lock. This in turn elides the EXchange instruction based spinning which causes the cacheline transition to exclusive state and concurrent spinning across cores would cause the line to keep bouncing around. LLOCK/SCOND based implementation is superior as spinning on LLOCK keeps the cacheline in shared state. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: LLOCK/SCOND based spin_lock	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current spin_lock uses EXchange instruction to implement the atomic test and set of lock location (reads orig value and ST 1). This however forces the cacheline into exclusive state (because of the ST) and concurrent loops in multiple cores will bounce the line around between cores. Instead, use LLOCK/SCOND to implement the atomic test and set which is better as line is in shared state while lock is spinning on LLOCK The real motivation of this change however is to make way for future changes in atomics to implement delayed retry (with backoff). Initial experiment with delayed retry in atomics combined with orig EX based spinlock was a total disaster (broke even LMBench) as struct sock has a cache line sharing an atomic_t and spinlock. The tight spinning on lock, caused the atomic retry to keep backing off such that it would never finish. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARC: refactor atomic inline asm operands with symbolic names	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reduces the diff in forth-coming patches and also helps understand better the incremental changes to inline asm. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions ↵	Vineet Gupta	2015-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	livelock" Extended testing of quad core configuration revealed that this fix was insufficient. Specifically LTP open posix shm_op/23-1 would cause the hardware livelock in llock/scond loop in update_cpu_load_active() So remove this and make way for a proper workaround This reverts commit a5c8b52abe677977883655166796f167ef1e0084. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \| \|	ARCv2: Fix the peripheral address space detection	Vineet Gupta	2015-08-03
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With HS 2.1 release, the peripheral space register no longer contains the uncached space specifics, causing the kernel to panic early on. So read the newer NON VOLATILE AUX register to get that info. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \|	mm: clean up per architecture MM hook header files	Laurent Dufour	2015-07-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2ae416b142b6 ("mm: new mm hook framework") introduced an empty header file (mm-arch-hooks.h) for every architecture, even those which doesn't need to define mm hooks. As suggested by Geert Uytterhoeven, this could be cleaned through the use of a generic header file included via each per architecture asm/include/Kbuild file. The PowerPC architecture is not impacted here since this architecture has to defined the arch_remap MM hook. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vineet Gupta <vgupta@synopsys.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	ARC: make sure instruction_pointer() returns unsigned value	Alexey Brodkin	2015-07-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently instruction_pointer() returns pt_regs->ret and so return value is of type "long", which implicitly stands for "signed long". While that's perfectly fine when dealing with 32-bit values if return value of instruction_pointer() gets assigned to 64-bit variable sign extension may happen. And at least in one real use-case it happens already. In perf_prepare_sample() return value of perf_instruction_pointer() (which is an alias to instruction_pointer() in case of ARC) is assigned to (struct perf_sample_data)->ip (which type is "u64"). And what we see if instuction pointer points to user-space application that in case of ARC lays below 0x8000_0000 "ip" gets set properly with leading 32 zeros. But if instruction pointer points to kernel address space that starts from 0x8000_0000 then "ip" is set with 32 leadig "f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be assigned with 0xffff_ffff__8100_0000. Which is obviously wrong. In particular that issuse broke output of perf, because perf was unable to associate addresses like 0xffff_ffff__8100_0000 with anything from /proc/kallsyms. That's what we used to see: ----------->8---------- 6.27% ls [unknown] [k] 0xffffffff8046c5cc 2.96% ls libuClibc-0.9.34-git.so [.] memcpy 2.25% ls libuClibc-0.9.34-git.so [.] memset 1.66% ls [unknown] [k] 0xffffffff80666536 1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6 1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472 ----------->8---------- With that change perf output looks much better now: ----------->8---------- 8.21% ls [kernel.kallsyms] [k] memset 3.52% ls libuClibc-0.9.34-git.so [.] memcpy 2.11% ls libuClibc-0.9.34-git.so [.] malloc 1.88% ls libuClibc-0.9.34-git.so [.] memset 1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu ----------->8---------- Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: arc-linux-dev@synopsys.com Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \|	ARC: Add llock/scond to futex backend	Vineet Gupta	2015-07-09
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* \|	ARC: Make ARC bitops "safer" (add anti-optimization)	Vineet Gupta	2015-07-09
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ARCompact/ARCv2 ISA provide that any instructions which deals with bitpos/count operand ASL, LSL, BSET, BCLR, BMSK .... will only consider lower 5 bits. i.e. auto-clamp the pos to 0-31. ARC Linux bitops exploited this fact by NOT explicitly masking out upper bits for @nr operand in general, saving a bunch of AND/BMSK instructions in generated code around bitops. While this micro-optimization has worked well over years it is NOT safe as shifting a number with a value, greater than native size is "undefined" per "C" spec. So as it turns outm EZChip ran into this eventually, in their massive muti-core SMP build with 64 cpus. There was a test_bit() inside a loop from 63 to 0 and gcc was weirdly optimizing away the first iteration (so it was really adhering to standard by implementing undefined behaviour vs. removing all the iterations which were phony i.e. (1 << [63..32]) \| for i = 63 to 0 \| X = ( 1 << i ) \| if X == 0 \| continue So fix the code to do the explicit masking at the expense of generating additional instructions. Fortunately, this can be mitigated to a large extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold masking into shift operation itself. It is currently not enabled in ARC gcc backend, but could be done after a bit of testing. Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel") Reported-by: Noam Camus <noamc@ezchip.com> Cc: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
*	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	2015-07-01
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge third patchbomb from Andrew Morton: - the rest of MM - scripts/gdb updates - ipc/ updates - lib/ updates - MAINTAINERS updates - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (67 commits) genalloc: rename of_get_named_gen_pool() to of_gen_pool_get() genalloc: rename dev_get_gen_pool() to gen_pool_get() x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit MAINTAINERS: add zpool MAINTAINERS: BCACHE: Kent Overstreet has changed email address MAINTAINERS: move Jens Osterkamp to CREDITS MAINTAINERS: remove unused nbd.h pattern MAINTAINERS: update brcm gpio filename pattern MAINTAINERS: update brcm dts pattern MAINTAINERS: update sound soc intel patterns MAINTAINERS: remove website for paride MAINTAINERS: update Emulex ocrdma email addresses bcache: use kvfree() in various places libcxgbi: use kvfree() in cxgbi_free_big_mem() target: use kvfree() in session alloc and free IB/ehca: use kvfree() in ipz_queue_{cd}tor() drm/nouveau/gem: use kvfree() in u_free() drm: use kvfree() in drm_free_large() cxgb4: use kvfree() in t4_free_mem() cxgb3: use kvfree() in cxgb_free_mem() ...
\| *	arc: use for_each_sg()	Akinobu Mita	2015-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge tag 'arc-4.2-rc1' of ↵	Linus Torvalds	2015-07-01
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC architecture updates from Vineet Gupta: - support for HS38 cores based on ARCv2 ISA ARCv2 is the next generation ISA from Synopsys and basis for the HS3{4,6,8} families of processors which retain the traditional ARC mantra of low power and configurability and are now more performant and feature rich. HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and SMP (upto 4 cores) among other features. + www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor + http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications + http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps - support for ARC SDP (Software Development platform): Main Board + CPU Cards = AXS101: CPU Card with ARC700 in silicon @ 700 MHz = AXS103: CPU Card with HS38x in FPGA - refactoring of ARCompact port to accomodate new ARCv2 ISA - misc updates/cleanups * tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (72 commits) ARC: Fix build failures for ARCompact in linux-next after ARCv2 support ARCv2: Allow older gcc to cope with new regime of ARCv2/ARCompact support ARCv2: [vdk] dts files and defconfig for HS38 VDK ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores ARC: [axs101] Prepare for AXS103 ARCv2: [nsimhs] Support simulation platforms for HS38x cores ARCv2: All bits in place, allow ARCv2 builds ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency) ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock ARC: Reduce bitops lines of code using macros ARCv2: barriers arch: conditionally define smp_{mb,rmb,wmb} ARC: add smp barriers around atomics per Documentation/atomic_ops.txt ARC: add compiler barrier to LLSC based cmpxchg ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution ARCv2: SMP: clocksource: Enable Global Real Time counter ARCv2: SMP: ARConnect debug/robustness ARCv2: SMP: Support ARConnect (MCIP) for Inter-Core-Interrupts et al ARC: make plat_smp_ops weak to allow over-rides ARCv2: clocksource: Introduce 64bit local RTC counter ...
\| *	ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	L2 cache on ARCHS processors is called SLC (System Level Cache) For working DMA (in absence of hardware assisted IO Coherency) we need to manage SLC explicitly when buffers transition between cpu and controllers. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A quad core SMP build could get into hardware livelock with concurrent LLOCK/SCOND. Workaround that by adding a PREFETCHW which is serialized by SCU (System Coherency Unit). It brings the cache line in Exclusive state and makes others invalidate their lines. This gives enough time for winner to complete the LLOCK/SCOND, before others can get the line back. The prefetchw in the ll/sc loop is not nice but this is the only software workaround for current version of RTL. Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: Reduce bitops lines of code using macros	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	No semantical changes ! Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: barriers	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ARCv2 based HS38 cores are weakly ordered and thus explicit barriers for kernel proper. SMP barrier is provided by DMB instruction which also guarantees local barrier hence used as backend of smp_mb() as well as mb() APIs Also hookup barriers into MMIO accessors to avoid ordering issues in IO Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: add smp barriers around atomics per Documentation/atomic_ops.txt	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers Since ARCv2 only provides load/load, store/store and all/all, we need the full barrier - LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified values were lacking the explicit smp barriers. - Non LLOCK/SCOND varaints don't need the explicit barriers since that is implicity provided by the spin locks used to implement the critical section (the spin lock barriers in turn are also fixed in this commit as explained above Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: add compiler barrier to LLSC based cmpxchg	Vineet Gupta	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When auditing cmpxchg call sites, Chuck noted that gcc was optimizing away some of the desired LDs. \| do { \| new = old = *ipi_data_ptr; \| new \|= 1U << msg; \| } while (cmpxchg(ipi_data_ptr, old, new) != old); was generating to below \| 8015cef8: ld r2,[r4,0] <-- First LD \| 8015cefc: bset r1,r2,r1 \| \| 8015cf00: llock r3,[r4] <-- atomic op \| 8015cf04: brne r3,r2,8015cf10 \| 8015cf08: scond r1,[r4] \| 8015cf0c: bnz 8015cf00 \| \| 8015cf10: brne r3,r2,8015cf00 <-- Branch doesn't go to orig LD Although this was fixed by adding a ACCESS_ONCE in this call site, it seems safer (for now at least) to add compiler barrier to LLSC based cmpxchg Reported-by: Chuck Jordan <cjordan@synopsys,com> Cc: <stable@vger.kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: SMP: clocksource: Enable Global Real Time counter	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \|	Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: SMP: Support ARConnect (MCIP) for Inter-Core-Interrupts et al	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \|	Cc: Jason Cooper <jason@lakedaemon.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: Adhere to Zero Delay loop restriction	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \|	Branch insn can't be scheduled as last insn of Zero Overhead loop Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: MMUv4: support aliasing icache config	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \|	This is also default for AXS103 release Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: MMUv4: cache programming model changes	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Caveats about cache flush on ARCv2 based cores - dcache is PIPT so paddr is sufficient for cache maintenance ops (no need to setup PTAG reg - icache is still VIPT but only aliasing configs need PTAG setup So basically this is departure from MMU-v3 which always need vaddr in line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) but paddr in DC_PTAG, IC_PTAG respectively. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: MMUv4: TLB programming Model changes	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: STAR 9000814690: Really Re-enable interrupts to avoid deadlocks	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The issue was, on HS when interrupt is taken, IRQ_ACT is set and that is NOT cleared unless we do RTIE (or manually clear it). Linux interrupt handling has top and bottom halves. Latter lead to softirqs (which can reschedule) AND expect interrupts to be REALLY re-enabled which was NOT happening for us since we only SETI, dont clear IRQ_ACT So we can have a state when both cores have taken interrupt (IRQ_ACT set), get rescheduled, both send IPI and wait in CSD lock which will never be cleared as cores can't take the pending IPI IRQ due to existing IRQ_ACT set. So local_irq_enable() now drops the IRQ_ACT.act bit to re-enable IRQs. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: STAR 9000808988: signals involving Delay Slot	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reported by Anton as LTP:munmap01 failing with Illegal Instruction Exception. --------------------->8-------------------------------------- mmap2(NULL, 24576, PROT_READ\|PROT_WRITE, MAP_SHARED, 3, 0) = 0x200d2000 munmap(0x200d2000, 24576) = 0 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x200d2000} --- potentially unexpected fatal signal 4. Path: /munmap01 CPU: 0 PID: 61 Comm: munmap01 Not tainted 3.13.0-g5d5c46d9a556 #8 task: 9f1a8000 ti: 9f154000 task.ti: 9f154000 [ECR ]: 0x00020100 => Illegal Insn [EFA ]: 0x0001354c [BLINK ]: 0x200515d4 [ERET ]: 0x1354c @off 0x1354c in [/munmap01] VMA: 0x00010000 to 0x00018000 [STAT32]: 0x800802c0 ... --------------------->8-------------------------------------- The issue was 1. munmap01 accessed unmapped memory (on purpose) with signal handler installed for SIGSEGV 2. The faulting instruction happened to be in Delay Slot 00011864 <main>: 11908: bl.d 13284 <tst_resm> 1190c: stb r16,[r2] 3. kernel sets up the reg file for signal handler and correctly clears the DE bit in pt_regs->status32 placeholder 4. However RESTORE_CALLEE_SAVED_USER macro is not adjusted for ARCv2, and it over-writes the above with orig/stale value of status32 5. After RTIE, userspace signal handler executes a non branch instruction with DE bit set, triggering Illegal Instruction Exception. Reported-by: Anton Kolesov <akolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: Support for ARCv2 ISA and HS38x cores	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The notable features are: - SMP configurations of upto 4 cores with coherency - Optional L2 Cache and IO-Coherency - Revised Interrupt Architecture (multiple priorites, reg banks, auto stack switch, auto regfile save/restore) - MMUv4 (PIPT dcache, Huge Pages) - Instructions for * 64bit load/store: LDD, STD * Hardware assisted divide/remainder: DIV, REM * Function prologue/epilogue: ENTER_S, LEAVE_S * IRQ enable/disable: CLRI, SETI * pop count: FFS, FLS * SETcc, BMSKN, XBFU... Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARCv2: [intc] HS38 core interrupt controller	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \|	Cc: Jason Cooper <jason@lakedaemon.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: uncached base is hard constant for ARC, don't save it	Vineet Gupta	2015-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ioremap already uses the hard define, just make sure BCR value matches that Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: intc: split into ARCompact ISA specific, common bits	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: [arcompact] simplify SWITCH_TO_KERNEL_STK	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously this macro was overloaded with stack switching, saving SP at right slot in pt_regs, saving/setup of r25 and setting SP baseline to where pt_regs->sp is saved (vs. bottom of pt_regs) Now it only does SP switch, and leaves SP pointing to bottom of pt_regs. r25 saving is no longer done here to allow for future reordering of regfile in pt_regs w/o touching this macro Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: micro-optimize Trap handler	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Elide the need to re-read ECR in Trap handler by ensuring that EXCEPTION_PROLOGUE does that at the very end just before returning to Trap handler ARCv2 EXCEPTION_PROLOGUE already did that, so same for ARcompact and the common trap handler adjusted to use cached ECR Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: split into ARCompact ISA specific, common bits	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: FAKE_RET_FROM_EXCPN can always use r9	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: canonical'ize EXCEPTION_{PROLOGUE,EPILOGUE}	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	-EXCEPTION_EPILOGUE introduced -EXCEPTION_PROLOGUE now also includes reg file saving Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: Introduce INTERRUPT_{PROLOGUE,EPILOGUE}	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \| \| \| \| \|	-common'ize macros for level 1 and level 2 interrupts Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: entry.S: common'ize scrtach reg freeup in intr + exceptions	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: mm: document system mem map clearly	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
\| *	ARC: compress cpuinfo_arc_mmu (mainly save page size in KB)	Vineet Gupta	2015-06-19
\| \| \| \| \| \| \| \|	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>