| Commit message (Collapse) | Author |
|
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
Change-Id: I4433dfd5d865bb69e8d3cab55cc68325829e8b49
|
|
commit 7de21e679e6a789f3729e8402bc440b623a28eae upstream.
A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").
Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:
In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
from ../../include/linux/err.h:8,
from libbpf.c:29:
../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror]
#define EDEADLOCK EDEADLK
In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
from libbpf.c:26:
toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition
#define EDEADLOCK 58
cc1: all warnings being treated as errors
Cc: Stable <stable@vger.kernel.org>
Reported-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200917135437.1238787-1-Tony.Ambardar@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.
In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.
After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h
Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).
Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.
For the record, note that exported files for asm directories are a mix of
files listed by:
- include/uapi/asm-generic/Kbuild.asm;
- arch/<arch>/include/uapi/asm/Kbuild;
- arch/<arch>/include/asm/Kbuild.
Change-Id: I132df74f736b8f35f77390eaa12804e74ef536ee
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Git-Commit: fcc8487d477a3452a1d0ccbdd4c5e0e1e3cb8bed
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[bharad@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Naitik Bharadiya <bharad@codeaurora.org>
[schikk@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
|
|
commit 0d808df06a44200f52262b6eb72bcb6042f5a7c5 upstream.
When switching from/to a guest that has a transaction in progress,
we need to save/restore the checkpointed register state. Although
XER is part of the CPU state that gets checkpointed, the code that
does this saving and restoring doesn't save/restore XER.
This fixes it by saving and restoring the XER. To allow userspace
to read/write the checkpointed XER value, we also add a new ONE_REG
specifier.
The visible effect of this bug is that the guest may see its XER
value being corrupted when it uses transactions.
Fixes: e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support")
Fixes: 0a8eccefcb34 ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 9d6fd2c3e9fcfb ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools due to kernel can be loaded anywhere in the memory
now and not fixed at linear mapping.
Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
|
|
commit 6997e57d693b07289694239e52a10d2f02c3a46f upstream.
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Flesh out changelog, add comment reserving 0x4]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a61674bdfc7c2bf909c4010699607b62b69b7bec upstream.
GCC 6 will include changes to generated code with -mcmodel=large,
which is used to build kernel modules on powerpc64le. This was
necessary because the large model is supposed to allow arbitrary
sizes and locations of the code and data sections, but the ELFv2
global entry point prolog still made the unconditional assumption
that the TOC associated with any particular function can be found
within 2 GB of the function entry point:
func:
addis r2,r12,(.TOC.-func)@ha
addi r2,r2,(.TOC.-func)@l
.localentry func, .-func
To remove this assumption, GCC will now generate instead this global
entry point prolog sequence when using -mcmodel=large:
.quad .TOC.-func
func:
.reloc ., R_PPC64_ENTRY
ld r2, -8(r12)
add r2, r2, r12
.localentry func, .-func
The new .reloc triggers an optimization in the linker that will
replace this new prolog with the original code (see above) if the
linker determines that the distance between .TOC. and func is in
range after all.
Since this new relocation is now present in module object files,
the kernel module loader is required to handle them too. This
patch adds support for the new relocation and implements the
same optimization done by the GNU linker.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This partially reverts commit a34236155afb1cc41945e58388ac988431bcb0b8.
While reviewing the glibc patch to exploit the individual IPC calls,
Arnd & Andreas noticed that we were still requiring userspace to pass
IPC_64 in order to get the new style IPC API.
With a bit of cleanup in the kernel we can drop that requirement, and
instead only provide the new style API, which will simplify things for
userspace.
Rather than try and sneak that patch into 4.4, instead we will drop the
individual IPC calls for powerpc, and merge them again in 4.5 once the
cleanup patch has gone in.
Because we've already added sys_mlock2() as syscall #378, we don't do a
full revert of the IPC calls. Instead we drop the __NR #defines, and
send those now undefined syscall numbers to sys_ni_syscall(). This
leaves a gap in the syscall numbers, but we'll reuse them when we merge
the individual IPC calls.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
|
The selftest passes on 64-bit LE and 32-bit BE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The previous patch introduced a flag that specified pages in a VMA should
be placed on the unevictable LRU, but they should not be made present when
the area is created. This patch adds the ability to set this state via
the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
When used with MCL_CURRENT, all current mappings will be marked with
VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the mm->def_flags
will be marked with VM_LOCKED | VM_LOCKONFAULT. When used with both
MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior is
maintained after adding MCL_ONFAULT. If a call to mlockall(MCL_FUTURE) is
followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
new VMAs will be unlocked. This remains true with or without MCL_ONFAULT
in either mlockall() invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the mm->def_flags
field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch provides individual system call numbers for the following
System V IPC system calls, on PowerPC, so that they do not need to be
multiplexed:
* semop, semget, semctl, semtimedop
* msgsnd, msgrcv, msgget, msgctl
* shmat, shmdt, shmget, shmctl
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The selftest passes on 64-bit LE & BE, and 32-bit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The selftest passes on 64-bit LE and BE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This adds include/uapi/asm/eeh.h to kbuild so that the header
file will be exported automatically with below command. The
header file was added by commit ed3e81ff2016 ("powerpc/eeh: Move PE
state constants around")
make INSTALL_HDR_PATH=/tmp/headers \
SRCARCH=powerpc headers_install
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently on powerpc we have our own #define for the highest (negative)
errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
which are not clear.
The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
In particular seccomp uses MAX_ERRNO to restrict the value that a
seccomp filter can return.
Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
tracer wanting to return 600, expecting it to be seen as an error, would
instead find on powerpc that userspace sees a successful syscall with a
return value of 600.
To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
We are somewhat confident that generic syscalls that can return a
non-error value above negative MAX_ERRNO have already been updated to
use force_successful_syscall_return().
I have also checked all the powerpc specific syscalls, and believe that
none of them expect to return a non-error value between -MAX_ERRNO and
-516. So this change should be safe ...
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Commit ce48b2100785 "powerpc: Add VSX context save/restore, ptrace and
signal support" expanded the 'vmx_reserve' array element to contain 101
double words, but the comment block above was not updated.
Also reorder the constants in the array size declaration to reflect the
logic mentioned in the comment block above. This change helps in
explaining how the HW registers are represented in the array. But no
functional change.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reworded change log and added whitespace around +'s]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return very early without
performing the syscall and keeping side effects to a minimum (no CPU
accounting or system call tracing is performed). Also included is a
new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
behaviour to userspace.
Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.
This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality as
syscalls will still be performed after tsuspend; it just requires that
the transaction be explicitly suspended. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).
Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
a normal (non-aborted) system call increases by about 0.25%.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We'll want to build the opal-prd daemon with the prd headers, so include
this in the uapi headers list.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This change adds a char device to access the "PRD" (processor runtime
diagnostics) channel to OPAL firmware.
Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta &
Vishal Kulkarni.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purpose.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
There are two equivalent sets of PE state constants, defined in
arch/powerpc/include/asm/eeh.h and include/uapi/linux/vfio.h.
Though the names are different, their corresponding values are
exactly same. The former is used by EEH core and the latter is
used by userspace.
The patch moves those constants from arch/powerpc/include/asm/eeh.h
to arch/powerpc/include/uapi/asm/eeh.h, which are expected to be
used by userspace from now on. We can't delete those constants in
vfio.h as it's uncertain that those constants have been or will be
used by userspace.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This reverts commit feba40362b11341bee6d8ed58d54b896abbd9f84.
Although the principle of this change is good, the implementation has a
few issues.
Firstly we can sometimes fail to abort a syscall because r12 may have
been clobbered by C code if we went down the virtual CPU accounting
path, or if syscall tracing was enabled.
Secondly we have decided that it is safer to abort the syscall even
earlier in the syscall entry path, so that we avoid the syscall tracing
path when we are transactional.
So that we have time to thoroughly test those changes we have decided to
revert this for this merge window and will merge the fixed version in
the next window.
NB. Rather than reverting the selftest we just drop tm-syscall from
TEST_PROGS so that it's not run by default.
Fixes: feba40362b11 ("powerpc/tm: Abort syscalls in active transactions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. The obsolete sigreturn in COMPAT mode is
retained as an override. Remaining definitions are identical, though they
incorrectly appeared in uapi, which has been corrected.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.
Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.
This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).
Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We currently have a "special" syscall for switching endianness. This is
syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
exception entry.
That has a few problems, firstly the syscall number is outside of the
usual range, which confuses various tools. For example strace doesn't
recognise the syscall at all.
Secondly it's handled explicitly as a special case in the syscall
exception entry, which is complicated enough without it.
As a first step toward removing the special syscall, we need to add a
regular syscall that implements the same functionality.
The logic is simple, it simply toggles the MSR_LE bit in the userspace
MSR. This is the same as the special syscall, with the caveat that the
special syscall clobbers fewer registers.
This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.
One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.
Change the kernel to match userspace.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Wire up sys_execveat(). This passes the selftests for the system call.
Check success of execveat(3, '../execveat', 0)... [OK]
Check success of execveat(5, 'execveat', 0)... [OK]
Check success of execveat(6, 'execveat', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(99, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(8, '', 4096)... [OK]
Check success of execveat(17, '', 4096)... [OK]
Check success of execveat(9, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(15, '', 4096)... [OK]
Check failure of execveat(8, '', 0) with ENOENT... [OK]
Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK]
Check success of execveat(5, 'execveat.symlink', 0)... [OK]
Check success of execveat(6, 'execveat.symlink', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...xec/execveat.symlink', 0)... [OK]
Check success of execveat(10, '', 4096)... [OK]
Check success of execveat(10, '', 4352)... [OK]
Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(-100, '/home/pranith/linux/tools/testing/selftests/exec/execveat.symlink', 256) with ELOOP... [OK]
Check success of execveat(3, '../script', 0)... [OK]
Check success of execveat(5, 'script', 0)... [OK]
Check success of execveat(6, 'script', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...elftests/exec/script', 0)... [OK]
Check success of execveat(13, '', 4096)... [OK]
Check success of execveat(13, '', 4352)... [OK]
Check failure of execveat(18, '', 4096) with ENOENT... [OK]
Check failure of execveat(7, 'script', 0) with ENOENT... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check success of execveat(4, 'script', 0)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check failure of execveat(4, 'script', 0) with ENOENT... [OK]
Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK]
Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(5, '', 4096) with EACCES... [OK]
Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK]
Check failure of execveat(11, '', 4096) with EACCES... [OK]
Check failure of execveat(12, '', 4096) with EACCES... [OK]
Check failure of execveat(99, '', 4096) with EBADF... [OK]
Check failure of execveat(99, 'execveat', 0) with EBADF... [OK]
Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK]
Invoke copy of 'execveat' via filename of length 4093:
Check success of execveat(19, '', 4096)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
Invoke copy of 'script' via filename of length 4093:
Check success of execveat(20, '', 4096)... [OK]
/bin/sh: 0: Can't open /dev/fd/5/xxxxxxx(... a long line of x's and y's, 0)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
Tested on a 32-bit powerpc system.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
introduce new setsockopt() command:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.
The same eBPF program can be attached to multiple sockets.
User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch wires up the new syscall sys_bpf() on powerpc.
Passes the tests in samples/bpf:
#0 add+sub+mul OK
#1 unreachable OK
#2 unreachable2 OK
#3 out of range jump OK
#4 out of range jump2 OK
#5 test1 ld_imm64 OK
#6 test2 ld_imm64 OK
#7 test3 ld_imm64 OK
#8 test4 ld_imm64 OK
#9 test5 ld_imm64 OK
#10 no bpf_exit OK
#11 loop (back-edge) OK
#12 loop2 (back-edge) OK
#13 conditional loop OK
#14 read uninitialized register OK
#15 read invalid register OK
#16 program doesn't init R0 before exit OK
#17 stack out of bounds OK
#18 invalid call insn1 OK
#19 invalid call insn2 OK
#20 invalid function call OK
#21 uninitialized stack1 OK
#22 uninitialized stack2 OK
#23 check valid spill/fill OK
#24 check corrupted spill/fill OK
#25 invalid src register in STX OK
#26 invalid dst register in STX OK
#27 invalid dst register in ST OK
#28 invalid src register in LDX OK
#29 invalid dst register in LDX OK
#30 junk insn OK
#31 junk insn2 OK
#32 junk insn3 OK
#33 junk insn4 OK
#34 junk insn5 OK
#35 misaligned read from stack OK
#36 invalid map_fd for function call OK
#37 don't check return value before access OK
#38 access memory with incorrect alignment OK
#39 sometimes access memory with incorrect alignment OK
#40 jump test 1 OK
#41 jump test 2 OK
#42 jump test 3 OK
#43 jump test 4 OK
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[mpe: test using samples/bpf]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Move ONE_REG AltiVec support to powerpc generic layer.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This patch wires up three new syscalls for powerpc. The three
new syscalls are seccomp, getrandom and memfd_create.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
|
|
Unfortunately, the LPCR got defined as a 32-bit register in the
one_reg interface. This is unfortunate because KVM allows userspace
to control the DPFD (default prefetch depth) field, which is in the
upper 32 bits. The result is that DPFD always get set to 0, which
reduces performance in the guest.
We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
since that would break existing userspace binaries. Instead we define
a new KVM_REG_PPC_LPCR_64 id which is 64-bit. Userspace can still use
the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
the bottom 32 bits that userspace can modify (ILE, TC and AIL).
If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
as well.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We now support SPRG9 for guest, so also add a one reg interface for same
Note: Changes are in bookehv code only as we do not have SPRG9 on booke-pr.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The Vector Crypto category instructions are supported by current POWER8
chips, advertise them to userspace using a specific bit to properly
differentiate with chips of the same architecture level that might not
have them.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.10+]
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Commit b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8
SPRs") added a definition of KVM_REG_PPC_WORT with the same register
number as the existing KVM_REG_PPC_VRSAVE (though in fact the
definitions are not identical because of the different register sizes.)
For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
and also adds it to api.txt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Old guests try to use the magic page, but map their trampoline code inside
of an NX region.
Since we can't fix those old kernels, try to detect whether the guest is sane
or not. If not, just disable NX functionality in KVM so that old guests at
least work at all. For newer guests, add a bit that we can set to keep NX
functionality available.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The arch/powerpc/include/asm/linkage.h is being unintentionally exported
in the kernel headers since commit e1b5bb6d1236 (consolidate
cond_syscall and SYSCALL_ALIAS declarations) when
arch/powerpc/include/uapi/asm/linkage.h was deleted but the header-y not
removed from the Kbuild file. This happens because Makefile.headersinst
still checks the old asm/ directory if the specified header doesn't
exist in the uapi directory.
The asm/linkage.h shouldn't ever have been exported anyway. No other
arch does and it doesn't contain anything useful to userland, so remove
the header-y line from the Kbuild file which triggers the export.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
I've had a report that the current limit is too small for
an automated network based installer. Bump it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The new ELF ABI tends to use R_PPC64_REL16_LO and R_PPC64_REL16_HA
relocations (PC-relative), so implement them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This adds the software abort code defines for transactional memory (TM).
These values are from PAPR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt. It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts. Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.
This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call. This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation. To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.
For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree. If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface. In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.
The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.
Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f85d ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.
Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.
As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the debug stub support on booke/bookehv.
Now QEMU debug stub can use hw breakpoint, watchpoint and
software breakpoint to debug guest.
This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:
When QEMU is running
-> thread->debug_reg == QEMU debug register context.
-> Kernel will handle switching the debug register on context switch.
-> no vcpu_load() called
QEMU makes ioctls (except RUN)
-> This will call vcpu_load()
-> should not change context.
-> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
QEMU Makes RUN ioctl
-> Save thread->debug_reg on STACK
-> Store thread->debug_reg == vcpu->debug_reg
-> load thread->debug_reg
-> RUN VCPU ( So thread points to vcpu context )
Context switch happens When VCPU running
-> makes vcpu_load() should not load any context
-> kernel loads the vcpu context as thread->debug_regs points to vcpu context.
On heavyweight_exit
-> Load the context saved on stack in thread->debug_reg
Currently we do not support debug resource emulation to guest,
On debug exception, always exit to user space irrespective of
user space is expecting the debug exception or not. If this is
unexpected exception (breakpoint/watchpoint event not set by
userspace) then let us leave the action on user space. This
is similar to what it was before, only thing is that now we
have proper exit state available to user space.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
"ehpriv 1" instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with "run->debug" have relevant information.
As this is the first point we are using run->debug, also defined
the run->debug structure.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|