android_kernel_zuk_msm8996.git

	Commit message (Collapse)	Author	Age
...
\| \| \| * \| \| \|	selinux: cleanup and consolidate the XFRM alloc/clone/delete/free code	Paul Moore	2013-07-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SELinux labeled IPsec code state management functions have been long neglected and could use some cleanup and consolidation. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| \| \| * \| \| \|	lsm: split the xfrm_state_alloc_security() hook implementation	Paul Moore	2013-07-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xfrm_state_alloc_security() LSM hook implementation is really a multiplexed hook with two different behaviors depending on the arguments passed to it by the caller. This patch splits the LSM hook implementation into two new hook implementations, which match the LSM hooks in the rest of the kernel: * xfrm_state_alloc * xfrm_state_alloc_acquire Also included in this patch are the necessary changes to the SELinux code; no other LSMs are affected. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \| \| \| \| \|	KEYS: initialize root uid and session keyrings early	Mimi Zohar	2013-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to create the integrity keyrings (eg. _evm, _ima), root's uid and session keyrings need to be initialized early. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Add a 'trusted' flag and a 'trusted only' flag	David Howells	2013-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add KEY_FLAG_TRUSTED to indicate that a key either comes from a trusted source or had a cryptographic signature chain that led back to a trusted key the kernel already possessed. Add KEY_FLAGS_TRUSTED_ONLY to indicate that a keyring will only accept links to keys marked with KEY_FLAGS_TRUSTED. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org>
\| * \| \| \| \| \|	KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for per-user_namespace registers of persistent per-UID kerberos caches held within the kernel. This allows the kerberos cache to be retained beyond the life of all a user's processes so that the user's cron jobs can work. The kerberos cache is envisioned as a keyring/key tree looking something like: struct user_namespace \___ .krb_cache keyring - The register \___ _krb.0 keyring - Root's Kerberos cache \___ _krb.5000 keyring - User 5000's Kerberos cache \___ _krb.5001 keyring - User 5001's Kerberos cache \___ tkt785 big_key - A ccache blob \___ tkt12345 big_key - Another ccache blob Or possibly: struct user_namespace \___ .krb_cache keyring - The register \___ _krb.0 keyring - Root's Kerberos cache \___ _krb.5000 keyring - User 5000's Kerberos cache \___ _krb.5001 keyring - User 5001's Kerberos cache \___ tkt785 keyring - A ccache \___ krbtgt/REDHAT.COM@REDHAT.COM big_key \___ http/REDHAT.COM@REDHAT.COM user \___ afs/REDHAT.COM@REDHAT.COM user \___ nfs/REDHAT.COM@REDHAT.COM user \___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key \___ http/KERNEL.ORG@KERNEL.ORG big_key What goes into a particular Kerberos cache is entirely up to userspace. Kernel support is limited to giving you the Kerberos cache keyring that you want. The user asks for their Kerberos cache by: krb_cache = keyctl_get_krbcache(uid, dest_keyring); The uid is -1 or the user's own UID for the user's own cache or the uid of some other user's cache (requires CAP_SETUID). This permits rpc.gssd or whatever to mess with the cache. The cache returned is a keyring named "_krb.<uid>" that the possessor can read, search, clear, invalidate, unlink from and add links to. Active LSMs get a chance to rule on whether the caller is permitted to make a link. Each uid's cache keyring is created when it first accessed and is given a timeout that is extended each time this function is called so that the keyring goes away after a while. The timeout is configurable by sysctl but defaults to three days. Each user_namespace struct gets a lazily-created keyring that serves as the register. The cache keyrings are added to it. This means that standard key search and garbage collection facilities are available. The user_namespace struct's register goes away when it does and anything left in it is then automatically gc'd. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Simo Sorce <simo@redhat.com> cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> cc: Eric W. Biederman <ebiederm@xmission.com>
\| * \| \| \| \| \|	KEYS: Implement a big key type that can save to tmpfs	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a big key type that can save its contents to tmpfs and thus swapspace when memory is tight. This is useful for Kerberos ticket caches. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Simo Sorce <simo@redhat.com>
\| * \| \| \| \| \|	KEYS: Expand the capacity of a keyring	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expand the capacity of a keyring to be able to hold a lot more keys by using the previously added associative array implementation. Currently the maximum capacity is: (PAGE_SIZE - sizeof(header)) / sizeof(struct key *) which, on a 64-bit system, is a little more 500. However, since this is being used for the NFS uid mapper, we need more than that. The new implementation gives us effectively unlimited capacity. With some alterations, the keyutils testsuite runs successfully to completion after this patch is applied. The alterations are because (a) keyrings that are simply added to no longer appear ordered and (b) some of the errors have changed a bit. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Drop the permissions argument from __keyring_search_one()	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Drop the permissions argument from __keyring_search_one() as the only caller passes 0 here - which causes all checks to be skipped. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Define a __key_get() wrapper to use rather than atomic_inc()	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define a __key_get() wrapper to use rather than atomic_inc() on the key usage count as this makes it easier to hook in refcount error debugging. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Search for auth-key by name rather than target key ID	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Search for auth-key by name rather than by target key ID as, in a future patch, we'll by searching directly by index key in preference to iteration over all keys. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Introduce a search context structure	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Search functions pass around a bunch of arguments, each of which gets copied with each call. Introduce a search context structure to hold these. Whilst we're at it, create a search flag that indicates whether the search should be directly to the description or whether it should iterate through all keys looking for a non-description match. This will be useful when keyrings use a generic data struct with generic routines to manage their content as the search terms can just be passed through to the iterator callback function. Also, for future use, the data to be supplied to the match function is separated from the description pointer in the search context. This makes it clear which is being supplied. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Consolidate the concept of an 'index key' for key access	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consolidate the concept of an 'index key' for accessing keys. The index key is the search term needed to find a key directly - basically the key type and the key description. We can add to that the description length. This will be useful when turning a keyring into an associative array rather than just a pointer block. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: key_is_dead() should take a const key pointer argument	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	key_is_dead() should take a const key pointer argument as it doesn't modify what it points to. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Use bool in make_key_ref() and is_key_possessed()	David Howells	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make make_key_ref() take a bool possession parameter and make is_key_possessed() return a bool. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \| \|	KEYS: Skip key state checks when checking for possession	David Howells	2013-09-24
\| \| \|_\|/ / / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Skip key state checks (invalidation, revocation and expiration) when checking for possession. Without this, keys that have been marked invalid, revoked keys and expired keys are not given a possession attribute - which means the possessor is not granted any possession permits and cannot do anything with them unless they also have one a user, group or other permit. This causes failures in the keyutils test suite's revocation and expiration tests now that commit 96b5c8fea6c0861621051290d705ec2e971963f1 reduced the initial permissions granted to a key. The failures are due to accesses to revoked and expired keys being given EACCES instead of EKEYREVOKED or EKEYEXPIRED. Signed-off-by: David Howells <dhowells@redhat.com>
\| * \| \| \| \|	security: remove erroneous comment about capabilities.o link ordering	Eric Paris	2013-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Back when we had half ass LSM stacking we had to link capabilities.o after bigger LSMs so that on initialization the bigger LSM would register first and the capabilities module would be the one stacked as the 'seconday'. Somewhere around 6f0f0fd496333777d53 (back in 2008) we finally removed the last of the kinda module stacking code but this comment in the makefile still lives today. Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* \| \| \| \| \|	Merge git://git.infradead.org/users/eparis/audit	Linus Torvalds	2013-11-21
\|\ \ \ \ \ \ \| \| \|_\|_\|_\|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull audit updates from Eric Paris: "Nothing amazing. Formatting, small bug fixes, couple of fixes where we didn't get records due to some old VFS changes, and a change to how we collect execve info..." Fixed conflict in fs/exec.c as per Eric and linux-next. * git://git.infradead.org/users/eparis/audit: (28 commits) audit: fix type of sessionid in audit_set_loginuid() audit: call audit_bprm() only once to add AUDIT_EXECVE information audit: move audit_aux_data_execve contents into audit_context union audit: remove unused envc member of audit_aux_data_execve audit: Kill the unused struct audit_aux_data_capset audit: do not reject all AUDIT_INODE filter types audit: suppress stock memalloc failure warnings since already managed audit: log the audit_names record type audit: add child record before the create to handle case where create fails audit: use given values in tty_audit enable api audit: use nlmsg_len() to get message payload length audit: use memset instead of trying to initialize field by field audit: fix info leak in AUDIT_GET requests audit: update AUDIT_INODE filter rule to comparator function audit: audit feature to set loginuid immutable audit: audit feature to only allow unsetting the loginuid audit: allow unsetting the loginuid (with priv) audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE audit: loginuid functions coding style selinux: apply selinux checks on new audit message types ...
\| * \| \| \| \|	audit: suppress stock memalloc failure warnings since already managed	Richard Guy Briggs	2013-11-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Supress the stock memory allocation failure warnings for audit buffers since audit alreay takes care of memory allocation failure warnings, including rate-limiting, in audit_log_start(). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \| \| \| \|	selinux: apply selinux checks on new audit message types	Eric Paris	2013-11-05
\| \| \|/ / / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use the read check to get the feature set (like AUDIT_GET) and the write check to set the features (like AUDIT_SET). Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds	2013-11-13
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull networking updates from David Miller: 1) The addition of nftables. No longer will we need protocol aware firewall filtering modules, it can all live in userspace. At the core of nftables is a, for lack of a better term, virtual machine that executes byte codes to inspect packet or metadata (arriving interface index, etc.) and make verdict decisions. Besides support for loading packet contents and comparing them, the interpreter supports lookups in various datastructures as fundamental operations. For example sets are supports, and therefore one could create a set of whitelist IP address entries which have ACCEPT verdicts attached to them, and use the appropriate byte codes to do such lookups. Since the interpreted code is composed in userspace, userspace can do things like optimize things before giving it to the kernel. Another major improvement is the capability of atomically updating portions of the ruleset. In the existing netfilter implementation, one has to update the entire rule set in order to make a change and this is very expensive. Userspace tools exist to create nftables rules using existing netfilter rule sets, but both kernel implementations will need to co-exist for quite some time as we transition from the old to the new stuff. Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have worked so hard on this. 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements to our pseudo-random number generator, mostly used for things like UDP port randomization and netfitler, amongst other things. In particular the taus88 generater is updated to taus113, and test cases are added. 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet and Yang Yingliang. 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin Sujir. 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet, Neal Cardwell, and Yuchung Cheng. 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary control message data, much like other socket option attributes. From Francesco Fusco. 7) Allow applications to specify a cap on the rate computed automatically by the kernel for pacing flows, via a new SO_MAX_PACING_RATE socket option. From Eric Dumazet. 8) Make the initial autotuned send buffer sizing in TCP more closely reflect actual needs, from Eric Dumazet. 9) Currently early socket demux only happens for TCP sockets, but we can do it for connected UDP sockets too. Implementation from Shawn Bohrer. 10) Refactor inet socket demux with the goal of improving hash demux performance for listening sockets. With the main goals being able to use RCU lookups on even request sockets, and eliminating the listening lock contention. From Eric Dumazet. 11) The bonding layer has many demuxes in it's fast path, and an RCU conversion was started back in 3.11, several changes here extend the RCU usage to even more locations. From Ding Tianhong and Wang Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav Falico. 12) Allow stackability of segmentation offloads to, in particular, allow segmentation offloading over tunnels. From Eric Dumazet. 13) Significantly improve the handling of secret keys we input into the various hash functions in the inet hashtables, TCP fast open, as well as syncookies. From Hannes Frederic Sowa. The key fundamental operation is "net_get_random_once()" which uses static keys. Hannes even extended this to ipv4/ipv6 fragmentation handling and our generic flow dissector. 14) The generic driver layer takes care now to set the driver data to NULL on device removal, so it's no longer necessary for drivers to explicitly set it to NULL any more. Many drivers have been cleaned up in this way, from Jingoo Han. 15) Add a BPF based packet scheduler classifier, from Daniel Borkmann. 16) Improve CRC32 interfaces and generic SKB checksum iterators so that SCTP's checksumming can more cleanly be handled. Also from Daniel Borkmann. 17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces using the interface MTU value. This helps avoid PMTU attacks, particularly on DNS servers. From Hannes Frederic Sowa. 18) Use generic XPS for transmit queue steering rather than internal (re-)implementation in virtio-net. From Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) random32: add test cases for taus113 implementation random32: upgrade taus88 generator to taus113 from errata paper random32: move rnd_state to linux/random.h random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized random32: add periodic reseeding random32: fix off-by-one in seeding requirement PHY: Add RTL8201CP phy_driver to realtek xtsonic: add missing platform_set_drvdata() in xtsonic_probe() macmace: add missing platform_set_drvdata() in mace_probe() ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe() ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh vlan: Implement vlan_dev_get_egress_qos_mask as an inline. ixgbe: add warning when max_vfs is out of range. igb: Update link modes display in ethtool netfilter: push reasm skb through instead of original frag skbs ip6_output: fragment outgoing reassembled skb properly MAINTAINERS: mv643xx_eth: take over maintainership from Lennart net_sched: tbf: support of 64bit rates ixgbe: deleting dfwd stations out of order can cause null ptr deref ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS ...
\| * \ \ \ \	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2013-10-23
\| \|\ \ \ \ \ \| \| \| \|_\|_\|/ \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \| \|	netfilter: pass hook ops to hookfn	Patrick McHardy	2013-10-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| * \| \| \| \|	net: fix build errors if ipv6 is disabled	Eric Dumazet	2013-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CONFIG_IPV6=n is still a valid choice ;) It appears we can remove dead code. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \| \|	ipv6: make lookups simpler and faster	Eric Dumazet	2013-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2013-10-01
\| \|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/usb/qmi_wwan.c drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h include/net/netfilter/nf_conntrack_synproxy.h include/net/secure_seq.h The conflicts are of two varieties: 1) Conflicts with Joe Perches's 'extern' removal from header file function declarations. Usually it's an argument signature change or a function being added/removed. The resolutions are trivial. 2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds a new value, another changes an existing value. That sort of thing. Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \| \| \|	net ipv4: Convert ipv4.ip_local_port_range to be per netns v3	Eric W. Biederman	2013-09-30
\| \| \|_\|_\|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \| \| \|	Merge branch 'for-3.13' of ↵	Linus Torvalds	2013-11-13
\|\ \ \ \ \ \ \| \|_\|_\|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Not too much activity this time around. css_id is finally killed and a minor update to device_cgroup" * 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: device_cgroup: remove can_attach cgroup: kill css_id memcg: stop using css id memcg: fail to create cgroup if the cgroup id is too big memcg: convert to use cgroup id memcg: convert to use cgroup_is_descendant()
\| * \| \| \| \|	device_cgroup: remove can_attach	Serge Hallyn	2013-10-24
\| \|/ / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is really only wanting to duplicate a check which is already done by the cgroup subsystem. With this patch, user jdoe still cannot move pid 1 into a devices cgroup he owns, but now he can move his own other tasks into devices cgroups. Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Aristeu Rozanski <aris@redhat.com>
* \| \| \| \|	apparmor: fix bad lock balance when introspecting policy	John Johansen	2013-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BugLink: http://bugs.launchpad.net/bugs/1235977 The profile introspection seq file has a locking bug when policy is viewed from a virtual root (task in a policy namespace), introspection from the real root is not affected. The test for root while (parent) { is correct for the real root, but incorrect for tasks in a policy namespace. This allows the task to walk backup the policy tree past its virtual root causing it to be unlocked before the virtual root should be in the p_stop fn. This results in the following lockdep back trace: [ 78.479744] [ BUG: bad unlock balance detected! ] [ 78.479792] 3.11.0-11-generic #17 Not tainted [ 78.479838] ------------------------------------- [ 78.479885] grep/2223 is trying to release lock (&ns->lock) at: [ 78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480002] but there are no more locks to release! [ 78.480037] [ 78.480037] other info that might help us debug this: [ 78.480037] 1 lock held by grep/2223: [ 78.480037] #0: (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0 [ 78.480037] [ 78.480037] stack backtrace: [ 78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17 [ 78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 78.480037] ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190 [ 78.480037] ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190 [ 78.480037] ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7 [ 78.480037] Call Trace: [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff817b97ef>] dump_stack+0x54/0x74 [ 78.480037] [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300 [ 78.480037] [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180 [ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff810e5d5c>] lock_release+0xac/0x310 [ 78.480037] [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180 [ 78.480037] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10 [ 78.480037] [<ffffffff81376c91>] p_stop+0x51/0x90 [ 78.480037] [<ffffffff81211408>] seq_read+0x288/0x3d0 [ 78.480037] [<ffffffff811e9d9e>] vfs_read+0x9e/0x170 [ 78.480037] [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0 [ 78.480037] [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* \| \| \| \|	apparmor: fix memleak of the profile hash	John Johansen	2013-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BugLink: http://bugs.launchpad.net/bugs/1235523 This fixes the following kmemleak trace: unreferenced object 0xffff8801e8c35680 (size 32): comm "apparmor_parser", pid 691, jiffies 4294895667 (age 13230.876s) hex dump (first 32 bytes): e0 d3 4e b5 ac 6d f4 ed 3f cb ee 48 1c fd 40 cf ..N..m..?..H..@. 5b cc e9 93 00 00 00 00 00 00 00 00 00 00 00 00 [............... backtrace: [<ffffffff817a97ee>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811ca9f3>] __kmalloc+0x103/0x290 [<ffffffff8138acbc>] aa_calc_profile_hash+0x6c/0x150 [<ffffffff8138074d>] aa_unpack+0x39d/0xd50 [<ffffffff8137eced>] aa_replace_profiles+0x3d/0xd80 [<ffffffff81376937>] profile_replace+0x37/0x50 [<ffffffff811e9f2d>] vfs_write+0xbd/0x1e0 [<ffffffff811ea96c>] SyS_write+0x4c/0xa0 [<ffffffff817ccb1d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* \| \| \| \|	selinux: remove 'flags' parameter from avc_audit()	Linus Torvalds	2013-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now avc_audit() has no more users with that parameter. Remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	selinux: avc_has_perm_flags has no more users	Linus Torvalds	2013-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	.. so get rid of it. The only indirect users were all the avc_has_perm() callers which just expanded to have a zero flags argument. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	selinux: remove 'flags' parameter from inode_has_perm	Linus Torvalds	2013-10-04
\| \|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every single user passes in '0'. I think we had non-zero users back in some stone age when selinux_inode_permission() was implemented in terms of inode_has_perm(), but that complicated case got split up into a totally separate code-path so that we could optimize the much simpler special cases. See commit 2e33405785d3 ("SELinux: delay initialization of audit data in selinux_inode_permission") for example. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \|	apparmor: fix suspicious RCU usage warning in policy.c/policy.h	John Johansen	2013-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recent 3.12 pull request for apparmor was missing a couple rcu _protected access modifiers. Resulting in the follow suspicious RCU usage [ 29.804534] [ INFO: suspicious RCU usage. ] [ 29.804539] 3.11.0+ #5 Not tainted [ 29.804541] ------------------------------- [ 29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage! [ 29.804548] [ 29.804548] other info that might help us debug this: [ 29.804548] [ 29.804553] [ 29.804553] rcu_scheduler_active = 1, debug_locks = 1 [ 29.804558] 2 locks held by apparmor_parser/1268: [ 29.804560] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29 [ 29.804576] #1: (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c [ 29.804589] [ 29.804589] stack backtrace: [ 29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5 [ 29.804599] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010 [ 29.804602] 0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540 [ 29.804611] ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18 [ 29.804619] ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084 [ 29.804628] Call Trace: [ 29.804636] [<ffffffff8144eb9b>] dump_stack+0x4e/0x82 [ 29.804642] [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105 [ 29.804649] [<ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f [ 29.804655] [<ffffffff811f5408>] __replace_profile+0x11f/0x1ed [ 29.804661] [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c [ 29.804668] [<ffffffff811f16d4>] profile_replace+0x35/0x4c [ 29.804674] [<ffffffff81120fa3>] vfs_write+0xad/0x113 [ 29.804680] [<ffffffff81121609>] SyS_write+0x44/0x7a [ 29.804687] [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b [ 29.804691] [ 29.804694] =============================== [ 29.804697] [ INFO: suspicious RCU usage. ] [ 29.804700] 3.11.0+ #5 Not tainted [ 29.804703] ------------------------------- [ 29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage! [ 29.804709] [ 29.804709] other info that might help us debug this: [ 29.804709] [ 29.804714] [ 29.804714] rcu_scheduler_active = 1, debug_locks = 1 [ 29.804718] 2 locks held by apparmor_parser/1268: [ 29.804721] #0: (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29 [ 29.804733] #1: (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c [ 29.804744] [ 29.804744] stack backtrace: [ 29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5 [ 29.804753] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT , BIOS 217 03/01/2010 [ 29.804756] 0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540 [ 29.804764] ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000 [ 29.804772] ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94 [ 29.804779] Call Trace: [ 29.804786] [<ffffffff8144eb9b>] dump_stack+0x4e/0x82 [ 29.804791] [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105 [ 29.804798] [<ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62 [ 29.804804] [<ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17 [ 29.804810] [<ffffffff811f4f0b>] kref_put+0x36/0x40 [ 29.804816] [<ffffffff811f5423>] __replace_profile+0x13a/0x1ed [ 29.804822] [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c [ 29.804829] [<ffffffff811f16d4>] profile_replace+0x35/0x4c [ 29.804835] [<ffffffff81120fa3>] vfs_write+0xad/0x113 [ 29.804840] [<ffffffff81121609>] SyS_write+0x44/0x7a [ 29.804847] [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b Reported-by: miles.lane@gmail.com CC: paulmck@linux.vnet.ibm.com Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* \| \| \|	apparmor: Use shash crypto API interface for profile hashes	Tyler Hicks	2013-09-30
\|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the shash interface, rather than the hash interface, when hashing AppArmor profiles. The shash interface does not use scatterlists and it is a better fit for what AppArmor needs. This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a buffer from vmalloc(). The hash interface requires callers to handle vmalloc() buffers differently than what AppArmor was doing. Due to vmalloc() memory not being physically contiguous, each individual page behind the buffer must be assigned to a scatterlist with sg_set_page() and then the scatterlist passed to crypto_hash_update(). The shash interface does not have that limitation and allows vmalloc() and kmalloc() buffers to be handled in the same manner. BugLink: https://launchpad.net/bugs/1216294/ BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2013-09-07
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
\| * \| \|	capabilities: allow nice if we are privileged	Serge Hallyn	2013-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We allow task A to change B's nice level if it has a supserset of B's privileges, or of it has CAP_SYS_NICE. Also allow it if A has CAP_SYS_NICE with respect to B - meaning it is root in the same namespace, or it created B's namespace. Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
\| * \| \|	userns: Allow PR_CAPBSET_DROP in a user namespace.	Eric W. Biederman	2013-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the capabilites and capability bounding set are per user namespace properties it is safe to allow changing them with just CAP_SETPCAP permission in the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* \| \| \|	Merge branch 'next' of ↵	Linus Torvalds	2013-09-07
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Nothing major for this kernel, just maintenance updates" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits) apparmor: add the ability to report a sha1 hash of loaded policy apparmor: export set of capabilities supported by the apparmor module apparmor: add the profile introspection file to interface apparmor: add an optional profile attachment string for profiles apparmor: add interface files for profiles and namespaces apparmor: allow setting any profile into the unconfined state apparmor: make free_profile available outside of policy.c apparmor: rework namespace free path apparmor: update how unconfined is handled apparmor: change how profile replacement update is done apparmor: convert profile lists to RCU based locking apparmor: provide base for multiple profiles to be replaced at once apparmor: add a features/policy dir to interface apparmor: enable users to query whether apparmor is enabled apparmor: remove minimum size check for vmalloc() Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes Smack: network label match fix security: smack: add a hash table to quicken smk_find_entry() security: smack: fix memleak in smk_write_rules_list() xattr: Constify ->name member of "struct xattr". ...
\| * \ \ \	Merge branch 'smack-for-3.12' of git://git.gitorious.org/smack-next/kernel ↵	James Morris	2013-08-23
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into ra-next
\| \| * \| \| \|	Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes	Rafal Krypa	2013-08-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Smack interface for loading rules has always parsed only single rule from data written to it. This requires user program to call one write() per each rule it wants to load. This change makes it possible to write multiple rules, separated by new line character. Smack will load at most PAGE_SIZE-1 characters and properly return number of processed bytes. In case when user buffer is larger, it will be additionally truncated. All characters after last \n will not get parsed to avoid partial rule near input buffer boundary. Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
\| \| * \| \| \|	Smack: network label match fix	Casey Schaufler	2013-08-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Smack code that matches incoming CIPSO tags with Smack labels reaches through the NetLabel interfaces and compares the network data with the CIPSO header associated with a Smack label. This was done in a ill advised attempt to optimize performance. It works so long as the categories fit in a single capset, but this isn't always the case. This patch changes the Smack code to use the appropriate NetLabel interfaces to compare the incoming CIPSO header with the CIPSO header associated with a label. It will always match the CIPSO headers correctly. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
\| \| * \| \| \|	security: smack: add a hash table to quicken smk_find_entry()	Tomasz Stanislawski	2013-08-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Accepted for the smack-next tree after changing the number of slots from 128 to 16. This patch adds a hash table to quicken searching of a smack label by its name. Basically, the patch improves performance of SMACK initialization. Parsing of rules involves translation from a string to a smack_known (aka label) entity which is done in smk_find_entry(). The current implementation of the function iterates over a global list of smack_known resulting in O(N) complexity for smk_find_entry(). The total complexity of SMACK initialization becomes O(rules * labels). Therefore it scales quadratically with a complexity of a system. Applying the patch reduced the complexity of smk_find_entry() to O(1) as long as number of label is in hundreds. If the number of labels is increased please update SMACK_HASH_SLOTS constant defined in security/smack/smack.h. Introducing the configuration of this constant with Kconfig or cmdline might be a good idea. The size of the hash table was adjusted experimentally. The rule set used by TIZEN contains circa 17K rules for 500 labels. The table above contains results of SMACK initialization using 'time smackctl apply' bash command. The 'Ref' is a kernel without this patch applied. The consecutive values refers to value of SMACK_HASH_SLOTS. Every measurement was repeated three times to reduce noise. \| Ref \| 1 \| 2 \| 4 \| 8 \| 16 \| 32 \| 64 \| 128 \| 256 \| 512 -------------------------------------------------------------------------------------------- Run1 \| 1.156 \| 1.096 \| 0.883 \| 0.764 \| 0.692 \| 0.667 \| 0.649 \| 0.633 \| 0.634 \| 0.629 \| 0.620 Run2 \| 1.156 \| 1.111 \| 0.885 \| 0.764 \| 0.694 \| 0.661 \| 0.649 \| 0.651 \| 0.634 \| 0.638 \| 0.623 Run3 \| 1.160 \| 1.107 \| 0.886 \| 0.764 \| 0.694 \| 0.671 \| 0.661 \| 0.638 \| 0.631 \| 0.624 \| 0.638 AVG \| 1.157 \| 1.105 \| 0.885 \| 0.764 \| 0.693 \| 0.666 \| 0.653 \| 0.641 \| 0.633 \| 0.630 \| 0.627 Surprisingly, a single hlist is slightly faster than a double-linked list. The speed-up saturates near 64 slots. Therefore I chose value 128 to provide some margin if more labels were used. It looks that IO becomes a new bottleneck. Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
\| \| * \| \| \|	security: smack: fix memleak in smk_write_rules_list()	Tomasz Stanislawski	2013-08-01
\| \| \|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The smack_parsed_rule structure is allocated. If a rule is successfully installed then the last reference to the object is lost. This patch fixes this leak. Moreover smack_parsed_rule is allocated on stack because it no longer needed ofter smk_write_rules_list() is finished. Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
\| * \| \| \|	apparmor: add the ability to report a sha1 hash of loaded policy	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide userspace the ability to introspect a sha1 hash value for each profile currently loaded. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
\| * \| \| \|	apparmor: export set of capabilities supported by the apparmor module	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
\| * \| \| \|	apparmor: add the profile introspection file to interface	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the dynamic namespace relative profiles file to the interace, to allow introspection of loaded profiles and their modes. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Kees Cook <kees@ubuntu.com>
\| * \| \| \|	apparmor: add an optional profile attachment string for profiles	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the ability to take in and report a human readable profile attachment string for profiles so that attachment specifications can be easily inspected. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>
\| * \| \| \|	apparmor: add interface files for profiles and namespaces	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add basic interface files to access namespace and profile information. The interface files are created when a profile is loaded and removed when the profile or namespace is removed. Signed-off-by: John Johansen <john.johansen@canonical.com>
\| * \| \| \|	apparmor: allow setting any profile into the unconfined state	John Johansen	2013-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow emulating the default profile behavior from boot, by allowing loading of a profile in the unconfined state into a new NS. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com>