summaryrefslogtreecommitdiff
path: root/include/linux/sunrpc (unfollow)
Commit message (Collapse)Author
2021-02-23SUNRPC: Move simple_get_bytes and simple_get_netobj into private headerDave Wysochanski
[ Upstream commit ba6dfce47c4d002d96cd02a304132fca76981172 ] Remove duplicated helper functions to parse opaque XDR objects and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-20sunrpc: clean up properly in gss_mech_unregister()NeilBrown
commit 24c5efe41c29ee3e55bcf5a1c9f61ca8709622e8 upstream. gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each flavour, but gss_mech_unregister() does not call auth_domain_put(). This is unbalanced and makes it impossible to reload the module. Change svcauth_gss_register_pseudoflavor() to return the registered auth_domain, and save it for later release. Cc: stable@vger.kernel.org (v2.6.12+) Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10sunrpc: Update RPCBIND_MAXNETIDLENChuck Lever
commit 4b9c7f9db9a003f5c342184dc4401c1b7f2efb39 upstream. Commit 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL transports") added a 5-character netid, but did not bump RPCBIND_MAXNETIDLEN from 4 to 5. Fixes: 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-25SUNRPC: Fix priority queue fairnessTrond Myklebust
[ Upstream commit f42f7c283078ce3c1e8368b140e270755b1ae313 ] Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-16sunrpc: use-after-free in svc_process_common()Vasily Averin
commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream. if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common() svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace. According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags. All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr() Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case. This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL. To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case. To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com> v2: - added lost extern svc_tcp_prep_reply_hdr() - dropped trace_svc_process() changes - context fixes in svc_process_common() Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01SUNRPC: cleanup ida information when removing sunrpc moduleKinglong Mee
commit c929ea0b910355e1876c64431f3d5802f95b3d75 upstream. After removing sunrpc module, I get many kmemleak information as, unreferenced object 0xffff88003316b1e0 (size 544): comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0 [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0 [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150 [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180 [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd] [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd] [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd] [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc] [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc] [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc] [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0 [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0 [<ffffffffb0390f1f>] vfs_write+0xef/0x240 [<ffffffffb0392fbd>] SyS_write+0xad/0x130 [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9 [<ffffffffffffffff>] 0xffffffffffffffff I found, the ida information (dynamic memory) isn't cleanup. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream. Also simplify the logic a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-02NFS: Enable client side NFSv4.1 backchannel to use other transportsChuck Lever
Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Anna Schumaker: Add forward declaration of struct net to xprt.h] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02svcrdma: Add backward direction service for RPC/RDMA transportChuck Lever
On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02SUNRPC: Abstract backchannel operationsChuck Lever
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-10-28svcrdma: Port to new memory registration APISagi Grimberg
Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-23sunrpc/cache: make cache flushing more reliable.Neil Brown
The caches used to store sunrpc authentication information can be flushed by writing a timestamp to a file in /proc. This timestamp has a one-second resolution and any entry in cache that was last_refreshed *before* that time is treated as expired. This is problematic as it is not possible to reliably flush the cache without interrupting NFS service. If the current time is written to the "flush" file, any entry that was added since the current second started will still be treated as valid. If one second beyond than the current time is written to the file then no entries can be valid until the second ticks over. This will mean that no NFS request will be handled for up to 1 second. To resolve this issue we make two changes: 1/ treat an entry as expired if the timestamp when it was last_refreshed is before *or the same as* the expiry time. This means that current code which writes out the current time will now flush the cache reliably. 2/ when a new entry in added to the cache - set the last_refresh timestamp to 1 second *beyond* the current flush time, when that not in the past. This ensures that newly added entries will always be valid. Now that we have a very reliable way to flush the cache, and also since we are using "since-boot" timestamps which are monotonic, change cache_purge() to set the smallest future flush_time which will work, and leave it there: don't revert to '1'. Also disable the setting of the 'flush_time' far into the future. That has never been useful and is now awkward as it would cause last_refresh times to be strange. Finally: if a request is made to set the 'flush_time' to the current second, assume the intent is to flush the cache and advance it, if necessary, to 1 second beyond the current 'flush_time' so that all active entries will be deemed to be expired. As part of this we need to add a 'cache_detail' arg to cache_init() and cache_fresh_locked() so they can find the current ->flush_time. Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Olaf Kirch <okir@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-08SUNRPC: Move TCP receive data path into a workqueue contextTrond Myklebust
Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads such as NFS read()-intensive workloads need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation. The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time. With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17SUNRPC: Ensure that we wait for connections to complete before retryingTrond Myklebust
Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28svcrdma: Use max_sge_rd for destination read depthsSteve Wise
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-17SUNRPC: Drop double-underscores from __rpc_cmp_addr6()Trond Myklebust
Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 7b0ce60c0b20 ("SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17SUNRPC: Add an rpc_cmp_addr_port() functionAnna Schumaker
This function is to help determine if two sockaddrs are really the same socket. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()Anna Schumaker
I'm planning on using these functions inside the client, so remove the underscores to make it feel like I'm using a public interface. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-13sunrpc: Switch to using hash list instead single listKinglong Mee
Switch using list_head for cache_head in cache_detail, it is useful of remove an cache_head entry directly from cache_detail. v8, using hash list, not head list Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-13sunrpc/nfsd: Remove redundant code by exports seq_operations functionsKinglong Mee
Nfsd has implement a site of seq_operations functions as sunrpc's cache. Just exports sunrpc's codes, and remove nfsd's redundant codes. v8, same as v6 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-12sunrpc: increase UNX_MAXNODENAME from 32 to __NEW_UTS_LEN bytesJeff Layton
The current limit of 32 bytes artificially limits the name string that we end up stuffing into NFSv4.x client ID blobs. If you have multiple hosts with long hostnames that only differ near the end, then this can cause NFSv4 client ID collisions. Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use that as the limit instead. Also, use XDR_QUADLEN to specify the slack length, just for clarity and in case someone in the future changes this to something not evenly divisible by 4. Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-10nfsd/sunrpc: factor svc_rqst allocation and freeing from sv_nrthreads ↵Jeff Layton
refcounting In later patches, we'll want to be able to allocate and free svc_rqst structures without monkeying with the serv->sv_nrthreads refcount. Factor those pieces out of their respective functions. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move pool_mode definitions into svc.hJeff Layton
In later patches, we're going to need to allow code external to svc.c to figure out what pool_mode is in use. Move these definitions into svc.h to prepare for that. Also, make the svc_pool_map object available and exported so that other modules can peek in there to get insight into what pool mode is in use. Likewise, export svc_pool_map_get/put function to make it safe to do so. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: abstract out svc_set_num_threads to sv_opsJeff Layton
Add an operation that will do setup of the service. In the case of a classic thread-based service that means starting up threads. In the case of a workqueue-based service, the setup will do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirliey.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operationJeff Layton
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_module parm into sv_opsJeff Layton
...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_function into sv_opsJeff Layton
Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into itJeff Layton
In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOADChuck Lever
Both commit 0380a3f375 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be28827bf ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be28827bf based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f375 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f375 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-05xprtrdma: Increase default credit limitChuck Lever
In preparation for similar increases on NFS/RDMA servers, bump the advertised credit limit for RPC/RDMA to 128. This allocates some extra resources, but the client will continue to allow only the number of RPCs in flight that the server requests via its advertised credit limit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-07-20svcrdma: Remove svc_rdma_fastreg()Chuck Lever
Commit 0bf4828983df ("svcrdma: refactor marshalling logic") removed the last call site for svc_rdma_fastreg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-07-20svcrdma: Clean up svc_rdma_get_reply_array()Chuck Lever
Kernel coding conventions frown upon having large nontrivial functions in header files, and the preference these days is to allow the compiler to make inlining decisions if possible. As these functions are re-homed into a .c file, be sure that comparisons with fields in struct rpcrdma_msg are with be32 constants. This is a refactoring change; no behavior change is intended. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-12xprtrdma: Remove rpcrdma_ia::ri_memreg_strategyChuck Lever
Clean up: This field is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-06-10SUNRPC: Transport fault injectionChuck Lever
It has been exceptionally useful to exercise the logic that handles local immediate errors and RDMA connection loss. To enable developers to test this regularly and repeatably, add logic to simulate connection loss every so often. Fault injection is disabled by default. It is enabled with $ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect where "xxx" is a large positive number of transport method calls before a disconnect. A value of several thousand is usually a good number that allows reasonable forward progress while still causing a lot of connection drops. These hooks are disabled when SUNRPC_DEBUG is turned off. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10sunrpc: turn swapper_enable/disable functions into rpc_xprt_opsJeff Layton
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the xs_swapper_enable/disable functions will likely oops when fed an RDMA xprt. Turn these functions into rpc_xprt_ops so that that doesn't occur. For now the RDMA versions are no-ops that just return -EINVAL on an attempt to swapon. Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10sunrpc: make xprt->swapper an atomic_tJeff Layton
Split xs_swapper into enable/disable functions and eliminate the "enable" flag. Currently, it's racy if you have multiple swapon/swapoff operations running in parallel over the same xprt. Also fix it so that we only set it to a memalloc socket on a 0->1 transition and only clear it on a 1->0 transition. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10sunrpc: keep a count of swapfiles associated with the rpc_clntJeff Layton
Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-05SUNRPC: Fix a backchannel raceTrond Myklebust
We need to allow the server to send a new request immediately after we've replied to the previous one. Right now, there is a window between the send and the release of the old request in rpc_put_task(), where the server could send us a new backchannel RPC call, and we have no request to service it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-05SUNRPC: Remove unused argument 'tk_ops' in rpc_run_bc_taskTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-04svcrdma: Add a separate "max data segs macro for svcrdmaChuck Lever
The server and client maximum are architecturally independent. Allow changing one without affecting the other. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAILChuck Lever
At the 2015 LSF/MM, it was requested that memory allocation call sites that request GFP_KERNEL allocations in a loop should be annotated with __GFP_NOFAIL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-04svcrdma: Keep rpcrdma_msg fields in network byte-orderChuck Lever
Fields in struct rpcrdma_msg are __be32. Don't byte-swap these fields when decoding RPC calls and then swap them back for the reply. For the most part, they can be left alone. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-03svcrdma: Remove svc_rdma_xdr_decode_deferred_req()Chuck Lever
svc_rdma_xdr_decode_deferred_req() indexes an array with an un-byte-swapped value off the wire. Fortunately this function isn't used anywhere, so simply remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-02SUNRPC: Clean up bc_send()Chuck Lever
Clean up: Merge bc_send() into bc_svc_process(). Note: even thought this touches svc.c, it is a client-side change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23sunrpc: make debugfs file creation failure non-fatalJeff Layton
v2: gracefully handle the case where some dentry pointers end up NULL and be more dilligent about zeroing out dentry pointers We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that. While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first." This patch converts all of the sunrpc debugfs setup code to be void return functins, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-03-31sunrpc: make debugfs file creation failure non-fatalJeff Layton
We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that. While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first." This patch converts all of the sunrpc debugfs setup code to be void return functins, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason. Symptoms were failing krb5 mounts on systems using gss-proxy and selinux. Fixes: 388f0c776781 "sunrpc: add a debugfs rpc_xprt directory..." Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-31SUNRPC: Introduce missing well-known netidsChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-02-12SUNRPC: Fix a compile error when #undef CONFIG_PROC_FSTrond Myklebust
The definition of rpc_count_iostats_metrics() is borked. Reported by: Jim Davis <jim.epost@gmail.com> Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Cc: Tom Haynes <thomas.haynes@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flagTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORTTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>