NFSv4                                                         S. Shepler
Internet-Draft                                                 M. Eisler
Intended status: Standards Track                               D. Noveck
Expires: January 2, 2008                                         Editors
                                                               July 2007


                         NFSv4 Minor Version 1
                 draft-ietf-nfsv4-minorversion1-13.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 2, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This Internet-Draft describes NFSv4 minor version one, including
   features retained from the base protocol and protocol extensions made
   subsequently. The current draft includes descriptions of the major
   extensions: Sessions, Directory Delegations, and parallel NFS (pNFS).
   This Internet-Draft is an active work item of the NFSv4 working
   group. Active and resolved issues may be found in the issue tracker
   at: http://www.nfsv4-editor.org/cgi-bin/roundup/nfsv4. New issues
   related to this document should be raised with the NFSv4 Working
   Group nfsv4@ietf.org and logged in the issue tracker.
Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Table of Contents

   1.  Introduction
     1.1.  The NFSv4.1 Protocol
     1.2.  NFS Version 4 Goals
     1.3.  Minor Version 1 Goals
     1.4.  Overview of NFS version 4.1 Features
       1.4.1.  RPC and Security
       1.4.2.  Protocol Structure
       1.4.3.  File System Model
       1.4.4.  Locking Facilities
     1.5.  General Definitions
     1.6.  Differences from NFSv4.0
   2.  Core Infrastructure
     2.1.  Introduction
     2.2.  RPC and XDR
       2.2.1.  RPC-based Security
     2.3.  COMPOUND and CB_COMPOUND
     2.4.  Client Identifiers and Client Owners
       2.4.1.  Server Release of Client ID
       2.4.2.  Resolving Client Owner Conflicts
     2.5.  Server Owners
     2.6.  Security Service Negotiation
       2.6.1.  NFSv4.1 Security Tuples
       2.6.2.  SECINFO and SECINFO_NO_NAME
       2.6.3.  Security Error
     2.7.  Minor Versioning
     2.8.  Non-RPC-based Security Services
       2.8.1.  Authorization
       2.8.2.  Auditing
       2.8.3.  Intrusion Detection
     2.9.  Transport Layers
       2.9.1.  Required and Recommended Properties of Transports
       2.9.2.  Client and Server Transport Behavior
       2.9.3.  Ports
     2.10.  Session
       2.10.1.  Motivation and Overview
       2.10.2.  NFSv4 Integration
       2.10.3.  Channels
       2.10.4.  Trunking
       2.10.5.  Exactly Once Semantics
       2.10.6.  RDMA Considerations
       2.10.7.  Sessions Security
       2.10.8.  Session Mechanics - Steady State
       2.10.9.  Session Mechanics - Recovery
       2.10.10.  Parallel NFS and Sessions
   3.  Protocol Data Types
     3.1.  Basic Data Types
     3.2.  Structured Data Types
   4.  Filehandles
     4.1.  Obtaining the First Filehandle
       4.1.1.  Root Filehandle
       4.1.2.  Public Filehandle
     4.2.  Filehandle Types
       4.2.1.  General Properties of a Filehandle
       4.2.2.  Persistent Filehandle
       4.2.3.  Volatile Filehandle
     4.3.  One Method of Constructing a Volatile Filehandle
     4.4.  Client Recovery from Filehandle Expiration
   5.  File Attributes
     5.1.  Mandatory Attributes
     5.2.  Recommended Attributes
     5.3.  Named Attributes
     5.4.  Classification of Attributes
     5.5.  Mandatory Attributes - Definitions
     5.6.  Recommended Attributes - Definitions
     5.7.  Time Access
     5.8.  Interpreting owner and owner_group
     5.9.  Character Case Attributes
     5.10.  Quota Attributes
     5.11.  mounted_on_fileid
     5.12.  Directory Notification Attributes
       5.12.1.  dir_notif_delay
       5.12.2.  dirent_notif_delay
     5.13.  PNFS Attributes
       5.13.1.  fs_layout_type
       5.13.2.  layout_alignment
       5.13.3.  layout_blksize
       5.13.4.  layout_hint
       5.13.5.  layout_type
       5.13.6.  mdsthreshold
     5.14.  Retention Attributes
   6.  Security Related Attributes
     6.1.  Goals
     6.2.  File Attributes Discussion
       6.2.1.  ACL Attributes
       6.2.2.  dacl and sacl Attributes
       6.2.3.  mode Attribute
       6.2.4.  mode_set_masked Attribute
     6.3.  Common Methods
       6.3.1.  Interpreting an ACL
       6.3.2.  Computing a Mode Attribute from an ACL
     6.4.  Requirements
       6.4.1.  Setting the mode and/or ACL Attributes
       6.4.2.  Retrieving the mode and/or ACL Attributes
       6.4.3.  Creating New Objects
   7.  Single-server Namespace
     7.1.  Server Exports
     7.2.  Browsing Exports
     7.3.  Server Pseudo File System
     7.4.  Multiple Roots
     7.5.  Filehandle Volatility
     7.6.  Exported Root
     7.7.  Mount Point Crossing
     7.8.  Security Policy and Namespace Presentation
   8.  State Management
     8.1.  Client and Session ID
     8.2.  Stateid Definition
       8.2.1.  Stateid Types
       8.2.2.  Stateid Structure
       8.2.3.  Special Stateids
       8.2.4.  Stateid Lifetime and Validation
     8.3.  Lease Renewal
     8.4.  Crash Recovery
       8.4.1.  Client Failure and Recovery
       8.4.2.  Server Failure and Recovery
       8.4.3.  Network Partitions and Recovery
     8.5.  Server Revocation of Locks
     8.6.  Short and Long Leases
     8.7.  Clocks, Propagation Delay, and Calculating Lease Expiration
     8.8.  Vestigial Locking Infrastructure From V4.0
   9.  File Locking and Share Reservations
     9.1.  Opens and Byte-range Locks
       9.1.1.  State-owner Definition
       9.1.2.  Use of the Stateid and Locking
     9.2.  Lock Ranges
     9.3.  Upgrading and Downgrading Locks
     9.4.  Blocking Locks
     9.5.  Share Reservations
     9.6.  OPEN/CLOSE Operations
     9.7.  Open Upgrade and Downgrade
     9.8.  Reclaim of Open and Byte-range Locks
   10.  Client-Side Caching
     10.1.  Performance Challenges for Client-Side Caching
     10.2.  Delegation and Callbacks
       10.2.1.  Delegation Recovery
     10.3.  Data Caching
       10.3.1.  Data Caching and OPENs
       10.3.2.  Data Caching and File Locking
       10.3.3.  Data Caching and Mandatory File Locking
       10.3.4.  Data Caching and File Identity
     10.4.  Open Delegation
       10.4.1.  Open Delegation and Data Caching
       10.4.2.  Open Delegation and File Locks
       10.4.3.  Handling of CB_GETATTR
       10.4.4.  Recall of Open Delegation
       10.4.5.  Clients that Fail to Honor Delegation Recalls
       10.4.6.  Delegation Revocation
       10.4.7.  Delegations via WANT_DELEGATION
     10.5.  Data Caching and Revocation
       10.5.1.  Revocation Recovery for Write Open Delegation
     10.6.  Attribute Caching
     10.7.  Data and Metadata Caching and Memory Mapped Files
     10.8.  Name Caching
     10.9.  Directory Caching
   11.  Multi-Server Namespace
     11.1.  Location attributes
     11.2.  File System Presence or Absence
     11.3.  Getting Attributes for an Absent File System
       11.3.1.  GETATTR Within an Absent File System
       11.3.2.  READDIR and Absent File Systems
     11.4.  Uses of Location Information
       11.4.1.  File System Replication
       11.4.2.  File System Migration
       11.4.3.  Referrals
     11.5.  Additional Client-side Considerations
     11.6.  Effecting File System Transitions
       11.6.1.  File System Transitions and Simultaneous Access
       11.6.2.  Simultaneous Use and Transparent Transitions
       11.6.3.  Filehandles and File System Transitions
       11.6.4.  Fileid's and File System Transitions
       11.6.5.  Fsids and File System Transitions
       11.6.6.  The Change Attribute and File System Transitions
       11.6.7.  Lock State and File System Transitions
       11.6.8.  Write Verifiers and File System Transitions
     11.7.  Effecting File System Referrals
       11.7.1.  Referral Example (LOOKUP)
       11.7.2.  Referral Example (READDIR)
     11.8.  The Attribute fs_absent
     11.9.  The Attribute fs_locations
     11.10.  The Attribute fs_locations_info
       11.10.1.  The fs_locations_server4 Structure
       11.10.2.  The fs_locations_info4 Structure
       11.10.3.  The fs_locations_item4 Structure
     11.11.  The Attribute fs_status
   12.  Directory Delegations
     12.1.  Introduction to Directory Delegations
     12.2.  Directory Delegation Design
     12.3.  Attributes in Support of Directory Notifications
     12.4.  Delegation Recall
     12.5.  Directory Delegation Recovery
   13.  Parallel NFS (pNFS)
     13.1.  Introduction
     13.2.  pNFS Definitions
       13.2.1.  Metadata
       13.2.2.  Metadata Server
       13.2.3.  pNFS Client
       13.2.4.  Storage Device
       13.2.5.  Storage Protocol
       13.2.6.  Control Protocol
       13.2.7.  Layout Types
       13.2.8.  Layout
       13.2.9.  Layout Iomode
       13.2.10.  Device IDs
     13.3.  pNFS Operations
     13.4.  pNFS Attributes
     13.5.  Layout Semantics
       13.5.1.  Guarantees Provided by Layouts
       13.5.2.  Getting a Layout
       13.5.3.  Committing a Layout
       13.5.4.  Recalling a Layout
       13.5.5.  Metadata Server Write Propagation
     13.6.  pNFS Mechanics
     13.7.  Recovery
       13.7.1.  Client Recovery
       13.7.2.  Dealing with Lease Expiration on the Client
       13.7.3.  Dealing with Loss of Layout State on the Metadata
                Server
       13.7.4.  Recovery from Metadata Server Restart
       13.7.5.  Operations During Metadata Server Grace Period
       13.7.6.  Storage Device Recovery
     13.8.  Metadata and Storage Device Roles
     13.9.  Security Considerations
   14.  PNFS: NFSv4.1 File Layout Type
     14.1.  Client ID and Session Considerations
     14.2.  File Layout Definitions
     14.3.  File Layout Data Types
     14.4.  Interpreting the File Layout
     14.5.  Sparse and Dense Stripe Unit Packing
     14.6.  Data Server Multipathing
     14.7.  Operations Issued to NFSv4.1 Data Servers
     14.8.  COMMIT Through Metadata Server
     14.9.  The Layout Iomode
     14.10.  Metadata and Data Server State Coordination
       14.10.1.  Global Stateid Requirements
       14.10.2.  Data Server State Propagation
     14.11.  Data Server Component File Size
     14.12.  Recovery from Loss of Layout
     14.13.  Security Considerations for the File Layout Type
   15.  Internationalization
     15.1.  Stringprep profile for the utf8str_cs type
     15.2.  Stringprep profile for the utf8str_cis type
     15.3.  Stringprep profile for the utf8str_mixed type
     15.4.  UTF-8 Related Errors
   16.  Error Values
     16.1.  Error Definitions
     16.2.  Operations and their valid errors
     16.3.  Callback operations and their valid errors
     16.4.  Errors and the operations that use them
   17.  NFS version 4.1 Procedures
     17.1.  Procedure 0: NULL - No Operation
     17.2.  Procedure 1: COMPOUND - Compound Operations
   18.  NFS version 4.1 Operations
     18.1.  Operation 3: ACCESS - Check Access Rights
     18.2.  Operation 4: CLOSE - Close File
     18.3.  Operation 5: COMMIT - Commit Cached Data
     18.4.  Operation 6: CREATE - Create a Non-Regular File Object
     18.5.  Operation 7: DELEGPURGE - Purge Delegations Awaiting
            Recovery
     18.6.  Operation 8: DELEGRETURN - Return Delegation
     18.7.  Operation 9: GETATTR - Get Attributes
     18.8.  Operation 10: GETFH - Get Current Filehandle
     18.9.  Operation 11: LINK - Create Link to a File
     18.10.  Operation 12: LOCK - Create Lock
     18.11.  Operation 13: LOCKT - Test For Lock
     18.12.  Operation 14: LOCKU - Unlock File
     18.13.  Operation 15: LOOKUP - Lookup Filename
     18.14.  Operation 16: LOOKUPP - Lookup Parent Directory
     18.15.  Operation 17: NVERIFY - Verify Difference in Attributes
     18.16.  Operation 18: OPEN - Open a Regular File
     18.17.  Operation 19: OPENATTR - Open Named Attribute Directory
     18.18.  Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
     18.19.  Operation 22: PUTFH - Set Current Filehandle
     18.20.  Operation 23: PUTPUBFH - Set Public Filehandle
     18.21.  Operation 24: PUTROOTFH - Set Root Filehandle
     18.22.  Operation 25: READ - Read from File
     18.23.  Operation 26: READDIR - Read Directory
     18.24.  Operation 27: READLINK - Read Symbolic Link
     18.25.  Operation 28: REMOVE - Remove File System Object
     18.26.  Operation 29: RENAME - Rename Directory Entry
     18.27.  Operation 31: RESTOREFH - Restore Saved Filehandle
     18.28.  Operation 32: SAVEFH - Save Current Filehandle
     18.29.  Operation 33: SECINFO - Obtain Available Security
     18.30.  Operation 34: SETATTR - Set Attributes
     18.31.  Operation 37: VERIFY - Verify Same Attributes
     18.32.  Operation 38: WRITE - Write to File
     18.33.  Operation 40: BACKCHANNEL_CTL - Backchannel control
     18.34.  Operation 41: BIND_CONN_TO_SESSION
     18.35.  Operation 42: EXCHANGE_ID - Instantiate Client ID
     18.36.  Operation 43: CREATE_SESSION - Create New Session and
             Confirm Client ID
     18.37.  Operation 44: DESTROY_SESSION - Destroy existing session
     18.38.  Operation 45: FREE_STATEID - Free stateid with no locks
     18.39.  Operation 46: GET_DIR_DELEGATION - Get a directory
             delegation
     18.40.  Operation 47: GETDEVICEINFO - Get Device Information
     18.41.  Operation 48: GETDEVICELIST
     18.42.  Operation 49: LAYOUTCOMMIT - Commit writes made using a
             layout
     18.43.  Operation 50: LAYOUTGET - Get Layout Information
     18.44.  Operation 51: LAYOUTRETURN - Release Layout Information
     18.45.  Operation 52: SECINFO_NO_NAME - Get Security on Unnamed
             Object
     18.46.  Operation 53: SEQUENCE - Supply per-procedure sequencing
             and control
     18.47.  Operation 54: SET_SSV
     18.48.  Operation 55: TEST_STATEID - Test stateids for validity
     18.49.  Operation 56: WANT_DELEGATION
     18.50.  Operation 57: DESTROY_CLIENTID - Destroy existing client
             ID
     18.51.  Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
             Finished
     18.52.  Operation 10044: ILLEGAL - Illegal operation
   19.  NFS version 4.1 Callback Procedures
     19.1.  Procedure 0: CB_NULL - No Operation
     19.2.  Procedure 1: CB_COMPOUND - Compound Operations
   20.  NFS version 4.1 Callback Operations
     20.1.  Operation 3: CB_GETATTR - Get Attributes
     20.2.  Operation 4: CB_RECALL - Recall an Open Delegation
     20.3.  Operation 5: CB_LAYOUTRECALL
     20.4.  Operation 6: CB_NOTIFY - Notify directory changes
     20.5.  Operation 7: CB_PUSH_DELEG
     20.6.  Operation 8: CB_RECALL_ANY - Keep any N delegations
     20.7.  Operation 9: CB_RECALLABLE_OBJ_AVAIL
     20.8.  Operation 10: CB_RECALL_SLOT - change flow control limits
     20.9.  Operation 11: CB_SEQUENCE - Supply backchannel sequencing
            and control
     20.10.  Operation 12: CB_WANTS_CANCELLED
     20.11.  Operation 13: CB_NOTIFY_LOCK - Notify of possible lock
             availability
     20.12.  Operation 10044: CB_ILLEGAL - Illegal Callback Operation
   21.  Security Considerations
   22.  IANA Considerations
     22.1.  Defining new layout types
   23.  References
     23.1.  Normative References
     23.2.  Informative References
   Appendix A.  Acknowledgments
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

1.1.  The NFSv4.1 Protocol

   The NFSv4.1 protocol is a minor version of the NFSv4 protocol
   described in [2]. It generally follows the guidelines for the minor
   versioning model laid out in Section 10 of RFC 3530. However, it
   diverges from guidelines 11 ("a client and server that supports minor
   version X must support minor versions 0 through X-1") and 12 ("no
   features may be introduced as mandatory in a minor version").
   These divergences are due to the introduction of the sessions model
   for managing non-idempotent operations and the RECLAIM_COMPLETE
   operation. These two new features are infrastructural in nature and
   simplify implementation of existing and other new features. Making
   them optional would add undue complexity to protocol definition and
   implementation. NFSv4.1 accordingly updates the Minor Versioning
   guidelines (Section 2.7).

   NFSv4.1, as a minor version, is consistent with the overall goals for
   NFS Version 4, but extends the protocol so as to better meet those
   goals, based on experiences with NFSv4.0. In addition, NFSv4.1 has
   adopted some additional goals, which motivate some of the major
   extensions in minor version 1.

1.2.  NFS Version 4 Goals

   The NFS version 4 protocol is a further revision of the NFS protocol
   defined already by versions 2 [21] and 3 [22]. It retains the
   essential characteristics of previous versions: design for easy
   recovery; independence of transport protocols, operating systems,
   and file systems; simplicity; and good performance. The NFS version
   4 revision has the following goals:

   o  Improved access and good performance on the Internet.

      The protocol is designed to transit firewalls easily, perform well
      where latency is high and bandwidth is low, and scale to very
      large numbers of clients per server.

   o  Strong security with negotiation built into the protocol.

      The protocol builds on the work of the ONCRPC working group in
      supporting the RPCSEC_GSS protocol. Additionally, the NFS version
      4 protocol provides a mechanism that allows clients and servers to
      negotiate security, and requires clients and servers to support a
      minimal set of security schemes.

   o  Good cross-platform interoperability.

      The protocol features a file system model that provides a useful,
      common set of features that does not unduly favor one file system
      or operating system over another.
   o  Designed for protocol extensions.

      The protocol is designed to accept standard extensions within a
      framework that enables and encourages backward compatibility.

1.3.  Minor Version 1 Goals

   Minor version one has the following goals, within the framework
   established by the overall version 4 goals.

   o  To correct significant structural weaknesses and oversights
      discovered in the base protocol.

   o  To add clarity and specificity to areas left unaddressed or not
      addressed in sufficient detail in the base protocol.

   o  To add specific features based on experience with the existing
      protocol and recent industry developments.

   o  To provide protocol support to take advantage of clustered server
      deployments including the ability to provide scalable parallel
      access to files distributed among multiple servers.

1.4.  Overview of NFS version 4.1 Features

   This section briefly reviews the major features of the NFS version
   4.1 protocol, to provide context both for the reader who is familiar
   with previous versions of the NFS protocol and for the reader who is
   new to the NFS protocols. Even the reader new to the NFS protocols
   is expected to have some fundamental knowledge: the reader should be
   familiar with the XDR and RPC protocols as described in [3] and [4],
   and should have a basic knowledge of file systems and distributed
   file systems.

   This description of version 4.1 features will not distinguish those
   added in minor version one from those present in the base protocol
   but will treat minor version 1 as a unified whole. See Section 1.6
   for a description of the differences between the two minor versions.

1.4.1.  RPC and Security

   As with previous versions of NFS, the External Data Representation
   (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS
   version 4.1 protocol are those defined in [3] and [4]. To meet
   end-to-end security requirements, the RPCSEC_GSS framework [5] will
   be used to extend the basic RPC security. With the use of
   RPCSEC_GSS, various mechanisms can be provided to offer
   authentication, integrity, and privacy to the NFS version 4 protocol.
   Kerberos V5 will be used as described in [6] to provide one security
   framework. The LIPKEY and SPKM-3 GSS-API mechanisms described in [7]
   will be used to provide for the use of user passwords and
   client/server public key certificates by the NFS version 4 protocol.
   With the use of RPCSEC_GSS, other mechanisms may also be specified
   and used for NFS version 4.1 security.

   To enable in-band security negotiation, the NFS version 4.1 protocol
   has operations that provide the client with a method of querying the
   server about its policies regarding which security mechanisms must be
   used for access to the server's file system resources. With this,
   the client can securely match the security mechanism that meets the
   policies specified at both the client and server.

1.4.2.  Protocol Structure

1.4.2.1.  Core Protocol

   Unlike NFS Versions 2 and 3, which used a series of ancillary
   protocols (e.g. NLM, NSM, MOUNT), within all minor versions of NFS
   version 4 only a single RPC protocol is used to make requests of the
   server. Facilities that had been separate protocols, such as
   locking, are now integrated within a single unified protocol.

1.4.2.2.  Parallel Access

   Minor version one supports high-performance data access to a
   clustered server implementation by enabling a separation of metadata
   access and data access, with the latter done to multiple servers in
   parallel. Such parallel data access is controlled by recallable
   objects known as "layouts", which are integrated into the protocol
   locking model.
   Clients direct requests for data access to a set of data servers
   specified by the layout via a data storage protocol, which may be
   NFSv4.1 or another protocol.

1.4.3.  File System Model

   The general file system model used for the NFS version 4.1 protocol
   is the same as in previous versions. The server file system is
   hierarchical, with the regular files contained within being treated
   as opaque octet streams. In a slight departure, file and directory
   names are encoded with UTF-8 to deal with the basics of
   internationalization.

   The NFS version 4.1 protocol does not require a separate protocol to
   provide for the initial mapping between path name and filehandle.
   All file systems exported by a server are presented as a tree so that
   all file systems are reachable from a special per-server global root
   filehandle. This allows LOOKUP operations to be used to perform
   functions previously provided by the MOUNT protocol. The server
   provides any necessary pseudo file systems to bridge any unexported
   gaps between exported file systems.

1.4.3.1.  Filehandles

   As in previous versions of the NFS protocol, opaque filehandles are
   used to identify individual files and directories. Lookup-type and
   create operations are used to go from file and directory names to the
   filehandle, which is then used to identify the object to subsequent
   operations.

   The NFS version 4.1 protocol provides support for persistent
   filehandles, guaranteed to be valid for the lifetime of the file
   system object designated. In addition, it allows servers to provide
   filehandles with more limited validity guarantees, called volatile
   filehandles.

1.4.3.2.  File Attributes

   The NFS version 4.1 protocol has a rich and extensible attribute
   structure. Only a small set of the defined attributes are mandatory
   and must be provided by all server implementations.
The other attributes are known as "recommended" attributes. The acl, sacl, and dacl attributes are a significant set of file attributes that make up the Access Control List (ACL) of a file. These attributes provide for directory and file access control beyond the model used in NFS Versions 2 and 3. The ACL definition allows for specification of specific sets of permissions for individual users and groups. In addition, ACL inheritance allows propagation of access permissions and restrictions down a directory tree as file system objects are created.

One other type of attribute is the named attribute. A named attribute is an opaque octet stream that is associated with a directory or file and referred to by a string name. Named attributes are meant to be used by client applications as a method to associate application-specific data with a regular file or directory.

1.4.3.3. Multi-server Namespace

NFS Version 4.1 contains a number of features to allow implementation of namespaces that cross server boundaries and that allow and facilitate a non-disruptive transfer of support for individual file systems between servers. They are all based upon attributes that allow one file system to specify alternate or new locations for that file system.

These attributes may be used together with the concept of absent file systems, which provide specifications for additional locations but no actual file system content. This allows a number of important facilities:

o Location attributes may be used with absent file systems to implement referrals whereby one server may direct the client to a file system provided by another server. This allows extensive multi-server namespaces to be constructed.
o Location attributes may be provided for present file systems to provide the locations of alternate file system instances or replicas to be used in the event that the current file system instance becomes unavailable.
o Location attributes may be provided when a previously present file system becomes absent. This allows non-disruptive migration of file systems to alternate servers.

1.4.4. Locking Facilities

As mentioned previously, NFSv4.1 is a single protocol which includes locking facilities. These locking facilities include support for many types of locks including a number of sorts of recallable locks. Recallable locks such as delegations allow the client to be assured that certain events will not occur so long as that lock is held. When circumstances change, the lock is recalled via a callback request. The assurances provided by delegations allow more extensive caching to be done safely when circumstances allow it.

o Share reservations as established by OPEN operations.
o Byte-range locks.
o File delegations, which are recallable locks that assure the holder that inconsistent opens and file changes cannot occur so long as the delegation is held.
o Directory delegations, which are recallable delegations that assure the holder that inconsistent directory modifications cannot occur so long as the delegation is held.
o Layouts, which are recallable objects that assure the holder that the client may access the file data directly and that no change to the data's location inconsistent with that access may be made so long as the layout is held.

All locks for a given client are tied together under a single client-wide lease. All requests made on sessions associated with the client renew that lease. When leases are not promptly renewed, locks are subject to revocation.
In the event of server reinitialization, clients have the opportunity to safely reclaim their locks within a special grace period.

1.5. General Definitions

The following definitions are provided for the purpose of providing an appropriate context for the reader.

Client
   The "client" is the entity that accesses the NFS server's resources. The client may be an application which contains the logic to access the NFS server directly. The client may also be the traditional operating system client that provides remote file system services for a set of applications.

   A client is uniquely identified by a Client Owner.

   In the case of file locking, the client is the entity that maintains a set of locks on behalf of one or more applications. This client is responsible for crash or failure recovery for those locks it manages.

   Note that multiple clients may share the same transport and connection and multiple clients may exist on the same network node.

Client ID
   A 64-bit quantity used as a unique, short-hand reference to a client-supplied verifier and client owner. The server is responsible for supplying the client ID.

Client Owner
   The client owner is a unique string, opaque to the server, which identifies a client. Multiple network connections and source network addresses originating those connections may share a client owner. The server is expected to treat requests from connections with the same client owner as coming from the same client.

Lease
   An interval of time defined by the server for which the client is irrevocably granted a lock. At the end of a lease period the lock may be revoked if the lease has not been extended. The lock must be revoked if a conflicting lock has been granted after the lease interval. All leases granted by a server have the same fixed interval.
   Note that the fixed interval was chosen to alleviate the expense a server would have in maintaining state about variable length leases across server failures.

Lock
   The term "lock" is used to refer to any of record (octet-range) locks, share reservations, delegations, or layouts unless specifically stated otherwise.

Server
   The "Server" is the entity responsible for coordinating client access to a set of file systems. A server can span multiple network addresses. In NFSv4.1, a server is a two-tiered entity, which gives servers consisting of multiple components the flexibility to couple their components tightly or loosely without requiring tight synchronization among the components. Every server has a "Server Owner" which reflects the two tiers of a server entity.

Server Owner
   The "Server Owner" identifies the server to the client. The server owner consists of a major and minor identifier. When the client has two connections each to a peer with the same major and minor identifier, the client assumes both peers are the same server (the server namespace is the same via each connection), and further assumes session and lock state is sharable across both connections. When each peer has the same major identifier but different minor identifier, the client assumes both peers can serve the same namespace, but session and lock state is not sharable across both connections.

Stable Storage
   NFS version 4 servers must be able to recover without data loss from multiple power failures (including cascading power failures, that is, several power failures in quick succession), operating system failures, and hardware failure of components other than the storage medium itself (for example, disk, nonvolatile RAM).

   Some examples of stable storage that are allowable for an NFS server include:

   1. Media commit of data, that is, the modified data has been successfully written to the disk media, for example, the disk platter.
   2. An immediate reply disk drive with battery-backed on-drive intermediate storage or uninterruptible power system (UPS).
   3. Server commit of data with battery-backed intermediate storage and recovery software.
   4. Cache commit with uninterruptible power system (UPS) and recovery software.

Stateid
   A 128-bit quantity returned by a server that uniquely defines the open and locking state provided by the server for a specific open or lock owner for a specific file and type of lock.

Verifier
   A 64-bit quantity generated by the client that the server can use to determine if the client has restarted and lost all previous lock state.

1.6. Differences from NFSv4.0

The following summarizes the differences between minor version one and the base protocol:

o Implementation of the sessions model.
o Support for parallel access to data.
o Addition of the RECLAIM_COMPLETE operation to better structure the lock reclamation process.
o Support for delegations on directories and other file types in addition to regular files.
o Operations to re-obtain a delegation.
o Support for client and server implementation IDs.

2. Core Infrastructure

2.1. Introduction

NFS version 4.1 (NFSv4.1) relies on core infrastructure common to nearly every operation. This core infrastructure is described in the remainder of this section.

2.2. RPC and XDR

The NFS version 4.1 (NFSv4.1) protocol is a Remote Procedure Call (RPC) application that uses RPC version 2 and the corresponding eXternal Data Representation (XDR) as defined in [4] and [3].

2.2.1. RPC-based Security

Previous NFS versions have been thought of as having a host-based authentication model, where the NFS server authenticates the NFS client, and trusts the client to authenticate all users.
Actually, NFS has always depended on RPC for authentication. The first form of RPC authentication required a host-based authentication approach. NFSv4.1 also depends on RPC for basic security services, and mandates RPC support for a user-based authentication model. The user-based authentication model has user principals authenticated by a server, and in turn the server authenticated by user principals. RPC provides some basic security services which are used by NFSv4.

2.2.1.1. RPC Security Flavors

As described in section 7.2 "Authentication" of [4], RPC security is encapsulated in the RPC header, via a security or authentication flavor, and information specific to the specification of the security flavor. Every RPC header conveys information used to identify and authenticate a client and server. As discussed in Section 2.2.1.1.1, some security flavors provide additional security services.

NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This requirement to implement is not a requirement to use.) Other flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well.

2.2.1.1.1. RPCSEC_GSS and Security Services

RPCSEC_GSS ([5]) uses the functionality of GSS-API [8]. This allows for the use of various security mechanisms by the RPC layer without the additional implementation overhead of adding RPC security flavors.

2.2.1.1.1.1. Identification, Authentication, Integrity, Privacy

Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate users on clients to servers, and servers to users. It can also perform integrity checking on the entire RPC message, including the RPC header, and the arguments or results. Finally, privacy, usually via encryption, is a service available with RPCSEC_GSS. Privacy is performed on the arguments and results. Note that if privacy is selected, integrity, authentication, and identification are enabled.
If privacy is not selected, but integrity is selected, authentication and identification are enabled. If integrity and privacy are not selected, but authentication is enabled, identification is enabled. RPCSEC_GSS does not provide identification as a separate service.

Although GSS-API has an authentication service distinct from its privacy and integrity services, GSS-API's authentication service is not used for RPCSEC_GSS's authentication service. Instead, each RPC request and response header is integrity protected with the GSS-API integrity service, and this allows RPCSEC_GSS to offer per-RPC authentication and identity. See [5] for more information.

NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's privacy service.

2.2.1.1.1.2. Security mechanisms for NFS version 4

RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide security services. Therefore NFSv4.1 clients and servers MUST support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY.

The use of RPCSEC_GSS requires selection of: mechanism, quality of protection (QOP), and service (authentication, integrity, privacy). For the mandated security mechanisms, NFSv4.1 specifies that a QOP of zero (0) is used, leaving it up to the mechanism or the mechanism's configuration to use an appropriate level of protection that QOP zero maps to. Each mandated mechanism specifies a minimum set of cryptographic algorithms for implementing integrity and privacy. NFSv4.1 clients and servers MUST be implemented on operating environments that comply with the mandatory cryptographic algorithms of each mandated mechanism.

2.2.1.1.1.2.1. Kerberos V5

The Kerberos V5 GSS-API mechanism as described in [6] ( [[Comment.1: need new Kerberos RFC]] ) MUST be implemented with the RPCSEC_GSS services as specified in the following table:

   column descriptions:
   1 == number of pseudo flavor
   2 == name of pseudo flavor
   3 == mechanism's OID
   4 == RPCSEC_GSS service
   5 == NFSv4.1 clients MUST support
   6 == NFSv4.1 servers MUST support

   1      2        3                    4                     5   6
   ------------------------------------------------------------------
   390003 krb5     1.2.840.113554.1.2.2 rpc_gss_svc_none      yes yes
   390004 krb5i    1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
   390005 krb5p    1.2.840.113554.1.2.2 rpc_gss_svc_privacy   no  yes

Note that the number and name of the pseudo flavor is presented here as a mapping aid to the implementor. Because the NFSv4.1 protocol includes a method to negotiate security and it understands the GSS-API mechanism, the pseudo flavor is not needed. The pseudo flavor is needed for NFS version 3 since the security negotiation is done via the MOUNT protocol as described in [23].

2.2.1.1.1.2.2. LIPKEY

The LIPKEY GSS-API mechanism as described in [7] MUST be implemented with the RPCSEC_GSS services as specified in the following table:

   1      2        3                    4                     5   6
   ------------------------------------------------------------------
   390006 lipkey   1.3.6.1.5.5.9        rpc_gss_svc_none      yes yes
   390007 lipkey-i 1.3.6.1.5.5.9        rpc_gss_svc_integrity yes yes
   390008 lipkey-p 1.3.6.1.5.5.9        rpc_gss_svc_privacy   no  yes

2.2.1.1.1.2.3. SPKM-3 as a security triple

The SPKM-3 GSS-API mechanism as described in [7] MUST be implemented with the RPCSEC_GSS services as specified in the following table:

   1      2        3                    4                     5   6
   ------------------------------------------------------------------
   390009 spkm3    1.3.6.1.5.5.1.3      rpc_gss_svc_none      yes yes
   390010 spkm3i   1.3.6.1.5.5.1.3      rpc_gss_svc_integrity yes yes
   390011 spkm3p   1.3.6.1.5.5.1.3      rpc_gss_svc_privacy   no  yes

2.2.1.1.1.3. GSS Server Principal

Regardless of what security mechanism under RPCSEC_GSS is being used, the NFS server MUST identify itself in GSS-API via a GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE names are of the form:

   service@hostname

For NFS, the "service" element is

   nfs

Implementations of security mechanisms will convert nfs@hostname to various different forms. For Kerberos V5, LIPKEY, and SPKM-3, the following form is RECOMMENDED:

   nfs/hostname

2.3. COMPOUND and CB_COMPOUND

A significant departure from the versions of the NFS protocol before version 4 is the introduction of the COMPOUND procedure. For the NFSv4 protocol, in all minor versions, there are exactly two RPC procedures, NULL and COMPOUND. The COMPOUND procedure is defined as a series of individual operations and these operations perform the sorts of functions performed by traditional NFS procedures.

The operations combined within a COMPOUND request are evaluated in order by the server, without any atomicity guarantees. A limited set of facilities exists to pass results from one operation to another. Once an operation returns a failing result, the evaluation ends and the results of all evaluated operations are returned to the client.

With the use of the COMPOUND procedure, the client is able to build simple or complex requests. These COMPOUND requests allow for a reduction in the number of RPCs needed for logical file system operations. For example, multi-component lookup requests can be constructed by combining multiple LOOKUP operations. Those can be further combined with operations such as GETATTR, READDIR, or OPEN plus READ to do more complicated sets of operations without incurring additional latency.

NFSv4.1 also contains a considerable set of callback operations in which the server makes an RPC directed at the client. Callback RPCs have a similar structure to that of the normal server requests.
For the NFS version 4 protocol callbacks in all minor versions, there are two RPC procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure is defined in an analogous fashion to that of COMPOUND with its own set of callback operations.

The addition of new server and callback operations within the COMPOUND and CB_COMPOUND request framework provides a means of extending the protocol in subsequent minor versions.

Except for a small number of operations needed for session creation, server requests and callback requests are performed within the context of a session. Sessions provide a client context for every request and support robust replay protection for non-idempotent requests.

2.4. Client Identifiers and Client Owners

For each operation that obtains or depends on locking state, the specific client must be determinable by the server. In NFSv4, each distinct client instance is represented by a client ID, which is a 64-bit identifier that identifies a specific client at a given time and which is changed whenever the client re-initializes, and may change when the server re-initializes. Client IDs are used to support lock identification and crash recovery.

In NFSv4.1, during steady state operation, the client ID associated with each operation is derived from the session (see Section 2.10) on which the operation is issued. Each session is associated with a specific client ID at session creation and that client ID then becomes the client ID associated with all requests issued using it. Therefore, unlike NFSv4.0, the only NFSv4.1 operations possible before a client ID is established are those needed to establish the client ID.

A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION operation using that client ID (eir_clientid as returned from EXCHANGE_ID) is required to establish the identification on the server.
Establishment of identification by a new incarnation of the client also has the effect of immediately releasing any locking state that a previous incarnation of that same client might have had on the server. Such released state would include all lock, share reservation, layout state, and where the server is not supporting the CLAIM_DELEGATE_PREV claim type, all delegation state associated with the same client with the same identity. For discussion of delegation state recovery, see Section 10.2.1. For discussion of layout state recovery, see Section 13.7.1.

Releasing such state requires that the server be able to determine that one client instance is the successor of another. Where this cannot be done, for any of a number of reasons, the locking state will remain for a time subject to lease expiration (see Section 8.3) and the new client will need to wait for such state to be removed, if it makes conflicting lock requests.

Client identification is encapsulated in the following Client Owner structure:

   struct client_owner4 {
           verifier4       co_verifier;
           opaque          co_ownerid<NFS4_OPAQUE_LIMIT>;
   };

The first field, co_verifier, is a client incarnation verifier that is used to detect client reboots. Only if the co_verifier is different from the one the server had previously recorded for the client (as identified by the second field of the structure, co_ownerid) does the server start the process of canceling the client's leased state.

The second field, co_ownerid, is a variable length string that uniquely defines the client so that subsequent instances of the same client bear the same co_ownerid with a different verifier.

There are several considerations for how the client generates the co_ownerid string:

o The string should be unique so that multiple clients do not present the same string.
The consequences of two clients presenting the same string range from one client getting an error to one client having its leased state abruptly and unexpectedly canceled.
o The string should be selected so that subsequent incarnations (e.g., reboots) of the same client cause the client to present the same string. The implementor is cautioned against an approach that requires the string to be recorded in a local file because this precludes the use of the implementation in an environment where there is no local disk and all file access is from an NFS version 4 server.
o The string should be the same for each server network address that the client accesses (note: the precise opposite was advised in the NFSv4.0 specification [2]). This way, if a server has multiple interfaces, the client can trunk traffic over multiple network paths as described in Section 2.10.4.
o The algorithm for generating the string should not assume that the client's network address will not change, unless the client implementation knows it is using statically assigned network addresses. This includes changes between client incarnations and even changes while the client is still running in its current incarnation. This means that if the client includes just the client's network address in the co_ownerid string, there is a real risk, with dynamic address assignment, that after the client gives up the network address, another client, using a similar algorithm for generating the co_ownerid string, would generate a conflicting co_ownerid string.

Given the above considerations, an example of a well-generated co_ownerid string is one that includes:

o If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or more of:
   * The client machine's serial number (for privacy reasons, it is best to perform some one way function on the serial number).
   * A MAC address (again, a one way function should be performed).
   * The timestamp of when the NFS version 4 software was first installed on the client (though this is subject to the previously mentioned caution about using information that is stored in a file, because the file might only be accessible over NFS version 4).
   * A true random number. However, since this number ought to be the same between client incarnations, this shares the same problem as that of using the timestamp of the software installation.
o For a user level NFS version 4 client, it should contain additional information to distinguish the client from other user level clients running on the same host, such as a process identifier or other unique sequence.

A server may compare a client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established using SETCLIENTID using NFSv4 minor version 0, so that an NFSv4.1 client is not forced to delay until lease expiration for locking state established by the earlier client using minor version 0. This requires the client_owner4 be constructed the same way as the nfs_client_id4. If the latter's contents included the server's network address, and the NFSv4.1 client does not wish to use a client ID that prevents trunking, it should issue two EXCHANGE_ID operations. The first EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4. This will clear the state created by the NFSv4.0 client. The second EXCHANGE_ID will not have the server's network address. The state created for the second EXCHANGE_ID will not have to wait for lease expiration, because there will be no state to expire.
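
As a rough illustration of the co_ownerid guidance earlier in this section, the following Python sketch builds an owner string from a one way hash of stable machine identifiers. The function name make_co_ownerid, the "nfsv41-client:" label, the parameter names, and the choice of SHA-256 are all assumptions made for this example; the specification does not prescribe any particular construction.

```python
import hashlib


def make_co_ownerid(serial_number, mac_address,
                    static_address=None, instance_tag=None):
    """Build a hypothetical co_ownerid value (illustrative only).

    The serial number and MAC address are passed through a one way
    function so they are not exposed directly. The server's network
    address is deliberately not an input, so the same string is
    presented to every server address the client contacts.
    """
    parts = []
    if static_address is not None:
        # Include the client's address only when it is known to be
        # statically assigned.
        parts.append(static_address)
    digest = hashlib.sha256(
        (serial_number + "|" + mac_address).encode("utf-8")
    ).hexdigest()
    parts.append(digest)
    if instance_tag is not None:
        # Distinguishes user level clients that share one host.
        parts.append(instance_tag)
    return ("nfsv41-client:" + "/".join(parts)).encode("utf-8")
```

Because every input is stable across reboots, successive incarnations of the same client would present the same string, while the co_verifier (not modeled here) would differ.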
Once an EXCHANGE_ID has been done, and the resulting client ID established as associated with a session, all requests made on that session implicitly identify that client ID, which in turn designates the client specified using the long-form client_owner4 structure. The shorthand client identifier (a client ID) is assigned by the server (the eir_clientid result from EXCHANGE_ID) and should be chosen so that it will not conflict with a client ID previously assigned by the server. This applies across server restarts or reboots.

In the event of a server restart, a client may find out that its current client ID is no longer valid when it receives an NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on the characteristics of the sessions involved, specifically whether the session is persistent (see Section 2.10.5.5).

When a session is not persistent, the client will need to create a new session. When the existing client ID is presented to a server as part of creating a session and that client ID is not recognized, as would happen after a server reboot, the server will reject the request with the error NFS4ERR_STALE_CLIENTID. When this happens, the client must obtain a new client ID by use of the EXCHANGE_ID operation and then use that client ID as the basis of a new session and then proceed to any other necessary recovery for the server reboot case (see Section 8.4.2).

In the case of the session being persistent, the client will re-establish communication using the existing session after the reboot. This session will be associated with a client ID that has had state revoked (but the persistent session is never associated with a stale client ID, because if the session is persistent, the client ID MUST persist), and the client will receive an indication of that fact in the sr_status_flags field returned by the SEQUENCE operation (see Section 18.46.4).
The client can then use the existing session to do whatever operations are necessary to determine the status of requests outstanding at the time of reboot, while avoiding issuing new requests, particularly any involving locking on that session. Such requests would fail with an NFS4ERR_STALE_STATEID error, if attempted.

See the detailed descriptions of EXCHANGE_ID (Section 18.35) and CREATE_SESSION (Section 18.36) for a complete specification of these operations.

2.4.1. Server Release of Client ID

NFSv4.1 introduces a new operation called DESTROY_CLIENTID (Section 18.50) which the client SHOULD use to destroy a client ID it no longer needs. This permits graceful, bilateral release of a client ID.

If the server determines that the client holds no associated state for its client ID (including sessions, opens, locks, delegations, layouts, and wants), the server may choose to unilaterally release the client ID. The server may make this choice for an inactive client so that resources are not consumed by those intermittently active clients. If the client contacts the server after this release, the server must ensure the client receives the appropriate error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to establish a new identity. It should be clear that the server must be very hesitant to release a client ID since the resulting work on the client to recover from such an event will be the same burden as if the server had failed and restarted. Typically a server would not release a client ID unless there had been no activity from that client for many minutes. As long as there are sessions, opens, locks, delegations, layouts, or wants, the server MUST NOT release the client ID. See Section 2.10.9.1.4 for discussion on releasing inactive sessions.

2.4.2. Resolving Client Owner Conflicts

When the server gets an EXCHANGE_ID for a client owner that currently has no state, or if it has state, but the lease has expired, the server MUST allow the EXCHANGE_ID, and confirm the new client ID if followed by the appropriate CREATE_SESSION. When the server gets an EXCHANGE_ID for a client owner that currently has state and an unexpired lease, the server MUST NOT destroy any state that currently exists for the client owner unless one of the following is true:

o The principal that created the client ID for the client owner is the same as the principal that is issuing the EXCHANGE_ID. Note that if the client ID was created with SP4_MACH_CRED protection (Section 18.35), the principal MUST be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be integrity or privacy, and the same GSS mechanism and principal must be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection (Section 18.35), and the client sends the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.7.4). Note that this is possible only if the server and client persist the SSV.
o The client ID was established with SP4_SSV protection. Because the SSV might not be persisted across client and server restart, and because the first time a client issues EXCHANGE_ID to a server it does not have an SSV, the client MAY issue the subsequent EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the principal MUST be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be integrity or privacy, and the same GSS mechanism and principal must be used as that used when the client ID was created.

If none of the above situations apply, the server MUST return NFS4ERR_CLID_INUSE.
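
The admission rule above can be sketched in Python as follows. This is a deliberately simplified model, not a server implementation: the OwnerRecord class and check_exchange_id function are hypothetical names, principals are plain strings, and only the "same principal" condition is modeled, with the SP4_MACH_CRED and SP4_SSV conditions left out.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
NFS4ERR_CLID_INUSE = "NFS4ERR_CLID_INUSE"


@dataclass
class OwnerRecord:
    """Hypothetical server-side record kept for one client owner."""
    principal: str       # principal that created the client ID
    has_state: bool      # whether any state exists for this owner
    lease_expired: bool  # whether the owner's lease has expired


def check_exchange_id(record, requesting_principal):
    """Return the status a server might give to an EXCHANGE_ID.

    Assumption: SP4_MACH_CRED and SP4_SSV handling is omitted, so
    the only accepted match is an identical principal string.
    """
    if record is None or not record.has_state or record.lease_expired:
        # No unexpired state to protect, so the request is allowed.
        return NFS4_OK
    if record.principal == requesting_principal:
        return NFS4_OK
    # State with an unexpired lease exists and no condition matched.
    return NFS4ERR_CLID_INUSE
```

In this sketch, NFS4_OK means only that the request is accepted; whether existing state is then discarded depends on the co_verifier comparison, which is not modeled here.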
Even if the server accepts the principal and co_ownerid as matching that which created the client ID, it MUST NOT delete any state unless the co_verifier in the EXCHANGE_ID does not match the co_verifier used when the client ID was created. If the co_verifier matches, then the client is either updating properties of the client ID, or possibly attempting to use a trunking opportunity (Section 2.10.4).

2.5. Server Owners

The Server Owner is somewhat similar to a Client Owner (Section 2.4), but unlike the Client Owner, there is no shorthand serverid. The Server Owner is defined in the following structure:

   struct server_owner4 {
           uint64_t        so_minor_id;
           opaque          so_major_id<NFS4_OPAQUE_LIMIT>;
   };

The Server Owner is returned in the results of EXCHANGE_ID. When the so_major_id fields are the same in two EXCHANGE_ID results, the connections over which each EXCHANGE_ID was sent can be assumed to address the same Server (as defined in Section 1.5). If the so_minor_id fields are also the same, then not only do both connections connect to the same server, but the session and other state can be shared across both connections. The reader is cautioned that multiple servers may deliberately or accidentally claim to have the same so_major_id or so_major_id/so_minor_id; the reader should examine Section 2.10.4 and Section 18.35.

The considerations for generating a so_major_id are similar to those for generating a co_ownerid string (see Section 2.4). The consequences of two servers generating conflicting so_major_id values are less dire than they are for co_ownerid conflicts because the client can use RPCSEC_GSS to compare the authenticity of each server (see Section 2.10.4).

2.6. Security Service Negotiation

With the NFS version 4 server potentially offering multiple security mechanisms, the client needs a method to determine or negotiate which mechanism is to be used for its communication with the server.
The NFS server may have multiple points within its file system namespace that are available for use by NFS clients. These points can be considered security policy boundaries, and in some NFS implementations are tied to NFS export points. In turn, the NFS server may be configured such that each of these security policy boundaries may have different or multiple security mechanisms in use.

The security negotiation between client and server must be done with a secure channel to eliminate the possibility of a third party intercepting the negotiation sequence and forcing the client and server to choose a lower level of security than required or desired. See Section 21 for further discussion.

2.6.1. NFSv4.1 Security Tuples

An NFS server can assign one or more "security tuples" to each security policy boundary in its namespace. Each security tuple consists of a security flavor (see Section 2.2.1.1), and if the flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of protection, and an RPCSEC_GSS service.

2.6.2. SECINFO and SECINFO_NO_NAME

The SECINFO and SECINFO_NO_NAME operations allow the client to determine, on a per filehandle basis, what security tuple is to be used for server access. In general, the client will not have to use either operation except during initial communication with the server or when the client crosses security policy boundaries at the server. It is possible that the server's policies change during the client's interaction, thereby forcing the client to negotiate a new security tuple.

Where the use of different security tuples would affect the type of access that would be allowed if a request was issued over the same connection used for the SECINFO or SECINFO_NO_NAME operation (e.g. read-only vs. read-write access), security tuples that allow greater access should be presented first.
Where the general level of access is the same and different security flavors limit the range of principals whose privileges are recognized (e.g. allowing or disallowing root access), flavors supporting the greatest range of principals should be listed first.

2.6.3. Security Error

Based on the assumption that each NFS version 4 client and server must support a minimum set of security mechanisms (i.e., LIPKEY, SPKM-3, and Kerberos-V5 all under RPCSEC_GSS), the NFS client will initiate file access to the server with one of the minimal security tuples. During communication with the server, the client may receive an NFS error of NFS4ERR_WRONGSEC. This error allows the server to notify the client that the security tuple currently being used contravenes the server's security policy. The client is then responsible for determining (see Section 2.6.3.1) what security tuples are available at the server and choosing one which is appropriate for the client.

2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME

This section explains the mechanics of NFSv4.1 security negotiation. The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH, PUTFH, and RESTOREFH.

2.6.3.1.1. Put Filehandle Operation + SAVEFH

The client is saving a filehandle for a future RESTOREFH. The server MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle operation or SAVEFH.

2.6.3.1.2. Two or More Put Filehandle Operations

For a series of N put filehandle operations, the server MUST NOT return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. The Nth put filehandle operation is handled as if it is the first in a series of operations, and the second in the series of operations is not a put filehandle operation. For example, if the server received PUTFH, PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is processed according to Section 2.6.3.1.3.

2.6.3.1.3.
Put Filehandle Operation + LOOKUP (or OPEN by Name)

This situation also applies to a put filehandle operation followed by a LOOKUP or an OPEN operation that specifies a component name.

In this situation, the client is potentially crossing a security policy boundary, and the set of security tuples the parent directory supports differs from that of the child. The server implementation may decide whether to impose any restrictions on security policy administration. There are at least three approaches (sec_policy_child is the tuple set of the child export, sec_policy_parent is that of the parent).

a) sec_policy_child <= sec_policy_parent (<= for subset). This means that the set of security tuples specified on the security policy of a child directory is always a subset of that of its parent directory.

b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, {} for the empty set). This means that the set of security tuples specified on the security policy of a child directory always has a non-empty intersection with that of the parent.

c) sec_policy_child ^ sec_policy_parent == {}. This means that the set of tuples specified on the security policy of a child directory may not intersect with that of the parent. In other words, there are no restrictions on how the system administrator may set up these tuples.

For a server to support approach (b) (when the client chooses a flavor that is not a member of sec_policy_parent) and (c), the put filehandle operation must NOT return NFS4ERR_WRONGSEC in case of security mismatch. Instead, it should be returned from the LOOKUP (or OPEN by component name) that follows.

Since the above guideline does not contradict approach (a), it should be followed in general.
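The three approaches are plain set relations, which the following sketch expresses with Python sets. It is illustrative only: the function name and the tuple values are hypothetical placeholders, with each security tuple modelled as (flavor, mechanism, quality of protection, service). Note that this document writes ^ for intersection, whereas in Python ^ is symmetric difference and intersection is written &.

```python
def strictest_approach(sec_policy_child: set, sec_policy_parent: set) -> str:
    """Name the strictest approach (a, b or c) that a child/parent pair satisfies."""
    if sec_policy_child <= sec_policy_parent:   # (a) child is a subset of parent
        return "a"
    if sec_policy_child & sec_policy_parent:    # (b) non-empty intersection
        return "b"
    return "c"                                  # (c) no restriction holds


# Hypothetical security tuples, used only for the example.
AUTH_SYS = ("AUTH_SYS", None, None, None)
KRB5I = ("RPCSEC_GSS", "krb5", 0, "integrity")
KRB5P = ("RPCSEC_GSS", "krb5", 0, "privacy")
SPKM3P = ("RPCSEC_GSS", "spkm-3", 0, "privacy")

parent = {AUTH_SYS, KRB5I, KRB5P}
print(strictest_approach({KRB5P}, parent))          # subset of the parent
print(strictest_approach({KRB5P, SPKM3P}, parent))  # overlaps the parent
print(strictest_approach({SPKM3P}, parent))         # disjoint from the parent
```

A server that admits pairs of kind (b) or (c) is the case in which the put filehandle operation cannot report the mismatch and the following LOOKUP (or OPEN by component name) has to.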
Even if approach (a) is implemented, it is possible for the security tuple used to be acceptable for the target of LOOKUP but not for the filehandle used in the put filehandle operation. The put filehandle operation could be a PUTROOTFH or PUTPUBFH, where the client cannot know the security tuples for the root or public filehandle. Or the security policy for the filehandle used by the put filehandle operation could have changed since the time the filehandle was obtained.

Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in response to the put filehandle operation if the operation is immediately followed by a LOOKUP or an OPEN by component name.

2.6.3.1.4. Put Filehandle Operation + LOOKUPP

Since SECINFO only works its way down, there is no way LOOKUPP can return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME solves this issue because, via style SECINFO_STYLE4_PARENT, it works in the opposite direction from SECINFO. As with Section 2.6.3.1.3, the put filehandle operation must not return NFS4ERR_WRONGSEC whenever it is followed by LOOKUPP. If the server does not support SECINFO_NO_NAME, the client's only recourse is to issue the put filehandle operation, LOOKUPP, GETFH sequence of operations with every security tuple it supports.

Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle operation if the operation is immediately followed by a LOOKUPP.

2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME

A security sensitive client is allowed to choose a strong security tuple when querying a server to determine a file object's permitted security tuples.
The security tuple chosen by the client does not have to be included in the tuple list of the security policy of either the parent directory indicated in the put filehandle operation, or the child file object indicated in SECINFO (or any parent directory indicated in SECINFO_NO_NAME). Of course, the server has to be configured for whatever security tuple the client selects; otherwise the request will fail at the RPC layer with an appropriate authentication error.

In theory, there is no connection between the security flavor used by SECINFO or SECINFO_NO_NAME and those supported by the security policy. But in practice, the client may start looking for strong flavors from those supported by the security policy, followed by those in the mandatory set.

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put filehandle operation whenever it is immediately followed by SECINFO or SECINFO_NO_NAME.

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC from SECINFO or SECINFO_NO_NAME.

2.6.3.1.6. Put Filehandle Operation + Nothing

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.

2.6.3.1.7. Put Filehandle Operation + Anything Else

"Anything Else" includes OPEN by filehandle.

The security policy enforcement applies to the filehandle specified in the put filehandle operation. Therefore PUTFH must return NFS4ERR_WRONGSEC in case of a security tuple mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an allowable error to every other operation.

A COMPOUND containing the series put filehandle operation + SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way for the client to recover from NFS4ERR_WRONGSEC.

The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by component name).

2.7.
Minor Versioning

To address the requirement of an NFS protocol that can evolve as the need arises, the NFS version 4 protocol contains the rules and framework to allow for future minor changes or versioning.

The base assumption with respect to minor versioning is that any future accepted minor version must follow the IETF process and be documented in a standards track RFC. Therefore, each minor version number will correspond to an RFC. Minor version zero of the NFS version 4 protocol is represented by [2], and minor version one is represented by this document [[Comment.2: change "document" to "RFC" when we publish]]. The COMPOUND and CB_COMPOUND procedures support the encoding of the minor version being requested by the client.

The following items represent the basic rules for the development of minor versions. Note that a future minor version may decide to modify or add to the following rules as part of the minor version definition.

1. Procedures are not added or deleted.

   To maintain the general RPC model, NFS version 4 minor versions will not add to or delete procedures from the NFS program.

2. Minor versions may add operations to the COMPOUND and CB_COMPOUND procedures.

   The addition of operations to the COMPOUND and CB_COMPOUND procedures does not affect the RPC model.

   * Minor versions may append attributes to GETATTR4args, bitmap4, and GETATTR4res.

     This allows for the expansion of the attribute model to allow for future growth or adaptation.

   * Minor version X must append any new attributes after the last documented attribute.

     Since attribute results are specified as an opaque array of per-attribute XDR encoded results, the complexity of adding new attributes in the midst of the current definitions would be too burdensome.

3. Minor versions must not modify the structure of an existing operation's arguments or results.
   Again, the complexity of handling multiple structure definitions for a single operation is too burdensome. New operations should be added instead of modifying existing structures for a minor version.

   This rule does not preclude the following adaptations in a minor version.

   * adding bits to flag fields such as new attributes to GETATTR's bitmap4 data type

   * adding bits to existing attributes like ACLs that have flag words

   * extending enumerated types (including NFS4ERR_*) with new values

4. Minor versions may not modify the structure of existing attributes.

5. Minor versions may not delete operations.

   This prevents the potential reuse of a particular operation "slot" in a future minor version.

6. Minor versions may not delete attributes.

7. Minor versions may not delete flag bits or enumeration values.

8. Minor versions may declare an operation as mandatory to NOT implement.

   Specifying an operation as "mandatory to not implement" is equivalent to obsoleting an operation. For the client, it means that the operation should not be sent to the server. For the server, an NFS error can be returned as opposed to "dropping" the request as an XDR decode error. This approach allows for the obsolescence of an operation while maintaining its structure so that a future minor version can reintroduce the operation.

   1. Minor versions may declare attributes mandatory to NOT implement.

   2. Minor versions may declare flag bits or enumeration values as mandatory to NOT implement.

9. Minor versions may downgrade features from mandatory to recommended, or recommended to optional.

10. Minor versions may upgrade features from optional to recommended or recommended to mandatory.

11. A client and server that support minor version X should support minor versions 0 (zero) through X-1 as well.

12. Except for infrastructural changes, no new features may be introduced as mandatory in a minor version.
    This rule allows for the introduction of new functionality and forces the use of implementation experience before designating a feature as mandatory. On the other hand, some classes of features are infrastructural and have broad effects. Allowing such features to not be mandatory complicates implementation of the minor version.

13. A client MUST NOT attempt to use a stateid, filehandle, or similar returned object from the COMPOUND procedure with minor version X for another COMPOUND procedure with minor version Y, where X != Y.

2.8. Non-RPC-based Security Services

As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for identification, authentication, integrity, and privacy. NFSv4.1 itself provides additional security services as described in the next several subsections.

2.8.1. Authorization

Authorization to access a file object via an NFSv4.1 operation is ultimately determined by the NFSv4.1 server. A client can predetermine its access to a file object via the OPEN (Section 18.16) and the ACCESS (Section 18.1) operations.

Principals with appropriate access rights can modify the authorization on a file object via the SETATTR (Section 18.30) operation. Four attributes that affect access rights are: mode, owner, owner_group, and acl. See Section 5.

2.8.2. Auditing

NFSv4.1 provides auditing on a per file object basis, via the ACL attribute as described in Section 6. It is outside the scope of this specification to specify audit log formats or management policies.

2.8.3. Intrusion Detection

NFSv4.1 provides alarm control on a per file object basis, via the ACL attribute as described in Section 6. Alarms may serve as the basis for intrusion detection. It is outside the scope of this specification to specify heuristics for detecting intrusion via alarms.

2.9. Transport Layers

2.9.1.
Required and Recommended Properties of Transports

NFSv4.1 works over RDMA and non-RDMA-based transports with the following attributes:

o The transport supports reliable delivery of data, which NFSv4.1 requires but neither NFSv4.1 nor RPC has facilities for ensuring [24].

o The transport delivers data in the order it was sent. Ordered delivery simplifies detection of transmit errors, and simplifies the sending of arbitrary sized requests and responses, via the record marking protocol [4].

Where an NFS version 4 implementation supports operation over the IP network protocol, any transport used between NFS and IP MUST be among the IETF-approved congestion control transport protocols. At the time this document was written, the only two transports that had the above attributes were TCP and SCTP. To enhance the possibilities for interoperability, an NFS version 4 implementation MUST support operation over the TCP transport protocol.

Even if NFS version 4 is used over a non-IP network protocol, it is RECOMMENDED that the transport support congestion control.

It is permissible for a connectionless transport to be used under NFSv4.1; however, reliable and in-order delivery of data by the connectionless transport is still required. NFSv4.1 assumes that a client transport address and server transport address used to send data over a transport together constitute a connection, even if the underlying transport eschews the concept of a connection.

2.9.2. Client and Server Transport Behavior

If a connection-oriented transport (e.g. TCP) is used, the client and server SHOULD use long lived connections for at least three reasons:

1. This will prevent the weakening of the transport's congestion control mechanisms via short lived connections.

2. This will improve performance for the WAN environment by eliminating the need for connection setup handshakes.

3.
The NFSv4.1 callback model differs from NFSv4.0, and requires the client and server to maintain a client-created backchannel (see Section 2.10.3.1) for the server to use.

In order to reduce congestion, if a connection-oriented transport is used, and the request is not the NULL procedure,

o A requester MUST NOT retry a request unless the connection the request was issued over was lost before the reply was received.

o A replier MUST NOT silently drop a request, even if the request is a retry. (The silent drop behavior of RPCSEC_GSS [5] does not apply because this behavior happens at the RPCSEC_GSS layer, a lower layer in the request processing.) Instead, the replier SHOULD return an appropriate error (see Section 2.10.5.1) or it MAY disconnect the connection.

When using RDMA transports there are other reasons for not tolerating retries over the same connection:

o RDMA transports use "credits" to enforce flow control, where a credit is a right granted to a peer to transmit a message. If one peer were to retransmit a request (or reply), it would consume an additional credit. If the replier retransmitted a reply, it would certainly result in an RDMA connection loss, since the requester would typically only post a single receive buffer for each request. If the requester retransmitted a request, the additional credit consumed on the server might lead to RDMA connection failure unless the client accounted for it and decreased its available credit, leading to wasted resources.

o RDMA credits present a new issue to the reply cache in NFSv4.1. The reply cache may be used when a connection within a session is lost, such as after the client reconnects. Credit information is a dynamic property of the RDMA connection, and stale values must not be replayed from the cache. This implies that the reply cache contents must not be blindly used when replies are issued from it, and credit information appropriate to the channel must be refreshed by the RPC layer.

In addition, the NFSv4.1 requester is not allowed to stop waiting for a reply, as described in Section 2.10.5.2.

2.9.3. Ports

Historically, NFS version 2 and version 3 servers have listened over TCP port 2049. The registered port 2049 [25] for the NFS protocol should be the default configuration. NFSv4.1 clients SHOULD NOT use the RPC binding protocols as described in [26].

2.10. Session

2.10.1. Motivation and Overview

Previous versions and minor versions of NFS have suffered from the following:

o Lack of support for exactly once semantics (EOS). This includes lack of support for EOS through server failure and recovery.

o Limited callback support, including no support for sending callbacks through firewalls, and races between responses from normal requests and callbacks.

o Limited trunking over multiple network paths.

o Requiring machine credentials for fully secure operation.

Through the introduction of a session, NFSv4.1 addresses the above shortfalls with practical solutions:

o EOS is enabled by a reply cache with a bounded size, making it feasible to keep the cache in persistent storage and enable EOS through server failure and recovery. One reason that previous revisions of NFS did not support EOS was that some EOS approaches often limited parallelism. As will be explained in Section 2.10.5, NFSv4.1 supports both EOS and unlimited parallelism.

o The NFSv4.1 client (defined in Section 1.5, Paragraph 1) creates transport connections and provides them to the server to use for sending callback requests, thus solving the firewall issue (Section 18.34). Races between responses from client requests and callbacks caused by the requests are detected via the session's sequencing properties, which are a consequence of EOS (Section 2.10.5.3).

o The NFSv4.1 client can add an arbitrary number of connections to the session, and thus provide trunking (Section 2.10.4).

o The NFSv4.1 client and server produce a session key independent of client and server machine credentials which can be used to compute a digest for protecting critical session management operations (Section 2.10.7.3).

o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for use by the session's backchannel that do not require the server to authenticate to a client machine principal (Section 2.10.7.2).

A session is a dynamically created, long-lived server object created by a client, used over time from one or more transport connections. Its function is to maintain the server's state relative to the connection(s) belonging to a client instance. This state is entirely independent of the connection itself, and indeed the state exists whether the connection exists or not. A client may have one or more sessions associated with it so that client-associated state may be accessed using any of the sessions associated with that client's client ID, when connections are associated with those sessions. When no connections are associated with any of the sessions associated with the client ID for an extended time, such objects as locks, opens, delegations, layouts, etc. are subject to expiration. The session serves as an object representing a means of access by a client to the associated client state on the server, independent of the physical means of access to that state.

A single client may create multiple sessions. A single session MUST NOT serve multiple clients.

2.10.2. NFSv4 Integration

Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major infrastructure change such as sessions would require a new major version number to an ONC RPC program like NFS.
However, because NFSv4 encapsulates its functionality in a single procedure, COMPOUND, and because COMPOUND can support an arbitrary number of operations, sessions have been added to NFSv4.1 with little difficulty. COMPOUND includes a minor version number field, and for NFSv4.1 this minor version is set to 1. When the NFSv4 server processes a COMPOUND with the minor version set to 1, it expects a different set of operations than it does for NFSv4.0. NFSv4.1 defines the SEQUENCE operation, which is required for every COMPOUND that operates over an established session, with the exception of some session administration operations, such as DESTROY_SESSION (Section 18.37).

2.10.2.1. SEQUENCE and CB_SEQUENCE

In NFSv4.1, when the SEQUENCE operation is present, it MUST be the first operation in the COMPOUND procedure. The primary purpose of SEQUENCE is to carry the session identifier. The session identifier associates all other operations in the COMPOUND procedure with a particular session. SEQUENCE also contains required information for maintaining EOS (see Section 2.10.5). Session-enabled NFSv4.1 COMPOUND requests thus have the form:

   +-----+--------------+-----------+------------+-----------+----
   | tag | minorversion | numops    |SEQUENCE op | op + args | ...
   |     |   (== 1)     | (limited) |  + args    |           |
   +-----+--------------+-----------+------------+-----------+----

and the reply's structure is:

   +------------+-----+--------+-------------------------------+--//
   |last status | tag | numres |status + SEQUENCE op + results |  //
   +------------+-----+--------+-------------------------------+--//
           //-----------------------+----
           // status + op + results | ...
           //-----------------------+----

A CB_COMPOUND procedure request and reply have a similar form to COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE operation.
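The request form described in this section can be modelled in a few lines. The sketch below is illustrative only and does not reproduce the XDR encoding: the Compound class, the helper names, and the "sessionid" argument key are hypothetical, and the other SEQUENCE arguments needed for EOS are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Op = Tuple[str, Dict[str, Any]]  # (operation name, arguments)


@dataclass
class Compound:
    tag: str
    minorversion: int
    ops: List[Op] = field(default_factory=list)

    @property
    def numops(self) -> int:
        return len(self.ops)


def session_compound(tag: str, sessionid: bytes, ops: List[Op]) -> Compound:
    """Build a session-enabled request: minorversion 1, SEQUENCE first."""
    sequence: Op = ("SEQUENCE", {"sessionid": sessionid})
    return Compound(tag=tag, minorversion=1, ops=[sequence, *ops])


def sequence_placement_ok(request: Compound) -> bool:
    """SEQUENCE may be absent; when present it must be the first operation only."""
    positions = [i for i, (name, _) in enumerate(request.ops) if name == "SEQUENCE"]
    return positions in ([], [0])


request = session_compound("example", b"session-A", [("PUTROOTFH", {}), ("GETFH", {})])
print(request.numops, [name for name, _ in request.ops])
print(sequence_placement_ok(request))
```

A request that carries no SEQUENCE at all, such as one holding only a session administration operation, also passes the placement check.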
CB_COMPOUND also has an additional field called "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored by the client. CB_SEQUENCE has the same information as SEQUENCE, and also includes other information needed to resolve callback races (Section 2.10.5.3).

2.10.2.2. Client ID and Session Association

Each client ID (Section 2.4) can have zero or more active sessions. A client ID and a session associated with it are required to perform file access in NFSv4.1. Each time a session is used (whether by a client sending a request to the server, or the client replying to a callback request from the server), the state leased to its associated client ID is automatically renewed.

State such as share reservations, locks, delegations, and layouts (Section 1.4.4) is tied to the client ID. Client state is not tied to the sessions of the client ID. Successive state changing operations from a given state owner MAY go over different sessions, provided each session is associated with the same client ID. A callback MAY arrive over a different session than the session that originally acquired the state pertaining to the callback. For example, if session A is used to acquire a delegation, a request to recall the delegation MAY arrive over session B if both sessions are associated with the same client ID. Section 2.10.7.1 and Section 2.10.7.2 discuss the security considerations around callbacks.

2.10.3. Channels

A channel is not a connection. A channel represents the direction in which ONC RPC requests are sent.

Each session has one or two channels: the fore channel and the backchannel. Because there are at most two channels per session, and because each channel has a distinct purpose, channels are not assigned identifiers.

The fore channel is used for ordinary requests from the client to the server, and carries COMPOUND requests and responses.
A session always has a fore channel.

The backchannel is used for callback requests from server to client, and carries CB_COMPOUND requests and responses. Whether there is a backchannel or not is a decision by the client; however, many features of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support backchannels.

Each session has resources for each channel, including separate reply caches (see Section 2.10.5.1). Note that even the backchannel requires a reply cache because some callback operations are nonidempotent.

2.10.3.1. Association of Connections, Channels, and Sessions

Each channel is associated with zero or more transport connections. A connection can be associated with one channel or both channels of a session; the client and server negotiate whether a connection will carry traffic for one channel or both channels via the CREATE_SESSION (Section 18.36) and the BIND_CONN_TO_SESSION (Section 18.34) operations. When a session is created via CREATE_SESSION, the connection that transported the CREATE_SESSION request is automatically associated with the fore channel, and optionally the backchannel. If the client specifies no state protection (Section 18.35) when the session is created, then when SEQUENCE is transmitted on a different connection, the connection is automatically associated with the fore channel of the session specified in the SEQUENCE operation.

A connection's association with a session is not exclusive. A connection associated with the channel(s) of one session may be simultaneously associated with the channel(s) of other sessions including sessions associated with other client IDs.

It is permissible for connections of multiple transport types to be associated with the same channel. For example, both a TCP and an RDMA connection can be associated with the fore channel.
In the event an RDMA and non-RDMA connection are associated with the same channel, the maximum number of slots SHOULD be at least one more than the total number of credits (Section 2.10.5.1). This way, if all RDMA credits are used, the non-RDMA connection can have at least one outstanding request. If a server supports multiple transport types, it MUST allow a client to associate connections from each transport to a channel.

It is permissible for a connection of one type of transport to be associated with the fore channel, and a connection of a different type to be associated with the backchannel.

2.10.4. Trunking

Trunking is the use of multiple connections between a client and server in order to increase the speed of data transfer. NFSv4.1 supports two types of trunking: session trunking and client ID trunking. NFSv4.1 servers MUST support trunking.

Session trunking is essentially the association of multiple connections, each with a potentially different target network address, to the same session.

Client ID trunking is the association of multiple sessions to the same client ID, major server owner ID (Section 2.5), and server scope (Section 11.6.7). When two servers return the same major server owner and server scope, it means the two servers are cooperating on locking state management, which is a prerequisite for client ID trunking.

Understanding and distinguishing session and client ID trunking requires understanding how the results of the EXCHANGE_ID (Section 18.35) operation identify a server. Suppose a client issues EXCHANGE_ID over two different connections, each with a possibly different target network address, but each EXCHANGE_ID with the same value in the eia_clientowner field. If the same NFSv4.1 server is listening over each connection, then each EXCHANGE_ID result MUST return the same values of eir_clientid, eir_server_owner.so_major_id and eir_server_scope.
The client can then treat each connection as referring to the same server (subject to verification, see Paragraph 5 later in this section), and it can use each connection to trunk requests and replies. The question is whether session trunking and/or client ID trunking applies.

Session Trunking  If the eia_clientowner argument is the same in two different EXCHANGE_ID requests, and the eir_clientid, eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_scope results match in both EXCHANGE_ID results, then the client is permitted to perform session trunking. If the client has no session mapping to the tuple of eir_clientid, eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_minor_id, then it creates the session via a CREATE_SESSION operation over one of the connections, which associates the connection to the session. If there is a session for the tuple, the client can issue BIND_CONN_TO_SESSION to associate the connection to the session. The client can invoke CREATE_SESSION regardless of whether there is a session for the tuple. The second connection is associated with the same session as the first connection via the BIND_CONN_TO_SESSION operation.

Client ID Trunking  If the eia_clientowner argument is the same in two different EXCHANGE_ID requests, and the eir_clientid, eir_server_owner.so_major_id, and eir_server_scope results match in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id results do not match, then the client is permitted to perform client ID trunking. The client can associate each connection with different sessions, where each session is associated with the same server. Of course, even if the eir_server_owner.so_minor_id fields do match, the client is free to employ client ID trunking instead of session trunking.
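The two definitions above reduce to a comparison of EXCHANGE_ID arguments and results, sketched below. This is only an illustration: the ExchangeIdResult class and the trunking_options function are hypothetical names, the field values are made up, and the result says only what the server's claims would permit. Those claims remain subject to the verification this section discusses.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class ExchangeIdResult:
    eir_clientid: int
    so_major_id: bytes       # eir_server_owner.so_major_id
    so_minor_id: int         # eir_server_owner.so_minor_id
    eir_server_scope: bytes


def trunking_options(owner_a: bytes, res_a: ExchangeIdResult,
                     owner_b: bytes, res_b: ExchangeIdResult) -> Set[str]:
    """Return the kinds of trunking two EXCHANGE_ID exchanges would permit."""
    if owner_a != owner_b:   # eia_clientowner must be the same in both requests
        return set()
    same_server = (
        res_a.eir_clientid == res_b.eir_clientid
        and res_a.so_major_id == res_b.so_major_id
        and res_a.eir_server_scope == res_b.eir_server_scope
    )
    if not same_server:
        return set()
    options = {"client_id"}  # client ID trunking is available either way
    if res_a.so_minor_id == res_b.so_minor_id:
        options.add("session")
    return options


first = ExchangeIdResult(0x1234, b"major-1", 7, b"scope-1")
same_minor = ExchangeIdResult(0x1234, b"major-1", 7, b"scope-1")
other_minor = ExchangeIdResult(0x1234, b"major-1", 8, b"scope-1")
print(sorted(trunking_options(b"owner", first, b"owner", same_minor)))
print(sorted(trunking_options(b"owner", first, b"owner", other_minor)))
```

Returning a set rather than a single label reflects the point that a client may choose client ID trunking even when the minor IDs match.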
The client completes the act of client ID trunking by invoking CREATE_SESSION on each connection, using the same client ID that was returned in eir_clientid. These invocations create two sessions and also associate each connection with its respective session.

When doing client ID trunking, locking state is shared across sessions associated with the same client ID. This requires the server to coordinate state across sessions.

When two servers over two connections claim matching or partially matching eir_server_owner, eir_server_scope, and eir_clientid values, the client does not have to trust the servers' claims. The client may verify these claims before trunking traffic in the following ways:

o For session trunking, clients SHOULD reliably verify whether connections between different network paths are in fact associated with the same NFSv4.1 server and usable on the same session, and servers MUST allow clients to perform reliable verification. When a client ID is created, the client SHOULD specify that BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or SP4_MACH_CRED (Section 18.35) state protection options. For SP4_SSV, reliable verification depends on a shared secret (the SSV) that is established via the SET_SSV (Section 18.47) operation. When a new connection is associated with the session (via the BIND_CONN_TO_SESSION operation, see Section 18.34), if the client specified SP4_SSV state protection for the BIND_CONN_TO_SESSION operation, the client MUST issue the BIND_CONN_TO_SESSION with RPCSEC_GSS protection, using integrity or privacy, and an RPCSEC_GSS handle using the GSS SSV mechanism (Section 2.10.7.4). The RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36).
If the client mistakenly tries to associate a connection to a session of a wrong server, the server will either reject the attempt because it is not aware of the session identifier of the BIND_CONN_TO_SESSION arguments, or it will reject the attempt because the RPCSEC_GSS authentication fails. Even if the server mistakenly or maliciously accepts the connection association attempt, the RPCSEC_GSS verifier it computes in the response will not be verified by the client, so the client will know it cannot use the connection for trunking the specified session.

If the client specified SP4_MACH_CRED state protection, the BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or privacy, using the same credential that was used when the client ID was created. Mutual authentication via RPCSEC_GSS assures the client that the connection is associated with the correct session of the correct server.

o For client ID trunking, the client has at least two options for verifying that the same client ID obtained from two different EXCHANGE_ID operations came from the same server.

The first option is to use RPCSEC_GSS authentication when issuing each EXCHANGE_ID. Each time an EXCHANGE_ID is issued with RPCSEC_GSS authentication, the client notes the principal name of the GSS target. If the EXCHANGE_ID results indicate client ID trunking is possible, and the GSS targets' principal names are the same, the servers are the same and client ID trunking is allowed.

The second option for verification is to use SP4_SSV protection. When the client issues EXCHANGE_ID, it specifies SP4_SSV protection. The first EXCHANGE_ID the client issues always has to be confirmed by a CREATE_SESSION call. The client then issues SET_SSV on the sessions. Later, the client issues EXCHANGE_ID to a destination network address different from the one to which the first EXCHANGE_ID was issued.
The client checks that each EXCHANGE_ID reply has the same eir_clientid, eir_server_owner.so_major_id, and eir_server_scope. If so, the client verifies the claim by issuing a CREATE_SESSION to the second destination address, protected with RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If the server accepts the CREATE_SESSION request, and if the client verifies the RPCSEC_GSS verifier and integrity codes, then the client has proof the second server knows the SSV, and thus the two servers are the same for the purposes of client ID trunking.

2.10.5. Exactly Once Semantics

Via the session, NFSv4.1 offers exactly once semantics (EOS) for requests sent over a channel. EOS is supported on both the fore and back channels.

Each COMPOUND or CB_COMPOUND request that is issued with a leading SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver exactly once. This requirement holds regardless of whether the request is issued with reply caching specified (see Section 2.10.5.1.2). The requirement holds even if the requester is issuing the request over a session created between a pNFS data client and pNFS data server. The rationale for this requirement is understood by categorizing requests into three classifications:

o Nonidempotent requests.

o Idempotent modifying requests.

o Idempotent non-modifying requests.

An example of a nonidempotent request is RENAME. It is obvious that if a replier executes the same RENAME request twice, and the first execution succeeds, the re-execution will fail. If the replier returns the result from the re-execution, this result is incorrect. Therefore, EOS is required for nonidempotent requests.

An example of an idempotent modifying request is a COMPOUND request containing a WRITE operation. Repeated execution of the same WRITE has the same effect as execution of that WRITE once. Nevertheless, enforcing EOS for WRITEs and other idempotent modifying requests is necessary to avoid data corruption.
Suppose a client issues WRITEs A and B to a noncompliant server that does not enforce EOS, and receives no response, perhaps due to a network partition. The client reconnects to the server and re-issues both WRITEs. Now the server has two outstanding instances of each of A and B. The server can be in a situation in which it executes and replies to the retries of A and B, while the first A and B are still waiting in the server's I/O system for some resource. Upon receiving the replies to the second attempts of WRITEs A and B, the client believes its writes are done, so it is free to issue WRITE D, which overlaps the range of one or both of A and B. If A or B is subsequently executed for the second time, then what has been written by D can be overwritten and thus corrupted.

An example of an idempotent non-modifying request is a COMPOUND containing SEQUENCE, PUTFH, READLINK, and nothing else. The re-execution of such a request will not cause data corruption or produce an incorrect result. Nonetheless, to keep the implementation simple, the replier MUST enforce EOS for all requests, whether idempotent and non-modifying or not.

Note that true and complete EOS is not possible unless the server persists the reply cache in stable storage or is somehow implemented to never require a restart (indeed, if such a server exists, the distinction between a reply cache kept in stable storage versus one that is not is one without meaning). See Section 2.10.5.5 for a discussion of persistence in the reply cache. Regardless, even if the server does not persist the reply cache, EOS improves robustness and correctness over previous versions of NFS because the legacy duplicate request/reply caches were based on the ONC RPC transaction identifier (XID).
Section 2.10.5.1 explains the shortcomings of the XID as a basis for a reply cache and describes how NFSv4.1 sessions improve upon the XID.

2.10.5.1. Slot Identifiers and Reply Cache

The RPC layer provides a transaction ID (XID), which, while required to be unique, is not convenient for tracking requests for two reasons. First, the XID is only meaningful to the requester; it cannot be interpreted by the replier except to test for equality with previously issued requests. When consulting an RPC-based duplicate request cache, the opaqueness of the XID requires a computationally expensive lookup (often via a hash that includes XID and source address). NFSv4.1 requests use a non-opaque slot id, which is an index into a slot table and is therefore far more efficient to look up. Second, because RPC requests can be executed by the replier in any order, there is no bound on the number of requests that may be outstanding at any time. To achieve perfect EOS using ONC RPC would require storing all replies in the reply cache. XIDs are 32 bits; storing over four billion (2^32) replies in the reply cache is not practical.

In practice, previous versions of NFS have chosen to store a fixed number of replies in the cache, and use a least recently used (LRU) approach to replacing cache entries with new entries when the cache is full. In NFSv4.1, the number of outstanding requests is bounded by the size of the slot table, and a sequence id per slot is used to tell the replier when it is safe to delete a cached reply.

In the NFSv4.1 reply cache, when the requester issues a new request, it selects a slot id in the range 0..N, where N is the replier's current maximum slot id granted to the requester on the session over which the request is to be issued.
The value of N starts out as equal to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the response to SEQUENCE or CB_SEQUENCE as described later in this section. The slot id must be unused by any of the requests which the requester has already active on the session. "Unused" here means the requester has no outstanding request for that slot id.

A slot contains a sequence id and the cached reply corresponding to the request sent with that sequence id. The sequence id is a 32 bit unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - 1). The first time a slot is used, the requester must specify a sequence id of one (1) (Section 18.36). Each time a slot is re-used, the request MUST specify a sequence id that is one greater than that of the previous request on the slot. If the previous sequence id was 0xFFFFFFFF, then the next request for the slot MUST have the sequence id set to zero (i.e. (2^32 - 1) + 1 mod 2^32).

The sequence id accompanies the slot id in each request. It is used for the critical check at the server: to determine efficiently whether a request using a certain slot id is a retransmit or a new, never-before-seen request. It is not feasible for the client to assert that it is retransmitting to implement this, because for any given request the client cannot know whether the server has seen it unless the server actually replies. Of course, if the client has seen the server's reply, the client would not retransmit.

The replier compares each received request's sequence id with the last one previously received for that slot id, to see if the new request is:

o A new request, in which the sequence id is one greater than that previously seen in the slot (accounting for sequence wraparound). The replier proceeds to execute the new request, and the replier MUST increase the slot's sequence id by one.

o A retransmitted request, in which the sequence id is equal to that currently recorded in the slot.
If the original request has executed to completion, the replier returns the cached reply. See Section 2.10.5.2 for direction on how the replier deals with retries of requests that are still in progress.

o A misordered retry, in which the sequence id is less than (accounting for sequence wraparound) that previously seen in the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE).

o A misordered new request, in which the sequence id is two or more greater than (accounting for sequence wraparound) that previously seen in the slot. Note that because the sequence id must wrap around to zero (0) once it reaches 0xFFFFFFFF, a misordered new request and a misordered retry cannot be distinguished. Thus, the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE).

Unlike the XID, the slot id is always within a specific range; this has two implications. The first implication is that for a given session, the replier need only cache the results of a limited number of COMPOUND requests. The second implication derives from the first, which is that, unlike XID-indexed reply caches (also known as duplicate request caches - DRCs), the slot id-based reply cache cannot be overflowed. Through use of the sequence id to identify retransmitted requests, the replier does not need to actually cache the request itself, reducing the storage requirements of the reply cache further. These facilities make it practical to maintain all the required entries for an effective reply cache.

The slot id and sequence id therefore take over the traditional role of the XID and source network address in the replier's reply cache implementation. This approach is considerably more portable and completely robust - it is not subject to the reassignment of ports as clients reconnect over IP networks.
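The sequence id comparison performed by the replier can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: classify_request and SEQ_MOD are hypothetical names, the returned strings are labels chosen for this sketch, and retries of requests still in progress (Section 2.10.5.2) are not modeled. Because of wraparound, every sequence id that is neither equal to the slot's value nor one greater than it falls into the single misordered case.

```python
SEQ_MOD = 2 ** 32  # sequence ids are 32 bit unsigned values


def classify_request(slot_seq: int, request_seq: int) -> str:
    """Classify a request's sequence id against the value recorded in the slot.

    slot_seq is the sequence id last recorded for the slot; request_seq is
    the sequence id carried in the incoming SEQUENCE or CB_SEQUENCE.
    """
    if request_seq == (slot_seq + 1) % SEQ_MOD:
        # One greater than previously seen, accounting for wraparound.
        return "new"
    if request_seq == slot_seq:
        # Same sequence id as recorded: the replier returns the cached reply.
        return "retransmit"
    # A misordered retry and a misordered new request cannot be told apart;
    # both yield NFS4ERR_SEQ_MISORDERED.
    return "misordered"
```

For example, a slot whose recorded sequence id is 0xFFFFFFFF treats a request with sequence id zero as new, while a request with sequence id two on the same slot is misordered.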
In addition, the RPC XID is not used in the reply cache, enhancing robustness of the cache in the face of any rapid reuse of XIDs by the requester. While the replier does not care about the XID for the purposes of reply cache management (but the replier MUST return the same XID that was in the request), nonetheless there are considerations for the XID in NFSv4.1 that are the same as in all previous versions of NFS. The RPC XID remains in each message and must be formulated in NFSv4.1 requests as in any other ONC RPC request. The reasons include:

o The RPC layer retains its existing semantics and implementation.

o The requester and replier must be able to interoperate at the RPC layer, prior to the NFSv4.1 decoding of the SEQUENCE or CB_SEQUENCE operation.

o If an operation is being used that does not start with SEQUENCE or CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is needed for correct operation to match the reply to the request.

o The SEQUENCE or CB_SEQUENCE operation may generate an error. If so, the embedded slot id, sequence id, and sessionid (if present) in the request will not be in the reply, and the requester has only the XID to match the reply to the request.

Given that well formulated XIDs continue to be required, this raises the question of why SEQUENCE and CB_SEQUENCE replies have a sessionid, slot id, and sequence id. Having the sessionid in the reply means the requester does not have to use the XID to look up the sessionid, which would be necessary if the connection were associated with multiple sessions. Having the slot id and sequence id in the reply means the requester does not have to use the XID to look up the slot id and sequence id.
Furthermore, since the XID is only 32 bits, it is too small to guarantee the re-association of a reply with its request ([27]); having sessionid, slot id, and sequence id in the reply allows the client to validate that the reply in fact belongs to the matched request.

The SEQUENCE (and CB_SEQUENCE) operation also carries a "highest_slotid" value which carries additional requester slot usage information. The requester must always provide a slot id representing the outstanding request with the highest-numbered slot value. The requester should in all cases provide the most conservative value possible, although it can be increased somewhat above the actual instantaneous usage to maintain some minimum or optimal level. This provides a way for the requester to yield unused request slots back to the replier, which in turn can use the information to reallocate resources.

The replier responds with both a new target highest_slotid and an enforced highest_slotid, described as follows:

o The target highest_slotid is an indication to the requester of the highest_slotid the replier wishes the requester to be using. This permits the replier to withdraw (or add) resources from a requester that has been found to not be using them, in order to more fairly share resources among a varying level of demand from other requesters. The requester must always comply with the replier's value updates, since they indicate newly established hard limits on the requester's access to session resources. However, because of request pipelining, the requester may have active requests in flight reflecting prior values, and therefore the replier must not immediately require the requester to comply.

o The enforced highest_slotid indicates the highest slot id the requester is permitted to use on a subsequent SEQUENCE or CB_SEQUENCE operation.
The replier's enforced highest_slotid SHOULD be no less than the highest_slotid the requester indicated in the SEQUENCE or CB_SEQUENCE arguments. If a replier detects the client is being intransigent, i.e. it fails in a series of requests to honor the target highest_slotid even though the replier knows there are no outstanding requests at higher slot ids, it MAY take more forceful action. When faced with intransigence, the replier MAY reply with a new enforced highest_slotid that is less than its previous enforced highest_slotid. Thereafter, if the requester continues to send requests with a highest_slotid that is greater than the replier's new enforced highest_slotid, the replier MAY return NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater than the new enforced highest_slotid, and the request is a retry. The replier SHOULD keep slots it wants to retire around until the requester sends a request with a highest_slotid less than or equal to the replier's new enforced highest_slotid. Also, a request with a slot that is higher than the new enforced highest_slotid can be retired if the requester specifies a sequence id that is not equal to what is in the slot's reply cache. In other words, once the replier has forcibly lowered the enforced highest_slotid, the requester is only allowed to send retries to the to-be-retired slots.

o The requester SHOULD use the lowest available slot when issuing a new request. This way, the replier may be able to retire slot entries faster. However, where the replier is actively adjusting its granted highest_slotid, it will not be able to use only the receipt of the slot id and highest_slotid in the request. Neither the slot id nor the highest_slotid used in a request may reflect the replier's current idea of the requester's session limit, because the request may have been sent from the requester before the update was received.
Therefore, in the downward adjustment case, the replier may have to retain a number of reply cache entries at least as large as the old value of maximum requests outstanding, until operation sequencing rules allow it to infer that the requester has seen its rep