Cascaded FICON in a Brocade Environment
MAINFRAME Technical Brief

Cascaded FICON introduces the open systems SAN concept of Inter-Switch Links (ISLs). IBM now supports the flow of traffic from the processor through two FICON directors connected via an ISL and on to peripheral devices such as disk and tape. This paper discusses the benefits and some technical aspects of cascaded FICON in a Brocade environment.

CONTENTS

Introduction
The Evolution from ESCON to FICON Cascading
    What is Cascaded FICON?
    High Availability (HA), Disaster Recovery (DR), and Business Continuity (BC)
Benefits of FICON Cascading
    Optimizing Use of Storage Resources
    Cascaded FICON Performance
Buffer-to-Buffer Credit Management
    About BB Credits
        Packet Flow and Credits
        Buffer-to-Buffer Flow Control
    Implications of Asset Deployment
    Configuring BB Credit Allocations on FICON Directors
    BB Credit Exhaustion and Frame Pacing Delay
        What is the difference between frame pacing and frame latency?
        What can you do to eliminate or circumvent frame pacing delay?
        How can you make improvements?
        Dynamic Allocation of BB Credits
Technical Discussion of FICON Cascading
    Fabric Addressing Support
    High Integrity Enterprise Fabrics
    Managing Cascaded FICON Environments and ISLs: Link Balancing and Aggregation
Best Practices for FICON Cascaded Link Management
    Terms and Definitions
    Frame-level Trunking Implementation
    Brocade M-Series Director Open Trunking
        Use of Data Rate Statistics by Open Trunking
        Rerouting Decision Making
        Checks on the Cost Function
        Periodic Rerouting
        Algorithms to Gather Data
        Summary of Open Trunking Parameters
        Fabric Tuning Using Open Trunking
        Open Trunking Enhancements
        Open Trunking Summary
    Controlling FICON Cascaded Links in More Demanding Environments
        Preferred Path on M-Series FICON Switches
        Prohibit Paths
    Traffic Isolation Zones on B-Series FICON Switches
        TI Zones Best Practices
Summary
Appendix: Fibre Channel Class 4 Class of Service (CoS)

INTRODUCTION

Prior to the introduction of support for cascaded FICON director connectivity on IBM zSeries mainframes in January 2003, only a single level of FICON directors was supported for connectivity between a processor and peripheral devices. Cascaded FICON introduced the open systems Storage Area Network (SAN) concept of Inter-Switch Links (ISLs). IBM now supports the flow of traffic from the processor through two FICON directors connected via an ISL to peripheral devices such as disk and tape.

This paper starts with a brief discussion of cascaded FICON, its applications, and the benefits of a cascaded FICON architecture. The next section provides a technical discussion of buffer-to-buffer credits (BB credits), open exchanges, and performance. The final section describes management of a cascaded FICON architecture, including ISL trunking and the Traffic Isolation capabilities unique to Brocade®.

FICON, like most technological advancements, evolved from the limitations of its predecessor, the IBM Enterprise System Connection (ESCON) protocol, a successful storage network protocol for mainframe systems that is considered the parent of the modern SAN. IBM Fibre Connection (FICON) was initially developed to address the limitations of the ESCON protocol. In particular, FICON addresses ESCON's addressing, bandwidth, and distance limitations.
FICON has evolved rapidly since the initial FICON bridge mode (FCV) implementations came to the data center: from FCV to single-director FICON Native (FC) implementations, to configurations that intermix FICON (FC) and open systems Fibre Channel Protocol (FCP) traffic, and now to cascaded fabrics of FICON directors. FICON support of cascaded directors has been available on the IBM zSeries since 2003 and is supported on System z processors as well.

Cascaded FICON allows a FICON Native (FC) channel or a FICON CTC channel to connect a zSeries/System z server to another similar server or to a peripheral device such as disk, tape library, or printer via two Brocade FICON directors or switches. A FICON channel in FICON Native mode connects one or more processor images to an FC link, which connects to the first FICON director, then dynamically through the first director to one or more ports, and from there to a second, cascaded FICON director. From the second director there are Fibre Channel links to FICON Control Unit (CU) ports on attached devices. These FICON directors can be geographically separate, providing greater flexibility and fiber cost savings. All FICON directors connected together in a cascaded FICON architecture must be from the same vendor (such as Brocade). Initial support by IBM is limited to a single hop between cascaded FICON directors; however, the directors can be configured in a hub-star architecture with up to 24 directors in the fabric.

NOTE: In this paper the term "switch" is used to reference a Brocade hardware platform (switch, director, or backbone) unless otherwise indicated.

Cascaded FICON allows Brocade customers tremendous flexibility and the potential for fabric cost savings in their FICON architectures. It is extremely important for business continuity/disaster recovery implementations. Customers looking at these types of implementations can realize significant potential savings in their fiber infrastructure costs and channel adapters by reducing the number of channels needed to connect two geographically separate sites with high availability FICON connectivity at increased distances.

Brocade (via the acquisitions of CNT/Inrange and McDATA) has a long and distinguished history of working closely with IBM in the mainframe environment. This history includes manufacture of IBM's 9032 line of ESCON directors, the CD/9000 ESCON directors, the FICON bridge cards, and the first FICON Native (FC) directors with the McDATA ED-5000 and Inrange FC/9000. Brocade's second generation of FICON directors, the legacy McDATA Intrepid 6064, Brocade M6140, Brocade 24000, and Brocade Mi10K, are the foundation of many FICON storage networks. Brocade continues to lead the way in cascaded FICON with the Brocade 48000 Director and the Brocade DCX Backbone.

THE EVOLUTION FROM ESCON TO FICON CASCADING

In 1990 the ESCON channel architecture was introduced as the way to address the limitations of parallel (bus and tag) architectures. ESCON provided noticeable, measurable improvements in distance capabilities, switching topologies and, most importantly, response time and service time performance. By the end of the 1990s, ESCON's strengths over parallel channels had become its weaknesses.
FICON evolved in the late 1990s to address the technical limitations of ESCON in bandwidth, distance, and channel/device addressing with the following features:

• Increased number of concurrent connections
• Increased distance
• Increased channel device addressing support
• Increased link bandwidth
• Increased distance before the data droop effect occurs
• Greater exploitation of priority I/O queuing

Initially, the FICON (FC-SB-2) architecture did not allow the connection of multiple FICON directors. (Neither did ESCON, except when static connections of "chained" ESCON directors were used to extend ESCON distances.) Both ESCON and FICON defined a single byte for the link address, the link address being the port attached to "this" director. This changed in January 2003. Now it is possible to have two-director configurations spanning separate geographic sites. This is done by adding the domain field of the Fibre Channel destination ID to the link address, specifying both the exiting director and the link address on that director.

What is Cascaded FICON?

Cascaded FICON refers to an implementation of FICON in which one or more FICON channel paths are defined over two FICON switches connected to each other using an Inter-Switch Link (ISL). The processor interface is connected to one switch, while the storage interface is connected to the other. This configuration is supported for both disk and tape, with multiple processors, disk subsystems, and tape subsystems sharing the ISLs between the directors. Multiple ISLs between the directors are also supported. Cascading between a director and a switch, for example from a Brocade 48000 Director to a Brocade 5000, is also supported.

There are hardware and software requirements specific to cascaded FICON:

• The FICON directors themselves must be from the same vendor (that is, both should be from Brocade).
• The mainframes must be zSeries machines or System z processors: z800, 890, 900, 990, z9 BC, or z9 EC. Cascaded FICON requires 64-bit architecture to support the 2-byte addressing scheme. Cascaded FICON is not supported on 9672 G5/G6 mainframes.
• z/OS version 1.4 or greater, and/or z/OS version 1.3 with the required PTFs/MCLs to support 2-byte link addressing (DRV3g and MCL (J11206) or later).
• The high integrity fabric feature for the FICON switch must be installed on all switches involved in the cascaded architecture. For Brocade M-Series directors or switches, this is known as SANtegrity Binding, and it requires M-EOS firmware version 4.0 or later. For the Brocade 5000 Switch and the 24000 and 48000 Directors, this requires Secure Fabric OS® (SFOS).

High Availability (HA), Disaster Recovery (DR), and Business Continuity (BC)

The greater bandwidth and distance capabilities of FICON over ESCON are starting to make it an essential and cost-effective component in HA/DR/BC solutions, the primary reason mainframe installations are adopting cascaded FICON architectures. Since September 11, 2001, more and more companies are bringing DR/BC in-house ("insourcing"), and they are building the mainframe component of their new DR/BC data centers using FICON rather than ESCON. Until IBM released cascaded FICON, the FICON architecture was limited to a single domain due to the single-byte addressing limitations inherited from ESCON. FICON cascading allows the end user to have a greater maximum distance between sites (up to an unrepeated distance of 36 km at 2 Gbit/sec bandwidth). For details, see Tables 1 and 2.
Following September 11, 2001, industry participants met with government agencies, including the United States Securities and Exchange Commission (SEC), the Federal Reserve, the New York State Banking Department, and the Office of the Comptroller of the Currency. These meetings were held specifically to formulate and analyze the lessons learned from the events of September 11, 2001. These agencies released an interagency white paper, and the SEC released its own paper, on best practices to strengthen the IT resilience of the US financial system. These events underlined how critical it is for an enterprise to be prepared for disaster, even more so for large enterprise mainframe customers. Disaster recovery is no longer limited to problems such as fires or a small flood. Companies now need to consider and plan for the possibility of the destruction of their entire data center and the people that work in it. A great many articles, books, and other publications have discussed the IT lessons learned from September 11, 2001:

• To manage business continuity, it is critical to maintain geographical separation of facilities and resources. Any resource that cannot be replaced from external sources within the Recovery Time Objective (RTO) should be available within the enterprise. It is also preferable to have these resources (buildings, hardware, software, data, and staff) in multiple locations. Cascaded FICON gives the geographical separation required; ESCON does not.

• The most successful DR/BC implementations are often based on as much automation as possible, since key staff and skills may no longer be present after a disaster strikes.

• Financial, government, military, and other enterprises now have critical RTOs measured in seconds and minutes, not days and hours. For these end users it has become increasingly necessary to implement an insourced DR solution. This means that the facilities and equipment needed for the HA/DR/BC solution are owned by the enterprise itself. In addition, cascaded FICON allows for considerable cost savings compared with ESCON.

• A regional disaster could cause multiple organizations to declare disasters and initiate recovery actions simultaneously. This is highly likely to severely stress the capacity of outsourced business recovery services in the vicinity of the regional disaster. Business continuity service companies typically work on a "first come, first served" basis, so when a regional disaster occurs, these outsourcing facilities can fill up quickly and be overwhelmed. Also, a company's contract with the BC/DR outsourcer may stipulate that the customer has the use of the facility only for a limited time (for example, 45 days). This may spur companies with BC/DR outsourcing contracts to a) consider changing outsourcing firms, b) re-negotiate an existing contract, or c) study the requirements and feasibility of insourcing their BC/DR and creating their own DR site. Depending on an organization's RTO and Recovery Point Objective (RPO), option c) may be the best alternative.

• The recovery site must have adequate hardware, and the hardware at the recovery site must be compatible with the hardware at the primary site. Organizations must plan for their recovery site to have a) sufficient server processing capacity, b) sufficient storage capacity, and c) sufficient networking and storage networking capacity to enable all business critical applications to be run from the recovery site.
The installed server capacity at the recovery site may be used to meet day-to-day needs (assuming BC/DR is insourced). Fallback capacity may be provided via several means, including workload prioritization (test, development, production, and data warehouse). Fallback capacity may also be provided via a capacity upgrade scheme based on changing a license agreement rather than installing additional capacity. IBM System z and zSeries servers have the Capacity Backup Option (CBU). Unfortunately, in the open systems world this feature is not common. Many organizations will take a calculated risk with open systems and not purchase two duplicate servers (one for production at the primary data center and a second for the DR data center). Therefore, open systems DR planning must account for this possibility and pose the question, "What can I lose?"

• A robust BC/DR solution must be based on as much automation as possible. It is too risky to assume that key personnel with critical skills will be available to restore IT services. Regional disasters impact personal lives as well. Personal crises and the need to take care of families, friends, and loved ones will take priority for IT workers. Also, key personnel may not be able to travel and will be unable to get to the recovery site. Mainframe installations are increasingly looking to automate switching resources from one site to another. One way to do this in a mainframe environment is with a cascaded FICON Geographically Dispersed Parallel Sysplex (GDPS).

• If an organization is to maintain business continuity, it is critical to maintain sufficient geographical separation of facilities, resources, and personnel. If a resource cannot be replaced from external sources within the RTO, it needs to be available internally and in multiple locations. This holds true for hardware resources, employees, data, and even buildings. An organization also needs to have a secondary disaster recovery plan. Companies that successfully recover to their designated secondary site after losing their entire primary data center quickly come to the realization that all of their data is now in one location. If disaster events continue, or if there is not sufficient geographic separation and the recovery site is also incapacitated, there is no further recourse (no secondary plan) for most organizations. What about the companies that initially recover at a third-party site, with contractual agreements calling for them to vacate the facility within a specified time period? What happens when you do not have a primary site to go back to? The prospect of further regional disasters necessitates asking the question, "What is our secondary disaster recovery plan?" This has led many companies to seriously consider implementing a three-site BC/DR strategy: two sites within the same geographic vicinity to facilitate high availability and a third, remote site for disaster recovery. The major objection to a three-site strategy is telecommunication costs, but as with any major decision, a proper risk vs. cost analysis should be performed.

• Asynchronous remote mirroring becomes a more attractive option to organizations insourcing BC/DR and/or increasing the distance between sites.
While synchronous remote mirroring is popular, many organizations are starting to give serious consideration to greater distances between sites and to a strategy of asynchronous remote mirroring to allow further separation between their primary and secondary sites.

HA/DR/BC implementations including GDPS, remote Direct-Attached Storage Device (DASD) mirroring, electronic tape/virtual tape vaulting, and remote DR sites are all facilitated by cascaded FICON.

BENEFITS OF FICON CASCADING

Cascaded FICON delivers to the mainframe space many of the same benefits as open systems SANs. It allows for simpler infrastructure management, decreased infrastructure cost of ownership, and higher data availability. This higher data availability is important in delivering a more robust enterprise DR strategy. Further benefits are realized when the ISLs connect switches in two or more locations and/or are extended over long distances. Figure 1 shows a non-cascaded two-site environment.

Figure 1. Two sites in a non-cascaded FICON environment

In Figure 1, all hosts have access to all of the disk and tape subsystems at both locations. The host channels at one location are extended to the Brocade 48000 or Brocade DCX (FICON) platforms at the other location to allow for cross-site storage access. If each line represents two FICON channels, then this configuration would need a total of 16 extended links, and these links would be utilized only to the extent that the host has activity to the remote devices.

The most obvious benefit of cascaded versus non-cascaded is the reduction in the number of links across the Wide Area Network (WAN). Figure 2 shows a cascaded, two-site FICON environment. In this configuration, if each line represents two channels, only 4 extended links are required. Since FICON is a packet-switched protocol (versus the circuit-switched ESCON protocol), multiple devices can share the ISLs, and multiple I/Os can be processed across the ISLs at the same time. This allows for a reduction in the number of links between sites and allows for more efficient utilization of the links in place. In addition, ISLs can be added as the environment grows and traffic patterns dictate. This is the key way in which a cascaded FICON implementation can reduce the cost of the enterprise architecture.

In Figure 2, the cabling schema for both intersite and intrasite connectivity has been simplified. Fewer intrasite cables translate into decreased cabling hardware and management costs. It also reduces the number of FICON adapters, director ports, and host channel card ports required, thus decreasing the connectivity cost for mainframes and storage devices as well. In Figure 2, the sharing of links between the two sites reduces the number of physical channels between sites, thereby lowering the cost by consolidating channels and the number of director ports. The faster the channel speeds between sites, the better the intersite cost savings from this consolidation. With 4 Gbit/sec FICON and 10 Gbit/sec FICON available, this option becomes even more attractive. Another benefit to this approach, especially over long distances, is that the Brocade FICON director typically has many more buffer credits per port than do the processor and the disk or tape subsystem cards.
More buffer credits allow a link to be extended to greater distances without significantly impacting response times to the host.

Figure 2. Two sites in a cascaded FICON environment

Optimizing Use of Storage Resources

ESCON limits the number of terabytes (TB) that a customer can realistically have in a single DASD array because of device addressing limitations. Rather than filling a frame to capacity, additional frames need to be purchased, wasting capacity. For example, running Mod 3 volumes in an ESCON environment typically leads to running out of available addresses between 3.3 and 3.5 TB. This is significant because it requires more disk array footprints at each site, and:

• The technology of DASD arrays places a limit on the number of CU ports inside, and there is a limit of 8 links per LCU. These 8 links can only perform so fast.
• This also limits the I/O density (I/Os per GB per second) into and out of the frame, placing a cap on the amount of disk space the frame can support while still supplying reasonable I/O response times.

Cascaded FICON lets customers fully utilize their disk arrays, preventing them from having to "throttle back" I/O loads, and lets them make the most efficient use of technologies such as Parallel Access Volumes (PAVs). Additionally, a cascaded FICON environment requires fewer fiber adapters on storage devices and mainframes.

Cascaded FICON also allows for Total Cost of Ownership (TCO) savings in an installation's mainframe tape/virtual tape environment. FICON platforms such as the Brocade 48000 and DCX are "5 nines" devices. The typical enterprise-class tape drive is only 2 or 3 nines at best, due to all of its moving mechanical parts. A FICON port on a Brocade DCX (or any FICON enterprise-class platform) typically costs twice as much as a FICON port on a Brocade 5000 FICON switch. (The FICON switch is not a "5 nines" device, while the FICON director is.) However, it may not make sense to connect "3 nines" tape drives to "5 nines" directors, when the best reliability achieved is that of the lowest common denominator (the tape drive). Depending on your exact configuration, it can make more financial sense to connect tape drives to Brocade 5000 FICON switches cascaded to a Brocade DCX (FICON), thus saving the more expensive director ports for host and/or DASD connectivity.

Cascaded FICON Performance

Seven main factors affect the performance of a cascaded FICON director configuration (IBM white paper on cascaded FICON director performance considerations, Cronin and Bassener):

1. The number of ISLs between the two cascaded FICON directors and the routing of traffic across ISLs
2. The number of FICON/FICON Express channels whose traffic is being routed across the ISLs
3. The ISL link speed
4. Contention for director ports associated with the ISLs
5. The nature of the I/O workload (I/O rates, block sizes, use of data chaining, and read/write ratio)
6. The distances of the paths between the components of the configuration (the FICON channel links from processor(s) to the first director, the ISLs between directors, and the links from the second director to the storage control unit ports)
7. The number of switch port buffer-to-buffer credits

The last factor, the number of buffer-to-buffer credits and how they are managed, is typically the one examined most carefully, and the one that is most often misunderstood.
BUFFER-TO-BUFFER CREDIT MANAGEMENT

The introduction of the FICON I/O protocol to the mainframe I/O subsystem provided the ability to process data rapidly and efficiently. As a result of two main changes that FICON made to the mainframe channel I/O infrastructure, the requirement for a new Resource Measurement Facility (RMF) record came into being. The first change was that, unlike ESCON, FICON uses buffer credits to account for packet delivery. The second change was the introduction of FICON cascading, which was not possible with ESCON.

Buffer-to-buffer credits (BB credits) and their management in a FICON environment are often misunderstood. Buffer-to-buffer credit management does have an impact on performance over distance in cascaded FICON environments. At present, there is no good way to track the BB credits being used. At initial configuration, BB credits are allocated but not managed. As a result, the typical FICON shop assigns a large number of BB credits for long-distance traffic. Just as assigning too many aliases to a base address in managing dynamic PAVs can lead to configuration issues due to addressing constraints, assigning too many BB credits can lead to director configuration issues, which can require outages to resolve. Mechanisms for detecting BB credit starvation in a FICON environment are extremely limited.

This section reviews the concept of BB credits, including current schemes for allocating them. It then discusses the only way to detect BB credit starvation on FICON directors, including the concept of frame pacing delay. Finally, a mechanism to count BB credits used is outlined, and then another theoretical "drawing board" concept is described: dynamic allocation of BB credits on an individual I/O basis, similar to the new HyperPAVs concept for DASD.

About BB Credits

This section is an overview of BB credits; for a more detailed discussion, consult Robert Kembel's "Fibre Channel Consultant" series.

Packet Flow and Credits

The fundamental objective of flow control is to prevent a transmitter from overrunning a receiver by allowing the receiver to pace the transmitter, managing each I/O as a unique instance. At extended distances, pacing signal delays can result in degraded performance. Buffer-to-buffer credit flow control is used to transmit frames from the transmitter to the receiver and pacing signals back from the receiver to the transmitter.

The basic information carrier in the FC protocol is the frame. Other than ordered sets, which are used for communication of low-level link conditions, all information is contained in frames. A good analogy to a frame is an envelope: when you send a letter via the United States Postal Service (USPS), the letter is "encapsulated" in an envelope. When sending data via a FICON network, the data is encapsulated in a frame (although service times in a FICON network are better than those of the USPS).

To prevent a target device (either host or storage) from being sent more frames than it has buffer memory to store (overrun), the FC architecture provides a flow control mechanism based on a system of credits. Each credit represents the ability of the receiver to accept a frame. Simply stated, a transmitter cannot send more frames to a receiver than the receiver can store in its buffer memory. Once the transmitter exhausts the frame count of the receiver, it must wait for the receiver to credit frames back to the transmitter.
A good analogy is a pre-paid calling card: there are a certain number of minutes, and you can talk until there is no more time on the card.

Flow control exists at both the physical and logical level. The physical level is called "buffer-to-buffer flow control" and manages the flow of frames between transmitters and receivers. The logical level is called "end-to-end flow control," and it manages the flow of a logical operation between two end nodes. It is important to note that a single end-to-end operation may have made multiple transmitter-to-receiver pair hops (end-to-end frame transmissions) to reach its destination. However, the presence of intervening directors and/or ISLs is transparent to end-to-end flow control. Buffer-to-buffer flow control is the more crucial subject in a cascaded FICON environment.

Buffer-to-Buffer Flow Control

Buffer-to-buffer flow control is flow control between two optically adjacent ports in the I/O path (that is, transmission control over individual network links). Each FC port has dedicated sets of hardware buffers for send and receive operations. These buffers are more commonly known as "BB credits." The number of available BB credits defines the maximum amount of data that can be transmitted prior to an acknowledgment from the receiver. BB credits are physical memory resources incorporated in the Application Specific Integrated Circuit (ASIC) that manages the port. It is important to note that these memory resources are limited. Moreover, the cost of the ASICs increases as a function of the size of the memory resource. One important aspect of Fibre Channel is that adjacent nodes do not have to have the same number of credits. Rather, adjacent ports communicate with each other during Fabric LOGIn (FLOGI) and Port LOGIn (PLOGI) to determine the number of credits available for the send and receive ports on each node.

A BB credit can transport a 2,112-byte frame of data. The FICON FC-SB-2 and FC-SB-3 ULPs use 64 bytes of this frame for addressing and control, leaving 2 K available for z/OS data. When a 2 Gbit/sec transmitter is sending full 2,112-byte frames, 1 credit is required for every 1 km of fiber between the sender and receiver. Unfortunately, z/OS disk workloads rarely produce full frames. For a 4 K transfer, the average frame size is 819 bytes; as a result of the decreased average frame size, 5 credits would be required per km of distance. Increasing the link speed also increases the number of credits required to support a given distance: every time the link speed doubles, the number of BB credits required to avoid transmission delays over a specified distance doubles as well.

BB credits are used by Class 2 and Class 3 service and rely on the receiver sending back receiver-readies (R_RDY) to the transmitter. As was previously discussed, node pairs communicate their number of available credits during FLOGI/PLOGI. This value is used by the transmitter to track the consumption of receive buffers and pace transmissions if necessary. FICON directors track the available BB credits in the following manner:

• Before any data frames are sent, the transmitter sets a counter equal to the BB credit value communicated by its receiver during FLOGI.
• For each data frame sent by the transmitter, the counter is decremented by one.
• Upon receipt of a data frame, the receiver sends a status frame (R_RDY) to the transmitter, indicating that the data frame was received and that the buffer is ready to receive another data frame.
• For each R_RDY received by the transmitter, the counter is incremented by one.

As long as the transmitter count is a non-zero value, the transmitter is free to continue sending data. This mechanism allows the transmitter to have a maximum number of data frames in transit equal to the BB credit value, and an inspection of the transmitter counter indicates the number of receive buffers still available. The flow of frame transmission between adjacent ports is regulated by the receiving port's presentation of R_RDYs; in other words, BB credit flow control has no end-to-end component. The rate of frame transmission is regulated by the receiving port based on the availability of buffers to hold received frames, and the initial BB credit value must be non-zero. It should be noted that the FC-FS specification allows the transmitter's counter to be initialized either at zero or at the BB credit value, counting up or down accordingly as frames are transmitted and R_RDYs are received. Different switch vendors can handle this using either method, and the counting is handled accordingly.
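To make the counting mechanism concrete, the following minimal sketch (in Python, with names invented for illustration; this is not vendor firmware) models a transmitter that counts remaining credits down on each frame sent and back up on each R_RDY received, as described above.

    # Minimal model of buffer-to-buffer flow control as described above.
    # All names are illustrative; this is not Brocade or IBM code.

    class BBCreditTransmitter:
        def __init__(self, advertised_bb_credit: int):
            # Counter starts at the BB credit value the receiver advertised at FLOGI/PLOGI.
            if advertised_bb_credit <= 0:
                raise ValueError("advertised BB credit must be non-zero")
            self.credits_remaining = advertised_bb_credit

        def can_send(self) -> bool:
            # Transmission is allowed only while the counter is non-zero.
            return self.credits_remaining > 0

        def on_frame_sent(self) -> None:
            # Decrement by one for each data frame placed on the link.
            assert self.can_send(), "frame pacing: must wait for an R_RDY"
            self.credits_remaining -= 1

        def on_r_rdy(self) -> None:
            # Increment by one for each R_RDY returned by the receiver.
            self.credits_remaining += 1

    # Usage sketch: a port advertising 8 credits can have at most 8 frames in flight.
    tx = BBCreditTransmitter(advertised_bb_credit=8)
    for _ in range(8):
        tx.on_frame_sent()
    print(tx.can_send())   # False: credits exhausted, transmitter must wait
    tx.on_r_rdy()
    print(tx.can_send())   # True: one buffer freed, one more frame may be sent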
Implications of Asset Deployment

There are four implications of asset deployment to consider when planning BB credit allocations:

• For write-intensive applications across an ISL (tape and disk replication), the BB credit value advertised by the E_Port on the target side gates performance. In other words, the number of BB credits on the target cascaded FICON director is the major factor.
• For read-intensive applications across an ISL (regular transactions), the BB credit value advertised by the E_Port on the host side gates performance. In other words, the number of BB credits at the local location is the major factor.
• Two ports do not negotiate BB credits down to the lowest common value. A receiver simply "advertises" BB credits to a linked transmitter.
• The depletion of BB credits at any point between an initiator and a target will gate overall throughput.
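The sketch below is a rough, illustrative model of these implications (the figures and the 5-microseconds-per-km propagation assumption are made up for the example; real links have additional overheads): each hop's sustainable rate is capped by the credits advertised by the receiving port on that hop, and the end-to-end ceiling is the minimum across the hops in the direction the data flows.

    # Illustrative model only (not a vendor formula): for each hop, the receiver's
    # advertised BB credits cap the data that can be in flight, so the sustainable
    # rate on that hop is roughly credits * frame_size / round_trip_time.

    def hop_ceiling_mb_per_sec(advertised_credits: int, avg_frame_bytes: int,
                               one_way_latency_us: float) -> float:
        round_trip_us = 2 * one_way_latency_us
        bytes_in_flight = advertised_credits * avg_frame_bytes
        return bytes_in_flight / round_trip_us   # bytes per microsecond == MB/s

    def path_ceiling_mb_per_sec(hops):
        # hops: list of (advertised_credits, avg_frame_bytes, one_way_latency_us)
        return min(hop_ceiling_mb_per_sec(*h) for h in hops)

    # Write-intensive traffic from host to remote site: the remote E_Port's credits matter.
    write_path = [
        (64, 1800, 5),     # channel -> local director, short local link
        (16, 1800, 250),   # local E_Port -> remote E_Port across a 50 km ISL (~5 us/km)
        (32, 1800, 5),     # remote director -> control unit port
    ]
    print(round(path_ceiling_mb_per_sec(write_path), 1), "MB/s ceiling, gated by the ISL hop")

For read-intensive traffic the data flows the other way, so the credits advertised by the E_Port at the host location become the gating value, which is exactly the asymmetry the first two bullets describe.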
Configuring BB Credit Allocations on FICON Directors

There have been two FICON switch architectures for BB credit allocation. The first, which was prevalent on early FICON directors such as the Inrange/CNT FC9000 and McDATA 6064, had a range of BB credits that could be assigned to each individual port. Each port on a port card had a range of BB credits (for example, 4 through 120) that could be assigned to it during the switch configuration process. Simple rules of thumb in a table/matrix were used to determine the number of BB credits to use. Unfortunately, these tables did not consider workload characteristics or z/OS particulars. Since changing the BB credit allocation was an offline operation, most installations would calculate what they needed, set the allocation, and (assuming it was correct) not look at it again. Best practice was typically to maximize BB credits on ports being used for distance traffic, since each port could theoretically be set to the maximum available BB credits without penalizing other ports on the port card. Some installations would even maximize the BB credit allocation on short-distance ports, so they would not have to worry about it. However, this could cause other kinds of problems in recovery scenarios.

The second FICON switch architecture, on the market today in products from Brocade and Cisco, has a pool of available BB credits for each port card in the director. Each port on the port card has a maximum setting. However, since there is a large pool of BB credits that must be shared among all ports on a port card, there must be better allocation planning. It is no longer enough to simply use distance rules of thumb; the workload characteristics of the traffic need to be better understood. Also, as 4 Gbit/sec FICON Express4 becomes prevalent and 8 Gbit/sec FICON Express8 follows, intra-data-center distances become something to consider when deciding how to allocate the pool of available BB credits. It is no longer enough to say that a port is internal to the data center or campus and assign it the minimum number of credits. This pooled architecture, and the careful capacity planning it necessitates, makes it more critical than ever to have a way to track actual BB credit usage in a cascaded FICON environment. What follows is a discussion of what happens when you exhaust available BB credits and the concept of frame pacing delay.

BB Credit Exhaustion and Frame Pacing Delay

Similar to the ESCON directors that preceded them, FICON switches have a feature called the Control Unit Port (CUP). Among the many functions of the CUP feature is the ability to provide host control functions such as blocking and unblocking ports, safe switching, and in-band host communication functions such as port monitoring and error reporting. Enabling CUP on FICON switches, while also enabling RMF 74 subtype 7 (RMF 74-7) records on the z/OS system, yields a new RMF report called the FICON Director Activity Report. Data is collected for each RMF interval if FCD is specified in the ERBRMFnn parmlib member. RMF formats one of these reports per interval for each FICON switch that has CUP enabled and is specified in the parmlib member. This RMF report contains meaningful data on FICON I/O performance, in particular frame pacing delay. Note that frame pacing delay is the only indication available of a BB credit starvation issue on a given port.

Frame pacing delay has been around since FC SANs were first implemented in the late 1990s by our open systems friends. But until the increased use of cascaded FICON, its relevance in the mainframe space was completely overlooked. If frame pacing delay is occurring, then the buffer credits have reached zero on a port for an interval of 2.5 microseconds and no more data can be transmitted until a credit has been added back to the buffer credit pool for that port. Frame pacing delay causes unpredictable performance delays. These delays generally result in longer FICON connect times and/or longer PEND times that show up on the volumes attached to these links.

Note that only when using switched FICON, and only when CUP is enabled on the FICON switching device(s), can RMF provide the report that contains frame pacing delay information. Only the RMF 74-7 FICON Director Activity Report provides FICON frame pacing delay information. You cannot get this information from any other source today.

Figure 3. Sample FICON Director Activity report (RMF 74-7)

The fourth column from the left in Figure 3 is the column where frame pacing delay is reported. Any number other than 0 (zero) in this column is an indication that frame pacing delay is occurring. A non-zero number reflects the number of times that I/O was delayed for 2.5 microseconds or longer due to buffer credits falling to zero. Figure 3 shows an optimal situation: zeros down the entire column, indicating that enough buffer credits are always available to transfer FICON frames.

Figure 4. Frame pacing delay indications in RMF 74-7 record

But in Figure 4, you can see on the FICON Director Activity Report for switch ID 6E, an M6140 director, that there were at least three instances when port 4, a cascaded link, suffered frame pacing delays during this RMF reporting interval. This would have resulted in unpredictable performance across this cascaded link during this period of time. The next few sections provide answers to questions that arise in this discussion.
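As a simple illustration of how this indicator might be watched once the interval data has been extracted from RMF (for example, by an SMF post-processor), the sketch below flags any port whose frame pacing count is non-zero. The record layout and field names here are hypothetical; real RMF 74-7 records require an SMF parser or a reporting product.

    # Hypothetical, already-parsed view of one FICON Director Activity (RMF 74-7) interval.
    # Field names are illustrative, not actual SMF field names.
    interval_rows = [
        {"switch_id": "6E", "port": "04", "avg_read_frame": 868, "avg_write_frame": 1024, "frame_pacing": 3},
        {"switch_id": "6E", "port": "05", "avg_read_frame": 912, "avg_write_frame": 996,  "frame_pacing": 0},
    ]

    def pacing_alerts(rows):
        # Any non-zero frame pacing count means BB credits hit zero for 2.5 us or longer.
        return [r for r in rows if r["frame_pacing"] > 0]

    for r in pacing_alerts(interval_rows):
        print(f"switch {r['switch_id']} port {r['port']}: "
              f"{r['frame_pacing']} frame pacing delay event(s) this interval")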
What is the difference between frame pacing and frame latency?

Frame pacing is an FC4 application data exchange measurement and/or throttling mechanism. It uses buffer credits to provide a flow control mechanism for FICON to assure delivery of data across the FICON fabric. When all buffer credits for a port are exhausted, a frame pacing delay can occur. Frame latency, on the other hand, is a frame delivery measurement, similar to measuring frame friction. Each element that handles the frame contributes to this latency (CHPID port, switch/director, storage port adapter, link distance, and so on). Frame latency is the average amount of time it takes to deliver a frame from the source port to the destination port.

What can you do to eliminate or circumvent frame pacing delay?

If a long-distance link is running out of buffer credits, then it might be possible to enable additional buffer credits for that link in an attempt to provide an adequate pool of buffer credits for the frames being delivered over that link. But the number of buffer credits required to handle specific workloads across distance is surprising, as shown in Table 1.

Table 1. Frame size, link speed, and distance determine buffer credit requirements

Frame payload %   Payload bytes   Buffer credits required to reach 50 km at:
                                  1 Gbit/sec   2 Gbit/sec   4 Gbit/sec   8 Gbit/sec   10 Gbit/sec
100%              2112                 25           49           98          196          290
75%               1584                 33           65          130          259          383
50%               1056                 48           96          191          381          563
25%               528                  91          181          362          723         1069
10%               211                 197          393          785         1569         2318
5%                106                 321          641         1281         2561         3784
1%                21                  656         1312         2624         5248         7755

Keep in mind that tape workloads generally have larger payloads in a FICON frame, while DASD workloads might have much smaller frame payloads. Some say the average payload size for DASD is often about 800 to 1500 bytes. By using the FICON Director Activity reports for your enterprise, you can gain an understanding of your own average read and write frame sizes on a port-by-port basis. To help you, columns five and six of the FICON Director Activity report in Figure 3 show the average read frame size and the average write frame size for the frame traffic on each port. These columns are useful when you are trying to figure out how many buffer credits will be needed for a long-distance link or possibly to solve a local frame pacing delay issue.
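As a rough illustration of where numbers like those in Table 1 come from, the following sketch estimates required credits from distance, link speed, and frame size, assuming roughly 5 microseconds per km of one-way propagation in fiber and about 100 MB/sec of data per 1 Gbit/sec of link speed. It approximates, but does not exactly reproduce, the vendor-supplied values; it is not a sizing tool.

    import math

    # Rule-of-thumb estimate: credits needed ~= round-trip time / time to serialize one frame.
    PROPAGATION_US_PER_KM = 5.0      # approximate one-way light propagation in fiber

    def estimate_bb_credits(distance_km: float, link_gbps: float, frame_bytes: int) -> int:
        round_trip_us = 2 * distance_km * PROPAGATION_US_PER_KM
        bytes_per_us = link_gbps * 100            # 1 Gbit/sec FC carries roughly 100 MB/s of data
        frame_serialization_us = frame_bytes / bytes_per_us
        return math.ceil(round_trip_us / frame_serialization_us)

    # 50 km at 2 Gbit/sec with full 2,112-byte frames: about 48 credits (Table 1 shows 49).
    print(estimate_bb_credits(50, 2, 2112))
    # 50 km at 2 Gbit/sec with half-full 1,056-byte frames: about 95 credits (Table 1 shows 96).
    print(estimate_bb_credits(50, 2, 1056))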
How can you make improvements?

Even with the new FICON directors and the ability to assign BB credits to each port from a pool of available credits on each port card, it is still not easy. The best hope for end users is to make a "correct" allocation and then monitor the RMF 74-7 report for frame pacing delay as the indication that they are out of BB credits. They can then make the necessary adjustments to the BB credit allocations for crucial ports, such as the ISL ports on either end of a cascaded link. However, any adjustment made will merely be a "guesstimate," since the exact number of credits being used is not reported. A helpful analogy is a car without a fuel gauge, in which you have to rely on EPA MPG estimates to calculate how many miles you could drive on a full tank of gas. This estimate would not reflect driving characteristics, and in the end, the only accurate indication that the gas tank is empty is a coughing engine that stops running.

Individual ports already track BB credit availability, as was discussed earlier, and the mechanism by which this occurs was described; so it is a matter of creating a reporting mechanism. This is similar to the situation with monitoring open exchanges, discussed in a paper by Dr. H. Pat Artis, who made a sound case for why open exchange management is crucial in a FICON environment. He proved the correlation between response/service time skyrocketing and open exchange saturation, demonstrated that channel busy and bus busy metrics are not correlated to response/service time, and recommended a range of open exchanges to use for managing a FICON environment. Since RMF does not report open exchange counts, he derived a formula using z/OS response time metrics to calculate open exchanges. Commercial software such as MXG and RMF Magic uses this to help users better manage their FICON environments.

Similar to open exchanges, the data needed to calculate BB credit usage is currently available in RMF, and all that is needed are some mathematical calculations. As an area of future exploration, the RMF 74-7 record (FICON Director Activity report) could be updated with these two additional fields and the appropriate interfaces added between the FICON switches and the CUP code. Switch management software could also be enhanced to include these two valuable metrics.
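Purely as an illustration of the kind of arithmetic involved (this is not the formula from the paper cited above, and the inputs are assumptions rather than RMF field names), the average number of BB credits in use on a link can be approximated from its throughput, average frame size, and round-trip time using Little's Law.

    # Illustrative approximation (Little's Law): average frames in flight =
    # frame rate * round-trip time. Inputs are assumed values, not RMF field names.

    def estimate_credits_in_use(throughput_mb_per_sec: float,
                                avg_frame_bytes: int,
                                round_trip_us: float) -> float:
        frames_per_us = throughput_mb_per_sec / avg_frame_bytes   # MB/s == bytes per microsecond
        return frames_per_us * round_trip_us

    # A 50 km ISL (about 500 us round trip) carrying 150 MB/s of 1,024-byte frames
    # holds roughly 73 frames in flight, that is, about 73 BB credits in use on average.
    print(round(estimate_credits_in_use(150, 1024, 500)))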
Dynamic Allocation of BB Credits

The technique used in BB credit allocation is very similar to the early technique used in managing PAV aliases. The simple approach used was called "static assignment." With static assignment, the storage subsystem utility was used to statically assign alias addresses to base addresses. While a generous static assignment policy could help to ensure sufficient performance for a base address, it resulted in ineffective utilization of the alias addresses (since nobody knew what the optimal number of aliases was for a given base), which put pressure on the 64 K device address limit. Users would tend to assign an equal number of aliases to each base, often taking a very conservative approach, resulting in PAV alias overallocation. An effort to address this was undertaken by IBM with WorkLoad Manager (WLM) support for dynamic alias assignment. WLM was allowed to dynamically reassign aliases from a pool to base addresses to meet workload goals. Since this can be somewhat "lethargic," users of dynamic PAVs still tend to overconfigure aliases and are pushing the 64 K device address limitation. Users face what you could call a "PAV performance paradox": they need the performance of PAVs, tend to overconfigure alias addresses, and are close to exhausting the z/OS device addressing limit.

Perhaps a similar dynamic allocation of BB credits, in particular for the new FICON switch architectures that have pools of assignable credits on each port card, would be a very beneficial enhancement for end users. Perhaps an interface between the FICON directors and WLM could be developed to allow WLM to dynamically assign BB credits. At the same time, since Quality of Service (QoS) is an emerging topic for FICON, an interface could be developed between the FICON switches and WLM for functionality with dynamic channel path management and priority I/O queuing to enable true end-to-end QoS.

In October 2006, IBM announced HyperPAVs for the DS8000 storage subsystem family to address the PAV performance paradox. HyperPAVs increase the agility of the alias assignment algorithm. The primary difference from traditional PAV alias management is that aliases are dynamically assigned to individual I/Os by the z/OS I/O Supervisor (IOS) rather than being statically or dynamically assigned to a base address by WLM. The RMF 78-3 (I/O queuing) record has also been expanded. A similar feature/functionality and interface between FICON switches and the z/OS IOS would be the ultimate in BB credit allocation: true dynamic allocation of BB credits on an individual I/O basis.

This section has reviewed flow control, the basics of BB credit theory, frame pacing delay, and current BB credit allocation methods, and has presented some proposals for a) counting BB credit usage and b) enhancing how BB credits are allocated and managed.

TECHNICAL DISCUSSION OF FICON CASCADING

As stated earlier, cascaded FICON is limited to zSeries and System z processors, with the hardware and software requirements outlined earlier. In Figure 2, note that a cascaded FICON switch configuration involves at least three FC links:

• Between the FICON channel card on the mainframe (known as an N_Port) and the FICON director's FC adapter card (which is considered an F_Port)
• Between the two FICON directors via E_Ports (the link between E_Ports on the switches is an inter-switch link)
• Between an F_Port on the second director and a FICON adapter card at the control unit port (N_Port) of the storage device

The physical paths are the actual FC links, connected by the FICON switches, that provide the physical transmission path between a channel and a control unit. Note that the links between the cascaded FICON switches may be multiple ISLs, both for redundancy and to ensure adequate I/O bandwidth.

Fabric Addressing Support

Single-byte addressing refers to the link address definition in the Input-Output Configuration Program (IOCP). Two-byte addressing (cascading) allows IOCP to specify link addresses for any number of domains by including the domain address with the link address. This allows the FICON configuration to create definitions in IOCP that span more than one switch. Figure 5 shows that the FC-FS 24-bit FC port address identifier is divided into three fields:

• Domain
• Area
• AL_Port

In a cascaded FICON environment, 16 bits of the 24-bit address must be defined for the zSeries server to access a FICON CU. The FICON switches provide the remaining byte used to make up the full 3-byte FC port address of the CU being accessed. The AL_Port (arbitrated loop) value is not used in FICON and is set to a constant value. The zSeries domain and area fields are referred to as the F_Port's port address field. It is a 2-byte value, and when defining access to a CU attached to this port using the zSeries Hardware Configuration Definition (HCD) or IOCP, the port address is referred to as the link address. Figure 5 further illustrates this, and Figure 6 is an example of a cascaded FICON IOCP gen.
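A minimal sketch of how the pieces fit together (illustrative only; the byte values below are made up, not taken from a real configuration): the 2-byte link address coded in HCD/IOCP supplies the domain and area (port) bytes, and the switch supplies the constant AL_Port byte to complete the 24-bit N_Port ID.

    # Illustrative composition of the 24-bit FC port address used by cascaded FICON.
    # Domain = destination switch address, Area = port on that switch, AL_Port = constant.

    def fc_port_id(domain: int, area: int, al_port: int = 0x00) -> str:
        assert 0 <= domain <= 0xFF and 0 <= area <= 0xFF and 0 <= al_port <= 0xFF
        return f"{domain:02X}{area:02X}{al_port:02X}"

    # A hypothetical 2-byte link address of 6E04 (switch domain 0x6E, port 0x04)
    # coded in IOCP/HCD expands to the full 3-byte destination ID 6E0400.
    link_address = "6E04"
    domain, area = int(link_address[:2], 16), int(link_address[2:], 16)
    print(fc_port_id(domain, area))   # -> 6E0400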
Figure 5. Fabric addressing support (a)

Figure 6. Fabric addressing support (b)

The connections between the two directors are established through the Exchange of Link Parameters (ELP). The switches pause for a FLOGI and, determining that the attached device is another switch, initiate an ELP exchange instead. This results in the formation of the ISL connection(s).

In a cascaded FICON configuration, three additional steps occur beyond the normal FICON switched point-to-point communication initialization. (A much more detailed discussion of the entire FICON initialization procedure can be found in Chapter 3 of the IBM Redbook "FICON Native Implementation and Reference Guide," pp. 23-43.) The three basic steps are:

1. If a 2-byte link address is found in the CU macro in the IOCDS, a Query Security Attribute (QSA) command is sent by the host to check with the fabric controller on the directors whether the directors have the high integrity fabric features installed.
2. The director responds to the QSA.
3. If the response is affirmative, indicating that a high integrity fabric is present (fabric binding and insistent Domain IDs), the login continues. If not, login stops and the ISLs are treated as invalid (not a good thing).

Figure 7. Sample IOCP coding for FICON cascaded switch configuration

High Integrity Enterprise Fabrics

Data integrity is paramount in a mainframe or any data center environment. End-to-end data integrity must be maintained throughout a cascaded FICON environment to ensure that any changes made to the data stream are always detected and that the data is always delivered to the correct end point. Brocade M-Series FICON directors in a cascaded environment use a software feature known as SANtegrity to achieve this. The SANtegrity feature key must be installed and operational in the Brocade Enterprise Fabric Connectivity Manager (EFCM). Brocade 24000 and 48000 FICON directors and the Brocade 5000 FICON switch use Secure Fabric OS. What does high integrity fabric architecture and support entail?

• Support of Insistent Domain IDs. This means that a FICON switch is not allowed to automatically change its address when a duplicate switch address is added to the enterprise fabric; intentional manual operator action is required to change a FICON director's address. Insistent Domain IDs prohibit the use of dynamic Domain IDs, ensuring that predictable Domain IDs are enforced in the fabric. For example, suppose a FICON director has this feature enabled, and a new FICON director is connected to it via an ISL in an effort to build a cascaded FICON fabric. If the new FICON director attempts to join the fabric with a Domain ID that is already in use, the new director is segmented into a separate fabric. This also makes certain that duplicate Domain IDs are not used in the same fabric.

• Fabric Binding. Fabric binding enables companies to allow only FICON switches that are configured to support high-integrity fabrics to be added to the FICON SAN. For example, a Brocade M-Series FICON director without an activated SANtegrity feature key cannot connect to an M-Series FICON fabric/director with an activated SANtegrity feature key. The FICON directors that you wish to connect to the fabric must be added to the fabric membership list of the directors already in the fabric. This membership list is composed of the "acceptable" FICON directors' World Wide Names (WWNs) and Domain IDs. Using the Domain ID ensures that there will be no address conflicts, that is, no duplicate Domain IDs when the fabrics are merged. The two connected FICON directors then exchange their membership lists. This exchange is a Switch Fabric Internal Link Service (SW_ILS) function, which ensures consistent and unified behavior across all potential fabric access points.
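The conceptual sketch below illustrates the effect of these two checks (the WWNs, Domain IDs, and function names are invented for the example; this is not the actual SW_ILS exchange or any vendor's implementation): a joining switch must appear in the membership list with its expected Domain ID, and its insistent Domain ID must not collide with one already online.

    # Conceptual sketch of the membership checks described above; all values are invented.

    authorized = {
        "10:00:00:05:1E:34:AA:01": 0x6E,   # online
        "10:00:00:05:1E:34:AA:02": 0x6F,   # online
        "10:00:00:05:1E:34:AA:03": 0x70,   # authorized, not yet connected
    }
    domains_online = {0x6E, 0x6F}

    def may_join_fabric(wwn: str, requested_domain_id: int) -> str:
        if authorized.get(wwn) != requested_domain_id:
            return "segmented: WWN/Domain ID pair is not in the fabric membership list"
        if requested_domain_id in domains_online:
            return "segmented: Domain ID already in use (insistent Domain IDs forbid a change)"
        return "joined"

    print(may_join_fabric("10:00:00:05:1E:34:AA:03", 0x70))   # -> joined
    print(may_join_fabric("10:00:00:05:1E:34:AA:09", 0x71))   # -> segmented (unknown switch)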
For example, a Brocade M-Series FICON director without an activated SANtegrity feature key cannot connect to an M-Series FICON fabric/director with an activated SANtegrity feature key. The FICON directors that you wish to connect to the fabric must be added to the fabric membership list of the directors already in the fabric. This membership list is composed of the “acceptable” FICON director’s World Wide Name (WWN) and Domain ID. Using the Domain ID ensures that there will be no address conflicts, that is, duplicate domain IDs when the fabrics are merged. The two connected FICON directors then exchange their membership list. This membership list is a Switch Fabric Internal Link Service (SW_ILS) function, which ensures a consistent and unified behavior across all potential fabric access points. Managing Cascaded FICON Environments and ISLs: Link Balancing and Aggregation Even in over-provisioned storage networks, there may be “hot spots” of congestion, with some paths running at their limit while others go relatively unused. In other words, the storage network may be a performance bottleneck even if it has sufficient capacity to deliver all I/O without constraint. This typically happens when a network does not have the intelligence to load balance across all available paths. The unused paths may still be of some value for redundancy, but not for performance. Brocade has several options for supporting more evenly balanced cascaded FICON networks. NOTE: The FICON and SAN FC protocol (the FC-SW standard) utilizes path routing services that are based on the industry-standard Fabric Shortest Path First (FSPF) algorithm of that FC protocol. This is not the CHPID path; it is the connections between FICON switching devices (which cause a network to be created) that will utilize FSPF. FSPF allows a fabric (created when CHPIDs and storage ports are connected through one or more FICON switching devices) composed of more than one switching device (also called a storage network) to automatically determine the shortest route from each switch to any other switch. FSPF selects what it considers to be the most efficient path to follow when moving frames through a FICON fabric. FSPF identifies all the possible routes (multiple path connections) through the fabric and then manages initial route selection as well as sub-second path rerouting in the event of a link or node failure. Cascaded FICON in a Brocade environment 19 of 40 MAINFRAME Technical Brief The Brocade 5000 (FICON), Brocade 24000 and 48000 (FICON) Directors, and the Brocade DCX (FICON) Backbone support source-port route balancing via FSPF. This is known as Dynamic Load Sharing (DLS) and is part of the base FOS as long as fabric and E_Port functions are present. FSPF makes calculations based on the topology of a FICON network and determines the cost between end points. In many cascaded FICON topologies, there is more than one equal-cost path across ISLs. Which path to use can be controlled on a per-port basis from the source switch. By default, FSPF attempts to spread connections from different ports across available paths at the source-post level. FSPF can re-allocate routes whenever in-order delivery can still be assured (DLS). This may happen when a fabric rebuild occurs, when device cables are moved, or when ports are brought online after being disabled. DLS does a “best effort” job of distributing I/O by balancing source port routes. 
However, some ports may still carry more traffic than others, and DLS cannot predict which ISLs will be “hot” when it sets up routes since they must be allocated before I/O begins. Also, since traffic patterns tend to change over time, no matter how routes were distributed initially, it would still be possible for hot spots to appear later. Changing the route allocation randomly at runtime could cause out-of-order delivery, which is undesirable in mainframe environments. Balancing the number of routes allocated to a given path is not the same as balancing I/O, and so DLS does not do a perfect job of balancing traffic. DLS is useful, and since it is free and works automatically, it is frequently used. However, DLS does not solve or prevent most performance problems, so there is a need for more evenly balanced methods, such as trunking. On Brocade M-Series FICON switches, FSPF works automatically by maintaining a link state database that keeps track of the links on all switches in the FICON fabric and also associates a cost with each link in the fabric. Although the link state database is kept on all FICON switches in the fabric, it is maintained and synchronized on a fabric-wide basis. Therefore, every switch knows what every other switch knows about connections of host, storage, and switch ports in the fabric. Then FSPF associates a cost with each ISL between switching devices in the FICON fabric and ultimately chooses the lowest-cost path from a host source port, between switches, to a destination storage port. And it does this in both directions, so it would also choose the lowest-cost path from a storage source port, between switches, to a destination host port. The process works as follows. FSPF is invoked at PLOGI. At initial power on of the mainframe complex and after the fabric build and security processes have been fulfilled, individual ports supported by the fabric begin their initial PLOGI process. As each port (CHPID and storage port) logs into a cascaded FICON fabric, FSPF assigns that port (whether it will ever need to use a cascaded link or not) to route I/O over a specific cascaded link. Once all ports have logged in to the fabric, I/O processing can begin. If any port is taken offline and then put back online, it will go through PLOGI again and the same or a different cascaded link might be assigned to it. There is one problem with FSPF routing—it is static. FSPF decisions are made in the absence of data workflow that may prove to be inappropriate for the real-world patterns of data access between mainframe and storage ports. Since FSPF cannot know what I/O activity will occur across any specific link, it is only concerned about providing network connectivity. It has only a very shallow concern about performance— number of hops (which for FICON is 1 so that metric is always equal) and speed of each cascaded link (which can be different and can result in costing each cascaded link as a lower-to-higher cost link). FSPF static routing can result in some cascaded links being over-congested (due to a shortage of buffer credits and/or high utilization of bandwidth) and other cascaded links being under-utilized. FSPF does not take this into account as its only real function is to ensure that connectivity has been established. Although mainframe end users have long exploited the MVS and z/OS ability to provide automatic CHPID I/O loadbalancing mechanisms, there is not an automatic load-balancing mechanism built into the FC-SW-2 or FCSW-3 protocol when cascaded links are used. 
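The static nature of FSPF routing can be illustrated with a short sketch. This is a minimal Python illustration, not switch firmware; the cost values are representative stand-ins (inversely proportional to link speed) rather than the exact figures any particular FOS or M-EOS release uses, and the point is simply that utilization never enters the calculation.

# Illustrative sketch of FSPF-style route selection across cascaded links.
# Costs are representative (slower links cost more); real firmware values differ.
# Current utilization plays no part in the choice -- the routing is static.

ISLS = [
    {"name": "ISL-1", "speed_gbps": 4, "utilization": 0.95},   # hypothetical links
    {"name": "ISL-2", "speed_gbps": 4, "utilization": 0.05},
    {"name": "ISL-3", "speed_gbps": 2, "utilization": 0.10},
]

def link_cost(speed_gbps: int) -> int:
    # Representative inverse-speed costing; a 2 Gbit/sec link costs twice a 4 Gbit/sec link.
    return 1000 // speed_gbps

def lowest_cost_isls(isls):
    """Return every ISL on the lowest-cost (single-hop) path to the remote domain."""
    best = min(link_cost(l["speed_gbps"]) for l in isls)
    return [l for l in isls if link_cost(l["speed_gbps"]) == best]

# ISL-1 and ISL-2 are equally eligible even though ISL-1 is nearly saturated;
# a CHPID or storage port may be pinned to either one at login time.
for isl in lowest_cost_isls(ISLS):
    print(isl["name"], "cost:", link_cost(isl["speed_gbps"]), "utilization:", isl["utilization"])

A port assigned to ISL-1 at login stays there regardless of how busy that link later becomes, which is exactly the gap that trunking and the other mechanisms described below are meant to close.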
Cascaded FICON in a Brocade environment 20 of 40 MAINFRAME Technical Brief So on one hand a FICON cascaded fabric allows you to have tremendous flexibility and ultra high availability in the I/O architecture. You can typically enjoy decreased storage and infrastructure costs, expanded infrastructure consolidation options, ease of total infrastructure management, thousands more device addresses, access to additional storage control units per path, optimized use of all of your storage resources, and higher data availability. Also, higher data availability in a cascaded FICON zSeries or z9 environment implies better, more robust DR/BC solutions for your enterprise. So from that point of view, FICON cascading has many positive benefits. But on the other hand a plain, FSPF-governed, unmanaged FICON cascaded environment injects unpredictability into enterprise mainframe operations where predictability has always ruled. So you must take back control of your FICON cascaded environment to restore predictability to mainframe operations and stable, reliable, and predictable I/O performance to applications. All vendors provide the following: • Some form of cascaded link “trunking” • A choice of link speeds for the deployment of cascaded links • A means of influencing FSPF by configuring a preferred path (cascaded link) between the FICON switches on a port-by-port basis • A means to prevent a frame in a FICON switching device from transferring from a source port to a blocked destination port—including cascaded link port. But what do these mechanisms mean to you and how do you decide what to use to control your environment to obtain the results you want? First you have to know what you want to accomplish. Often you want to have the system “automatically” take care of itself and to adjust to changing conditions for management simplicity and elasticity of the I/O system in general to respond to situational workloads and unusual events. For other enterprises, it might be rigid control over the environment even if it means more work in managing the environment and less elasticity in meeting shifts in I/O workload hour to hour, day to day. So choosing the correct management strategy means that you must have a general understanding of each of the cascaded link control mechanisms, so that you can wisely plan your environment. The next section presents best practices in FICON cascaded link management. Cascaded FICON in a Brocade environment 21 of 40 MAINFRAME Technical Brief BEST PRACTICES FOR FICON CASCADED LINK MANAGEMENT The best recommendation to start with is to avoid managing FICON cascaded links manually! By doing so you will circumvent much tedious work—work that is prone to error and is always static in nature. Instead, implement FICON cascaded path management, which automatically responds to changing I/O workloads and provides a simple, labor-free but elegant solution to a complex management problem. This simplified management scheme can be deployed through a combination of using the free, automatic FSPF process and enabling a form of ISL trunking on each switching device in the FICON fabric. This section explores ISL trunking in greater detail. Brocade offers several trunking options for the Brocade 5000, 24000, 48000, and DCX platforms; Brocade M-Series FICON directors offer a software- based trunking feature known as “Open Trunking.” Terms and Definitions • Backpressure. 
A condition in which a frame is ready to be sent out of a port but there is no transmit BB credit available for it to be sent as a result of flow control from the receiving device. • Bandwidth. The maximum transfer bit-rate that a link is capable of sustaining; also referenced in this document as “capacity.” • Domain. A unique FC identifier assigned to each switch in a fabric; a common part of the FC addresses assigned to devices attached to a given switch. • Fabric Shortest Path First (FSPF). A standard protocol executed by each switch in a fabric, by which the shortest paths to every destination domain are computed output to a table that gives the transmit ISLs allowed when sending to each domain. Each such transmit ISL is on a shortest path to the domain, and FSPF allows any one of them to be used. • Flow. FC frame traffic arriving in a switch on a specific receive port that is destined for a device in a specific destination FC domain elsewhere in the fabric. All frames for the same domain arriving on the receive port are said to be in the same flow. • Oversubscription. A condition that occurs when an attempt is made to use more resources than are available, for example when two devices could source data at 1 Gbit/sec and their traffic is routed through one 1 Gbit/sec ISL, the ISL is oversubscribed. Frame-level Trunking Implementation Trunking allows traffic to be evenly balanced across ISLs while preserving in order delivery. Brocade offers hardware (ASIC)-based, frame-level trunking and exchange-level trunking on the Brocade 5000, 24000, 48000, and DCX platforms. The frame-level method balances I/O such that each successive frame may go down a different physical ISL, and the receiving switch ensures that the frames are forwarded onward in their original order. Figure 8 shows a frame-level trunk between two FICON switches. For this to work there must be high intelligence in both the transmitting and receiving switches. At the software level, switches must be able to auto-detect that forming a trunk group is possible, program the group into hardware, display and manage the group of links as a single logical entity, calculate the optimal link costs, and manage low-level parameters such as buffer-to-buffer credits and Virtual Channels optimally. Management software must represent the trunk group properly. For the trunking feature to have broad appeal, this must be as user-transparent as possible. Cascaded FICON in a Brocade environment 22 of 40 MAINFRAME Technical Brief At the hardware level, the switches on both sides of the trunk must be able to handle the division and reassembly of several multi-gigabit I/O streams at wire speed, without dropping a single frame or delivering even one frame out of order. To add to the challenge, there are often differences in cable length between different ISLs. Within a trunk group, this creates a skew between the amounts of time each link takes to deliver frames. This means that the receiving ASIC will almost always receive frames out of order and must be able to calculate and compensate for the skew to re-order the stream properly. There are limitations to the amount of skew that an ASIC can tolerate, but these limits are high enough that they do not generally apply. The real-world applicability of the limitation is that it is not possible to configure one link in a trunk to go clockwise around a large dark-fiber ring, while another link goes counterclockwise. 
As long as the differences in cable length are measured in a few tens of meters or less, there will not be an issue. If the differences are larger than this, a trunk group cannot form. Instead, the switch creates two separate ISLs and uses either DLS or DPS to balance them. Figure 8. Frame-level trunking concept The main advantage of Brocade frame-level trunking is that it provides optimal performance: a trunk group using this method truly aggregates the bandwidth of its members. The feature also increases availability by allowing non-disruptive addition of members to a trunk group and minimizing the impact of failures. However, frame-level trunking does have some limitations. On the Brocade 5000, Brocade 24000 (with 16-port, 4 Gbit/sec blades) and 48000 Directors, and the DCX, it is possible to configure multiple groups of up to eight 4 Gbit/sec links each. The effect is the creation of balanced 32 Gbit/sec pipes (64 Gbit/sec full-duplex). When connecting a Brocade 48000 or other 4 Gbit/sec switch to a 2 Gbit/sec switch, a "lowest common denominator" approach is used, meaning that the trunk group is limited to 4x 2 Gbit/sec instead of 8x 4 Gbit/sec. Frame-level trunking requires that all ports in a given trunk reside within an ASIC port group on each end of the link. While a frame-level trunk group outperforms either DLS or DPS solutions, using links only within port groups limits configuration options. The solution is to combine frame-level trunking with one of the other methods, as illustrated in Figure 8, which shows frame-level trunking operating within port groups and DLS operating between trunks. On the Brocade 48000 and DCX, trunking port groups are built on contiguous 8-port groups called "octets." There are four octets: ports 0-7, 8-15, 16-23, and 24-31. The Brocade 5000, 48000, and DCX have flexible support for trunking over distance. Buffers are shared across 16-port groups, not limited by octets. For example, it is possible to configure up to 8-port 4 Gbit/sec trunks at 40 km (a 32 Gbit/sec trunk group) or 4-port 4 Gbit/sec trunks at 80 km (a 16 Gbit/sec trunk group). In some cases it may even be more desirable to configure trunks using 2 Gbit/sec links. For example, the trunk group may cross a DWDM that does not have 4 Gbit/sec support. In this case, an 8-port 2 Gbit/sec trunk can span up to 80 km. The above example is per 16-port blade or per 16 ports on the 32-port blade in the Brocade 48000. Brocade M-Series Director Open Trunking Open Trunking is an optionally licensed software feature that provides automatic, dynamic, statistical traffic load balancing across ISLs in a fabric environment. This feature can be enabled on a per-switch basis, and it operates transparently to the existing FSPF algorithms for path selection in a fabric. It employs Template Registers in the port hardware to measure flow data rates and ISL loading, and then uses these numbers to optimize use of ISL bandwidth. The feature controls FC traffic at a flow level rather than at a per-frame level (as is implemented in some hardware trunking deployments) in order to achieve optimal throughput. It does not require any special cooperation from (or configuration of) the adjacent switch. This feature complies with current Fibre Channel ANSI standards and can be used on McDATA switches in homogeneous as well as heterogeneous fabrics.
Configuration and management of Open Trunking is provided via management interfaces (EFCM, CLI and SANpilot/EFCM Basic) through the following mechanisms: • McDATA feature key support. A unique feature key is required for each switch that will have Open Trunking enabled • Open Trunking enable/disable. A user configurable parameter that allows Open Trunking to be supported on all ISLs for a switch; the default is “disabled.” • Per-port offloading thresholds. When the bandwidth consumption of outbound traffic on an ISL exceeds the configured threshold, an attempt may be made to move flows to other equal-cost, but less heavily loaded ISLs. • Per-switch low BB credit threshold. When the percentage of time that a port spends with 0 (zero) BB credit exceeds this threshold, an attempt may be made to move flows to other equal-cost, but less heavily loaded ISLs • Event generation enable/disable for “Low BB Credit Threshold Exceeded” and “Bandwidth Consumption Threshold Exceeded”. If enabled, these events appear in the Event Log, as well as events that indicate when the condition has ended. • Open Trunking Reroute Log. This log contains entries that indicate a flow reroute. The objective of the Open Trunking feature is to make the most efficient use of redundant ISLs. Consider the fabric configuration in Figure 10 with five HBA N_Ports (on the right), six storage N_Ports (on the left), and four ISLs—and assume that all N_Ports and ISLs are 2 Gbit/sec. SW1 and SW2 are two Brocade MSeries FICON directors that support Open Trunking. Cascaded FICON in a Brocade environment 24 of 40 MAINFRAME Technical Brief Figure 9. Fabric configuration with Open Trunking Without Open Trunking, M-EOS software makes only a simple attempt to balance the loads on the four ISLs by allocating receive N_Ports round-robin to transmit ISLs. This results in each of SW2’s transmit ISLs carrying data from no less than one and no more than two HBAs, and each of SW1’s transmit ISLs carrying data from no less than one and no more than two disks. While this sort of load balancing is better than nothing, it has a major shortcoming: Actual ISL bandwidth oversubscription is not taken into account. If HBA1 and HBA5 are trying to send data at 2 Gbit/sec each while HBA2, HBA3, and HBA4 are sending little or no data, it is possible that HBA1 and HBA5 nevertheless find themselves transmitting their data on the same ISL. If each ISL has 2 Gbit/sec capacity, the result is that both HBA1 and HBA5 see their effective data rate cut in half, even though 75 percent of the total bandwidth between SW1 and SW2 is unused. Open Trunking periodically examines traffic statistics and reroutes traffic as needed from heavily loaded ISLs to less-loaded ISLs. It does this rerouting by modifying switch hardware forwarding tables. Traffic may be rerouted from an ISL of one capacity to an ISL of another capacity if that would improve the overall balance of traffic. Open Trunking is performed using the FSPF shortest-path routing database. In M-Series switches, all ISLs are assigned equal FSPF cost so that all paths with the minimum number of ISL hops can be used. (This FSPF link cost is independent of the Open Trunking cost functions discussed later.) The result is that the shortest paths from a given switch to a given destination domain often use transmit ISLs that have different speeds or go to different adjacent switches. When rerouting for load balancing, Open Trunking may reroute traffic among all such ISLs. 
Open Trunking is not restricted to rerouting among ISLs of the same bandwidth. Special care is taken when balancing loads among ISLs of different speed for two reasons: First, the user-perceived latency from a high-bandwidth ISL versus a low-bandwidth ISL at the same loading level is not normally the same; it can be expected to be higher for the low-bandwidth ISL even though both have the same percentage loading. So simply equalizing the percentage loading on the two does not work. Second, it is very easy to inadvertently swamp a low-bandwidth ISL by offloading traffic from a high-bandwidth ISL if the statistics for that traffic are underestimated, as is frequently the case when traffic is ramping up. Much of the complexity in the algorithms used is due to the problem of rerouting safely among ISLs having differing bandwidths. Cascaded FICON in a Brocade environment 25 of 40 MAINFRAME Technical Brief Use of Data Rate Statistics by Open Trunking Open Trunking measures as accurately as possible these three statistics: • The long-term (about a minute or so) statistical rates of data transmission between each ingress port (ISL or N_Port) and each destination domain • The long-term statistical loading of each ISL, measured in the same time span as the above • The long-term average percentage of time spent with zero transmit BB credits for each ISL. In the initial release of Open Trunking, a combination of ingress port and destination domain is called a “flow.” So the first item in the list above simply states that the statistical data rate of each flow is measured. Open Trunking uses these statistics to reroute flows as needed so as to minimize overall perceived overloading. For example, in Figure 10, if ISL1 is 99 percent loaded and has traffic from HBA1 and HBA2, while ISL2 is 10 percent loaded with traffic from HBA3, it might reroute either the flow from HBA1 or HBA2 onto ISL2. The choice is determined by flow statistics: If the flow from HBA1 to SW1 is 1.9 Gbit/sec, it does not reroute that flow, because doing so would overload ISL2. In that case only the flow from HBA2 to SW1 is rerouted. Unfortunately, Open Trunking cannot help ISLs that spend a lot of time unable to transmit due to lack of BB credits. This is a condition that is normally caused by overloaded ISLs or poor-performing N_Ports elsewhere in the fabric, not at the local switch. The 0 (zero) BB credit statistic is primarily used to ensure that Open Trunking does not make things worse by rerouting traffic onto ISLs that are lightly used but have little or no excess bandwidth due to credit starvation. It should be noted that the 0 (zero) BB credit statistic is not just the portion of time spent unable to transmit due to credit starvation. It also includes the portion of time spent transmitting with no more transmit credits. Since a credit is consumed at the start of a frame and not at the end of a frame, an ISL that is transmitting may have no transmit BB credits. It is common for an ISL to be 100 percent loaded and still have a 0 (zero) transmit BB credit statistic of close to 100 percent. Rerouting Decision Making At the core of Open Trunking is a cost function that computes a theoretical cost of routing data on an ISL. It is this cost function that makes it possible to compare loading levels of links with different bandwidth, 1 Gbit/sec versus 2 Gbit/sec: a 1 Gbit/sec ISL with 0.9 Gbit of traffic is not equally as loaded as a 2 Gbit/sec ISL with 0.9 Gbit of traffic. 
The cost function is based on the ISL loading and the link bandwidth. As a function of the ISL loading, it is steadily increasing with increasing slope. All rerouting decisions are made so as to minimize the cost function. This means that a flow is rerouted from ISL x to ISL y only if the expected decrease in the cost function for ISL x, computed by subtracting the flow’s data rate from ISL x’s data rate, is greater than the expected increase in the cost function for ISL y. In fact, to enhance stability of the system, the expected increase in the cost function for ISL y must be at least 10 percent less than the expected decrease in the cost function for ISL x. The cost functions are kept in pre-compiled tables, one for each variety of ISL (currently 2 Gbit/sec and 1 Gbit/sec). The 10 percent differential mentioned above is hard-coded in the tables. The cost function is needed mainly because of the difficulty of making rerouting decisions among ISLs of different bandwidths; without this requirement Open Trunking could reroute in such a way as to minimize the maximum ISL loading. Cascaded FICON in a Brocade environment 26 of 40 MAINFRAME Technical Brief Checks on the Cost Function Making improvement of the cost function the sole condition for rerouting would create an unacceptable and unnecessary risk of instability in routing for these reasons: • Statistics cannot be measured with 100 percent accuracy. • Statistics, even when measured accurately, may be in a state of flux when measured. • The cost function can be improved by offloading traffic from a lightly loaded ISL onto an even more lightly loaded ISL, but the minimal improvement in latency would be imperceptible to the user. To put it simply, too many flows would be rerouted too often if flows were rerouted every time the cost function could be improved. Therefore multiple checks have been implemented on the rerouting selection process. These all prevent flows from being rerouted, even in cases in which the cost function would be improved. Some of these can be adjusted by Brocade EFCM, CLI, or SANpilot configuration as follows: • Two versions of the ISL statistical data rate are kept, one designed to underestimate the actual data rate and the other designed to overestimate it. When making a rerouting decision, the statistics are used in such a way as to result in the most conservative (least likely to reroute) decision. • No flow is rerouted from an ISL unless the ISL utilization is above a minimum threshold, called the “offloading bandwidth consumption threshold,” or unless it spends more than “low BB credit threshold” portion of its time unable to transmit due to lack of BB credits. If one of these conditions is not present, there is no condition that justifies the cost of rerouting. Both of these parameters are user configurable. • No flow is rerouted to an ISL unless the ISL expected utilization, computed by adding the flow data rate to the ISL current data rate, is less than an “onloading bandwidth consumption threshold.” There is an onloading bandwidth consumption threshold for each ISL capacity. This threshold is not user configurable. • No flow can be rerouted if it has been rerouted recently. A period of “flow reroute latency” must expire between successive reroutes of the same flow. This latency is not user configurable. Periodic Rerouting Periodically, every “load-balancing period,” a rerouting task runs that scans all flows and decides which ones to reroute using the criteria discussed above. 
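A minimal sketch of that decision check is shown below. It is illustrative only: the convex cost function, the threshold values, and the traffic figures are invented stand-ins for the pre-compiled cost tables and internal constants described above, but the shape of the test (a net improvement in cost of at least roughly 10 percent, gated by the offloading and onloading thresholds and the low BB credit condition) follows the behavior documented in this section.

# Sketch of an Open-Trunking-style reroute check (illustrative values only).
# A flow moves from ISL x to ISL y only if doing so lowers the total cost
# function with margin, and only if x is over its offload threshold (or
# starved of BB credits) and y stays under its onload threshold.

def cost(load_gbps: float, capacity_gbps: float) -> float:
    # Convex stand-in for the pre-compiled cost tables: rises steeply as the
    # link approaches capacity, so equal loads "cost" more on slower links.
    u = min(load_gbps / capacity_gbps, 0.999)
    return u / (1.0 - u)

def should_reroute(flow_gbps, x_load, x_cap, y_load, y_cap,
                   offload_thresh=0.75, onload_thresh=0.75,
                   x_zero_credit_pct=0.0, low_credit_thresh=0.5,
                   margin=0.10):
    x_over = (x_load / x_cap) > offload_thresh or x_zero_credit_pct > low_credit_thresh
    y_fits = (y_load + flow_gbps) / y_cap < onload_thresh
    if not (x_over and y_fits):
        return False
    decrease = cost(x_load, x_cap) - cost(x_load - flow_gbps, x_cap)
    increase = cost(y_load + flow_gbps, y_cap) - cost(y_load, y_cap)
    return increase < decrease * (1.0 - margin)   # require a clear net win

# Hypothetical example: a 0.6 Gbit/sec flow on a saturated 2G ISL, with a
# lightly loaded 2G ISL available -> reroute is worthwhile.
print(should_reroute(flow_gbps=0.6, x_load=1.9, x_cap=2.0, y_load=0.2, y_cap=2.0))

In the documented implementation this check runs once per load-balancing period against the smoothed statistics.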
The load-balancing period is not user configurable. Cascaded FICON in a Brocade environment 27 of 40 MAINFRAME Technical Brief Algorithms to Gather Data Exponential Smoothing Averages for all statistics measured are kept by means of an exponential smoothing algorithm. The algorithm is partly controlled by a parameter called the “basic averaging time.” This number is 0.093 times the statistical half-life, the time over which a statistic loses half its influence on the smoothed average. The longer the basic averaging time, the slower the system is in reacting to momentary spikes in statistics. This parameter is not user configurable. Use of Template Registers for Flow Statistics There is no way to measure simultaneously the data rates of all flows in the architecture of any McDATA switch product that supports Open Trunking. Flow data rates have to be sampled using a small number of Template Registers. Each Template Register can be set to measure a single flow at a time. Template Registers examine each flow in turn, counting data for any one flow for a period of “sample tree sample time.” The numbers gathered are statistically weighted and exponentially smoothed to provide a flow data rate statistic for use in rerouting decision-making. Frame size estimation The Template Registers on non-4XXX series switches measure frame counts, not word or byte counts. But trunking requires word counts, because flow data rates are compared to ISL capacities measured in words per second. On 4XXX-series products, the Template Registers count actual word rates. On non-4XXX series switches, under normal circumstances, this problem is resolved by multiplying the statistical frame rates by the size of a maximum-size frame plus the minimum inter-frame gap inserted by a non-4XXX switch (which should be 40 to 50 words). This overestimates the data rate, but it is safer (less likely to result in unnecessary reroutes) to overestimate it than to underestimate it. Besides, most frames tend to be close to maximum size in applications having a high data rate. However, if it is impossible to relieve bandwidth oversubscription on an ISL using this overestimate, a frame size estimation algorithm is activated. This algorithm computes the average transmit frame size for the flow’s transmit ISL and computes a weighted average between it and the maximum frame size. This weighted average is then multiplied by a flow’s frame rate to approximate the flow’s data-rate. The weighting is adjusted to favor the average frame size, versus the maximum, as long as flows cannot be rerouted from a heavily loaded ISL and is adjusted the other way when they can be or when it is not heavily loaded. The effect of this is that an overloaded ISL that stays overloaded tends to use an average frame size that is close to the average transmit frame size. The speed at which this happens is controlled by the “frame size estimation-weighting factor” (not user configurable). The default of 64 is chosen so that it takes significantly longer than the half-life of the exponential averaging algorithm to switch to the smaller estimated frame size. Decreasing the frame size estimation factor makes this convergence occur proportionately faster and may result in an unstable system; increasing it increases stability but may slow down rerouting if there are a lot of very small frames in the traffic mix. 
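The smoothing itself is simple to sketch. The following Python fragment shows the general form of an exponentially smoothed average with the half-life expressed as a parameter; the sample interval and half-life values are placeholders for illustration, not the internal constants quoted above.

import math

# Generic exponentially smoothed average of a sampled statistic (for example,
# a flow's measured data rate). half_life_s is the time over which an old
# sample loses half its influence; the values below are placeholders.

def smoothed_series(samples, sample_interval_s: float, half_life_s: float):
    alpha = 1.0 - math.exp(-math.log(2) * sample_interval_s / half_life_s)
    avg = None
    for x in samples:
        avg = x if avg is None else avg + alpha * (x - avg)
        yield avg

rates = [0.1, 0.1, 1.8, 1.8, 0.1, 0.1, 0.1]          # Gbit/sec samples
for v in smoothed_series(rates, sample_interval_s=5.0, half_life_s=30.0):
    print(round(v, 3))

A short burst decays gradually out of the average, which is why a change in traffic must be sustained before the rerouting logic reacts to it.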
Summary of Open Trunking Parameters The following table summarizes the parameters used for Open Trunking. Note that only parameters with a "Yes" in the "User Set" column can be changed by an EFCM, CLI, or SANpilot/EFCM Basic user. Table 2. Open Trunking parameters
• Basic averaging time. User set: No. Default: 5575 ms. What it affects: the speed at which statistics reflect changes. Comments: internally, the load-balancing period should be adjusted with it.
• Flow reroute latency. User set: No. Default: 60 sec. What it affects: the minimum time between reroutes for a flow. Comments: every reroute has a chance of misordering frames.
• Sample tree sample time. User set: No. Default: 100 ms. What it affects: flow data rate accuracy and CTP processor loading due to flow statistics (the most CPU-intensive operation Open Trunking performs). Comments: only increase (internally) if CTP processor loading is too high.
• Load-balancing period. User set: No. Default: 45 sec. What it affects: the rate at which flows are checked for rerouting. Comments: consider (internally) adjusting the basic averaging time with it.
• Failover disable time. User set: No. Default: 60 sec. What it affects: the time rerouting is disabled after a failover. Comments: internally, adjust if rerouting instability is seen on failover.
• Offloading bandwidth consumption threshold. User set: Yes. Default: the default offloading bandwidth consumption threshold for the ISL capacity. What it affects: the loading level above which rerouting is considered for the ISL; it may be set individually for each ISL, or the user may select use of the defaults per ISL. Comments: adjust down for very latency-intensive applications; adjust together with the onloading bandwidth consumption threshold.
• Default offloading bandwidth consumption threshold (1 G, 2 G, and 4 G). User set: No. Default: 66% (1G), 75% (2G), 75% (4G). What it affects: the values that 1, 2, and 4 Gbit/sec ISLs respectively use as defaults (internal). Comments: see above.
• Onloading bandwidth consumption threshold (1 G, 2 G, and 4 G). User set: No. Default: 66% (1G), 75% (2G), 75% (4G). What it affects: the loading level below which an ISL is eligible to have traffic rerouted to it. Comments: internally, adjust along with the offloading bandwidth consumption threshold.
• Frame size estimation weighting. User set: No. Default: 64. What it affects: the speed at which an extremely oversubscribed ISL switches from using the maximum frame size to using the average transmit frame size in the flow data rate computation.
• Low BB credit threshold. User set: Yes. Default: 50%. What it affects: the threshold on the percentage of sample time during which the ISL has experienced a 0 (zero) BB credit condition. Comments: reroutes occur from the ISL if the overall load balance can be improved by doing so, and/or reroutes are prevented to this ISL when the threshold is exceeded.
Fabric Tuning Using Open Trunking The default configuration for Open Trunking event generation is "disabled." When the feature is enabled, it is recommended that these events be left disabled unless the user is explicitly monitoring Open Trunking behavior with an eye to tuning or optimizing the fabric. When enabled, these events will indicate detected conditions that can be improved or alleviated by examining the traffic patterns through the entire fabric. The interpretations given to the two sets of events related to Open Trunking are as follows: • Bandwidth consumption threshold exceeded on an ISL Explanation: Open Trunking firmware has detected that there is an ISL that has Fibre Channel traffic that exceeds the configured offload threshold. Action: Review the fabric topology using the switch topology guidelines. This can be relieved by adding parallel ISLs, increasing the link speed of the ISL, or by moving devices to different locations in the fabric to avoid this condition.
• Low BB credit threshold exceeded on an ISL Explanation: Open Trunking has detected a transmit ISL that has no credits for data transmission for a portion of time greater than the low BB credit threshold. This is a possible indication of heavy loading or oversubscription in the fabric downstream from the exit port if the available bandwidth usage on the ISL is not close to 100 percent. Action: Review the fabric topology using the switch topology guidelines. This can be relieved downstream by adding parallel ISLs, increasing the link speed of the ISL, or by moving devices to different locations in the fabric to avoid this condition. If this condition is brief and rare or if the reporting ISL has close to 100 percent throughput, this may be ignored. Manually increasing this configured threshold toward 100 percent when close to 100 percent bandwidth is being utilized will reduce the frequency of these events. Slow-draining downstream non-ISL devices may also be a cause for this event, and adding ISLs will not alleviate the occurrence of this event for those situations. Open Trunking Enhancements Rerouting has impacts: Whenever traffic is rerouted as a result of Open Trunking or other infrequent fabric situations such as the loss of an ISL, there is a possibility of out-of-order frame delivery. Therefore the algorithms used by Open Trunking are extremely cautious and are based on long-term stable usage statistics. A significant change in traffic patterns must last for about a minute or longer, depending on the situation, before Open Trunking can be expected to react to it. Significant improvements to Open Trunking were implemented in M-EOS 6.0 to reduce the likelihood of a reroute causing frames to arrive out of order at N_Ports. Some devices react adversely when they receive an Out-Of-Order Frame (OOOF), sometimes triggering retry processing of the FCP Exchange that can take as long as a minute. These out-of-order frames caused by Open Trunking reroutes can trigger occasional BF2D, xx1F, and AB3E errors on EMC Symmetrix FC adapters. Some Open Systems hosts will log temporary disk access types of events, and Windows hosts attached to EMC CLARiiON® arrays might see Event 11s. In FICON environments, an InterFace Control Check (IFCC) error can result from an out-of-order frame. Note, however, that discarded frames could also trigger many of these same problems. Open Trunking is specifically designed to alleviate congestion conditions that often cause discarded frames. Cascaded FICON in a Brocade environment 30 of 40 MAINFRAME Technical Brief In order to reduce the likelihood of these types of host or array problems, significant resources were invested to reducing the occurrence of OOOFs when Open Trunking determines a reroute is necessary. M-EOS 6.0 and later includes optimizations to allow the original path to drain remaining queued frames prior to starting transmission on the new rerouted path. In addition to a small delay to allow frames to drain from the current egress port, a small additional delay is included to allow the downstream switch some time to deliver the frames it has already received. This prevents OOOFs resulting from unknown downstream congestion when the re-route occurs. Even with the improvements to reduce OOOF delivery by temporarily delaying transmission of frames on the new path, it is strongly recommended that Open Trunking be used in single-hop configurations to reduce incremental increase in the possibility of an OOOF. 
Fabric configurations with more than one hop are acceptable as long as the hop count between data paths (N_Port to N_Port) is limited to one. Through extensive testing, it has been determined that the delay imposed by allowing the original path to drain does not significantly impede performance. In fact, the net delay introduced with these enhancements is typically less than 10 ms. In most situations, the congestion resolved by the reroute typically would have caused much longer frame delivery delays than the new Open Trunking behavior introduces. Product testing also shows that the new enhancements have virtually eliminated the occurrence of OOOFs, even in extremely congested and dynamic fabric conditions. Open Trunking Summary Brocade M-Series Open Trunking feature automatically balances performance throughout a FICON cascaded storage network, while minimizing storage administrator involvement in that management. Brocade Open Trunking could be characterized as “load balancing” in that it detects conditions where FICON is experiencing congestion on a single cascaded link and checks to see if there are other uncongested cascaded links available. If it can relieve the congestion, it permanently shifts some of the traffic, essentially "balancing" the loads across all available cascaded links over time. The term "dynamic load-balancing" is often used to reflect the fact that it continuously monitors for cascaded link congestion and can automatically rebalance the flows across these fabric links at any time, adjusting as traffic patterns change and continuously balancing loads as conditions dictate. Although Brocade calls this intelligent fabric management scheme “Open Trunking,” it is dissimilar to the traditional “hardware trunking,” because cascaded links are not grouped into "trunks" and data flows are not interleaved across those trunks of cascaded links. There are benefits to hardware trunking for certain aspects of fabric behavior, but there are also drawbacks. There are restrictions for which ports can be trunked and simultaneous over-congestion on these hardware trunked ports must be constantly monitored. Open Trunking is much more flexible in this regard, because it sets no limit to the number of ports that can be used for an “open trunk” group. Regarding the interleaving capability of hardware trunks, unless you experience cascaded link congestion, you do not want to "balance" an ISL. No benefit is derived from frame interleaving over cascaded links that are not suffering from congestion. Open Trunking is invisible to all mainframe applications and requires no user interaction; it is truly “automatic”. So FSPF can and should do your initial cascaded link routing automatically, and then Open Trunking immediately and automatically solves cascaded link congestion problems that occur, when they occur and without your involvement in the Brocade M-Series FICON environment. Cascaded FICON in a Brocade environment 31 of 40 MAINFRAME Technical Brief Controlling FICON Cascaded Links in More Demanding Environments Sometimes customers have requirements to control how FICON cascaded links are used explicitly. For example, deploying an intermixed FICON and FC infrastructure might create one of these more rigid environments. “Fat Pipe” high-speed cascaded links might also need to be managed to service one specific environment and not another. 
So you need to understand what mechanisms are available to you under these circumstances and others situation that you might encounter specific to your enterprise. FSPF and some variation of trunking can still be used to automate as much cascaded link decongestion as possible. But they must be influenced by other tools to give us greater manual control over the complete infrastructure. Two additional tools are available to influence the allocation and decongestion of cascaded links—Preferred Path and Prohibit Path. Preferred Path on M-Series FICON Switches Preferred Path is an optional feature for Brocade M-Series switches that allows you to influence the route of data traffic when it traverses multiple switches in a fabric. Using Preferred Path and your in-depth knowledge of your own I/O environment, you can define routes across a fabric and specify your preference regarding the assignment of CHPIDs and storage ports to specific cascaded links. If more than one cascaded link (ISL) connects switches in a fabric, you can specify a cascaded link preference for a particular flow. The data path consists of the source port of switch being configured, the exit port of that switch, and the domain ID of the destination switch, as shown in Figure 10. Each switch must be configured for its part of the desired path to achieve optimal performance. You may need to configure Preferred Paths for all switches along the desired path for proper multi-hop Preferred Path operation. Preferred Path can be configured using either CLI or Brocade EFCM. Preferred Path allows you to control which cascaded links certain applications use based on the ports to which the channels and devices are connected, while allowing failover to alternate cascaded links should the preferred path fail. Figure 10. Specifying a Preferred Path using Brocade EFCM Cascaded FICON in a Brocade environment 32 of 40 MAINFRAME Technical Brief If a Preferred Path fails, FSPF assigns a functional cascaded link to the F_Port flows on the failed connection. When the Preferred Path is returned to service, its original F_Port flows are re-instated. Any F_Ports that are not assigned to a cascaded link via Preferred Path are assigned to a cascaded link using the FC SPF process. NOTE: If FSPF becomes involved in allocating cascaded links, then these flows will co-exist with the flows created using Preferred Path on the same cascaded links. This could create congestion on one or more cascaded links if it is not taken into consideration. The limitations of Preferred Path are as follows: • Open Trunking does not manage Preferred Path data flows, so Open Trunking cannot do any automatic decongestion of Preferred Path links. • Preferred Path cannot be used to ensure a FICON Cascaded 1-hop environment. This is due to the fact that Preferred Path will fail over to another random cascaded link path if its primary Preferred Path fails, which could lead to a multi-hop FICON fabric. Use port blocking to ensure that FICON pathing contains only a single hop. • The Brocade 48000 and DCX do not currently support "preferred pathing" or "blocked E_ports" for cascaded FICON directors. Brocade has a command called uRouteConfig, which allows you to set up static routes for specific ports on the switch. But uRouteConfig requires aptpolicy=1 (port-based routing). (Note that IBM requires port-based routing for Brocade 48000 and DCX FICON Cascade Mode.) 
However, uRouteConfig is not supported when the chassis configuration is chassisconfig 5, which is the chassis configuration for the Brocade 48000 and DCX. Here are some best practices for using Preferred Path: • If you are going to use Preferred Path, then assign every F_Port in the fabric to a cascaded link using Preferred Path and do not let FSPF do any cascaded link assignments. • Disk and tape should not share the same cascaded links. FICON and FCP traffic should also be separated across the cascaded links. Use Preferred Path in these cases to direct these frame traffic flows to different cascaded links. Prohibit Paths A FICON frame and an FCP frame are essentially the same with the exception of the data payload the frame is carrying. So, technically speaking, a cascaded link (ISL) can carry both FICON and FCP frames consecutively with no problems. But many customers want to separate FICON network traffic from SAN network traffic (and perhaps also from disk replication traffic), and then make sure that these different flows of network traffic remain independent and uncongested. Keep in mind that creating these separate network data flows across a fabric is a business decision and not a technical decision. But if it is a business requirement, there are technical means to do it. If you run systems automation and have implemented the function known as "I/O operations (IO-Ops)," then from the MVS or z/OS console you can use in-band FICON commands that allow you to block or unblock ports on any FICON switching device. If you do not use IO-Ops, then by utilizing EFCM you can configure the PDCMs by using the address configuration matrix to block or unblock ports. In that case, you use EFCM management routines rather than MVS or z/OS procedures to do this blocking and unblocking. First, block all SAN ports on a switching device in the fabric from transferring frames to all of the FICON ports on the switching device. Then, block all FICON ports on a switching device in the fabric from transferring frames to all of the SAN ports on the switching device. This completely stops the flow of frame traffic from the blocked ports as both source and destination ports. It is done at the hardware level, so regardless of how you zone a port or prefer a path, a frame will NEVER pass to the blocked port from the source port unless and until you revise the PDCM blocking configuration. And you can do exactly this same blocking for the cascaded link ports. For example, with 10 cascaded links, you can use 4 links for SAN FC traffic only and the other 6 for FICON traffic only. Choose 6 of the cascaded links for FICON only and block all SAN ports from using those cascaded links, that is, block those 6 cascaded link ports from connecting to all of the SAN ports. Then perform the same procedure, but this time block the 4 remaining cascaded links away from the FICON ports. This creates two separate flows of network traffic that will never intermingle. At this point, you should consider implementing some form of trunking to manage both of these now physically separate network data flows independently, decongesting cascaded links but only within the set of cascaded links assigned to that specific flow. PDCM port blocking is the strongest method you can use to control frame flow through a switching device. For that reason you should be very careful when you use it, since it affects the FSPF, Preferred Path, and trunking algorithms.
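The blocking just described is easy to picture as a matrix of permitted and prohibited port pairs. The following Python sketch models the 10-cascaded-link example in that form; the port numbers and group assignments are hypothetical, and this is only an illustration of the bookkeeping, not of any switch's actual PDCM implementation.

# Illustrative model of PDCM-style prohibit bookkeeping for the example above:
# 6 ISL ports reserved for FICON traffic, 4 for SAN (FCP) traffic.
# Port numbers and group membership are hypothetical.

ficon_ports = {0, 1, 2, 3}             # FICON channel/CU F_Ports
san_ports   = {4, 5, 6}                # open-systems FCP F_Ports
ficon_isls  = {16, 17, 18, 19, 20, 21}
san_isls    = {22, 23, 24, 25}

prohibited = set()                      # unordered pairs that may never exchange frames

def prohibit(group_a, group_b):
    for a in group_a:
        for b in group_b:
            prohibited.add(frozenset((a, b)))

prohibit(san_ports, ficon_isls)         # SAN ports may not use the FICON ISLs
prohibit(ficon_ports, san_isls)         # FICON ports may not use the SAN ISLs
prohibit(ficon_ports, san_ports)        # and the two device groups never talk directly

def allowed(src, dst):
    return frozenset((src, dst)) not in prohibited

print(allowed(0, 16))   # True  - FICON port to FICON ISL
print(allowed(0, 22))   # False - FICON port blocked from SAN ISL

Once both prohibit sets are in place, no route, zone, or path preference can cause a frame to cross between the FICON and SAN groups.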
FSPF cannot assign a blocked cascaded link as a route across the network for ports that are blocked from using it. You can configure a blocked cascaded link as a Preferred Path, but no frames will ever be sent to it from the ports that are blocked. Open Trunking cannot move work from a congested cascaded link to an uncongested cascaded link if that uncongested link is blocked from connecting to the port with the workflow that would be moved. NOTE: If you are experiencing a problem getting a switching port to connect to another switching port, it might be a PDCM hardware-blocked port. A blocked port can be very difficult to diagnose and quickly troubleshoot. The only way you will know is to check the PDCM addressing matrix using FICON CUP or EFCM. Figure 11. Prohibit path and the PDCM addressing matrix Traffic Isolation Zones on B-Series FICON Switches Using Traffic Isolation (TI) Zones, you can provision certain E_Ports to carry only traffic flowing from a specific set of source ports. This allows you to control the flow of interswitch traffic, for example, to dedicate an ISL to high-priority, host-to-target traffic. Or it might be used to force high-volume (but lower-priority) traffic onto a given ISL to limit the effect of this high traffic pattern on the fabric at large. In either case a TI zone can be created that contains the set of N_Ports and the set of E_Ports to use for specific traffic flows. When a TI zone has been set up, the traffic entering a switch from one of the given set of ports (E_Ports or N_Ports) uses only those E_Ports defined within that zone for traffic to another domain. But if there is no way to reach a destination other than by using an E_Port that is not part of that zone, that E_Port is still used to carry traffic from and to a device in its group. This is the default behavior of TI zones, unless it is overridden by creating a TI zone with failover disabled. In a TI zone with failover disabled, when any of the E_Ports comprising the TI zone go down, an E_Port that does not belong to the TI zone will not be used to carry traffic, and the traffic isolation path is deemed broken. Similarly, the E_Port belonging to a particular traffic isolation zone does not carry any other traffic belonging to devices outside the zone unless that E_Port is the only way to reach a given domain. The TI zones appear in the defined zone configuration only and not in the effective zone configuration. A TI zone is used only for providing traffic isolation, and zone enforcement is based on the regular user-configured zones. Consider the following when you are thinking about using TI zones: • TI zones are supported on Condor and Condor 2 (ASIC) Brocade FICON switches running in Brocade native mode. TI zones cannot be used in FICON environments running in interop or McDATA fabric mode. • TI zones are not defined in an FC standard and are unique to Brocade. However, their design conforms to all underlying FC standards, in the same way as base Fabric OS. • TI zones are not backward compatible, so traffic isolation is not supported in FICON environments with switches running firmware versions earlier than FOS 6.0.0. However, TI zones in such a fabric do not disrupt fabric operation in switches running older firmware versions. You must create a TI zone with members belonging to FICON switches that run firmware version 6.0.0 or later.
When a zone is marked as a TI zone, the fabric attempts to isolate all inter-switch traffic entering a switch from a member of that zone to only those E_Ports that have been included in the zone. In other words, the domain routes for any of the members (N_Port or E_Port) to the domains of other N_Port members of the zone are set to use an E_Port included in the zone, if one exists. Such domain routes are used only if they are on a lowest-cost path to the target domain (that is, the FSPF routing rules continue to be obeyed). The fabric will also attempt to exclude traffic from other TI zones from using E_Ports in a different TI zone. This traffic shaping is a "best effort" facility that does its work only as long as doing so does not violate the FSPF "lowest cost route" rules. This means that traffic from one TI zone may have to share E_Ports with other TI zones and devices when no equal-cost routes can be found using a "preferred" E_Port. And if a "preferred" E_Port fails, traffic fails over to a "non-preferred" E_Port if no preferred E_Ports offer a lowest-cost route to the target domain. Similarly, a non-TI device's traffic uses an E_Port from a TI zone if no equal-cost alternatives exist. As mentioned earlier, TI zones do not appear in the effective zone set for a number of reasons. First, the members are defined using D,I notation. Doing so allows the routing controls to be determined at the time the zones are put into effect, eliminating the significant overhead that would be required if WWNs were used and the routing controls were discovered incrementally, as devices come online. But the use of D,I in TI zones would cause issues on switches running versions of FOS earlier than 6.0.0 if included in the effective set, setting all zones referencing devices included in a TI zone to Session mode based on mixed-mode zoning. Additionally, the intent of a TI zone is to control routing of frames among the members; it is not intended to "zone them all together." The Zone daemon (zoned) extracts all TI zones from the defined zone database whenever a change is made to the defined database and pushes them to the nsd for application. When a TI zone is being activated, the nsd in each switch determines whether any of the routing preferences for that zone apply to the local switch. This determination must include the appropriate screening for Administrative Domain (AD) membership if ADs are being used. If any valid TI zones are found that apply to members on this switch, the nsd in turn pushes the TI zone to fspfd. The FSPF daemon (fspfd) is responsible for applying the routing controls specified by TI zones. The fspfd applies those preferences using a new set of APIs (that is, ioctls) provided by the kernel routing functions. Figure 12. An example of the use of TI zones Consider the following TI zones created for the fabric shown in Figure 12: zone --create -t ti "redzone" -e "1,1; 2,2; 2,4; 2,6; 3,8; 4,5" -n "1,8; 1,9; 3,6; 3,7; 4,8; 4,9; 4,7" zone --create -t ti "bluezone" -e "1,10; 2,10; 2,20; 3,12; 4,20" -n "1,20; 3,22; 4,10" The TI zone redzone creates dedicated paths from Domains 1, 3, and 4 through the core switch Domain 2. All traffic entering Domain 1 from device ports 8 and 9 is routed through port 1, regardless of which domain it is going to. And no traffic coming from other ports in Domain 1 uses port 1, again regardless of which domain it is going to.
Similarly, any traffic entering Domain 2 from port 2 is routed only to port 8 or 6 when going to Domains 3 or 4 respectively. And port 2 is used solely for traffic coming from ports 4 or 6 (the other redzone E_Ports in Domain 2). Each TI zone is interpreted by each switch and each switch considers only the routing required for its local ports. No consideration is given to the overall topology and to whether the TI zones accurately provide dedicated paths through the whole fabric. For example, the TI zone called “bluezone” creates a dedicated path between the two blue devices on Domains 1 and 3 (port 20, 22). However, a misconfiguration of Domain 4 will result in port 20 being used only for traffic coming from the device on port 10 (that is, a dedicated E_Port for outbound traffic), but that traffic uses only the “black” E_Ports to go to Domains 3 or 1. Similarly, all blue traffic coming into Domain 2 goes to Domain 4 through one of the 44 – 50 ports, since no blue E_Port has been configured in Domain 2 that connects to Domain 4. Nothing fatal will occur, but the results may not meet expectations. The correct configuration would have included 3,44 in the E_Port list. Cascaded FICON in a Brocade environment 36 of 40 MAINFRAME Technical Brief TI Zones Best Practices A few general rules for Traffic Isolation zones: • An N_Port can be a member of only a single TI zone, because a port can have only one route to any specific domain. This “non-duplication” rule is enforced during zone creation and modification. If ADs are configured, this checking is done only against the current ADs Zone database. The zone --validate command checks against the defined database of all ADs. • An E_Port can be a member of only a single TI zone. Since an E_Port can be a source port (that is, for incoming frames) as well as a destination, the same “one route to a specific domain” rule applies to E_Ports and forces this limitation. The same checking is done as described for N_Ports. • If multiple E_Ports are configured that are on the lowest-cost route to a domain, the various source ports for that zone are load balanced across the specified E_Ports. • A TI zone provides exclusive access to E_Ports (for outbound traffic) as long as other equal-cost, nondedicated E_Ports exist. Only source ports included in the zone are routed to zone E_Ports as long as other paths exist. If no other paths exist, the dedicated E_Ports are used for other traffic. Note that when this occurs, all traffic routed to the “dedicated” E_Port uses the dedicated path through switches, regardless of which ports are the source. • No port can appear in a TI zone and an ISL Binding zone. A few more rules if ADs are in effect: • If used within an AD, the E_Ports specified in a TI zone must be in that AD’s device list, enforced during zone creation and modification. • Since TI zones must use D,I notation, the AD’s device list must be declared using D,I for ports that are to be used in such zones, enforced during zone creation and modification. • Take care if you are using TI zones for shared ports (E_Ports or N_Ports) because of the limitation that a given port can appear in only one TI zone. Conflicting members across ADs can be detected by the use of zone –validate, and best practice dictates that such situations not be allowed to persist. (It might be best not to allow ISL Bind or TI zones to reference a shared N_Port or E_Port, since one AD administrator can then interfere with actions of another AD administrator. But this may be hard to do.) 
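Because the single-membership rules above are easy to violate as zones accumulate, it can help to check a proposed TI zone set before activating it. The sketch below is a hypothetical offline checker written in Python, not the zone --validate command itself; it simply reuses the redzone/bluezone members from the earlier example.

# Hypothetical offline check of the TI zone single-membership rules described
# above: no N_Port or E_Port (in D,I notation) may appear in more than one
# TI zone. Zone contents are taken from the redzone/bluezone example.

ti_zones = {
    "redzone":  {"e_ports": {"1,1", "2,2", "2,4", "2,6", "3,8", "4,5"},
                 "n_ports": {"1,8", "1,9", "3,6", "3,7", "4,8", "4,9", "4,7"}},
    "bluezone": {"e_ports": {"1,10", "2,10", "2,20", "3,12", "4,20"},
                 "n_ports": {"1,20", "3,22", "4,10"}},
}

def duplicate_members(zones):
    seen = {}           # member -> first zone that used it
    conflicts = []
    for name, z in zones.items():
        for member in z["e_ports"] | z["n_ports"]:
            if member in seen and seen[member] != name:
                conflicts.append((member, seen[member], name))
            seen.setdefault(member, name)
    return conflicts

conflicts = duplicate_members(ti_zones)
for member, first, second in conflicts:
    print(f"port {member} appears in both {first} and {second}")
print("no conflicts" if not conflicts else "fix before activation")

In this case the two example zones share no D,I members, so the check passes.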
Consider an example of implementing FICON and FCP (SAN) intermix on the same fabric(s) in order to more rigidly control FICON and cascaded links in this type of environment. The challenge in mixing FCP and FICON comes from the management differences between the two protocols, primarily the mechanism for controlling device communication. Because FICON and FCP are FC-4 protocols, they do not affect the actual switching of frames; the differences become relevant only when the user wants to control the scope of the switching through zoning or connectivity control. Name Server zoning, used by FCP devices, provides fabric-wide connection control. By contrast, PDCM (Prohibit Dynamic Connectivity Mask) connectivity control, typically used by FICON devices, provides switch-wide connection control.

Mainframe and storage vendors strongly recommend that, if you are implementing intermix, you block the transfer of any and all frames from FICON switch ports to all SAN-connected ports, and likewise block the transfer of any and all frames from SAN switch ports to all FICON-connected ports. But what about the cascaded links (called ISLs in the SAN world)? Can they be shared by both FICON and FCP?

SUMMARY

For the mainframe customer, FICON cascading offers new capabilities to help meet the requirements of today's data center. Your challenge is to maintain performance across the FICON fabric's cascaded links to ensure the highest possible level of data availability and application performance at the lowest possible cost.

APPENDIX: FIBRE CHANNEL CLASS 4 CLASS OF SERVICE (COS)

Some initial QoS efforts were made in the T11 standards group to develop a QoS standard for FC. It was written as a class of service, and it was very complex. Consultants worked with the major switch vendors to develop a set of proposals that affected several different standards. A summary of Class 4 follows; it was never formally adopted or implemented. The discussion of Class 4 is included to reinforce the point that QoS is a complex topic, not just a marketing buzzword.

A Fibre Channel class of service can be defined as a frame delivery scheme exhibiting a specified set of delivery characteristics and attributes. ESCON and FICON are both part of the FC standard and class of service specifications.

• Class 1. A class of service providing a dedicated connection between two ports with confirmed delivery or notification of non-delivery.
• Class 2. A class of service providing a frame switching service between two ports with confirmed delivery or notification of non-deliverability.
• Class 3. A class of service providing a frame switching datagram service between two ports, or a multicast service between a multicast originator and one or more multicast recipients.
• Class 4. A class of service providing a fractional-bandwidth virtual circuit between two ports with confirmed delivery or notification of non-deliverability.

Class 4 is frequently referred to as a "virtual circuit" class of service. It provides better quality-of-service guarantees for bandwidth and latency than Class 2 or Class 3 allow, while providing more flexibility than Class 1 allows. Similar to Class 1, it is a type of dedicated connection service.
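The delivery characteristics listed above can be captured in a small reference structure. The following Python sketch is purely illustrative; the field names and the one-line attribute descriptions are paraphrases of the definitions above, not text from the FC standards.

    # Illustrative quick-reference table of the FC classes of service described above.

    from dataclasses import dataclass

    @dataclass
    class ServiceClass:
        name: str
        delivery: str        # how frames are moved
        confirmed: bool      # delivery confirmed / non-delivery notified
        bandwidth: str       # how link bandwidth is committed

    FC_CLASSES = [
        ServiceClass("Class 1", "dedicated connection between two ports",    True,  "entire path reserved"),
        ServiceClass("Class 2", "frame switching between two ports",         True,  "shared, no reservation"),
        ServiceClass("Class 3", "frame switching datagram / multicast",      False, "shared, no reservation"),
        ServiceClass("Class 4", "fractional-bandwidth virtual circuit (VC)", True,  "requested fraction reserved per VC"),
    ]

    for c in FC_CLASSES:
        print(f"{c.name}: {c.delivery}; confirmed={c.confirmed}; bandwidth={c.bandwidth}")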
Class 4 is a connection-oriented class of service with confirmation of delivery (acknowledgement) or notification that a frame could not be processed (reject). Class 4 provides for the allocation of a fraction of the bandwidth on a path between two node ports and guarantees latency within negotiated QoS bounds. It provides a virtual circuit between a pair of node ports with guaranteed bandwidth and latency, in addition to the confirmation of delivery or notification of non-deliverability of frames. For the duration of the Class 4 virtual circuit, all resources necessary to provide that bandwidth are reserved for that virtual circuit, which is why it is frequently referred to as a "virtual circuit class of service." Unlike Class 1, which reserves the entire bandwidth of the path, Class 4 supports the allocation of a requested amount of bandwidth. The bandwidth in each direction is divided among up to 254 Virtual Circuit (VC) connections to other N_Ports on the fabric. When the virtual circuits are established, resources are reserved for the subsequent delivery of Class 4 frames. Like Class 1, Class 4 provides in-order delivery of frames.

A Class 4 circuit includes at least one VC in each direction, with a set of QoS parameters for each VC. These QoS parameters include guaranteed transmission and reception bandwidths and/or guaranteed maximum latencies in each direction across the fabric. When the request is made to establish the virtual circuit, the request specifies the bandwidth required as well as the amount of latency or frame jitter that is acceptable.

Bandwidth and latency guarantees for Class 4 virtual circuits are managed by the QoS Facilitator (QoSF), a server within the fabric. The QoSF resides at the well-known address x'FF FFF9' and is used to negotiate, manage, and maintain the QoS for each VC and to assure consistency among all the VCs set up across the full fabric to all ports. The QoSF is an optional service defined by the Fibre Channel standards specifically to support Class 4 service. Because the QoSF manages bandwidth through the fabric, it must be provided by a Class 4-capable switch.

At the time the virtual circuit is established, the route is chosen and a circuit is created. All frames associated with the Class 4 virtual circuit are routed via that circuit, ensuring in-order frame delivery within a Class 4 virtual circuit. In addition, because the route is fixed for the duration of the circuit, the delivery latency is deterministic. Class 4 defines two states for a VC: a "dormant" state, in which the VC is set up at the N_Ports and through the fabric but no data is flowing, and a "live" state, in which data is actively flowing.

To set up a Class 4 virtual circuit, the Circuit Initiator (CTI) sends a QoS Request (QoSR) extended link service command to the QoSF. The QoSF verifies that the fabric has the available transmission resources to satisfy the requested QoS parameters, and then forwards the request to the Circuit Recipient (CTR). If the fabric and the recipient can both provide the requested QoS, the request is accepted and transmission can start in both directions. If the requested QoS parameters cannot be met, the request is rejected.

In Class 4, the fabric manages the flow of frames between node ports and the fabric by using the virtual-circuit flow control mechanism. This is a buffer-to-buffer flow control mechanism similar to the R_RDY FC flow control mechanism.
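The setup negotiation just described can be sketched in a few lines of Python. This is an illustrative model only, not the actual QoSR extended link service encoding: the class name, method name, per-path capacity bookkeeping, and the numeric capacities are all assumptions chosen for the example.

    # Minimal sketch (illustrative only) of the Class 4 setup negotiation:
    # the circuit initiator asks the QoSF for a VC, the QoSF checks fabric
    # resources, and the circuit recipient must also accept.

    class QoSFacilitator:
        def __init__(self, path_capacity_mbps):
            self.capacity = path_capacity_mbps   # usable bandwidth per direction (assumed)
            self.reserved = 0.0                  # bandwidth already committed to live/dormant VCs

        def request_vc(self, bw_mbps, max_latency_ms, recipient_accepts):
            """Return True (accept) if the fabric can reserve bw_mbps within the
            latency bound and the circuit recipient also accepts; else reject."""
            fabric_ok = (self.reserved + bw_mbps) <= self.capacity and max_latency_ms > 0
            if fabric_ok and recipient_accepts:
                self.reserved += bw_mbps         # resources stay reserved for the VC's duration
                return True
            return False

    qosf = QoSFacilitator(path_capacity_mbps=200.0)   # simplified stand-in for a path's bandwidth
    print(qosf.request_vc(bw_mbps=50.0,  max_latency_ms=5.0, recipient_accepts=True))   # True: accepted
    print(qosf.request_vc(bw_mbps=180.0, max_latency_ms=5.0, recipient_accepts=True))   # False: insufficient bandwidth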
Virtual-circuit flow control uses the VC ready (VC_RDY) ordered set. VC_RDY resembles the FC R_RDY, but it contains a virtual circuit identifier byte in the primitive signal, indicating which VC is being given the buffer-to-buffer credit. ISLs must also support virtual-circuit flow control in order to manage the flow of Class 4 frames between switches. Each VC_RDY indicates to the N_Port that a single Class 4 frame is needed from the N_Port if it wishes to maintain the requested bandwidth, and each VC_RDY identifies which virtual circuit is given credit to send another frame. The fabric controls the bandwidth available to each virtual circuit via the frequency of VC_RDY transmission for that circuit: one VC_RDY per second is permission to send 1 frame per second (2 kilobytes per second if 2 KB frame payloads are used), and one thousand VC_RDYs per second is permission to send 1,000 frames per second (2 megabytes per second if 2 KB frame payloads are used). The fabric is expected to make any unused bandwidth available for other live Class 4 circuits and for Class 2 or 3 frames, so VC_RDY does allow other frames to be sent from the N_Port.

There are potential scalability difficulties associated with Class 4 service, since the fabric must negotiate resource allocation across each of the 254 possible VCs on each N_Port. Also, Fabric Busy (F_BSY) is not allowed in Class 4: resources for delivery of Class 4 frames are reserved when the VC is established, and therefore the fabric must be able to deliver the frames.

Class 4 is a very complex topic; for more detailed information, refer to Kembel's Fibre Channel Consultant series of textbooks. Because of this complexity, Class 4 was never fully adopted as a standard. Further work on it was stopped, and much of the language has been removed from the FC standard. FC-FS-2 letter ballot comment Editor-Late-002 reflected the results of surveying the community for interest in using and maintaining the specification for Class 4 service. Almost no interest was discovered, and it was agreed to resolve the comment by obsoleting all specifications for Class 4 service except the VC_RDY primitive, which is used by the FC-SW-x standard in a way that is unrelated to Class 4. Other mechanisms and models for QoS in FICON (FC) were therefore considered, such as the method used by InfiniBand.

© 2008 Brocade Communications Systems, Inc. All Rights Reserved. 07/08 GA-TB-017-01

Brocade, Fabric OS, File Lifecycle Manager, MyView, and StorageX are registered trademarks and the Brocade B-wing symbol, DCX, and SAN Health are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. All other brands, products, or service names are or may be trademarks or service marks of, and are used to identify, products or services of their respective owners.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.