A SIMULATION MODEL FOR STREAMING APPLICATIONS OVER A POWER-MANAGEABLE WIRELESS LINK

Andrea Acquaviva, Emanuele Lattanzi, Alessandro Bogliolo
ISTI - University of Urbino
Piazza della Repubblica 13, 61029 Urbino, Italy
E-mail: {acquaviva,lattanzi,bogliolo}@sti.uniurb.it

Luca Benini
DEIS - University of Bologna
Viale Risorgimento 2, 40136 Bologna, Italy
E-mail: lbenini@deis.unibo.it

KEYWORDS
Multimedia, Event-Oriented, Resource Management, Real-Time, Simulators.

ABSTRACT
In this work we introduce a hardware-validated simulation model for the exploration of real-time multimedia systems, where system components are modeled as interacting generalized semi-Markov processes (GSMPs). We apply the simulation model to explore the design space of a mobile client accessing streaming data through a wireless network. The model has been characterized and validated against power and performance measurements performed on an instrumented HP iPAQ with wireless LAN running an MPEG4 video application. We analyze the impact of the tuning parameters of the real-time multimedia system (buffer sizes, channel bandwidth, power management policy) on the trade-off between power consumption and QoS.

INTRODUCTION
One of the most critical challenges in designing wireless multimedia systems is to provide adequate quality of service (QoS) with optimal energy efficiency. In many cases, there are obvious trade-offs between quality of service (e.g., bandwidth, latency) and power consumption. To avoid poorly controlled QoS degradation, multimedia systems are often designed and managed in a conservative fashion, with little consideration for energy efficiency.

QoS modeling of wireless networked systems (Cali et al., 1998; Eshghi & Ekhakeem, 1998) is a mature and highly active discipline (refer to (MSWIM, 2002) for an up-to-date overview of the topic), and various approaches have been explored to enhance QoS-oriented models with power consumption models (Raghunathan et al., 2002; Sinha & Chandrakasan, 2001; Marculescu et al., 2001; Krashinski & Balakrishnan, 2002; Zorzi & Chockalingam, 1998). Most of the approaches explored in the past adopted a stochastic discrete event model (Lee & Sangiovanni-Vincentelli, 1998), where the system evolves in an enumerable set of time instants and transitions are randomized.

Our work starts from this widely adopted modeling framework and pushes it one step closer to practice. We present two original contributions. First, we developed, consistently with previous work, a detailed stochastic discrete-event model of a complex multimedia system. Our model includes all hardware and software components involved in the communication of multimedia data over a wireless channel: application software, transport and network stack, operating system drivers, power management software, network interface card hardware, wireless channel, base station hardware and software. Many of the model components are parameterized, in an effort to represent a large design space of different hardware and software configurations. Second, we have fully characterized the power and performance metrics in the model with experimental measurements on fully operational real-life hardware and software.

As a result, our model is detailed enough to represent a number of complex effects that impact power and quality of service in real-life multimedia systems, while at the same time our characterization flow is precise enough to obtain power and performance estimates that closely match the measured data. Our results demonstrate that the performance and energy consumption of a multimedia system are deeply influenced by a number of interacting hardware and software components, and that focusing only on a part of the system (e.g., the wireless channel or the operating system) could lead to serious design pitfalls.
MODELING POWER-MANAGEABLE REAL-TIME MULTIMEDIA SYSTEMS

At a high level of abstraction, electronic systems exhibit different operating modes and make transitions among them at discrete points in time. Hence, they can be suitably represented as discrete event systems (DES) (Cassandras, 1993). We model real-time power-manageable systems as DES composed of interacting state machines. Each component has a state structure that represents its operating modes and the transitions among them. State transitions are triggered by events that can be either generated within the component according to some distribution (internal events) or received as input (external events). The set of states of the model of each component may be finite, discrete, or continuous. Infinite state sets are modeled by means of a finite number of parameterized states.

The evolution of the system is described by specifying next-event and next-state functions for each component. The next-event function determines the next triggering event generated by the component, while the next-state function determines the destination state of the next transition. We model each component as a generalized semi-Markov process (GSMP) with non-deterministic next-event and next-state functions based on conditional residual-time distributions and on conditional next-state probabilities (Glynn, 1989).

Any GSMP component is composed of a state structure and a clock structure. The state structure is a Stateflow model that takes as input both internal events (generated by the local clock structure) and external events (coming from an input port) and generates a timeout value corresponding to the residual time of the next triggering event. The clock structure takes as input a timeout and a reset signal and generates an event when the timeout has elapsed. The interface of the component is specified by means of input/output ports, used to exchange both events and parameter values.
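The split between state structure and clock structure can be illustrated outside Simulink with a minimal Python sketch. It is not the authors' Stateflow implementation: the class `GSMPComponent`, the helper `run`, the event name `timeout` and the two-state `receiver` example are all hypothetical names chosen for illustration, and the clock is simply re-armed on every transition.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class GSMPComponent:
    """One component: a state structure plus a clock structure (illustrative)."""
    state: str
    # Next-event function: state -> sampler of the residual time of the
    # next internal event (states without a sampler never time out).
    residual_time: Dict[str, Callable[[random.Random], float]]
    # Next-state function: (state, event) -> {destination state: probability}.
    next_state: Dict[Tuple[str, str], Dict[str, float]]
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    timeout: Optional[float] = None  # absolute time at which the clock fires

    def arm_clock(self, now: float) -> None:
        """Reset the clock with a residual time drawn for the current state."""
        sampler = self.residual_time.get(self.state)
        self.timeout = None if sampler is None else now + sampler(self.rng)

    def handle(self, event: str, now: float) -> None:
        """Take the transition triggered by an internal or external event."""
        choices = self.next_state.get((self.state, event))
        if choices is None:
            return  # event not enabled in this state: ignore it
        states, probs = zip(*choices.items())
        self.state = self.rng.choices(states, weights=probs)[0]
        self.arm_clock(now)


def run(component, external_events, horizon):
    """Interleave external events (time, name) with clock timeouts."""
    trace = []
    pending = sorted(external_events)
    component.arm_clock(0.0)
    while True:
        t_ext = pending[0][0] if pending else float("inf")
        t_int = component.timeout if component.timeout is not None else float("inf")
        now = min(t_ext, t_int)
        if now > horizon or now == float("inf"):
            break
        if t_int <= t_ext:
            component.timeout = None  # the clock has fired
            component.handle("timeout", now)
        else:
            _, event = pending.pop(0)
            component.handle(event, now)
        trace.append((now, component.state))
    return trace


# Hypothetical two-state component: it becomes busy when a packet arrives
# and returns to waiting after a fixed residual time of 2 time units.
receiver = GSMPComponent(
    state="waiting",
    residual_time={"busy": lambda rng: 2.0},
    next_state={
        ("waiting", "packet"): {"busy": 1.0},
        ("busy", "timeout"): {"waiting": 1.0},
    },
)
trace = run(receiver, [(1.0, "packet"), (1.5, "packet"), (5.0, "packet")], horizon=10.0)
```

In this run the packet arriving at time 1.5 is ignored because no transition is enabled for it in the busy state, which is how an external event that a component cannot accept would show up in the trace.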
The interaction among multiple GSMP components is simply obtained by connecting their input/output ports, as shown in Figure 1. Additional output signals can be used to observe the system behavior, to evaluate cost/performance metrics and to exchange data among modules.

Figure 2. State diagram of the power-manageable NIC.

STREAMING VIDEO OVER AN IEEE 802.11b LINK

The video server runs on a notebook computer directly connected to an access point (AP) through a dedicated wired link. The AP acts as a bridge to the wireless network interface card (NIC) installed on an HP iPAQ palmtop. The palmtop runs the client MPEG4 decoder, which reads, buffers, processes and plays back at a constant rate (consumer rate) the video information read from the network. In order to meet real-time video constraints, the client application buffer should never be empty when the decoder looks for a frame to decode. If this happens, a deadline miss occurs.

Without any power control, the shape of the traffic on the wireless channel reflects the server transmission rate. If the 802.11b MAC-level protocol power management (LAN/MAN Standards Committee of the IEEE Computer Society, 1999) is enabled, the AP performs traffic reshaping by buffering incoming packets to allow the NIC to sleep for a given period of time. In this period, energy can be saved by the wireless interface. After the expiration of the sleeping period, the card wakes up and asks the AP whether there are accumulated packets to be received. If this is the case, the AP sends the backlog to the card. The traffic reshaping function performed by the AP imposes buffering at both the producer and the consumer side.

Model

The block diagram of the Simulink model of the system is shown in Figure 1.

Producer. The producer generates output events (representing network packets) according either to a given distribution of inter-arrival times, or to a trace of time-stamped packet information.
For each packet, three properties are either randomly generated or read from the trace: the size (in bytes), the frame it belongs to, and the total number of packets representing the same frame. Packets belonging to the same frame are generated as a burst, while packets belonging to different frames are generated at a rate depending on the application. Packet information is made available at the output ports, together with the event that represents the generation of a new packet.

Base station buffer. Packet events generated by the producer become input events for the buffer of the base station, which is explicitly represented in the model as a limited FIFO queue with a customizable size. The content of the queue is saved in memory as an array of packets, each with the corresponding information (size, frame number, packets per frame). If the producer tries to send a packet and the queue is full, a lost event is generated and the corresponding packet is discarded.

Base station. The base station (AP) gets the packets from the queue and sends them to the wireless channel. If the input buffer is empty or the receiver is not ready, the AP goes to a waiting state. If the receiver is sleeping because of DPM, the AP goes to an idle state, causing incoming packets to accumulate in the input buffer.

Wireless Channel. The wireless channel is represented by a block that receives input events representing incoming packets and generates, with a given latency, output events representing packet delivery. The wireless channel is bidirectional and it has a user-defined packet-loss probability. Lost packets are not delivered to the receiver. Notice that we use a simple channel model since we are interested in modeling the entire wireless system, rather than the wireless channel by itself. Channel latency, loss probability and bandwidth (implicitly modeled by the receiver) are sufficient to perform realistic system-level simulations.
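The behavior of the base station buffer and of the simple channel block can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the Simulink blocks themselves; the class names `BoundedPacketQueue` and `LossyChannel`, the dictionary layout of a packet and the `seed` argument are assumptions made for the example.

```python
import random
from collections import deque


class BoundedPacketQueue:
    """Limited FIFO queue: a push on a full queue counts as a lost event."""

    def __init__(self, capacity):
        self.capacity = capacity  # customizable size, in packets
        self.packets = deque()
        self.lost = 0             # number of lost events generated so far

    def push(self, packet):
        """Enqueue a packet; return False (and discard it) if the queue is full."""
        if len(self.packets) >= self.capacity:
            self.lost += 1
            return False
        self.packets.append(packet)
        return True

    def pop(self):
        """Return the oldest packet, or None if the queue is empty."""
        return self.packets.popleft() if self.packets else None


class LossyChannel:
    """Channel block with a fixed latency and a user-defined loss probability."""

    def __init__(self, latency, loss_probability, seed=0):
        self.latency = latency
        self.loss_probability = loss_probability
        self.rng = random.Random(seed)

    def send(self, packet, now):
        """Return (delivery_time, packet), or None if the packet is lost."""
        if self.rng.random() < self.loss_probability:
            return None  # lost packets are not delivered to the receiver
        return (now + self.latency, packet)


# Each packet carries the three properties produced by the producer.
packet = {"size": 1000, "frame": 0, "packets_per_frame": 3}
queue = BoundedPacketQueue(capacity=2)
accepted = [queue.push(dict(packet)) for _ in range(3)]  # third push overflows
delivery = LossyChannel(latency=0.5, loss_probability=0.0).send(queue.pop(), now=1.0)
```

Bandwidth does not appear in the channel class because, as in the model, it is accounted for by the time the receiver spends in its receiving state.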
Nevertheless, any channel model (Chong, 2003) can be easily embedded in this block to take into account complex error statistics.

Wireless Network Card. This is the most critical block of the system, since its power consumption directly affects the battery lifetime of the palmtop. The state diagram is reported in Figure 2. It consists of 5 states: idle, waking up, waiting, receiving, acknowledge. The card is normally waiting. When an incoming packet is detected, the card goes into the receiving state and generates a busy output event to indicate that it cannot receive more packets until the current one has been processed. The card remains in the receiving state for a time interval that is dynamically computed as the ratio between the size of the packet (including the payload and the protocol overhead) and the bandwidth of the channel. After each packet has been completely received, the NIC takes some time to process the packet at the MAC level and then sends an acknowledge back to the AP through the wireless channel. After the acknowledge packet has been sent, the NIC goes back to the waiting state until the next packet arrives. The timing diagram of the wireless channel is shown in Figure 4. When the power management is enabled, the card can go from the waiting state to a low-power idle state. In our model, this transition is triggered by an input event generated by an external power manager, described later in this section. Wake-up from idle is triggered by the power manager when a given sleep time has elapsed. Wake-up transitions may take a non-negligible amount of time and power, modeled by a separate state.

Figure 1. Simulink model of the streaming application over a wireless channel.

Power Manager. The power manager does not represent a hardware unit, but the model of the actual implementation of the power management protocol of the 802.11b standard (LAN/MAN Standards Committee of the IEEE Computer Society, 1999). Based on external settings (that can be decided by the user), the power manager generates output events (ShutDown and WakeUp) to notify the beginning and ending of sleeping periods. After the card has been woken up, it starts to receive the packets accumulated by the AP. After each received packet, the power manager resets a timeout counter. If the timeout expires before the reception of a new packet, the card is put again in the idle state by means of a ShutDown event.

Output Buffer. This block represents the buffering performed by the consumer. It can be used to model either the protocol stack buffer or the application buffer (if present). More levels of buffering can be added. In our case we decided to represent only the UDP protocol buffer, since the application buffer is usually larger and less critical.

Consumer. The consumer simulates a streaming application that reads packets from the output buffer at a given rate. Since real-time constraints impose a constant frame rate, but each frame may be encoded using a different number of packets, packet requests from the consumer do not arrive at a constant rate. Rather, the consumer decides how many packets to read within a frame period based on the information associated with the incoming packets. Frames that either arrive late with respect to the deadline or are incomplete because of a packet loss are discarded by the consumer.

EXPERIMENTAL RESULTS

To perform our characterization experiments, we used an HP iPAQ palmtop computer. With the iPAQ, we used two different wireless network interface cards, CISCO Aironet 350 Series (Cisco System, Cisco Aironet 350 Series Wireless LAN Adapters, 2003) and COMPAQ WL110 (HP WL110, 2003), hereafter denoted by CISCO and COMPAQ, respectively. The power consumption of the NICs was measured using a Sycard Card Extender that allowed us to monitor the time behavior of the supply current drawn by the cards.

Figure 3. Current profiles obtained under different workload and DPM conditions (panels: Idle - PM ON, Receiving - PM ON, Receiving - PM OFF, for CISCO and COMPAQ; horizontal axis: time (s)).

Characterization

We performed three sets of experiments to characterize: i) the power states of each NIC, ii) the effective bandwidth of the wireless channel and iii) the buffer size of the base station.

Power States. The power states of the NIC were characterized in terms of power consumption and transition time by looking at the power profiles under different workload conditions and DPM settings. Typical results are shown in Figure 3. The power consumption of CISCO is higher than that of COMPAQ in all states (sleep, idle, receiving, transmitting) but during wake up. Moreover, differently from COMPAQ, CISCO has a non-negligible wake-up time. As for performance, COMPAQ has a slower reaction to incoming packets, as shown in the right-most graphs of Figure 3, obtained while receiving the same burst of packets. Both cards stay idle for a very short time between packets; reception takes most of the time, while the higher peaks correspond to the acknowledge transmitted upon reception and processing of each packet. The acknowledge peaks produced by COMPAQ look delayed and wider with respect to those produced by CISCO, actually reducing the number of packets received in a time unit (e.g., in the 20ms time window shown in Figure 3, COMPAQ receives 14 packets while CISCO receives 15). The measured average power consumption is reported in Table 1 for all power states of the two NICs. The wake-up time of CISCO is 12ms, while that of COMPAQ is neglected.

            CISCO   COMPAQ
Wait          495      375
Rx            650      450
Tx            870      750
Sleep         100       35
WakeUp        325        -

Table 1. Power consumption of the NICs in mW

Figure 4. Timing diagram of the wireless channel (labels: overhead, payload, latency, AckTime, APtime, overall packet time).

Figure 5. Transmission time as a function of the frame size (vertical axis: transmission time (ms); horizontal axis: frame size (bytes); curves: CISCO (measured), COMPAQ (measured), fitting model; the packet overhead is marked at the origin).

Channel bandwidth. The effective bandwidth provided by the wireless link depends on how the traffic is fragmented into packets. In fact, each packet has a two-fold overhead: the headers and tails introduced by the protocol stack, and the acknowledge time. To characterize the effective bandwidth by taking packet overheads into account, we measured the total time required to send 50 UDP frames of a given size from the laptop to the palmtop PC. The number of frames transferred was chosen small enough to avoid the saturation of the AP buffer. Figure 5 shows the transmission time per frame as a function of the frame size. Steps occurring every 1500 bytes are due to fragmentation. In fact, 1500 bytes is the maximum payload of the Ethernet packets transferred from the laptop to the base station. The actual transmission time can be expressed as:

$$T = \frac{S + 100}{\mathit{Bw}} + 2L + \mathit{AckTime} \qquad (1)$$

where S is the payload size in bytes, Bw is the channel bandwidth, 100 is the number of protocol bytes per packet, L is the channel latency, and AckTime is the overall reaction time of the NIC. Notice that both the AckTime and the channel latency are due to the acknowledge needed after each packet, as shown in Figure 4. Using Equation 1 as a fitting model for the experimental results, we can obtain indirect measures of the fitting parameters Bw and L. Fitting curves are plotted in Figure 5.
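The transmission-time model can also be explored numerically. The sketch below assumes that Equation 1 charges each packet its payload plus 100 protocol bytes at bandwidth Bw, two channel latencies (one for the packet and one for the acknowledge, as the two latency segments of Figure 4 suggest) and one AckTime. That form, the function names and the values of `BW`, `LATENCY` and `ACK_TIME` are illustrative assumptions, not the fitted parameters.

```python
PROTOCOL_BYTES = 100  # protocol bytes per packet (from the text)
MAX_PAYLOAD = 1500    # maximum payload of an Ethernet packet (from the text)

# Hypothetical parameter values, chosen only to make the example concrete.
BW = 687_500.0    # bytes per second (5.5 Mbit/s)
LATENCY = 1e-4    # seconds
ACK_TIME = 5e-4   # seconds


def packet_time(payload_bytes, bw=BW, latency=LATENCY, ack_time=ACK_TIME):
    """Assumed form of Equation 1 for a single packet, in seconds."""
    return (payload_bytes + PROTOCOL_BYTES) / bw + 2 * latency + ack_time


def frame_time(frame_bytes, bw=BW, latency=LATENCY, ack_time=ACK_TIME):
    """Time to send one frame fragmented into packets of at most MAX_PAYLOAD."""
    n_full, rest = divmod(frame_bytes, MAX_PAYLOAD)
    sizes = [MAX_PAYLOAD] * n_full + ([rest] if rest else [])
    return sum(packet_time(size, bw, latency, ack_time) for size in sizes)


def effective_bandwidth(payload_bytes):
    """Useful bytes delivered per second when sending packets of a given size."""
    return payload_bytes / packet_time(payload_bytes)


# Crossing a multiple of 1500 bytes adds one more packet and hence one more
# fixed per-packet overhead: this is the step visible in Figure 5.
step = frame_time(1501) - frame_time(1500)
```

Under these assumptions `effective_bandwidth` grows with the packet size, because the fixed per-packet cost is spread over more payload bytes.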
Finally, the channel latency depends on the distance between the base station and the NIC. For our experiments we used a fixed distance of 50cm, providing an ideal channel quality, in order to evaluate the frame loss due to DPM only.

Buffer size. While the size of the UDP and application buffers on the palmtop can be set by the user, the size of the buffer used by the base station is constant. We characterized the behavior of the internal buffer by sending bursts of packets across the wireless channel and measuring the packet loss as a function of the packet size and number. For a given packet size, we call limiting burst size the maximum number of packets in a burst delivered without any loss. Interestingly, the limiting burst size grows with the packet size: a few packets are lost if more than 100 packets of size 10 bytes are sent across the wireless channel, while up to 500 packets of size 1000 bytes can be sent without packet loss. The reason for this counter-intuitive behavior is two-fold. First, the internal buffer of the base station is organized in packets, so that it can contain a given number of packets regardless of their size. Second, according to the results outlined in the previous section, the larger the packets the larger the effective bandwidth provided by the wireless channel. Since the internal buffer of the base station saturates because input data (provided by the laptop) arrive faster than output data (sent across the wireless link) leave, the higher the output bit rate the longer the time required to saturate the buffer. Experimental results indicate that the base station can buffer up to 100 packets.

Model validation

We validated our model by comparing simulation results and measurements obtained by running two MPEG4 benchmarks: conference and fireworks. The first one has a frame rate of 15 frames/sec and is composed of 899 frames. The second has a frame rate of 30 frames/sec and is composed of 522 frames.
Each benchmark was run and simulated using both NICs, with all available DPM configurations, namely, COMPAQ PM OFF, COMPAQ PM ON 200ms, COMPAQ PM ON 100ms, CISCO PM OFF, CISCO PM ON 200ms. In order to make simulation results directly comparable with measurements, we implemented a trace-based producer generating packets according to a time-stamped trace collected while running the streaming application on the laptop. Experimental results reported in Table 2 show that the average power consumption provided by the simulation model was always within 4% of the measurements.

                           CONFERENCE          FIREWORKS
                           meas      sim       meas      sim
COMPAQ  PM OFF           380.31    377.1     383.74    382.7
COMPAQ  PM ON 200ms       56.48     55.5     143.21    139.5
COMPAQ  PM ON 100ms       61.66     63.2     146.77    141.9
CISCO   PM OFF           497.11    501       502.45    503.6
CISCO   PM ON 200ms      129.85    133.5     197.3     201.5

Table 2. Power consumption of the NICs in mW

          period   latency   avg pw   AP buf   fr. lost
          (ms)     (ms)      (mW)     (pac)    (perc.)
PM OFF       -       100     377.1       1        0%
PM ON      200       100      55.5       4       18%
           200       200      55.5       4        0%
           300       100      52.17      5       47%
           300       200      52.17      5       11%
           300       300      52.17      7        0%
           400       200      52.17      7       52%
           400       300      48.51      7        8%
           400       400      48.51      7        0%
           500       200      47.28      9       61%
           500       300      47.28      9       43%
           500       400      47.28      9        7%
           500       500      47.28      9        0%
          1000       800      42.83     17       29%
          1000      1000      42.83     17        0%

Table 3. Exploration for different PM configurations and latency values

Design space exploration

The simulation model was used to evaluate the impact of DPM settings not yet supported by the real NICs. In particular, we performed experiments on the conference benchmark for different durations of the sleeping periods and for different values of the consumer latency. Results are reported in Table 3. For a given sleeping period, we repeated the experiments by changing the initial latency of the consumer. By increasing the sleeping period, the average power consumption decreases at the cost of a higher frame loss. However, the frame loss probability can be reduced by increasing the initial latency, allowing the consumer to buffer more packets before starting playback.
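The interplay between sleeping period and initial latency can be illustrated with a deliberately simplified playback model. The sketch assumes that frames are produced at a constant rate, that the card wakes up at every multiple of the sleeping period and receives the whole backlog instantly, and that frame k must be available at the initial latency plus k frame periods. Reception time, the power manager timeout and channel losses are ignored, so the function (a hypothetical `lost_fraction`) is not expected to reproduce the percentages of Table 3, only the trend.

```python
def lost_fraction(period_ms, latency_ms, fps=15, n_frames=899):
    """Fraction of frames that miss their deadline in the simplified model.

    Frame k is produced at 1000*k/fps ms, delivered at the first wake-up
    not earlier than that, and due at latency_ms + 1000*k/fps ms.
    All comparisons are scaled by fps to stay in exact integer arithmetic.
    """
    lost = 0
    for k in range(n_frames):
        produced_scaled = 1000 * k                           # production time * fps
        wake_index = -(-produced_scaled // (fps * period_ms))  # ceiling division
        arrival_scaled = wake_index * period_ms * fps
        deadline_scaled = latency_ms * fps + produced_scaled
        if arrival_scaled > deadline_scaled:
            lost += 1
    return lost / n_frames


# Default arguments mirror the conference benchmark (15 frames/sec, 899 frames).
for period, latency in [(200, 100), (200, 200), (400, 200), (400, 400)]:
    print(period, latency, round(lost_fraction(period, latency), 3))
```

In this toy model a frame waits less than one sleeping period before being delivered, so no deadline is missed once the initial latency is at least as long as the sleeping period, while shorter latencies or longer periods increase the loss.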
Since the frame consumption rate is constant, the initial buffering allows the consumer to compensate for the long inter-arrival times imposed by the power management policy. In fact, as shown in the table, for a given sleep duration the frame loss decreases as the latency increases. In particular, the frame loss is zero whenever the initial latency is greater than or equal to the sleep period, since in this case the application buffer is never fully depleted because of DPM. In practice, client-side buffering provides the opportunity for saving power without impairing quality of service, at the cost of an initial latency. The limiting sleep time (and latency) depends on the size of the application buffer. For our case study, power consumption could be reduced to 42.83mW without violating real-time constraints, with a sleep period and a latency of 1 s and a maximum occupancy of the application buffer of 17 packets. This is a 22% improvement over the power savings provided by the longest sleep period supported by the real NIC.

References

Cali, F., Conti, M. & Gregori, E. (1998) IEEE 802.11 wireless LAN: capacity analysis and protocol enhancement. INFOCOM, 142–149.
Cassandras, C. G. (1993) Discrete event systems: modeling and performance analysis. Aksen.
Chong, C. C. (2003) A new statistical wideband spatio-temporal channel model for 5-GHz band WLAN systems. IEEE Transactions on Selected Areas in Communications, 21 (2), 139–150.
Cisco System, Cisco Aironet 350 Series Wireless LAN Adapters (2003). http://www.cisco.com/univercd/cc/td/doc/product/wireless/ airo 350/350cards/index.htm.
Eshghi, F. & Ekhakeem, A. (1998) Performance analysis of ad hoc wireless LANs for real-time traffic. IEEE Journal on Selected Areas in Communications, 21 (2).
Glynn, P. W. (1989) A GSMP formalism for discrete event systems. IEEE Proceedings, 77 (1), 14–23.
HP WL110 (2003). http://h18004.www1.hp.com/products/wireless/wlan/wl110.html.
Krashinski, R. & Balakrishnan, H. (2002) Minimizing energy for wireless web access with bounded slowdown. Proceedings of MOBICOM.
LAN/MAN Standards Committee of the IEEE Computer Society (1999). Part 11: Wireless LAN MAC and PHY Specifications: Higher-Speed Physical Layer Extension in the 2.4 GHz Band.
Lee, E. A. & Sangiovanni-Vincentelli, A. (1998) A framework for comparing models of computation. IEEE Transactions on CAD ICAS, 17 (12), 1217–1229.
Marculescu, R., Nandi, A., Lavagno, L. & Sangiovanni-Vincentelli, A. (2001) System-level power/performance analysis of portable multimedia systems communicating over wireless channel. Proceedings of ICCAD, 207–214.
MSWIM (2002) ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems.
Raghunathan, V., Schurgers, C., Park, S. & Srivastava, M. (2002) Energy-aware wireless microsensor networks. IEEE Signal Processing Magazine, 19 (2), 40–50.
Sinha, A. & Chandrakasan, A. (2001) Dynamic power management in wireless sensor networks. IEEE Design and Test of Computers, 18 (2), 62–74.
Zorzi, M. & Chockalingam, A. (1998) Energy efficiency of media access protocols for mobile data networks. IEEE Transactions on Communications, 46 (11), 1418–1421.