«•-> / U.S. Department of Commerce National Bureau of Standards X ^ffAU O' Computer Science and Technology NBS Special Publication 500-69 An Analytic Study of a Shared Device Among Independent Computing Systems NATIONAL BUREAU OF STANDARDS The National Bureau of Standards' was established by an act of Congress on March 3, 1901. The Bureau's overall goal is to strengthen and advance the Nation's science and technology and facilitate their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific and technological services for industry and government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety. The Bureau's technical work is per- formed by the National Measurement Laboratory, the National Engineering Laboratory, and the Institute for Computer Sciences and Technology. THE NATIONAL MEASUREMENT LABORATORY provides the national system of physical and chemical and materials measurement; coordinates the system with measurement systems of other nations and furnishes essential services leading to accurate and uniform physical and chemical measurement throughout the Nation's scientific community, industry, and commerce; conducts materials research leading to improved methods of measurement, standards, and data on the properties of materials needed by industry, commerce, educational institutions, and Government; provides advisory and research services to other Government agencies; develops, produces, and distributes Standard Reference Materials; and provides calibration services. The Laboratory consists of the following centers: Absolute Physical Quantities 2 — Radiation Research — Thermodynamics and Molecular Science — Analytical Chemistry — Materials Science. THE NATIONAL ENGINEERING LABORATORY provides technology and technical ser- vices to the public and private sectors to address national needs and to solve national problems; conducts research in engineering and applied science in support of these efforts; builds and maintains competence in the necessary disciplines required to carry out this research and technical service; develops engineering data and measurement capabilities; provides engineering measurement traceability services; develops test methods and proposes engineering standards and code changes; develops and proposes new engineering practices; and develops and improves mechanisms to transfer results of its research to the ultimate user. The Laboratory consists of the following centers: Applied Mathematics — Electronics and Electrical Engineering-' — Mechanical Engineering and Process Technology-' — Building Technology — Fire Research — Consumer Product Technology — Field Methods. THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts research and provides scientific and technical services to aid Federal agencies in the selection, acquisition, application, and use of computer technology to improve effectiveness and economy in Government operations in accordance with Public Law 89-306 (40 U.S.C. 759), relevant Executive Orders, and other directives; carries out this mission by managing the Federal Information Processing Standards Program, developing Federal ADP standards guidelines, and managing Federal participation in ADP voluntary standardization activities; provides scientific and technological advisory services and assistance to Federal agencies; and provides the technical foundation for computer-related policies of the Federal Government. The Institute consists of the following centers: Programming Science and Technology — Computer Systems Engineering. 'Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted; mailing address Washington, DC 20234. ; Some divisions within the center are located at Boulder, CO 80303. Computer Science and Technology NBS Special Publication 500-69 An Analytic Study of a Shared Device Among Independent Computing Systems Alan Mink Center for Computer Systems Engineering Institute for Computer Sciences and Technology National Bureau of Standards Washington, DC 20234 U.S. DEPARTMENT OF COMMERCE Philip M. Klutznick, Secretary Luther H. Hodges, Jr., Deputy Secretary Jordan J. Baruch, Assistant Secretary for Productivity, Technology and Innovation National Bureau of Standards Ernest Ambler, Director Issued November 1980 Reports on Computer Science and Technology The National Bureau of Standards has a special responsibility within the Federal Government for computer science and technology activities. The programs of the NBS Institute for Computer Sciences and Technology are designed to provide ADP standards, guidelines, and technical advisory services to improve the effectiveness of computer unitilization in the Federal sector, and to perform appropriate research and development efforts as foundation for such activities and programs. This publication series will report these NBS efforts to the Federal computer commurjity as well as to interested specialists in the academic and private sectors. Those wishing to receive notices of publications in this series should complete and return the form at the end of this publication. National Bureau of Standards Special Publication 500-69 Nat. Bur. Stand. (U.S.), Spec. Publ. 500-69, 176 pages (Nov. 1980) CODEN: XNBSAV Library of Congress Catalog Card Number: 80-600170 U.S. GOVERNMENT PRINTING OFFICE WASHINGTON: 1980 For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 Price $5.50 (Add 25 percent for other than U.S. mailing) ABSTRACT Global queueing network performance models are developed for the increasingly important class of computer networks comprising a number of independent computing systems sharing a single resource. An extensive bibliography and survey of prior work relating to this topic are included. Analytic expressions of performance measures for this class of systems are derived from the general theory of multi-class queueing networks, and new computational algorithms for evaluating them are presented that are memory-space efficient (linear vs. exponential) compared with known algorithms for the general theory. This exact analytic model, called the Shared Central Server Model, incurs approximately the same exponential time complexity in its evaluation as do all models based on the general theory; because of this, a simple heuristic approximate model of this class of systems is also presented that is computationally efficient in both time and space. Modular expansion of this class of systems is investigated using the approximate model, and a useful relationship is derived between the number of additional independent computing systems and the incremental increase in capability of the shared resource required to maintain the existing level of system performance. Key words: Approximate queueing models; computer architecture; modular expansion analysis; performance evaluation; performance modeling; queueing models; queueing networks. 111 ACKNOWLEDGEMENT The author would like to express his sincere thanks to the following people. To Dr. Charles B. Silio, my deepest appreciation for providing the light on the long dark road, and the push over the mountains. To Dr. James H. Pugsley, my sincere thanks for his patience and guidance. To Mr. Joseph R. Singer, my deepest gratitude for his continual understanding, consideration, and support. To Ms. Linda Jenista and Ms. Elizabeth Becker, my thanks for their assistance in preparation of this manuscript. To Dr. Marshall Abrams, Dr. Ira Cotton, Ms. Shirley Watkins, and Dr. Paul Amer, my thanks for their support and understanding during the final phases of this endeavor. To Mr. Tom Giammo, my thanks for his insight and consultation. Last, but not least, to Dr. Mary Catherine McKenna, my deepest appreciation for providing impulses of energy to re- establish momentum when necessary. * Note: This report was submitted as a dissertation in partial completion of the requirements for the degree of Doctor of Philosophy in Electrical Engineering at The University of Maryland. IV TABLE OF CONTENTS ACKNOWLEDGEMENT iv LIST OF TABLES vii 1 LIST OF FIGURES viii I. INTRODUCTION A. Resource Sharing 1 B. Objectives 3 C. Organization of Dissertation 4 II. MODELING CONCEPTS A. Related Areas 6 B. Queueing Networks 9 C. Approximations 15 III. SHARED CENTRAL SERVER MODEL A. The Model 21 B. Performance Measures 34 C. Computational Algorithms 42 IV. APPROXIMATE SCS MODEL A. The Approximation 55 B. Computational Algorithm and Performance Measures 59 C. Error Analysis of Approximation 65 V. ANALYSIS OF MODULAR EXPANSION A. Exact Analysis 97 B. Approximate Analysis 106 C. Application 116 VI. SUMMARY AND RECOMMENDATIONS A. Summary 122 B. Research Extensions 124 APPENDICES A. REVIEW OF QUEUEING NETWORK EQUATIONS 1. The Single Server Queue 126 2. Open and Closed Queueing Networks 129 3. The Central Server Model 133 4. Closed Queueing Networks With Multiple Job Classes 139 B. GLOSSARY 147 C. MATHEMATICAL NOTATION 149 D. SAMPLE SPACE FROM ASSIGN ALGORITHM 153 BIBLIOGRAPHY 155 vi LIST OF TABLES Table IV-1. Execution unit processing times 70 Table IV-2. Relative Error statistics for Qq^' 72 Table IV-3. Relative Error statistics for T cpu 73 Table IV-4. Relative Error statistics for Q SPR 74 Table IV-5. Relative Error statistics for T SPR 75 Table V-l. Relative Error statistics for W cpu Ill Table V-2. Relative Error statistics for W SpR 112 Vll LIST OF FIGURES Figure III-l. SCS Block Diagram 22 Figure III-2. CS Block Diagram 23 Figure III-3. SCS Transition Probability Matrix 24 Figure III-4. Storage & Computation Complexity 49 Figure III-5. Example of Storage & Computation Complexity 50 Figure IV-1. M/M/l/K and M/M/l throughput comparison 63 Figure I V-2. M/M/l/K and M/M/l mean queue length comparison 64 Figure IV-3. Relative error plot of Q CPU for all points 77 Figure IV-4. Relative error plot of T cpu for all points 78 Figure IV-5. Relative error plot of Q SPR for all points 79 Figure IV-6. Relative error plot of T SpR for all points 80 Figure IV-7. Relative error plot of T SPR for P SPR 81 Figure I V-8. Relative error plot of T spR for J 82 Figure IV-9. Relative error plot of T SpR for s 83 Figure IV- 10. Relative error plot of T SpR for R 84 Figure IV-11. Relative error plot of T SPR for u SPR 85 Figure IV-12. Relative error plot of Q spR for P SPR 86 Figure IV-13. Relative error plot of Q SPR for J 87 Figure IV-14. Relative error plot of Q SPR for s 88 Figure IV-15. Relative error plot of Q spR for R 89 Figure IV-16. Relative error plot of Q SPR for u SpR 90 viii Figure IV-17. Exact vs. M/M/l plot of Q CPU vs. u SPR 91 Figure IV-18. Exact vs. M/M/l plot of T cpu vs. u SpR 92 Figure IV-19. Exact vs. M/M/l plot of Q SpR vs. u SpR 93 Figure IV-20. Exact vs. M/M/l plot of T SpR vs. u SPR 94 Figure V-l. Relative error plot of W cpu for all points 109 Figure V-2. Relative error plot of W spR for all points 110 Figure V-3. Exact vs. M /M/l plot of W cpu vs. u SpR 113 Figure V-4. Exact vs. M /M/l plot of W SpR vs. u SPR 114 Figure A-l. Central Server model block diagram 134 Figure A-2. Central Server model transition probability matrix 135 IX I. INTRODUCTION A. Resource Sharing Resource sharing is an old concept. It exists in every industry and facet of life. The basic motivation for resource sharing is the existence of a scarce resource, caused primarily by economic or physical considerations. In the computer industry there has been significant interest in distributed processing [IDC 76] and a proliferation of computer networks [COTTON 79, LEUNG 78, MONAHA 79, SPRING 78, WILKES 79]. The primary goal of these schemes is to provide various mechanisms for resources sharing [KAHN 72, ROBERT 70]. In these environments the scarce resources include information and capabilities (programs or processes) as well as devices. The cost of processing power and primary memory is currently decreasing at a rate of 50% per year, while the costs of communication facilities, secondary memory, peripherals and special purpose devices are decreasing at a rate of 10% per year or less [BBN 79]. LSI techniques and mass production have been the primary causes of these cost reductions. This has resulted in two basic system development strategies that can be generally associated with opposite ends of the computer cost spectrum. These two strategies can be categorized as more- for- the-same-cost and the-same- for-less-cost. The trend on the upper end of the cost spectrum, is for the cost not to decrease, but to compensate by providing increased speed, capacity, and capability. New generations of more powerful systems are being offered which include parallel architectures based on array and pipeline concepts. Previously their cost would have been prohibitive. As a result, their predecessors of the previous generation are subsequently being offered on the market at discount prices. On the lower end of the cost spectrum actual cost reductions are being offered. These basically comprise microcomputer and minicomputer systems, as well as some individual components. With each cost reduction the acquisition of these systems becomes more feasible for an increasing segment of the business community and the general public. The cost reduction in microcomputer and minicomputer systems as compared to maxicomputer systems has encouraged acquisition of many independent small systems vs. a single large one. Functional and administrative separation, along with the lower price, has also fueled this trend [IDC 76]. But as applications become more sophisticated and complex they tend to require more data from other parts of the organization and/or more computational power. In this environment the small systems can handle the local and overhead processing, allowing the large system to be utilized efficiently to provide computational power and large storage repositories. Secondary storage systems, peripherals, and special purpose devices are now the scarce resources within and between computing systems, rather than the processors and primary memories as was previously the case. The increasing economies of scale in secondary storage technology has made the concept of a large pooled storage subsystem attractive [WATSON 80]. One of the objectives of a Back-End Network [CHAMPI 80, LAM 79, WATSON 80] is to provide high speed access to global peripherals and storage subsystems [CHAMPI 80, WATSON 80]. The Octopus network at Lawerence Livermore Laboratory and NASA's Skylab network are two examples of this type of architecture [THORNT 80]. Several mass storage subsystems are currendy on the market (e.g. IBM 3850 and CDC 38500). A recent mass storage subsystem design from Nippon electronics Co. [SEKINO 79] listed shared use by multiple independent computer systems as one of its primary design criteria. Although large expensive systems have always been a scarce resource, they are now being shared by low cost small systems. Some organizations that had more than one independent computing system (ICS) have integrated them [IDC 76]. Their motivation has been to reduce costs by sharing resources (data, programs, and devices). Other organizations, in similar circumstances, are investigating the feasibility and benefits of embarking on similar integration efforts [IDC 76]. Two popular approaches to performing this type of integration are through a common shared secondary memory system [CDC 75] or a local area network (LAN) [CARPEN 79]. Neither of these approaches excludes the other. The shared secondary memory approach requires a multiplexing device between the ICSs and the shared secondary memory. Some intelligence is 2 also required either in each ICS or in the multiplexor to handle the synchronization and lock-out mechanisms necessary to accommodate simultaneous access to common objects. If the ICSs are not local, then communication facilities are also required. If the ICSs are within a few kilometers, then this communication facility could be a LAN. The LAN approach requires a communication medium and a number of interface units, at least one for each device placed on the network. It is desirable for these interface units to have some intelligence, so that the operation of the LAN is kept relatively transparent to each device. A device on the network may be an ICS, a shared secondary memory, or any other device that may be shared or desires to share the devices on the network. B. Objectives We have briefly discussed the rationale for resource sharing within and between computer systems. Two approaches that have been used to accomplish resource sharing were also discussed. The resulting architectures of these approaches facilitate modular expansion by allowing the addition of ICSs, as well as making the sharing of data easier. It is our objective in this dissertation to investigate these types of systems which share resources. Although our previous discussion has been primarily concerned with secondary memory as the shared processing resource (SPR), it is applicable to any SPR. Our focus will be on an architecture consisting of a single SPR among a number of ICSs. Our investigation will be concerned with the relationship between the performance of each ICS based on the processing rate of the SPR and the number of ICSs. To accomplish this we will construct a queueing model, and develop the expressions for the desired performance measures. Little, if any, analysis on this class of architecture has appeared in the literature. Most of the analyses are concerned with the individual subsystems, rather than the overall system. Investigation into the performance evaluation of ICSs [BRANDW 77, BUZEN 71], secondary memory subsystems [CHANG 72, COFFMA 68A, HOOGEN 77], and general communication subsystems [FRANK 72, KLIENR 64, KOBAYA 77, WONG 78] have and continue to be done. We feel it is important to provide designers and analysts the results of an analysis on this class of architectures, while in addition providing them with useful analytic tools. By taking this global analysis viewpoint we will not directly take into account the effects caused by various strategies within the subsystems, such as the effects of different communication protocols. We will assume that the communication subsystem (multiplexors, LAN, etc.) has sufficient bandwidth so that it may be disregarded as a bottleneck for performance evaluation purposes [THORTON 80]. Generally these delay times are insignificant when compared to the processing delays of the various devices comprising the system. When these times are significant, then the processing delays of the devices can be extended to incorporate them. C. Organization of Dissertation Chapter II discusses some of the related and previous work in the computer queueing preformance evaluation area. Appendix A provides a short review of the mathematics of queueing network theory. Appendices B and C provide a glossary of terms and a definition of mathematical notation used in this dissertation, respectively. Due to the rather extensive use of mathematics in this dissertation the reader is urged to refer to Appendix C whenever unfamilar notation is encountered. In chapter III a queueing network model is developed for the system architecture we have introduced here, along with relevant performance measures. Efficient computational algorithms are presented for the evaluation of these performance measures. Previously the evaluation of queueing network models required memory-space and time complexity both growing exponentially with the size of the state-space. The algorithms we develop to evaluate our model require memory-space that grows linearly with the size of the state- space, although the time complexity still grows exponentially. This provides the designer and analyst the ability to evaluate this model when it has a large state-space if they are willing to invest the computation time. Whereas, previously it may not have been possible due to physical memory-space limitations. Chapter IV presents an approximate model for this class of architectures that is significantly more efficient in computation time and memory-space complexity than is the exact model of chapter III. The associated performance measures and an efficient computational algorithm to evaluate them are also presented. The results of this approximation are compared to those of the exact model. The performance measures predicted by this approximate model do result in a varying relative error, which we consider to be within acceptable engineering limits. The efficiency gained in their evaluation is, for most applications, thought to be an acceptable compromise for the error incurred. For situations with extremely large state-spaces, it may be the only analysis method possible. As a result we provide the designer and analyst the capability to use the approximate model to obtain estimates of the performance of a large number of system configurations in a very short period of time. Once a small number of candidate configurations are culled, the exact model may be applied to obtain more accurate performance predictions. In chapter V both the exact and approximate models are analyzed to obtain relationships between response/delay times as a function of the number of ICSs and the processing rate of the SPR. The analysis of the exact model is shown to produce only an upper bound which is too high to be of any practical use. The analysis of the approximate model yields a very useful and intuitively satisfying relation between the addition of ICSs and the incremental increase in SPR processing rate required to maintain system response time. This result will be useful to designers and analysts when they consider building new systems or augmenting existing systems which are based on this class of architecture. These results are then applied to two design situations to provide examples of the utility of this model. Chapter VI summarizes the salient results of this dissertation. Some directions for future potential research extensions are also discussed. II. MODELING CONCEPTS A. Related Areas Prior to setting forth the proposed shared processing architecture (SPR) two related areas were investigated. One is the development and status of the theory that is to be applied toward the modeling endeavor. The second is similar situations that have been studied. Queueing network theory will be applied toward developing our SPR model. Existing queueing network modeling techniques and programming facilities, other than product form, reasonably allow analysis of systems consisting of a few thousand states. These techniques are generally limited by algorithm complexity and machine resources. There is currently no indication that significantly larger state-spaces will be accommodated in the near future [CHANDY 78]. However, systems whose structure conforms to the requirements of a product form solution may accommodate state-space sizes many times larger than other queueing network modeling techniques. In the next section we review the area of queueing network theory. Modeling requires one to be concerned with two levels of abstraction. The first level is where the analyst specifies a descriptive model by selecting "key" aspects of the actual system. The second level is where the analyst formulates or applies an analytic model to represent the descriptive model. In formulating both these levels of models various simplifying assumptions are generally made. In applying these assumptions the resultant models, on either level, stray from accurately portraying the actual system. In the general literature, as in this dissertation, a first level descriptive model is presented and a second level analytic model is formulated to closely or exactly represent that descriptive model. But the accuracy of the analytic model depends on how well the descriptive model represents the actual system. In certain situations this presents a dilemma to the analyst, whether to formulate an inaccurate descriptive model for which an analytic model can provide an exact solution; or to formulate an accurate descriptive model for which an approximate analytic model can provide an inaccurate solution. Both of these approaches yield inaccurate results for the actual system. The more correct approach remains an open question, to be handled on a case-by-case basis. The approach chosen by a given analyst depends on many factors, such as time, analytic tools and techniques both familar and available, as well as the level of confidence in them. Chandy [CHANDY 78] suggests the analyst selection criteria are, in descending order of importance, (1) solution speed, (2) credibility, and (3) degree of accuracy. Based on this ordering it seems that analysts will sacrifice accuracy for quick results. This should be further qualified. An analyst will sacrifice accuracy in return for a quick solution if it will provide insight into the behavioral trends of the actual system. Therefore, one may conclude that approximate models that provide a fast solution and/or more closely represent a faithful descriptive model of an actual system are always in demand. The major limitations of current queueing network theory can be placed in two categories, size and structure. Size limitations are concerned with the time and memory-space complexity required to obtain solutions to the models of systems which have large state spaces. Structure limitations are concerned with systems whose operational structure does not conform to the basic assumptions and requirements necessary to be modelled by queueing network theory. As a result of these queueing network theory limitations various approximations to model systems which suffered from one or more of these limitations have been proposed. Depending on the actual system and the limitations one is attempting to overcome, these approximate models produce varying degrees of success and utility. In section C we review some of these approximation techniques. Although no previous known work has been attempted for our SPR architecture, a somewhat similar situation is the study of memory interference (MI) [BASKET 76, BHANDA 73, BURNET 70, HOOGEN 77, MCCRED 73, OSTERW 72, RAU 79, SMITH 77]. Briefly, this environment consists of n processors sharing m primary memory modules. The main analysis endeavor is to determine the resultant effective memory bandwidth available to the processors. This class of descriptive models differs in three primary aspects from the descriptive operational structure of our SPR architecture. First, in MI the processor and memory are tightly coupled such that processors operate directly out of primary memory, and their interactions occur within a few clock cycles. In contrast, the SPR is a separate device which is loosely coupled, and interactions require hundreds or thousands of clock cycles. For MI, any delay caused by interference from another processor attempting to access the same primary memory module results in the blocked processor becoming inactive. Work cannot progress without the information from primary memory. The exceptions to this are machines that have have buffered look-ahead and/or prefetch environments. This resulting inactivity applies as well to multiprogrammed processors, since the context cannot be switched due to the nonexistence of a quiescent state and the prohibitive overhead which would be incurred. For the SPR, the processor formulates a request for service to the SPR according to some type of message protocol. Even if the SPR is not busy and the request does not encounter any additional delays other than that required to perform the actual service, the processor expects a relatively large amount of time to elapse before receiving the response. During this time the processor can then attempt to continue processing (i.e. overlap) or if multiprogrammed can switch context and proceed to process another job. Second, MI is concerned with the interference occuring at the access to the individual memory modules and is not concerned with the job flow or direct delays to any other devices. The SPR architecture is concerned with the job flow and the direct delays a job incurrs as it flows through the system. The integrity of maintaining the proper flow paths is a prime consideration of the SPR model. MI has no flow path, other than the implied processor to primary memory and return, and has no reason to maintain one. Therefore, after the primary memory services the processor's request there is no distinction or accounting as to which processor the job returns to. Third, MI generally assumes at least two or more primary memory modules and two or more processors, each of which are identical. The SPR architecture assumes only a single SPR and one or more processors, each of which may be distinctly different and have a number of peripheral processors. The basic approach in constructing MI analytic models has been as follows. Assume a probability distribution for the primary memory access pattern and a relation between the processor and memory cycle time. From these derive the probability of a processor making a request to each of the memory modules. Then apply combinatorics to weight these probabilities to determine the mean number of busy memory modules and the expected unit execution rate of the processors. A different approach was used by McCredie [MCCRED 73] which was based on queueing network theory. There are many differences between McCredie's model and the SPR architecture which do not make this model applicable. A single class of jobs is assumed and therefore, no flow path integrity between processors can be maintained. There are a multiple number of identical shared devices and only one job per processor is allowed. McCredie relied on Buzen's algorithm [BUZEN 71] to evaluate this model, which is efficient since only a single job class was involved. B. Queueing Networks Queueing network models were originally developed as an aid to the management sciences for "jobshop" flow problems. This specific class of problems consisted of those in which various work requests flow through a network of service centers, forming waiting lines (queues) at each depending upon the density of traffic. As computing systems became more sophisticated, by distributing functions in channels and controllers, it was recognized that the executing characteristics (flow) of a job could be thought of as migrating between these service centers (CPUs, channels, controllers, etc.) and, therefore, could be modelled using queueing networks. There are two basic research approaches to queueing network theory, probabilistic and algebraic. The probabilistic or decomposition [DISNEY 73] approach decomposes the network into subnetworks, solves the stochastic flow (arrival and departure processes) of each subnetwork independently, and then recombines the results to produce the overall stochastic flow. This approach has the advantage of allowing quite general rules concerning the arrival processes and service distributions, which may be non-markovian in nature. The main disadvantage is a consequence of their general stochastic nature, which usually results in intractable equations yielding no closed form solution. Currently, solutions are available for networks consisting of only two or three service centers. The algebraic [WALLAC 73] approach represents the state probabilities as a set of homogeneous algebraic equations to be solved. The state is a vector description of the distribution of jobs among the service centers of the network. The main advantage of the algebraic approach is the ability to handle networks with a large number of service centers. The main disadvantage of this approach is the restrictive assumptions on arrival processes and service distributions, generally Markovian. The algebraic approach has been applied using various techniques. A separation of variables technique has been applied by Jackson [JACKSO 63]. Some numerical evaluation techniques have been attempted. This approach expands the system state-space description so that the model may remain Markovian, but also increases the size of the state-space. The equilibrium state probability equations are formulated and then solved numerically using techniques such as the relaxed Jacobi iteration [WALLAC 66] method and more recently the Gauss Seidel [GAVER 76] method. Analytic solutions represent a special subclass of Markovian networks, but yield very efficient computations due to their structural properties. Numerical solutions on the other hand can handle any Markovian network, but are limited by the size of their state space. In this dissertation we are mainly concerned with analytic solutions to the algebraic approach of modeling systems. A general queueing network consists of a set of service centers arbitrarily connected, each with a queue and an arbitrary but fixed number of servers. The network is referred to as closed if no new jobs arrive or leave, but a constant number continuously circulate. If arrivals and departures are allowed, permitting the overall number of customers to vary, the network is referred to as open. Operationally, a job arrives at a service center and is placed in a queue, until a server, according to some scheduling policy, is available to provide the required service. Upon completion of service, the job transits, with no delays and a constant known routing probability, to another service center. This sequence is repeated at each service center. A queueing network is specified by the number of service centers, the number of servers at each center, their service time probability distributions and scheduling policy, a probability transition matrix, the number of jobs in the network (if a closed network) or an arrival time probability distribution (if an open network). The network equilibrium state probability 10 distribution is computed from the above parameters and is subsequently used to compute various performance measures. Performance measures of general interest include mean queue length, busy probability, and throughput (jobs/unit-time). The "classical" algebraic solution technique, in general, requires simplifying assumptions to allow the system to be modeled as a Markovian network, having a managable solution. The service and arrival processes of jobs at each service center are assumed to be statistically independent and identically distributed (iid). The service process is assumed to be exponentially distributed. The arrival process is assumed to be Poisson, which implies the time interval between consecutive arrivals has an exponential distribution. The scheduling policy is assumed to be first- come first-served (FCFS). This allows the probabilistic flow rate into and out of a state to be expressed as a simple linear function of time, the mean service rate, and the mean arrival rate. From the resultant expressions for the probabalistic flow rates a set of simultaneous state balance equations is formed, and a product form solution is assumed which eventually reduces to a set of simultaneous linear equations. Appendix A provides a brief review of this solution technique. Jackson [JACKSO 57] presents one of the earliest works on solving the equilibrium state probabilities of an open queueing network. A fairly general network is assumed. It consists of a set of service centers each containing an arbitrary number of servers. Arrivals into the network follow a Poisson process whose service times are iid exponential distributions, and service is rendered on a FCFS basis. A solution for the equilibrium state probability distribution is presented. Jackson points out the similarity between the form of this solution and that of an elementary single service center with multiple servers, under the same arrival and service assumptions. Jackson [JACKSO 63] later extended this model by incorporating arrival and service time distributions which are functions of the queue length. In addition he generalized the upper and lower limits on the number of customers in the network, which were previously infinity and zero, respectively. He introduced the concepts of "triggered arrivals" and "service deletions" to accomplish this. Triggered arrivals occur when the total number of customers drop below a specified threshold, thus triggering immediate arrivals of new customers to replace those that left. Service deletions occur when the length of a queue exceeds some maximum threshold. New customers arriving at this queue are given zero service time and continue on their routing. 11 Gordon and Newell [GORDON 67 A] independently produced a result similiar to Jackson's [JACKSO 63] in that they derived the equilibrium state probabilities for a closed queueing network. Gordon and Newell represent this closed job flow system as an irreducible Markov process, consisting of a constant number of customers whose service time distributions are exponential. They present the state balance (difference) equations for this system. By assuming the solution is of a product- form and utilizing a separation of variables technique, they obtain a set of linear simultaneous equations of the form E=E[P]. Since [P] is the transition probability matrix, which is stochastic, a solution to the simultaneous equations exists. The solution to these simutaneous equations can then be substituted back into the assumed product-form solution for the equilibrium state probabilities. The product form solution incorporates a normalization constant whose purpose is to force these product terms to be proper probabilities that sum to unity. Thus the normalization constant can be solved for by summing the product-form terms over the entire state space. The state space grows exponentially, 0[ (K + l) s ], with the number of jobs, K, and the number of service centers, s, in the network. Therefore, this presents a nontrivial computational requirement. A simple, nontrivial case of a closed queueing network model is that of the Central Server model. Buzen's [BUZEN 71] work on the central server model is among the initial applications of closed queueing network theory to computer systems. The model consists of a CPU (the central server) and a set of peripheral processors (PPUs) which service a set of continually circulating jobs, see Figure III-2 of chapter III. The behavioral characteristics of a job are as follows. A job requests service from the CPU. If the CPU is busy, the job must wait in a FCFS queue. Once the CPU service request is satisfied, the job then transits to one of the PPUs or back to the CPU with a fixed probability. After service is completed at the PPU, the job proceeds with probability one to the CPU. The probability transition matrix, [P], for the central server model consists of a single nonzero row and column, see Figure A-2 of Appendix A. Based on Gordon and Newell's technique the solution to the resulting set of simultaneous equations is obtainable by inspection. Substituting this solution into the product form yields the expression for the equilibrium state probabilities; one must still solve for the normalization constant, which is a nontrivial task. 12 Buzen introduced an efficient iterative procedure to solve for the normalization constant. Instead of growing exponentially, the computational complexity of Buzen's algorithm grows as 0[ Ks ], where K is the number of jobs and s is the number of service centers in the network. Buzen further derived expressions for the busy probability, throughput and mean queue length of each service center (CPU and PPUs). Buzen also developed a similar central server model which allowed the mean service rates to be arbitrarily dependent on queue lengths, but still required exponential distributions. Moore [MOORE 72] independently had applied Gordon and Newell's [GORDON 67] work to modeling computer systems. His approach to obtaining an efficient solution for the normalizing constant was based on a partial fraction expansion method. The complexity of this method is 0[ Ks 2 ]. Buzen's iterative method is less complex, 0[ Ks ], and also more versatile. Reiser and Kobayashi [REISER 75] have generalized Moore's solution technique for mixed networks (i.e. a mix of both open and closed subchains) and removed several of the previous modeling constraints. They provide a general algorithm based on multiplication of power series, which can be viewed as a multi-dimensional linear filter. In an independent effort Lam [LAM 77B] had extended Moore's solution technique to include nondistinct traffic intensities. The concept of "local balance" was introduced by Chandy [CHANDY 72]; the previous works by Jackson, Gordon and Newell, and Buzen were based on "global balance." Local balance is a subset of global balance in which one concentrates on the flow through a single queue, rather than through all queues. More specifically it requires equivalent terms on one side of the balance equation to equal those on the other side, instead of the more general solution to the equation. Chandy has also shown that some other service time distributions and scheduling policies yield the same results as the exponential distribution with a FCFS scheduling policy. Therefore, only the mean service time is required in the product-form solution for these other distributions and corresponding scheduling policies. The four cases that Chandy identifies as having equivalent solutions to the exponential distribution are: (1) exponential service distributions and FCFS scheduling; (2) geometric service distributions whose Laplace transform is rational and a processor sharing scheduling policy; (3) a service distribution whose Laplace transform is rational and a processor sharing scheduling policy; and (4) a service distribution whose Laplace transform is rational and a last-come-first-served-premptive-resume (LCFS) scheduling policy. One way to 13 understand the relationship between easel (previouly the standard) and the others is to realize that a rational Laplace transform inverts to a sum of exponential distributions [COX 55], in other words, a hypo- or hyper-exponential distribution. A hypo-exponential distribution is realizable as a set of serial exponential stages, while a hyper-exponential distribution is realizable as a set of parallel exponential stages. Another attempt to extend the range of probability distributions of the central server model is due to Baskett and Gomez [BASKET 72]. Using an approach similar to Chandy's [CHANDY 72] "case-3" for the CPU, a service time distribution with a rational Laplace transform and a processor sharing scheduling policy, they introduced the coefficients of variation into their model. For this model they derived the equilibrium state probabilities of the network which are identical to Buzen's, so therefore, all of Buzen's results extend to this variation. Baskett and Muntz [BASKET 73] extended Chandy's [CHANDY 72] local balance model by incorporating multiple classes of customers. The allowable classes are obtained from the four cases presented by Chandy. Each service center contributes a factor dependent on its class to the product form solution. The equilibrium state probability distribution is presented in two forms, a detailed form which denotes customer class per service center and an aggregate form of total customers per service center, the latter exists only under specific conditions. Due to the properties of local balance the marginal state distributions for open networks are obtained in closed form. The resemblance between these marginal distributions and those of single server systems is striking. The marginal distributions are equivalent to those of an M/M/l queue (a single server queue with Poisson arrival and exponential service time distribution), and the exception (infinite server, and iid rational Laplace transform) is equivalent to that of an M/G/l queue (a single server queue with Poisson arrival and general service time distribution). The more recent results of queueing network theory, using multiple job classes, are applicable to our SPR acrhitecture. The primary limitation is the memory-space complexity closely followed by the time complexity of obtaining solutions. The state-space for these models grows in an exponential manner. Although efficient algorithms exist for these models, the SPR architecture and many other systems have state-spaces too large to be reasonably evaluated by these algorithms, especially when multiple job classes are involved. 14 C. Approximations The current queueing network limitations of size and structure have provided the motivation for approximate modeling techniques. The thrust of the approximation techniques has attacked the structure limitation. This thrust has occured because previously most subsystems had a state- space whose size was reasonable to evaluate using existing queueing network techniques, but whose operational structure was not. Therefore, rather than applying an accurate analytic model to an inaccurate descriptive model, approximate analytic models were formulated to approximate an accurate descriptive model. As a result of these approximation techniques some economy on the size limitation has also been realized. These approximation techniques can be categorized as decomposition or substitution, neither precludes the use of the other. Decomposition separates the system into various pieces, each piece is modelled individually to obtain a constituent model, and then these constituent models are joined together to form the overall system model. To be able to insure some correlation between the individual constituent models and the overall model, some common relations must usually be satisfied. Although each individual constituent model, and usually the overall system, separately satisfy these relations they do not do so in a consistent manner. Therefore, techniques to coordinate consistent interrelations are required. These relations are generally concerned with the flow and capacity aspects of the system. The basis for this decomposition into individual subsystems stems from the work done by Courtois [COURTO 71]. Courtois investigated the level of coupling between queues and determined that the subsystem selection should be based on this parameter. Queues that were strongly coupled should be decomposed into subsystems such that the coupling between subsystems is weak. The error introduced by this approximation technique is proportional to the degree of coupling. The other general approximation technique is substitution. The substitution technique is concerned with substituting one form (or model) for another in an existing solution method. For 15 example, one might substitute a M/G/l for a M/M/l queue form in an analytic model to approximate the operation of a M/G/l queue in the descriptive model. The concept is that the analyst feels the inaccuracies introduced as a result of this approximation are worth the expediency of utilizing existing methods rather than formulating a new, more complex model or method. Much of the effort to extend the structure limitations have been concerned with the service time probability distribution. An exponential service time distribution is the most prominent. When an exponential distribution is assumed in formulating a model, it contributes to tractable solutions. This is primarily due to two properties of the exponential distribution. First, it may be specified by a single parameter, its mean. Second, it is memoryless, requiring no previous information to determine its future operation. A wide range of analytic models are based on the exponential service time distribution assumption, while the real systems they are applied to do not possess such service times. In many of these cases when the analytic predictions were compared to actual measurements a reasonable agreement was observed. Although efforts continue to formulate models based on more general service time distributions, the robustness of the exponential distribution should not be dismissed. This robustness has been discussed [BASKET 72A] and investigated [GROSS 75], and as a result some quantification of the expected error is available for some approximate substitution modeling applications. Decomposition techniques have been applied by others [BROWNE 75], but these were basically specific to a given model or situation. Chandy [CHANDY 75A] introduced the concept of an equivalent queue, analogous to Norton's theorem in electrical networks. This allows one to transform a subsystem of service centers into a single equivalent service center of a queueing network. This resulting composite service center, referred to as the complement, captures the interface between a specific queue and the rest of the network. Chandy has shown that the equilibrium queue length and wait time distributions of the non-reduced service centers are equivalent to those of the original network, provided local balance [CHANDY 72B] is satisfied by the network. This means that for systems that satisfy local balance, this technique produces exact results. For systems that do not satisfy local balance Chandy [CHANDY 75B] has generalized the above decomposition technique by using an iterative converging flow balance relation to "adjust" the service rate (flow) of the composite queue. 16 The intent of this decomposition technique was to reduce the complexity of the analysis task. If one were interested in the analysis of a specific service center, then all other service centers could be represented by a single composite queue resulting in a two queue system, which is generally less complex to analyze then the larger original system. Systems whose decomposition results in multi-class composite queues must also consider job ordering of the different classes, or sub-chains. The resultant state-space is generally not decreased and little, if any, economy is realized in the complexity of the analysis. In developing our exact SPR model in chapter III, we have applied this decomposition technique. Since multiple job classes are being dealt with, only marginal economy is achieved and it is still necessary to evaluate an extremely large state-space. Currently the product form solution [BASKET 75] of queueing network theory has evolved to a structure allowing a general connectivity of a variety of service centers each of which may have a general service time distribution (having a rational Laplace transform), and may also have a number of different scheduling policies and job classes. A service center having a FCFS scheduling policy, however, is constrained to have an exponential service time distribution. In an effort to extend this structure even further Shum [SHUM 77] presented an "extended product form" (EPF) solution method. Based on the fact that each product factor has the form of an M/M/l marginal distribution, Shum postulated that a reasonable approximation would be to replace each factor with an appropriate M/G/l marginal distribution form. The basic convolution computations and performance measures of product form queueing network theory are still applicable to obtaining solutions for this approximate model. Utilizing this approximate method requires an additional constraint on the solution of the initial set of simultaneous equations. Prior to this approximation the solution to these simultaneous equations resulted in "relative" visit frequencies, which were related to the absolute visit frequencies by a multiplicative constant. This multiplicative constant has no effect on the resultant equilibrium state probabilities or the performance measures, since it is cancelled out of the expression for the product form solution. Therefore, the relative visit frequencies are sufficient for the product form solution. The EPF model, on the other hand, requires that the "absolute" visit frequency be used and, therefore, the multiplicative constant must be obtained. 17 There is no known method for computing the "absolute" visit frequencies. Shum suggests that balancing a flow relation containing the M/G/l substitution factors could be used to obtain them. Once this relation is formulated a bounded binary search process could be used to satisfy a least square error criterion. A procedure is presented for computing the M/G/l product factors for a general distribution with different coefficient of variation (C). When compared to other models (machine repairman model, cyclic model, and central server model) with similar corresponding parameters, an error analysis showed that the largest error, as a function of the coefficient of variation, occured in "mid-range," while exact results were obtained for C= 1 (exponential) and diminishing errors resulted for large values of C. Shum has indicated that further work is needed to extend the EPF model to include multiple classes, and queue dependent service times. The approximation developed by Shum is an example of the combined use of both substitution (M/G/l for M/M/l) and decomposition or flow equivalent techniques (used to obtain the absolute visit frequencies). The EPF approximation is not applicable for the SPR architecture principally due to the need to account for multiple job classes. An approximate solution for queueing networks has been developed by Kobayashi [KOBAYA 74A}, using the Kolmogorov diffusion equations (also known as Fokker-Planck equations). By using a "Central Limit-Theorem" argument, Kobayashi has hypothesized that changes in queue length over a large enough time interval approximates a stochastic process with a normal distribution. As a result, this queue length process can be modeled by a Wiener-Levy process (or Brownian motion) with a suitable boundary condition. Equations for the equilibrium state probability distribution for a queue are developed. When these results are compared to the known solution of a M/M/l queue they are found to be in error. In an effort to reduce this error, these results are modified to conform to this known solution. Using a multi-dimensional diffusion equation this approach is extended to both open and closed networks. Once the the equilibrium state probabilities are obtained they are then substituted back into the product form solution. For the SPR architecture we have already assumed an exponential service process. Therefore, existing exact queueing network theory can be used without the need to determine the probability distributions by using the diffusion approximation. 18 Using this approximate technique the transient state probability distributions are obtainable. Kobayashi [KOBAYA 74B] derives them for a single-server and a cyclic queueing system. Obtaining the transient solution to general open and closed networks is much more difficult, and no method currently exists. Decomposition or flow equivalent techniques apply to exact as well as approximate models. In certain situations these composite sub-models can significantly reduce the size of the overall system state-space, therefore, allowing large system models to be evaluated much more efficiently. There are some problems with this technique. One is that the service center or centers that one may desire to investigate must usually be excluded from any of the composite sub-models. Another is if any parameters of the service centers within a sub-model changes a new sub-model must usually be evaluated and "re-aligned" to agree with the overall model consistencies. This re- evaluation process may negate any size economy that may otherwise been realized. Although the decomposition technique is applicable to the SPR architecture it does not reduce the state-space size. Also, when these techniques are applied to approximate models the resultant accuracy is generally variable and unknown. Chandy [CHANDY 75] has indicated a 10% to 20% expected error for his approximation technique. Others [SHUM 76, KOBAYA 74A] have indicated an error exists, usually by example for a few configurations, but do not indicate what one can expect for the general case. We have mentioned earlier the robustness of the exponential distribution. Realizing that this assumption does result in inaccuracies some studies have been conducted to investigate them as well as errors caused by substituting other service time distributions. Gross [GROSS 75] has investigated the resultant error when a M/M/l queueing model is used to approximate a M/G/l queue. He found that the resulting error was proportional to the coefficient of variation. Buzen [BUZEN 74] has investigated the use of a M/G/l queueing model to approximate a M/G/l/K queue. He found that the largest error occured when the traffic intensity was high or the value of K was low. Buzen [BUZEN 77] also investigated the use of a M/M/l/K queueing model (a single server queue with Poisson arrival and exponential service time distribution, with a queueing limit of K jobs) to approximate a M/G/l/K queue (a single server queue with Poisson arrival and general service time distribution, with a queueing limit of K jobs). His results indicate that 19 response time performance measures have a greater sensitivity to this approximation than do the utilization or throughput performance measures. Also Buzen indicates that the error is proportional to the coefficient of variation, which independently verifies similar results obained by Gross. All of these studies have helped to quantify the error that results from these substitution approximations. Therefore, they provide the analyst with some quantification of the error that can be expected when these substitution approximations are applied. In a similar manner we attempt to quantify the error that can be expected from the approximate model presented in chapter IV. 20 III. Shared Central Server Model A. The Model Utilizing the general theory of queueing networks, a model will be constructed for a special class of computer architecture. The model (figure III-l) is a network consisting of a set of independent computing systems (ICS) and a single shared processing resource (SPR). Each ICS is a central server system (figure III-2) consisting of a single central server (CPU) and a number of peripheral processors (PPUs). Each ICS processes a separate class of jobs, while the SPR processes all classes of jobs. Class distinction is the means by which the job flow from each ICS is kept segregated. This model to be developed will be referred to as the Shared Central Server (SCS) Model. The SPR and the devices within an ICS each represent an intricate subsystem, as do the various communication subsystems interconnecting each of them. The model does not directly incorporate the effects of various strategies that may be utilized within any of the subsystems, such as the effect of different communication protocols. Instead these effects are assumed either to be incorporated into the device processing time or to be insignificant when compared to the overall device processing time. Specifically, this model consists of R ICSs where the i-th ICS is composed of Sj devices. For the i-th ICS, the CPU is denoted as device (i,l), and devices (i,2) through (i,Sj) are the PPUs. The SPR is designated as device (i,0), or just (0), and has a device count s = 1. The network structure can be represented by a vector denoting the number of devices in each ICS, S = (s Q , ... , s R ), and the total number of devices in the network is L= 1 s^. The network is closed, constantly circulating and processing a total of K jobs, which are composed of separate and distinct classes, one for each ICS; while the SPR processes every class using a FCFS scheduling policy. A vector description of the job class allocation is J = (J 1 , ... , J R ), where J { is the maximum number of (class i) jobs in the i- th ICS. The network state n is defined as the distribution of the various classes of jobs among all the devices, n = (n 10 , n ll5 ... , n ls , n 20 , ... , n Rs ), where nj : is the number of class i jobs both waiting and being processed at device j in the i-th ICS. The network job constraints are 21 A A A A A A 1— +j CD »r- -o O O •r— E 'o. .-— ^ X s- CD QJ > c &- fO d) to t/> ^~ 4-> •o cu CT s- C fO •r— .c -O CO CU * — ' O CU to s- CJ> Q. oo X cu jc -Q 4-> CU 4- -C O 1— E fO • i- 00 01 TO l-H •r™ •0 j^ «3 CD .0 ^~ <4- -O O T3 -C CU +-> •r— ^O • M- Q. CD •r" 3 r— 2 CU a. 3 E 1— CT •r™ <4- 00 to CU +-> +-> •r— • ea E S- 4- fO fO O S- a. cn cu C 03 to O •r— "r— -O CU +-> .c fO ±£ +J +-> O C O en CU r— c to Xi •r— CU s S- 10 D. .c CD u~> t/> s- CD S- CD 22 A A ZL OH o o. A O O. I— W> A ■>- A \/ cnx? C C ■r- fO 4-9 13 -—- a. a> E o O T- O > 0) +-> "D c - — CD -o s- C O) CD +J Cl C > CJ) S- 1— 1 CD I/) .c +-> fO •i- if) +-> CD C -C CD -)-> CT -^ u sz w o o +J i— fO •1 — ja uj 1/1 -o +J CD • c •<- 2 GJ 4- O 00 •p— r— CJ r— 4- S- D. C_ E -Q CD •r- O 5- 1/0 •!-} 4-> CD •i — • JZ E -P cn fO c S_ CD •r— cr> a "O fT3 •■- a> •r- S u T3 O CD -C i- _i«i to Q. u o-— X 1 — £ o -Q CD -Q 4-> i>o to CD O >,J£Z •— i GO +J CM i — i i — c 1 — 1 CD i. 3 cn 23 1 ... Pi.o Pi.i Pl,2 Pl, Sl 1 ... • • 1 [?]= • • • • • • • 10... Pr.o Pr.i Pr.2 •• • P **R 10.. • • . • • • • 10. • • • . Figure 1 1 1-3. SCS model transition probability matrix. 24 (la) nj = I nij < J, i>0 i.j — i db) N = (n 1>0 n R0 ) R R sj (lc) Z Jj = 2 In.. = K _ and i = l i=i j=0 I n.. ' i>0 (Id) n. = { j=i I n j0 - i=0 At each device jobs are processed on a FCFS basis with an exponential service time distribution that is independent and identically distributed (iid), and not dependent on the number of jobs waiting for service. The j-th device of the i-th ICS is characterized by its mean processing service rate \i { ., where u^ = u Q for all i. The job flow (as indicated in figures III-l and III-2) is from the i-th CPU to one of the devices within the i-th ICS or to the SPR with a constant known probability, (Kp i l . [ -<1 , where OXj^. Note that the jobs do not change class as they transit between devices. After being processed by a PPU or the SPR, the job returns to the i-th CPU with probability p i =. i 1 = 1. All other transition probabilities are equal to zero. The transition probability matrix for the SCS model is shown in figure III-3. The p 4 -^ •' s are the only transition probabilities that are not or 1 and, therefore, they are the only transition probabilities that need to be specified by a variable. Based on this we may shorten the subscript notation for these transition probabilities to p ; •. The SCS transition probability matrix can be conceptualized as a matrix whose diagonal is a set of sub-matrices and all non-diagonal elements are zero. Each sub-matrix is an (Sj+ 1) x (Sj+ 1) matrix, with a single non-zero row and column. 25 Within the model the job flow is assumed to occur in zero time, although in an actual system a finite amount of time is required. The communication subsystem for these architectures generally has sufficient excess bandwidth to prevent it from becoming a bottleneck [WATSON 80, THORTO 80]. If the communication time is significant compared to the device processing time, then the device processing time can be increased to account for it. So, this modeling assumption is reasonable. With this "micro" information one can then compute the relative load factor [GIAMMO 76], x- •, for each device by solving a set of simultaneous equations as follows: (2a) (2b) R s. e i,j ~ * * e nU p m,k;ij m=l k=0 e U x ii ss !J U U , and where e i . is the relative visit frequency to the j-th device in the i-th ICS by a class i job. For the SCS model the relative visit frequencies can be found in terms of e [ , by substituting the transition probabilities of figure III-3 into (2a). Choosing & 1 = u i l results in the following (See Appendix A Section 3): > u ={ u u , j = l (CPU) Pyu-i , j*l (not CPU) Hence, x^ , = 1 by choice. Using the relative load factors, the state equilibrium probabilities can be computed by utilizing the product form solution [BASKET 75] (also see Appendix A, sections 3 and 4) (3) P(n) = -L fcW n f z (n z ) G(J) z=i 26 where z is a mapping from the double device indices (i,j) to a single index (somewhat analogous to the mapping of a Fortran two dimensional array into a linear address space), such that , j=0 Vi = { J , j>0 i = l m (4) m=l R L= 2 s. i=0 i-1 j+ I s m , j>0 i>l , and The product factor is (note, the subscript order is reversed from Appendix A, Section 4) (5a) f (N ) = n ! n x i,0 n i,0 i = 1 \o ! (5b) f(n) = x * , z>0 z v z The normalization constant is G(J)= 2 f (N ) n f z (n z ) n z=l and the sum is over the enure state space n, which is constrained as specified in (1). A computationally efficient method of computing the normalization constant was first presented by Buzen [BUZEN 71], for a single class of jobs (see appendix A). Others have since presented a generalization of this iterative method for multiple job classes [MUNTZ 74, GIAMMO 76, SHUM 77]. The efficient generalized computation method requires evaluation of an auxiliary function. This auxiliary function has been established as an aid to explicitly present the recursive structure of the normalization constant (Appendix A, sections 3 and 4). This auxiliary function, for our model, is 27 g(M;z) =g(M;z-l) + x 2 gCM-d^z) l K X r,0 S R 2 [ {n ! n } n x Rk n R,M (11) s R r=l n r0 ! k = l Sn R,k^ J R k=l Rearranging and further factoring (11) results in R-l n i X r,0 ,0 n R,0 X R,0 S R n 2 [ {n ! } n r=l n r,0 ! S R 2n R,k^ J R n R,0 ! k = l k = l R-l "r.O J p n R,0 x r,0 R X R,0 S R = ( n ) [2 I {n ! } n x Rk Vk] r=l n r0 ! m R =0 s R n R0 ! k = l Sn R,k = m R k=l Noting that n r q = J r - m r further yields R-l _ n r.O j n R,0 x r,0 K X R,0 (12) ( n ) [I ( { n Q ! } 2 n x R k n R.k ) ] r=1 n r,0 ! m R = n R,0 ! S R k = 1 In R.k = m R k = l 30 The inner-most factor of (12) is recognized as a recursive function of the form s R s R -l s R z n x Rk Yk = 2 n x RJ > + Xrsr 2 n x R1 > s R k=l s R -l k = l s R k = l Sn R,k = m R 2n R,k = m R Sn R,k = m R 1 k=l k=l k=l This can be expressed on a term by term basis by a family of auxiliary functions, one for each ICS, in a form similar to (6) by letting: g r (m r ) = g r (m r ;s r ) = g r (m r ;s r -l) + x^g^-l-.s,) where g r (0;s r ) =1 , and g r (m r ;l) = x rl m r Therefore, (12) becomes R-l _ "r.O J R n R,0 x r,0 K X R,0 (13) ( n ) [1 n ! g R (m R )] r=l n^! m R =0 n R:0 Substituting (13) into (10) results in R " 1 s i x i,0 ' J R X R,0 G(J) = I { n n f 4 k (n. k ) [ 2 n ! g R (m R ) ] } s i 1 = 1 k = 1 n i,0 ! m R = n R,0 ! 2n ij< J i' Vi=0 m R = i = l J-i Jp K- R = 2 ... 2 { [ 2 x r0 f (J-d r -M) ] n g i (m i ) } m-,=0 m R = ^ r= l * = 1 R R J : J r -1 J R R = f (0) n g.(j.) + i x r0 {i ... i ... 2 f (j-d r -M) n g^) } i = l r=l m-,=0 m =0 nip=0 i = l Noting that J l V 1 J R R G(J-d r ) = 2 ... I ... 2 f (J-d r -M) n gjCm.) , and m-j=0 m =0 mp = i = l f (Q) = l , yields a recursive relation for the normalization constant as (19) G(J) = fl gjQ) + I x r0 G(J-d r ), i=l r=l where G(0) = 1 . The above normalization constant is computed as the convolution of the product factors for the SPR and the auxiliary "g" functions for each ICS over the range of the job class allocation vector. This means that due to the structure of the SCS we are able to represent each ICS by a 33 single composite "g" function rather than by a set of product factors, one for each device. The "g" functions represent the convolution of the product factors for each device in an ICS, and can be computed efficiently. The state equilibrium probability is obtained by solving for G(J), by using any of (15) through (19), which can then be substituted into (3) to yield R s- (20) P(n) = _±_ f (N ) n n f ou G(J) i=l j=l B. Performance Measures Having established the basic relationships to compute the state probabilities, we shall now utilize them to form relationships for some performance measures. We shall temporarily exclude the SPR from the following. The threshold equilibrium queue length probability distribution, which is the marginal probability that device (i,j) is serving k or more jobs, is P[ nij >k] =5^P(n) (21) n 3 n->k = 1 R s f (N ) n fl f a (n r>t ) G(J) n 3 n H >k r=l t=l n . k From (5) the expression for f rt (n rf ) is x t r - 1 and, therefore, when n-->k a factor of x-: can I , L 1,1 1,1 1 J 1 J be extracted from (21). This extraction changes the job class allocation vector over which sums are taken, from J=(J 1 , ... , J R ) to J' = (J 1 , ... , Jj-k, ... , J R ) and also causes a corresponding change in the state-space from n to n\ Applying these transformations to (21) results in R s r P[n y >k] = _1_ x,/ 2 f (N ) n n f rt (n rt ) G(J) n' r=l t=l 34 Noting the similarity of the summation portion of the above expression to (9) results in P[ nij >k] = G(J') X./ G(J) (22) = Gd-kd j) Xj, , for ij*0 G(J) If the marginal probability of device (i,j) is desired, it may be expressed as P[n y = k] =P[ nij >k] - P[ nu >k + 1] = _i_ [xj/Gd-kdi)- x. k+1 G(J-(k+l)di)] (23) G(J) X U k [ GCJ-kdj)- x y G(J-(k+l)d.)] , for i,j*0 G(J) Of more interest than these probabilities are the performance measures of device (i,j), such as the busy probability, A- •, the mean queue length, Q. =, and average throughput, T- •. The device *o *»j m busy probability is obtained from the threshold marginal probability by noting that (24) A.. =P[n i >l] = Gd-d jJL *n , for i,j*0 G(J) The mean queue length of device (i,j) is by definition J i Q u = 2 k P[n. j = k] k = l = 2 k { P[n io >k] - P[ny£k+1] } k = l 35 J i J i 2 k P[n H >k] - 2 k P[n H >k+l] k=l k=l h h = 2 kP[n >k] - {2 (k-l)P[n >k] + J i P[n i >J. + l] } k=l k=2 Noting that P[nj :>Jj] =0 results in J J- J- 1 1 1 Qij = Plny^l] + 2: k P[n y >k] -{2k Ptn-^k] - 2 P[n y >k] } k = 2 k=2 k=2 (25) = 2 Ptnjj^k] , for i,j*0 k = l Substituting (22) into (25) yields (26) Q y = 1 2 x^CKJ-kdj) , for ij*0 G(J) k=i The device throughput when the service rate is independent of the queue size is defined as (27) T iJ = 2 "u^Kr*] k = l = «y 2 PK J = k] k = l = Uy {1- P[n^0] } = u >j \j 36 = GO-d j) e.j , for i,j*0 . G(J) The i-th ICS throughput for service rates independent of queue size is defined to be that of its CPU, which is (28) T. = Gg-djLe-! G(J) It should be noted that although this measure is referred to as throughput, it may more properly be thought of as effective processing rate or departure rate, as can be seen from its definition in (27). Similar measures for the SPR will now be derived. The SPR busy probability is (29) A =P[n >l] =1- P[n = 0] From (1), it can be seen that when n Q = 0, n i0 =0 and I n i - = J i for i = l, ... , R. From (5), when n Q = it can be seen that f Q (0) = 1. Substituting this into (20) yields R s. P[n Q = 0] = _J_ 2 I! n fydly) G(J) s. i=l j=l 2 ny =Jj , Vi J = l Repeating the same partitioning and factoring process used to obtain (14) results in R P[n = 0] = _1_ n g.(J.) G(J) i = l Substituting this into (29) yields 37 R (30) A = P[n >l] = 1 - n ^U,! i = l G(J) To obtain the mean queue length of the SPR its aggregate marginal probability, independent of class, must first be computed. This may be expressed as P[n Q = k] = 2 P(n) n R 3 2n i0 =k i=l R s- = -J- Z f ( N 0) " n fj/lly) G(J) n R i=l j=l 3 Sn i0 =k i = l R x rQ n r,0 R s . = _L_ 2 { [ k! n ] [ n n xy\i ] } G(J) n R r=l n rQ ! i = l j=l 3 En iQ =k i = l R X r>0 n r,0 = _kj_ i n { [ ] n x ri n rj } G(J) n R r=l n r0 ! j=l i = l rj 3 Sn i0 = k R X A0 s K A r,0 s r = k! i [ s n { n x rj n rj } ] G(J) R n-N r=l n rQ ! j = l Sn. =k i = l * x ,o nr '° - _y_ { n _ [ i n x rj n rj ] } G(J) R r=i n r0 ! s r j = l Sn ,,0 = k In rj = J r" n r,( i = l j = l 38 Substituting, into the above, as in (13) yields an expression for the aggregate marginal probability for the SPR as R x A.O (31) P[n = k] = _k_L_ 2 [ n 8 r (V n ro) 1 G(J) R r=l n r0 ! Sn i0 =k i=l R = _kj_ 2 n hO^-n^) G(J) R r=l 2n. = k i=l Forming the defining equation for the SPR mean queue length and then substituting (31) results in Q = 2 kP[n =k] k = l (32) = 2 k { k! 2 n h(r;J r -n r0 ) } k = l G(J) R r = l 2n i0 = k i=l 3 n i>0 0 n h(i;J r n. ) }. G(J) n 1Q =0 n rQ =l n R o=0 i=l Forming the defining equation for the throughput of the SPR and then substituting (30) results in T = u o A R (35) n g.(j.) = u {l •- } G(J) We have derived the standard queueing network probabilities and performance measures for the SCS model, which are recapitulated here: 40 The normalization constant (36) G(J) = 2 ... Z (foCNo) n gA'^o)} ' where (37) J l J R G(J) = 2 ... I " n l,0 = n R,0 =0 W N o) = 2 x. foflvcy i = l i = l The device busy probability (38) A y = GfJ-d jX, x y , and G(J) R n tfiO (39) 1.1 A = 1- G(J) The mean queue length J i (40) Q y = 1 2 Xj/Gd-kdj) , for ij*0 G(J) k=i K R (41) Q = 1 2. k ! k { 2! n h(r;J r -n r ) }, and G(J) k=l R r=l Xn i0 = k i = l J l J r J R R (42) Q r0 =^_ 2. ... 2 ... 2 {n !n r0 n h(i;J r n l0 ) } G(J) n 10 = n r ,o = 1 n R,(r i = 1 41 The device throughput (43) T., = ujj A { . = Gd-dj ) ft: , for i,j*0 , and n gA) (44) i=i T = u o A = u o L 1 ' J G(J) The system (class) throughput (45) T, = CKJ-dj ) e n G(J) C. Computational Algorithms As discussed previously the general iterative procedures for computation of the normalization constant and other performance measures require substantial memory space. In an effort to reduce the memory required without significantly increasing the computations we have reformulated the network expressions utilizing the structure of the SCS model. In addition, we were able to derive expressions for mean queue length, which in general are not available in the literature. Current iterative algorithms to evaluate our expressions require processing time and memory-space that grows exponentially. We will present algorithms to evaluate our expressions which use a minimal amount of memory and require the same order of processing time as do the existing iterative procedures. Examining the expressions for our performance measures it can be seen that the computations are all very similar, requiring the sum over a restricted state space. Two forms of this computation are represented by (15) and (17), which we shall call the sum-of-products (SOP) expansion and the factorial (FAC) expansion, respectively. Each form has its advantages and disadvantages. 42 The SOP expansion minimizes the number of multiplications, but places a burden on the factorial computation since the value is not monotonically changing. The FAC expansion simplifies the factorial computation, but requires generation of all states in a restricted sub-space. We will present efficient algorithms for both evaluation forms and for the generation of a restricted sub-space. The SOP algorithm requires an efficient method to evaluate the factorial, n Q !, in the inner- most product term of (15). This value is a function of all the indices and, therefore, is continually changing during the evaluation of (15). If the value of n Q were monotonically increasing then an efficient method to compute the next factorial value n Q , based on its previous value, is the well known recursion n ! = (n -l)!n In our case the value of n Q varies in a cycle which first monotonically increases and then abruptly decreases. This decrease occurs at well defined points; when any product term "sum-limit" is reached. By keeping track of the last value to be factorialized and its factorial value for each product term, the above efficient method may still be applied. The SOP expansion is of the following form (note: m— Jj-n i0 ): J i J R-1 J R G(J)= 2 hd^-n^) [ ... [ 2 h(R-l;J R .];n R . 1>0 ) [l h(R;J R -n R0 )n ! ] ]...]. n 1>0 =0 n R-l,0 =0 n R,0 = Defining three temporary vectors as follows: T=(tj_, ... , t R ) is the accumulated sum of job class distribution, where r tj = 2 n i0 , note: t R =n Q i = l W = (Wp ... , w R ) is the accumulated factorial values, where w f = t r ! , note: w R =n Q ! V = (v 1 , ... , v R ) is the accumulated summation value of the product terms, where 43 J i J R-1 J R v. = 2 h(i;J r n. ) [ ... [2 h(R-l;J R4 -n R . 10 ) [ 2 h(R;J R -n R0 )n 1] ] ...] n 1Q = n R-l,0 = n R,0 = Thus Vj = G( J), and v R is the innermost product term. The SOP algorithm proceeds as follows: BEGIN: SOP algorithm STEP 1: initialize 1-st vector elt. i=l t i = w.=0 v i= STEP 2: compute remaining vector elts. STEP BY 1 j = i+l TO R BEGIN n j0 =0 w-w v J = END STEP 3: compute inner-most product term STEP B T BEGIN STEP BY 1 n R0 = TO J R V R = V R + W R* h ( R ' J R- n R,o) END i = R t R = t R + 1 W R = W R* t R STEP 4: expand outward accumulating product term sums i=i-l IF i = n i > + 1 STEP 5: update factorial if "sum-limit" not reached IF n i0 >Jj GO TO STEP 4 t^ + 1 GO TO STEP 2 STEP 6: terminate algorithm STOP END: SOP algorithm 44 The SOP algorithm requires storage for three temporary vectors (T, V, W), each containing R elements, and for the R vectors of h(i,J i ), each containing J.+l elements. Therefore, the total storage required for this algorithm is 3R + 2 (J. + l) = 4R + 1 J. = 4R + K i = l i = l To determine the number of operations needed to evaluate the SOP equation form, note that one addition and one multiplication are required for each combination of the first (outer-most) R- 1 product terms. For each of these combinations the entire inner-most (R-th) product term and the factorial must be evaluated, requiring two multiplications and one addition at each step. This results in an operation count on the order of R-l R-l [n (J. + l)] [2 + 3(J R + l)] =[5 + 3J R ][n (Jj+1)] i=l i=l The FAC expansion requires an additional algorithm to sum over every state in a constrained sub-space, which yields all combinations of different job classes keeping the total number of jobs constant. The sub-space is defined by all solutions to 2 n i>0 = k i=l with the constraint of n. < J. , V i To sum over the constrained sub-space start with class 1 jobs. Next from a total of k jobs determine the maximum and minimum number of jobs that can be allocated to classes 2 thru R in conjunction with the constriants, O^n^Jj. Then determine the allowable range of class 1 jobs based on the maximum and minimum values just computed. Stepping through the range of class 1 jobs, determine the maximum and minimum number of jobs that can be allocated to classes 3 45 through R from the remaining k-n 10 jobs. Then compute the allowable range of class 2 jobs. Continuing this procedure for each job class results in the following state-space generation process MIN[J 1 .mj MIN[J R ,m R ] 2 ... I n 10 =MAX[0, qi ] '* n RO =MAX[0,q R ) where rrij represents the number of jobs to be distributed over queues i thru R, and is expressed as m^-n^ , R>i>l k , i = l and q i represents the minimum number of jobs that must be placed in the i-th queue (which may be negative), and is expressed as "rt+i ■ il, n i . I0 =MAX[O,q i . 1 ] n RO =MAX[0,q R ) UP= (up^, ... , upj^) is the maximun number of jobs that can be allocated to the i-th queue, where upj = MINIJj.mj] The FAC algorithm proceeds as follows: BEGIN : FAC algorithm STEP 1: initialize fac=l l R = J R v R = k = -l STEP BY ■1 i = R-l TO 1 BEGIN V: 1 = V = t i + i + J i END 47 STEP 2: compute 1-st elt of temporary distribution vectors k = k+l IF k>t x GO TO STEP 8 m,=k qi=m r t2 n 10 = MAX[0,qJ up-^ MINp^mj i=2 STEP 3: compute remaining vector elements STEP BY 1 r=i TO R-l BEGIN m r = m r-f n r-l C lr = m r- t r+l n r0 = MAX[0,q r ] up> MIN[J r ,m r ] END STEP 4: compute inner-most product term STEP BY 1 n R . 10 =n R . 10 TO up^ BEGIN n R,0 = m R-l" n R-l,0 v R =v R +h(R-l;J R . 1 -n R . 10 )*h(R;J R -n R0 ) END i=R STEP 5: expand outward accumulating product term sums i = i-l IF i<2 GO TO STEP 7 v i = v i + v i + l* h ( i - 1, ' J i-l" n i-l ) o) v i+ i=0 n i-l,0 = n i-l,0 + 1 STEP 6: test if "sum-limit" reached IF n^up^ GO TO STEP 3 GO TO STEP 5 STEP 7: accumulate outer-sum term & update factorial v-,= Vj + v 2 *fac fac = (k+l)*fac GO TO STEP 2 STEP 8: terminate algorithm STOP END: FAC algorithm 48 Solution Method Storage Requirements Computation Requirements 1 SOP FAC SCS Iterative 4R + K R-l [5 + 3JJ [ n (Jj+1)] i=l 6R+K 2 [ n (J|+l)] +3K i = l R n (Jj+i) i = l 4R[ n (J: + l)] i=l General Iterative R n (Jj+1) 2(R+L-1)[ n (Jj+1)] i = l Figure III-4. Storage and computation complexity. 49 Solution Method I Storage Requirements Computation Requirements SOP example 1 example 2 18 54 120 155,520 ! FAC example 1 example 2 22 66 102 93,402 r scs Iterative example 1 example 2 36 46,656 288 1,119,744 General Iterative example 1 example 2 36 46,656 Example 1: R = 2 S = (1,3,3) K = 10 J = (5,5) Example 2: R = 6 S = (1,3,3,3,3.3,3) K = 30 J = (5,5,5,5,5,5) Figure III-5. Example of storage and computation complexity. 50 The FAC algorithm requires storage for five temporary vectors (M, Q, T, V, UP), each containing R elements, and for the R h(i;Jj) vectors, each containing Jj + 1 elements. Therefore, the total storage required for this algorithm is R R 5R+I (J. + l) =6R+I J i= 6R+K i=l i=l The total number of states in the state space is R n (Jj+l) i=l The number of steps carried out for the inner product terms is equal to the total number of states. Each step (a value of k) requires one addition and one multiplication. The outer term for each step requires two multiplications and one addition. This results in an operation count on the order of R R R [211 (J.-hl)] +[3 2 Jj =[2n (J.+l)] + 3K i=l i=l i=l Figure III-4 provides a summary of the storage and computational requirements of the SOP and FAC algorithms, as well as those for the general iterative procedure for multi-class queueing networks and for that procedure adapted to the SCS model. The general iterative procedure is adapted to the SCS model (SCS iterative) by representing each ICS as a single equivalent device [CHANDY 75B, GIAMMO 76], therefore, the equivalent number of devices L-l now becomes R. Note that the storage requirements for the SOP and FAC algorithms increase linearly with the number and distribution of jobs, K and J, and ICSs, R; while the storage requirements of previous algorithms increase exponentially. In figure III-5 these requirements are evaluated for two examples', the first is a small network of 2 computer systems (or job classes) with a total of 7 devices and with 10 jobs equally allocated between the 2 systems; the second is a moderate network comprising 6 computer systems (or job classes) with a total of 19 devices and with 30 jobs equally allocated among the 6 systems. 51 It can be seen that the SOP and FAC algorithms require very little storage compared to both iterative algorithms, while also requiring fewer computations. In addition, the mean queue length for all devices in the SCS model can be computed using either of the algorithms; whereas, in the general case no effective procedure yet exists. Although it should be noted that by using the iterative procedures relatively little additional computation is needed to obtain the device busy probability and the mean queue length for all but the SPR. Computing these measures using the SOP or FAC algorithms entails a larger amount of computation, but also includes evaluation of performance measures for the SPR. Comparing the SOP and FAC algorithms one can see from figures III-4 and III-5 that the SOP algorithm uses less storage while the FAC algorithm requires fewer computational steps (i.e. less time). To complete this discussion we shall present our expressions for the performance measures of the SCS model, equations (36) through (45), restructured into forms readily evaluated by the FAC or SOP algorithms. Normalization constant J i J R-1 J R (46) G(J) = 2 h(l;j 1 -n 10 ) [ ... [ 2 h(R-l;J R . f n R . 10 ) [ 2 h(R;j R -n R0 )n ! ] ] ...] , or n 1Q = n R-l,0 =0 n R,0 = K MIN[Jj ,m { ] (47) = 2 n ! [ 2 h(i;J r n i0 ) F^nQ-n^)] , for any i , n =0 n i0 = MAX[0,q i ] where, R F i( n o" n i,o) = 2 n kfrV^o) ' and R r = l ,*i Sn i,0 = n 0" n r,0 r=l ,*i R Qi = n - 2 J r r=l , *i 52 Device busy probability J l V 1 J R-l (48) Ay = jc.:_ I h(l;j r n 10 )[...[2 h(i-l ;Ji -l-n. )... [ I hCR-lu^-n^,,) G(J) n li0 = n i0 = n R-l,0 = ° J R [2 h(R;j R -n R0 )n !] ] ... ] ...] , or n R,0 = K-l MINfJj -Lnij] (49) Ay = JL^ 2 n ! [ 2 h(i;J r l-n. ) F^-iiy,) ] , i = n i0 = MAX[0 >qi ] R (50) A = P[n >l] = 1 - n _^Lli± i=l G(J) Mean queue length J i J i J i-i " k (51) Qy = _J_ 2 xj [lh(l;j r n 10 )[...[l hO-l-j^-n^) G(J) k = l n 10 = n i-l,0 = J R-1 J R [I h(R-l;j R . r n R . 10 ) [2 h^-n^no! ] ] ... ] ...] , or n R-l,0 =0 n R.0 =0 Jj K-k MIN[Jj -k.no] (52) Qy = _1_ 2 x./ [ 2 n ! { I h(i;J.-k-n. ) F-Oyn^) } ] , and G(J) k=l n Q = n i0 =MAX[0,q i ] J l J R-1 J R (53) Q = _J_2 h(l ;Jl -n 10 ) [...[l hCR-ljJ^-n^o) [ll h(R;j R -n R0 )n o !n Q ]] ...], G(J) n 10 = n R-l,0 = n R.0 = 53 K MIN[J t ,m.] (54) Q = 1 2 n ! [ n I h(i;J f n i0 ) P(n -n i0 )] , and G(J) n Q = l n ij0 =MAX[0, qi ]' K MIN[J t .nij] (55) Q i0 = _L_ 2 n ! [ 2 n- h(i;J f n i0 ) Ffc^ ] G(J) n Q = l n i0 =MAX[0 >qi ] The device throughput (56) T n = Uij A n = GfJ-d ^ fc, , for ij^O , and G(J) R n gi(jj) (57) i = l T - u o A - u - {i } G(J) These expressions can be computed simultaneously in groups, equations (46) and (53), or (47) and (54) comprise one group, and (49), (52) and (55) another. Once the values for these performance measures are computed they can then be applied to directly evaluate the remaining equations, (50), (56), and (57). The FAC and SOP algorithms can be modified to compute each group at the same time; this is especially useful for the later group which can share intermediate values ( e.g. Fj(nQ-nj q) ) and, therefore, eliminate duplicate computations. A Fortran implementation of these algorithms was developed and is used later in this dissertation to compute values for these performance measures. 54 IV. APPROXIMATE SCS MODEL A. The Approximation Efficient algorithms for queueing networks have been previously developed [BUZEN 73, MUNTZ 74, SHUM 76], and a new algorithm that is very efficient in its memory space requirements has been presented here in chapter III. Still, it can be seen from Figure III-4, that the computation time is a significant burden; it is of exponential complexity and, therefore, computationally intractable. In addition, the complex form of the equations conveys little useful intuitive information or discemable insight. Some previous efforts have concentrated on developing approximate solutions for various models. Reducing the computation and memory-space complexity, or generalizing the modeling assumptions are the primary motivations. These generalizations include more general service time distributions, accounting for passive resources, simultaneous acquisition of multiple resources, resource blocking, priority and other scheduling policies, state dependent routing, and others [CHANDY 78]. Kobayashi [KOBAYA 74A] has utilized the diffusion approximation to model queueing networks with general service time distributions, assuming a Poission arrival process and a FCFS scheduling policy. This approach has the potential to investigate the network transient state behavior [KOBAYA 74B]. The diffusion approximation is primarily applicable to open networks and currently has limited utility for a closed network. Chandy [CHANDY 75A] has introduced an aggregation technique similar to Norton's theorem in electrical circuits. This technique allows one to represent a number of queues as a single equivalent queue. If the queue satisfies local balance [CHANDY 72B], then the technique yields exact solutions; if not, a similar technique with an additional flow approximation procedure yields approximate solutions [CHANDY 75B]. These techniques are of computation time and memory-space complexity equivalent to those of the convolution algorithm of basic queueing network theory [BUZEN 73, MUNTZ 74, SHUM 76]. 55 Other approximation efforts have studied the effect of substituting one queueing type for another. Buzen [BUZEN 74] investigated using a mathematically less complex M/G/l service center to approximate an M/G/l/K service center. In a later effort Buzen [BUZEN 77] approximated an M/G/l/K service center by using an M/M/l/K service center. Buzen's efforts were directed towards a single service center; whereas, Shum [SHUM 76] investigated the substitution of M/G/l product terms for M/M/l product terms in an effort to approximate general service time distributions in a multi-class queueing network. Avi-Itzhak [AVI-IT 73] used a conservation of flow argument to establish an expression for the mean burst cycle time in a central server model. This expression requires the mean number of busy servers (busy probability) at a central server, which must be obtained by solving the queueing network equations and summing over the entire state space. He then used this parameter along with an assumed geometric cycle distribution as an approximation to the queue dependent mean service rate in a single server queue. Solving the basic state balance equations, assuming the arrival process is Poisson, results in expressions for waiting and delay times for the system. A major obstacle in using the basic queueing equations as approximations to queueing networks is the difficulty of relating the corresponding input parameters of the basic queueing equations to those of queueing networks. Queueing networks require the mean service rate of each device, Uj , transition probabilities between devices, p 4 ■ , and the number of jobs constantly circulating in the network, K. The basic single server queueing equations require the same first parameter, but utilize arrival rate as the other. We shall utilize a similar conservation of flow argument as Avi-Itzhak to establish a relationship between arrival rate and the number of jobs in the network. From this we shall utilize independent single server queues to approximate the behavior of the SCS queueing network model. Assuming that each device of the SCS is an M/M/l single server queue, it can be shown [BURKE 56, FINCH 59, BURKE 72, MUNTZ 73, KLIENR 76] that the arrival process is equivalent to the departure process. The arrival process in an M/M/l queue is Poisson with parameter a, therefore, the mean flow rate in is equal to the mean flow rate out: rate in = rate out = a 56 For the CPU in each ICS the flow out is decomposed into separate Poisson flows, which proceed to the various PPUs and the SPR. The decomposition of a Poisson flow in this manner is linear [COFFMA 73, pg 149-150]. For the SCS this results in (1) a cpUi -2 a^ + agp^ j=2 and a ij ~ Pij a CPU 4 (2) _R ^PR- ~ P SPR; a CPU ; ~ p i,0 a CPU- ' and ^PR ~ * ^PR; 1 11 ■ 1 i = l where the following subscript notation is adopted for clarity SPR; = i,0 CPU; = i,l PPUjj = i,j j>l Having established a flow relationship between devices, a relationship between the queueing network parameter K and the independent single sever queueing parameter a ■ is necessary. For an M/M/l queue the mean queue length (including a job in service), given its mean arrival (a) and service (u) rates, is [KLIENR 75, KLIENR 76, COFFMA 73] (3) Q = 1 1/p-l where p = a/u. By assuming each device is an independent M/M/l single server queue we may use (2) to establish the following relation R sj (4) K = 2 { 1 + 1 + 2 1 } i=l 1/PsPR" 1 i/Pcpu" 1 J = 2 ^Pppu." 1 1 *J 57 (5) J, = _ai + _I + 2 where 1/Pspr" 1 i/PcPU* 1 J =2 ^Pppu-." 1 1 *J R 2 PsPRj a CPU i PSPR = -1=1 U SPR Pij a C3>Uj Pppu. , _ U ij ^PUj P CPU . = » and i u «i = CPUj PsPRj a CPUj R 2 p SpR ^ a cpu ^ r=l Assuming father that each ICS is identical (i. e. a balanced system), this then yields (6) J. = 1/R + 1 + 2 L l/p SPR -l l/p cpu -l j=2 1/pppu ..-1 i 1J 58 B. Computational Algorithm and Performance Measures Given the queueing network parameters (J-, p- = , and u- . ) one may approximate the flow rate, a cpu , by solving (6). Although (6) is an equation with a single unknown and not computationally complex, it does not lend itself to a closed form analytic solution. We shall present an algorithm, utilizing a bounded binary search technique, to efficiently solve (6) for a CPU • R ewi "iting (6) results in s i (7) Jj = 1/R + __1 + Z __! l/(a cpu x SPR ) - 1 l/(a cpu x cpu )- 1 j=2 l/Ca^y x ppu ) - 1 where R PSPR; X SPR — U SPR X PPUj = = PiJ U ij X CPUj = 1 U CPU ; , and Without loss of generality assign u cpu =1. This produces a normalizing effect, allowing all other service rates to be stated relative to this standard unit of service. A lower bound for flow rate is zero, and from inspection of (7) an upper bound is MAX[x SpR , x^^. , x pp , - , ... , x ppl , ]. Using these flow rate bounds, a binary search technique may be used to u i,Si approach the flow rate that will satisfy (7) to within some arbitrary error 8 . This algorithm. 59 the BIN algorithm, is stated more formally below. Because systems are balanced for i=l, ... , R (i.e. identical) we perform the following on the arbitrarily chosen system i=l. BEGIN BIN STEP 1: compute initial parameters x spr = ( r Pspr) /u spr CPU = STEP BY 1 j = 2 TO s= j i X PPU 1J = Plj /u lj END STEP 2: set initial search bounds low = high = MAX[x spR , x CPU ,Xp pUi , ... ,x ppUi ] STEP 3: evaluate at midpoint of bounds mid = (low+high)/2 val = (l/R)/(l/(midx SPR )-l) + l/(l/(mid x^-l) STEP BY 1 j=2 TO Sj val=vaH-l/(l/(midx pplJ )-l) lj END STEP 4: convergence test and adjust bounds IFdval-Jj < 5) GOTO STEP 5 IF(valJ 1 ) high = mid GOTO STEP 3 STEP 5: terminate with flow rate = mid STOP END BIN The BIN algorithm requires storage for the vector X, containing s s + 1 elements, the convergence error, and the instantaneous solution along with its corresponding search region description (bounds and midpoint). Therefore, the total storage required for this algorithm is (Sj+1 ) + 1 + (1 + 3) = Sj+6 The number of operations necessary to evaluate (7) using the BIN algorithm depends on the number of iterations required, which is a function of the convergence error. For 2"^ n+1 K 6 < 2" n , the maximum number of iterations is n. Each iteration requires 4 + 5 s, operations, thus requiring an operation count 0[ n(4 + 5 s^) ]. Comparing these complexities with those in figure III-4 of chapter III, one can see the significant advantage of this approximation over the exact model. 60 The BIN algorithm applied to (7) allows one to determine the job flow rate for a given set of queueing network parameters. Once this is done then the performance measures for each device may be easily computed using the following [COFFMA 73, KLIENR 75] : Device busy probability (8) PrhVO] = PiJ Pr[n >0] =p spR Device mean queue length °o= l (9) l/p SPR - 1 °ii = l l/ P?PU. .' l y Device throughput (10) T = U SPR Pspr T- . = u- o A ij u ij ^ij ij>0 U>0 Note, throughput may more properly be referred to as the effective service rate or departure rate of the device, which for an M/M/l queue equals a, the arrival rate. The mean cycle time of a job is the mean time (wait or delay) between successive requests to the CPU by the same job. This is the weighted sum of the mean time it takes a job to be serviced at each device. Using Little's formula ( W = Q/a) this may be computed as s i W i - e SPRj W SPR + W CPU i + 2 e PPUy W PPU i : (ID j=2 e SPRjQsPR QcPL'j s i e PPU H QpPU-: + + I T SPR a CPU^ j-2 T PPU i( 61 e SPR i QsPR QcpiL s i e ppiL .Qppu- j R PsPRj a CP\J { a CPUj j = 2 ^UjPpPU:: s i 1 { Q SPR /R + Q CPU . + I Q ppu .. } a rpi j. j=2 - J i /a CPU 4 where the relative visit frequency is (note: Uq,^ = 1) *-{ PsPR^CPU^PsPRj Pppu- U cpu - Pppu- ij i ij An approximate analysis technique has been presented for a balanced SCS model which is much less complex to evaluate compared to existing efficient queueing network technique. The question remains as to the error this approximation introduces, and a justification for the choice of an M/M/l queue. An M/M/l/K queue is an M/M/l with a finite queue length, and intuitively would seem to better approximate the operations of the individual devices of a closed network. We have investigated the use of this well known queue, and typical results for device throughput and mean queue length are presented in figures IV-1 and IV-2. As can be seen from these figures the M/M/l/K queue did not produce significantly better results than the M/M/l queue for the examples considered. Generally, the most important aspect of using these models concerns when and how these curves react to variations in parameters. Little , if any, significance is associated with the absolute values of these curves, except on a relative basis. This implies that the primary importance of any approximation is in "tracking" the actual curve rather than replicating it. As can be seen from the figures both the M/M/l and the M/M/l/K approximations track the exact (SCS) queueing network results. 62 u (0 X o o - evj csj oj II II II II CI <-} OO cc D_ CO Q. 00 V£> c o •r— s- rd Q. O O C71 =5 O S- c CD cn SPR 63 II q; -r- •<- on a. r o tsi o i- a. E o o c CD CD =5 OJ (T3 GO S! -o c <0 CM I > 3 CT> 64 The M/M/l/K approximation does not yield a significantly better fit to the exact curve than does the M/M/l approximation; therefore, the M/M/l approximation was selected for the following reasons. First, as mentioned earlier, the major concern is tracking the exact curve and not in duplicating it Since both curves track well, choosing the best fit was not necessary. Second, the computational complexity of the M/M/l/K approximation is greater than that of the M/M/l. The M/M/l/K expression for the mean queue length [ALLEN 78] corresponding to (3) is p[l-(K + l)p K + Kp( K + 1 >] Q = - (1- P )(l-P< K+1 >) K + l 1/p-l (l/p) (K + 1) -l In a practical situation these expressions are evaluated by a computational device (computer or calculator), which introduces errors due to the use of approximation algorithms for exponentiation and the lack of precision (bits) when the queue approaches saturation. Also the lower computational complexity of evaluating (3) allows one to gain insight into the systems' operation directly from the form of the equations. Buzen [BUZEN 74] has compared the M/G/l queue as an approximation to the M/G/l/K queue. He has determined that except for heavy traffic (p ~ 1) and small queue capacity (K cr 1) that the relative error is small. Using arguments similar to the ones presented here, Buzen recommends the use of the M/G/l as a reasonable approximation to the M/G/l/K queue. C. Error Analysis of Approximation The sensitivity and magnitude of the error introduced by our approximate SCS model compared to the exact SCS model is investigated. Since the form of the exact model is so mathematically complex, a direct analytical comparison is not feasible. The alternative is to numerically evaluate the two models for corresponding parametric values and compare the results. 65 The problem in attempting this is that the combination of all possible parametric values is infinitely large. Therefore, a reasonable and representative subset of values will be selected. Using Fortran programs developed to implement our algorithms, and this set of chosen parametric values we will compare the performance measures presented in the previous section, specfically, the throughput and mean queue length of the CPU and SPR. Note, that the throughput measure may more correctly be referred to as effective departure rate. Since the device busy probability performance measure is directly related to the throughput by a constant, comparing either one to the corresponding exact value would yield identical results. A balanced system(identical ICSs) is assumed for simplicity. Due to the assumption of a balanced system we may drop the added burden of carrying extra subscripts to distinguish between individual ICSs, as the notation below indicates. This notation simplification results in previously defined vector elements (i.e. Jj and Sj) now being denoted by their vector notation (i.e. J and s). For both the exact and approximate models the following are the pertinent parameters and their complete allowable ranges (see Appendix C for the simplified notation): 0 .80) and approaches saturation (p SPR ~ 1), the relative error tends to become small. This is consistent with results obtained by Buzen [BUZEN 74] in his use of single server approximations. It should be mentioned that in applying this approximate model, if this saturation condition occurs, one immediately knows that this device is a bottleneck and is causing serious problems. 76 0CCPU3 °»1at1v» Error •C.e „ -0.2 C.2 C.6 3CCPU) S«Ut1vt Error ■ i- • i- i .: 2: «..J . '.-} 3»»s 5 1 1 : 3 '. S«2 g 4iJ : 3?<6 1 1^22 155*12 22:3: 6»1 3 22 3! 1 oJJJ 2 3652 rs: 23121 2 2 12:3' : n:i2 42 I n 1!32 2 131 '2 12 1 1' 1 5« 2 1 131 1 1122 2151 SI 1 2 3 1 «21 12 1 3122 123 1 112 1"3 311 1 111 1 1 1 1 1313 1 1 1 131 ] 1 11 32 111 13 11 51 22 22 1 ii ; ; 21 1 22 1 1 12 121 21 1 1 21 1 13 1 1111 1 21 1! 1 11 132 I 121 3 1 2 It 1 11 32 1 1 231 2 1 2« 1 11 1 2 1 1 1 1 21 1 11 1 11 1 1 1 2 1111 21 1 11 2 2 1 1 11 1 221 « 1 1 13 31 122 1 111 13 1 1 11 1 112 1 1 1 1 2 2 1 11 mi 1 1 111 5 1 1 « 1 2 11 1 21 1 1 1 1 !• 11 1 1 2»J 1 1 1 2 1 11 1 1 112 22 21 1 1 11 1 1 331 1 1 1122 1 12 5 112 1 1 32 11311 1 1 372 a 11 »»» I6»«133 11 1 1121 1 31 11 2 2 3 1 2 3»2 !«•• 2 232 1 1 11»« 1 J«.«J 57B1 ...5 3 31 l»7 13 3»»33 '.it •91 1 11 ••1 3 3 3 »• ITS •• 3 c — _ c _ _ — ft. ■ c • o • «» m u O 3o = .» 2 c o a •— <2 a, u a o c a i_ p > o i > .op 77 TCCPUJ R«l at 4 y« Erroe TCCPU) Reletlvt Error 16 IT ie J c 20 z: 22 23 2c 25 26 27 2« 2« 30 I! 32 3:. 5' C2 «3 «7 se ce 50 SI 52 S3 S« 55 56 57 se 5« 60 61 . 62 63 6« 65 66 67 66 *6 70 71 72 73 Te 75 76 T7 76 70 SO ei 82 S3 6o 95 •6 67 se ♦ c *1 -2 •3 • • •5 66 0.2 »»♦ £ * $••535 ee«3 « »7«*7l 90.U2 !5»3 e 11131 5"<3 812231 126121 3272 636 c 2 7 2 2121 1 36*311 «3323 ail 2 6221 13521 11711 525 1 2211 32'1 1 11 2 2«a 2 3* 332 21222 223111 2« 2 S 2:311 211 12 1 2211 13« • 1 2333 312 1 31 • 3 • S3 «32 1 21 2 2 111 21 1 1222 1212 1 221 21 12 2 31 1121 2 21 211 2 211 32 21 1232 11 23 1 0.6 ■0.2 C.2 c.t 221 221 2 1 1311 21 22 131 21 1221 152 21 32« 332 113« 1325 21t 2 t a;. 87r 2 6 »2 »l ]• 1» «7 !• •3 !• 3« 2 ; 3 ; a ; 5 • 6 j • • ! j a I It • 11 I 1 1 '.3 ; it I 15 - 16 I 17 1 If ? 16 i 20 • 21 i 22 i 23 i 2« i 25 • 26 : 27 : 28 : 26 : 30 • 31 i 32 j 33 : 3a : 35 ■ 36 « 37 : 38 i 36 * ©>• Q< » 63. I_ >•—»•» J3^a-« "3 *- ft ** ft »> O 3— 3 — kZ *2 • «. o o ,o c: ••■OB • o c - c 3 — 5 • » • -IE 0. •— o o • • aX I_- Q. ■— O > o I > 3 op 78 3CSPP5 Relative Error GCSPP) Relative Error 1 1 2 1 1 l 1 1 2i i i ? i 2 ) 12 1112 l« 0.2 J«J»»f 7 )i>uii J. .J < B6«5*l 6'«H2 56«J « 13131 1"53 442132 Q7121: 3'2 at 22 «5 11 ■ 4 1 1 TSSlll »e322 5 2 t 62111 12351 13511 T 5 1 1 2 3 13321 1 1 1 11 1 2<3 11 1213 S21 13122 332 1 1 1331 1 1212 2 I 11 1 1 11 II 1 12221 1 211 1 21«12 < 2 I 31 31 2 3121 1 231 11 1 1 1 11 12 1 1 II 1 1 1 111 111 1 11 1 1 2 1 1 1 1 1 12 11 3 12 •11 11* 1 «1 1 1 112 111 1 1 3 2 2211 51 23 5 11 TS !• 12l»l 1 3 11 121 52 J*» • 1 531 • 1 • 2 • 1 • 22 • «2 • « • a ••1 •5» 2 12 3 1 1 12 1 1 I 1 1 1 1 1 1 2 1 1 1 I 1 12 1 1 1 11 11 1 1 11 1 3 11 II 1 12 1 11 I 1 3 1 1 1 1 1 1 1 1 II 1 I 1 1 1 1 ! 1 1 3 2 1 21 11 -Ct •C.2 : .« c — — — 33 • « ■ C — — • c c cmxtnz — ** «■ M 1*1 m — •*>■ — "*-•—•»♦%* •»■ — O aw O • c * c • «- — ■©--—*=> — © o • • o • w * v - ** *t — c u. • • • *> *5 O— O — c o> o> O • eV a. Q. > ••« • e> Xi «y x» t^ <- # o e «r O 3 — 3 m t.2 .Z a lata t_ o O a • O c • c > 5 • * — a. m *# a> g£ - r z Q- •— • o oo • • a. X a CL o > 03 3 > o t* 3 00 79 T(SPQ) "elet «f .© « O 3— 3 — > r • V5 w C 'o a. — O Q. o > 'S O i > 80 TCSPR) Ralatlve Error ■c.» -c.2 C.2 2««2 tm l»e 2231 523 21 63 32 i 3 2 2 23 2'l a 22 2 3 22 1 2 I 11 21 2 1 11 13 11 1 1 2 2 1 1! 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 2 11 11 1 1 1 1 1 1 11 1 I 11 12 11 1 1 1 2 112 2 2 2 11 a oi l la 31 2 t 7 32 S 6 6 ■C.o (S o: ?) °al at 1 ve Error •0.2 :.; c.t ■ i. © ^ »■. c — — • c e X — c — c ec • • « • c — o — X> t* -C f— e a— 3 — eal as c/5 a o > 00 81 T(SPO) R«l«t e> « 4. ». > • m i L h e- © o => 3 r © © «J • o- m c C c > a • as a. — • * K o w Q. C fc. *J c •— o o > '■5 *o ■ > i 82 TCS°P5 Rf1«t1v» Error T(SP°) Relative Error . r • " • a •t.i :.2 :(•! jseji i « 33S2 23J 4 7 1 11 <.231 "11! 1221 3 225 2 3 1 111 1 212 1 «l2l 131 I 1 112 11 IK 2' 22 1 21 II 1 1 1 1 111 1 21* 2 13 1 12 2 1 21 1 1 1 11 ' 1 11 11 11 21 1 1 2 1 il 2 1 1 1 121 I 121 1 3 1 1 1 i I 3 a 111 1 1 11 11 1 1 12 212 i : 2 5 1 5 5 » 2 11 21 la 1» 12 Ift • t 1 • 2* •» - .e 1 — 1 — 3 3 • re C mm mm — * i/i • i\j O • o • «j — — C m> C a. • • • • O D— 3 — *. 2 • Z • > 3 83 T(SPR) «e1it1v« Error T(S<9?0 Relative Error t - 1 2 T. J i i ; 5 • t : * i ( • 9 t IE • n i u i 13 i 1' : 15 - 17 ! It ; 19 : 2D • 21 i 22 i 2« i 25 • 26 i 27 i 2! : 2" : JO - 32 i 3« i 35 • 36 ! 17 I 3* 1 3" : 30 • 31 i 33 : 33 : 36 i 37 i 39 T SO • 51 I 5? 1 S« I 55 • 56 I S» I 60 • 61 : 63 i 65 ■» 67 ; 66 i 70 • 72 j 73 i 73 i 75 • 76 i 77 : 76 i •1 : 52 i S3 i »« i 86 i 67 i 68 * 61 : 90 • »1 : 9c i 93 i 93 : 95 - 96 I •0.2 0.2 — :—— ....-;. 215 2 1331 1 1221 1 1IJ1 11! 1 2 3! 1 122 1 11 11 Mi l l 1 2 I 3 1111 II 1 1 2 21 1 12 1 311 21 1 2 1 112 2 112 1 1 III 1 1 1 1 1 1 11 1211 1 1 11 1 3 1 1 1 I 111 1 1 1 1 1 i 1 I 1 1 II 1 2 1 1 2 1 1 11 11 2 1 11 21 I 1 11 2 2 2 11 23 S 6 32 3 5 10 11 12 II 1« 15 17 IS 19 20 21 22 23 25 26 27 26 29 30 32 3« 35 36 37 38 39 10 81 C3 S3 3« 87 39 50 51 52 5« 55 56 59 60 61 63 65 67 66 70 72 73 73 75 76 77 78 81 82 63 63 66 67 66 69 90 »! 92 93 93 95 96 ■0.6 -0,2 8.2 .. I ......... i ......... j .. 0.6 1..0 ..I.-™....! ■ a f E ■ 3 3 e a at — — O #*v C* > at ■ ~ « a — S 3 c mx —X O • V • o - - c - c a • • • • > •«* . o a » .o « *. E — E * O 3— 3 — w Z >z . a. VI o O "5. o o > 3 2 00 84 TcSPR) R ( 1 lt lvt Eppop •1.5 7 e « it i: 12 13 1« 1? 17 1* 1* 20 £2 2' 2? a* 27 it 2« 31 33 3t 35 3* 37 3* to *\ <3 £* S7 tl S2 5* 5? 5« $• fcC 61 65 »7 t>l 70 72 T3 7« tl S2 (3 ss «* e7 »e »» •o *i •2 «3 »5 1 t 1 2 12 2 1 2 c 22' l«3l 1231 HI 112 1 2 21 11 211 1 11 111 111 1 1 1 « 1 1 1 2 1 211 1 1 1 2 • ! 1 lil 112 1 1 1 1 1 1 1 1 12 1 I I 1 1 1 1 1 1 1 1 1 1 1 1 I I I 1 2 1 2 1 11 I 1 I 1 2 1 2 2 ! I 1 .o »l,o c T(SPS) Rel itWi Eppop -c.t O 3— s — 2_ in si O o c o o o > 3 00 85 3CSPR) Relative Error ■Die -0.2 0.2 7 J 9 1*/ ; ; 12 13 l« 15 le 17 le I 19 I ?o • az : 2J : 2* : at : 2t : 30 - si : 52 ! 33 : it : se : «e • c ; • u2 : «3 I st : £5 • '6 ; at I si : S3 : 5« : ss » S6 : ?7 t s» : 6i : 62 ; »« i 65 - 6» : 67 T 7? i 73 : 7e i 75 • 76 I 77 : 76 : 7* : »c - ei : 62 ! t" I BS - it : •7 1 ee : s« : 90 • 91 I •2 : •3 I 9c i 95 • 9* I 2 22 3li 131 35 11 a 2 I 2 1122 12 11 121 32 22 2 1 I 1 2 1 1 11 21 1 t 2 1 122 1 32 3 1 1 1 1 1 1 1 1 1 11 t 2 1 13 2 11 2 1 1 1 1 1 : i i 21 13 1 • 1 1 1 1 1 3 8 1 1 a •2 2 11 1 • 1 1 5 1 91 62 • 3 11 « 1 .9 «5 • ft* I «7 I «8 : 51 : S3 i 5ft i S5 • 56 ; S7 I 39 1 61 j 62 : 6< i 65 - 66 T 67 i 72 i 73 i T« i 75 * 76 • 77 i 78 : 79 i 60 • ei i 82 i *« i 85 • 86 ; 87 ; 88 : 89 i 90 • • 1 ; 92 i 93 : 911 ! •5 - 96 I •3CSPR) Relative Error .6 -C.2 0,2 ; ......... i ......... ; ... 0,6 l.C • £ £ x • • • • «: o — o — CL 00 o =» o > a. • «- a. >•*-»•* t- £> c- X- •* <2 »- e o e »r» O 3 — D — t-2 »2 • -^ — ■ o- • a. • o c • c (/2 — a • « * a Q. u- O t- o > o 3 00 86 2C3PB) Ctlitivi Erpop Q(SP8) R«lat1v« irtor •1.9 •C.» ...I — •0.2 0.2 t.k i,( -i . ........ i ......... j ......... ; !• i 16627 6 - ««S« I ; S3 « 2 : asti j : • 112 < : 63 « 5 - 131 t. : 2S3 i j 132 t * 131211 ' i 31 10 - 12 22 n : 21 11 12 1 1 1 1 13 I 12111 1« 1 222 15 - 1 1 U t 11 11 17 ! 11131 ie I 1111 l"» I 1 1 20 • 1 2 2i : 11 1 1 22 2 I . 11 23 I 1 1 1 11 2« I 11 1 25 - 1 26 ! 11 21 27 I 2 1 1 26 I 11 1 2* I 1 2 30 - 1 1 3i : 1 1 32 I in i i 33 I i 38 I i 35 - i i m 36 I 2 36 I 1 3» : 1 1 at . 11 11 81 I 1 «2 : 1 1 11 <3 ! 11 1 • 5 - 12 11 1 «» : 1 1 1 «7 1 1 86 I 1 «9 : 1 so - 1 1 1 51 1 I 1 1 52 I 21 S3 I 1 1 S* I 1 1 55 - 2 1 st : 2 1 1 S7 I 1 1 56 I 11 1 5* I 11 1 1 1 60 - l» 1 61 I a 62 I 2 63 1 « 1 I 6< : 11« 65 • 1 • 66 I 1 1 1 67 2 I 1 6i : 1 6« : 1 11 71 2 1 2 1 72 ! 2 1 73 : 21 1 78 ! 3 1 75 • 3 1 76 2 15 1 77 2 * 1 1 76 2 • 1 7* I 1 61 I 1 1 62 2 2 11 1 63 2 1 »8 2 21 66 2 « 11 67 2 SI 1 6! 2 • 11 e« 2 • 22 «0 - 13 •1 2 27 •i I •a «3 1 • 1 c« : •0.6 .-I — •0.2 ...I.. c — — * c • c • o O m © o •-ftv ■ ft. • • O • «/ * II c C K • • • T o t* • ■— > e > o >• » W a. u. > • l> • »■ a J3 Wo .O «- c *«n O 3 3 ft. o z o at w o~ ■ m • C/5 • o > X — AC c c 41 • a X o <*_ ■ * S. o a *■» o a. >*. 6W o o "5. • Im o O i— o > o si 3 00 87 QCSPR) R«Ut1ve Error •o.* -c.i OCSPR) R«1ativ« Error 7 f G ic J! 12 13 t« • « 16 17 16 11 26 2: 22 2! 2« 2? 26 27 28 2» 35 11 1? 11 1* IS 1* 37 :e 36 SO «1 62 «3 aa as a. 67 at 50 II 52 53 5a St 57 5* 5* 60 • 1 62 61 *a 65 66 67 61 76 7'. 72 73 7C ?«. 76 77 78 7* 60 81 62 83 ea 8S 86 87 88 8* 66 •1 62 61 6a 65 66 C.2 C 6 1 -. - 2 5111 sas U 256 : s i ; : 213 i: 226 11 11 '221 12: :2 is 251 a 11 1 22 1 SI 1 21 1 2 1 11 1 11 .113 3 3 1 6 1 111 111 1 1 211 1 12 11 11 1 11 1 1 1 1 1 1 111 1 1 1 11 1 2 11 11 J 1 1 1 1 11 1 - 11 2 2 1 1 11« 1 ai 1 21 2 11 2 1 11 7 !»• 1 2 1 1 31 5 :i 6 •1 6 •1 81 •1 •11 1 •11 •e -1 . 6 -C.» -0.2 6.2 C.t 1 c • • 2 1 . i : • * ! • 5 • • At At t : • o m m 7 I • o ■ • i • • o o « ; <- ■ ■ 1C • . C £ ■ a s ii : ♦ t s 12 1 ♦ wm u m ii : ♦ •— m • • X X i« i • 15 ■ * 16 : • e m •* 17 1 t *" ~ ~ 18 1 ♦ o o i6 : ♦ M 1 1 > ■ w 26 ■ * m C ■ — 9 9 2i : ♦ • ■ ■ 22 ) * C — — • C C 21 1 c mx «>X 2a ; ♦ — r* »m 25 « » V O -* »M 27 : 4 28 ] t • C • C • 26 ] t 10 • • • • O • u • w ii ; ♦ X •• C m C 12 : a • • • • o — o — ii ■ t tm S- la : * e > o> 35 « • • «. a. 16 * X *■ Jim 17 : «. e e- e r* 18 ] • © 3 O 3 — 16 ] • «. o o ao • ♦ -»•©-■ ai ! • • o c • c A2 1 • — 5 • » « ai 3 ♦ X-S 1 aa ] ♦ — — s • • tt *5 < • BE *6 ; . -* © e «7 ♦ ai ♦ 6B 56 • ► ♦ O 51 » 52 : ♦•' 51 1 ♦ 5a ♦ 56 ♦ 57 ♦ 58 • 56 ♦ 60 • I * 61 •- 62 * 61 * 66 * 65 < * ♦ 66 ♦ 67 ♦ 66 ♦ 76 > • 71 ♦ 72 • 73 7a 75 * 76 77 t 76 76 80 ■- # 61 ♦ 62 81 ta 85 66 t 67 (8 e» * 66 . • •1 • 62 • 61 ! • • • • •5 . • 66 • l > .1 88 Q(SP«n Rilitlvi Error ■ ; .e -c .2 1.2 c 1 '.32 23: I '. 12 : ill : sil: : :22 : i : : i 2 11 i i 12 i: 12 i 2 1 1 11 111 1 121 311 3 1 2 1 22 2 :2i i i 311 1 2! 1 1 1 1 11 1 1 1 1 1 11 1 1 11 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 21 c i Si * 1 • 1 1 121 i 1 2 c •• 11 • 53 Q(SPP) p t 1 at i vt Error ■ C , e - c , i : , t C — — c \r> 3 — I — u. • ft. » O • u • o 5E *m C ** C .3 P- J W O 3 — 3 »M ,5 00 i > CO 89 3CSP«0 Relative Er^oc QCSPR) Relative Erro* -C.e -0.2 10 11 IS 13 1* IS 17 It 11 26 22 2« 25 26 27 26 2" 31 33 T 3* 35 36 37. 31 66 «1 «3 At a7 «8 52 56 55 5» 5* to 61 65 67 bl 70 72 73 75 61 82 S3 es 66 n «e 81 10 11 12 '3 la 15 16 2 2 1 1 1 1 2 12 11 32 1 1 1 11 1 3 1 1 1 111 « 1 1 1 11 211 I 1 1 1 1 I 2 1 HI 1 1 1 1 1 1 1 1 ) 1 1 1 1 11 1 1 1 1 11 13 •1.0 -0.6 -0.2 0.2 : c • i : * 2 I ♦ 3 ! * o ! * 5 - * t : ♦ 7 : ♦ e : ♦ ' : * 10 - + ii i ♦ 12 I ♦ 13 I * 1« I ♦ 15 - * 17 I *- It I ♦ 11 I ♦ 20 - * 22 I * 2« : t 25 - * 2o : ♦ 27 I ♦ it : * 21 : 6> 31 ! ♦ 33 : * 3< : ♦ 35 • ♦ 36 I ♦ ■ 37 I ♦ 31 I 4 40 - + 61 I * «3 : •> 66 : * «7 : * 66 Z ♦ 52 I ♦ 56 I • 55 - ♦ 5( 1 ♦ 51 I ♦ 60 - t 6i : ♦ 65 - ♦ 67 I ♦ 66 I 6> 70 - ♦ 72 1 ■ ♦ 73 I •> 75 - ♦ ei : ♦ t2 : t 83 ! ♦ 85 • ♦ »6 I ♦ 67 I * 66 I + ei : ♦ 10 - ♦ n i "♦ 12 I ♦ 13 I ♦ la ; ♦ 15 - * 1» 1 + 1.8 ....I «• • « — 33 * * ■ C — — O • •> ' B x - c - c cr • • • • o— o-— ja i* xt tf* O 3<* 3«w at o- V3 a, 1/3 a o o a. s o > Cri vd i— i i > 3 90 -t-> i— O --» CM CM II II II or Q. CO a. •r— ■"3 eo O Ln Lf> LO LO CM cr> r^ vo «* CO in X^PU 91 X • CM C\J II II II cc .p. •i D- O I/) V) Q. c o u o I o o: o Q. CM ZD > ZD O S- O cu \S> 3 > «g* CXL Q. «^ 1/1 •— ZD "O C\J c +-> O X 00 s- CD 00 \o C\J CPU 92 CM CM II II n cc .,_ •i D_ n CO co o. o » o 4J r- o ^ to s: x CM LO OH co CO > a. co Cr s- o 4- oo CD > a: ^ to — o -o c u X cr> i cu s- C7) CM O CM CO VO CM CO VO CM o to cu > CC i— ■D C U re X o CM I CU s- en •r- CO T SPR CM 94 From these relative error plots and statistics tables of the individual parameters, one can see that this approximation does possess some parameter sensitivity. The throughput performance measure does not indicate any sensitivity trends for the p SpR and s parameters. For the J parameter a significant decreasing trend in mean and variance of the relative error is observed as J increases. No definite trend for the R parameter can readily be detected. Although the overall mean does decrease as R increases, the variance and the mean for those points not near device saturation do not. The p SpR parameters are observed to behave similarly . As a result, the throughput performance measure exhibits some sensitivity to the J and R parameters. In contrast to the throughput performance measure, the mean queue length has lower relative error values, approximately 12% for the CPU and 15% for the SPR. The mean queue lengths predicted by the approximate model underestimates those of the exact model for low values of p SPR and overestimates them for high values of p SPR - This accounts for the lower mean relative error values. In a manner similar to throughput, as saturation is approached the relative error becomes small. The variance of this relative error is higher than that for throughput, approximately ±14% vs. ±6%. This can be seen from the scatter plots, figures IV-3 through IV- 16, especially for mid-range values of p SPR . This implies that the mean queue length performance measure is more sensitive to our approximation than is the throughput performance measure. This is consistent with results obtained by Buzen [BUZEN 77] in his use of single server approximations. Observations of the limited relative error statistics from the tables and plots indivate some trends and sensitivities, although they are inconclusive. The mean queue length performance measure does indicate a decreasing sensitivity to the s parameter as this parameter increases. This may be credited to these PPUs handling a larger portion of the workload and, therefore, the CPU and SPR are less heavily loaded. Both devices do not indicate any sensitivity to the p SPR parameter. The J parameter indicates a definite sensitivity trend in both mean and variance. As J increases the relative error decreases. Both devices indicate a trend for the u SPR parameter, but in opposite directions. The CPU demonstrates a lower mean and variance as u SpR increases, but then as u spR becomes relatively fast the mean and variance tend to reverse and increase. The SPR demonstrates a totally opposite response. The mean and variance increase with increasing U SPR an( * t * ien decrease. F° r the R parameter both devices demonstrate a trend, but again exhibit opposite reactions. 95 The CPU exhibits a decreasing mean as R increases, while the SPR trend is an increasing one. The variance of both devices do not indicate a clear trend for this parameter. Therefore, the mean queue length performance measure demonstrates a higher sensitivity to the J and R parameters than the throughput performance measure, and is also sensitive to the u SpR parameter. In addition, the CPU and SPR demonstrate opposite sensitivity for the u spR and R parameters. An error analysis of the approximation has been presented to aid designers and analysts when they apply this approximation. Although the error analysis is by no means elaborate or conclusive, some preliminary trends and sensitivities have been identified. This by far exceeds the error analysis presented to support other approximations in the literature. Further work to establish an accurate error function which incorporates all these parameters is still needed. In conclusion, we feel that the approximate SCS model provides reasonable results with small computational requirements. The approximate model computation times are on the order of a few seconds vs. minutes, hours, or even days for the exact model. The throughput values computed by the approximate model are always less than or equal to the exact values and on the average 20% less, ±6%. While the mean queue length underestimates the exact value for low p SPR and overestimates it for high values of p SPR , the average is approximately 12% to 15%, ±14% , with the greatest variation occuring in the mid-range of p SPR . 96 V. ANALYSIS OF MODULAR EXPANSION A. Exact Analysis In chapter III, a queueing network model was developed for an architecture consisting of independent computing systems (ISCs) sharing a single device. In chapter IV, a much less complex approximate model for this type of architecture was introduced. Of interest for this architecture is the effect when the system is incrementally expanded by the addition of ICSs. Expansion of this type places a heavier load on the shared device, causing degraded service to each ICS. This introduces a dual problem. First, for a given configuration and a specific expansion, what is the degradation in service that results? This can be determined by using either of the models to compute any of the previously discussed performance measures for both the before and after cases. By comparing these measures against each other, as well as the requirements of the facility, one may determine if the degradation is significant and acceptable. The second problem occurs when it is determined that the degradation is not acceptable. The alternatives then are to either forego the expansion or augment the shared device to increase its processing rate. The problem is then one of determining the amount by which the processing rate of the shared device must be increased to maintain the current level of service being delivered to each of the ICSs. Both the exact and approximate models are used to develop corresponding relationships between adding ICSs and increasing the processing rate of the shared device. A before and after comparison of a response performance measure for a modular expansion of a balanced system is considered. The performance measure of interest here is the mean cycle time. This is the mean time of a renewal interval. This interval begins when a job enters the CPU queue, and terminates when that same job next enters the CPU queue again. This is a measure of the average time spent at each device (both waiting for and being processed), weighted by the probability of visiting that device. Each job, in general, will require many different cycles to complete its processing task, of concern here is the mean time for this performance measure. 97 From Little's result (W=Q/a), the mean wait time (W) spent at a device (in queue and processing) can be determined if the mean queue length (Q) and the mean arrival rate (a) of that device are known. The mean queue lengths for the exact SCS model, from (40) and (41) of chapter III, are J i (1) Q H = 1 I x H k G(J-kd) . for ij*0 , and G(J) k=i K R (2) Q = _i_ 2 k! k { 2 n h(r;J r -n r0 )}. G(J) k=l R r=l 2n i0 = k i = l The departure rate of a server whose service process is exponential is equal to its arrival rate [BURKE 56, FINCH 59,BURKE 72, MUNTZ 73, KLIENR 761. As noted before, the throughput performance measure is actually the device departure rate, and from (43) and (44) of chapter III is (3) T H = u n A H = GCr-d jL e n , for i,j*0 , and R n giCJj) i = l (4) T = u o A = u { 1 - } G(J) Therefore, an expression for the wait time at a device is (5) Wjj = Q y /Ty , for i>0 and j > . From this an expression can be formulated for the mean cycle time of a job assigned to an ICSas 98 s i W. = S p. . W. . , for i > and j > j=0 (6) s i j=0 S ! = Pi, Qo /T o + PuQu /T u + 2 Pu Qy /Tij j = 2 _ Pi.SPR QsPR ^SPR + Pi,CPU Qi.CPU /T i,CPU + 2 P ij Qij /T ij Given an SCS system consisting of R ICSs, the desire is at least to maintain the same mean cycle time after a modular expansion resulting in an SCS system of R' ICSs. It is assumed that this can be accomplished by increasing the mean processing rate of the SPR, and further that this increase can be expressed as some multiplicative factor fi. This relation can be expressed using the mean cycle time performance measure as W^u^R') < WjCugpR, R) , or (7) s i Pi,sPR Qspr(£ u spr' r ' ) /t spr(# u spR' r ') + Pi,cpu Qlcpu ^.cpu + 2 Pij Qij 'Ty j=2 s i ^ Pi.SPR QsPR ( U SPR' R ) /T SPR^ U SPR' R ) + Pi.CPU Qi.CPU ^i.CPU + 2 Pij Qij ^i j j=2 where R' > R > 1 , and (8a) fi = a R7R , or 99 (8b) £ =1 + a (R'/R - 1) By successfully increasing the SPR processing rate to handle the incremental load of R'-R additional ICSs, it can be assumed that the wait at each device within each ICS remains the same. This results in (7) reducing to the wait at the SPR only, which is Pi,SPR QsPR^ U SPR> R ' ) ^SPR^ U SPR' R ' ) ^ Pi.SPR ^SPR < U SPR' R ^SPR^ U SPR' R ) (9) QSPR^ U SPR' R ' ) /T SPR^ U SPR' R ' ) ^ QsPR < U SPR' R ) /T SPr( U SPR' R ) By reformulating (4) and substituting equation (31) of Chapter III we obtain R n gjdi) i = l T SPR = U SPR A SPR = U SPR i l ' J G(J) K R (10) =u SPR J_ 2 k! { 2 n h(r;J r -n r0 )} G(J) k = l R r=l 2n- Q=k i=l' Substituting (2) and (10) into (9) a complete expression for the inequality is obtained of the form 100 K' R' _J 2 k'! k' { X n h(r;J r -n r0 ) } G(J') k'=l R' r=l En i0 = k i = l K' R' ft u spR _1_ 2 k! { I n h(r;J r -n r0 )} G(J') k*=l R' r=l 2n i0 =k' i=l K R J_ 2 k! k { 2 n h(r;J r -n r0 ) } G(J) k = l R r=l i = l Sn i,0 =k < K R J^ I k! { I n h(r;J r -n r0 )} G(J) k = l R r=l U SPR Sn. =k i=l Expanding the h functions in the above expression, from their definition in chapter III.B between equations (14) and (15), results in K' R' X' r /r,0 _±_ i kM k' { x n g r (V n r,o)} G(J') k' = i R' r=l n r0 ! 2n i(0 =k' i = l < K' R< K>\0 fi u SPR _J_ I k' ! { I n g r (J r -n r ) } G(J') k'=l r' r=i n r0 ! 2n i0 = k i = l 101 K R X AO { 2 H gr(V n r,0)} 1 2 k! k G(J) k=i R r=i n r0 ! Sn i,0 =k i=l K R Xr0 n r,0 U SPR _1_ I k! { 2 n g r (V n r,0)} G(J) k=l R r=l n n ! r,0 ) =k i=l 2n i,0 =k vhere K' = R' J. K = RJj ) J' =(J ls ... ,Jr.) , J =(J l5 ... . Jr) . J. = J. i J , for all iandj X' A.O = A r,0 ( x r //2)\0 and Cancelling similar terms, cross multiplying, and moving everything to the left hand side of the inequality results in K R X r /r,0 { 2 n g r (j r -n r n ) } 1 k! k=l R r=l n rQ ! 2n i0 = k i=l (ID K' R' X' AO fi 2 k'! { Z n g r (J r -n r , )} k' = l R' r=l n rQ ! 2n ij0 =k' i = l 102 K { R X „ r.0 2 k! k I n k = l R Sn i>0 =] i=l r=1 n r,0 ! c K' R' x' A.0 K A r,0 2 k' ! : k' { 2 n k' = l R' 2n i,0 = r-1 n r0 ! = k' S r (V n r,0) } J r (J r -n ri0 ) } r-l n r0 ! 0= k ' i = l Inspecting (11) we notice that all terms may be moved into the innermost summation. Noting the similarity of the summation in both numerators and denomenators a simplifying notation is introduced. Let R X r /r,0 b k = { 2 k! n 8r(V n r,0)} > and R r=l n r0 ! 2n i,0 = k i=l R' X' r>0 n r,0 b« k . = { 2 k'! n g r (V n ,o)} R' r-l n r0 ! Sn i0 =k' i = l Substituting this notation into (11) yields 103 K 2 b k k=l K 2 kb k k = l K' fi 2 b' k , k' = l K' 2 k' b' k , k' = l Separating the common outer summation term results in K (12) 2 b k [ ] < k=l K' K' P 2 b' k , 2 k' b' k , k' = l k' = l The b k 's of this inequality are always positive. Therefore, for the inequality to be satisfied requires the inner term to act as a weighting function and force the entire expression to be non- positive. Although a solution may exist for /? which will satisfy the equality, no obvious method of obtaining it is apparent. In an attempt to satisfy the inequality and obtain a lower bound for fi we will investigate the situation when the the inner term is always < 0. By inspection of (12) we notice that within the inner term, except for k in the numerator, the other terms are independent of k. As a result this inner term achieves its maximum value at the minimum value of k, which is 1. Ifthistermis < for its maximum value, it is < for all values of k, and the inequality is satisfied. This leaves the following relation for a lower bound solution for /J : k' k' 2 b' k . 2 k' b' k , k' = l k' = l ] < By rearranging the above we obtain 104 K' K' [ 2 k' b' k , - fi 2 b' k . ] < k' = l k' = l Repeating the previously applied separation process yields K' I b' k> [ k' — ] < k' = l Again applying our previous arguments we obtain a further lower bound solution to the expression, since b' k is always positive. This inner term achieves its maximum value at the maximum value of k\ which is K' = R' i^ . Therefore, the inequality is always satisfied if it is satisfied for k' = K'. This yields the following lower bound solution for /? : p = K' = R' J. Solving for a by substituting (8a) into the above yields a = R Jj . The interpretation of this result implies that by increasing the processing rate of the SPR by a factor commensurate with the total resulting number of jobs in the entire system one will be assured no degradation in response occurs as compared to the response prior to the expansion. Unfortunately this is such an extremely high lower bound that the result is not very useful. In retrospect, based on the results derived in the next section, if one repeats this procedure and differs only by substituting in (12) the maximum (K = RJ) rather than its minimum (1) value of k, the result obtained is : p = R7R and from either (8a) or (8b) a = 1 . 105 This result is much more intuitively appealing due to its linear one-to-one relation, but its derivation cannot be substantiated from the above equations. Although b k J ] > * b k f- K' K' k = l K' K' P 2 b' k . 2 k' b' k , fi 2 b' k , 2 k' b' k k' = l k' = l k'=l k' = l B. Approximate Analysis Analysis of the expression for mean cycle time of the exact SCS model has resulted in a dissappointingly high lower bound for /?. In this section the approximate SCS model developed in chapter IV is used to perform a similar analysis. We can immediately write a similar expression for (6), from (9) and (11) of chapter IV, as s i W i ~ e SPR- W SPR + W CPU f + 2 e PPU ; j W PPU- : 1 * »»J ^jj j=2 ^PRj^SPR QcPUj s i ^UyQpPUy (13) = + + 2 T SPR a CPU- J -2 CTUij 1 1J { Q S PR /R + Q C PU. + * QPPU: } ; SpR /iv -r VCPU- T *■ vppu 1 1J a CPUj i=2 106 The intent is to maintain the mean cycle time of a job after a modular expansion. Therefore, a relation similar to (7) can be established as W.08u SpR ,R') < Wj(u SPR ,R) , or (14) s i s i _i_ {Q' S p R /R' + Q CPU . + 2 Q PPU . } < __L {Qspr/R + Qcpu. + 2 Qppu- . } a CPU- j = 2 a CPU- j = 2 Assuming that the processing rate of the SPR is successfully increased to satisfy the above inequality, it can be assumed that the wait at each device within each ICS remains the same. This further implies that a^y also remains the same. This results in reducing (14) to -1- { Q' S PR /R ' } < — i- { Q S PR /R } a CPU i a CPUj (15) Q'spr /r ' ^ Qspr /r 1/R' < 1/R (fi u SpR /R' p SpRi aQ,^ - 1) ( u SPR /R p SPRi a cpu - 1) By inspection of (15) one can determine that by maintaining a constant ratio of /? Us PR /R' = C , where C = u SPR /R results in P = R'/R and from (8a) a = 1 , which satisfies the inequality and actually improves the cycle time rather than just maintaining it. This was conjectured, but not proven, at the end of the the preceeding section. Maintaining the assumption that a cpu remains the same, the equality of (15) is solved for the required fi. 107 R ( u spR /R p SpR . acpUi - 1) = R' {fi u spR /R' p spR _ a cpu . - 1) (/? - 1) = (p SPR . a cpu . /u spR ) (R' - R) Noting that p SpR = R p SpR a cpu /u SPR and subtituting it into the above yields i i 08 - 1) = (p SPR /R) (R' - R) . This may also be expressed as 0=1 + (PSPR /R > ( R ' " R ) (16) = 1 + p SpR (R'/R - 1) From this, and (8b), a = p SPR is obtained. To evaluate the accuracy of the wait time performance measures and, therefore, the accuracy of this modular expansion result, we present an error analysis similar to that of chapter IV. The relative error scatter and mean value plots of the CPU and SPR mean wait time are presented in Figures V-l and V-2 for the entire sample space. The relative error statistics for this performance measure are listed by parameter in Tables V-l and V-2. Representative comparison curves for the exact and approximate models are plotted in Figures V-3 and V-4. As can be seen from Tables V-l and V-2, this performance measure exhibits some parameter sensitivity. Some sensitivity is indicated for the s parameter. The CPU indicates a decreasing error as s increases, while the SPR show an opposite response. This sensitivity is attributed to the large number of PPUs spawned by increasing s, resulting in a greater portion of the workload begin handled by the PPUs, decreasing the SPR and CPU workload. Both devices demonstrate a sensitivity to the p SPR parameter. The CPU does not exhibit any trend, while the SPR shows an increasing error trend with increasing P SPR - Both devices exhibit a decreasing error as the J parameter increases. Both devices demonstrate an opposite sensitivity trend to the u SpR and the R parameters. The CPU shows an increasing mean error, 108 •C.t -0.2 :.2 c.« 1.0 -l.C • I- : :i 7»«fa 2 ;ii 7.t« !« 1 1 16»«1 21 -.132 11 7»5 11 223 23?* 12M 1 6«62 ;2 : : « 2 2252121 573 1 1 (13 225 2 11 63 3 it'. 2 7 J 2 1 5*5 u 2; 2 a 11 2 22 2 21 112221 12 3 11 2 2 2«3 221 21 12 1 313 111 1 52 1 1 311 31 1 1 1231 IS 11 2 1 I 11 1 ce 1 1 111 2 11 5 12 « 1 1 312 121 131 1 111 1221 2 1 11 1 3 1 11 1 211 1 111 1 11 1 1 « 1 112 1 12 11 1 1 1 1 2 21 1 21 1 12 2 111 23 1 1 1 1 1 131 2 1 33 21 1 2 11 11 3 1 12 1 11 121 2 2 21 1 1 111 11 1 1 11 1 122 1 21 1 • 1 1 21 33 1 1 1 121 1 111 3 1 1 1 1 1 111 1 1 2 a 2 1331 1 "1 3 1 1 1 O- £> O h. ft f» C-O o ro a -• « ■» • ■ ©■ ■ c • c 3 o •— p 3 o > o OS > I 109 ■o.» * -e.j c.2 p.* i 5 t2l«t«744 3 is 2»*2 W(SPR) SeUt1v« E^for •o.» j : ••1 : i »» c ; •>• c m »• • : * 7 I 3» c : 2* • : 3*1 10 • 2* ii : 1*2 12 1 26! is : 73 i» i 2*5 1! - n It I 161 IT I 2 T2 11 I 3135 1« ! 146 it - 1*1 11 I 1 U h : i ne 23 1 1 2 1 24 I 1 I1T2 25 - 1 1221 it : 71 i' I 22 131 2! ! 15211 J* : 143 1 30 . 1241 3i : 1 1 1! 12 I 1 111 33 : 3 2 1 34 I 3 c 1 35 - 1 121 3b I 1 1 2151 37 I 1 5 3t I 32 3* I 1 13 2 se - 1 421 «i : 1 1 2312 42 I 1 1 1 O I 1 1 11 4« I 12 4? - 111 1 &t : 11 112 1 •7 I 1 1112 1 a» 1 2 111 80 I 1 11 50 - 1 1 1 51 I 1 11111 52 I 1 1 I 11 53 : li 11 S« 1 1 in 55 • 1 11 21 5* : 11 1 1 31 57 : 2 1 112 1 58 I 1 1 5* 1 1 11 2 1 *« • 2 12 1 1 61 I 5 111 11 *2 : 4 1 1 63 I 2 1 44 I »1 11 1 2 1 65 • 2* 1 1 1 .* : •3 1 111 »T I 1 12 1 6* ! 1 1 1 64 I 3 1 11 TO . 1 2 1111 l Tl ! 1 2 T? I 212 1 12 T3 I 4 11 11 Tt I 6 11 1 T5 - 41 1 11 1 11 T4 I 23 11 TT I »l 1 1 TS 1 1» 1 12 1 1 T4 1 ••til 1 1 to • 1 ei i 3 11 1 *2 I 3 1 11 13 1 «12 11 1 »t ! 1*31 1 45 • • 1 1 46 1 432 2 4T I •51 1 tt I •1 1 H ! •3 2 1 l 40 - t»21 1 1 ♦ 1 I •121 21 «2 : •4 2 »i i • 1 »« i •2211 2 ♦5 - •64 •* I • - 1 ! 2 : j : 4 : c • » ! 7 : e : 4 : le • ii : 12 I 13 I 14 : 15 • 16 : 17 I It I 14 I 20 • 21 I 22 : 23 I 24 I 25 • 26 I 2T I 2t I 2« I 30 - 31 : 32 I 33 : 34 I 35 • 36 : 3T I 3t I 34 I 40 • 41 1 42 1 43 I 44 J 45 - 4* I 4T : *» : 44 J 50 - 51 I 52 : 53 I 54 I 55 - 56 I 5T : st : s» : 60 - 6i : 62 : 63 I 64 I 65 • 66 1 67 : 4« 1 64 I 70 - ti : T2 ; T3 I T4 I T5 • T6 1 TT I Tt I T4 I to • tl I ti I ts : tt : tj - t6 I 47 I tt 1 t« : 40 - 41 1 •2 I 43 I 44 : 45 • 4* : -0.2 0.2 -Z — o — — — a t • 4 ■ C — — ■ C C enzanx O • U * o X ** c *- C * o— © * - « > <*^-» • W 4. C > »— • * "5 AN .01*1 «.**>* * a* O DO SO a»Z X • * <2 -*■■. • V X •— "~o — — X • • OC K a. c/o *■» O O aw •*. a. £ m * O ■4—1 O Q. 0) _> J2 "3 i > o i_ 110 Points Mean Variance Minimum Maximum 1650/1158 -0.0795/-0.1001 0.0245/0.0309 -0.6388/-0.6388 0.3056/0.1515 PSPR Points Mean Variance Minimum Maximum 0.10 330/ 266 -0.0739/-0.0857 0.0203/ 0.0238 -0.4489/-0.4489 0.0798/ 0.0798 0.25 330/ 260 -0.0686/-0.0778 0.0247/ 0.0297 -0.5893/-0.5893 0.0787/ 0.0787 0.50 330/ 208 -0.1526/-0.2246 0.0378/ 0.0390 -0.5647/-0.5647 0.3056/ 0.0777 0.75 330/ 227 -0.0000/ 0.0075 0.0026/ 0.0021 -0.1042/-0.0947 0.2500/ 0.0970 0.90 330/ 197 -0.1025/-0.1416 0.0253/ 0.0352 -0.6388/-0.6388 0.1515/0.1515 J Points Mean Variance Minimum Maximum 2.00 600/ 458 -0.1169/-0.1447 0.0335/ 0.0387 -0.6388/-0.6388 0.3056/ 0.1515 4.00 600/ 411 -0.0682/-0.0818 0.0206/ 0.0264 -0.5352/-0.5074 0.3056/ 0.0685 6.00 450/ 289 -0.0448/-0.0555 0.0147/ 0.0196 -0.4723/-0.3813 0.0956/ 0.0956 s Points Mean Variance Minimum Maximum 2.00 550/ 387 -0.1000/-0.1275 0.0287/ 0.0353 -0.6388/-0.6388 0.3056/ 0.1515 6.00 550/ 389 -0.0420/-0.0492 0.0138/ 0.0175 -0.4914/-0.4914 0.3056/ 0.1515 11.00 550/ 382 -0.0966/-0.1243 0.0291/ 0.0365 -0.5893/-0.5893 0.3056/ 0.1515 R Points Mean Variance Minimum Maximum 1.00 450/ 450 -0.0786/-0.0786 0.0271/ 0.0271 -0.5814/-0.5814 0.0956/ 0.0956 2.00 450/ 383 -0.0787/-0.0927 0.0248/ 0.0277 -0.5811/-0.5811 0.1515/0.1515 5.00 450/ 213 -0.0799/-0.1287 0.0215/ 0.0354 -0.6239/-0.6239 0.2500/ 0.0816 8.00 300/ 112 -0.0815/-0.1575 0.0250/0.0431 -0.6388/-0.6388 0.3056/ 0.0670 U SPR Points Mean Variance Minimum Maximum 0.01 165/ 76 0.0089/ 0.0133 0.0069/ 0.0017 -0.1111/-0.0622 0.3056/ 0.1515 0.02 165/ 79 -0.0152/-0.0008 0.0011/ 0.0005 -0.1042/-0.0617 0.0426/ 0.0402 0.05 165/ 85 -0.0084/-0.0003 0.0007/ 0.0006 -0.1044/-0.1044 0.0798/ 0.0798 0.10 165/ 92 -0.0123/-0.0106 0.0018/ 0.0029 -0.2889/-0.2889 0.0787/ 0.0787 0.20 165/ 100 -0.0257/-0.0292 0.0056/ 0.0087 -0.4051/-0.4051 0.0777/ 0.0777 0.50 165/ 120 -0.0745/-0.0856 0.0159/ 0.0203 -0.5105/-0.5105 0.0956/ 0.0956 1.00 165/ 131 -0.1288/-0.1389 0.0272/ 0.0320 -0.5644/-0.5644 0.0771/0.0771 2.00 165/ 147 -0.1721/-0.1691 0.0390/ 0.0422 -0.5893/-0.5893 0.0805/ 0.0805 5.00 165/ 163 -0.1876/-0.1840 0.0466/ 0.0461 -0.6388/-0.6388 0.0828/ 0.0828 10.00 165/ 165 -0.1797/-0.1797 0.0445/ 0.0445 -0.5898/-0.5898 0.0819/0.0819 Table V-l. Relative Error statistics for W cpu , for both all p/p < .90 . Ill p Points Mean Variance Minimum Maximum — 1650/1158 -0.0621/-0.0656 0.0248/ 0.0346 -0.5974/-0.5974 0.4989/ 0.4989 PSPR Points Mean Variance Minimum Maximum 0.10 330/ 266 -0.0367/-0.0325 0.0188/ 0.0227 -0.5974/-0.5974 0.3547/ 0.3547 0.25 330/ 260 -0.0299/-0.0247 0.0241/ 0.0298 -0.5810/-0.5810 0.4402/ 0.4402 0.50 330/ 208 -0.0899/-0.1105 0.0239/ 0.0358 -0.5930/-0.5930 0.3320/ 0.3320 0.75 330/ 227 -0.0604/-0.0634 0.0266/ 0.0382 -0.5774/-0.5774 0.4989/ 0.4989 0.90 330/ 197 -0.0936/-0.1194 0.0275/ 0.0436 -0.5916/-0.5916 0.4262/ 0.4262 J Points Mean Variance Minimum Maximum 2.00 600/ 458 -0.1116/-0.1206 0.0370/ 0.0480 -0.5974/-0.5974 0.3547/ 0.3547 4.00 600/411 -0.0367/-0.0378 0.0187/ 0.0267 -0.3979/-0.3979 0.4989/ 0.4989 6.00 450/ 289 -0.0299/-0.0178 0.0118/ 0.0166 -0.3018/-0.3018 0.4262/ 0.4262 s Points Mean Variance Minimum Maximum 2.00 550/ 387 -0.0518/-0.0515 0.0265/ 0.0368 -0.5811/-0.5811 0.4262/ 0.4262 6.00 550/ 389 -0.0651/-0.0688 0.0235/ 0.0327 -0.5930/-0.5930 0.4072/ 0.4072 11.00 550/ 382 -0.0693/-0.0766 0.0244/ 0.0342 -0.5974/-0.5974 0.4989/ 0.4989 R Points Mean Variance Minimum Maximum 1.00 450/ 450 -0.1553/-0.1553 0.0401/ 0.0401 -0.5974/-0.5974 0.3714/ 0.3714 2.00 450/ 383 -0.0767/-0.0738 0.0172/ 0.0201 -0.3964/-0.3964 0.3759/ 0.3759 5.00 450/ 213 -0.0055/ 0.0462 0.0100/ 0.0142 -0.2500/-0.2500 0.4262/ 0.4262 8.00 300/ 112 0.0147/ 0.1103 0.0117/ 0.0140 -0.0946/-0.0730 0.4989/ 0.4989 U SPR Points Mean Variance Minimum Maximum 0.01 165/ 76 -0.1575/-0.2698 0.0208/ 0.0209 -0.5974/-0.5974 -0.0308/-0.0549 0.02 165/ 79 -0.1517/-0.2512 0.0223/ 0.0268 -0.5811/-0.5811 0.1923/ 0.1923 0.05 165/ 85 -0.1346/-0.2062 0.0283/ 0.0432 -0.5797/-0.5797 0.4110/ 0.4110 0.10 165/ 92 -0.1225/-0.1717 0.0278/ 0.0438 -0.5930/-0.5930 0.3184/ 0.3184 0.20 165/ 100 -0.1049/-0.1359 0.0279/ 0.0428 -0.5916/-0.5916 0.3126/ 0.3126 0.50 165/ 120 -0.0590/-0.0623 0.0215/ 0.0288 -0.5223/-0.5223 0.4072/ 0.4072 1.00 165/ 131 -0.0067/ 0.0001 0.0115/0.0137 -0.3395/-0.3395 0.4989/ 0.4989 2.00 165/ 147 0.0395/ 0.0443 0.0092/ 0.0094 -0.2571/-0.2571 0.4402/ 0.4402 5.00 165/ 163 0.0567/ 0.0553 0.0083/ 0.0083 -0.1921/-0.1921 0.4262/ 0.4262 10.00 165/ 165 0.0197/ 0.0197 0.0097/ 0.0097 -0.5172/-0.5172 0.3759/ 0.3759 Table V-2. Relative Error statistics for W sp) . , for both all p/p <.90. 112 +J i— to s: X \ CM CM II II II ca •r~ . Q_ '~3 oo c/> Q. o o CM o o IT) CkI ID CM I/O > D. o O +-> O Dd r— Q_ C I/O t/1 > o X O CO CD C7> — CM O w CPU 113 CM CM II II II cc •r- • Q. r O in oo Q. O O o . • a: CM D_ t/> => 1/1 > a • on r— o_ i/i o LT> Oi -^- Q. ^^, <•) S ID l/> CM > O X UJ ■ r ~ 1 > i- en IT) • r— O U_ CM O w SPR 114 while the SPR shows a decreasing error for increasing R. For increasing u spR , the CPU shows an increasing error, while the SPR first shows a decreasing error and then as u SpR gets large, it tends to increase. Therefore, the wait time performance measure exhibits the greatest sensitivity of all the performance measures. Although wait time possesses the lowest mean relative error, it also possesses the highest variance, and demonstrates sensitivity to all parameters. This is reasonable to expect since the wait time is a function of the other two performance measures and, therefore, compounds their errors. In this case the variance was amplified while the mean was attenuated. The interpretation of (16) implies that by increasing the processing rate of the SPR by p$p R /R for each additional ICS in the resultant expanded system, the mean response time would be preserved when compared to that prior to the expansion. It is noted that p SpR < 1 and, therefore, p SPR /R < 1. This means that the incremental increase in SPR processing rate (per additional ICS), p SPR /R, is only a fractional increase. This is a significant improvement over the upper bound derived in the previous section. In fact this is an improvement over the incremental increase of unity conjectured in the preceding section. The unity increase in processing rate is quite intuitively appealing. For each additional ICS added, the SPR processing rate should increase by 1/R from the current processing rate. The newly derived incemental increase of p SPR /R seems intuitively correct. Condider the fact p SpR represents the current total traffic intensity to the SPR, p$ PR /R represents the portion of that traffic intensity generated by an ICS. Utilizing this perspective of traffic intensity, this fractional increase seems more credible. Since each ICS generates traffic proportional to p SPR /R, each additional ICS would genarate additional traffic also proportional to p$ PR /R. Therefore, if the processing rate of the SPR is increased by that fractional increase in traffic, the response time should not increase. There is an additional implication to this fractional increase in SPR processing rate, proportional to the increase in ICSs. A system of this class may undergo one or more modular expansions, encompassing a large increase in the number of ICSs. The extent of expansion will in general be limited technically by the maximum attainable processing rate of the SPR, which is dependent, for the most part, on the existing technology for that type of device. The processing 115 rate of applicable devices will generally span several orders of magnitude and may include a number of different technologies. Therefore, depending on a fractional processing rate increase, rather than a unity or greater increase, will allow a given technology to support a larger range of expansion. This minimizes, or at least delays, the implementaion and investment risk of changing device technology in a given system. Additionally it will also allow for a far greater maximum expansion range, since the overall existing technology limit will be approached at a much slower rate. C. Applications Two examples of the SPR architecture are presented to illustrate the utility of the approximate SCS model. The first example is a complex of multiple minicomputers linked to a common shared secondary memory subsystem by a local area network (LAN). This is applicable to engineering and scientific environments. The second example is a point-of-sales (POS) application. This is applicable to grocery and department stores, and has a direct analogy to certain office automation environments. Example 1 Suppose The current processing system of a technical organization consists of 2 minicomputers (ICSs); each has an average multiprogramming level of eight, and both share a common secondary storage subsystem. The minicomputers are identical and each has a CPU and the same complement of four peripheral devices (PPU). These PPUs consist of (1) an input card reader (CR), (2) an output line printer (LP), (3) a private local disk (disk), and (4) a set of interactive devices (TTY). 116 Operationally, each ICS functions independently, processing both batch and interactive jobs. The composition of both types of jobs is the same, so no distinction between them is necessary. The set of interactive devices, as a set, has been characterized as a single device with an exponential service time distribution. Therefore, we may aggregate these interactive devices and represent them as a single device in our model. The local disk has two functions, (1) The system software resides there, and (2) during processing of a job it acts as a cache between the ICS and the SPR. The jobs exhibits an exponential service time distribution at each device. The CPU mean service time of a job is 25 msec, which in the SCS model is normalized to 1. For each device the mean service time, its corresponding normalized value, and its estimated transition probabilities are as follows: mean normalized estimated service service transition time time propabilities U SPR = 100 msec U CPU = ^^ msec u disk= 25 msec u = .25 u 1= 1.0 u 2 = 1.0 P = -25 Pl = .05 P 2 = -45 u tty = 2-5 sec U LP = 2^0 msec u 3 = .01 u 4 = .10 p 3 = .05 p 4 = .10 U CR = 250 msec u 5 = .10 p 5 = .05 The installation is about to be modified. The organization is expanding and has determined a requirement to expand the processing complex by eightfold. They have decided to implement a local area network (LAN ) which will allow interactive device access to the central processing complex from the desk of each employee. The bandwidth of the LAN is sufficiently high so that it will not be a bottleneck or cause any significant delay, therefore, the LAN may be neglected in our analysis. Due to existing software investment and staff familarity, the organization plans to retain the existing two minicomputers and modularly expand by adding identical ones as required. The shared common secondary storage subsystem has been very successful for sharing, rather than duplicating, programs and data bases as well as providing an effective electronic mail system. Therefore, the retention and expansion of this facility is also planned. The organization is 117 expected to grow over the next two years and the modular expansion of the processing facilities is planned to coincide. The maximum planned processing expansion is for a complex of 16 minicomputers. Two minicomputers are currently being acquired to bring the complex to 4 minicomputers. The current SPR does not have sufficient speed or capacity to handle the planned expansion. It is desired to size the SPR so that its current mean response time is maintained. In addition, there is potential for additional future organizational growth, which may result in a further processing expansion to 32 minicomputers. This potential growth is 5 to 8 years away and no definite planning is currently being done. It is desired to know if the SPR sized for the 16 ICS system will be able to adequately service a 32 ICS system, and if not, is there one that will. To determine the approximate job flow rate and traffic intensity of the current system we solve (7) of chapter IV using the BIN algorithm and the system parameters listed above. These values are then used to obtain the current approximate mean SPR response time by applying (13). This results in p SpR = .3508 and W SpR = 6.16. Repeating this procedure for the 4 ICS configuration results in W SpR = 13.2 . Similarly, the planned 16 ICS configuration yields W spR = 475, and the hypothesized 32 ICS configuration yields W SpR = 992. Applying (14) and (16), using the previously computed value of p SPR we obtain the /Ts from which the processing rate of an SPR for each configuration is computed to be: U SPR < 4 > = .338 «- 74 msec j8(4) = 1.35 U SPR ( 16 > = .865 «- 29 msec 0(16) = 3.46 U SPR ( 32 ) = 1.57 «- 16 msec 0(32) = 6.26 and Processing rates for various secondary storage technologies [HOAGLA 79, TOOMBS 78, WARNAR 79] indicate that an SPR subsystem using existing rotating disk technology can support a 16 ICS configuration. This same SPR subsystem cannot adequately support a 32 ICS configuration, although a faster SPR subsystem using this same technology can support a 32 ICS configuration. 118 Example 2 For the second example a POS environment is considered. POS can generally be described as many small independent processors, each processing a single job, and accessing a shared inventory data base subsystem (IDBS). The inventory data base may be implemented in several different configurations. One is a distributed configuration, where each geographically distinct organizational unit has its own local IDBS, which is shared among its own POS stations. Another configuration is a centralized one, where a single IDBS is located at a single site and is shared by all the POS stations. There are of course a spectrum of configurations in between these two extremes. The centralized configuration requires additional communication facilities from the IDBS central site to each of the geographically distinct organizational units. Structurally each POS station consists of a processor and a number of local I/O devices. These I/O devices may include any of the following typical devices; (1) a digital display or two, (2) a printer for a sales receipt, (3) an input scanner, (4) an input alphanumeric keyboard, and (5) an auxiliary input device, for instance a scale. A normal transaction consists of one or more human interactions to enter data through the I/O devices. The station processor, once it accepts the data, requests service from the IDBS to process this data. This request is serviced on a FCFS basis by the IDBS with an exponentially distributed service time. Any time devoted to communication between a POS station and the IDBS will be incorporated into the IDBS service time. It is assumed that the communication subsystem is not a bottleneck and, therefore, this time may be accounted for by increasing the mean processing time of the IDBS. The processed data is returned to the POS station, which has been idle while waiting, and it is then displayed. The cycle is then repeated. There is only a single job at each station and the station processor is fast enough to service all of its requests and manage all of its devices. Due to the nature of service requests the service time of the station processor is generally constant for local operations, while the human interaction through the devices is random and is assumed to be exponentially distributed. For a single job there is no need to model each device within a station separately since there is no contention for 119 devices. Also, since the station processor has essentially a constant service time and is not a bottleneck, the mean time of the human interaction distribution can be increased to account for it. Therefore, it is assumed that all of these devices and the station processor can logically be thought of as a single device with an exponentially distributed service time. A feedback loop to the POS station provides for error conditions that arise, which mainly occur when entering data. The processing rate of the POS station is normalized to 1. Each transaction requires one or more service requests at the POS station followed by a single service request to the IDBS. Each additional service request to the POS station represents a re-entry of data which was required by an error on the previous attempt. The response as seen by the user is strictly a local one. Therefore, the only time the IDBS affects the response time is when the data entry was successful. Based on this descriptive model, an acceptable response time is on the order of 1/4 sec. The POS mean processing time is 1 sec. Therefore, IDBSs which have processing rates many times faster than the POS station must be considered. The range of potential IDBS processing rates is 5 to 2000 times faster than the POS station. At 2000 times faster, the IDBS would require a mean processing time of .5 msec. This represents the upper limit of current secondary storage subsystem technology [HOAGLA 79, TOOMBS 78, WARNAR 79], especially if a communication subsystem is involved. The jobs have an exponential service time at each device. The POS station mean service time of a job is 1 sec. For this model the mean service times and their estimated transition probabilities are as follows: mean estimated service transition time probabilities U IDBS= 5 - 2000 PlDBS_= - 90 U POS~ 1 P POS = ■*" It is desired to size the IDBS for the two configurations under consideration. The first configuration being considered is a distributed configuration supporting 20 POS stations. The second configuration is a centralized one supporting 1000 POS stations. 120 To determine the approximate job flow rate and traffic intensity of a 20 POS station configuration we solve (7) of chapter IV using the BIN algorithm and the system parameters listed above. These values are then used to obtain the corresponding approximate mean IDBS response time by applying (13). This results in p IDBS =.6797 and W IDBS = .250 for u IDBS (20)= 12.5. Applying (14) and (16), using the previously computed value of p IDBS we obtain the fi from which the processing rate of an IDBS to support a 1000 POS station configuration is u IDBS (1000 ) = fi u IDBS = 428.9. The local 20 POS station configuration requires a u IDBS = 80 msec, which is a reasonable speed for disk technology. The 1000 POS station configuration requires a u IDBS = 2.33 msec. For this configuration the IDBS is remotely located and at this speed the communications delay, although not a bottleneck, must be accounted for. This communications delay has been measured to be .75 msec, resulting in a IDBS with a mean processing time of 1.58 msec. This speed exceeds the current capability of disk technology, but can be met using magnetic bubble technology. An analysis of a modular expansion was performed using both models. One of the key design aspects is the effect on performance due to a modular expansion, or conversely, the amount of increased capability required by the shared resource to continue to deliver some threshold amount of performance. The exact model yielded only an lower bound for the incremental increase in the SPR processing rate required to maintain system response time. This lower bound was too high to be of any practical use. The analysis of the approximate model yielded a useful and intuitively satisfying relation between the addition of ICSs and the incremental increase in SPR processing rate required to maintain system response time. It indicated that for each ICS added to expand the system, the required increase in incremental SPR processing rate was directly related to the incremental traffic intensity caused by each additional ICS. The implication is that, for example, by doubling the number of ICSs, an increase in the SPR processing rate of < 2 is required to maintain system response time, since the traffic intensity for a stable system is always < 1. This result is verified when compared to values predicted by the exact model. This result will be useful to designers and analysts when they consider building new systems or augmenting existing systems which are based on this class of architecture. Two examples illustrating the utility of the approximate model were presented. 121 VI. SUMMARY AND RECOMMENDATIONS A. Summary The basis of resource sharing and its application to computer architecture has been discussed. Some examples of architectures that support resource sharing were provided, and many more will be constructed. It was our intention to investigate the performance of this class of computer architecture which shares a single processing resource among multiple independent computing systems through the use of analytic queueing models Utilizing multi-class queueing network theory and the structure of this class of computer architecture a specific queueing network model was developed. Two efficient computational algorithms, SOP and FAC, were presented which could be used to evaluate the performance measures of this model. Previously the evaluation of queueing network models required memory-space and time complexity both growing exponentially with the size of the state-space, 0[ (J.+ 1) R ]. The algorithms developed here to evaluate the model require a memory-space which grows linearly, 0[ R(Jj + 1) ], with the size of the state-space, although the time complexity still grows exponentially. This provides the designer and analyst the ability to evaluate this model when it has a large state-space if he or she is willing to invest the computation time. Whereas, previously it may not have been possible to evaluate this model due to memory-space limitations. Although the algorithms to evaluate this exact queueing network model are memory-space efficient, they are still of exponential time complexity. This computational complexity limits the utility of the model, as is generally the case for other multi-class queueing network models. In an attempt to overcome this computational limitation, an approximate queueing model was introduced. This approximate model consists of a set of independent M/M/l single server queues. The solution technique is based on approximating the job flow rate between these queues. An efficient algorithm, predicated on a binary search technique, was presented to evaluate the performance measures of this approximate model. The development and solution form presented apply to balanced as well as unbalanced systems. The solution algorithm and the error analysis considered only balanced systems 122 To determine the utility of this approximate model a comparison of results between it and the exact model was made. A random sample space of the input parameters for these models was generated, and the corresponding performance measures evaluated. The performance measures predicted by this approximate model do result in a varying error, which is considered to be within ac eptable engineering limits. The efficiency gained in evaluation of the performance measures is, for most applications, thought to be an acceptable compromise for the error incurred. For situations with extremely large state-spaces, it may be the only analysis method possible. As a result, designers and analysts are provided the capability to use the approximate model to obtain estimates of the performance of a large number of system confiurations in a very short period of time. Once a small number of candidate configurations are culled, the exact model may be applied to obtain more accurate performance predictions. An analysis of a modular expansion was performed using both models. One of the key design aspects is the effect on performance due to a modular expansion, or conversely, the amount of increased capability required by the shared resource to continue to deliver some threshold amount of performance. The exact model yielded only a lower bound for the incremental increase in the SPR processing rate required to maintain system response time. This lower bound was too high to be of any practical use. The analysis of the approximate model yielded a useful and intuitively satisfying relation between the addition of ICSs and the incremental increase in SPR processing rate required to maintain system response time. It indicated that for each ICS added to expand the system, the required increase in incremental SPR processing rate was directly related to the incremental traffic intensity caused by each additional ICS. The implication is that, for example, by doubling the number of ICSs, an increase in the SPR processing rate of < 2 is required to maintain system response time, since the traffic intensity for a stable system is always < 1. This result is verified when compared to values predicted by the exact model. This result will be useful to designers and analysts when they conider building new systems or augmenting existing systems which are based on this class of architecture. Two examples illustrating the utility of the approximate model were presented. 123 B. Research Extensions We have stated that our position in accepting this M/M/l approximation was that it "tracked" the performance measures predicted by the exact models, although the values predicted were in error by a varying degree. Similar approximate models based on other independent queueing systems or a mix of different systems may provide a better "fit" than the M/M/l. The M/M/l system was chosen because the simplicity of its mathematical formulation provides both an expression that yields insight and one that allows manipulation for analysis. To provide insight to the designer and analyst as to the error incurred when they apply this approximation an error analysis was presented. This analysis was meant to indicate parameter sensitivity and trends. It was not an elaborate analysis, nor were the results conclusive. Further work is needed to characterize the error incurred over a wider range of the parameter values, especially the job allocation vector, J, and the number of ICSs, R. The BIN algorithm presented in chapter IV, provided an efficient solution to the approximate model when the system is balanced (identical ICSs). Additional work is needed to establish an efficient algorithm for the general case of an unbalanced system, because this system requires solution of a set of simultaneous nonlinear equations. Existing solution techniques should be investigated, including converging iterative ones ar is the BIN algorithm. Also, a similar error analysis should be done to determine if an unbalanced system results in different accuracy patterns or is any differently parameter sensitized than a balanced system. In certain situations the error incurred by using the M/M/l approximation is not satisfactory. The alternative is to utilize the exact model, but the computation time required may be excessive. Similar models, based on other queueing systems may provide results with a lower error tolerance and less variation. It would be useful to formulate efficient solution algorithms and perform similar error analyses for them, as was done for the M/M/l approximation. Assuming these other approximations yield values significantly closer to the exact values, the results of a modular expansion analysis would be useful. The results of this analysis would be interesting to compare to the results obtained from the M/M/l approximation. 124 A greatest lower bound for the exact model should be pursued. The one established here is too high to be of any practical value. A value of unity was conjectured for the exact model, but not proven, in this dissertation. Proving either this conjecture or the fractional bound (p), estab'ished through the analysis of these approximate models, is a useful endeavor. There are other research areas that may be pursued based on these modeling techniques. One is formulation of more efficient exact and approximate models for architectures with multiple SPRs. These models would be especially applicable to architectures incorporating a LAN allowing high speed access to multiple shared subsystems. Another research direction would be to establish non-exponential approximate models. These could be based on M/G/l servers or M/G/l/K servers. The utility of these models could be established by comparing their predictions to those of Shum's EPF model [SHUM 76, SHUM 77]. The EPF model is an approximation based on a queueing network formulation, in which the product terms are replaced by terms representing M/G/l/K servers. The computation time and memory-space complexity of the EPF model are equivalent to those of general queueing network theory. An approximate non-exponential model of time and space complexity comparable to that of the approximate SCS model would be useful. 125 APPENDIX A Review of Balance Equations for Queueing Networks 1. The Single Server Queue To understand the balance, or flow equations we shall start with a simple single server queue. Assume the service time distribution is exponential, with mean u , and assume the arrival process is Poisson (i.e. has exponentially distributed interarrival intervals), with mean a . Let the probability of n jobs in the queue at time t be Pr[n(t)] = Pr[X(t)=n], t>0 , where X(t) is a random variable and n is the state of the system, which is the number of jobs in the queue (including any being serviced). Let us also assume that in a small time interval, At, at most only one event can occur. Therefore, the state probability balance equation is Pr[n(t+ At)] = Pr[n(t)] Pr[no arrv. + no departures in At / n at t] + Pr[n(t)-1] Pr[l arrv. + no departures in At / n-1 at t] + Pr[n(t)+ 1] Pr[no arrv. + 1 departure in At / n+ 1 at t] + Pr[At 2 ] 2 where Pr[At ] is the collective probability that more than one event occurs during AL We 2 make At is small enough so that this probability is essentially zero, Pr[At ] = 0. Since the arrival and departure processes are exponential (i.e. memoryless), we are able to remove the conditions in the above equation which yields Pr[n(t+ At)] = Pr[n(t)] Pr[no arrv. + no departures in At] (1) + Pr[n(t)-1] Pr[l arrv. + no departures in At] + Pr[n(t)+ 1] Pr[no arrv. + 1 departure in At] 126 By definition the probability of each process occuring is (-a At) Pr[an arrival occurs in At] = 1 - e , and (-uAt) Pr[a departure occurs in At] = 1 - e Expanding the exponential by an infinite series and truncating after the second term yields 2 3 (a At) (a At) -aAt e = 1 - aAt + - + ... s 1 - aAt . 2 ! 3 ! This truncation may occur since for small At, At 2 «At , and as At approaches zero in the limit, At 2 approaches zero much more rapidly. Therefore, all the higher order terms are negligible compared to the first order term. Substituting this back into the probability definitions we obtain the following approximations Pr[an arrival occurs in At] ^ aAt , and Pr[a departure occurs in At] = uAt and conversely Pr[no arrivals occur in At] s 1 - aAt , and Pr[no departures occur in At] ^ 1 - uAt . Noting that the arrival and departure processes are independent of each other allows us to express their joint probability as a product of the two probabilities, which as above may be approximated by the first order term, as Pr[no arriv. + no departures in At] = (1-aAt) (TuAt) = 1 -aAt- uAt + au(At) 2 jj l-(a + u)At Applying this truncation again allows us to express the following probabilities as Pr[l arriv. + no departures in At] = aAt (1-uAt) = aAt - aAt(At) 2 __ aAt , and 127 Pr[no arriv. + 1 departure in At] = (1-aAt) uAt = uAt - au(At) 2 ~ uAt Substituting these results into equation (1) yields (2) Pr[n(t+At)] = Pr[n(t)] (l-(a+u)At) + Pr[n(t)-1] aAt + Pr[n(t)+1] uAt Assuming an infinite size queue and, therefore, no upper limit on the number of jobs, a lower limit must still be considered since it is physically impossible to have less than an empty queue. This results in the following boundary equation for an empty queue (i.e. state = "0") (3) Pr[0(t+At)] = Pr[0(t)] (1-aAt) + Pr[l(t)] uAt . When the queue is empty no departures can occur and there exists no state = "-1". Rearranging equations (2) and (3) algebraically yields Pr[n(t+ At)] = Pr[n(t) + 1] uAt - Pr[n(t)] (a + u)At + Pr[n(t)-1] aAt + Pr[n(t)] , and Pr[0(t+ At)] = Pr[l(0] uAt - Pr[0(t)] aAt + Pr[0(t)] Then dividing by At and taking At to its limit results in Pr[n(t+At)]-Pr[n(t)] (4) lim = Pr[n(t)+l]u - Pr[n(t)](a+u) + Pr[n(t)-1] a , and At->0 At Pr[0(t+At)]-Pr[0(t)] (5) lim = Pr[l(t)] u - Pr[0(t)]a At^O At Note that these equations are the derivatives (Pr'[n], etc.) of the state probabilities (that is, the state probabilistic rate of change). Assuming a stationary distribution implies Pr[n(t)] = Pr[n] The average rate of change must be zero, or an infinite accumulation in some state might occur. Therefore, equations (4) and (5) are assumed to be identically zero, resulting in 128 Pr'[n] =0 = Pr[n + l]u- Pr[n](a + u) + Pr[n-1] a , and Pr'[0] = = Pr[l] u - Pr[0] a By rearranging these equations we obtain (6) Pr[n] (a + u) = Pr[n + 1] u + Pr[n-1] a , and (7) Pr[0] a = Pr[l] u From studying these equations we realize that the left side is the (flow) rate out of a state, while the right side is the rate into that state. Therefore, these equations represent the balance of flow between states, hence the term balance (or flow) equations. 2. Open and Closed Queueing Networks To obtain the balance equations for a network of queues (i.e. a Jacksonian network [JACKSO 63]) the same basic procedures are necessary. We now have an additional problem of many independent queues connected to each other. This changes our scalar state n for one queue, to a vector N = (n Q , ... ,n s )to account for each ofthes + 1 queues. The state equilibrium probability distribution becomes Pr[N] =Pr[n n , ... , n ] where K = 2 n. i=0 and nj is the number of jobs in the i-th queue, and K is the total number of jobs in the network. Similarly, the mean arrival and departure rates become vectors also, one element for each queue, 3^ and Uj. To specify how these queues are connected, we use constant transition probabilities to indicate the flow (transition) of one job from queue j to queue i, p. .. Without proceeding through 129 the tedious algebra (as in the previous section) we may write [JACKSO 63] the left side as (S aj+'i u iy (n.) ) Pr[N] i=0 i=0 where , n. <0 y(nj) = { 1 , n.>0 Note that the extra factor y(nj) is equivalent to accounting for our "less than empty" boundary conditions, since some queues may be empty while others are not. The right side is a little more complex. We are looking for all the ways to reach the destination state N through a single transition from a source state, N'. Any queue i may receive a job from any other queue j, therefore, the source state must be of the form N' = (n Q , ... , n + 1, ... , n^l, ... ,n s ). All possible combinations of source states of this form must be accounted for on the right side of the equation, which yeilds s s 2 2 Uj y^) Pj .?r[n , ... ,^ + 1, ... , n f l, ... , nj j=0 i=0 s + 2 a 4 yCnj) Pr[n , ... , n-1, ... , n s ] i=0 These portions yield the following for an open queueing network with exponential service and interarrival times, and an infinite queueing capacity [JACKSO 63]: (8) [(I a l ) + (l u.y(n-))] Pr[n , ... , nj i=0 i=0 s s = [ 2 I ^ Pjj y(n.) Pr[n , ... , ^ + 1, ... , n^l, ... , nj i=0 j=0 s + [ 2 aj y(nj) Pr[n , ... , n r l, ... , n s ] ] i=0 130 For a closed queueing network with a constant number of jobs, K, continually circulating [GORDON 67] there is no arrival process, which implies aj = 0, for all i. The resulting equation is (9) [ z Ujydii)]^, ... ,n s ] i=0 s s = 2 2 Uj Pj 4 y(nj) Pr[n , ... ,il+1, ... ,n.-l, ... , n s ] . i=0 j=0 It should be noted that our approach to arrive at equations (8) and (9) was an intuitive extension of equations (6) and (7), the balance equations for a single server queue. A more rigorous approach would require a procedure similar to that from which equations (6) and (7) were derived [JACKSO 57, JACKSO 63, GORDON 67]. The solution to these equations is based on assuming a product form solution [GORDON 67] of the form s nv. (10a) Pr[n , ... , nj = 1 n x k " k , and G(K) k=0 s (10b) Pr[n , ... ,il+1, ... , n r l, ... , nj = *, /^ II G(K) k=0 where G(K) is a probability normalization factor and the x k 's are unknowns whose solution must be obtained. Substituting the solution form of (10) back into (9) yields s s s s n k i n x 2 Uj y(n.) 1_ n x k k = [ 2 2 Uj p j(i y(n.) Xj. ] i=0 G(K) k=0 i=0 j=0 ' X i G(K) k=0 k The term on the right-hand side of the equation, outside the brackets, is always non zero and is a term common to both sides. This term is not dependent on the indices i or j and, therefore, may be manipulated without effecting the summations. Extracting and cancelling this term results in 131 s s s X- 2 u A yCi^) =2 x u. v- } - yC^) — i=0 i=0 j = ' X- Further extracting the common term y(nj) yields s s x- (11) 2 yCn,) [ uj - 2 Uj Pj • — ] = i=0 j=0 ' X- The next step in the solution of this closed queueing network is based on the fact that at any time it is possible for all but one queue to be empty. Let this single non-empty queue be the k-th queue. In this case o i*k y(n,) = { 1 i = k Hence, (11) is reduced to x j Uj - 2 U: p- — =0 , for i = 0, ... ,s j = ' Xj By rearranging the above the following results are obtained: s x i u i ■ 2 x j u j Pj,i =° j=0 Letting e. = x ; u 4 ; then (12) e i= 2 e jPjl j=0 132 which is the vector equation E = E[P], where the vector E = (e Q , ... , e s ) and the matrix [P] = [p. .]. The e^s are usually called the relative visist frequencies. Finally we can solve for the probability normalization factor, by using the fact that the probabilities must sum to unity. Hence, from (10) we may formulate N U" ,... ," s j G(K) N i i=0 Therefore, (13) G(K) = s N s n Xj 1 , i = where the summation over N implies all non-negative solutions to the equation i=0 See [KLIENR 76], pg216. In summary, the state equilibrium probabilities can be obtained from the following Pr[n , ... , n g ] = 1 n x i * , where G(K) i=0 G(K) =i n x : ni N i=0 3. The Central Server Model A simple, nontrival case of a closed queueing network is called the Central Server Model [BUZEN 71]. A diagram of this model is shown in figure A-l and the corresponding transition probability 133 A A A A E n s- cn fO •i - -D O o /v A s- O) i- +■> c a> o i a; s- 3 •i— v A -> 134 [P] = Po.o 1 1 Po,i Po Po.s Figure A-2. Central Server model transition probability matrix. 135 matrix is structured as shown in figure A-2. This results in a simple solution to (12), which consists of s+ 1 simultaneous equations with s+ 1 unknowns. In this case there are only s independent equations. Therefore, one can solve all these equations in terms of one of the unknowns, say e Q . This does not result in a unique (absolute) solution, but rather a relative solution. Consequently, any value substituted for e Q will yield a solution satisfying (12). Therefore, this relative solution is related to the absolute solution by some multiplicative constant. The relative solution for the Central Server Model is E =(e Q , e p 01 , ... ,e Q p 0s ) Note that the first subscript (j) of the p= j's denotes the source of the transition. There is only a single source in this model, the central server. Therefore, this subscript may be dropped since it is implied, resulting in E= (e , e Q p v ... , eoP s ) Letting x =l, which then implies e =u x =u results in E=(u , u oPl , ... ,u p s ) and u oPi u oPs X = (l, , ... , ) . u l u s The structure of the central server model results in a simple solution to (12). The evaluation of (13), although straightforward, requires a summation over a state-space whose size increases s exponentially, 0[ (K + 1) ]. Buzen [BUZEN 71] developed an iterative method for evaluating s (13) has requires a computational complexity 0[Ks], vs. 0[(K + 1) ]. This method is based on an recursive partitioning technique. Define the following auxiliary function j (14) g(m,j) =1 n x" 1 i j i=0 i=0 S n- = m 136 Note that when m = K and j = s, then G(K) = g(K,s). By partitioning g(mj) based on the occupancy of the last queue, j, being either empty or not, yields j-l j (15) g(m,j) =2 n x^ 1 + Xj 2 n x^ 1 = g(mj-l) + x. g(m-l j) , j-l i=0 j i = X rij=m 2n- = m-l i=0 i=0 with the following boundary conditions g(0j)=l ( 0l) s G(K) i=0 G(K) s i=0 Znj:sKOn>l) 2n i = K-l i=0 i=0 X J G(K-l) G(K) 137 To derive the mean queue length of the j-th queue, Q., first define R-(k) as the probability that the j-th queue has k or more jobs (i.e. npk-1), expressed as n- , n- R/k) = 2 Pr[n , ... ,n s ] =2 [ L_ n Xj '] = 2 n x, * N(3n>k) s G(K) i=0 G(K) s i=0 2 n i = K (3n>k) 2n i = K-k 1=0 i=0 Hence, k x j R/k) = G(K-k) G(K) Note that R/0)=1, R/l) = Aj, R/K+1)=0, and the probability that the j-th queue has exactly k jobs is R:(k)-R.(k+1). Therefore, the mean queue length is K K K Qj =2 k[ R.(fc)-R.(k+1)] = 2 k R/k) - 2 kR/k+1) k=l k=l k-1 K 2 [C-:i k=l = 2 k Rj(k) - [ 2 (k-1) Rj(k) + KR/K+1)] K K R/l) + 2 k R/k) - 2 (k-1) R/k) k = 2 k=2 K R/l) +2 [k-(k-l)] R/k) k = 2 K 2 R/k) k = l k x/ G(K-k) 2 k = i G(K) 138 4. Closed Queueing Networks with Multiple Job Classes Basket et. al. [BASKET 75] have extended the theory of queueing networks to include multiple classes of jobs, with other than exponential service time probability distributions. This extended model is based on expanding the state description, the state transition probabilities, and the mean service rates to include class distinction. Define the following (Note: these definitions differ somewhat from those used in the rest of this dissertation, see Appendix C, but they conform to the definitions generally associated with multi-class queueing network theory): N = (n Q , ... , n s ) is the state-space vector description ; = (N ,...,N S ) = (n 01 ,...,n 0R , ... , n sl ,...,n sR ) n 4 r = number of class r jobs at the i-th service center (both in service and in queue) ; R n i = 2 n ir , the total number of jobs (of all classes) at the i-th service center ; r=l R = (n 4 1 , ... , n 4 R ) is the job class distribution at the i-th service center ; s s R K = 2 n 4 = s 2 n 4 r , the total number of jobs in the network ; i=0 i=0 r=l J = (J-L , ... , J R ) is the job class allocation vector for the network ; J r = the maximum number of jobs that can be allocated to the r-th class , s J r = s n i is the total number of class r jobs in the network ; i=0 u. r = the mean service rate of a class r job at the i-th service center ; p ; s = the probability of a class r job at the i-th service center transiting to the j-th service center and becoming a class s job ; R = the number of different job classes in the network 139 The state equilibrium probabilities are assumed to have the form ■ 1 s Pr[n j , ... , n s ,r] = n «Ni> ' G(J) i=0 where the product factors are given by R nj! n • n ir X. ' i,r PS, FCFS, LCFS r=l V f i( N i) = { R n • IS ; r=l V the normalization constant is given by G(j) = 2 n qcNj) N i=0 where the summation over the state-space N means all non negative solutions to I N i= J i=0 The relative visit frequency, e i r , is given by the solution to s R i.r j,t K j.t;i,r ' j = t=l 140 and the relative load factor, x i , is given by e i,r X i,r "■ U i,r Buzen's efficient computational procedures have been generalized by Muntz and Wong [MUNTZ 74], Giammo [GIAMMO 76], and Shum [SHUM 77] for networks with a constant number of jobs in each of the R classes, specified by the vector J. The efficient iterative solution technique of (15) for a single class of jobs is based on a two dimensional conceptualization, jobs (a scalar) by devices. A direct extension of this two dimensional technique for multiple classes of jobs requires the substitution of a job class distribution vector for the previous scalar job specification. Defining an auxiliary function, as in (14), results in (16) g(M;k) = X k n n ir R v nj! n k 2 I i=0 r=1 n i,r ! i=0 Following a recursive partitioning procedure similar to that used to obtain (15), yields the following recursive defining relation for the auxiliary function [SHUM 76] R (17) g(M;k) = g(M;k-l) + X x ir g(M-d r ;k) r=l Equation (16) may also be expressed as [MUNTZ 74] m l m R (18) g(M;k) = x ... x f k (N k -M) g(M;k-l) n k,l = n k,R = 141 where M is a vector representing a job class distribution such that M = (m 1 , ... , m R ) , nij = the number of class ijobs 3 m i i=0 J l J R = 2 ... 2 [ f s (N s ) {g(J-N s ;s-l) }] n s,l = n s,R = = g(J;s) = G(J) 142 where g(N ;0) = f (No) f^) = I x ir fjCN^) , and r=l fi(0) =1 This completes the review of the basic equations for current queueing network theory. The following is provided to demonstrate the details of the partitioning procedure used to obtain (17). We will only use a two queue system to minimize the length of the presentation, while still demonstrating the basic concepts. First, we expand the function definition of (16) to obtain the recursive relation. Then as an alternative approach we expand the right-hand side of (17) by substituting (18) , and show that the left-hand side of (17) is obtained. From (16) we obtain r x n °' r R X 0,r g(N ;0) = f (N ) =n ! n r=l n 0r ! (n 0>t -d r ) R R \ 0t 1 x 0r [(n -l)! n ] r=i t=l (n 0t -d r )! Therefore, g(N ;0) = 2 x 0r g(N -d r ;0) r=l 143 From (18), substituting this definition, we also obtain m l n lr m R R X lr i,r R n lr x i,r ' g(M;l)= 2 .. 2 {n^ n ji s n njin- n l,l = "LR^O r=1 n l,r ! ° i=0 r=1 2 N-sM-Nj V } i=0 m l m R = 2 ... 2 f^) g(M-N i; 0) n l,l _ n l,R-0 and n is m-i m- 1 mo R X-i ' g(M-d r ;l) =2 ... 2 ... 2 ni !n g(M-d r -N i; 0) n l,l = n l,r=0 n l,R=0 s = 1 n l,s ! R x (d s n l,s- d r) 1 r R Is = 2 ... 2 ... 2 (n^l)! n — - gCM-N^O) n i,r° n i, r =1 n i,R =0 s=1 ( d s n i,s" d r) ! m l m r m R = 2 ... 2 ... 2 ^(N^) g(M-N i; 0) n l,l =0 n l,r =1 n l,R =0 Restructuring the above yields g(M;l) =f 1 (0)g(M;0) +2 f^) gCM-N^O) R = g(M;0) + 2 [ 2 x 1>r f^-d,) ] g(M-N i; 0) (KN^M r=l = g(M;0) + 2 [ 2 ... 2 ... 2 x 1>r f^-d,) g(M-N i; 0) ] r=l n u =0 n lr =l n 1R = 144 m l m r m R = g(M;0) + 2 x lr | I ... x ... 2 ^(N^) g(M-N i; 0) ] r=l n u = n u =l n 1>R =0 R = g(M;0) + 2 x lr g(M-d r ;l) r=l which is also g(M;l) ={f 1 (M)g(0;0) + ^(M-d^gCQ+d^O) + ... + f^O + d^gCM-d^O) + fi(fi)g(M;0)} = {fi(M)g(fl;0) +f 1 (M-d r )g(0+d r ;0) + ... + f^O+d^M-d^O)} + f 1 (Q)g(M;0) R = g(M;0) + 2 x lr g(M-d r ;l) r = l As an alternative approach we will expand the right-hand side of (17), substituting (18) and derive the equality of (17). R R 2 x lr g(M-d r ;l) = 2 x lr { 2 ... 2 ... 2 f^-dj.) g(M-N i; 0) } r=l r=l n ll = ^ n lf = ^ n lR = ^ = 2 x lr {^(M-d^gcao) + ... + f^gCM-d^o)} r = l 145 ={f 1 (M)g(a;0) +f 1 (M-d r )g(0 + d r ;0) + ... +f 1 (0 + d r )g(M-d r ;0)} , and by adding the final term to complete the series and by noting that g(M;0) = f^gCMjO) , and fi(Q) = l we obtain g(M;0) + 2 x lr g(M-d r ;l) = { f x (M) g(ftO) + ^(M-d^gCfi+d^O) + f 1 (Q+d r )g(M-d r ;0)}+g(M;0) r=l + = g(M;l) Following the above procedure it can be shown that the general case yields R g(M;k) =g(M;k-l) + I x ir g(M-d r ;k) r=l 146 APPENDIX B GLOSSARY ASSIGN BIN balanced system C EPF FAC FCFS ICS IDBS IS iid LAN LCFS LSI M/G/l the assignment algorithm the BIN algorithm each ICS is identical coefficient of variation extended product form model factorial expansion algorithm first-come first-served scheduling policy independent computing system inventory data base subsystem scheduling policy consisting of an infinite number of servers, i.e. no scheduling policy statistically independent and indentically distributed local area network last-come-first-served-preemptive-resume scheduling policy large scale (circuit) integration a single server queue with Poisson arrival and general service time distribution 147 M/G/l/K M/M/l M/M/l/K POS PPU PS Relative error SCS SOP SPR a single server queue with Poisson arrival and general service time distribution, with a queueing limit of K jobs a single server queue with Poisson arrival and exponential service time distribution a single server queue with Poisson arrival and exponential service time distribution, with a queueing limit of K jobs point-of-sales peripheral processing unit processor sharing scheduling policy (Exact value - Approximate value)/(Exact value) Shared Central Server sum-of-products expansion algorithm shared processing resource 148 APPENDIX C MATHEMATICAL NOTATION mean arrival rate for device (ij) A. . the busy probability of device (i,j) f$ assumed SPR increase processing rate factor L-l R e. . = i i e t p t • : relative visit frequency for device (ij) t=0 r=l r*i d^Cbp ... ,b R ) a unit vector, 3 b r = \ , r=l, ... , R 1 r=i, V R ! n r = l V PS, FCFS, LCFS fiW ={ J" product factors R n • n ir x i,r ' IS r=l n i,r ! G(K) =G(J) normalization constant g(M;k) auxiliary iterative function used in the computation of the normalization constant 149 h(i,Jj ) aggregate of the auxiliary iterative function i,j index denoting device j within the i-th ICS J = (J, , ... , J R ) is the job class allocation vector for the network J. =5; n ir the maximum number of jobs that can be allocated to the i-th r=i ' class (i-th ICS) J r = Jj for i and r = 1, ... ,R when the network is balanced, and in Chapter IV J is used as a generic scalar such that J = J { R R s i K =1 Jj = 2 s n ir the total number of jobs in the network i = l i = l r =0 K = RJj , i = 1, ... ,R when the network is balanced R 2 i=0 L= 2 Sj total number of devices in the network M = (nip ... , m R ) is a dummy counting vector that may range over the job allocation vector , R R 3 (Km^Jj , ||M|| =n (mj +1) , and |M| = 2 m { i=l i=l R ||M|| =n (mj+l) is the vector range (product) i = l R |M| = I nij is the vector value (summation) i = l n = (n Q , ... , n ) state vector description of the network = (n 10 ,...,n lv ... , n R0 ,...,n RsR ) n 4 r number of class i jobs at the r-th service center of the i-th ICS (both in service and in queue) 150 iij the total number of jobs (of all classes) at the i-th ICS or the SPR , = { R 1 n .,r i>0 r = l R Z n r,0 i = r=l N Q =(n 10 , ... , n R0 ) is the job class distribution at the SPR 0[ X ] the order of complexity of X Prf^ q , ... , n R s ] state equilibrium probabilities ' * R Pj rj t the probability of a class r job at the i-th service center transiting to thej-th service center and becoming a class tjob p i • transistion probability from the i-th CPU to the j-th device in the i-th ICS or to the SPR, where 0k Q. ■ the mean queue length of device (i,j) p traffic intensity of device (i,j) i.j R number of ICSs in the network, which is equal to the number of different classes in the network S = (s , ... , s R ) the device allocation vector 151 Sj number of devices in the i-th ICS (i>0) or the SPR (i = 0) s = Sj , i = 1, ... ,R when network is balanced T- = average throughput of device (ij) u- ; mean processing rate of device (i,j) u= = Uj = , i = 1, ... ,R and j = 0, ... , Sj when network is balanced W mean wait time at device (ij) ij Xy = relative load factor of device (i j) 152 APPENDIX D SAMPLE SPACE FROM ASSIGN ALGORITHM S= 2,6,11, P(SPR) = 0.100, 0.250, 0.500, 0.750, 0.900, TOTAL C COMBINATIONS = 15 1. P(SPR) = 0.100 S = 2 1 2 P(J)= 0.132 0.768 U(J)= 1.000 1.000 2. P(SPR) = 0.250 S = 2 1 2 P(J)= 0.182 0.568 U(J)= 1.000 0.020 3. P(SPR) = 0.500 S = 2 1 2 P(J)= 0.024 0.476 U(J)= 1.000 1.000 4. P(SPR) = 0.750 S = 2 1 2 P(J)= 0.010 0.240 U(J)= 1.000 0.100 5. P(SPR) = 0.900 S = 2 1 2 P(J)= 0.029 0.071 U(J)= 1.000 5.000 6. P(SPR) = 0.100 S = 6 12 3 4 5 6 P(J)= 0.276 0.374 0.006 0.125 0.026 0.093 U(J)= 1.000 0.200 0.020 0.100 0.020 0.020 153 7. P(SPR) = 0.250 S = 6 12 3 4 5 6 P(J)= 0.386 0.083 0.077 0.020 0.037 0.147 U(J)= 1.000 0.2001 0.000 0.200 0.010 0.100 8. P(SPR) = 0.500 S = 6 12 3 4 5 6 P(J)= 0.164 0.139 0.009 0.081 0.071 0.036 U(J)= 1.000 2.000 0.020 5.000 0.200 0.100 9. P(SPR) = 0.750 S = 6 12 3 4 5 6 P(J)= 0.054 0.051 0.025 0.011 0.068 0.041 U(J)= 1.000 0.020 10.000 0.100 0.200 0.500 10. P(SPR) = 0.900 S = 6 12 3 4 5 6 P(J)= 0.061 0.003 0.004 0.001 0.013 0.018 U(J)= 1.000 0.500 0.020 0.500 0.010 2.000 11. P(SPR) = 0.100 S = 11 1234 56789 10 II P(J)= 0.376 0.259 0.046 0.138 0.006 0.025 0.020 0.001 0.011 0.013 0.005 U(J)= 1.000 0.500 1.000 0.050 5.000 0.020 10.000 5.000 0.200 1.000 0.010 12. P(SPR) = 0.250 S = 11 12 3 4 5 6 7 8 9 10 11 P(J)= 0.551 0.040 0.017 0.089 0.035 0.007 0.003 0.001 0.005 0.001 0.001 U(J)= 1.000 2.000 10.000 10.000 10.000 2.000 0.010 0.200 0.020 0.200 0.020 13. P(SPR) = 0.500 S= 11 1234 56789 10 11 P(J)= 0.288 0.042 0.016 0.098 0.007 0.004 0.028 0.008 0.001 0.001 0.007 U(J)= 1.000 1.000 5.000 5.000 0.010 0.010 1.000 1.000 0.100 2.000 1.000 14. P(SPR) = 0.750 S = 11 12 3 4 5 6 7 8 9 10 11 P(J)= 0.088 0.113 0.006 0.013 0.012 0.004 0.006 0.005 0.001 0.001 0.001 U(J)= 1.000 0.020 10.000 0.020 0.100 2.000 0.050 10.000 0.200 2.000 0.020 15. P(SPR) = 0.900 S = 11 12 3 4 5 6 7 8 9 10 11 P(J)= 0.003 0.027 0.024 0.006 0.005 0.020 0.006 0.001 0.006 0.001 0.001 U(J)= 1.000 5.000 0.050 1.000 0.010 1.000 10.000 5.000 0.100 5.000 0.500 154 BIBLIOGRAPHY [ADIRI 69] Adiri, I. and Avi-Itzhak, B., "A Time-sharing Queue", Manag. Sci., Vol. 15 No. 1, pp 639-657, 1969. [ADIRI 71A] Adiri, I., "A Dynamic Time-sharing Queue", J.ACM, VOL. 18 No. 4, pp 603-610, Oct. 1971. [ADIRI 71B] Adiri, I., "A Note on Some Mathematical Models of time-sharing Systems", J.ACM, VOL. 18 No. 4, pp603-610, Oct. 1971. [AHO 74] Aho, A. V., Hopcroft, J. E. and Ullman, J. D. "The Design and Analysis of Computer Agorithms" Addison-Wesley Pub. Co., Reading, Mass., 1974. [ALLEN 78] Allen, A. O. "Probability, Statistics, and Queuing Theory" Academic Press, New York, NY, 1978. [AVI-IT 73] Avi-itzhak, B. and Heyman, D., "Approximate Queueing Models for Multiprogramming Computer Systems", Opns. Res., Vol. 21 No. 6, pp 1212-1230, Dec. 1973. [BAASE 78] Baase, S. "Computer Algorithms: Introduction to Design and Analysis" Addison-Wesley Pub. Co., Reading, Mass., 1978. [BAER 73] Baer, J. L., "A Survey of Some Theoretical Aspects of Multiprocessing", Computer Surveys, VoL 5 No. 1, pp 31-80, Mar. 1973. [BASKET 72A] Baskett, F. and Gomez, F. P., "Processor Sharing in a Central Server Queueing Model of Multiprogramming With Applications", Proc. 6-th Annual Princeton Conf. on Info. Sci. and Syst., Princeton, N.J., pp 589-603, Mar. 1972. [BASKET 72B] Basket, F. and Muntz, R., "Queueing Network Models with Different Classes of Customers", Digest of Papers, COMPCON 72, San Francisco, CA., pp 205-209, Sept. 1972. [BASKET 75] Basket, F.; Chandy, K.; Muntz, R. and Placaios, F., "Open, Closed, and Mixed Networks of Queues with Different Classes of Customers", J. ACM, VOL. 22 No. 5, pp 248-260, Apr. 1975. [BASKET 76] Baskett, F. and Smith, A. J. "Interference in Multiprocessor Computer Systems with Interleaved Memory", CACM, Vol. 19, No. 6, pp 327-334, June 1976. [BHANDA 73] Bhandarkar, D. P. "Analytic Models for Memory Interference in Multiprocessor Computer Systems", Ph.D. Thesis, Carnegie-Mellon University, Sept. 1973. [BHAT 73] Bhat, U. N. and Nance, R. E., "Dynamic Quantum Allocation and Swap- Time Variability in Time-sharing Operating Systems", Southern Methodist University report No. TR-CP-73009, Apr. 1973. [BHAT 78] Bhat, U. N., Fisher, M. J. and Shalaby, M. "Aproximation Techniques in the Solution of Queueing Problems". Southern Methodist University Rpt., NTIS #ADA053499, Mar. 1978. 155 [BRANDW 77] Brandwajn, A. " A Queueing Model of Muliprogramming Computer Systems Under Full LOad Conditions", JACM, Vol. 24, No. 2, pp 222-240, Apr. 1977. [BRECHT 78] Brechtlein, R. "Comparing Disc Technologies", Datamation, pp 139-150, June 1978. [BROWNE 73] Browne, J. C, et. al. "The Effect on Throughput of Multiprocessing In a Multiprogramming Environment", IEEE Trans, on Comp., Vol. C22 No. 8, pp 728-735, Aug. 1973. [BROWNE 75] Browne, J. C, et. al. "Hierarchical Techniques for the Development of Realistic Models of Complex Computer Systems", Proc. of the IEEE, Vol. 63, No. 6, pp 966-975, June 1975 [BUCCI 79] Bucci, G. and Streeter, D. "A Methodology for the Design of Distributed Information Systems", CACM, Vol. 22, No. 4, pp 233-245, Apr. 1979. [BURKE 56] Burke, P. J. "The Output of a Queueing System", Opns. Res., Vol. 4, No. 6, pp 699-704, Dec. 1956. [BURKE 72] Burke, P. J. "Output Processes and Tandem Queues", Proc. of the Symposium on Computer-Communications Networks and Teletraffic, Polytechnic Institute of Brooklyn, N.Y., pp 419-428, Apr. 1972. [BURNET 70] Burnett, G. J. and Coffman, E. G., "A Study of Interleaved Memory Systems", Proc. of AFIPS SJCC, Vol. 36, pp 467-474, Spring 1970. [BUZEN 71] Buzen, J. P. "Queueing Network Models of Multiprogramming", Ph.D. Thesis, Harvard University, 1971. [BUZEN 73] Buzen, J. P. "Computational Algorithms for Closed Queueing Networks with Exponential Servers", CACM, Vol. 16 No. 9, pp 527-531, Sept. 1973. [BUZEN 74] Buzen, J. P. and Goldberg, P. S. "Guidelines for the use of Infinite Service Queuing Models in the Analysis of Computer System Performance", Proc. of AFIPS NCC 74, Vol 43, pp 371-374, 1974. [BUZEN 76] Buzen, J. P. "Fundamental Laws of Computer System Performance", Proc. of the International Symposium on Computer Performance Modeling, Measurement and Evaluation, Harvard Univ., Cambridge, Mass., pp 200- 210, Mar 1976. [BUZEN 77] Buzen, J. P. and Potier, D. "Accuracy of Exponential Assumptions in Closed Queueing Models", Proc. of the SIGMETRICS Conf. on Computer Performance: Modeling, Measurement, and Management, Wash., D.c, pp 53-63, Nov. 1977. [BUZEN 78A] Buzen J. P. and Denning, P. J. "The Operational Analysis of Queuing Network Models", ACM Computing Surveys, Vol. 10, No. 3, pp 225-262, Sept 1978. [BUZEN 78B] Buzen, J. P. "A Queuing Network Model of MVS", ACM Computing Surveys, Vol. 10, No. 3, pp 319-332, Sept 1978. 156 [CADY 72] [CARPEN 79] [CDC 75] [CHAMPI 79] [CHAMPI 80] [CHANDY 72A] [CHANDY 72B] [CHANDY 75A] [CHANDY 75B] [CHANDY 77] [CHANDY 78] [CHANG 79] [CHANG 72] [CHANG 66] [CHLAMT 80] Cady, G. M. "Computation and Communication Trade-off Studies, An Analytical Model of Computer Networks", Proc. WESCON, Vol. 16 Section 7/2, pp 1-12, 1972. Cartpenter, R. J. and Sokol, J. "Serving Users With a Local Area Network", Proc. of the Local Area Communications Network Symposium, Boston, Mass., pp 75-85, May 1979. CDC "Scope 3.4.4 Multi-Mainframe Operations Programming systems Bulletine", Control Data Corp., St. Paul, Minn., CDC #60493600, 1975. Champine, G. A. "Current Trends in Data Base Systems", IEEE Computer, Vol. 12, No. 5, pp 27-41, May 1979. Champine, G. A. "Back-End Technology Trends", IEEE Computer, Vol, 13, No. 2, pp 50-58, Feb. 1980. Chandy, K. M. "The Analysis and Solutions for General Queueing Networks", Proc. 6-th Annual Princeton Conf. on Info. Sci. and Syst., Princeton, N.J., pp 224-228, Mar. 1972. Chandy, K. M. "Local Balance in Queueing Networks With Independent Servers With Applications to Computer Systems", Proc. 10-th Allerton Conf., University of 111., Oct. 1972. Chandy, K. M., Herzog, U., and Woo, L. "Performance Analysis of Queueing Networks", IBM J. Res. Develop., Vol. 19, No. 1, pp 36-42, Jan. 1975. Chandy, K. M., Herzog, U., and Woo, L. "Analysis of General Queueing Networks", IBM J. Res. Develop., Vol. 19, No. 1, pp 43-49, Jan. 1975. Chandy, K. M., Howard, J. H., and Towsley, D. F. "Product Form and Local Balance in Queueing Networks", JACM, Vol. 24, No. 2, pp 250-263, Apr. 1977. Chandy, K. M. and Sauer, C. H. "Approximate Methods for Analyzing Queuing Network Models of Computer Systems", ACM Computing Surveys, Vol. 10 , No. 3, pp 281-318, Sept 1978. • Chang, H. "Major Activity in Magnetic Bubble Technology", Computer Design, Vol. 13, No. 12, pp 117-125, Nov 1979. Chang, J. H. and Gorenstein, S. "A Disk File System Shared by Several Computers in a Teleprocessing Envirnoment", Proc. of the Symposium on Computer-Communications Networks and Teletraffic. Polytechnic Institute of Brooklyn, N.Y., pp 177-185, Apr. 1972. Chang, W. "A Queueing Model for a Simple Case of Time Sharing", IBM Syst. J., Vol. 5, No. 2 pp 115-125, 1966. Chlamtac, I. Franta, W. R., Patton, P. C. and Wells, B. "Performance Issues in Back-End Storage Networks", IEEE Computer Vol. 13, No. 2, pp 18-31. Feb 1980. 157 [CHIU 75] [CHOW 75] [CHOW 77] [CITREN 72] [COFFMA 68A] [COFFMA 68B] [COFFMA 69] [COFFMA 70] [COFFMA 73] [CONWAY 67] [COOPER 69] [COTTON 79] [COURTO 71] [COX 55] [DISNEY 73] Chiu, W., Dumont, D., and Wood, R. "Performance Analysis of a Multiprogrammed Computer System", IBM J. Res. Develop., Vol. 19, No. 3, pp 263-271, May 1975 Chow, W. M. "Central Server Model for Multiprogrammed Computer Systems with Different Classes of Jobs", IBM J. Res. Develop., Vol. 19, No. 3, pp 315-320, May 1975. Chow, W. M. and Woo, L. "Buffer Performance Analysis of Communication Processors During Slowdown at Network Control", IBM J. Res. Develop., Vol. 21, No. 3, pp 264-272, May 1977. Citrenbaum, R. "AN On-line Model for Computation-Communication Network Analysis and Trade-off Studies", Proc. of the Symposium on Computer-Communications Networks and Teletraffic, Polytechnic Institute of Brooklyn, N.Y., pp 613-634, Apr. 1972. Coffman, E. G. Jr. "Analysis of Two Time-Sharing Algorithms Designed for Limited Swapping", J.ACM, Vol. 15 No. 3, pp 341-353, July 1968. Coffman, E. G. Jr. and Kleinrock, L. "Feedback Queueing Models for Time-Shared Systems", J.ACM, Vol. 15 No. 4, pp 549-576, Oct. 1968. Coffman, E. G. Jr. and Muntz, R. R. "Models of Resource Allocation Using Pure Time-Sharing Disciplines", Proc. of the 24-th ACM National Conf., pp 217-228, 1969. Coffman, E. G. Jr., Muntz, R. R., and Trotter, H. "Waiting Time Distributions for Processor Sharing Systems", J.ACM, Vol. 17 No. 1, pp 123-130, Jan. 1970. Coffman, E. G. Jr. and Denning, P. J. "Operating Systems Theory", Printice-Hall, Inc., N.J., 1973. Conway, R. W.; Maxwell, W. L. and Miller, L. W. "Theory of Scheduling", Addison-Wesley, Reading, MA., 1967. Cooper, R. B. and Murray, G. "Queues Served in Cyclic Order", Bell Syst. Tech. J., Vol. 48 No. 3, pp 675-689, Mar 1969. Cotton, I. W. "Technologies For Local Area Computer Networks", Proc. of the Local Area Communications Network Symposium, Boston, Mass., pp 25-45, May 1979. Courtois, P. J. "On the Near-Complete-Decomposability of Networks of Queues and of Stachastic Models of Multiprogramming Computer Systems", Ph.D. Thesis, Carnegie Mellon University, Nov. 1971. Cox, D. R. "A Use of Complex Probabilities in the Theory of Stochastic Processes", Proc. Cambridge Philos. Soc, Vol. 51, pp 313-319, 1955. Disney, R. L. and Cherry, W. P. "Some Topics In Queueing Network Theory", Proc. of the Sympoium on Mathematical Methods in Queueing Theory, Kalamazoo, Mich., PP 23-44, May 1973. 158 [DISNEY 77] Disney, R. L. "Research into Queueing Network Theory", NTIS #ADA047148, Sept. 1977. [DUDICK 71] Dudick, H. L. and Pack, C. D. "Round Robin Scheduling in a Computer Communication System With Finite Swap-Time and Statistically Multiplexed Arrivals", Proc. ACMTEEE 2nd Symposium on Problems in the Operations of Data Communication Systems, Oct 1971. [FELLER 68] Feller, W. "An Introduction to Probability Theory and Its Applications", Vol. I, John Wiley ans Sons Inc., Ney York, N.Y., 1968. [FENICH 69] Fenichel, R. R. and Grossman, A. J. "An Analytic Model of Multiprogramming Computing", Proc. of AFIPS SJCC, Vol. 34, pp 771- 721, Spring 1969. [FERRAR 78] Ferrari, D. "Computer Systems Performance Evaluation", Prentice-Hall Inc., Englewood Cliffs, N.J., 1978. [FINCH 59] Finch, P. D. "The Output Process of the Queuing System M/G/l", J. Roy. Stat. Soc, Ser B21, No. 2, pp 375-380, 1959. [FRANK 72] Frank, H.; Kahn, R. E. and Kleinrock, L. K. "Computer-Communication Network Design-Experience With Theory and Practice", Proc. of AFIPS SJCC, Vol 40, pp 255-270, Spring 1972. [FRATTA 72] Fratta, L. and Gerla, M. "The Synthesis of Computer Networks: Properties of the Optimum Solution", ACM International Computing Symposium, Venice, Italy, 1972. [FREEMA 80] Freeman, H. A. "Great Editors's Introduction: Back-End Storage Networks", IEEE Computer, Vol. 13, No. 2, pp 7-8, Feb 1980. [FUCHS 70] Fuchs, E. and Jackson, P. E. "Estimates of Distributions of Random Variables for Certain Computer Communications Traffic Models" C.ACM, Vol. 13 No. 12, pp 752-757, Dec. 1970. [FUNG 79] Fung, F. T. and Torng, G. C. "In the Analysis of Memory Conflicts and Bus Contentions in a Multiple - Microprocessor System", IEEE Trans, on Comp., Vol. C28, No. 1, pp 28-37, Jan 1979. [GAVER 67] Gaver, D. P. "Probability Models for Multiprogramming Computer Systems", J.ACM, Vol. 14, No. 3, pp 423-438, July 1967. [GAVER 73] Gaver, D. P. and Shedler, G. S. "Processor Utilization in Multiprogramming Systems Via Diffusion Approximations", Opns. Res., Vol. 21, No. 2, pp 569-576, Apr. 1973. [GAVER 76] Gaver, D. P. and Humfeld, G. "Multitype Multiprogramming Models", Acta Informatica, Vol. 7, pp 111-121, 1976. [GELENB 73] Gelenbe, E. "The Distibution of a Program in Primary and Fast Buffer Storage", C.ACM, Vol. 16 No. 7, pp 431-434, July 1973. [GERLA 73] Gerla, M. "The Design of Store- And-Forward (S/F) Networks for Computer Communication", Ph.D., UCLA, 1973. 159 [GIAMMO 76] [GORDON 67A] [GORDON 67B] [GREENB 73] [GROSS 75] [HOOGEN 77] [HOAGLA 79] [IDC 76] [JACKSO 57] [JACKSO 63] [JAFARI 77] [KAHN 72] [KAMOUN 76] [KELLER 72] [KIMBLE 73] [KLEINR 64] Giammo, T. "Extensions to Exponential Queuing Network Theory for Use in a Planning Environment", Digest of Papers, COMPCON 76, Fall, Wash., D.C., pp 156-160, Sept. 1976. Gordon, W. J. and Newell, G. F. "Closed Queueing Systems With Exponential Servers", Opns. Res., Vol. 15 No. 2, pp 254-265, Apr. 1967. Gordon, W. J. and Newell, G. F. "Cyclic Queuing Systems with Restricted Length Queues", Opns. Res., Vol, 15 No. 2, pp 266-277, Apr. 1967. Greenberg, I. "Distribution-Free Analysis of M/G/l and G/M/l Queues", Opns. Res., Vol. 21 No. 2, pp 629-635, Apr. 1973. Gross, D. "Sensitivity of Queuing Models to the Assumption of Exponentiality", Naval Res. Logistics Quarterly, Vol. 22, No. 2, pp 271-287, June 1975. Hoogendoorn, C. H. "A General Model for Memory Interference in Multiprocessors", IEEE Trans, on Comp., Vol C26, No. 10, pp 998-1005, Oct 1977. Hoagland, A. S. 'Storage Technology: Capabilities and Limitations", IEEE Computer, Vol. 12, No. 5, pp 12-18, May 1979. IDC "Distributed Processing", International Data Corp. Rpt #1763, 214 Third Ave., Waltham, Mass., 02154, Dec. 1976. Jackson, J. R. "Networks of Waiting Lines", Opns. Res., Vol. 5, No. 4, pp 518-521, Aug. 1957. Jackson, J. R. "Jobshop-Like Queueing Systems", Manag. Sci., Vol. 10 No. 1, pp 131-142, Oct. 1963. Jafari, H. and Lewis, T. "A New Loop Structure for Distributed Microcomputing Systems", Proc. of the 1st Annual Rocky Mountain Symposium on Microcomputers Systems, Software and Architecture, Ft. Collins, Col., pp 121-141, Aug 1977. Kahn, R. E. "Resource-Sharing Computer Networks", Proc. IEEE, Vol. 60, pp 1397-1407, Nov. 1972. Kamoun, F. and Kleinrock, L. "Analysis of Shared Storage in a Computer Network Environment", Proc. of the Ninth Hawaii International Conference on System Sciences, Honolulu, Hawaii, pp 89-92, Jan. 1976. Keller, T. W.; Towsley, D. F.; Chandy, K. M. and Browne, J. C. "A Tool for Network Design: The Automated Analysis of Stochastic Models of Computer Networks", Digest of Papers, COMPCON 72, San Francisco, CA., pp 151-153, Sept. 1972. Kimbleton, S. R. "Computer Systems Models", University of Mich. Rpt., NTIS#AD774678, June 1973. Kleinrock, L. "Communications Nets: Stochastic Message Flow and Delay", McGraw-Hill, N.Y., 1964. 160 [KLEINR 67] [KLEINR 69] [KLEINR 70A] [KLEINR 70B] [KLEINR 75] [KLEINR 76] [KOBAYA 74A] [KOBAYA 74B] [KOBAYA 75] [KOBAYA 76] [KOBAYA 77] [KOBAYA 78] [KONHEI 76] [KRISHN 66] [LAM 79] [LAM 77A] Kleinrock, L. "Time-Shared Systems: A Theoretical Treatment", J. ACM, Vol. 14 No. 2, pp 242-261, Apr. 1967. Kleinrock, L. "Models for Computer Networks", Proc. of the International Communications Conf., Boulder, CO., Section 21, pp 9-16 Section 21, June 1969. Kleinrock, L. "Analytic and Simulation Methods in Computer Network Design", Proc.ofAFIPS SJCC, Vol 36, pp 569-579, Spring 1970. Kleinrock, L. "A Continuum of Time-Sharing Scheduling Algorithms", Proc. of AFIPS SJCC, Vol 36, pp 453-458, Spring 1970. Kleinrock, L. "Queueing Systems Volume I: Theory", John Wiley & Sons, Inc., N.Y., 1975. Kleinrock, L. "Queueing Systems Volume II: Computer Applications", John Wiley & Sons, Inc., N.Y., 1976. Kobayashi, H. "Application of the Diffusion Approximation to Queueing Networks I: Equilibrium Queue Distributions", JACM, Vol. 21, No. 2, pp 316-328, Apr. 1974. Kobayashi, H. "Application of the Diffusion Approximation to Queueing Networks II: Nonequilibrium Distributions and Applications to Computer Modeling", JACM, Vol. 21, No. 2, pp 459-469, Apr. 1974. Kobayashi, H. and Reiser, M. "On Generalization of Job Routing Behavior in a Queueing Network Model", IBM Res. Rpt. RC5252 (#23079), Feb. 1975. Kobayashi, H., "A Computational Algorithm for Queue Distributions Via Polyas' Theory of Enumeration", IBM Res. Rpt. RC6154 (#26522), Aug. 1976. Kobayashi, H. and Konheim, A. "Queuing Models for Computer Communications System Analysis" IEEE Trans, on Comm., Vol COM-25, No. 1, pp 2-28, Jan 1977. Kobayashi, H. "Modeling and Analysis: An Introduction to Systems Performance Evaluation Methodology". Addison-Wesley Pub. Co., Reading, Mass., 1978. Konheim, A. G. and Reiser, M. "A Queueing Model with Finite Waiting Room and Blocking", JACM, Vol. 23, No. 2, pp 328-341. Apr. 1976 Krishnamoorthi, B. and Wood, P. C. "Time-Shared Computer Operations With Both Interarrival and Service Time Exponential", J. ACM, Vol. 13, No. 3, pp 317-338. July 1966. Lam, C. Y., "The Intelligent Memorv Svstem Architecture - Research Directions", Mass. Inst, of Tech. Rpt., NTIS # ADA073485, Aug. 1979. Lam, S. S., "Queuing Networks with Population Size Constraints", IBM J. Res. Develop., Vol. 21, No. 4, pp 370-378, July 1977. 161 [LAM 77B] [LAMPSO 80] [LEE 78] [LEUNG 78] [LEWIS 71] [LIPSKY 77] [McCRED 73] [MCKINN 69] [MONAHA 79] [MOORE 72] [MORSE 58] [MUNTZ 72A] [MUNTZ 72B] [MUNTZ 73] [MUNTZ 74] Lam, S. S., "An Extension of Moore's Result for Closed Queuing Networks", IBM J. Res. Develop., Vol. 21, No. 4, pp 384-387, July 1977. Lampson, B. W. and Redell, D. D. "Experience with Processess and Monitors in Mesa", CACM, Vol. 23, No. 2, pp 105-117, Feb. 1980. Lee, C.M.B. "System Modeling - Where we are now and Where we have to go", Proc. of 17th Annual Tech. Symp.: Tools for Improved Computing in the 80's, NBS, Gaithersbur;g, MD., pp 211-220, June 1978. Leung, S. K. "Architecture of a Modular Network - The BANCS Network", Proc. COMPON 78: Computer Communications Networks, Wash. D.C., pp 212-220, Setp. 1978. Lewis, P. A. and Shedler, G. S. "A Cyclic-Queue Model of System Overhead in Multiprogrammed Computer Systems", J.ACM, Vol. 18 No. 2, pp 199-220, Apr. 1971. Lipsky, L. and Church, J., "Applications of a Queueing Network Model for a Computer System", ACM Computing Surveys, Vol. 9 No. 3, pp 205-221, Sept. 1977. McCredie, J. W. "Analytic Models as Aids in Multiprocessor Designs" Proc. Seventh Annual Princeton Conf. on Information Sciences and Systems, Princeton, N.J., pp 186-191, March 1973. McKinney, J. M. "A Survey of Analytical Time-Sharing Models", Computer Surveys, Vol. 2, No. 2, pp 106-116, June 1969. Monahan, J. H. "Opportunities and Challenges in the Evolution of Local Area Communications Networks", Proc. of the Local Area Communications Network Symposium, Boston, Mass., pp 15-24, May 1979. Moore, F. R. "Computational Model of a Closed Queueing Network with Exponential Servers", IBM J. Res. Develop., pp 567-572, Nov. 1972. Morse, P. M. "Queues, Inventories and Maintenance", John Wiley & Sons, Inc., N.Y., 1958. Muntz, R. R. "Waiting Time Distribution for Round-Robin Queueing Systems", Proc. of the Symposium on Computer-Communications Networks and Teletraffic, Polytechnic Institute of Brooklyn, N.Y., pp 429- 440, Apr. 1972. Muntz, R. R. "Poisson Departure Processes and Queueing Networks", IBM T. J. Watson Res. Center, IBM Rpt.#RC4145, Sept. 1972. Muntz, R. R. "Poisson Departure Processes and Queuing Networks", Proc. of the 7th Annual Princeton Conf. on Info. Sci. and Syst., Princeton, N.J., pp 435-440, Mar 1973. Muntz, R. R. and Wong, J. W., "Efficient Computational Procedures for Closed Queueing Network Models", Proc. of the Seventh Hawaii International Conference on System Sciences, Honolulu, Hawaii, pp 33-36, Jan. 1974. 162 [MUNTZ 78] [NAKARM 71] [NOETZE 76A] [NOETZE 76B] [OMAHEN 72] [OSTERW 72] [OUSTER 80] [PACK 73A] [PEARCE78] [PETERS 77] [PRICE 74] [RAU 79] [REGIS 73] [REISER 74] [REISER 75] Muntz, R. R. "Queuing Networks: A Critique of the State of the Art and Directions for the Future", ACM Computing Surveys, Vol. 10, No. 3, pp 353-360, Sept 1978. Nakarmura, G. "A Feedback Queueing Model for an Interactive Computer System", Proc. of AFIPS FJCC, Vol. 39, pp 57-64, Fall 1971. Noetzel, A. S., "Recent Generalizations of Queueing Networks with Product-Form Solutions", Proc. of the 1976 Conference on Information Sciences and Systems, The John Hopkins University, Bait., Md., pp 549- 552, Apr. 1976. Noetzel, A. S., "Throughput In Locally Balanced Computer System Models", Brookhaven National Laboratory Rpt., Upton, N.Y., NTIS#BNL-22531, Mar. 1977. Omahen, K. and Schrage, L. "A Queueing Analysis of a Multiprocessor System With Shared memory", Proc. of the Symposium on Computer- Communications Networks and Teletraffic, Polytechnic Institute of Brooklyn, N.Y., pp 77-88, Apr. 1972. Osterweil, L. J. "A Stochastic Model of Multiprocessor Access to an Interleaved Memory", Coloardo University Rpt., NTIS#PB-233695, Oct. 1972. Ousterhout, J. K., Scelza, D. A. and Sindhu, P. S. "Medusa: An Experiment in Distributed Operating System Structure", CACM, Vol. 23, No. 2, pp 92-104, Feb. 1980. Pack, C. D. "The Effects of Multiplexing on a Computer-Communication System", CACM, Vol. 16 No. 3, pp 161-168, Mar. 1973. Pearce, R. C. and Majithia, J. C. "Analysis of a Shared Resource MIMD Computer Organization", IEEE Trans on Comp, Vol. C27, No. 1, pp 64-67, Jan 1978. Peterson, J. L. "Petri Nets", ACM Computer Surveys, Vol. 9, No. 3, pp 223-252, Sept. 1977. Price, T. G. "Balanced Computer Systems", Stanford University Rpt., NTIS#AD/A001071, Apr. 1974. Rau, B. R. "Interleaved Memory Bandwidth in a Model of a Multiprocessor Computer System", IEEE Trans on Comp., Vol. C28, No. 9, pp 678-681, Sept 1979. Regis, R. C. "Multiserver Queueing Models of Multiprocessing Systems", IEEE Trans, on Comp., Vol. C22 No. 8, pp 736-744, Aug. 1973. Reiser, M and Kobayashi, H. "Accuracy of the Diffusion Approximation for Some Queueing Systems", IBM J. Res. Develop., Vol. 18 No. 2, pp 110- 124, Mar 1974. Reiser, M. and Kobayashi, H., "Queuing Networks with Multiple Closed Chains: Theory and Computational Algorithms", IBM J. Res. Develop., Vol. 19, pp 283-294, May 1975. 163 [REISER 76] Reiser, M. and Kobayashi, H., "On the Convolution Algorithm for Separable Queuing Networks", Proc. of the International Symposium on Computer Performance, Modeling, Measuement and Evaluation", pp 109- 117, Mar. 1976. [ROBERT 70] Roberts, L. G. and Wessler, B. D. "Computer Network Development to Achieve Resource Sharing", Proc. AFIPS SJCC, Vol. 36, June 1970. [ROOME 74 A] Roome, W. D. "Modeling and Design of the Communications and Computation Aspects of Computer Networks", Ph.D. Thesis, Cornell Universiy, 1974. [ROOME 74B] Roome, W. D. and Torng, H. C. "Modeling and Design of Computer Networks With Distributed Computation Facilities", Proc. 1974 Symposium - Computer Networks: Trends & Applications, National Bureau of Standards, Gaithersburg, Md., May 1974. [ROSE 78] Rose, C. A. "A Measurement Procedure for Queuing Network Models of Computer Systems", ACM Computing Surveys, Vol. 10, No. 3, pp 263-280, Sept 1978. [SAATY 73] Saaty, T. L. "Elements of Queueing Theory", McGraw-Hill Book Co., Inc., N.Y., 1961. [SCHERR 66] Scherr, A. L. "An Analysis of Time-Shared Computer Systems", MIT Press, Cambridge, MA., 1966. [SCHWAR 79] Schwartz, M. "Thoughput and Time Delay Analysis for a Common Queue Configuration in a Multiprocessor Environment", IEEE Trans, on Computers, Vol C-25, No 12, pp 939-941, Dec 79. [SEGALL 76] Segall, A., "Basic Models for Dynamic Routing in Computer- Communication Networks", Proc. of the Ninth Hawaii International Conference on System Sciences, Honolulu, Hawaii, pp 77-79, Jan. 1976. [SHEDLE 73] Shedler, G. S. "A Queueing Model of a Multiprogrammed Computer With a Two Level Storage System", CACM, Vol. 16, No. 1, pp 3-10, Jan 1973. [SHEMER 67] Shemer, J. E. "Some Mathematic Considerations of Time-Sharing Scheduling Algorithms", JACM, Vol. 14, No. 2, pp 262-272, Apr. 1967. [SHEMER 69] Shemer, J. E. and Heying, D. W. "Performance Modeling and Empirical Measurements in a System Designed for Batch & Time-Sharing Users", Proc. of AFIPS FJCC, Vol. 35, pp 17-26, Fall 1969. [SHERMA 72] Sherman, S.; Basket, F. and Browne, J. C. "Trace Driven Modeling & Analysis of CPU Scheduling in a Multiprogramming System", CACM, Vol. 15 No. 12, pp 1063-1069, Dec 1972. [SHUM 76] Shum, A. W. "Queuing Models for Computer Systems with General Service Time Distribution" Ph.D. Thesis, Harvard Univ., Dec 1976. [SHUM 77] Shum, A. W. and Buzen, J. P. "The EPF Technique: A Method for Obtaining Approximate Solutions to Closed Queuing Networks with General Service Times" in Measuring, Modeling, and Evaluating Computer Systems, H. Beilner and E, Gelenbe (Eds.), North Holland Pub. Co., pp 201-220, 1977. 164 [SMITH 77] [SMITH 79] [SPIRN 76] [SPIRN 79] [SPRING 78] [STOYAN 78] [TAKACS 62] [THEIS 78] [THORNT 80] [TOOMBS 78A] [TOOMBS 78B] [VOTAW 73] [WALLAC 66] [WALLAC 72] [WALLAC 73] [WARNAR 79] Smith, A. J. "Multiprocessor Memory Organization and Memory Interferences", CACM, Vol. 20, No. 10, pp 754-761, Oct 1977. Smith, A. J. "An Analytic and Experimental Study of Multiple Channel Controllers", IEEE Trans, on Comp., Vol C28, No. 1, pp 38-49, Jan 1979. Spirn, J. R., "Multi-Queue Scheduling of Two Tasks", Proc. of the International Symposium on Computer Performance, Modeling, Measurement and Evaluation, pp 102-108, Mar. 1976. Sprin, J. R. "Queuing Networks with Random Selection for Service", IEEE Trans, on S. W. Eng., Vol. SE5, No. 3, pp 287-289, May 1979. Spinger, J. F. "The Distributed Data Network, Its Architecture and Operationk", Proc. COMPON 78: Computer Communications Networks, Wash. DC, pp 221-228, Setp. 1978. Stoyan, D. "Queuing Networks - Insensitivity and a Heuristic Approximation", Proc. of the Seventh Conference on Stockastic Processes and Their Applications, Twente University of Tech., The Netherlands, Aug 1977. Takacs, L. "Introduction to the Theorv of Queues", Oxford University Press, N.Y., 1962. Theis, D. J. "An Overview of Memory Technologies", Datamation, pp 113- 131, June 1978. Thornton, J. E. "Back-End Network Approaches", IEEE Computer, Vol. 13, No. 2, pp 10-77, Feb 1980. Toombs, D. "CCD and Bubble Memories: System Implications," IEEE Spectrum, pp 36-39, May 1978. Toombs, D. "An update: CCD and Bubble Memories", IEEE Spectrum, pp 22-30, May 1978. Votaw, D.F. "A Theory of Storage Sizing", Mitre Corp. Rpt., NTIS # AD- 765175, July 1973. Wallace, V. L. and Rosenberg, R. S. "Markovian Models and Numerical Analysis of Computer System Behavior", Proc. AFIPS SJCC, Vol. 28, pp 141-148, 1966. Wallace, V. L. "Toward an Algebraic Theory of Markovian Networks", Proc. of the Symposium on Computer-Communications Networks and Teletraffic, Polytechnic Institute of Brooklyn, N.Y., pp 397-407, Apr. 1972. Wallace, V. L. "Algebraic Techniques for Numerical Solution of Queueing Network Theory", Proc. of the Sympoium on Mathematical Methods in Queueing Theory, Kalamazoo, Mich., PP 295-305, May 1973. Warnar, R. B. J. , Calomeris, P. J. and Recicar, S. A. "Computer Peripherial Memory Systems Forecast", National Bureau of Standards, Wash. D.C., NBS #SP500-45, Apr 1979. 165 [WATSON 80] Watson, R. W. "Network Architecture Design for Back-End Storage Networks", IEEE Computer, Vol. 13, No. 2, pp 32-48, Feb 1980. [WANG 78] Wang, J. W. "Queuing Network Modeling Computer Networks", ACM Computing Surveys, Vol. 10, No. 3, pp 333-342, Sept. 1978. [WELCH 79] Welch, T. A. "Analysis of Memory Hierarchies for Sequential Data Access", IEEE Computer, Vol. 12, No. 5, pp 19-26, May 1979. [WILEY 77] Wiley, J. M. "Just Enough Queueing Theory", Datamation, pp 87-96, Feb. 1977. ■ [WILKES 79] Wilkes, M. V. and Wheeler, J. D. "The Cambridge Digital Communication Ring", Proc. of the Local Area Communications Network Symposium, Boston, Mass., pp 47-61, May 1979. [WONG 78] Wong, J. W. "Queueing Network Modeling of Computer Communication Networks", ACM Computing Surveys, Vol. 10, No. 3, pp 333-342, Sept. 1978. [WYSZEW 74] Wyszewianski, R. J. "Feedback Queues in the Modeling of Computer Systems: A Survey", University of Michigan Rpt, NTIS#AD782360, May 1974. [ZUSSMA 77] Zussman, R. "Computer Queueing Analysis On a Handheld Calculator", Computer Design, pp 85-94, Nov. 1977. 166 ft U. S. GOVERNMENT PRINTING OFFICE : 1980 340-997/298 NBS-114A (REV. 2-8C) U.S. DEPT. OF COMM. BIBLIOGRAPHIC DATA SHEET (See instructions) 1. PUBLICATION OR REPORT NO. NBS SP 500-69 2. Performing Organ. Report No. 3. Publication Date November 1980 4. TITLE AND SUBTITLE Computer Science and Technology: An Analytic Study of a Shared Device Among Independent Computing Systems 5. AUTHOR(S) Alan Mink 6. PERFORMING ORGANIZATION (If joint or other than NBS, see instructions] NATIONAL BUREAU OF STANDARDS DEPARTMENT OF COMMERCE WASHINGTON, D.C. 20234 7. Contract/Grant No. 8. Type of Report & Period Covered Final 9. SPONSORING ORGANIZATION NAME AND COMPLETE ADDRESS (Street. City. State, ZIP) Same as No. 6 10. SUPPLEMENTARY NOTES Library of Congress Catalog Card Number: 80-600170 J Document describes a computer program; SF-185, FlPS Software Summary, is attached. 11. ABSTRACT (A 200-word or less factual summary of most significant information. If document includes a significant bibliography or literature survey, mention it here) Global queueing network performance models are developed for the increasingly important class of computer networks comprising a number of independent computing systems sharing a single resource. An extensive bibliography and survey of prior work relating to this topic are included. Analytic expressions of performance measures for this class of systems are derived from the general theory of multi- class queueing networks, and new computational algorithms for evaluating them are presented that are memory-space efficient (linear vs. exponential) compared with known algorithms for the general theory. This exact analytic model, called the Shared Central Server Model, incurs approximately the same exponential time complexity in its evaluation as do all models based on the general theory; because of this, a simple heuristic approximate model of this class of systems is also presented that is computationally efficient in both time and space. Modular expansion of this class of systems is investigated using the approximate model, and a useful relationship is derived between the number of additional independent computing systems and the incremental increase in capability of the shared resource required to maintain the existing level of system performance. 12. KEY WORDS (Six to twelve entries; a/phabetico/ order; capitalize only proper names; and separate key words by semicolons) Approximate queueing models; computer architecture; modular expansion analysis; performance evaluation; performance modeling; queueing models; queueing networks. 13. AVAILABILITY PT| Unlimited | | For Official Distribution. Do Not Release to NTIS KH Order From Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402. 3] Order From National Technical Information Service (NTIS), Springfield, VA. 22161 14. NO. OF PRINTED PAGES 176 15. Price $5.50 USCOMM-DC 6043-P80 There's anew look . . . the monthly magazine of the Nation- al Bureau of Standards. Still featured are special ar- ticles of general interest on current topics such as consum- er product safety and building technology. In addition, new sec- tions are designed to . . . PROVIDE SCIENTISTS with illustrated discussions of recent technical developments and work in progress . . . INFORM INDUSTRIAL MANAGERS of technology transfer activities in Federal and private labs. . . DESCRIBE TO MAN- UFACTURERS advances in the field of voluntary and mandatory standards. The new DIMENSIONS/NBS also carries complete listings of upcoming conferences to be held at NBS and reports on all the latest NBS publications, with information on how to order. Finally, each issue carries a page of News Briefs, aimed at keeping scientist and consum- alike up to date on major developments at the Nation's physi- cal sciences and measurement laboratory. (please detach here) SUBSCRIPTION ORDER FORM Enter my Subscription To DIMENSIONS/NBS at $11.00. Add $2.75 for foreign mailing. No additional postage is required for mailing within the United States or its possessions. Domestic remittances should be made either by postal money order, express money order, or check. Foreign remittances should be made either by international money order, draft on an American bank, or by UNESCO coupons. Send Subscription to: 1 1 1 1 1 1 1 1 1 NAME-FIRST, LAST 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Q Remittance Enclosed (Make checks payable to Superintendent of Documents) D Charge to my Deposit Account No. COMPANY NAME OR ADDITIONAL ADDRESS LINE I 1 I I 1 I I 1 I I I I I I I I I I I I I I I I I I I I I STREET ADDRESS I 1 I I I I I I I 1 1 I I I 1 1 I I CITY I I 1 I I I I I I I 1 I 1 1 MAIL ORDER FORM TO: Superintendent of Documents Government Printing Office Washington, D.C. 20402 PLEASE PRINT U.S. DEPARTMENT OF COMMERCE National Bureau of Standards Washington. DC. 20234 OFFICIAL BUSINESS Penalty for Private Use. S300 PE NN STATE UNIVERSITY LIBRARIES AQQ007n23130 POSTAGE AND FEhb kaiu U.S. DEPARTMENT OF COMMERCE COM— '215 SPECIAL FOURTH-CLASS RATE BOOK