Unit Testing and Performance Using Entity Framework 4.0

Tommy Hörnlund

January 24, 2013
Master's Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Jan-Erik Moström
Examiner: Fredrik Georgsson

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN

Abstract

POÄNGEN is a web application for rent management. The core of the application is a module that performs rent calculations. In the past the application relied heavily on business logic in stored procedures, which made the program hard to test and maintain. The purpose of this thesis was to find a new method for combining unit testing and data access. A new implementation of the rent calculation had to be created that was easier to test and maintain and that performed well.

This thesis shows how to combine data access and unit tests using Entity Framework 4.0, an object-relational mapping framework from Microsoft. The new module uses the Repository and Specification design patterns to create a data abstraction that is suitable for unit testing. The performance of Entity Framework 4.0 is also evaluated and compared to traditional data loading. The evaluation shows that Entity Framework 4.0 severely lacks in performance when loading or saving large amounts of data. However, the use of POCO entities makes it possible to create optimized functionality for time-critical data access.

Contents

1 Introduction
  1.1 Overview
  1.2 Problem Statement
  1.3 Goals & Purpose
2 POÄNGEN
  2.1 Utility principle
  2.2 Properties
  2.3 Residential unit
  2.4 Apartment
  2.5 Model
  2.6 Score
  2.7 Rent
  2.8 Example
3 Entity Framework
  3.1 Entity Data Model
  3.2 Mapping
  3.3 LINQ
  3.4 Loading Related Entities
  3.5 Change Tracking
4 Testability
  4.1 Repository
  4.2 Unit of Work
  4.3 POCO
  4.4 Mocking
  4.5 Inversion of Control
  4.6 Unit Testing
5 Result
  5.1 Overview
  5.2 Entity Data Model
  5.3 POCO
  5.4 Specification
  5.5 FetchStrategy
  5.6 Repository
  5.7 Calculation
  5.8 Dependencies
  5.9 Data Access
  5.10 Data Persistence
  5.11 Unit Tests
    5.11.1 Testing Data Access
    5.11.2 Test Data
    5.11.3 Mocking
    5.11.4 Example
6 Performance
  6.1 Test Data
  6.2 Test Application
  6.3 Execution
  6.4 Result
    6.4.1 Calculation time
    6.4.2 Memory Usage
    6.4.3 Persistence
    6.4.4 Legacy Calculator
7 Conclusions
  7.1 Limitations
  7.2 Future Work
8 Acknowledgements
References

List of Figures

3.1 An example database diagram
3.2 The entities that are mapped to the database tables in figure 3.1
5.1 Conceptual overview of the system
5.2 The real and the mock context implement the same interface
5.3 Calculation module dependencies
6.1 Data loading comparison
6.2 Memory usage comparison
6.3 Entity Framework persistence performance
6.4 Entity Framework persistence memory usage
6.5 Comparison with legacy rent calculator

List of Tables

2.1 Common apartment properties
2.2 An example model
2.3 Two apartments with different property values
2.4 Property values converted to score
2.5 Formula calculated score
2.6 Example

Listings

3.1 Using LINQ to query a list of integers
3.2 Using LINQ to query an entity collection
3.3 Loading related entities by including them in the query
3.4 Explicitly loading related entities after the query
3.5 Loading related entities with lazy loading enabled
3.6 Loading related entities before the query is executed
3.7 Saving changes to the context
4.1 Regular dependency
4.2 Inversion of Control
5.1 The calculation module interface
5.2 The specification interface
5.3 An excerpt from the generic repository interface
5.4 Ordinary object instantiation
5.5 Dependency injection using a factory lambda expression
5.6 Specification for an active apartment
5.7 Unit testing the Specification for an active apartment
5.8 Mock example
5.9 IFormulaCalculator interface
5.10 IModelLayoutCalculator interface excerpt
5.11 Example unit test

Chapter 1 Introduction

TRIMMA is an IT company focusing on software solutions for business management and decision making. One of its major products is POÄNGEN (which translates to "Score"), a complete solution for rent management according to the utility principle. An explanation of the utility principle can be found in section 2.1.

POÄNGEN started out as a small and simple application, but it has quickly grown in both functionality and number of customers. It is becoming harder and harder to maintain and develop the application, and new development methods have to be found in order to ensure the future of POÄNGEN.

A core part of the application is the rent calculation module.
It is a data-heavy process in which large amounts of data are brought together to calculate the rent for each apartment. The current implementation suffers from several issues. The first is that it is hard to test the correctness of the module. The only evidence that it works correctly is that it has worked in the past. This means that the module cannot easily be extended, because its correctness cannot be verified after a change is made. Another issue is performance. As the customers grow larger, the system cannot handle the tens of thousands of apartments these companies manage. Any new system design has to be highly efficient.

1.1 Overview

Chapter 1 contains the background and problem statement.
Chapter 2 is a brief introduction to the problem domain.
Chapter 3 is an overview of Entity Framework, the object-relational mapping framework used in this project.
Chapter 4 contains a theoretical study of testability and Entity Framework.
Chapter 5 describes the final application that was created.
Chapter 6 contains a performance study of the resulting application.
Chapter 7 contains the discussion of the results and the conclusions.

1.2 Problem Statement

The task is to create a new rent calculation module with automated testing. The possibility to add new features should be taken into consideration, as well as the performance of the calculation.

A large part of the problem is that the existing calculation logic resides in the database in the form of stored procedures. It is not possible to test this functionality automatically with the current tools. If a change is made, there is no safeguard ensuring that the previous functionality is not altered. The problem is to move this logic to the application code without losing performance.

Presently POÄNGEN uses an object-relational mapping (ORM) framework known as dOOdads[16]. This framework, however, is obsolete and no longer maintained.
Therefore another ORM framework will be used: Microsoft's Entity Framework 4.0[14]. Part of the project is to evaluate the testability and performance of Entity Framework.

1.3 Goals & Purpose

The purpose is to develop a method for incorporating automated testing in the development process. The method should be evaluated to determine whether it is suitable for the existing application as well as for future development. The goal is to develop software that is easier to maintain and modify, without introducing new bugs in the existing functionality.

Chapter 2 POÄNGEN

This chapter is a brief introduction to the utility principle. The utility principle is the underlying principle of POÄNGEN, a complete solution for rent management.

2.1 Utility principle

The idea behind the utility principle is to provide an alternative to market prices. The difference in price between apartments should be easily explained by the different standards of the apartments[3]. In Sweden landlords are required by law to use the utility principle[7].

The method used in POÄNGEN is based on work by the Swedish Association of Public Housing Companies (SABO)[3]. The basic idea is that a score is calculated for each apartment, and the current rent is then redistributed relative to the score to calculate the new rent. This means that the total income remains constant, while some apartments get an increased rent and others a decreased rent.

2.2 Properties

To describe the different aspects of the apartments a number of properties are defined. Properties can be of different data types: string, numeric, boolean and predefined values. The most common properties have been defined by SABO[3], while others have to be chosen based on experience. Some of the common properties can be seen in table 2.1. Properties can also be undefined, represented by a null value.

Table 2.1: Common apartment properties.

Name            Data type   Possible values
Area            Numeric     Real numbers
Apartment type  Predefined  1ROK, 2ROK, 3ROK
Balcony         Boolean     yes or no
Address         String      Any string
Location        Predefined  A, B, C

2.3 Residential unit

Apartments are contained in groups called residential units. A unit usually consists of a single building or several very similar buildings. Properties such as building year, surroundings and distance to different services can be associated with residential units.

2.4 Apartment

The apartment score is calculated from the number of rooms and the type of kitchen. For example, one room and a kitchenette receives 24 points, while two rooms and a regular kitchen receives 40 points. This score is added to the area of the apartment, and the sum becomes the total score for the apartment.

The apartment score assumes that the apartment has all the standard equipment of a regular apartment. If some part of the apartment differs from the standard, an adjustment score has to be added. For example, if the balcony is extra large the apartment should gain extra points, while if the balcony is extra small or missing the apartment should lose points. The different types of scores are described in section 2.6.

2.5 Model

A model is a mapping from properties to points; different property values can be assigned different points. An example model can be seen in table 2.2. Each property is also assigned a formula alias. All points with the same formula alias are summed and substituted into the corresponding variable in the formula, which is described in the next section.

Table 2.2: An example model. The apartment type has a fixed score for each possible value. The area property is a numerical property, so it can be directly converted to points.

Formula alias  Property        Value  Points
A              Apartment type  1ROK   34
A              Apartment type  2ROK   40
A              Apartment type  3ROK   44
A              Area            X      X
B              Balcony         Yes    0
B              Balcony         No     -5
C              Location        A      35
C              Location        B      33
C              Location        C      21
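The lookup that a model performs can be sketched in C#. This is only an illustration under stated assumptions: the names ModelRow, ScoreModel and SumByAlias are hypothetical and not taken from POÄNGEN, a null Value is an assumed way of marking a numeric property whose value is used directly as points, and treating an undefined property as contributing no points is likewise an assumption. The point values are those of table 2.2.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical model row: one property value mapped to points under a formula alias.
public sealed class ModelRow
{
    public string Alias { get; set; }
    public string Property { get; set; }
    public string Value { get; set; }    // null: numeric property, its value is used as points
    public decimal Points { get; set; }
}

public static class ScoreModel
{
    // The example model from table 2.2.
    public static readonly List<ModelRow> Rows = new List<ModelRow>
    {
        new ModelRow { Alias = "A", Property = "Apartment type", Value = "1ROK", Points = 34 },
        new ModelRow { Alias = "A", Property = "Apartment type", Value = "2ROK", Points = 40 },
        new ModelRow { Alias = "A", Property = "Apartment type", Value = "3ROK", Points = 44 },
        new ModelRow { Alias = "A", Property = "Area" },
        new ModelRow { Alias = "B", Property = "Balcony", Value = "Yes", Points = 0 },
        new ModelRow { Alias = "B", Property = "Balcony", Value = "No", Points = -5 },
        new ModelRow { Alias = "C", Property = "Location", Value = "A", Points = 35 },
        new ModelRow { Alias = "C", Property = "Location", Value = "B", Points = 33 },
        new ModelRow { Alias = "C", Property = "Location", Value = "C", Points = 21 },
    };

    // Sums the points of one apartment per formula alias.
    public static Dictionary<string, decimal> SumByAlias(IDictionary<string, string> apartment)
    {
        var sums = Rows.Select(r => r.Alias).Distinct().ToDictionary(a => a, a => 0m);
        foreach (var row in Rows)
        {
            string value;
            if (!apartment.TryGetValue(row.Property, out value) || value == null)
                continue;   // assumption: an undefined property contributes no points
            if (row.Value == null)
                sums[row.Alias] += decimal.Parse(value, CultureInfo.InvariantCulture);
            else if (row.Value == value)
                sums[row.Alias] += row.Points;
        }
        return sums;
    }
}
```

For an apartment with an area of 32, type 2ROK and a balcony, SumByAlias returns A = 72, B = 0 and C = 0, since no location is given.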
2.6 Score

There are three different types of points in the model. The A score is known as the apartment score. It is a measure of the size and number of rooms in the apartment. Each apartment is assumed to satisfy the minimum standard requirements. For example, rooms should have at least one window and heating should be included in the rent.

The second type is the B score, called the adjustment score. If the apartment differs from a standard apartment an adjustment has to be made. For example, if an apartment has no balcony a negative score will be added to the B score.

The final type is the C score, called the residential unit score. This is the score for all properties that are shared by all apartments in a residential unit. The desirability of the building location can be one such property.

The total score is calculated by taking the residential unit score C and adding 100. This sum is then multiplied by the apartment score A. Finally the adjustment score B is multiplied by 100 and added to the product. The final formula can be seen in equation 2.1.

\[ \text{Total score} = (100 + C) \times A + (100 \times B) \tag{2.1} \]

2.7 Rent

To convert from points to rent, the total income from all apartments is divided by the sum of the scores of all apartments, see equation 2.2.

\[ \text{Factor} = \frac{\text{Total income}}{\text{Sum of all scores}} \tag{2.2} \]

The resulting factor is then multiplied by the score of the apartment to calculate the rent, see equation 2.3.

\[ \text{Apartment rent} = \text{Score} \times \text{Factor} \tag{2.3} \]

2.8 Example

As an example, consider two apartments that have the same rent of 3000 SEK but different standards. With the utility principle the rent should be redistributed to reflect the differences between the apartments.

Table 2.3: Two apartments with different property values.

Value    Apartment one  Apartment two
Area     32             30
Type     2ROK           1ROK
Balcony  Yes            No

The first apartment in table 2.3 has an area of 32 m², two rooms and a kitchen (2ROK) and a balcony.
The second apartment has an area of 30 m², one room and a kitchen (1ROK) but no balcony. Using the model in table 2.2 the properties can be converted to points, as shown in table 2.4.

Table 2.4: The score for each property in table 2.3.

Points   Apartment one  Apartment two
Area     32             30
Type     40             34
Balcony  0              -5

Again using the model, the A, B and C scores can be calculated, as shown in table 2.5. The final score is calculated using equation 2.1.

Table 2.5: The score for the two apartments. The total score is calculated using the formula (100 + C) × A + (100 × B).

Points  Apartment one  Apartment two
A       72             64
B       0              -5
C       0              0
Total   7200           5900

Apartment one's type is worth 40 points. Added to the area, the A score becomes 72. The total score becomes (100 + 0) × 72 + (100 × 0) = 7200. The second apartment receives (100 + 0) × 64 + (100 × −5) = 5900. The factor becomes

\[ \text{Factor} = \frac{3000 + 3000}{7200 + 5900} = \frac{60}{131} \]

Now the factor can be multiplied by the new score to calculate the new rent.

Table 2.6: Example

          Apartment one  Apartment two
Points    7200           5900
Old rent  3000 SEK       3000 SEK
New rent  3298 SEK       2702 SEK

Note that the sum of the new rents is the same as the sum of the old rents. The rent has been redistributed to better reflect the utility of the apartments.

Chapter 3 Entity Framework

According to the requirements Entity Framework had to be used, the reason being that the framework was already in use in other projects at the company. Entity Framework is an object-relational mapping framework from Microsoft that is included in the .NET framework[14]. The version used for this project is Entity Framework 4.0.

The basic idea of Entity Framework is to eliminate the impedance mismatch between business logic and data representation. This is done using the Entity Data Model (EDM).

3.1 Entity Data Model

The Entity Data Model has two basic components.

– Entities are strictly typed data structures that contain the record data and an identifier key.
– Relationships are associations between entities.
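As an illustration of these two components, an entity with a key and a relationship could be written as plain C# classes. This is a hypothetical sketch, not code generated by Entity Framework: the Employee and Company names are borrowed from the example in section 3.2, and the helper AddEmployee is an assumption added only to keep both ends of the association consistent in memory.

```csharp
using System.Collections.Generic;

// Entity: typed record data plus an identifier key.
public class Company
{
    public Company() { Employees = new List<Employee>(); }

    public int CompanyID { get; set; }                      // identifier key
    public string Name { get; set; }
    public List<Employee> Employees { get; private set; }   // "many" end of the relationship

    // Hypothetical helper that sets both ends of the association.
    public void AddEmployee(Employee employee)
    {
        employee.Company = this;
        Employees.Add(employee);
    }
}

public class Employee
{
    public int EmployeeID { get; set; }                     // identifier key
    public string Name { get; set; }
    public decimal Salary { get; set; }
    public Company Company { get; set; }                    // "one" end of the relationship
}
```

The relationship is expressed purely through object references here. In Entity Framework the mapping described in section 3.2 connects such references to foreign keys in the database.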
More advanced features of the EDM are inheritance and complex types[14], but these are not used in this project.

The Entity Data Model should be created to reflect the structure of the business objects used in the application. It may be necessary to create different data models for different parts of the application, while still using the same database.

3.2 Mapping

To populate the data model with data from an actual relational database a mapping has to be created. Entities can be mapped to database tables, but several tables can also map to a single entity, or a table can be split up into several entities. In figure 3.1 the Employee and ContactInfo tables are combined into a single entity. The mapped entities can be seen in figure 3.2. The Company entity can be accessed from a property on the Employee entity, and the Company entity contains a list of all employees associated with the company.

When the entities are accessed in the application code, Entity Framework automatically fetches data from the database and populates the in-memory data structure.

[Figure 3.1 shows three database tables: Employee (PK EmployeeID, Salary, FK1 ContactInfoID, FK2 CompanyID), Company (PK CompanyID, Name) and ContactInfo (PK ContactInfoID, Name, Adress, Phone).]

Figure 3.1: An example database diagram. The dashed boxes show the entities that the database tables are mapped to.

[Figure 3.2 shows two entities: Employee (Salary : decimal, Name : string, Adress : string, Phone : string, Company : Company) and Company (Name : string, Employees : List<Employee>), connected by an association with multiplicity 0..* on the Employee end and 1 on the Company end.]

Figure 3.2: The entities that are mapped to the database tables in figure 3.1.

3.3 LINQ

Instead of using SQL query strings to query the entity model, the C# language has introduced a feature called Language Integrated Query (LINQ). LINQ can be used to query a number of different data sources, such as databases, collections, XML documents and entity models, using the same syntax. In listing 3.1 a LINQ query is made against a list of numbers.
All numbers less than or equal to five are selected, and the numbers are sorted in ascending order.

Listing 3.1: Using LINQ to query a list of integers.

    int[] numbers = new int[] { 5, 7, 1, 4, 9, 3, 2, 6, 8 };

    // Select the numbers <= 5 in ascending order.
    var smallnumbers = from n in numbers
                       where n <= 5
                       orderby n
                       select n;

    foreach (var n in smallnumbers)
    {
        Console.Write(n);
    }

    OUTPUT: 12345

The same syntax is used to load entities from the database, and when used with entities it is usually referred to as LINQ to Entities. Each entity type is represented as a collection, and all collections are contained in an ObjectContext class. The object context acts as a repository and a unit of work, concepts that are described in chapter 4. For now the important thing is that entities are accessed through a collection, in the same way as in the number example in listing 3.1. Listing 3.2 shows an example where LINQ is used to query the company context for all employees named "Bob".

Listing 3.2: Using LINQ to query an entity collection.

    CompanyContext companyContext = new CompanyContext();

    var bobs = from e in companyContext.Employees
               where e.Name == "Bob"
               select e;

3.4 Loading Related Entities

When an entity is loaded, related entities can be loaded as well, as defined by the relationships in the EDM. There are several ways in which related entities can be loaded.

1. Specified in the query
2. Explicit loading
3. Lazy loading
4. Eager loading

The first method is used in listing 3.3. It references the related fields in the query and selects them.

Listing 3.3: Loading related entities by including them in the query.

    var result = from e in companyContext.Employees
                 select new
                 {
                     Name = e.Name,
                     Company = e.Company.Name
                 };

In listing 3.4 the employee entity is loaded first, then the company navigational property of the employee is explicitly loaded. The First() method returns only the first employee in the result set.
This method requires two round trips to the database to retrieve the data.

Listing 3.4: Explicitly loading related entities after the query.

    var employee = (from e in companyContext.Employees
                    select e).First();

    // Note: with the default EF 4.0 generated entity classes this call is
    // usually written employee.CompanyReference.Load();
    employee.Company.Load();

If the lazy loading option is enabled in Entity Framework there is no need to explicitly load the related entity. It is automatically loaded when it is accessed, as in listing 3.5. This too requires two round trips to the database, and care must be taken when accessing a navigational property. If, for example, lazy loading happens inside a loop iterating over a list of employees, an SQL query will be executed for every iteration.

Listing 3.5: Loading related entities with lazy loading enabled.

    var employee = (from e in companyContext.Employees
                    select e).First();

    Company company = employee.Company;

The final method, shown in listing 3.6, is eager loading. Here the related Company entity is included just after the LINQ query. This creates only a single SQL query that joins the tables together.

Listing 3.6: Loading related entities before the query is executed.

    var result = (from e in companyContext.Employees
                  select e).Include("Company");

3.5 Change Tracking

When a change is made to an entity, the change is automatically tracked by Entity Framework. To persist the changes to the database the SaveChanges method is called on the context object, as in listing 3.7.

Listing 3.7: Persisting the entity changes made to the object context.

    var employee = (from e in companyContext.Employees
                    select e).First();

    employee.Salary += 1000;
    companyContext.SaveChanges();

Chapter 4 Testability

The focus of the study has been on testability, specifically how to unit test data access code. Because one of the requirements was to use Microsoft Entity Framework 4.0, a lot of effort was put into finding information about testability when using EF4.
In an article published on MSDN, Scott Allen demonstrates some common unit testing techniques that can be applied to EF4[1]. Allen argues that extensive unit testing is a valuable tool for development teams. However, the effort required to create these unit tests is related to the testability of the code. Therefore Entity Framework 4.0 was designed with testability in mind.

Allen presents two metrics that will always be exhibited by highly testable code. The first one is observability. If a method is observable, it is easy to observe the output of the method for a given input. Methods with many side effects are hard to observe.

The other metric is isolation. When unit testing a method, only the logic inside the method should be tested. But if the method depends on some external resource, for example a network socket or a database, the unit test might fail if the resource is offline. The resource might also take a very long time to respond, causing the automated test to take a very long time to run.

To achieve testable code a separation of concerns should be maintained. This concept was termed the Single Responsibility Principle by Robert C. Martin[11]. It is based on the concept of cohesion and can be summarized as: "There should never be more than one reason for a class to change". In this case the logic should reside in one class or module and the external resource access should reside in another. They can then be unit tested in isolation.

The metrics presented by Allen are very basic, but they can easily be applied to any newly developed code. The concept will also be repeated when other patterns are discussed, so these metrics will be used to evaluate the resulting code of the project.

4.1 Repository

Allen goes on to explain some common abstractions that are useful for abstracting data persistence. One very common abstraction is the Repository pattern.
This design pattern has been documented by Martin Fowler in his book Patterns of Enterprise Application Architecture[5], and a short overview of the pattern can also be found on his website[6]. The repository pattern is very commonly used, both for unit testing and for other purposes.

According to Fowler a repository "mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects". Allen says that this isolates the details of the data persistence, and that it fulfills the isolation principle required for testability. However, he also adds that the interface of a repository should not contain any operations for actually persisting the objects back to the data source. In the spirit of the single responsibility principle a separate structure should be used, and he presents the Unit of Work pattern.

4.2 Unit of Work

The Unit of Work pattern is also described by Martin Fowler in his book Patterns of Enterprise Application Architecture[5] and on his website[6]. Allen mentions that the unit of work pattern should be familiar to .NET developers, because it has been used in the ADO.NET DataSet class. The DataSet class has the ability to handle update, delete and insert operations on database table rows. It is, however, tightly coupled to database logic. The goal is to isolate the specifics of data persistence, which is why Allen argues that the unit of work pattern is required.

The default behaviour in Entity Framework 4.0 is to create a class extending the ObjectContext class. The object context serves both as a repository for generated entities and as a unit of work. There is no interface defined for the object context, which is a problem when trying to achieve isolation and testability. Fortunately there exist several extensions for generating entities which can be used instead of the default code generator. Allen uses a template that generates POCO (Plain Old CLR Objects).
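The separation described in sections 4.1 and 4.2 can be sketched as two small interfaces with an in-memory stand-in. This is a minimal illustration under stated assumptions and is neither Allen's code nor an Entity Framework API: the names IRepository, IUnitOfWork, InMemoryRepository, InMemoryUnitOfWork and CommitCount are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

// Collection-like access to domain objects. Note: no persistence operations here.
public interface IRepository<T> where T : class
{
    IQueryable<T> FindAll();
    void Add(T entity);
    void Remove(T entity);
}

// Persisting changes is the responsibility of a separate abstraction.
public interface IUnitOfWork
{
    IRepository<Employee> Employees { get; }
    void Commit();
}

// In-memory implementation with no database connection, usable in unit tests.
public sealed class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> items = new List<T>();

    public IQueryable<T> FindAll() { return items.AsQueryable(); }
    public void Add(T entity) { items.Add(entity); }
    public void Remove(T entity) { items.Remove(entity); }
}

public sealed class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly InMemoryRepository<Employee> employees = new InMemoryRepository<Employee>();

    public IRepository<Employee> Employees { get { return employees; } }

    // Lets a test check that the code under test asked for a commit.
    public int CommitCount { get; private set; }

    public void Commit() { CommitCount++; }   // nothing to save; only records the call
}
```

Code written against IUnitOfWork can be exercised in a test by adding entities to Employees, running the logic and then checking the repository contents and CommitCount, all without a database.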
4.3 POCO

The POCO concept originates from the Java POJO classes (Plain Old Java Objects). A POCO object is independent of the data source and contains only data and business logic. This is known as Persistence Ignorance. According to Allen, objects using POCO classes are easier to test than entities that include information about persistence.

Julie Lerman mentions two flavours of POCO entities on her blog[9], which contains the same information as her book Programming Entity Framework[10]. The first type is Data Transfer Objects (DTO), which are unable to notify the object context of any changes made to the entities. Changes are only detected by the context just before a commit is made on the context. Lazy loading is not possible with this POCO type.

If all the properties and associations of the POCO class are declared virtual, a second type of POCO entity is possible. When the context creates the POCO entity it actually creates a proxy class that overrides the members of the POCO class and provides feedback to the context when the POCO is manipulated. This makes it possible for the context to intercept access to an association that is not yet loaded. The association can then be lazily loaded on demand.

One interesting detail is that Lerman puts the generated POCO entities in a separate class library, which makes it possible to create different applications that are connected only by the POCO entities.

4.4 Mocking

One of the biggest obstacles when starting with unit testing is how to isolate a unit before testing it. Tim Mackinnon, Steve Freeman and Philip Craig used mock objects to isolate units in their paper[20]. According to them, what makes unit testing hard is that the units are tested from the outside.

Using mock objects it is possible to test code in isolation. Mock objects replace the application code with dummy classes that emulate the real objects, but provide a much simpler implementation that can be set up with data relevant to the unit tests.
If the mock objects become too complex, this is an indication that the application code itself is too complex and requires refactoring.

4.5 Inversion of Control

In his examples Allen creates a class that depends on a unit of work class. In listing 4.1 the class creates the unit of work itself. If the class depends on an interface instead, another implementation of the unit of work can be created that has no database connection and just uses hard-coded in-memory data. Such an implementation is known as a fake class. To be able to switch implementation, the creation of the unit of work is moved out of the constructor. The instance is instead passed to the constructor as a parameter and stored in a member variable, as in listing 4.2.

Listing 4.1: The Controller class depends on the UnitOfWork class.

    class Controller
    {
        UnitOfWork unitOfWork;

        Controller()
        {
            this.unitOfWork = new UnitOfWork();
        }
    }

Listing 4.2: Inversion of control is used to break the dependency.

    class Controller
    {
        IUnitOfWork unitOfWork;

        Controller(IUnitOfWork uow)
        {
            this.unitOfWork = uow;
        }
    }

This is a very simple implementation of a pattern known as Dependency Injection. As Allen mentions, this is only a simple example; a real project would use a more complex method to automate the process of dependency injection.

When creating the data needed in the unit tests Allen creates a class that initializes test data intended to be used across multiple test suites. This is a design pattern known as Object Mother, described by Schuh and Punke[18]. They show that it can be a very useful pattern for unit tests that require data closely resembling real data. However, as mentioned by Martin Fowler[4], it creates a strong coupling between tests that use the same test data. Changes to a test that require the test data to change might affect other tests. The pattern still seems very useful, but it is slightly outside the scope of this thesis.

4.6 Unit Testing

For unit testing Lerman uses mock contexts that implement the context interface[9].
Instead of accessing a database the mock context returns mock object sets that read their data from an internal list of POCO entities. Several mock contexts are created, for example one with valid data and one with invalid data. This approach is similar to using the ObjectMother pattern mentioned in section 4.5, and it suffers from the same drawback: the tests become strongly coupled through the shared test data.

The practical use of testability is the ability to unit test the code. R. Venkat Rajendran writes in a paper[19] about the impact of testing in general and the benefits and drawbacks of unit testing. One of the benefits is the ability to test one part of the code without having to rely on other parts being available. This makes it possible for several programmers to work on and create unit tests simultaneously. Unit testing also makes it possible to debug a very confined piece of code. It is also possible to test special cases with state that is very hard to set up for the whole program. The overall structure of the code is also improved when unit testing is enforced. Unit testing is the most cost-effective type of testing, because it occurs in the early stages of development.

One of the drawbacks of unit testing according to Rajendran is that it is boring. The solution to this is to provide better tools to automate repetitive tasks. Another problem is that test cases are rarely documented in practice, which makes it hard to modify existing test cases. Because many stubs have to be created in order for a unit test to function, the test code is in many cases larger than the production code. Stub code can have bugs as well. Some of these drawbacks can be resolved, for example by enforcing code conventions that create self-documenting code. Also, if the code has high testability the unit tests will be less complex, reducing the number of bugs in the test code.
The effort of writing full coverage unit tests will always be great, and a careful decision has to be made whether the program is important enough to justify such an effort.

Chapter 5 Result

This chapter describes the final implementation of the application. The module has a service oriented interface, shown in listing 5.1. There is a method for calculating the rent for a set of apartments, given the id of a model, and a method for calculating the rent for all apartments. There are also events for receiving feedback about the calculation progress.

Listing 5.1: The calculation module interface.

    public interface IObjectCalculationService
    {
        event PumaCalculationService.ObjectService.ObjectCalculationService.CalculationProgressHandler CalculationProgress;
        event PumaCalculationService.ObjectService.ObjectCalculationService.CalculationEventHandler CalculationEvent;

        void CalculateObjects(IEnumerable<int> objectIDs, int modelID, string calculationName);
        void CalculateAllObjects(int modelID, string calculationName);
    }

5.1 Overview

A conceptual overview of the architecture can be seen in figure 5.1. The business logic is separated from the data access layer and only depends on the POCO entities. The generic query repository uses specifications and fetch strategies to fetch entities from the context. The resulting entities can then be used in the business logic module. The data access and business logic are wrapped in a service layer that sits between the calculation module as a whole and the service consumer, in this case a web application.

[Figure 5.1: Conceptual overview of the system. Components shown: Service, GenericRepository, Logic, Context, Specification, FetchStrategy, POCO.]

5.2 Entity Data Model

The core of the data access layer is the Entity Data Model.
It is semi-automatically generated from the current development database and each table is directly mapped to an entity object. The foreign key relations are also included as associations between entities. Because of legacy artefacts in the database some minor adjustments have to be made to the data model; for example, relations without foreign key constraints have to be added manually.

A T4 template[12] is used to generate the context. T4 templates are a combination of program code and a scripting language that is used to output program code. A mock context and a mock object set are also generated, to allow mocking of dependencies in the unit tests. Figure 5.2 shows how the real context and the mock context implement the same interface. This allows for unit tests that replace the real data access with mock data access.

[Figure 5.2: The real and the mock context implement the same interface. The diagram shows the interfaces IPumaModelContext (Entities() : IObjectSet<Entity>) and IObjectSet, and the classes PumaModelContext (Entities() : IObjectSet<Entity>), PumaModelContextMock (Entities() : MockObjectSet<Entity>), MockObjectSet and ObjectContext.]

5.3 POCO

The POCO entities were generated with the same template as the context. They are placed in a separate project that has no references to any other project. This makes it possible to write business logic that is not dependent on the data source. Although the POCO entities are generated from a database, this is a one-time operation. When the classes are in place, instances can be created at any time, without requiring a database connection.

5.4 Specification

A specification checks if an entity satisfies a certain condition. The condition is specified as a LINQ expression. The same expression is used both to check whether an in-memory entity satisfies the specification and in LINQ to Entities (section 3.3) to retrieve entities from the database. This eliminates duplicate code between accessing the in-memory model and accessing the database, and it isolates the query expression so that it can be unit tested.

The most important method in listing 5.2 is IsSatisfiedBy. It determines if an entity satisfies the LINQ expression in the specification. The Predicate property simply returns the internal expression. There are also methods to combine specifications using boolean logic.

Listing 5.2: The specification interface

    public interface ISpecification<T>
    {
        Expression<Func<T, bool>> Predicate { get; }
        bool IsSatisfiedBy(T entity);
        ISpecification<T> And(ISpecification<T> other);
        ISpecification<T> Or(ISpecification<T> other);
    }

5.5 FetchStrategy

FetchStrategy is a very simple class that contains the associated entities that should be loaded when the root entity is loaded. This is the same feature as Include in Entity Framework (section 3.4), but wrapped in its own class. The fetch strategy is used together with the specification when loading entities from the repository, as can be seen in listing 5.3.

5.6 Repository

The repository is based on a generic repository created by Will Beattie[2]. The basic idea when loading an entity is to provide a specification of the same entity type. Only entities satisfying the specification will be loaded. Part of the interface of the repository can be found in listing 5.3. It contains methods to load a single entity matching a specification, to load all entities matching a specification and to check if any entity exists that matches the specification. In addition a FetchStrategy can be supplied. It determines if any associated entities should be loaded as well. This allows Entity Framework to load the associated entities joined in a single query, decreasing the number of queries required and thus increasing performance.
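
To illustrate how one expression can serve both the in-memory check and the query, a minimal sketch is given here. It is not the implementation used in the module: Apartment, Spec and QueryRepository are hypothetical, simplified stand-ins, and the repository works on any IQueryable source (here an in-memory list) rather than on an Entity Framework context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Simplified stand-in for a POCO entity (hypothetical).
public class Apartment
{
    public int Id { get; set; }
    public bool? IsInactive { get; set; }
}

// Hypothetical minimal specification wrapping a LINQ expression.
public class Spec<T>
{
    public Expression<Func<T, bool>> Predicate { get; private set; }

    public Spec(Expression<Func<T, bool>> predicate)
    {
        Predicate = predicate;
    }

    // In-memory check: compile the expression and apply it to one entity.
    public bool IsSatisfiedBy(T entity)
    {
        return Predicate.Compile()(entity);
    }

    // Combine two specifications with a logical AND.
    // Assumption: Expression.Invoke is enough for in-memory queries;
    // a provider such as LINQ to Entities may require the parameters
    // to be rewritten instead of invoked.
    public Spec<T> And(Spec<T> other)
    {
        ParameterExpression p = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.AndAlso(
            Expression.Invoke(Predicate, p),
            Expression.Invoke(other.Predicate, p));
        return new Spec<T>(Expression.Lambda<Func<T, bool>>(body, p));
    }
}

// Hypothetical repository: the same Predicate is handed to the query provider.
public class QueryRepository<T>
{
    private readonly IQueryable<T> source;

    public QueryRepository(IQueryable<T> source)
    {
        this.source = source;
    }

    public IEnumerable<T> LoadAll(Spec<T> spec)
    {
        return source.Where(spec.Predicate).ToList();
    }

    public bool Matches(Spec<T> spec)
    {
        return source.Any(spec.Predicate);
    }
}
```

In a unit test the repository can be constructed over list.AsQueryable(), so that a specification is exercised without a database connection.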

Listing 5.3: An excerpt from the generic repository interface

    public interface IGenericQueryRepository
    {
        T Load<T>(ISpecification<T> spec) where T : class;
        IEnumerable<T> LoadAll<T>(ISpecification<T> spec) where T : class;
        bool Matches<T>(ISpecification<T> spec) where T : class;
        T Load<T>(ISpecification<T> spec, IFetchStrategy<T> fetchStrategy) where T : class;
        ...
    }

5.7 Calculation

The logic module handles all aspects of the rent calculation. The class dependency diagram can be seen in figure 5.3. It is important to note that all dependencies are actually dependencies on interfaces. Another important point is that the module is not aware of any part of the data access modules. It uses the POCO entities as if they were in-memory object graphs.

[Figure 5.3: The dependencies between classes in the calculation module. Classes shown: Calculator, ObjectCalculator, FormulaCalculator, RentCalculator, ModelLayoutCalculator, AdjustmentValueCalculator, CalcValueCalculator, DependencyCalculator, OPICalculator.]

5.8 Dependencies

Each dependency between two classes is implemented using dependency injection. Instead of instantiating an object the usual way, as in listing 5.4, a factory method is used to instantiate the dependency, as in listing 5.5.

Listing 5.4: Ordinary object instantiation.

    public class FormulaCalculator : IFormulaCalculator
    {
        public decimal MyMethod()
        {
            IModelLayoutCalculator modelLayoutCalculator = new ModelLayoutCalculator();
            ...
        }
    }

Listing 5.5: Dependency injection using a factory lambda expression.

    public class FormulaCalculator : IFormulaCalculator
    {
        public Func<IModelLayoutCalculator> ModelLayoutCalculatorFactory =
            () => new ModelLayoutCalculator();

        public decimal MyMethod()
        {
            IModelLayoutCalculator modelLayoutCalculator = ModelLayoutCalculatorFactory();
            ...
        }
    }

The factory method in listing 5.5 may look complex to a reader unused to the syntax, but it is simply a first class function stored in the member variable ModelLayoutCalculatorFactory. The function is assigned a default value that uses the C# lambda expression feature to create a method that takes no input (the empty parentheses) and returns a new instance of the ModelLayoutCalculator class. To invoke the function the variable name is used, followed by parentheses.

This implementation differs from the one found in the study in section 4.5. In this case the only reason for using dependency injection is to replace the real dependency with a mock object. In the actual application the dependencies are hard coded. The factory methods therefore allow a default dependency to be implemented, which makes the classes easier to use because the dependencies do not have to be passed to the constructors. If dependency injection is used to allow different implementations in the actual production code, this method is likely insufficient.

5.9 Data Access

The generic repository and specification is only one way to load the data. To measure the performance of this approach, four other loading methods have been implemented to be used as a reference.

– Using LINQ to query directly against the object sets, as described in chapter 3.
– Using a SQL query string with ADO.NET.
– Using a stored procedure and calling it using ADO.NET.
– Using Entity Framework function import to create a strongly typed result object from the stored procedure.

5.10 Data Persistence

To save the result of the calculation, instances of the POCO classes that are to be saved are created and added to the object context. The result is then persisted to the database by Entity Framework. The test in section 6.4.3 showed that this method was highly inefficient and it had to be abandoned. Instead the SqlBulkCopy[13] class was used, which can efficiently copy data from any data source to a database.

5.11 Unit Tests

All these techniques come together in the unit tests. Because the classes are decoupled from both the data access layer and from each other, the unit tests become very simple to write. The framework used for unit testing is the Visual Studio Unit Testing Framework[15]. This framework is built into Microsoft Visual Studio.

5.11.1 Testing Data Access

The generic repository only has to be tested once; it does not have to change when new entities are added. What remains is to test the specifications. The specification in listing 5.6 is only satisfied by apartments (objects) that are active. In this case an apartment is active if its isInActive attribute is null or false. There are three possible states for apartments:

1. isInActive = null should satisfy the specification.
2. isInActive = false should satisfy the specification.
3. isInActive = true should not satisfy the specification.

Each state can now be tested in a unit test. The only dependency that a specification has is on POCO entities. Recall from section 5.3 that POCO entities have no dependencies at all.

Listing 5.6: Specification for an active apartment.

    public class ObjectIsActiveSpecification : SpecificationBase<PumaPOCO.Object>
    {
        public ObjectIsActiveSpecification()
        {
            predicate = obj => !obj.isInActive.HasValue || obj.isInActive.Value == false;
        }
    }

Listing 5.7: Unit testing the Specification for an active apartment.

    [TestMethod()]
    public void ObjectIsActiveSpecification_should_match_object_with_isInActive_null()
    {
        PumaPOCO.Object obj = new PumaPOCO.Object() { isInActive = null };
        ObjectIsActiveSpecification target = new ObjectIsActiveSpecification();
        bool expected = true;
        bool actual = target.IsSatisfiedBy(obj);
        Assert.AreEqual(expected, actual, "Object should satisfy specification");
    }

5.11.2 Test Data

Because the test cases are so isolated, the amount of test data required for each unit test is in most cases very small. Instances of POCO entities are created on the fly in the test method and sent as parameters to the method under test.

5.11.3 Mocking

Using the technique in section 5.8 it is possible to replace the real implementation of a dependency with a fake, or mock, object. One way of doing this is to implement the same interface by hand and replace the factory so that it returns the mock object. In this case an external library called Moq[8] is used instead. It is a library that makes it possible to easily implement an interface on the fly. By default each method will return the default value of its return type, for example null for all reference types. Specific methods can then be overridden to return any value. A short example can be seen in listing 5.8.

Listing 5.8: A mock implementation of ICalcValueCalculator is created and the IsObjectMatchingCalcValue method is overridden to always return true for any input.

    [TestMethod()]
    public void MyTest()
    {
        Mock<ICalcValueCalculator> calcValueCalculatorMock = new Mock<ICalcValueCalculator>();
        calcValueCalculatorMock.Setup(x => x.IsObjectMatchingCalcValue(
            It.IsAny<ICalculationObject>(), It.IsAny<CalcValue>())).Returns(true);
    }

5.11.4 Example

An example of a test method from the project can be seen in listing 5.11. It tests the FormulaCalculator method GetFormulaCalculatedPointsForObject, whose interface can be seen in listing 5.9. The purpose of this class is to take an apartment (ICalculationObject), a model and a formula and calculate the score for the apartment. Because each class is supposed to have only a single responsibility (section 4), this class only takes the score for each formula alias (A, B, C) and substitutes them into the formula to calculate the final score. The rest of the calculation is performed by another class, through an interface called IModelLayoutCalculator. The method that calculates the points for each alias is called GetPointsForRootModelLayouts. The interface can be seen in listing 5.10.

The first thing to do in the unit test in listing 5.11 is to set up the test data. Because POCO entities are used they can simply be created on the fly, and only the relevant data has to be initialized. For example, the model and object will never be read, so none of their fields have to be initialized. The formula is initialized to a + 2 × b.

The next step is to hard code a return value for the GetPointsForRootModelLayouts method, because the purpose of this test is to test the FormulaCalculator, not any other class. Using Moq[8] a mock object is created and the method is set up to return the hard coded values a = 1 and b = 2.

Using the inversion of control factory method the FormulaCalculator is set up to use the mock object instead of the real implementation. Now that everything is set up the method under test can be called, and the returned value should be 1 + 2 × 2 = 5.

Listing 5.9: The interface IFormulaCalculator implemented by FormulaCalculator.

    public interface IFormulaCalculator
    {
        decimal GetFormulaCalculatedPointsForObject(ICalculationObject obj,
            ICalculationModel model, Formula formula);
    }

Listing 5.10: An excerpt from the interface IModelLayoutCalculator implemented by ModelLayoutCalculator.

    public interface IModelLayoutCalculator
    {
        Dictionary<string, decimal> GetPointsForRootModelLayouts(
            ICalculationObject obj, ICalculationModel model);
        ...
    }

Listing 5.11: A test method for the formula calculator.

    [TestMethod()]
    public void Should_calculate_the_points_bases_on_the_formula()
    {
        // Setup test data.
        ICalculationModel model = new CalculationModel();
        ICalculationObject obj = new CalculationObject();
        Formula formula = new Formula() { Formula1 = "a + 2 * b" };

        // Instead of the model layout calculator calculating the points
        // the result is hard coded.
        Dictionary<string, decimal> points = new Dictionary<string, decimal>();
        points.Add("a", 1);
        points.Add("b", 2);

        // Create a mock of the model layout calculator that returns the
        // hard coded points.
        Mock<IModelLayoutCalculator> mockModelLayoutCalculator =
            new Mock<IModelLayoutCalculator>();
        mockModelLayoutCalculator.Setup(m => m.GetPointsForRootModelLayouts(
            It.IsAny<ICalculationObject>(), It.IsAny<ICalculationModel>())).Returns(points);

        // Use the inversion of control factory to make the formula calculator
        // use the mock object instead of the real object.
        FormulaCalculator target = new FormulaCalculator();
        target.ModelLayoutCalculatorFactory = () => mockModelLayoutCalculator.Object;

        decimal expected = 5;

        // Make the call to the method under testing.
        decimal actual = target.GetFormulaCalculatedPointsForObject(obj, model, formula);

        // Assert that the returned value is the expected one.
        Assert.AreEqual(expected, actual, "The formula calculated points are incorrect.");
    }

Almost all unit tests are based on the layout of the test in listing 5.11. Sometimes not all steps are necessary, for example if a class has no dependencies. The following steps are used:

1. Create test data.
2. Create mock objects that return test data.
3. Replace the real dependencies with mock objects.
4. Call the method that is to be tested.
5. Assert that the return value is the expected value.

Chapter 6 Performance

The purpose of the performance measurement is to determine how well the application performs when the amount of data is scaled up. The different loading methods in section 5.9 are compared.

6.1 Test Data

The test data sets are based on a customer database with 8723 apartments. The apartments were duplicated or removed to create differently sized databases. The numbers of apartments in the databases are 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000 and 200000. The numbers were chosen based on the size of existing customers' databases and the expected size of future customers.

6.2 Test Application

A test application was created to test the performance. The time to perform the calculation, including loading all data and storing the result in memory, was measured separately from the time to save the result to the database. The data persistence is the same for each test and therefore not interesting in itself. However, to be able to compare the new calculator to the legacy one, the save time has to be measured as well.
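
Measuring the two phases separately could be done along the following lines. This is a minimal sketch rather than the actual test application: PhaseTimer and its method are hypothetical names, and only the Stopwatch class from System.Diagnostics is assumed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing one phase of a test run.
public static class PhaseTimer
{
    // Runs the given action and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

The calculation and the save step would then each be wrapped in their own call, for example PhaseTimer.Measure(() => calculate()) followed by PhaseTimer.Measure(() => save()), where calculate and save are placeholders for the real operations, so that the two durations are recorded independently.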
The test application measures the number of apartments that were calculated each second. This is used to see if the calculator's performance changes over time. The memory usage was also measured. A separate thread wakes up every second and asks the garbage collector for the amount of memory currently allocated. Before each poll the garbage collector releases all unreferenced memory.

6.3 Execution

Each implementation is run ten times for each database. The first execution takes longer and is discarded. This is because Entity Framework performs some initialization the first time it is invoked. The mean of the remaining nine values is used as the result.

6.4 Result

6.4.1 Calculation time

The average number of calculated apartments per second is shown in figure 6.1. The function import, inline query and stored procedure methods perform equally, with peak performance at about 10000 apartments. The calculation time is therefore not linear in the number of apartments, and for larger numbers of apartments the performance decreases rapidly. The reason for this is that Entity Framework performs some automatic linking of related entities that is not executed in constant time. The LINQ and Specification methods also perform equally, but much worse than the other methods. They were also unable to calculate more than 50000 apartments, so no result is recorded for bigger data sets.

[Figure 6.1: Comparison between the different data loading methods. Calculations per second (n/s) against Apartments (n) for Function Import, Inline Query, LINQ, Specification and Stored Procedure.]

6.4.2 Memory Usage

The memory usage of each method for different numbers of calculated apartments can be seen in figure 6.2. The specification method is the most memory intensive with a peak memory usage of 361 MiB when calculating 50000 apartments.
The LINQ method uses almost 200 MiB for the same calculation and the rest use only about 20 MiB. Even for 200000 apartments their memory usage is only 50 MiB.

[Figure 6.2: Comparison between the memory usage of the different data loading methods. Peak memory (MiB) against Apartments (n) for Function Import, Inline Query, LINQ, Specification and Stored Procedure.]

6.4.3 Persistence

Early versions used the data persistence feature of Entity Framework to save the result of the calculation to the database. An early test run showed that for large amounts of data the time to save the result was greater than the time for the actual calculation. The result of the test run can be seen in figure 6.3. Saving the result of 200000 apartments took 35 minutes. The memory usage was also extremely high; figure 6.4 shows a peak memory usage of 963 MiB. The persistence step starts at about 1000 seconds and continues until the end of execution.

[Figure 6.3: Performance of using the function import method for calculation and the Entity Framework persistence feature to save the result. Calculation Time and Save Time in seconds against Apartments (n).]

[Figure 6.4: The memory usage when saving using Entity Framework. Memory Usage (MiB) against Elapsed time (s) for Function Import.]

The final program uses the SqlBulkCopy[13] class, and saving the same amount of data takes only 14 seconds with a memory usage of only a few kibibytes. This method is not as flexible, however, and the records persisted to the database are not automatically updated on the client side but have to be loaded again manually.

6.4.4 Legacy Calculator

A comparison was made between the old calculator and one of the most efficient methods, function import. This comparison includes the time to persist the result to the database.
The result can be seen in figure 6.5. The old calculator calculates only five apartments per second, while the new one using function import and SqlBulkCopy has a peak of 762 apartments per second.

[Figure 6.5: Comparison between the function import method and the old calculator. Calculations per second (n/s) against Apartments (n) for Function Import and Old calculator.]

Chapter 7 Conclusions

Because of lack of time the external supervisor was not able to create a formal specification. This meant it took some time to figure out what the thesis was really about. Despite the slow start the work went on smoothly and the project was finished only one week behind schedule.

The resulting application is fully functional and it performs the same task as the old program but over a hundred times faster. A big improvement in performance was expected, but the result still exceeded the expectations. The module has over 200 unit tests. Only time can tell if it is easily maintained, but it has a greater chance than the legacy application.

The main goal of the thesis was to find a method to incorporate unit testing into the development cycle. It turned out that the main problem was not to write test cases; it was to write code that is easy to test. By abstracting away the database access and adhering to the rules of observability, isolation and the single responsibility principle, writing unit tests will be a lot more feasible in the future.

Because of the unit tests some of the bugs introduced when adding new features to the program will be avoided. Smaller and less coupled classes will also make it possible to reuse tried and tested classes, avoiding the need to modify classes and risk introducing new bugs. What is missing is integration tests that make sure that the module as a whole is still working after modifications have been carried out on the module.

Another main topic was how to test data access code. This turned out to be the hardest part, where several approaches had to be completely abandoned. Either it was too much effort to write the tests or the tests were useless. The final solution of using specifications is a good compromise and, at least in theory, the whole concept has a lot of potential.

The evaluation of Entity Framework 4.0 showed that almost all code can be automatically generated from the database, minimizing the effort needed to bring the database into object oriented code. The performance, however, is poor for large sets of data. Thankfully it is possible to optimize the bottlenecks by replacing them with stored procedures. It would have been interesting to compare Entity Framework with more mature ORM frameworks, most notably NHibernate[17].

The unit tests created are very useful, but there is also a need for integration tests to test the interactions of units. A big challenge here is to maintain test data that can be updated together with the application. This is another topic that would be interesting to explore.

7.1 Limitations

The main limitation of the module is that it is not yet integrated into the graphical user interface of the rest of the application. More issues will probably have to be considered when the module is integrated with user input.

There is a feature to select only a subset of the apartments to be used in the calculation, but its performance does not compare to loading all apartments at once. A better solution has to be found for creating this subset.

7.2 Future Work

Many things, such as maintainability, cannot be evaluated before the module starts to expand. It is also not known how much effort is required to maintain the code and keep all test cases up to date.

Because unit tests only test each unit in isolation, the test suite will not detect errors that occur when units are interacting.
It would be possible to create a suite of integration tests that test the service layer, because it has a well defined interface. These tests require another database with test data that has to be maintained when the application changes.

Chapter 8 Acknowledgements

I would like to thank TRIMMA for the opportunity of doing this project and my external supervisor Mattias Blom and the other employees at TRIMMA for their feedback. A thanks also to my internal supervisor at Umeå University, Jan-Erik Moström.

References

[1] Scott Allen. Testability and Entity Framework 4.0. http://msdn.microsoft.com/en-us/library/ff714955.aspx (visited 2012-05-21).
[2] Will Beattie. Specification Pattern, Entity Framework & LINQ. http://blog.willbeattie.net/2011/02/specification-pattern-entity-framework.html (visited 2012-06-01).
[3] SABO Sveriges Allmännyttiga Bostadsföretag. Sätt rätt hyra, handledning i systemematisk hyressättning, 2010.
[4] Martin J. Fowler. ObjectMother. http://martinfowler.com/bliki/ObjectMother.html (visited 2012-05-22).
[5] Martin J. Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Professional, 2002.
[6] Edward Hieatt and Rob Mee. Repository. http://martinfowler.com/eaaCatalog/repository.html (visited 2012-05-22).
[7] Hyressättningsutredningen. SOU 2004:91 Reformerad hyressättning. Socialdepartementet, 09 2004.
[8] Clarius Consulting Labs. Moq. http://code.google.com/p/moq/ (visited 2012-06-05).
[9] Julia Lerman. Agile Entity Framework 4 Repository. http://thedatafarm.com/blog/dataaccess/agile-entity-framework-4-repository-part-1-model-and-poco-classes/ (visited 2012-05-23).
[10] Julia Lerman. Programming Entity Framework. O'Reilly Media, 2009.
[11] Robert C. Martin. The single responsibility principle. Principles of Object Oriented Design, 2002.
[12] Microsoft. Code Generation and T4 Text Templates. http://msdn.microsoft.com/en-us/library/bb126445.aspx (visited 2012-07-30).
[13] Microsoft. SqlBulkCopy Class. http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx (visited 2012-06-25).
[14] Microsoft. The ADO.NET Entity Framework Overview. http://msdn.microsoft.com/en-us/library/aa697427(v=vs.80).aspx (visited 2012-06-07).
[15] Microsoft. Unit testing framework. http://msdn.microsoft.com/en-us/library/ms243147(v=vs.80).aspx (visited 2012-08-01).
[16] MyGeneration. The dOOdads .NET Architecture. http://www.mygenerationsoftware.com/portal/dOOdads/Overview/tabid/63/Default.aspx (visited 2012-06-07).
[17] NHibernate. NHibernate. http://nhforge.org/Default.aspx (visited 2012-08-01).
[18] Peter Schuh and Stephanie Punke. ObjectMother - easing test object creation in XP. XP Universe, 2003.
[19] R. Venkat Rajendran. White paper on unit testing. Deccanet Designs Ltd., 2002.
[20] Tim Mackinnon, Steve Freeman and Philip Craig. Endo-testing: Unit testing with mock objects. XP eXamined, 2000.