From start to finish, best practices and practical advice for doing a simulation-based project
This post is part 5 of a 7-part series about the simulation model life cycle. You can catch up on the previous post here
Life cycle of a simulation model
Experimentation
Analyze the Results
Report on the Findings
This post focuses on the fifth step and future posts will focus on the remaining steps - you can start watching the lecture from step 5 here, or continue reading the blog post.
Note: Although you do go "back" in later steps to redo some activities from previous steps, you by no means redo the entire step. The steps listed above are guidelines for the general steps that a typical project follows. At any step you can go backward and revisit previous steps, but it is unlikely that you will skip a step or move forward before completing a step to at least 80%-90%.
For a more insightful discussion on the matter read here
5) Experimentation
Experimentation is arguably one of the more difficult steps in the simulation model life cycle. I sometimes feel that it is more an art than a science. From the start, it is hard to just think about the right experiments to execute, let alone think about doing them in the right way. This post is by no means an all-encompassing guide to creating good experiments for simulation models, but more of a quick start guide to getting the basics right.
Given that we know that experiments should not be designed to give us the answer we want or expect, let's look at a few best practices to help us create meaningful experiments. Thereafter we will dive into an example with some best practices for setting up an experiment in AnyLogic.
1) Test the scenarios defined in objective
During the simulation phase, you need to keep your focus on the original objective of your model, as defined in step 1 of the simulation model life cycle. Otherwise you risk getting sidetracked by interesting ideas and hypotheses about your system which, although value-adding, are not the primary objective of the model or the project.
Create different scenarios to test the different parts of your objective as well as combined scenarios testing the entire objective in one go. This sounds much easier than what it is in practice! Take your time and structure your scenarios well. If you followed all the correct model building and data import best practices described in step 2 and step 3, your scenario should just be a simple Excel or other basic data file that you can import and run in the model without any fuss.
Based on our retailer example our scenarios are very simple:
Single queue scenario: where there is a single queue of customers in front of all servers
Multiple queue scenario: where there are individual queues in front of each server.
The only difference in our two scenario files would be a single parameter change - the number of queues.
Believe it or not, this is often the case with real-life examples as well: the only difference between scenarios is a single parameter change.
2) Stress test the model to find the boundary conditions.
The amount of time spent on stress testing depends greatly on the time available, the confidence in the model output and the magnitude of the decision.
Essentially you can ask yourself: "What is the cost of getting it wrong versus the cost of making sure you are right?"
During this phase you create scenarios, sometimes extreme ones, that push the limits of the model to determine the solution space. Be careful not to test invalid inputs, e.g. having zero queues in the model. Those kinds of tests are designed to see if the model breaks and are best done during step 3 when you implement unit testing. The focus here should be to test under what conditions the assumptions and findings of point 1 are valid.
For example:
Make the server time a constant 1 minute for all customers
Make the server time a constant 100 minutes for all customers
Have all customers arrive 1 minute apart from each other.
The aim here is to test whether the conclusions you made are still valid, and under which conditions they become invalid or just different. This can also help you identify possible bugs in the logic. If the results are not explainable, there is either a bug or you need to add better output data in order to provide enough context and insight to explain the results.
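Before running boundary scenarios like these in the full model, it can help to know roughly what an explainable result looks like. The following is a minimal sketch in plain Java, not AnyLogic code: the class `BoundaryCheck`, the method `averageWait` and the choice of three servers are all assumptions made for illustration. It models a single shared queue with customers arriving exactly 1 minute apart and a constant service time.

```java
import java.util.PriorityQueue;

class BoundaryCheck {
    /**
     * Average waiting time (minutes) in a single shared FIFO queue with
     * deterministic arrivals and a constant service time.
     */
    static double averageWait(int customers, int servers,
                              double interArrival, double serviceTime) {
        // Each entry is the time at which one server becomes free.
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int s = 0; s < servers; s++) {
            freeAt.add(0.0);
        }
        double totalWait = 0.0;
        for (int i = 0; i < customers; i++) {
            double arrival = i * interArrival;
            double serverFree = freeAt.poll();           // earliest available server
            double start = Math.max(arrival, serverFree);
            totalWait += start - arrival;
            freeAt.add(start + serviceTime);
        }
        return totalWait / customers;
    }

    public static void main(String[] args) {
        // Customers arrive 1 minute apart; three servers is an assumed value.
        System.out.println("1-minute service:   " + averageWait(300, 3, 1.0, 1.0));
        System.out.println("100-minute service: " + averageWait(300, 3, 1.0, 100.0));
    }
}
```

With a 1-minute service time nobody ever waits, so any waiting time reported by the real model under that scenario would point to a bug. With a 100-minute service time the queue never recovers and the average wait grows with the length of the run, which is the kind of result you should be able to explain before trusting the model.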
3) Use multiple runs with different random seeds
This is an absolute must to get any valid simulation results from your experimentation. You must run multiple replications with different random streams to get a distribution of outputs. This is often achieved by creating a Monte Carlo Experiment where you simply collect specific model outputs while varying the seed across multiple runs.
If you only run 1 iteration of each scenario, even if both are with the same random seed, you have no idea where in the universe of possibilities these results lie.
Example: Imagine you built a simulation that could simulate the average temperature in a day for any given day of the year in New York City and you were asked to give the expected delta between winter and summer temperatures. If you simulated just one day in summer and compared it to just one day in winter you might end up, due to pure randomness, simulating the coldest summer day and the warmest winter day, which will give you a delta of less than 10°F.
Is it now safe to assume that the difference between winter and summer temperatures is almost negligible?
Of course not. You need to simulate hundreds if not thousands of days to get a representative average. This will tell you that the average difference between a summer and a winter day is around 40°F.
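The temperature example can be sketched in a few lines of plain Java. Everything in this block is an assumption made for illustration: the class `TemperatureDelta`, the mean temperatures (chosen so that the true difference is 40°F) and the daily standard deviation are not real climate data.

```java
import java.util.Random;

class TemperatureDelta {
    // Assumed illustrative values, not real climate data.
    static final double SUMMER_MEAN_F = 75.0;
    static final double WINTER_MEAN_F = 35.0;
    static final double DAILY_STD_F = 10.0;

    /** Average summer-minus-winter difference over the given number of simulated day pairs. */
    static double averageDelta(int days, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int d = 0; d < days; d++) {
            double summer = SUMMER_MEAN_F + DAILY_STD_F * rng.nextGaussian();
            double winter = WINTER_MEAN_F + DAILY_STD_F * rng.nextGaussian();
            sum += summer - winter;
        }
        return sum / days;
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 5; seed++) {
            System.out.printf("seed %d: 1 day -> %.1f F, 10000 days -> %.1f F%n",
                    seed, averageDelta(1, seed), averageDelta(10_000, seed));
        }
    }
}
```

The single-day figures jump around from seed to seed, while the 10,000-day averages all settle close to the assumed 40°F difference.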
This brings us to our second point. Averages. Everybody knows the old adage that if your head is in the freezer and your feet are in the oven, on average you are at a very comfortable temperature.
Not only do you need multiple runs to get to a representative average but you also need them to get to a distribution of the possible outcomes. You need to know what is the min, the max, the standard deviation and whatever else could be applicable to the specific project. Often presenting the measured outcome in a histogram gives a nice visual representation of the output across various runs.
In our retailer example: If the average customer waiting time was 10 minutes, but the maximum was 2 hours, we also need to ask ourselves if there is a bug in the model or the input data, or is this just the nature of the process?
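A histogram of the per-replication outputs makes an outlier like this easy to spot. Below is a minimal sketch in plain Java; the class `WaitHistogram`, the equal-width binning and the sample values are all hypothetical and only there to illustrate the idea.

```java
import java.util.Arrays;

class WaitHistogram {
    /** Counts values into equal-width bins covering [min, max]. */
    static int[] histogram(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double v : values) {
            // The maximum value is placed in the last bin.
            int bin = width == 0.0 ? 0 : (int) Math.min(bins - 1, (v - min) / width);
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical average waiting times (minutes), one per replication.
        double[] avgWaits = {8.2, 9.1, 9.8, 10.4, 10.9, 11.3, 9.5, 10.1, 24.0, 10.0};
        int[] counts = histogram(avgWaits, 4);
        for (int b = 0; b < counts.length; b++) {
            System.out.println("bin " + b + ": " + "#".repeat(counts[b]));
        }
    }
}
```

A lone bar far away from the rest is the cue to open the detailed results for that replication and decide whether it is a bug or simply the nature of the process.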
Note: Since we were previously simulating historical service times for our customers, doing multiple replications in the experiment phase would not make sense, as we would be simulating the same service times over and over again and thus get the same result for every run. We will need to add new parameters to our scenario file and use them in the model to simulate a distribution of service times.
See the example for more details.
4) Output requirements
Building on the previous point, your experiment output results need to contain enough information for you to not only give the outcome of the results but also be able to justify and explain them. To do this you will need the following:
Summary results for a single scenario
Detailed results per replication
The detailed results would be the same results that you output from a single-run experiment and the summary would just be the summary over multiple runs, for each scenario. Typically you will only ever go to the detailed data if you see a big standard deviation or a significant min or max in your model and suspect that one of the replications might have encountered a bug. Since we do have the individual results for each replication available in the output file, we can easily calculate our summary statistics in some external tool, e.g. Excel. But why go through the effort of having to compute this for every scenario comparison when we can have the model do this for us?
Example
Now that we have looked at the basics of good experimentation let's continue with our example from step 4 and implement these actions into the model.
Note: In this example, we will continue to make use of the custom experiment object we set up in step 3. AnyLogic does come standard with some predefined experiment options for Sensitivity, Monte Carlo, Parameter Variation and other experiments, but for most complex and long term simulation models I prefer not to use them for the following reasons:
They require me to export different models for the client, one for the single runs, one for sensitivity analysis, one for Monte Carlo experiments etc. With the custom experiment option, I still have all my experiment functionality available in a single model export.
There is less control over how you want the model to run. This can be a problem if you have different Excel or other input files that read data into the DB.
The custom experiment is already set up to be used for unit testing, thus there is no additional work.
The example is split into three parts. The first part focuses on the upgrades we need to make to the model and the scenario file so that we can set up and save some specifics for each scenario. The second part is all about saving the outputs. The last part shows how to use the updates from parts 1 and 2 to run multiple scenarios one after the other, save the output and download it from the model.
Part 1 - Upgrade the scenario file, scenario object and model
a) Add new columns in the scenario file
These new columns are to define some of the model behavior that was previously not set through the scenario object.
I find it best to add a new sheet named "ModelSetup" to the scenario file. It contains a single row with multiple columns that I use to store the settings of the model run. This is also where I add a column for the scenario name, which I can then output to the results file when running multiple scenarios.
Here is a quick explanation for each of the new fields:
scenario_name: The name of this scenario, preferably unique as it will be used in the output file.
multiple_queue_setup: if True then the model will use the multiple queue setup. If False the model will use a single queue for all the servers.
use_historical_service_time: If True then the model will use the historical service times found with the historical customers. If False then the service time for each customer will be drawn from a triangular distribution using the parameters in the fields below:
min_service_time: The minimum service time for the triangular distribution of service times
likely_service_time: The likely service time for the triangular distribution of service times
max_service_time: The maximum service time for the triangular distribution of service times
Once you have created these fields in the new sheet inside your Excel file, you can import the new sheet as a new database table into your model. Now this data is available to use in the scenario creator.
b) Add fields to the scenario object and initialize them from the scenario creator
Now we need to add the same fields from the scenario file to the scenario object so that the model can use them.
We also need to populate these fields in the scenario loader function we created, `getScenarioFromDB`.
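As a rough illustration of these two steps, here is a minimal sketch in plain Java. It is not the model's actual code: the real loader reads from the AnyLogic internal database, whereas this sketch uses a `Map` keyed by column name as a stand-in for the single "ModelSetup" row. The Java field names and the `fromRow` method are assumptions; only the column names come from the scenario file described above.

```java
import java.util.Map;

class Scenario {
    // Public fields mirror the columns of the ModelSetup sheet.
    public String scenarioName;
    public boolean multipleQueueSetup;
    public boolean useHistoricalServiceTime;
    public double minServiceTime;
    public double likelyServiceTime;
    public double maxServiceTime;

    /** Builds a scenario from one row of settings, keyed by column name. */
    static Scenario fromRow(Map<String, String> row) {
        Scenario s = new Scenario();
        s.scenarioName = row.get("scenario_name");
        // parseBoolean ignores case, so "True" and "true" are both accepted.
        s.multipleQueueSetup = Boolean.parseBoolean(row.get("multiple_queue_setup"));
        s.useHistoricalServiceTime = Boolean.parseBoolean(row.get("use_historical_service_time"));
        s.minServiceTime = Double.parseDouble(row.get("min_service_time"));
        s.likelyServiceTime = Double.parseDouble(row.get("likely_service_time"));
        s.maxServiceTime = Double.parseDouble(row.get("max_service_time"));
        if (!(s.minServiceTime <= s.likelyServiceTime && s.likelyServiceTime <= s.maxServiceTime)) {
            throw new IllegalArgumentException("Expected min <= likely <= max service time");
        }
        return s;
    }

    public static void main(String[] args) {
        Scenario s = fromRow(Map.of(
                "scenario_name", "single_queue",
                "multiple_queue_setup", "false",
                "use_historical_service_time", "false",
                "min_service_time", "1.0",
                "likely_service_time", "3.0",
                "max_service_time", "10.0"));
        System.out.println(s.scenarioName + " multipleQueues=" + s.multipleQueueSetup);
    }
}
```

Validating the three service time parameters at load time means a badly filled-in scenario file fails immediately instead of halfway through a batch of replications.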
Please take careful note of the following:
I make use of the non-cached values when getting the values from the internal database.
If you make use of the AnyLogic internal Database in any way, you must know what you are doing when using cached values.
(Unfortunately) Cached values are used by default so you need to add the extra initial parameter as False, before you specify the column to return.
Read this blog post for more information.
In this example, the scenario variables are public and I do not make use of getters and setters
It is best coding and modeling practice to always keep all non-final variables private and make use of getters and setters.
If you find it cumbersome to create getters and setters you might consider using project Lombok in your models which can do this for you automatically.
Read this post for more details on how to use project Lombok inside AnyLogic.
c) Use the new parameters to create customer service times
Now that the parameters are available inside the scenario object we can use them inside the model logic to set up the model.
The first point is to use the setting inside the scenario object that defines whether or not to use the historical data or the triangular distribution.
The next is to use the scenario variable to define whether or not we are going to use a single queue or multiple queues.
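The first of these two switches can be sketched as follows. This is plain Java rather than the model's own logic: AnyLogic has its own built-in distribution functions, and the hand-written inverse-CDF sampler below is only a stand-in so that the example runs on its own. The class `ServiceTimes` and both method names are assumptions.

```java
import java.util.Random;

class ServiceTimes {
    /** Draws from a triangular distribution using the inverse-CDF method. */
    static double triangular(double min, double mode, double max, Random rng) {
        if (max == min) {
            return min;  // degenerate case: constant service time
        }
        double u = rng.nextDouble();
        double cut = (mode - min) / (max - min);  // CDF value at the mode
        if (u < cut) {
            return min + Math.sqrt(u * (max - min) * (mode - min));
        }
        return max - Math.sqrt((1.0 - u) * (max - min) * (max - mode));
    }

    /** Returns the historical value or a sampled one, depending on the scenario flag. */
    static double serviceTime(boolean useHistorical, double historicalTime,
                              double min, double likely, double max, Random rng) {
        return useHistorical ? historicalTime : triangular(min, likely, max, rng);
    }

    public static void main(String[] args) {
        Random rng = new Random(1L);
        System.out.println("historical: " + serviceTime(true, 4.5, 1, 3, 10, rng));
        System.out.println("sampled:    " + serviceTime(false, 4.5, 1, 3, 10, rng));
    }
}
```

With the flag set to true every replication reuses the same historical value, which is exactly why multiple replications only become meaningful once the flag is false and the service time is sampled.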
Now that the model is using the new setup variables we can work on saving more results from the simulation to the output file.
Part 2 - Save your results
a) Detailed Results:
In step 4 we were only saving the output of a single simulation run to a text file. The text file was an object inside Main and thus, if we have multiple simulation runs, every replication will overwrite the previous replication's data.
In order for the customer waiting times of all model runs to be written to a single text file, we need to store the text file object on the simulation object and pass it through to the model using a parameter. Now all iterations will write to the same text file and all the results will be in the same file. Luckily text files are also thread-safe, so if we were to do multi-threading for our experiment execution the results will still be saved correctly. (More on multi-threading in a future post ;-) )
But how will we distinguish one iteration from the next? Or one scenario from the next?
In the text file, we add two new columns, the scenario name and the seed, and we separate all the columns using tabs (hence the "\t" in the code). FYI: Tab-separated text files can easily be opened in Excel, with each tab denoting a new column.
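To make the row layout concrete, here is a minimal sketch of how one detailed result line could be built in plain Java. The class `DetailedResultLine` and the column names other than scenario and seed are hypothetical; the model itself writes to an AnyLogic text file object, which is not reproduced here.

```java
import java.util.Locale;

class DetailedResultLine {
    // In Java source the tab character is written as a backslash followed by t.
    static final String HEADER = "scenario\tseed\tcustomer_id\twaiting_time";

    /** Formats one detailed result row with tab-separated columns. */
    static String format(String scenarioName, long seed, int customerId, double waitingTime) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of machine settings.
        return String.format(Locale.ROOT, "%s\t%d\t%d\t%.2f",
                scenarioName, seed, customerId, waitingTime);
    }

    public static void main(String[] args) {
        System.out.println(HEADER);
        System.out.println(format("single_queue", 1L, 17, 12.5));
        System.out.println(format("multiple_queue", 1L, 17, 19.25));
    }
}
```

Because every row carries its scenario name and seed, rows from different replications can be interleaved in the file and still be filtered or pivoted afterwards.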
But where did the seed parameter come from?
Identifying the seed in a simulation is often a big question that gets posted on Stack Overflow every now and then... as is evident by these questions, and a number of others.
In this example I have upgraded our Custom Experiment object from step 4 to now also include the seed and the text file as inputs into the "startRound" function.
The seed can now be provided to the custom experiment, which means that we have full control over it when we create a custom experiment.
You will also notice an additional parameter called Results, which is used further down in the code... We will get to that in a minute....
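On the seed itself: the reason full control matters is reproducibility. The sketch below uses `java.util.Random` in plain Java as a stand-in for the model's random number generator; the class `SeedDemo` is hypothetical, and how the seed is applied inside the custom experiment is not shown here.

```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {
    /** Returns the first n draws from a generator created with the given seed. */
    static double[] draws(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(draws(42L, 3)));
        System.out.println(Arrays.toString(draws(42L, 3))); // identical to the line above
        System.out.println(Arrays.toString(draws(43L, 3))); // a different stream
    }
}
```

If a particular replication produces a suspicious result, rerunning it with the seed recorded in the output file reproduces the same random stream, which makes the run debuggable.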
b) Summary Results:
Often we are not interested in the individual results of each replication but rather the summary statistics across multiple runs. In order to achieve this, we create a new Java class called Results. Each replication saves its results in this object, which stores the statistics, and we can then write the summary statistics out to a new text file.
The Results class below is in its most basic form, which I use in almost all my models.
With this approach you don't need to drag and drop hundreds of objects from the palette to Main or to your simulation page, as all statistics are stored in this one object.
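For readers who want a starting point, here is a minimal sketch of what such a class could look like in plain Java. It is not the class used in the model: the name `ResultsSketch`, the tracked statistics and the use of Welford's online algorithm for the standard deviation are all assumptions made for illustration.

```java
class ResultsSketch {
    private int count;
    private double mean;
    private double m2;  // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Adds the output of one replication; synchronized in case replications run in parallel. */
    synchronized void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);  // Welford's online update
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    synchronized int count() { return count; }
    synchronized double mean() { return mean; }
    synchronized double min() { return min; }
    synchronized double max() { return max; }

    /** Sample standard deviation; 0 when fewer than two values were added. */
    synchronized double stdDev() {
        return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        ResultsSketch waits = new ResultsSketch();
        for (double w : new double[] {9.0, 10.0, 11.0, 10.0}) {
            waits.add(w);
        }
        System.out.println("n=" + waits.count() + " mean=" + waits.mean()
                + " sd=" + waits.stdDev() + " min=" + waits.min() + " max=" + waits.max());
    }
}
```

The online update means no per-replication values need to be kept in memory for the summary; the detailed text file remains the place to look when a minimum or maximum seems off.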
Part 3 - Putting it all together
Now that we have the updates to the model and the results class, we can set up a new feature on our simulation front page where we can select multiple Excel files, have them imported into the DB one at a time, execute the scenario each one represents and save all the results when done.
1) We create a button to allow users to select multiple Excel files using the following code:
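// Requires java.awt.FileDialog, java.awt.Frame and java.io.File to be imported.
FileDialog fileDialog = new FileDialog(new Frame(), "Select all input files", FileDialog.LOAD);
fileDialog.setMultipleMode(true);  // allow more than one file to be selected
fileDialog.setFile("*.xlsx");      // pre-filter on Excel workbooks
fileDialog.setVisible(true);       // modal: returns once the dialog is closed
// getFiles() returns an empty array if the user cancels the dialog
File[] files = fileDialog.getFiles();
for (File file : files) {
    filesToLoad.add(file.getAbsolutePath());
}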
2) Then we allow the user to set the number of replications
3) Then a button to run all the scenarios, in order, setting the same seed for every corresponding iteration between different scenario runs. This function will also pass the txt file for the detailed results of each model run. It also passes a new Results object for every scenario and saves it to a map with the corresponding scenario name.
Once all the scenarios have been executed it will take the results from the results map and write the summary statistics to a new txt file called 'summaryResults.'
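The overall flow of this button can be sketched in plain Java as follows. This is not the model's code: `runModel` is a made-up stand-in for one simulation run (its base waiting times are assumed values), `DoubleSummaryStatistics` stands in for the Results object, and the class `ScenarioRunner` and the summary column names are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

class ScenarioRunner {
    /** Stand-in for one model run: returns a made-up average waiting time in minutes. */
    static double runModel(String scenarioName, long seed) {
        Random rng = new Random(seed);
        double base = scenarioName.startsWith("single") ? 8.0 : 10.0;  // assumed values
        return base + rng.nextGaussian();
    }

    /** Runs every scenario with seeds 1..replications and collects the outputs per scenario. */
    static Map<String, DoubleSummaryStatistics> runAll(List<String> scenarios, int replications) {
        Map<String, DoubleSummaryStatistics> results = new LinkedHashMap<>();
        for (String scenario : scenarios) {
            DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
            for (long seed = 1; seed <= replications; seed++) {
                // Replication k of every scenario uses the same seed k.
                stats.accept(runModel(scenario, seed));
            }
            results.put(scenario, stats);
        }
        return results;
    }

    /** Builds the tab-separated summary table, one line per scenario. */
    static String summary(Map<String, DoubleSummaryStatistics> results) {
        StringBuilder sb = new StringBuilder("scenario\truns\tmean\tmin\tmax\n");
        results.forEach((name, s) -> sb.append(String.format(Locale.ROOT,
                "%s\t%d\t%.2f\t%.2f\t%.2f\n",
                name, s.getCount(), s.getAverage(), s.getMin(), s.getMax())));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary(runAll(List.of("single_queue", "multiple_queue"), 20)));
    }
}
```

Using the same seed for corresponding replications means both scenarios face the same random stream, so the difference between their summary rows reflects the scenario change rather than luck in the random numbers.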
Here is a short video of the scenario comparison functionality in action
If we open the Summary file in Excel we get a nice summarised table like the one below.
If you would like access to the model, please contact us at admin@theanylogicmodeler.com
Summary
In this post, we looked at the experimentation phase of the simulation model life cycle and how to correctly set up the experiments inside your model. We looked at some advanced methods to make use of the custom experiment that allows us full control over the scenario comparison and sensitivity analysis experiments. This is definitely a step up in complexity compared to the previous post, but worth the effort, especially if you are building complex models where you require full control over the experiment execution.
In the next post, we will be taking a breather and looking at some best practices for analyzing results.
Looking forward to the next post on best practices for analyzing your results?
Remember to subscribe, so that you can get notifications via email or you can also join the mobile app here!
Watch the full lecture here, the video below starts at step 5