How could/should I handle test-generated data in a real-world scenario?

I have no experience with load testing, however I’ve created an ASP.NET Core Web API for which I need to implement load tests in the CI pipeline (Azure DevOps). I’m considering k6 to perform the load testing. I did some reading on how to write the tests, however I wasn’t able to find any material on the conceptual part of load testing.

My API gets a collection of small files via an HTTP POST request, saves the metadata from these files in a SQL Server database and saves the files (binaries) to Microsoft Azure Blob Storage. What is troubling me is that every time the pipeline runs I’ll make thousands of POST requests with mock files to the API, meaning that my SQL Server database and Blob Storage container will increase greatly in size with each run.

How is a scenario like this usually handled? Would developers in this case typically purge the database and blob storage periodically? Also, does this mean that I shouldn’t run the load tests against the production environment? I don’t want my production environment polluted with mock data. And what’s the point of having load tests if I can’t run them against the production environment?

I was thinking of maybe running a Docker image with a database just for load testing, but I wouldn’t be able to run something like Microsoft Azure Blob Storage within a container, so I don’t know how to proceed on that end. I would really appreciate the input of professionals with load testing experience regarding these scenarios and the application of load testing in a real-world environment.

How is a scenario like this usually handled?

I don’t think there’s one universal solution here; it all depends on your use case and requirements. What exactly do you want to ensure, and how much money and time are you willing to spend? I can’t claim I’m a load testing professional, even though I’m one of the k6 developers, but here are some of the possible options:

  1. Test against your production environment, but make your test filenames predictable (e.g. k6-loadtest-{randomString}.file), so you can clean them up easily after the load test. Add a threshold with abortOnFail so you abort your test run prematurely if your production performance starts to suffer.
  2. Have a persistent staging environment that resembles production and test that. Maybe scale it down so you don’t incur too much cost. Re-create the database and storage after every test, or clean them up as in option 1.
  3. Have an on-demand test environment that matches your production 1:1 and use that for load tests. If you bring it up only occasionally, for testing, it shouldn’t cost too much. And you can completely destroy it after you’re done. But in a lot of cases, especially when there are legacy systems that are not easily deployed, this is not very realistic.
  4. Test locally, in a development environment and/or CI, with mocks for the external services like Microsoft Azure Blob Storage. I haven’t personally used Azure’s Blob Storage, but generally you can probably exclude external services like it from your load tests. They may incur significant costs for large tests, and you don’t really have control over them, so you can’t optimize them even if they are the bottleneck. Besides, they usually have SLAs, so you can mock their performance somewhat realistically.
    So, you can load test the part of your service that you control and mock the rest. More importantly, you can do that early in the development life cycle, in some CI environment that verifies that every change you introduce to your product doesn’t affect its performance negatively. MinIO is a project that I’ve used to mock S3, and it seems to support Azure Blob Storage as well: MinIO Object Storage for Kubernetes
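
To make option 1 a bit more concrete, here is a minimal sketch of the two pieces it relies on: a predictable file name prefix and thresholds with abortOnFail. The prefix, the helper names, and the 500 ms / 1% limits are illustrative assumptions, not recommendations, and the HTTP upload itself is left out so the snippet stays self-contained.

```javascript
// Sketch for option 1. PREFIX, the helper names, and the limits below are
// assumptions chosen for illustration only.
const PREFIX = 'k6-loadtest-';

// Build a predictable name such as "k6-loadtest-run42-vu3-iter7.file".
// Inside a k6 script, vu and iteration could come from the built-in
// __VU and __ITER globals.
function loadTestFileName(runId, vu, iteration) {
  return `${PREFIX}${runId}-vu${vu}-iter${iteration}.file`;
}

// A cleanup job can use this to decide which stored files (or metadata rows)
// were created by a load test and are safe to delete.
function isLoadTestFile(name) {
  return name.startsWith(PREFIX);
}

// k6 picks this up when the script exports it as `options`.
const options = {
  thresholds: {
    // Abort once the 95th percentile response time is no longer below 500 ms.
    // delayAbortEval gives the test 10 s to collect samples before aborting.
    http_req_duration: [
      { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '10s' },
    ],
    // Abort once 1% or more of the requests fail.
    http_req_failed: [{ threshold: 'rate<0.01', abortOnFail: true }],
  },
};
```

In an actual k6 script you would write `export const options = ...` and pass the generated name along with each uploaded file. The same prefix check can then drive whatever cleanup you run against the database and the blob container afterwards.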

Finally, you can do any combination of these things, or all of them, and some people do. With environment variables and/or JavaScript modules you can reuse the same k6 scripts to test multiple environments. With --execution-segment you can even scale down your production load tests for smaller CI or staging environments.
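
As a rough sketch of the environment variable approach, the same script can look up its target and load level from a single variable. TARGET_ENV, the environment names, the URLs, and the VU counts are all made-up placeholders here.

```javascript
// Sketch: one script, several targets. Every value in TARGETS is a
// hypothetical placeholder, not something from a real setup.
const TARGETS = {
  ci: { baseUrl: 'http://localhost:5000', vus: 5, duration: '30s' },
  staging: { baseUrl: 'https://staging.example.com', vus: 50, duration: '5m' },
  production: { baseUrl: 'https://api.example.com', vus: 200, duration: '10m' },
};

// Pick the settings for the requested target, defaulting to the smallest one.
function resolveTarget(env) {
  const name = env.TARGET_ENV || 'ci';
  const target = TARGETS[name];
  if (!target) {
    throw new Error(`Unknown TARGET_ENV "${name}"`);
  }
  return Object.assign({ name: name }, target);
}

// k6 exposes environment variables through the global __ENV object. The
// typeof guard only allows the same file to load outside k6 as well.
const env = typeof __ENV !== 'undefined' ? __ENV : {};
const target = resolveTarget(env);

// In the k6 script this object would be exported as `options`.
const options = { vus: target.vus, duration: target.duration };
```

Assuming the script is saved as script.js, you could select a target with `k6 run -e TARGET_ENV=staging script.js`. To shrink a production-sized test instead of maintaining separate numbers, something like `k6 run --execution-segment "0:1/4" script.js` runs only the first quarter of the configured load.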

Again, you can do pretty much anything you want. The question is how much time and effort you’re willing to put into it. I suggest starting small and iterating.