One common question raised while dealing with any SaaS architecture is “What kind of design should I follow for my Data isolation?” . Should I go for an isolated database for every customer or should there be a shared database / shared table for all customers.
To take a decision on data isolation you should ask the following questions:
- Will my customers be willing to share the data with other customers or is it a big no?
- Do my customers demand access to the database?
- Will my customers need the option to individually backup and restore the data at any point in time?
- Is the data volume high for all customers?
- Will the workload from all my customers be high and at the same time(concurrency)?
If your answer to question 1 is an Yes, then you might not have an option but to provide those customers an isolated database.
If your answer to question 2 is an Yes, you might need to go for an isolated database. Having said that please note that it is a very bad practice and a big blunder to open up the data environment to the customer and it is wise to stay away from it. Usually customers would need data access to backup the data regularly and sync to their on-premise environment or when they need to create their custom reports.
A better way to handle this would be to provide data export files and APIs through which they can sync the data with the on-premise data and as for the custom reports have a custom report builder capability within the product itself.
If your answer to question 3 is an Yes, while it is easy to achieve this in a isolated database, you can still achieve this in a shared table by building an utility.
If your answer to question 4 and 5 is an Yes, you might have to do a capacity planning using workload and storage for your customers and see if combining customers data makes sense from a performance perspective. In a shared table scenario you typically shard the data based on tenant. If you end up with one shard for each customer then going for a shared table makes no sense. A shared table scenario makes much sense from a performance perspective only for the following scenarios,
- When you have many customer with lesser or medium volume of data
- When the workload of the customer is spread at different times
- When the data volume and workload pattern is different between different customers
Never the less an ideal option in SaaS would be to go for a hybrid model where you completely design for a shared table approach but have an option to isolate specific customers on a need basis as you are always going to have a mix of customers. If you are designing for sharding by tenants you can easily achieve the hybrid option.