Building an application is an uphill battle, and underestimating the interest the business or application may receive undermines the whole investment of effort. Businesses that face a sudden increase in customers or requests often fail to anticipate such events, and they either end up making quick fixes that are not cost efficient or lose customers. To avoid such issues, system scalability and system analysis require a proactive approach and planning, because essentially every business wants to keep the user experience the same (or better) for its very first and its Nth customer.
There is also no single definitive answer to managing and scaling your system, as it depends on multiple factors, from the choice of framework to the choice of DB engine to the nature of the business. But since almost every application requires a database, understanding DB scaling is not only important but, to an extent, mandatory for a software architect or developer.
Scaling a database depends on the type of database engine chosen and on the DB architecture. It is important to understand that SQL (RDBMS) databases typically take a different approach to scaling than NoSQL DBs, and that your choice of architecture paves the road for future scalability management.
Types of Database Scaling
To understand scaling, we must first understand the difference between SQL and NoSQL DBs. SQL DBs are structured in a row-column relational format where there is a link/relation between the data. Although the data of an application using SQL can be distributed among multiple tables, these tables are linked to each other, so a write to one table may have to be kept consistent with the tables related to it. In contrast, NoSQL DBs do not have such strict schemas to adhere to and are more flexible than a traditional RDBMS.
Moving further, there are two types of scaling:
1. Vertical Scaling
Vertical scaling, in simple terms, means adding more processing power to the database server, for example by adding more CPU units to it. SQL DBs usually scale vertically because of the rigid schema and the relations within the database. Although vertical scaling looks like a natural and simple way to scale, it has its own limitations: beyond a certain threshold it becomes inefficient and expensive, as high-end hardware is costly and a single server can only grow so far. Such a limitation is quite a deterrent for an application whose users generate a high number of read/write requests within a given unit of time.
2. Horizontal Scaling or Scale-out
In contrast to SQL DBs, NoSQL DBs can be scaled by adding more servers (nodes) rather than just adding CPU units to one server. This is horizontal scaling: instead of stacking up processing power in a single server, we spread the database across multiple servers.
The cost and hardware limitations experienced in vertical scaling are largely avoided in the horizontal scaling format, as adding more instances/nodes turns out to be more efficient.
There are multiple methods to horizontally scale a NoSQL DB. We can either opt for sharding, which means distributing the database across several nodes so that each node holds a subset of the overall database, or we can store a complete copy of the database on multiple nodes through a process called replication.
Database Sharding
Partitioning a database into multiple small subsets (logical shards) not only provides cost benefits but also allows the shards to be spread geographically, so that requests from a particular region can be served by a shard located close to that region.
Sharding can be implemented through different architectures:
Hash-Based Sharding
Hash-based sharding uses either a single key or a compound key to spread and partition data across multiple clusters. This form of sharding does ensure an even distribution of data, but with a naive scheme, whenever the hash table is resized (effectively altering the number of array slots) nearly all of the keys have to be remapped. To overcome this inefficiency, NoSQL DBs such as BangDB use consistent hashing, which ensures that any resizing impacts only a limited number of keys and slots. In addition, BangDB utilizes a P2P virtual network overlay to implement node assignment and request routing.
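To make the resizing argument concrete, here is a minimal consistent-hashing sketch in Python. It is an illustration of the general technique only, not BangDB's implementation: the class name ConsistentHashRing, the node names, the use of MD5 and the figure of 100 virtual nodes per server are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: each node owns several virtual positions."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise ValueError("ring has no nodes")
        # The first position clockwise from the key's hash owns the key;
        # the modulo wraps around to the start of the ring.
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


keys = [f"user:{i}" for i in range(2000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
moved = sum(before[k] != ring.get_node(k) for k in keys)

# Naive alternative: hash modulo the node count, going from 3 to 4 nodes.
naive_moved = sum(_hash(k) % 3 != _hash(k) % 4 for k in keys)
print(f"consistent hashing moved {moved} keys, modulo hashing moved {naive_moved}")
```

With the ring, the only keys that change owner are the ones that now belong to the newly added node, whereas the modulo scheme reassigns most keys whenever the node count changes.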
Range-Based Sharding
As the name suggests, range-based sharding distributes the database into a defined number of contiguous ranges based on the shard key. A good example is an age-based distribution of data, which allows efficient database access. However, because a data set can be inherently skewed, range-based sharding might need an efficient shard balancer to handle such scenarios.
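The age-based example can be sketched in a few lines of Python. The boundaries and shard names below are hypothetical values chosen for illustration; a real deployment would pick them from the actual distribution of the data.

```python
import bisect
from collections import Counter

# Hypothetical exclusive upper bounds of each age range.
# Ages of 60 and above fall into the last, open-ended shard.
BOUNDARIES = [18, 30, 45, 60]
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3", "shard-4"]


def shard_for_age(age: int) -> str:
    """Return the shard whose contiguous age range contains `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return SHARDS[bisect.bisect_right(BOUNDARIES, age)]


# A skewed sample: most users are in their twenties.
ages = [22, 25, 27, 29, 24, 41, 67]
load = Counter(shard_for_age(a) for a in ages)
print(load)  # shard-1 holds 5 of the 7 records
```

The lookup is a single binary search, which is what makes range-based routing cheap, but the sample also shows how a skewed key concentrates the load on one shard.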
Zoned Sharding
Zoned sharding associates a group of shards with a particular zone based on the defined shard key (typically a compound shard key), and requests for a particular key range are forwarded to the corresponding zone.
The sharding techniques discussed above also work in conjunction with a shard balancing algorithm (or a manual setting) that discovers unbalanced nodes and governs the routing of shards to particular nodes.
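One possible way to picture zoned routing is shown below. It is a sketch under stated assumptions: the compound shard key is taken to be (region, user_id), the zone names and shard names are invented, and a CRC32 hash is used to pick a shard inside the zone, which is only one of several reasonable choices.

```python
import zlib

# Hypothetical zones, each backed by its own group of shards.
ZONES = {
    "eu": ["eu-shard-0", "eu-shard-1"],
    "us": ["us-shard-0", "us-shard-1", "us-shard-2"],
}


def route(region: str, user_id: str) -> str:
    """Route a compound shard key (region, user_id) to a shard.

    The first part of the key selects the zone; the second part is
    hashed to choose one shard within that zone.
    """
    if region not in ZONES:
        raise KeyError(f"no zone configured for region {region!r}")
    shards = ZONES[region]
    return shards[zlib.crc32(user_id.encode("utf-8")) % len(shards)]


print(route("eu", "alice"))
print(route("us", "alice"))
```

The same user ID lands in different shard groups depending on the region part of the key, so data for a zone never leaves the shards assigned to it.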
Shard Balancing
As an application grows, so do its shards. After a certain period, however, a few shards will have grown much larger than the others, and to address this uneven distribution of data we need a shard balancing algorithm that evens out the load among the shards.
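A very simple balancer could work as in the following Python sketch. It is a greedy illustration, not the algorithm of any particular database: shards are modeled as lists of chunk sizes, and the function name rebalance, the shard names and the threshold are assumptions for this example.

```python
def rebalance(shards, threshold):
    """Move chunks from the largest shard to the smallest one until
    their sizes differ by at most `threshold`.

    `shards` maps a shard name to a list of positive chunk sizes and is
    modified in place. Returns the (chunk_size, source, target) moves.
    """
    moves = []
    while True:
        sizes = {name: sum(chunks) for name, chunks in shards.items()}
        largest = max(sizes, key=sizes.get)
        smallest = min(sizes, key=sizes.get)
        gap = sizes[largest] - sizes[smallest]
        if gap <= threshold:
            break
        # Only chunks smaller than the gap actually reduce the imbalance.
        candidates = [c for c in shards[largest] if c < gap]
        if not candidates:
            break
        # Prefer the chunk that brings the two shards closest together.
        chunk = min(candidates, key=lambda c: abs(gap - 2 * c))
        shards[largest].remove(chunk)
        shards[smallest].append(chunk)
        moves.append((chunk, largest, smallest))
    return moves


cluster = {"s0": [40, 30, 20, 10], "s1": [10], "s2": [20]}
migrations = rebalance(cluster, threshold=20)
print(migrations)
print({name: sum(chunks) for name, chunks in cluster.items()})
```

Production balancers also have to account for migration cost, in-flight writes and request load rather than data size alone; the sketch only shows the core idea of detecting an imbalance and moving data to correct it.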
Data Replication
The basic premise of data replication is keeping a copy of the data on each node. Compared to sharding, where each node holds only a subset of the data, replication makes read transactions highly available because the same data set exists on multiple nodes. At the same time, write operations become more costly, as any update, addition, or deletion needs to be reflected across all the copies spread over multiple nodes.
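The read/write trade-off can be illustrated with a toy in-memory model. This is a sketch, assuming synchronous replication to every copy and round-robin reads; the class name ReplicatedStore is hypothetical, and real systems add concerns such as failure handling and replication lag that are left out here.

```python
class ReplicatedStore:
    """Toy model: every replica holds a full copy of the data."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self._next = 0  # index of the replica that serves the next read

    def write(self, key, value):
        """Apply the write to every replica; return the copies touched."""
        for replica in self.replicas:
            replica[key] = value
        return len(self.replicas)

    def read(self, key):
        """Serve the read from a single replica, rotating round-robin."""
        replica = self.replicas[self._next]
        self._next = (self._next + 1) % len(self.replicas)
        return replica.get(key)


store = ReplicatedStore(replica_count=3)
copies_written = store.write("user:1", "Ada")
values = [store.read("user:1") for _ in range(3)]
print(copies_written, values)
```

One write touches all three copies, while each read is answered by a single replica, which is why adding replicas helps read throughput but makes writes more expensive.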
Beyond Scaling: BangDB The Modern NoSQL
There are multiple ways to scale up (vertical) and scale out (horizontal), but in real-world use cases it is just as important for an application to be able to scale down easily. Many conventional NoSQL DBs have great infrastructure to scale out but no robust methodology to scale down. Scaling down matters for a successful business as well, because situations such as moving out of a particular geographic region or a change in business model for a particular product will demand a decrease in resources. It is primarily for these reasons that more modern implementations of NoSQL DBs such as BangDB provide the flexibility not only to scale horizontally but also to scale down. BangDB's Auto Scaling capability also enables developers to manage resources, especially for small to mid-sized applications and use cases.
In addition to this, NoSQL DBs are commonly described in terms of the CAP theorem, which states that a distributed database can guarantee at most two of consistency, availability, and partition tolerance. A database can therefore be one of the following:
# Consistent-Partition Tolerant (CP): When a partition occurs, the system preserves consistency by making any nodes that cannot be kept consistent unavailable.
# Available-Partition Tolerant (AP): When a partition occurs, every node remains up and running, but some nodes may serve an older version of the data.
# Consistent-Available (CA): Partition tolerance is sacrificed here; the system remains consistent and available only as long as no partition occurs.
Most modern use cases, even within the same business, cannot be force-fitted into one category and might require CP, AP, or CA based on need. To cater to these scenarios, BangDB provides a ‘knob approach’ wherein application developers can, as per their requirements, easily shift to one of the three states.
Scaling has always been an issue irrespective of the advancement in DB systems, and the C10M problem (a more modern version of C10K) questions the ability of a DB engine to handle ten million concurrent connections. There are various ways to handle this scaling issue, and deploying an event polling mechanism to support a higher number of concurrent connections is the concept built into BangDB to support applications at scale.
Finally, stakeholders and developers should acknowledge that scaling is inevitable irrespective of the choice of DB. A great product might face an early demise if a scaling architecture is not in place, as this directly impacts the consumer/user experience. A good product that can handle both an increase and a decrease in demand, traffic, and usage builds a solid foundation for sustained competitive advantage.
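As a rough illustration of the general idea of event polling (not of BangDB's internals), the following Python sketch uses the standard selectors module to serve several connections from a single thread. The function name serve_once is hypothetical, and socket.socketpair() stands in for real client connections so that the example runs on its own.

```python
import selectors
import socket


def serve_once(connection_count=3):
    """Serve one request on each connection from a single thread."""
    selector = selectors.DefaultSelector()
    servers, clients = [], []
    for conn_id in range(connection_count):
        # socketpair() stands in for an accepted client connection.
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        selector.register(server_side, selectors.EVENT_READ, data=conn_id)
        client_side.settimeout(1.0)
        servers.append(server_side)
        clients.append(client_side)

    for conn_id, client in enumerate(clients):
        client.sendall(f"ping {conn_id}".encode())

    served = 0
    while served < connection_count:
        # Block until at least one connection has data to read.
        events = selector.select(timeout=1.0)
        if not events:
            raise TimeoutError("no connection became readable")
        for key, _mask in events:
            request = key.fileobj.recv(1024)
            key.fileobj.sendall(request.upper())
            selector.unregister(key.fileobj)
            served += 1

    replies = [client.recv(1024).decode() for client in clients]
    for sock in servers + clients:
        sock.close()
    selector.close()
    return replies


print(serve_once(3))
```

Instead of dedicating a thread to every connection, the loop asks the operating system which sockets are ready and handles only those, which is the property that lets event-driven servers keep very large numbers of mostly idle connections open.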