[AWS]/Highly Available, Scalable, AWS Stack

26. Intro: Scalability - data layer

by SAMSUNG CLOUD-OKY 2022. 1. 24.

Welcome to Module 5.

In this module we will look at scalability once again, but this time for the data layer. By data layer we mean the RDS instance that hosts our MySQL database, and also the S3 bucket that holds our images. The database and the image store are both part of the data layer, and we need to consider scalability options for both.

Let's start with the database.

Traditionally, relational databases have been hard to scale, especially horizontally. That's because relational databases have foreign keys and joins, and if you spread the data across multiple servers it becomes harder to pull all the required information together quickly. That's why it's better to keep all the data on a single server, and also why we can't really scale this database horizontally.

The traditional solution has been to scale vertically: you go from a small server to a larger server. But vertical scaling has limits.

First, server sizes are limited: there is only so big a server you can get. Second, scaling a server vertically causes an outage, because the resize requires a reboot, so how frequently you can do this kind of scaling is limited as well. The timing of the scaling is also a problem, because you are not going to reboot your server during business hours.

So vertical scaling is very limited, and this has traditionally been the challenge with relational databases.

In AWS, however, there are some solutions. They are not perfect, but they are good solutions.

One solution is to use read replicas. Your RDS instance supports read replicas: you can create one or more of them, and each one is a separate instance. Replication flows from the primary server to the replicas, and it is asynchronous, not synchronous, so the replicas will have all your data, sometimes with a small lag. You can then change your application to send read requests to a replica, or to several replicas, since read replicas are horizontally scalable. Writes continue to go to the primary database.

This way you offload some of the requests to the replicas and free up the primary database for write requests. It is a scalability solution because even when the load increases, many of those requests have been offloaded to the replicas, so the primary database is not overloaded and keeps performing well. And remember, you can have one or more read replicas.
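The read-and-write split described above boils down to routing each query by type: SELECTs go to a replica endpoint, everything else goes to the primary. The course application is in PHP, but the routing logic can be sketched in Python like this (the endpoint hostnames are hypothetical placeholders):

```python
import itertools

# Hypothetical endpoint hostnames -- yours come from the RDS console.
PRIMARY_ENDPOINT = "mydb.xxxx.us-east-1.rds.amazonaws.com"
REPLICA_ENDPOINTS = [
    "mydb-rr1.xxxx.us-east-1.rds.amazonaws.com",
    "mydb-rr2.xxxx.us-east-1.rds.amazonaws.com",
]
_replicas = itertools.cycle(REPLICA_ENDPOINTS)

def endpoint_for(sql: str) -> str:
    """Send SELECTs to a read replica (round-robin); everything else to the primary."""
    if sql.lstrip().upper().startswith("SELECT"):
        return next(_replicas)    # reads are offloaded to the replicas
    return PRIMARY_ENDPOINT       # writes continue to go to the primary
```

Because replication is asynchronous, a read that immediately follows a write may briefly see stale data; if that matters for a particular query, route that read to the primary instead.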

Another solution is an in-memory cache cluster. There is a service in AWS called ElastiCache, and it offers two engines: Redis and Memcached. You can use either one; in this course we use Memcached, but Redis works as well.

To ease the scalability challenge of the MySQL database, we place a Memcached cluster alongside the RDS instance. The way it works is that the application goes to the cache for frequently accessed information. Any data that is read frequently, by many of your users, can be stored in the cache temporarily and retrieved from there, so you don't have to go to the database every time. You minimize trips to the database for that kind of information and serve it from the in-memory cache instead. These caches store data in RAM, not on disk, so retrieval is much faster than from the disk-backed main database.

This is another way to offload requests away from the main database, reduce its load, and thereby make it more scalable. So this is the second solution we look at in this module for database scalability.
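The cache-aside flow just described — check the cache first, fall back to the database on a miss, then populate the cache with a TTL — can be sketched as follows. This is a minimal Python sketch (the course app itself is PHP); the cache is a plain dictionary and the "database" is a stand-in callable, not real ElastiCache or RDS clients:

```python
import time

def get_with_cache(cache: dict, key: str, load_from_db, ttl_seconds: int = 300):
    """Cache-aside read: serve from the cache if present, else query the DB and cache it."""
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value              # cache hit: no database round trip
        del cache[key]                # expired: fall through to the database
    value = load_from_db(key)         # cache miss: one trip to the database
    cache[key] = (value, time.monotonic() + ttl_seconds)
    return value
```

With a real Memcached client the dictionary operations become get/set calls against the cluster endpoint, but the hit/miss logic is the same: repeated requests for the same hot key are served from RAM and never reach the database until the TTL expires.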

Now let's look at the images. The images are in S3, and S3 is already highly scalable, both in terms of storage and in terms of the number of requests it can handle. It can serve a handful of requests per month or many simultaneous requests for the same image; even a high number of requests per second is no problem for S3.

However, there is still a challenge. Let's say your bucket is in the US East region and your users are spread around the world: some in Australia, which is quite far from US East, some in Asia, some in South America, some in US West (the western coast of the US is also pretty far from US East). If you are serving big files like images and videos, the distance between the region and the users matters. When the distance is large, there is network latency: the distance from US East to Australia, for example, is considerable, so you will see network latency, and the application performance and user experience will not be the same for all of your users.

That's why we can use CloudFront, which is a content delivery network. It uses edge locations around the world, so users are served from edge locations close to where they are. They don't fetch the content from US East; they get it from an edge location close to Australia, close to Asia, close to South America, and so on. This way the performance of your application for content like images and videos (and also for dynamic content, if you choose to implement that) can be much better. Performance can be good no matter where the user is, and that's why we want to use CloudFront as well. In this particular application we use it only for the image page.

Here is how we will implement all of this. We will create an RDS read replica and a Memcached cluster, and configure both in our PHP application. The application will get three new pages.

The first will be memcached.php, where we configure the Memcached endpoint so that we can connect from the PHP application. The second will be rds_rr.php (rr for read replica), where we configure the read replica's endpoint and test it. The third will be cloudfront_image.php, where we configure the URL of the image via the CloudFront distribution: CloudFront gives the distribution a DNS name, and you can request that DNS name followed by /image.png so that the image is fetched via CloudFront instead of directly from S3.

We will configure all three pages and then test them, just to make sure the application works with these new services. That is what we want to achieve in this module, and it will make our application more scalable on the database and data side. In the next set of videos and resources in this module, let's see how to implement this architecture.
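Switching the image from S3 to CloudFront, as cloudfront_image.php will do, is essentially a URL change: the S3 hostname is replaced by the distribution's DNS name. A minimal Python sketch of that idea (the distribution domain below is a hypothetical placeholder; the course page does the same thing in PHP):

```python
# Hypothetical distribution DNS name -- yours comes from the CloudFront console.
CLOUDFRONT_DOMAIN = "d1234abcd.cloudfront.net"

def cloudfront_url(object_key: str) -> str:
    """Build the CDN URL for an object previously fetched straight from S3."""
    return f"https://{CLOUDFRONT_DOMAIN}/{object_key.lstrip('/')}"
```

The object key stays the same; only the hostname changes, so CloudFront can serve the file from the edge location nearest the user while still using the S3 bucket as its origin.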
