7. Intro: Fault tolerance - single instance architecture

by SAMSUNG CLOUD-OKY 2022. 1. 17.

Welcome to module 2. In this module we will look at fault tolerance for single instance architectures.

What we mean by a single instance architecture is that the PHP application, the MySQL server, and the Apache web server all run on the same instance, and we want to look at how to recover from different types of failures.

This architecture is quite simple but quite useful, because many applications need to run on a single instance. The multi-tier architectures that we will see in subsequent modules of this course are better, but they can be more expensive and they are also more complex.

So we need to consider ways in which we can recover from different kinds of failures.

The three types of failures that we look at are as follows.

First, instance failure: what happens if the EC2 instance that we have here fails?

Second, storage failure: what happens if, let's say, the EBS volume that holds your data fails?

And third, AZ failure: this is the availability zone, and it could fail if, let's say, there is a power failure or a network connectivity problem.

So these are the three failure scenarios.

The way we will do this is, first of all, we will set up the single instance architecture. We need to install the MySQL server on the same EC2 instance, and then we will test the application to see if it works with the local server.

So the first step will be to create the single instance architecture. This means a local MySQL: we can use apt-get, the Ubuntu package manager, to install the MySQL server, and then configure a page in our application, localdb.php.

Then we verify this page using the public IP address of the EC2 instance, so basically the public IP slash localdb.php. We go to this page using our browser and verify that it works.
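The browser check described here can also be scripted. The following is a minimal sketch, assuming the page is served over plain HTTP at `/localdb.php` and that an HTTP 200 response is an acceptable signal; the function names `localdb_url` and `page_is_healthy` are made up for illustration and are not part of the course material.

```python
from urllib.request import urlopen


def localdb_url(public_ip: str) -> str:
    """Build the URL of the verification page from the instance's public IP."""
    return f"http://{public_ip}/localdb.php"


def page_is_healthy(url: str, fetch=urlopen, timeout: float = 5.0) -> bool:
    """Return True if the page answers with HTTP 200, False on any error.

    `fetch` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a real server.
    """
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

A status of 200 only shows that Apache and PHP answered. Whether the page also reports a database error in its body depends on how localdb.php is written, so a stricter check would inspect the response text as well.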

That means our single instance architecture is ready, and we are ready to look at the recovery options for failures.

The second thing we have to do, before we look at the recovery, is to prepare for the recovery. The preparation involves four elements.

The first is the Elastic IP. This is also a public IP address, but it is allocated at the account level, and you can move it from one server to another.

This way your domain name, let us say domain.com or whatever it is for you, points to the Elastic IP address. When we recover, let's say by creating a replacement instance, the Elastic IP address can be moved from the failed instance to the replacement instance, and the domain's DNS record sets do not have to change.

That is very good, because DNS record set changes can take time to propagate. The records are cached in many places, and if you change them there can be an outage for some users. So we don't want to change the domain records, and the Elastic IP is a good solution: you can simply move it from the failed instance to a replacement instance.
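Moving the Elastic IP is a single API call. Here is a minimal sketch, assuming `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`) and that the Elastic IP was allocated in a VPC, so it is identified by an allocation ID. The function name and the IDs are placeholders, not something shown in the lecture.

```python
def move_elastic_ip(ec2, allocation_id: str, replacement_instance_id: str) -> str:
    """Point an existing Elastic IP at the replacement instance.

    `ec2` is assumed to be a boto3 EC2 client. AllowReassociation=True lets
    the call succeed even though the address is still associated with the
    failed instance.
    """
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=replacement_instance_id,
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

Because the domain's DNS record keeps pointing at the same address, nothing on the DNS side changes when this runs.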

The second thing we want to do as preparation is to set up the data volume. This is a new EBS volume in addition to the root volume that we already have.

We want to keep our application data, let's say the MySQL database, on it, so we have to configure MySQL to read and write from the data volume instead of the root volume, which is where it is configured to store data by default.

We do this because when we recover, let's say with a replacement server, it's very easy to detach the data volume from the failed instance and attach it to the new replacement instance. This way we can recover without data loss, so this is an important element of our preparation.

The third thing we want to do is create an AMI, an Amazon Machine Image. This is a different AMI from the one we created in the first module, because this one will have our data volume configuration and also the MySQL configuration that changes the data directory of MySQL. We create this AMI and use it to create our replacement servers.

The fourth thing we need to do is create a snapshot of our data volume. This will be a backup of our data. In cases such as volume failure, where we have lost data, we can go back to our snapshot and restore the data into a new volume.

So this is the preparation, the four things we need to do.
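The last two preparation steps, the AMI and the snapshot, can be sketched as follows. This assumes `ec2` is a boto3 EC2 client; the function name, the `label` parameter, and the naming scheme are illustrative assumptions rather than anything prescribed by the course.

```python
def create_recovery_artifacts(ec2, instance_id: str, data_volume_id: str, label: str) -> dict:
    """Create the AMI and the data-volume snapshot used later for recovery.

    `ec2` is assumed to be a boto3 EC2 client. Both calls return immediately;
    the image and the snapshot finish building in the background.
    """
    # AMI of the configured instance (MySQL data directory already changed).
    image = ec2.create_image(InstanceId=instance_id, Name=f"{label}-ami")

    # Point-in-time backup of the data volume.
    snapshot = ec2.create_snapshot(
        VolumeId=data_volume_id,
        Description=f"{label} data volume backup",
    )
    return {"image_id": image["ImageId"], "snapshot_id": snapshot["SnapshotId"]}
```

Note that `create_image` reboots the instance by default, and that a snapshot is only as fresh as the moment it was taken, so in practice the snapshot step would be repeated on a schedule.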

After this we look at the failure scenarios. There are three.

The first one is instance failure. Here the EC2 instance on which our application and the MySQL server are running fails. So what do we do?

We have to create a replacement server. We create a new one in the same availability zone using this AMI. Then we move the Elastic IP from the failed instance to the new one.

We also move the data volume, because the AMI creates a server with a root volume, not a data volume. So we move the data volume from the failed instance to the replacement instance, and this way we can recover without data loss.

That's how you recover from an instance failure.
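The instance-failure recovery just described can be sketched end to end. This is a rough outline under stated assumptions, not the course's own procedure: `ec2` is assumed to be a boto3 EC2 client, the device name `/dev/sdf` and all IDs are placeholders, and error handling is left out.

```python
def recover_from_instance_failure(
    ec2,
    *,
    ami_id: str,
    instance_type: str,
    availability_zone: str,
    failed_instance_id: str,
    data_volume_id: str,
    allocation_id: str,
    device: str = "/dev/sdf",
) -> str:
    """Launch a replacement in the same AZ, then move the data volume and Elastic IP."""
    # 1. Replacement server from the AMI, in the same zone as the data volume
    #    (an EBS volume can only attach to instances in its own zone).
    launched = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": availability_zone},
    )
    new_instance_id = launched["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_instance_id])

    # 2. Move the data volume. Force=True is used because the failed
    #    instance may not be able to unmount the volume cleanly.
    ec2.detach_volume(VolumeId=data_volume_id, InstanceId=failed_instance_id, Force=True)
    ec2.get_waiter("volume_available").wait(VolumeIds=[data_volume_id])
    ec2.attach_volume(VolumeId=data_volume_id, InstanceId=new_instance_id, Device=device)

    # 3. Move the Elastic IP so the domain name reaches the new server.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        AllowReassociation=True,
    )
    return new_instance_id
```

The order matters: the Elastic IP moves last, so users are only sent to the replacement once its data volume is attached. The volume still has to be mounted inside the new instance before MySQL can use it, which this sketch does not cover.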

The next scenario is volume failure. Let us say the data volume, which has our sensitive application data, fails. How do we recover?

Here we use the snapshot and restore our data into a new EBS volume. Then we replace the failed volume on our server with the restored volume, and we are back online.

However, here we have some data loss, because the snapshot could be several hours or sometimes several days old. So that is a limitation of this architecture: in certain types of failures we will have data loss.
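The volume-failure recovery can be sketched in the same style. Again this assumes `ec2` is a boto3 EC2 client; the function name, the device name `/dev/sdf`, and the IDs are placeholders chosen for illustration.

```python
def restore_data_volume(
    ec2,
    *,
    snapshot_id: str,
    availability_zone: str,
    instance_id: str,
    failed_volume_id: str,
    device: str = "/dev/sdf",
) -> str:
    """Restore the snapshot into a new volume and swap it in for the failed one."""
    # New volume from the snapshot, in the same zone as the instance.
    restored = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    new_volume_id = restored["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # Remove the failed volume. A real script should confirm the detach has
    # completed before reusing the same device name; this sketch does not.
    ec2.detach_volume(VolumeId=failed_volume_id, InstanceId=instance_id, Force=True)

    # Attach the restored volume in its place.
    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id, Device=device)
    return new_volume_id
```

Everything written to the failed volume after the snapshot was taken is missing from the restored volume, which is the data loss mentioned above.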

By the way, also note that in both these scenarios, the instance failure and the volume failure, we have downtime, because while we are switching from one server to the other, or from one volume to the other, the application is not functional. These are things we have to accept if we want to use the single instance architecture.

Finally we have the last scenario, which is the availability zone failure. Here the whole zone is down, let us say because of a power supply problem. None of our servers are accessible, and therefore we can't even access our storage.

We have no choice but to use the AMI and create a server in a different zone. In AWS, zones are independent: if there is a power failure in one, the other zones don't use that same power supply, so the problem will not affect them and we will have a healthy availability zone to use.

So we pick one of the other availability zones and create a replacement instance using the AMI. Then we use the snapshot to create our data volume. This data may be old, but at least we'll have most of our data, and we can attach this restored volume to our new server. We can also switch the Elastic IP to the new server.

By the way, what you're seeing here is that the Elastic IP, the AMI, and the snapshot are available in any zone. The AMI and the snapshot, for instance, are stored in S3, and S3 is not tied to a single zone: it keeps copies of the data in multiple zones of the region. So even if a zone fails, the AMI is available, the snapshot is available, and the Elastic IP is available as well.

We can simply bring them over to a new zone, create a replacement server, and be back online. But here as well, remember there is data loss, because we're restoring from a snapshot which could be hours or days old.
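How much data is lost depends only on how old the newest usable snapshot is at the moment of failure. A small sketch of that arithmetic, with made-up timestamps and a hypothetical function name:

```python
from datetime import datetime, timezone


def data_loss_window(snapshot_times, failure_time):
    """Return how much recent data a restore loses, or None if no snapshot is usable.

    Only snapshots taken at or before the failure count; the loss is the gap
    between the newest of those and the failure itself.
    """
    usable = [taken for taken in snapshot_times if taken <= failure_time]
    if not usable:
        return None
    return failure_time - max(usable)


# Example: nightly snapshots at midnight UTC, failure at 06:30 the next morning.
snapshots = [
    datetime(2022, 1, 16, 0, 0, tzinfo=timezone.utc),
    datetime(2022, 1, 17, 0, 0, tzinfo=timezone.utc),
]
failure = datetime(2022, 1, 17, 6, 30, tzinfo=timezone.utc)
print(data_loss_window(snapshots, failure))  # 6:30:00
```

Taking snapshots more often shrinks this window, but with a single instance restored from snapshots it never reaches zero.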

Now, in all these scenarios we have to verify. We use the same page, the localdb page: after every recovery we test it and make sure that the MySQL server is responding. That hopefully means that our recovery has gone smoothly.

So this is what we want to do in this module. Essentially we're looking at fault tolerance and ways of recovering from failures for a single instance architecture.

We have to recognize and accept that with these recovery methods there will sometimes be some data loss, and in all these scenarios we will have downtime. In the next module, on multi-tier architectures, we will solve this problem.

But it's instructive to look at recovery options for a single instance as well because, as I said, single instance architectures are quite simple and quite useful.

In the rest of this module there are other videos, instructions, and other material that show how to implement this. We will implement a single instance architecture first, then we will prepare for recovery, and then we will work through all these failure scenarios and practice doing the recovery.

So see you in the next set of videos.