
15. Recovery: Availability zone failure

by SAMSUNG CLOUD-OKY 2022. 1. 19.

In this video let's see how to recover from an AZ failure.

So the AZ, the availability zone, has failed.

Right.

And this could be because of, say, a power supply problem, or the Internet connection into the availability zone has failed, or something like that.

Right.

And we're not able to access our services in this availability zone.

OK.

And so we have to do something where we recover into a different availability zone.

Because remember, the way AWS is architected is that availability zones are independent.

Each zone has its own set of power supplies, its own set of Internet connections, and so on.

And if one zone has a problem with, say, the power supplier, then the same power supplier was not used in the other zones.

Therefore the other zones will not be affected.

Right.

So the assumption here is that we will always have one or more healthy zones, and we can recover into those. We should not be creating a replacement in the availability zone that has failed.
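
The idea of recovering into a healthy zone can be sketched as a small selection helper. This is only an illustration: the helper name `pick_recovery_zone` and the `us-east-1` zone names are assumptions, not taken from the video. The input is shaped like the `AvailabilityZones` list that boto3's `describe_availability_zones` returns.

```python
def pick_recovery_zone(zones, failed_zone):
    """Return the name of a healthy zone other than the failed one.

    `zones` is a list of dicts with "ZoneName" and "State" keys, the same
    shape boto3 uses for describe_availability_zones()["AvailabilityZones"].
    The failed zone is excluded explicitly, whatever state it reports.
    """
    candidates = sorted(
        z["ZoneName"]
        for z in zones
        if z["State"] == "available" and z["ZoneName"] != failed_zone
    )
    if not candidates:
        raise RuntimeError("no healthy availability zone left in this region")
    return candidates[0]


# Hypothetical sample data (the region name is an assumption).
zones = [
    {"ZoneName": "us-east-1a", "State": "available"},
    {"ZoneName": "us-east-1b", "State": "available"},
    {"ZoneName": "us-east-1c", "State": "available"},
]
print(pick_recovery_zone(zones, failed_zone="us-east-1b"))
```

The zone chosen here is the one that both the replacement instance and the restored volume must use.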

So okay.

So what we have to do, the steps involved, are essentially these: we disassociate the elastic IP, like we did in the instance failure scenario. And here we can't access anything else. We can't access the instance and we can't access the data volume.

Which is a bad thing.

We can't access the data volume, which means there will be some data loss in this scenario.

We once again have data loss because the volume that we had, which has the latest information, is not available to us.

Right.

And we can only use the snapshot, which will be at least a few hours old, and therefore we have lost that much data.

Okay.

So there will be some data loss in this scenario as well.
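
To make the size of that data loss concrete, here is a minimal sketch that picks the newest snapshot and computes how much time it lags behind the failure. The snapshot IDs and timestamps are made up for illustration. The dict keys (`SnapshotId`, `StartTime`) follow the shape boto3's `describe_snapshots` uses.

```python
from datetime import datetime, timezone


def latest_snapshot(snapshots):
    """Pick the most recent snapshot from describe_snapshots-style dicts."""
    return max(snapshots, key=lambda s: s["StartTime"])


def data_loss_window(snapshot, failure_time):
    """Writes made after the snapshot started are not in the snapshot."""
    return failure_time - snapshot["StartTime"]


# Hypothetical snapshot IDs and timestamps, for illustration only.
snapshots = [
    {"SnapshotId": "snap-old",
     "StartTime": datetime(2022, 1, 18, 22, 0, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new",
     "StartTime": datetime(2022, 1, 19, 4, 0, tzinfo=timezone.utc)},
]
failure_time = datetime(2022, 1, 19, 7, 30, tzinfo=timezone.utc)

snap = latest_snapshot(snapshots)
print(snap["SnapshotId"], data_loss_window(snap, failure_time))
```

Taking snapshots more often shrinks this window, but with this recovery approach it never reaches zero.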

Okay.

So once we disassociate the elastic IP, we can then use the AMI to create our replacement instance, and this will be in some other zone.

Remember, the new server has to come up in some other zone which is healthy.

Right.

And then we'll also have to use the snapshot and create a volume, again in the same zone as the replacement instance.

So again, this will also be in a different zone, but in the same zone as the replacement server.

So right.

So when we do this, we attach the volume to the replacement server, we associate the elastic IP with the replacement server, and then we restart the server and verify that the application works.

OK, so let's see how to recover from a zone failure.

OK, so this is our instance, and this is in 1b, right.

This is in 1b.

Let's assume that zone 1b has failed.

So what we do is first disassociate the elastic IP. Let's remove the elastic IP address from the server.

All right, so that's done.
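
The same step can be sketched against the EC2 API instead of the console. This is a hedged illustration, not the method used in the video: it assumes a VPC Elastic IP and an `ec2` object that behaves like `boto3.client("ec2")`. The helper name `disassociate_elastic_ip` is hypothetical.

```python
def disassociate_elastic_ip(ec2, public_ip):
    """Detach the Elastic IP from whatever it is currently associated with.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the AllocationId so the address can be re-associated later.
    """
    address = ec2.describe_addresses(PublicIps=[public_ip])["Addresses"][0]
    association_id = address.get("AssociationId")
    if association_id:  # nothing to do if the address is already free
        ec2.disassociate_address(AssociationId=association_id)
    return address["AllocationId"]


# Usage (needs boto3 and AWS credentials; the IP is a placeholder):
#   import boto3
#   allocation_id = disassociate_elastic_ip(boto3.client("ec2"), "203.0.113.10")
```

Keeping the returned allocation ID around matters because the same address is later associated with the replacement server.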

And just to keep things clean.

Let me just remove this server, because in any case we won't be able to access it.

So we will terminate this server.

And then we also remove the volume.

OK, we have the data volume of this server.

Let me remove that, just to keep things clean here.

Right.

So when you terminate the instance, the data volume is normally not removed. It's not deleted.

So we have to separately go and remove this particular one.
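
Whether a volume goes away with its instance is controlled by the `DeleteOnTermination` flag on each EBS attachment. As a small sketch, the helper below lists the volumes that would be left behind. The helper name and the sample IDs are hypothetical. The dict shape follows one instance entry from boto3's `describe_instances`.

```python
def volumes_surviving_termination(instance):
    """List volume IDs that will NOT be deleted when the instance terminates.

    `instance` is one entry from
    describe_instances()["Reservations"][i]["Instances"].
    """
    return [
        mapping["Ebs"]["VolumeId"]
        for mapping in instance.get("BlockDeviceMappings", [])
        if not mapping["Ebs"].get("DeleteOnTermination", False)
    ]


# Hypothetical instance: a root volume plus a separately attached data volume.
instance = {
    "InstanceId": "i-0example",
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda",
         "Ebs": {"VolumeId": "vol-root", "DeleteOnTermination": True}},
        {"DeviceName": "/dev/sdf",
         "Ebs": {"VolumeId": "vol-data", "DeleteOnTermination": False}},
    ],
}
print(volumes_surviving_termination(instance))
```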

OK.

Let me get to this.

OK, so now the data volume is available, and let me now delete the data volume.

This way everything is clean.

And now what we need to do is go to the AMIs and create a replacement instance, but that should not be in zone 1b. We need to launch in a different zone.

Let's say we launch in 1a, and not 1b, because that is a zone which is not healthy. So let's make sure we choose some other zone, and then everything else remains the same.

By everything else I mean the same security group, the default one, and the same key pair as well.

Right.

So this is our replacement instance, in a different zone.
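
Launching the replacement from the AMI into a chosen zone might look like the sketch below. It assumes an `ec2` object that behaves like `boto3.client("ec2")`. The helper name, and every ID or name passed to it, are placeholders rather than values from the demonstration.

```python
def launch_replacement(ec2, ami_id, zone, instance_type, key_name,
                       security_group_ids):
    """Launch one instance from the AMI in the given (healthy) zone."""
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        KeyName=key_name,                      # same key pair as before
        SecurityGroupIds=security_group_ids,   # same security group as before
        Placement={"AvailabilityZone": zone},  # NOT the failed zone
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    # Block until the instance reports the "running" state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

Only the placement changes compared with the original server. The AMI, key pair and security group stay the same.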

Now, along with this, we can attach the volume and the elastic IP.

So let's attach the data volume.

Well, I'm sorry, we don't have a data volume.

So now we have a server.

Remember, we have a server, but we don't have a data volume yet.

We have to use the snapshot and create the data volume.

So.

So let me go now to the snapshot.

So what's coming up.

Well, let's go to the data volume snapshot here and create a volume.

Right.

And the volume again should be in 1a, which is where our replacement server is coming up.

Right.

It should not be in 1b, because that's a failed availability zone.

OK, so here's our data volume.
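
Restoring the data volume from the snapshot can be sketched the same way, again assuming an `ec2` object that behaves like `boto3.client("ec2")` and a hypothetical helper name. The important parameter is the zone, which has to match the replacement instance because a volume can only be attached to an instance in its own zone.

```python
def restore_data_volume(ec2, snapshot_id, zone):
    """Create a volume from the snapshot in the replacement server's zone."""
    volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=zone,  # same zone as the replacement instance
    )["VolumeId"]
    # Block until the new volume reaches the "available" state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```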

Right.

So now you can see we have a data volume which is in an available state, not yet attached to the replacement server.

So to do that, let's use Actions and attach the volume to the replacement server, like this.

Right.

Also, let's associate the elastic IP address with the replacement server, like this.
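
The two console actions above correspond to two API calls. A minimal sketch, assuming an `ec2` object that behaves like `boto3.client("ec2")`, follows. The helper name is hypothetical, and the device name `/dev/sdf` is an assumption that has to match whatever the AMI expects for its data volume.

```python
def attach_resources(ec2, instance_id, volume_id, allocation_id,
                     device="/dev/sdf"):
    """Attach the restored volume and the Elastic IP to the replacement."""
    ec2.attach_volume(Device=device, InstanceId=instance_id,
                      VolumeId=volume_id)
    # Wait for the attachment to complete before moving on.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    ec2.associate_address(AllocationId=allocation_id,
                          InstanceId=instance_id)
```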

All right, so the resources have been attached, and the server is coming up.

But remember we also need to restart the server.

OK, the server has to be restarted for MySQL to work properly.

So let me now stop the server.

And then I'll start the server once again.

And then we just have to verify.

Right.

We just have to verify the application after that.
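
The stop and start cycle can be sketched as below, assuming an `ec2` object that behaves like `boto3.client("ec2")`. The helper name `restart_instance` is hypothetical. The waiters make sure the start is only issued once the instance has fully stopped.

```python
def restart_instance(ec2, instance_id):
    """Stop the instance, wait, then start it and wait until it is running."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```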

OK.

So basically, just to summarize, the steps that we have taken are these.

First, we disassociated the elastic IP address.

We created a replacement server in a different zone, in zone A.

Then we used the snapshot to create a volume, again in the same zone as the replacement server.

So in this case, in my demonstration, both of these were in zone A. Then we attached the new volume to the server and also associated the elastic IP address with the replacement server.

And now we're restarting the server. After this, we just verify the application.

OK.

So here's our server. Let's see if it's in a stopped state. OK, let's just wait for this, and then we start it.

So here it is, in a stopped state.

And we start the server, and in a few seconds it will be in a running state.

And then again we verify the application.

All right.

So OK.

Here it is right here.

Yes, the server is running.

And let's now verify the application using the elastic IP address.

OK.

Let me try again, because the server may just be in the process of coming up, right.

So now when I refresh you can now see that the page works right.

And more importantly, it is connecting successfully to the MySQL server.
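
The manual refresh and retry can also be sketched as a small polling loop using only the Python standard library. The function names, the retry counts and the URL are assumptions for illustration. A 200 response only shows that the page loads, so a check of the database connection would still depend on what the application page reports.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code          # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None              # connection refused, timeout, DNS failure


def wait_until_healthy(url, attempts=10, delay=5.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it answers 200, retrying while the server comes up."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)         # give the server time before the next try
    return False


# Usage (the address is a placeholder for the elastic IP):
#   wait_until_healthy("http://203.0.113.10/")
```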

OK.

So here's how you recover from a zone failure. And remember, the replacement server and the new volume, the restored volume, have to come up in a different availability zone.

So good luck with this task.

 

 

 
