
14. Recovery: Volume failure

by SAMSUNG CLOUD-OKY 2022. 1. 19.

In this video, let's see how to recover from a volume failure.

Specifically, we're talking about a failure of the data volume, because that is what holds our valuable data. We don't care so much about the root volume, because the root volume is already backed up in the AMI, and the software doesn't change that much. The data is what changes very frequently, and the snapshotting that we do is usually on a nightly basis. So what we are most concerned about is this data volume.

So let's assume that this volume has failed. What do we do?

Well, the first thing to understand is that some of your data is lost, because the snapshot is at least a few hours old. It is a point-in-time snapshot, so it doesn't have the data written since the last time the snapshot was taken.
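To make the size of that gap concrete, here is a minimal sketch that computes the data-loss window as the time between the last snapshot and the failure. The function name and the two timestamps are hypothetical, chosen only to illustrate a nightly snapshot followed by a failure the next afternoon.

```python
from datetime import datetime, timedelta


def data_loss_window(last_snapshot_at: datetime, failed_at: datetime) -> timedelta:
    """Return how much time's worth of writes is missing from the snapshot."""
    if failed_at < last_snapshot_at:
        raise ValueError("failure time precedes the snapshot time")
    return failed_at - last_snapshot_at


# Hypothetical times: nightly snapshot at 02:00, volume fails at 15:30 the same day.
snapshot_time = datetime(2022, 1, 19, 2, 0)
failure_time = datetime(2022, 1, 19, 15, 30)
window = data_loss_window(snapshot_time, failure_time)
print(f"Writes made in the last {window} are not in the snapshot")
```

Anything written to the data volume inside that window cannot be restored from the snapshot, which is the loss the business has to accept.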

So there will be data loss in this scenario, and your business has to accept this. Your business has to understand that a single-instance architecture will have certain scenarios like this one where there will be some data loss. So now, how do we recover?

We recover in the sense that we get some of the data back, or most of it back, but the most recent data may be lost.

So what we do is stop the server first, and then we detach this volume. It has failed anyway and is not responding, so we don't need it anymore. We detach it, and then we can also delete it.

After that, we have to restore our data, whatever data we have in the snapshot, into a new EBS volume. We use the snapshot to restore a new volume, and it will have whatever data was in the snapshot. Then we attach this new volume to the EC2 instance, restart the server, and verify the application, essentially with the same localdb.php page.

So let's see how to do this. Here's our server, and here are the volumes that we have: the root volume and the data volume.

Let's say the data volume has failed. Let me stop the server first, because that way we can detach the volume easily. So let's put the instance into a stopped state, and then from the Volumes section we will select the data volume and say detach. Let me just wait for the stop to finish, because it can sometimes cause problems if you try to detach while the server is changing state.

OK, now this has moved into stopped. So let's go to Volumes, select the data volume, and from Actions say detach. Now the state will change from in-use to available, which means it's now detached.

And just to keep things clean, let me remove this volume so that there's no confusion as to which one is the failed volume and which one is the new one. Let me delete this data volume. The failed volume has now been removed.
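The console steps above (stop, wait, detach, delete) can also be scripted. The following is a minimal sketch, assuming a boto3 EC2 client is passed in as `ec2`; the function name and the IDs in the usage comment are hypothetical, and the video itself only uses the console.

```python
def remove_failed_volume(ec2, instance_id, volume_id):
    """Stop the instance, detach the failed data volume, then delete it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Wait until the instance is fully stopped, so we never detach
    # while the server is still changing state.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.detach_volume(VolumeId=volume_id, InstanceId=instance_id)
    # A detached volume moves from "in-use" to "available".
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Delete the failed volume so it cannot be confused with the new one.
    ec2.delete_volume(VolumeId=volume_id)


# Hypothetical usage (IDs are placeholders, not from the video):
#   import boto3
#   remove_failed_volume(boto3.client("ec2"), "i-0123456789abcdef0",
#                        "vol-0123456789abcdef0")
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to try against a stub before pointing it at a real account. A volume that does not respond to a normal detach may need the `Force` option of `detach_volume`; that case is not covered here.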

All right, so now we need to restore. We go to Snapshots, and we have a snapshot of our data volume; you can see the size is 1 GB. We select it, and from Actions we say create volume. Let's make sure that the volume comes up in the same zone. It has to be created in the same Availability Zone as the server, otherwise we won't be able to attach it to the server that we have. The server is in 1b, so the volume has to be in 1b.
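One way to avoid picking the wrong zone by hand is to read the zone from the instance and pass it straight to the restore call. This is a minimal sketch under the same assumption as before, that `ec2` is a boto3 EC2 client; the function name is hypothetical.

```python
def restore_volume_in_instance_zone(ec2, snapshot_id, instance_id):
    """Create a volume from a snapshot in the instance's Availability Zone.

    `ec2` is assumed to be a boto3 EC2 client. Returns the new volume ID.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    # The volume must live in the same zone as the instance to be attachable.
    zone = reservations[0]["Instances"][0]["Placement"]["AvailabilityZone"]

    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    # Wait until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    return volume["VolumeId"]
```

Because the zone is looked up rather than typed in, the restored volume always lands next to the server it is meant for.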

All right, so now we have the replacement volume, and all the data from the snapshot has been restored into this new volume. Once that happens, we can simply attach it to our instance. Right now the instance is in a stopped state, so we attach the volume and then we can start the server. This is our server, so let me start it.
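The attach-then-start order can be sketched the same way, again assuming `ec2` is a boto3 EC2 client. The function name and the default device name `/dev/sdf` are assumptions; the video does not say which device the data volume uses, and in practice it should match the device the failed volume was attached as, so the server mounts it in the same place.

```python
def attach_and_start(ec2, instance_id, volume_id, device="/dev/sdf"):
    """Attach the restored volume to the stopped instance, then start it.

    `ec2` is assumed to be a boto3 EC2 client. `device` is an assumed
    default and should match the device name of the failed volume.
    """
    ec2.attach_volume(InstanceId=instance_id, VolumeId=volume_id, Device=device)
    # Wait for the volume to report "in-use" before booting the server.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    ec2.start_instances(InstanceIds=[instance_id])
    # Block until the instance reaches the running state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```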

Once it moves into running, we will verify the application. For that we just have to use the same old page, the localdb.php page, at our Elastic IP address. So let's just wait for this; any moment now. OK, it's in the running state. Let's refresh this page, and you can see it says it's connected.
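The browser refresh can be replaced by a small check that fetches the page and looks for the success text. This is a minimal sketch using only the standard library; the function names are hypothetical, and the marker string "connected" is an assumption based on what the page is described as showing, so adjust it to the page's real output.

```python
from urllib.request import urlopen


def page_reports_connected(body: str, marker: str = "connected") -> bool:
    """Return True if the page body contains the success marker (case-insensitive)."""
    return marker.lower() in body.lower()


def check_application(url: str, timeout: float = 10.0) -> bool:
    """Fetch the test page and report whether it shows the success marker."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return page_reports_connected(body)


# Hypothetical usage: replace the host with the instance's Elastic IP address.
#   check_application("http://<elastic-ip>/localdb.php")
```

A plain substring match is naive: a page that printed "not connected" would also pass, so a real check should match the exact success message.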

So we have recovered successfully from a volume failure. What were the key things here? One is to detach the failed volume, then restore a new one from a snapshot and attach it. And remember, the volume should be created in the same zone as the server; this is the key bit in the restoration. Also, don't forget that the server should be restarted as well, and then you can verify the application.

So good luck with this task.

