Video Transcript

Hi. So I’m Ben Christiansen, one of the architects here at Kinetic Data.

So what is continuous availability? One of the things that we’ve been hearing more and more from our customers is that they’re using the Kinetic product as a go-to platform for their business-critical needs. So, the things that are actually running their business.

And they really want to avoid downtime, right? Downtime costs money, and downtime means that people can’t be productive doing their day-to-day tasks. And that’s really what continuous availability is: the pursuit of 100% uptime.

And this practice is really awesome, because no one likes downtime.

So how do we achieve, or how do we strive for, this 100% uptime? There are really two types of downtime. There is planned downtime for operational tasks, such as upgrading a database, upgrading your application version, or swapping out your hardware, and then there is unplanned downtime.

And that’s for things like your system crashing, the network going out, or a critical application error. Addressing the first portion is called continuous operation, and addressing the second part is called high availability.

And when you combine those together, that’s how you get continuous availability. So to achieve this, you really need to look at your whole solution and apply it to all of the different, what I call, layers of it.

So you’ve got your front layer: with most of our solutions being web applications, that would be something like your load balancer or your web proxy. Then you’ve got the application layer, where you’re running things like Kinetic Request or any of your ITSM solutions.

And then you’ve got your data layer, which is things like your database, and the file system when you’re dealing with attachments. For all three of those layers, you need to ask: how do I minimize my planned downtime, and how do I minimize my unplanned downtime?

So kinops is an implementation just like any of your other on-prem implementations. We talked a little bit about that this morning.

And we wanted 100% uptime, just like all of you do. We don’t want the system to ever go down. So we spent a lot of time looking at kinops and how we actually achieve this, and a lot of it comes down to avoiding downtime with redundancy.

So you can see from this slide that we have different boxes, which represent the different components. Within each box, we’ve got multiple instances, which is oftentimes referred to as running a cluster of instances. So you might have three instances of your Kinetic Request application that all serve up that Kinetic Request functionality.

So really, this started before we began working on kinops specifically: we wanted to look for a database that had a really good continuous availability story. That was one of the major draws to Cassandra. As a distributed database, it’s really easy to run a Cassandra cluster with multiple instances, and they’re fault tolerant. It’s also really great for your operational changes.

So for example, on kinops we have changed our underlying database hardware and upgraded our version of Cassandra, all live in production with no downtime. That’s a really, really great feeling.

Also, while kinops is an implementation like everyone else’s on-prem, we really wanted to pick a cloud provider that had some good continuous availability stories. That’s one of the major draws to Amazon.

Everything that we do that’s using an Amazon technology, you can do on-prem as well; Amazon just wraps it up with some nice graphical interfaces to make it a little bit easier.

So how do we avoid planned downtime with kinops?

You know, this kind of gets into that clustering, and making sure that all of our components, so Kinetic Request, Bridgehub, anything that we’re relying on, are elastic. Anything with shared state needs to go into a database, which again requires us to look at how to keep that continuously available. And we really wanted all of these components to be deployable without human interaction.

This also has the benefit of addressing scaling, both up and down, and making sure that these systems can very easily be interacted with. So in practice, this means one of two primary strategies. Generally, the preferred approach is rolling restarts, meaning that if I’ve got a cluster of three instances, I can shut one down and the other two are still serving.
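The rolling restart idea can be sketched in a few lines. This is a minimal illustration, not Kinetic Data’s actual tooling; the `Instance` class and the `pool` list are hypothetical stand-ins for cluster nodes behind a load balancer:

```python
# A minimal sketch of a rolling restart (hypothetical names throughout).

class Instance:
    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.serving = True


def rolling_restart(pool, new_version):
    """Upgrade one instance at a time so the rest keep serving."""
    for inst in pool:
        inst.serving = False                       # drain this node only
        assert any(i.serving for i in pool), "cluster must keep serving"
        inst.version = new_version                 # upgrade while it is out
        inst.serving = True                        # rejoin before the next one


pool = [Instance(f"request-{n}", "v1") for n in range(3)]
rolling_restart(pool, "v2")
print([i.version for i in pool])  # every node upgraded, none taken down together
```

The key invariant is the assertion inside the loop: at no point is the whole cluster out of service, which is exactly why this is the preferred approach when the new version is compatible with the old one.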

That way, if one crashes, I still have something that’s serving. Then I bring that instance back up with a new version, new hardware, whatever that requires. Sometimes that’s not an option, like bringing up an instance that isn’t going to be compatible with the other nodes in the cluster, and the strategy for that is called a blue-green deployment, which basically means that you set up an entirely new cluster.

And then, in one fell swoop, you switch your load balancer, or however you’re doing your network layer, from one cluster to the other. And again, you’re losing no traffic. It’s all available, and you’re just switching over to an entirely new environment.
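The blue-green switchover described above can be sketched like this. Again a hypothetical model, assuming a load balancer that points at exactly one named cluster at a time; the cluster names and addresses are made up for illustration:

```python
# A minimal sketch of a blue-green switchover (hypothetical names throughout).

class LoadBalancer:
    def __init__(self, clusters, active):
        self.clusters = clusters   # cluster name -> list of backend addresses
        self.active = active       # which cluster currently receives traffic

    def backends(self):
        return self.clusters[self.active]

    def switch(self, target):
        """Cut all traffic over to the target cluster in one step."""
        if target not in self.clusters:
            raise ValueError(f"unknown cluster: {target}")
        self.active = target       # atomic from the caller's point of view


lb = LoadBalancer(
    {
        "blue": ["10.0.1.10", "10.0.1.11"],    # current production cluster
        "green": ["10.0.2.10", "10.0.2.11"],   # freshly built replacement
    },
    active="blue",
)
lb.switch("green")  # one fell swoop: traffic now flows to the new cluster
```

Because the old "blue" cluster is still standing after the switch, you also get a cheap rollback path: switching back is the same one-step operation.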

So how do we avoid unplanned downtime with kinops? Really, the key to this is no single points of failure. So for our components and services, everything is running in multiple instances, and everything is running in separate data centers, or what AWS calls availability zones. That makes sure that if there is a site disaster, you’re still going to be able to serve up your different service components.

So again, in practice, a lot of this has to do with clustering. There are some technologies that are difficult to cluster, for example your actual load balancer at your outermost layer, and oftentimes you resolve that with what’s called active-passive with failover.

So you’ve got two instances, and you’ve got something that’s monitoring them, so that when you shut one down, or one of those instances crashes, it’ll automatically switch over which one is active.
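The active-passive arrangement can be sketched as a tiny health-check loop. This is a simplified model, not any particular monitoring product; the node names and the single-check promotion logic are assumptions for illustration:

```python
# A minimal sketch of active-passive failover (hypothetical names throughout).

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True


def after_health_check(active, passive):
    """Return (active, passive) after one monitoring pass.

    If the active node is still healthy, nothing changes; if it has
    failed, the passive node is promoted and takes over.
    """
    if active.healthy:
        return active, passive
    return passive, active         # failover: the standby is promoted


primary, standby = Node("lb-a"), Node("lb-b")
primary.healthy = False            # simulate a crash of the active node
primary, standby = after_health_check(primary, standby)
print(primary.name)  # prints "lb-b": the standby is now serving traffic
```

Real monitors add detail this sketch omits (repeated checks before declaring failure, virtual IP or DNS handover), but the shape is the same: one active, one warm standby, and an automatic promotion when the health check fails.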

Some additional considerations that we had when working on kinops: we really wanted to simplify deployment and testing, and take some of the uncertainty or fear away from most of the operational tasks.

As John mentioned, right now we’re actually deploying to kinops sometimes multiple times in a week. So we wanted that to be a very comfortable, familiar process.

Yeah, that kind of comes from having an alternate staging environment, so we can deploy there first and test it in a 100% equivalent environment. And then, depending on which system it is, sometimes we’ll do a blue-green deployment and just switch over to what we were using for testing.

Sometimes we’ll just deploy that to our primary instance. And yeah, for us, this has been great. As developers, we don’t want to be constantly maintaining and monitoring the system, and we can achieve a lot of these things with no outage windows. And here’s something that’s kind of fun.

So we actually do most of our management of kinops using our own kinops space. Things like updating CE or spinning up new instances have forms in our service catalog, which then use Task to interact with AWS and do a lot of those tasks for us. So we get the tracking, the visibility, and the simplicity that come along with that. And really, everybody likes this idea of continuous availability, and it’s been great for us because everybody likes uptime.

Thank you.
