Hello, I’m Austin Peterson, here to talk about log aggregation.
So I'll show you something you've probably seen a little bit of already today, and that's the architecture that we have in our kinops environment. One of the things you can see is that there are a bunch of servers all around.
One of the things that you might not have been thinking about, though, is that, you know, more servers, more logs, more problems. So that was one thing that we learned while we were going through this.
One of the things that we came across when we were setting up kinops and trying to make sure that we had continuous uptime and availability was making sure that we didn't lose track of any of the log files.
As we spin up new servers, we need to make sure that we know where those log files are. And when we deprovision them, we need to make sure that we don't lose any log files that might help us look at problems that may have occurred in between. So trying to go to all these servers and look through them for a specific error became a needle in a haystack once the servers started coming and going for upgrades and for scaling.
The other thing that we encountered was that we didn't really get any analytics out of the log files. We realized that there is a lot of good information inside our log files that we could use for analytics instead of just for troubleshooting.
So what we did was put out a new feature in our versions of Kinetic Request, Bridgehub, and Filehub called structured logging. What structured logging does is put the log files in a consistent, machine-readable format, so that the tools that do log aggregation can actually make use of them.
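To make "consistent and machine-readable" concrete, here is a minimal sketch of structured logging in Python. It assumes one JSON object per line, which is a common choice but not necessarily the format our applications emit; the field names (`username`, `authType`) and the `format_event` helper are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "thread": record.threadName,
            "message": record.getMessage(),
        }
        # Hypothetical extra fields (username, authType, ...) attached to
        # the record by the caller; absent on ordinary records.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def format_event(message: str, **fields) -> str:
    """Format one ERROR event without touching global logging config."""
    record = logging.LogRecord(
        name="app", level=logging.ERROR, pathname="app.py", lineno=0,
        msg=message, args=None, exc_info=None,
    )
    record.fields = fields
    return JsonLineFormatter().format(record)
```

Calling `format_event("Login failed", username="jdoe", authType="LDAP")` yields one line that any JSON-aware tool can parse, with no custom regular expressions per application.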
So log aggregation is getting all of those log files from the different servers and applications into one central place. That makes it very easy to query and to do reporting, filtering, and other things like that. We'll have a couple of examples to show you in just a little bit. The log aggregation tools that we use are Filebeat, Elasticsearch, and Kibana. Filebeat is what reads the log files and sends them over to Elasticsearch.
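Filebeat handles the shipping for you, so you would not normally write this yourself. Purely to illustrate the data flow, here is a rough stand-in that turns JSON log lines into a newline-delimited request body of the kind the Elasticsearch bulk API accepts. The index name `app-logs` and the `parse_error` field are assumptions made up for the example.

```python
import json
from typing import Iterable


def build_bulk_body(lines: Iterable[str], index: str = "app-logs") -> str:
    """Turn JSON log lines into an Elasticsearch bulk-API request body."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            # Keep lines that are not JSON objects instead of dropping them.
            event = {"message": line, "parse_error": True}
        # Each document is preceded by an action line naming the index.
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(event))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(parts) + "\n" if parts else ""
```

The point of the sketch is only that every server's log lines end up as documents in one shared index, whichever server they came from.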
Elasticsearch is a full-text search engine that makes it easy to search for the errors in the log files. And then finally, Kibana is a GUI front end to Elasticsearch that gives you graphs and charts and a website where you can enter your query to find the errors in the log files. So in a picture, the way that it works is that we start out with the log files, they go through Filebeat, and Filebeat sends them off to Elasticsearch.
Elasticsearch then makes them available inside of Kibana, where you can have charts and graphs and search for whatever it is that you want. So what can be logged? There's quite a bit.
Some of the stuff that we log, for example, is authentication, as far as whether it was a success or a failure and what the authentication type was, whether it was LDAP, SAML, or our built-in local authentication; HTTP information, as far as response times and for what URLs; and IP addresses, you know, what server the request went to and where it came from.
And then also just the typical stuff that you get in log files, as far as timestamps and thread information, and a bunch of other gobbledygook technical stuff.
What we have done with it is use it to quickly find errors that have been sent to us in emails with screenshots, you know, with just arrows pointing and saying "error." It's like, okay, well, sure, now I'll try to find that in a log somewhere. Thankfully, though, with the username, we are able to make a simple query and find the errors for that user very quickly.
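As a sketch of what such a "simple query" might look like, here is a function that builds an Elasticsearch query body for one user's errors, newest first. The field names `username` and `level` are assumptions about how the events are indexed (and are assumed to be keyword fields); `@timestamp` is the timestamp field Filebeat conventionally sets.

```python
from typing import Any, Dict


def build_user_error_query(username: str, size: int = 50) -> Dict[str, Any]:
    """Build a search body matching ERROR events for a single user."""
    return {
        "size": size,
        # Newest events first, so the reported error is near the top.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filters are exact matches and do not affect scoring.
                "filter": [
                    {"term": {"username": username}},
                    {"term": {"level": "ERROR"}},
                ]
            }
        },
    }
```

In the Kibana search bar, the equivalent would be something along the lines of `username:"jdoe" AND level:ERROR`, again assuming those field names.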
And we don't have to go to all the individual servers that might have that log file. We've also done some server-side performance trending with it.
So again, with a specific user in mind, we decided to take a look at how long our API calls were taking over time. Over the course of 60 days, we were able to trend on what the average response time was for API calls coming back.
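The trend itself is just an average per time bucket. In Elasticsearch this kind of chart is typically a date histogram with an average aggregation inside it; the plain-Python sketch below shows the same arithmetic on events already pulled out of the logs. The field names `timestamp` and `responseTimeMs` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Mapping


def daily_average_response_ms(events: Iterable[Mapping]) -> Dict[str, float]:
    """Average response time per calendar day, keyed by ISO date."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of ms, count]
    for event in events:
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        totals[day][0] += event["responseTimeMs"]
        totals[day][1] += 1
    # Sort by day so the result reads as a time series.
    return {day: total / count for day, (total, count) in sorted(totals.items())}
```

Filtering the input down to one user's API calls before averaging gives the per-user trend described above.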
We also then created our own interactive dashboard that lets us choose a time window of a day, a week, or a month, and it updates the graphs and the numbers that tell us how many submissions were made in that time period, how many API calls were made, how many users were created, and how many unique users logged in.
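The numbers on a dashboard like that come down to counting events inside a time window. Here is a small sketch of that counting, done locally rather than in Kibana. The event `type` values, the field names, and the 30-day definition of "month" are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Mapping

# Hypothetical window sizes; "month" is approximated as 30 days.
WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}


def activity_summary(
    events: Iterable[Mapping], now: datetime, window: str = "week"
) -> Dict[str, int]:
    """Count submissions, API calls, new users, and unique logins in a window."""
    cutoff = now - WINDOWS[window]
    summary = {"submissions": 0, "api_calls": 0, "users_created": 0}
    logged_in = set()
    for event in events:
        when = datetime.fromisoformat(event["timestamp"])
        if not (cutoff <= when <= now):
            continue  # outside the selected window
        kind = event.get("type")
        if kind == "submission":
            summary["submissions"] += 1
        elif kind == "api_call":
            summary["api_calls"] += 1
        elif kind == "user_created":
            summary["users_created"] += 1
        elif kind == "login":
            logged_in.add(event["username"])  # a set dedupes repeat logins
    summary["unique_logins"] = len(logged_in)
    return summary
```

Switching the `window` argument between `"day"`, `"week"`, and `"month"` plays the role of the time picker on the dashboard.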
We use that dashboard to get a very quick glance at how active a particular space is in our kinops environment. And then another thing you should know is that you can do this too.
This isn't exclusive to kinops or anything like that. This is something that we've documented on our community site. We have instructions on how you can set up structured logging with our applications, and then how to specifically set up Filebeat, Elasticsearch, and Kibana to do the same things that we just showed in those previous slides.
So that’s log aggregation.