Client’s website going offline at random

Have you ever been asked to look at one of those cases that nobody can figure out? Or do you have one of those recurring problems that are impossible to troubleshoot? Maybe you fix them with a reboot when they occur. I love cases like that. You can’t always fix them, but you always learn something. I had one yesterday and I managed to fix it. Here are some approaches that I used to solve it.

What was the setup?

This client had two servers. One with some WordPress sites and one with their application. The users connect to the application server and this server is also the proxy to the WordPress sites. This is an easy way to make it look like the WordPress pages are part of the application. When the user is filling in forms they interact with the application and when they click on documentation they get a WordPress page. And because the application server is proxying this page, it looks like the documentation is in the application. It’s on the same domain and uses the same layout. Pretty neat.

Making a quick sketch of the situation is often a good idea. It shows the client that you understand their setup and their problem. And it gives you insights that you wouldn’t get by just clicking around.

What was the problem?

According to the customer, the WordPress server would be completely unreachable and they had no idea what triggered it. Maybe Digital Ocean’s network was to blame. They solved it temporarily by changing Nginx configs, connecting the proxy to internal addresses instead of public addresses and flushing firewall rules. Oh yes, and lots of reboots.

Don’t believe the customer

Like any detective show on television, a server problem comes with a lot of misinformation. My client was blaming the Digital Ocean network for being unreliable. But Digital Ocean has a status page, so I went there first. According to them, everything was fine when the problems occurred. So I noted that as their alibi.

When I logged into the server it turned out they were using Apache and not Nginx. And when I asked them if the WordPress pages were unreachable for everyone or just the production server, nobody knew.

Test it yourself

I did not get a chance to experience the unreachable state myself, because the downtime was a big problem for their customers and they would fix it as soon as it happened. If you get the chance to see a problem live, that’s always better than relying on reports from eyewitnesses.

I had 3 approaches left to follow:

  1. look around on the server
  2. read the logfiles
  3. read the config files

The solution is probably in the log files but I chose to look around on the server first. Just a quick check to see the uptime, the load, disk usage, last logins and what is running.
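A quick look like this can also be scripted. The following is only a minimal Python sketch of the idea, not the commands used on this server: the function name quick_look and the choice of numbers to collect are my own assumptions, and it covers load, disk usage and uptime but not logins or running processes.

```python
import os
import shutil


def quick_look(path="/"):
    """Collect a few health numbers, similar to a first manual look around."""
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(path)
    info = {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "disk_used_pct": round(disk.used / disk.total * 100, 1),
        "uptime_hours": None,
    }
    # /proc/uptime exists on Linux; on other systems the value stays None.
    try:
        with open("/proc/uptime") as fh:
            info["uptime_hours"] = round(float(fh.read().split()[0]) / 3600, 1)
    except OSError:
        pass
    return info


if __name__ == "__main__":
    for key, value in quick_look().items():
        print(f"{key}: {value}")
```

A load that is much higher than the number of CPU cores, or a disk that is nearly full, would be a reason to dig deeper before opening the logs.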

My look around showed nothing that made me suspicious. I used ‘w’ to see uptime, load and logged-in users with one command.

Read the logs

The log files are the place where you usually find the murder weapon. Or in this case, the line that will explain the problem. I always start with the syslog. It’s in /var/log/syslog. If it’s not too big I’ll just open it with Vim or Nano and scroll through it so I get an idea what is going on. If it is big I’ll grep it for keywords like error, block, killed, etc.
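The keyword search can be sketched in a few lines of Python as an alternative to grep. This is an illustration only: the function name find_suspicious is hypothetical, and the keyword list is just the three words mentioned above.

```python
import re

KEYWORDS = ("error", "block", "killed")  # the keywords from the text; extend as needed


def find_suspicious(lines, keywords=KEYWORDS):
    """Return (line_number, line) pairs that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    try:
        with open("/var/log/syslog", errors="replace") as fh:
            for number, text in find_suspicious(fh):
                print(f"{number}: {text}")
    except OSError as exc:
        print(f"could not read syslog: {exc}")
```

Keeping the line numbers makes it easy to jump to the same spot in an editor and read what happened just before and after a hit.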

After that, I read individual log files starting with the ones that are the most system and networking related. For example the kernel log, firewall log, fail2ban log. And then the logs from applications that run on the server like the webserver and the database server.

Conclusion

This time it was fail2ban that blocked the server with the WordPress sites because it was sending too much traffic. I disabled fail2ban and the site has been up ever since!

 

How to enable pruning on bitcoind

If you don’t want to have the whole bitcoin blockchain on your computer or server you don’t have to. There is an option that deletes previously verified blocks from your local copy. This is called pruning. To enable it, add a prune= line to your bitcoind config (I used prune=5000). This config file is usually in ‘/etc/bitcoin/bitcoin.conf’.

After that, restart bitcoind, and 5 minutes later you have a whole lot more free disk space.

You can check if pruning is enabled with:

It will output something like this:

Good to know:

The prune= value has to be at least 550, and the number is the disk space to use for block files in MiB. I went with 5000 because roughly 5 GB seemed reasonable to me.

Troubleshooting:

After I enabled the prune=5000 option bitcoind wouldn’t start anymore. There was no usable error:

I checked for typos in the config file and read the log file in /var/lib/bitcoin/debug.log but everything seemed fine. So I started the daemon by hand. This is usually a good approach to see what’s going on.

And there it is. You have to disable txindex in the bitcoin.conf to use pruning.
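Both pitfalls, the 550 minimum and the txindex conflict, can be checked before restarting the daemon. Below is a minimal Python sketch under some assumptions: the function names are hypothetical, the parser only understands simple key=value lines with # comments, and it only covers the size-target form of prune= described in this post.

```python
def parse_conf(text):
    """Parse key=value lines from a bitcoin.conf-style string, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def prune_problems(text):
    """Return a list of human-readable problems with the pruning settings."""
    settings = parse_conf(text)
    problems = []
    prune = settings.get("prune")
    if prune is None:
        problems.append("no prune= line found")
    elif not prune.isdigit() or int(prune) < 550:
        problems.append("prune= should be a number of at least 550")
    if settings.get("txindex") == "1":
        problems.append("txindex=1 conflicts with pruning")
    return problems


if __name__ == "__main__":
    example = "prune=5000\ntxindex=1\n"
    for problem in prune_problems(example):
        print(problem)
```

Run against the real file, you would read ‘/etc/bitcoin/bitcoin.conf’ (or wherever your config lives) and pass its contents to prune_problems.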

How to use HMAC-SHA256 to connect to a REST API like Ticketmatic

A client recently asked me to export records from Ticketmatic. Ticketmatic is a SaaS application for selling event tickets. They have a JSON API, so I figured it would be easy. Just send a GET request to some URL and parse the result as JSON, right?

That doesn’t work because they authenticate requests with HMAC-SHA256, a keyed hashing scheme. You have to sign every request you make with a secret key to create a signature. After that, you have to put the signature, the current timestamp and an access key in the Authorization header of the request. Not just once but for every request!
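To give an idea of what such signing looks like, here is a minimal Python sketch using only the standard library. It is a generic illustration, not Ticketmatic’s documented scheme: which string gets signed (here the access key plus the timestamp), the timestamp format and the layout of the header are all assumptions, so check the API documentation for the real values.

```python
import hashlib
import hmac
import time


def sign(secret_key, message):
    """Return the hex HMAC-SHA256 signature of message, keyed with secret_key."""
    digest = hmac.new(secret_key.encode(), message.encode(), hashlib.sha256)
    return digest.hexdigest()


def build_auth_header(access_key, secret_key, timestamp=None):
    """Build an Authorization header value containing key, timestamp and signature."""
    if timestamp is None:
        timestamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime())
    # ASSUMPTION: the signed payload and the header layout below are hypothetical.
    signature = sign(secret_key, access_key + timestamp)
    return f"HMAC-SHA256 key={access_key} ts={timestamp} sign={signature}"


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(build_auth_header("my-access-key", "my-secret-key"))
```

Because the timestamp is part of the signed message, the header has to be rebuilt for every request, which is why a small helper like this is worth having.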

 


Multiple Passenger apps on your server? Give them names!

If you run multiple versions of a Rails application on the same server it’s easy to get them mixed up. I have a client that has 2 versions of the same application on the same server. One version for clients in the Netherlands and one for Belgium. Because they are almost, but not totally, identical I’m doing as much as I can to make them easy to identify. And today I found a simple trick that I’d like to share.


Is Digital Ocean’s One-click app for Ruby on Rails any good?

Digital Ocean offers 2 types of droplets (servers):

  1. Droplets with a clean Linux install.
  2. Droplets with some application preinstalled: “One-click apps”

Let’s have a look at the One-click Rails installation they offer. I’ll describe what you get, what I like about it and what I don’t like about it, and I’ll give some tips on how to use it.

“Welcome to One-click apps, what do they do? Do they do things? Let’s find out!”


Use a git-hook to deploy your app

There are a lot of ways to deploy an app to a server. Here is a simple one that I often use. This can be a bit confusing if you are not familiar with Git but I promise it’s the easiest way!

It works like this: you create an empty git repository on the server and you push the branch you want to this repository from your development machine. The repository on the server has a little script (hook) that puts the files in the right directory (‘rails_project’) and runs all the bundle commands. You end up with 2 directories: the bare repository and the ‘checkout’ with the project.
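Such a hook is usually a small shell script, but any executable works. Here is a minimal sketch of a post-receive hook written in Python, to show the moving parts. The paths, the branch name and the single ‘bundle install’ step are assumptions for illustration; the real hook may run more commands.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

GIT_DIR = "/home/deploy/rails_project.git"  # hypothetical bare repository
WORK_TREE = "/home/deploy/rails_project"    # hypothetical checkout directory
BRANCH = "master"                           # hypothetical branch to deploy


def pushed_branches(stdin_lines):
    """post-receive gets one '<old> <new> <ref>' line per updated ref on stdin."""
    branches = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith("refs/heads/"):
            branches.append(parts[2][len("refs/heads/"):])
    return branches


def deploy_commands(git_dir, work_tree, branch):
    """Commands that check the branch out into the work tree and install gems."""
    return [
        ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}", "checkout", "-f", branch],
        ["bundle", "install"],
    ]


def main():
    if BRANCH not in pushed_branches(sys.stdin):
        return  # some other branch or a tag was pushed, nothing to deploy
    # Drop the GIT_DIR that git exports to hooks so child commands are not confused by it.
    env = {k: v for k, v in os.environ.items() if k != "GIT_DIR"}
    for command in deploy_commands(GIT_DIR, WORK_TREE, BRANCH):
        subprocess.run(command, cwd=WORK_TREE, env=env, check=True)


# git exports GIT_DIR when it runs a hook, so only deploy in that situation.
if __name__ == "__main__" and "GIT_DIR" in os.environ:
    main()
```

Saved as hooks/post-receive inside the bare repository and made executable, a script like this runs after every push to that repository.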

Monitoring Sidekiq with email and SMS alerts

Sidekiq is a simple and efficient background processing tool for Ruby on Rails apps. You should use it for tasks that take too long to put in a controller or tasks that need to run on a schedule. Common examples are sending emails, generating PDFs and connecting to other services through an API.

Like all things in IT, Sidekiq can crash, get slow or need more capacity. And for some applications that really should not happen. Because nobody directly interacts with Sidekiq it can be a long time before someone notices that something is wrong.

A good solution to quickly restart (kick, hehe) a crashed Sidekiq is to have a watchdog on the server. Both systemd and upstart can do that for you, and there are a lot of other watchdogs you can install. But I’d still like to know when something happened, or when an edge case occurs that the watchdog cannot fix. The solution? Monitoring Sidekiq from the outside.


Simple downtime alerts for your Rails app in 5 minutes

How do you know your Rails app is still online? How do you know it’s not displaying some error? That’s what monitoring is for. If you look around you’ll find lots of solutions. Most of these solutions are overkill if you are just starting with Rails servers or if you only have a few applications.

What is important when choosing a monitoring tool?
