Intro

Welcome back. Yesterday we focused on how to use Nginx alongside a dynamically rendered web site or service built with something like Express.js or Ruby on Rails. However, the benefits of putting Nginx in front of a single dynamic site are minimal compared to the capabilities Nginx actually specializes in. Today we are going to focus on some of the things Nginx does best: acting as a reverse proxy, load balancer, and cache. So let's not waste time. Let's crack open the terminal and get started with the freshly configured FreeBSD droplet we've already prepared.

Getting started

Now, before we start entering commands, it is worth mentioning that the droplet we are about to log in to already has Nginx installed, along with the basic configuration from earlier in this series: firewall activated, ntpd running, time zone set, and so on. If you don't have a droplet configured that way, just take a quick look back at day 1 and follow the configuration steps provided. If you still have access to that droplet, then there is nothing else you need to do and you're ready to proceed.
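
If you're starting from a droplet without Nginx, installing and enabling it on FreeBSD looks roughly like this once you're logged in (a sketch assuming the standard pkg repository):

pkg install nginx             # run as root, or prefix with sudo
sysrc nginx_enable=YES        # enable nginx at boot
service nginx start
nginx -v                      # confirm the installed version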

Setting up the reverse proxy and load balancer

ssh freebsd@your_server_ip
sudo chown freebsd /usr/local/etc/nginx/nginx.conf (if you're using Sublime SFTP)
sudo nano /usr/local/etc/nginx/nginx.conf

First, we're going to clear out nearly everything in this configuration file so we can focus on what we are specifically looking to do.

[video: clearing out the default configuration]
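
One thing worth flagging before we write anything new: the upstream and server blocks we're about to add all need to live inside the http block, and Nginx won't start without an events block. So the cleared-out file should look roughly like this minimal skeleton (a sketch; your stock FreeBSD nginx.conf may differ slightly):

worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # the upstream and server blocks from this guide go here
}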

That's better. Now let's write out our load balancing and reverse proxy configuration so we can then go through it line by line.

upstream myapi {
    server api_ip_address;
    server api2_ip_address;
}

server {
    listen 80;
    server_name testapp;

    location / {
        proxy_pass http://myapi;
    }
}

There we go; that doesn't look too bad on first inspection. There's more we'll want to add to this, but for now let's walk through what we've got.

First, let's pay attention to the upstream block. What we are doing here is listing the IP addresses of the API servers we wish to proxy to and load balance between. In our case we're just using placeholder values to illustrate the key points.

Next, let's take a look at the server block. We can see that we're listening on port 80, we have named our server "testapp", and then we have a root location block where we pass requests along to the upstream we created earlier. Nothing too fancy at all.

So technically, what we have here can work, but if we are going to launch this in a real production scenario then there are a few more additions to consider. We need more than just this if we want Nginx to operate optimally. For example, when Nginx proxies a request upstream, by default it rewrites the Host header to the name of the proxied server, so our pretend APIs would see "myapi" rather than the host the client actually requested. We can add a simple directive that forwards the client's original Host header to the upstream instead. To do this, let's make a quick change to our code.

server {
    listen 80;
    server_name testapp;

    location / {
        proxy_set_header Host $host;
        proxy_pass http://myapi;
    }
}

There, that should do it. However, we aren't done: we also need to forward some IP information to the pretend APIs, because in a production scenario our backends may want to know where their requests actually come from, for logging or rate limiting, say. Without these headers, every request would appear to originate from the Nginx server itself. So let's get this handled.

server {
    listen 80;
    server_name testapp;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://myapi;
    }
}

Excellent. Now it's worth taking a moment to discuss the load balancing algorithms we have at our disposal. The first is the default algorithm used by Nginx, called round robin, which simply distributes requests across the servers evenly, in turn. Pretty simple. Next we want to take a look at least connections, optionally weighted. With this algorithm, Nginx sends each new request to the server with the fewest active connections, helping to lighten the load if one server is becoming overloaded. The optional weight parameter makes Nginx favor a server proportionally; weight=1 is the default, so the values below simply show the syntax. If you wanted to use it, the code would look like this.

upstream myapi {
    least_conn;
    server api_ip_address weight=1;
    server api2_ip_address weight=1;
}
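
If one of our two API servers had, say, roughly twice the capacity of the other, we could reflect that in the weights (the 2:1 ratio here is just for illustration):

upstream myapi {
    least_conn;
    server api_ip_address weight=2;     # receives roughly twice the share of requests
    server api2_ip_address weight=1;
}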

Not bad. Now it's time to take a look at another algorithm called ip hash. To understand this one, we need to take into account that many backends are stateful; many applications use sessions for their authentication schemes, for example. In that case, either of the two previous algorithms would be unsuitable: if a client initiates a session with one server and Nginx then routes their next request to another server, that other server will have no clue what the session is about, and you have a problem. This is where ip hash comes in. It still load balances, but it hashes the client's IP address so that requests from a given client keep going to the same server, making their session effectively sticky. Enabling this is remarkably easy; let's write it.

upstream myapi {
    ip_hash;
    server api_ip_address;
    server api2_ip_address;
}

Again, simple to set up. Now, let's go ahead and return our configuration to the round robin algorithm.

[video: returning the configuration to the round robin algorithm]
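
Since round robin is Nginx's default behavior, that just means removing the algorithm directive from the upstream block, leaving it as we first wrote it:

upstream myapi {
    server api_ip_address;
    server api2_ip_address;
}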

That's it for configuring the reverse proxy and load balancer. Now it's time to shift our attention to using Nginx for cache duty. For basic caching, we only really need two directives to achieve the baseline of what we need. Let's see what they look like.

proxy_cache_path /var/cache/myapp levels=1:2 keys_zone=my_cache:10m max_size=10g
                 inactive=60m use_temp_path=off;

server {
    listen 80;
    server_name testapp;

    location / {
        proxy_cache my_cache;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://myapi;
    }
}

Let's take a look at proxy_cache_path first. This is where we define precisely where on disk our cached data will be stored, along with a few options worth going over. One note up front: proxy_cache_path has to sit at the http level, outside any server block. First, you can see the path where our cache will live. Next we have levels, which in this case gives us a two-level directory hierarchy under that path; large numbers of files in a single directory slow file access, so this is the setting recommended by the Nginx team themselves. Then there's the keys_zone option, which sets up a shared memory zone for cache keys and metadata so Nginx can decide whether a request is a hit or a miss without going to disk. We have given it 10 MB of key space, which should suffice in many cases; per the Nginx documentation, one megabyte holds roughly eight thousand keys.

Next we have max_size, which should be pretty easy to understand: it caps the cache at 10 GB, and once that limit is exceeded the cache manager removes the least recently used entries. Our next item, inactive, is fairly simple too: it determines how long a cached item can go without being accessed before it is removed from cache storage, 60 minutes in our case. Lastly we have use_temp_path. Left on, Nginx writes cached files to a temporary area first and then copies them into place; set to off, as the Nginx team advises in most cases, files are written directly into the cache directory and that extra copy is avoided.
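
One practical note before moving on: the cache directory should exist and be writable by the Nginx worker user (www by default for FreeBSD's Nginx package), and any configuration change needs a test and reload to take effect. Something along these lines should do it:

sudo mkdir -p /var/cache/myapp
sudo chown www:www /var/cache/myapp
sudo nginx -t                   # verify the configuration parses cleanly
sudo service nginx reload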

Lastly, let's turn our attention to the proxy_cache directive. This activates caching, using the my_cache zone we defined above, for whatever location block it sits in. In this case that's the / location, so responses proxied through it are what get stored.
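
Neither is required for the baseline we set out to build, but two directives commonly added next to proxy_cache are proxy_cache_valid, which controls how long responses with given status codes stay fresh, and an add_header using the built-in $upstream_cache_status variable so you can watch hits and misses from the client side. A sketch of how they'd slot into our location block:

location / {
    proxy_cache my_cache;
    proxy_cache_valid 200 302 10m;                      # successful responses stay fresh for 10 minutes
    proxy_cache_valid 404 1m;                           # cache not-found responses only briefly
    add_header X-Cache-Status $upstream_cache_status;   # reports HIT, MISS, EXPIRED, etc.
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://myapi;
}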

Conclusion

So we've gone through quite a bit on Nginx's proxying, load balancing, and caching abilities. Tomorrow we will go over logging and monitoring with Nginx.
