How to prevent forwarding to the web server back-end port when using Varnish

Depending on your web server and Varnish configurations, you may find that URLs that do not end in a trailing slash get redirected to Varnish's back-end port:

http://example.com/about --> http://example.com:8080/about/

If your firewall blocks direct access to the back-end port (and hopefully it does), the redirected request will end in a timeout error.

The solution to this is to put your web service back on port 80, rather than 8080 or something else, and configure Varnish to talk to it on the loopback address 127.0.0.1 on port 80 (not localhost or the server's IP address):

backend default {
.host = "127.0.0.1";
.port = "80";
}

If using Apache, set your ports.conf to listen on port 80 again, and change any port references in your enabled site config files. Then restart Varnish and Apache.
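To track down leftover references to the old back-end port, something like the following may help. It is only a sketch: the helper name find_port_refs is made up for illustration, and the demonstration runs against a throwaway directory rather than a real Apache configuration. On a Debian-style layout (an assumption) you would point it at /etc/apache2/ports.conf and /etc/apache2/sites-enabled instead.

```shell
# Hypothetical helper: list every line that mentions the given port as a whole word.
# $1 is the port number; the remaining arguments are files or directories to search.
find_port_refs() {
    local port=$1
    shift
    grep -rnw "$port" "$@"
}

# Throwaway stand-in for an Apache config directory, created only for this demo.
confdir=$(mktemp -d)
printf 'Listen 8080\n' > "$confdir/ports.conf"
printf '<VirtualHost *:8080>\n</VirtualHost>\n' > "$confdir/example.conf"

# Prints the Listen line and the VirtualHost line, each with its file name and line number.
find_port_refs 8080 "$confdir"
```

Any line this turns up is a candidate for changing back to port 80 before you restart the services.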

Get your Drupal site crawled by Google more efficiently with the XML Sitemap module

Google does a frighteningly good job of finding all the stuff you add to your site, but even it will miss some of the darker corners of your online properties.  The best way to ensure that the pages you WANT to have crawled are indeed crawled is via a Google Sitemap.

There are lots of solutions to this - you can roll your own, use external crawlers, or install standalone scripts that will do this for you.  If you're using the Drupal platform, there are a number of projects out there you can use.  One of the easiest to set up is (logically) called XML Sitemap.  Installation is a quick thing:

Get a daily server overview via email with logwatch

Keeping up with activity on your Linux server can be a full-time job if you allow it to be (as well it should be, it can easily be argued).  There are many services that can ease the burden by providing you with a detailed overview of what's going on at the moment (things like Monit, Munin, Cloudpassage, and a host of others).  

Something that can augment this nicely is the 30,000-foot view that logwatch provides.

Logwatch is a series of Perl scripts packaged together that will detect, parse, and summarize a wide variety of log file types, including Apache, sshd, postfix, iptables, and a host of others.  Simply tell it who to email reports to, and schedule it to run at least once a day to receive a regular email digest of server activity you can quickly scan for issues.
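As a rough sketch of that setup, the snippet below builds a crontab entry that mails yesterday's summary every morning at 06:00. The recipient address, the schedule, and the /usr/sbin/logwatch path are assumptions chosen for illustration, and some distributions already install a daily cron script for logwatch, so check before adding your own.

```shell
# Hypothetical recipient address; replace it with your own.
recipient="admin@example.com"

# Crontab entry: at 06:00 each day, mail a low-detail summary of yesterday's logs.
# The logwatch path is an assumption and may differ on your system.
cron_line="0 6 * * * /usr/sbin/logwatch --output mail --mailto $recipient --range yesterday --detail low"

# Print the entry so it can be reviewed before adding it with crontab -e.
echo "$cron_line"
```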

Chaining and piping at the command line

At the command line, you will find that you frequently need to run multiple commands to accomplish your end goal. It’s important, therefore, to understand the various options you have to link commands together:

Consecutively

The simplest way to run multiple commands is to connect them with semicolons:

command1 ; command2 ; command3

Doing this will run command1, then command2, then command3, in that order. Each command waits for the previous one to complete before starting, whether or not that previous command succeeded.
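A small sketch of that behavior, using a made-up function name (run_all) and the built-in false command to stand in for a step that fails:

```shell
# Hypothetical demo: three commands chained with semicolons.
# "false" always fails, yet the command after it still runs.
run_all() {
    echo "one" ; false ; echo "three"
}

# Prints "one" and then "three"; the failure in the middle does not stop the chain.
run_all
```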

AND-ed

Alternatively, you can connect commands with two ampersands between each, which adds a logic component to the operation by AND-ing them together. This runs the commands consecutively, but also pays attention to how each command exits, and only proceeds to the next one if the previous one succeeded. So, for example:
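One possible illustration, run in a scratch directory created just for the purpose (the directory and file names are made up):

```shell
# Hypothetical scratch directory, created only for this demonstration.
workdir=$(mktemp -d)

# mkdir succeeds, so touch runs; touch succeeds, so ls runs and lists index.html.
mkdir "$workdir/site" && touch "$workdir/site/index.html" && ls "$workdir/site"

# mkdir fails here because the directory already exists, so echo never runs.
mkdir "$workdir/site" 2>/dev/null && echo "this line is never printed"
```

The first chain runs all the way through, while the second stops at the failed mkdir.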

Edge Side Includes with Varnish Cache

Overview

When caching content with Varnish, you can set different cache times for different types of content or URL patterns, which gives you some pretty good control over how quickly content is refreshed on your site (especially when combined with a good cache invalidation, or in Varnish nomenclature, purge, strategy).

With Varnish's implementation of ESI, you can make this caching strategy even more granular by caching different pieces of the same page for differing amounts of time. ESI stands for Edge Side Include, and is a standardized way to include cached content within other cached content at the proxy level.

Just as server-side includes are injected into the page by the web server, ESI content is injected as it passes through the reverse proxy service - in this case, the Varnish cache. Other proxy services, such as Mongrel, Squid, and Akamai, have also implemented versions of this standard.

Scheduled DB backup with automysqlbackup

If you are running your own MySQL server, setting up a good backup routine is an inescapable responsibility. If you’re not running your own server, check with your hosting provider to verify that they are indeed backing your data up. If your hosting provider does not, and does not at least provide you with a way to roll your own database backups, consider switching to one that does.

A handy utility that we like to use to back up databases is AutoMySQLBackup, a mature and very effective SourceForge project which manages the backups as well as their compression and organization. It will retain daily, weekly and monthly backups of some or all of your databases on your file system. Couple this with something to move the database backups offsite or an effective server imaging schedule, and you’ve got a pretty good MySQL backup process.

Prepping Varnish Cache

When we need to scale the amount of traffic a server can handle, we frequently turn to Varnish Cache for the job. It is a reverse proxy cache, meaning that it usually sits on the same server as your web service, between it and the outside world.

Varnish will take care of caching web content (markup, stylesheets, images - anything you see in the network inspector of your web browser), and then determine whether it needs to bother your web server about subsequent requests or just pull their contents from the cache. As such, a well-tuned Varnish configuration will reduce the amount of work that your web and database services need to do.

It’s stinkin’ fast as well - your configuration is compiled directly into C, and the cache can be configured to live in a reserved block of RAM.

Stopping email spoofing with SPF

SPF, or Sender Policy Framework, is a standard that has been implemented in order to try to prevent or reduce email address spoofing.

Email Spoofing

At my workplace, we've managed email for our clients for years, initially self-hosted, and later hosted at Rackspace's excellent Email and Apps group.

In the early to mid 2000s, we were suddenly faced with a rapid upswing in complaints from people receiving emails from themselves or other people at their company, when the apparent senders very clearly hadn't actually done the sending. Usually, their first thought was that they had been compromised - a virus on their own system or their mail credentials stolen. What ended up being the case was that their email was being spoofed.

Watching files with tail and less

There are many instances in which you may want to watch a file for new writes. In my experience, I’m usually checking for particular activity in an Apache or nginx log file. Frequently the gut-check question of “is my service being hammered?” can be (at least initially) answered with a quick visual sweep of the frequency and types of writes to a log file.

At the command line, there are two common ways of doing this - one using tail, and one using less.

Tail

By default, tail will show you the last 10 lines of a file, and then exit.  Now, it may be that’s all you really need to check -- perhaps all you're looking to vary is how far back you’re looking in the file.  You can change the number of lines tail outputs by using the -n option.

For instance, to show just the last two lines of an nginx log file, use:
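A minimal sketch of that command follows. It runs against a temporary stand-in file so it can be tried anywhere; on a real server you would pass the path to your nginx log instead (often /var/log/nginx/access.log, though that location is an assumption about your setup).

```shell
# Hypothetical stand-in for an nginx access log, so the example can run anywhere.
log=$(mktemp)
printf 'request 1\nrequest 2\nrequest 3\nrequest 4\n' > "$log"

# Show only the last two lines of the file: "request 3" and "request 4".
tail -n 2 "$log"
```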
