Channel: SeattleIT.Net

HTTP 4xx & 5xx from Nginx / Apache Web Server Logs (Daily Email Report)


It’s a good practice to monitor your web server logs for erroneous HTTP status codes in the requests made to your web server. There are several reasons for doing this routinely, including troubleshooting web server problems, identifying web server attacks, SEO purposes, and providing an overall improved experience for the end users.

Many system administrators and developers monitor web server logs for issues, but often not as a routine practice, so it is easily neglected. This quick-and-dirty bash one-liner grabs all matching 4xx and 5xx web server log entries and sends them to you in a daily email digest for further analysis, which automates the practice of proactively monitoring web server logs.

Consider the following…

Log Format

Each entry in the web server access log has the following format:

172.16.1.12 - - [17/Mar/2012:19:54:05 -0700] "GET / HTTP/1.1" 200 7837 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
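
To see how awk splits such an entry on whitespace, the following minimal sketch prints the two fields the report filters on: field 4 (the timestamp) and field 9 (the status code). The helper name show_fields is hypothetical, and the field numbers assume the log format shown above; a different format would shift them.

```shell
# Sample entry copied from the log format above.
line='172.16.1.12 - - [17/Mar/2012:19:54:05 -0700] "GET / HTTP/1.1" 200 7837 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# show_fields (hypothetical helper): print field 4 (timestamp) and field 9 (status code).
show_fields() {
    awk '{ print $4, $9 }'
}

printf '%s\n' "$line" | show_fields
# prints: [17/Mar/2012:19:54:05 200
```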

Log Location

The web server access logs are located in:

/var/log/apache2/VHOST-access.log

Where VHOST is your virtual host(s).

The Script

date="$(date +%d/%b/%Y)"; \
for logfile in /var/log/apache2/*access.log*; \
do [ -e "$logfile" ] || continue; \
echo -e "\n${logfile##*/}\n"; \
awk -v d="$date" '$4 ~ d && $9 ~ /^[45]/' "$logfile"; \
done | mail -s "Daily 4xx/5xx Report from $(hostname)" your@email.tld

The above one-liner finds all of your individual Apache virtual host access log files and extracts all of the HTTP 4xx and 5xx entries for the current date (in the format “18/Mar/2012”).
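
The filter at the heart of the one-liner can be tried in isolation. This is a minimal sketch, assuming the log format shown earlier; the function name filter_errors and its arguments are hypothetical and not part of the original script.

```shell
# filter_errors DATE FILE (hypothetical helper):
# print entries from FILE whose timestamp (field 4) contains DATE
# (dd/Mon/yyyy) and whose status code (field 9) starts with 4 or 5.
filter_errors() {
    awk -v d="$1" '$4 ~ d && $9 ~ /^[45]/' "$2"
}

# Example usage against a single virtual host log for a fixed date:
# filter_errors "18/Mar/2012" /var/log/apache2/VHOST-access.log
```

Passing the date as an argument instead of always using today's date makes it easy to re-run the report for an earlier day.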

A variation of the above script caters to server deployments that run Nginx alongside Apache, where the server logs to two different default locations (/var/log/apache2/ and /var/log/nginx/). It performs the same operations on both sets of access log files:

date="$(date +%d/%b/%Y)"; \
for webserver in apache2 nginx; do \
echo -e "********** BEGIN Report for $webserver **********\n\n"; \
for log in /var/log/$webserver/*access.log*; do \
[ -e "$log" ] || continue; \
echo -e "\n${log##*/}\n"; \
awk -v d="$date" '$4 ~ d && $9 ~ /^[45]/' "$log"; \
done; \
echo -e "\n\n********** END Report for $webserver **********\n\n"; \
done | mail -s "Daily 4xx/5xx Report from $(hostname)" your@email.tld

Add this to cron and run it at 11:59 pm daily to receive the digest. Before deploying it, consider that large access logs might produce thousands of matching entries. In those deployments, a better approach is to compress the output and use uuencode to send it as an attachment instead.
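
As a rough sketch of the cron setup, assume the one-liner has been saved as an executable script at the hypothetical path /usr/local/bin/http-error-report.sh. A crontab entry could then look like the following. Saving it as a script rather than pasting it into the crontab avoids having to escape the % characters in the date format, which cron otherwise treats specially.

```crontab
# minute hour day-of-month month day-of-week command
# Run the report at 11:59 pm every day (script path is an assumption).
59 23 * * * /usr/local/bin/http-error-report.sh
```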

Depending on your tolerance for email size, you can add simple logic that uses wc to count the lines in the output, sending it raw via the mail utility if the count is below a chosen threshold and as a uuencode attachment otherwise.
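
The line-count logic might be sketched as follows. The threshold of 500 lines, the function names choose_delivery and send_report, and the attachment name report.txt.gz are all assumptions for illustration; the sketch also assumes the report has first been written to a file and that gzip, uuencode, and mail are installed.

```shell
# Hypothetical threshold: reports longer than this are sent as an attachment.
MAX_LINES=${MAX_LINES:-500}

# choose_delivery FILE: print "inline" or "attachment" based on the line count.
choose_delivery() {
    local lines
    lines=$(( $(wc -l < "$1") ))
    if [ "$lines" -le "$MAX_LINES" ]; then
        echo inline
    else
        echo attachment
    fi
}

# send_report FILE SUBJECT RECIPIENT: mail the report raw, or gzip it and
# attach it with uuencode when it exceeds the threshold.
send_report() {
    local report=$1 subject=$2 rcpt=$3
    if [ "$(choose_delivery "$report")" = inline ]; then
        mail -s "$subject" "$rcpt" < "$report"
    else
        gzip -c "$report" | uuencode report.txt.gz | mail -s "$subject" "$rcpt"
    fi
}

# Example usage:
# send_report /tmp/report.txt "Daily 4xx/5xx Report from $(hostname)" your@email.tld
```

Using a single threshold keeps the decision unambiguous: every report is either at or below the limit, or above it.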


For those that are fairly new to web server administration or HTTP in general, here are some of the most common HTTP server and client error codes you will encounter:

HTTP 5XX

HTTP error codes beginning with 5 indicate a problem with the server. Most commonly these fire in scenarios such as:

500 Internal Server Error

  • A server misconfiguration prevents the request from being processed

502 Bad Gateway

  • Can indicate that servers in your load-balanced pool were down at the time of the request

503 Service Unavailable

  • May indicate that there isn’t enough capacity on the back-end servers or back-end server threads to handle the traffic

HTTP 4XX

These HTTP codes relate to problems with the client request. The most common scenarios are:

401 Unauthorized

  • The client attempted to access a resource that requires authentication
  • A large number of HTTP 401s in your server logs can indicate a brute-force attempt against resources protected by HTTP authentication
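
One way to spot such a pattern is to count 401 responses per client address. This is a minimal sketch, assuming the log format shown earlier (client IP in field 1, status code in field 9); the function name top_401_clients is hypothetical.

```shell
# top_401_clients FILE [N] (hypothetical helper):
# list the N client IPs (default 10) with the most 401 responses, busiest first.
top_401_clients() {
    awk '$9 == 401 { print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example usage:
# top_401_clients /var/log/apache2/VHOST-access.log 5
```

Each output line is a count followed by the client IP. A single address with a very high count is worth a closer look.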

403 Forbidden

  • The client attempted to access a resource that the server’s configuration denies permission to
  • Can be caused by file permissions, an htaccess directive, a client attempting to view directory indexes, and so forth

404 Not Found

  • Files no longer exist or were moved
  • Broken links or paths, best remedied with 301 or 302 redirects to the new location

444 No Response (Nginx specific)

  • This is a special, non-standard Nginx response code that closes the connection without sending a response to the client
  • Useful for protecting web server assets and mitigating requests from known-bad user agents, and has several other advantages
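
To see which of the codes above actually occur in a given log, a per-code tally can help. This is a minimal sketch, assuming the log format shown earlier with the status code in field 9; the function name status_summary is hypothetical.

```shell
# status_summary FILE (hypothetical helper):
# count entries per 4xx/5xx status code and print "code count", sorted by code.
status_summary() {
    awk '$9 ~ /^[45][0-9][0-9]$/ { count[$9]++ }
         END { for (code in count) print code, count[code] }' "$1" | sort
}

# Example usage:
# status_summary /var/log/nginx/access.log
```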


 

