Do you have any useful awk and grep scripts for parsing apache logs? [closed]

You can do pretty much anything with Apache log files using awk alone. They are basically whitespace separated, so you can pretend the quotes don't exist and access whatever information you are interested in by column number. The only time this breaks down is if you have the combined log format and are interested in user agents, at which point you have to use quotes (") as the separator and run a separate awk command (a sketch of that follows below). The following will show you the IP of every user who requests the index page, along with a count of their hits:

awk -F'[ "]+' '$7 == "/" { ipcount[$1]++ }
    END { for (i in ipcount) {
        printf "%15s - %d\n", i, ipcount[i] } }' logfile.log

$7 is the requested URL. You can add whatever conditions you want at the beginning: replace the $7 == "/" test with whatever you want to match on.
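
For the user-agent case mentioned above, here is a minimal sketch, assuming the combined log format, in which the user agent is the sixth quote-delimited field:

# count requests per user agent, most common first
awk -F\" '{ useragents[$6]++ }
    END { for (u in useragents) {
        printf "%6d %s\n", useragents[u], u } }' logfile.log | sort -rn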

If you replace the $1 in ipcount[$1]++, you can group the results by other criteria: using $7 would show what pages were accessed and how often. Of course, you would then want to change the condition at the beginning. The following would show what pages were accessed by a user from a specific IP:

awk -F'[ "]+' '$1 == "1.2.3.4" { pagecount[$7]++ }
    END { for (i in pagecount) {
        printf "%15s - %d\n", i, pagecount[i] } }' logfile.log

You can also pipe the output through sort to get the results in order, either as part of the shell command, or in the awk script itself (note that inside awk the command name must be a quoted string):

awk -F'[ "]+' '$7 == "/" { ipcount[$1]++ }
    END { for (i in ipcount) {
        printf "%15s - %d\n", i, ipcount[i] | "sort" } }' logfile.log
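
If you would rather sort in the shell instead, a sketch of that variant, sorting on the hit count (the third column of the printf output):

awk -F'[ "]+' '$7 == "/" { ipcount[$1]++ }
    END { for (i in ipcount) {
        printf "%15s - %d\n", i, ipcount[i] } }' logfile.log | sort -rn -k3

Here sort -rn -k3 puts the busiest IPs first.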

Sorting inside the awk script would be useful if you decided to expand it to print out other information. It's all a matter of what you want to find out. These should serve as a starting point for whatever you are interested in.
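
As one example of that kind of expansion, here is a sketch that also totals bytes transferred per IP. It assumes the field layout described above, where $10 is the response size; a "-" size simply counts as zero in awk's numeric context:

# per-IP hit count and total bytes served, sorted by IP
awk -F'[ "]+' '{ ipcount[$1]++; bytes[$1] += $10 }
    END { for (i in ipcount) {
        printf "%15s - %d hits, %d bytes\n", i, ipcount[i], bytes[i] | "sort" } }' logfile.log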
