Updating the .htaccess file based on Apache log files

I am still seeing massive amounts of referral traffic hitting my site and eating up my bandwidth. I have not had time to update my .htaccess file for the last 2 days, and within the last 24 hours I have had more than 6000 hits, resulting in almost 24,000 pageviews and more than 1 GB worth of traffic. (At that rate I will reach my 10 GB limit soon.)

Looking through the Apache logs, figuring out which sites send me the most referral traffic, getting the hostnames, and transforming them into a format that the Apache rewrite engine can use in the .htaccess file has been time-consuming. So I decided that some PowerShell magic might speed up the process a bit.

function Select-FileDialog
{
	param(
		[string]$Title,
		[string]$Directory,
		[string]$Filter = "All Files (*.*)|*.*"
	)

	# Load Windows Forms so the open-file dialog is available.
	Add-Type -AssemblyName System.Windows.Forms

	$objForm = New-Object System.Windows.Forms.OpenFileDialog
	$objForm.InitialDirectory = $Directory
	$objForm.Filter = $Filter
	$objForm.Title = $Title

	# ShowDialog returns a DialogResult; "OK" means a file was chosen.
	$Show = $objForm.ShowDialog()
	If ($Show -eq "OK")
	{
		Return $objForm.FileName
	}
	Else
	{
		Write-Error "Operation cancelled by user."
	}
}

#Function to create the HTTP rewrite conditions.

Function Create-Rewrite {
	Param (
		$Hostname
	)

	# Escape the dots so the hostname is matched literally by the rewrite regex.
	$HtaRule = "RewriteCond %{HTTP_REFERER} ^http://$($Hostname.Replace('.', '\.')) [OR]"
	$script:BlockList += $HtaRule
}

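Function add-htaccess {
	Param (
		$HtaRules
	)

	# Copy the file line by line; right after the RewriteEngine line,
	# emit the new rule unless the file already contains it.
	# Relies on the script-level $htaccess and $tempFile variables.
	(Get-Content $htaccess) | ForEach-Object {
		$_
		if ($_ -match "RewriteEngine") {
			if (!(Select-String -SimpleMatch "$HtaRules" -Path $htaccess)) {
				$HtaRules
			}
		}
	} | Set-Content $tempFile
	Copy-Item $tempFile $htaccess
}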

Function Upload-Ftp {
	Param (
		[Parameter(Position=0, Mandatory=$true)]
		[ValidateNotNullOrEmpty()]
		[System.String]
		$FTPHost,

		[Parameter(Position=1)]
		[ValidateNotNull()]
		$File
	)

	$webclient = New-Object System.Net.WebClient
	$uri = New-Object System.Uri($FTPHost)

	"Uploading $File..."

	$webclient.UploadFile($uri, $File)
}

#Variables
$log = Select-FileDialog -Title "Select an Apache logfile"  
$htaccess = "c:\Temp\.htaccess"
$tempFile = [IO.Path]::GetTempFileName()
$URLCount = 15
$FTPUsername = "Username"
$FTPPassword = "PassW0rd"
 
#Create the list of sites to block
$script:BlockList = @()

#Get the list of URLs in the logfile, capturing each element into a named capturing group

$urls = Select-String '^(?<client>\S+)\s+(?<auth>\S+\s+\S+)\s+\[(?<datetime>[^]]+)\]\s+"(?:GET|POST|HEAD) (?<file>[^ ?"]+)\??(?<parameters>[^ ?"]+)? HTTP/[0-9.]+"\s+(?<status>[0-9]+)\s+(?<size>[-0-9]+)\s+"(?<referrer>[^"]*)"\s+"(?<useragent>[^"]*)"$' $log |
     Select -Expand Matches | Foreach { $_.Groups["referrer"].value } 

#Output statistics for the referer hostnames (Only show top 15)
$urls | group | ForEach -begin   { $total = 0 } `
	-process { $total += $_.Count; $_ } |Sort Count | Select Count, Name |
	Add-Member ScriptProperty Percent { "{0,15:0.00}%" -f (100*$this.Count/$Total) } -Passthru | select -Last $URLCount

#Get the base hostnames from the complete URLs, and output statistics to the screen.

$hosts = $urls | Select-String '\b[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?(?<host>[a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&()*+,;=:]+\])' |
Select -Expand Matches | Foreach { $_.Groups["host"].value }  | group | sort count |  where {($_.name -notlike "*xipher.dk*") -and ($_.Count -gt 100)} |
 ForEach -begin   { $total = 0 } `
	-process { $total += $_.Count; $_ } | Sort Count | Select Count, Name |
	Add-Member ScriptProperty Percent { "{0,10:0.00}%" -f (100*$this.Count/$Total) } -Passthru 

Write-Host "List of root hostnames"

$hosts

Foreach ($Url in $hosts) {

Create-Rewrite $url.Name
}


Foreach ($Block in $script:BlockList) {
add-htaccess $Block
}

notepad $htaccess

$script:BlockList

Upload-Ftp -FTPHost "ftp://$($FTPUsername):$($FTPPassword)@xipher.dk/httpdocs/.htaccess" -File $htaccess
Upload-Ftp -FTPHost "ftp://$($FTPUsername):$($FTPPassword)@xipher.dk/httpdocs/WordPress/.htaccess" -File $htaccess

Unfortunately my current hosting company does not allow me to download the log files via FTP; I have to connect to the Parallels interface and download them manually. (I have not had time to look into automating this part yet, so it is still a manual step.)
That is why I added a little function that uses a GUI to pick the access_log file.

function Select-FileDialog
{
	param(
		[string]$Title,
		[string]$Directory,
		[string]$Filter = "All Files (*.*)|*.*"
	)

	# Load Windows Forms so the open-file dialog is available.
	Add-Type -AssemblyName System.Windows.Forms

	$objForm = New-Object System.Windows.Forms.OpenFileDialog
	$objForm.InitialDirectory = $Directory
	$objForm.Filter = $Filter
	$objForm.Title = $Title

	# ShowDialog returns a DialogResult; "OK" means a file was chosen.
	$Show = $objForm.ShowDialog()
	If ($Show -eq "OK")
	{
		Return $objForm.FileName
	}
	Else
	{
		Write-Error "Operation cancelled by user."
	}
}

I then call the function like this:

$log = Select-FileDialog -Title "Select an Apache logfile"  

A little regex magic runs through the logfile and captures the different elements into named capturing groups. In this step I expand all the referrer URLs and put them into the $urls variable.

$urls = Select-String '^(?<client>\S+)\s+(?<auth>\S+\s+\S+)\s+\[(?<datetime>[^]]+)\]\s+"(?:GET|POST|HEAD) (?<file>[^ ?"]+)\??(?<parameters>[^ ?"]+)? HTTP/[0-9.]+"\s+(?<status>[0-9]+)\s+(?<size>[-0-9]+)\s+"(?<referrer>[^"]*)"\s+"(?<useragent>[^"]*)"$' $log |
     Select -Expand Matches | Foreach { $_.Groups["referrer"].value } 
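To make it easier to see what the pattern captures, here is a minimal sketch that runs the same regex against a single line. The log line is a made-up example in Apache "combined" format (the IP, path and referrer are assumptions, not taken from my real logs), and the $line, $pattern, $referrer and $status names exist only for this illustration.

```powershell
# Hypothetical log line in Apache "combined" format (not from the real logs).
$line = '203.0.113.7 - - [10/Oct/2011:13:55:36 +0200] "GET /index.php?p=12 HTTP/1.1" 200 2326 "http://spam.example.com/page.html" "Mozilla/5.0"'

# Same pattern as in the script, split over three lines only for readability.
$pattern = '^(?<client>\S+)\s+(?<auth>\S+\s+\S+)\s+\[(?<datetime>[^]]+)\]\s+' +
           '"(?:GET|POST|HEAD) (?<file>[^ ?"]+)\??(?<parameters>[^ ?"]+)? HTTP/[0-9.]+"\s+' +
           '(?<status>[0-9]+)\s+(?<size>[-0-9]+)\s+"(?<referrer>[^"]*)"\s+"(?<useragent>[^"]*)"$'

if ($line -match $pattern) {
    # -match fills the automatic $Matches hashtable with the named groups.
    $referrer = $Matches['referrer']
    $status   = $Matches['status']
    "Referrer: $referrer (status $status)"
}
```

Lines that do not fit the pattern (for example requests using another HTTP method) are simply skipped, which is also how Select-String behaves in the script.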

I modified a script by Joel Bennett to get some statistics as well. Since there can be thousands of referrer URLs, I output only the top 15 by default (using the $URLCount variable).

$urls | group | ForEach -begin   { $total = 0 } `
	-process { $total += $_.Count; $_ } |Sort Count | Select Count, Name |
	Add-Member ScriptProperty Percent { "{0,15:0.00}%" -f (100*$this.Count/$Total) } -Passthru | select -Last $URLCount
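The same count-and-percentage idea can be sketched with a calculated property instead of Add-Member. This is only an illustration on a made-up list of referrers; $sample, $groups and $stats are names I introduce here, and the percentage is kept as a number rather than a formatted string.

```powershell
# Hypothetical referrer list; in the script this is the content of $urls.
$sample = @(
    'http://a.example/',
    'http://a.example/',
    'http://b.example/',
    'http://a.example/'
)

$groups = $sample | Group-Object
$total  = ($groups | Measure-Object -Property Count -Sum).Sum

# Calculated property: each group's share of the total, rounded to two decimals.
$stats = $groups |
    Sort-Object Count -Descending |
    Select-Object Count, Name, @{ Name = 'Percent'; Expression = { [math]::Round(100 * $_.Count / $total, 2) } }

# Show only the top entries, like $URLCount does in the script.
$stats | Select-Object -First 15
```

Computing the total up front avoids relying on a variable that is filled while the pipeline is still running, at the cost of walking the groups twice.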

Then I loop through all the URLs and extract the base hostname, using regex again. (Here I choose to ignore all traffic from my own domain name, xipher.dk, and to look only at referral domains that have generated more than 100 hits.)

$hosts = $urls | Select-String '\b[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?(?<host>[a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&()*+,;=:]+\])' |
Select -Expand Matches | Foreach { $_.Groups["host"].value }  | group | sort count |  where {($_.name -notlike "*xipher.dk*") -and ($_.Count -gt 100)} |
 ForEach -begin   { $total = 0 } `
	-process { $total += $_.Count; $_ } | Sort Count | Select Count, Name |
	Add-Member ScriptProperty Percent { "{0,10:0.00}%" -f (100*$this.Count/$Total) } -Passthru 
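As an alternative to the regex, the hostname can also be pulled out with the .NET System.Uri class. The sketch below is not what the script does; it is a swapped-in technique shown on made-up referrer values, and $sampleUrls, $sampleHosts and $hostStats are hypothetical names. The 100-hit threshold is left out because the sample is tiny.

```powershell
# Hypothetical referrer values; "-" is what Apache logs when no referrer was sent.
$sampleUrls = @(
    'http://spam.example.com/page.html',
    'http://spam.example.com/other.html',
    'http://xipher.dk/some-post/',
    '-'
)

$sampleHosts = foreach ($url in $sampleUrls) {
    $uri = $null
    # TryCreate returns $false instead of throwing on values that are not absolute URLs.
    if ([System.Uri]::TryCreate($url, [System.UriKind]::Absolute, [ref]$uri)) {
        $uri.Host
    }
}

# Drop my own domain, then count hits per host.
$hostStats = $sampleHosts |
    Where-Object { $_ -notlike '*xipher.dk*' } |
    Group-Object |
    Sort-Object Count -Descending |
    Select-Object Count, Name

$hostStats
```

In the real script a `Where-Object { $_.Count -gt 100 }` step after Group-Object would apply the same threshold as before.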

The script expects to find a .htaccess file in c:\temp containing at least the following two lines:

RewriteEngine On
RewriteRule (.*) http://%{REMOTE_ADDR}/$ [R=301,L]
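To show roughly what ends up between those two lines, here is a small sketch that builds the conditions for two made-up hostnames. As far as I understand mod_rewrite, an [OR] flag on the very last RewriteCond before the RewriteRule can make the rule apply even when none of the conditions match, so this sketch leaves the flag off the final condition. That is an assumption worth checking against the Apache documentation, and it differs from Create-Rewrite, which adds [OR] to every line. $blockedHosts, $conds and $htaccessLines are names used only here.

```powershell
# Hypothetical host names; in the script these come from $hosts.
$blockedHosts = @('spam.example.com', 'bad-referrer.example.net')

$conds = for ($i = 0; $i -lt $blockedHosts.Count; $i++) {
    # Escape the dots so they match literally in the rewrite regex.
    $escaped = $blockedHosts[$i].Replace('.', '\.')
    # Every condition except the last one is chained with [OR].
    $flag = if ($i -lt $blockedHosts.Count - 1) { ' [OR]' } else { '' }
    "RewriteCond %{HTTP_REFERER} ^http://$escaped$flag"
}

# The two lines the script expects, with the generated conditions in between.
$htaccessLines = @('RewriteEngine On') + $conds + 'RewriteRule (.*) http://%{REMOTE_ADDR}/$ [R=301,L]'
$htaccessLines
```

Writing $htaccessLines to a scratch file with Set-Content is an easy way to inspect the result before touching the real .htaccess.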