Recently, while working with a customer, it came time to test a script and process to ensure a successful recovery in the event of a catastrophic failure. The basic requirement of the script is to automate the download of multiple files from a cloud storage provider (e.g. Azure, AWS, or Google), since this cannot be done from the provider's existing UI.
First, some background. A customer has a product implemented to back up their AD environment. A specific requirement of this deployment is that the backups also be copied to an immutable, off-site storage destination (i.e. in the cloud). The backup solution is in place, a script to copy data to immutable storage has been implemented, and a recovery test has been completed using on-prem backup data. But what happens if the on-prem data no longer exists, or can no longer be trusted?
This brings us to the resulting script. The customer needed a way to download, or retrieve, the data from immutable storage. Enter PowerShell and a script to do just that from the chosen cloud storage provider. The script works, yet performance is underwhelming: the entire process takes hours because the script is completely linear, downloading one file at a time. Functional, not terrible, just time-consuming.
# Assumes $ConfigsALL, $ConfigPath, and $TargetDate are already defined.
$chunkSize = 10   # Blobs handed to each background job (example value; tune as needed).
$maxJobs   = 5    # Maximum concurrent download jobs (example value; tune as needed).

# Keep only blobs modified on or after the target date, newest first.
$Configs = $ConfigsALL | Where-Object { $_.LastModified -ge $TargetDate } |
    Sort-Object -Property LastModified -Descending

$Jobs = @()
$ScriptBlock = {
    $Chunk      = $args[0]
    $ConfigPath = $args[1]
    $Chunk | Get-AzStorageBlobContent -Destination $ConfigPath -Force
}
Write-Output "Files to download: $($Configs.Count)"
# Process the file list in chunks.
for ($i = 0; $i -lt $Configs.Count; $i += $chunkSize)
{
    Write-Host $i
    if ($Configs.Count - $i -le $chunkSize)
        { $c = $Configs.Count - $i } else { $c = $chunkSize }
    $c-- # Array is 0-indexed.
    # Spin up a job for this chunk.
    $Jobs += Start-Job -ScriptBlock $ScriptBlock -ArgumentList ($Configs[$i..($i + $c)], $ConfigPath)
    $running = @($Jobs | Where-Object { $_.State -eq 'Running' })
    # Throttle: wait for a job to finish before starting more.
    while ($running.Count -ge $maxJobs) {
        $null = Wait-Job -Job $Jobs -Any
        $running = @($Jobs | Where-Object { $_.State -eq 'Running' })
    }
}
# Wait for the remaining jobs and collect their output.
Wait-Job -Job $Jobs > $null
$Jobs | Receive-Job
This snippet resolves the issue. The short version: get the list of files from the cloud provider ($ConfigsALL, retrieved earlier, holds every config currently in the provider). Select from that list only the files within the specified date range, then sort them from newest to oldest so that the newest data is downloaded as soon as possible.
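For completeness, $ConfigsALL can be populated with the Az.Storage cmdlets. A minimal sketch, assuming a hypothetical storage account name, account key variable, and container name (not taken from the original script):

```powershell
# Build a storage context for the account holding the immutable backups.
# 'mybackupacct', $accountKey, and 'ad-backups' are placeholder values.
$ctx = New-AzStorageContext -StorageAccountName 'mybackupacct' -StorageAccountKey $accountKey

# List every blob in the container; each blob object exposes Name and LastModified.
$ConfigsALL = Get-AzStorageBlob -Container 'ad-backups' -Context $ctx

# Example date cutoff: everything modified in the last 30 days, newest first.
$TargetDate = (Get-Date).AddDays(-30)
$Configs = $ConfigsALL | Where-Object { $_.LastModified -ge $TargetDate } |
    Sort-Object -Property LastModified -Descending
```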
Once this is done, loop through the list of files and start download jobs, up to a specified maximum of concurrent jobs. Check the list of running downloads and start more as running jobs finish, again up to the maximum, until the entire list of specified files has been downloaded.
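As an aside, on PowerShell 7 and later the same throttled fan-out can be expressed more compactly with ForEach-Object -Parallel, which manages the concurrent runspaces for you. A sketch, assuming the same $Configs and $ConfigPath variables as the script above:

```powershell
# Download up to 5 blobs at a time; $using: passes the local
# $ConfigPath variable into each parallel runspace.
$Configs | ForEach-Object -Parallel {
    $_ | Get-AzStorageBlobContent -Destination $using:ConfigPath -Force
} -ThrottleLimit 5
```

This avoids the manual chunking and Wait-Job throttling loop entirely, at the cost of requiring PowerShell 7+ rather than Windows PowerShell 5.1.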