Put my backup files back

In this article I show how regular expressions can help us find SQL backup files. We'll use maintenancesolution.sql generated files to retrieve backup information. Beware the power of Objects in the pipeline!

Case

We use Ola Hallengren's magnificent maintenance solution to create sql backups. These stored procedures create neat directory hierarchies to store the files per instance, database and backup type.
There are different ways to inspect our backup history, like:

Those methods conceivably query the sql instance from which we need the information. But let's assume we are at the other end, meaning we have a file directory where we see a number of files we recognize as backup files because of the distinct naming convention. Maybe we have a junior DBA who made a little mistake while copying or moving our backups.
We have no idea, however, how many files there are, which SQL Instances they originate from, how old they are,...

Folders

Wouldn't it be nice to have some PowerShell magic to find our backup files back?
Here it is:

Backups here, backups there, backups everywhere

Listing the files

We have Get-ChildItem to collect files. But suppose there are dozens of backup files in dozens of directories, mingled with thousands of pictures, text documents, spread sheets, and so on. How do we select only those files we want?
We will check the object returned with Get-ChildItem | Get-Member -MemberType Properties.
We get 2 different object types back! There are both 'system.IO.DirectoryInfo' and 'system.IO.FileInfo' in our collection.
For our purpose, we are looking for files only, so we immediately know one optimization: Get-ChildItem -File will ignore the directories.
Second, because our share has subdirectories, we'll need to search in there too, so Get-ChildItem -File -Recurse is our next addition.
But how about the real filtering challenge, leaving only maintenancesolution's backup files?

We know our backup files are consistently named like 'ServerName_DatabaseName_BackupType_Date_time.ext'. We could try to put some wildcards in the -Filter parameter, like '*_*_*_????????_??????.*' but that's not an efficient method and will not likely return the desired collection. Let's turn to Where-Object and some regexp:

Get-ChildItem -File -Recurse |
 Where-Object {$_.name -match '\w*_\w*_(FULL|DIFF|LOG)_\d{8}_\d{6}\.(bak|trn)'}

Let's split this regexp in parts:

expression meaning in backup file example
\w* any number of word characters ServerName 'SQL1'
_ an underscore _ '_'
\w* any number of word characters DatabaseName 'secretDB'
_ an underscore _ '_'
(FULL|DIFF|LOG) one of those sequences BackupType 'FULL'
_ an underscore _ '_'
\d{8} 8 digits Date '20170814'
_ an underscore _ '_'
\d{6} 6 digits Time '233000'
\. a full stop (period, dot) . '.'
(bak|trn) one of those sequences bak 'bak'

If you use 3rd party software, the extension can also be 'sqb' or 'safe'.

Pulling out the properties

Split the string

At first sight, this seems easy: by splitting the file name on every '_', we get all the desired parts.

$Instance,$database,$BUtype,$BUdate,$BUtime = $filename.split('.')[0].split('_')

But if the instance or database name contain underscores, this system falls apart. There are always limitations when working with strings. Nevertheless, let's try something.
First because I have a situation that is resolvable, albeit with some extra logic. And second, because I want to demonstrate some more advanced regexp :-)

Named captures

When a file matches the expression above, the $Matches variable contains 3 matches: Name Value
2 bak
1 full
0 sql1_secretDB_FULL_20170814_233000.bak

Match 0 has the entire matching part, in this case the whole file name. The other two are submatches because they were defined as a group: (FULL|DIFF|LOG) and (bak|trn).
We need more submatches, so we have to define them as groups as well:

Get-ChildItem -File -Recurse |
 Where-Object {$_.name -match '(\w*)_(\w*)_(FULL|DIFF|LOG)_(\d{8})_(\d{6})\.(bak|trn)'}

Now, $Matches returns this:

groups

Just by adding some () we created all the groups we want. We could move on to the next step, but let's make it a bit more user friendly:

Get-ChildItem -File -Recurse |
 Where-Object {$_.name -match '(?<sqlinstance>\w*)_(?<database>\w*)_(?<BUType>FULL|DIFF|LOG)_(?<BUDate>\d{8})_(?<BUTime>\d{6})\.(?<ext>bak|trn)'}

And we get this:

namedcaptures

By putting ?<groupname> right after the opening parenthesis '(', we have given each submatch a name.

The underscore problem

We mentioned the risk of having underscores in a Database name. As a possible solution, we can slightly alter the expression. If I'm sure no SQL Instance in my realm ever had an underscore in it's name, I replace the first \w* by [^_]*, meaning a sequence of characters that are not an underscore. This will match with the beginning of the file name up to the first underscore. Because of the rest of the expression, the part between sqlinstance and backup type will be assigned to the 'database' submatch.

NameWithUnderscore

Returning Objects

By sending the retrieved files to Foreach-Object, we can add the submatches as new properties to the FileInfo objects. We also assign the resulting collection to a variable $MyBUFiles:

$MyBUfiles = Get-ChildItem \\SomeFileServer\SomeShare -File -Recurse |
 Where-Object {$_.name -match '(?<sqlinstance>\w*)_(?<database>\w*)_(?<BUType>FULL|DIFF|LOG)_(?<BUDate>\d{8})_(?<BUTime>\d{6})\.(?<ext>bak|trn)'} |
 ForEach-Object {
    Add-Member -InputObject $_ -NotePropertyName SQLInstance -NotePropertyValue $Matches.sqlinstance -PassThru
    Add-Member -InputObject $_ -NotePropertyName Database -NotePropertyValue $Matches.database -PassThru
    Add-Member -InputObject $_ -NotePropertyName BUType -NotePropertyValue $Matches.BUType -PassThru
    Add-Member -InputObject $_ -NotePropertyName BUDate -NotePropertyValue $Matches.BUDate -PassThru
    Add-Member -InputObject $_ -NotePropertyName BUTime -NotePropertyValue $Matches.BUTime -PassThru
    }

!Acknowledgement !: we could do better with the time and date, I know. If we put some extra manipulation to make real dates of the substring '20170814', we could use it to do some date calculations. I won't do that this time because this article is long enough as is. Please search around for some casting techniques if you are ready for that :-)

Counting, Sorting, Filtering

Yes, now it's time to decide what we want to do with the information we get back.
We can send it to Out-Gridview, or to another Where-Object. Possibly we want to save it to a Export-Csv.

$redundant = $MyBUFiles | Where-Object {$_.sqlinstance -notmatch "prod"}
$JobSaver = $MyBUFiles.Where({$_.Database -match "LostDB"})

Indeed, we can now use our new created properties to filter, sort or group the files further!

Moving, Deleting, Copying

Once we have the collection of backup files, and a strategy on how to handle them, we can feed each object to a Copy-Item or Remove-Item or both,...

A very important feature of the strategy we used, is that we still have FileInfo objects. We can't stress that enough: this is a key part, the strength of PowerShell!

If you have followed along and populated your own $MyBUFiles, very carefully, try:

$MyBUFiles | Where-Object {$_.lastwritetime -lt (get-date).AddDays(-14)} | Remove-Item -WhatIf

or, if you have a recent PowerShell version:

$MyBUFiles.Where({$_.LastWriteTime -lt (Get-Date).AddDays(-14)}) | Remove-Item -WhatIf

Summary

Hopefully you learnt from this article:

  1. we can use regexp to pull information from strings.
  2. we can add submatches as properties to an existing object !!!
  3. we can keep objects while sending them through the pipeline, and use that to our advantage
  4. We have to be very careful and act responsible, because PowerShell is powerful

To be continued

Of course, there are some issues that could come in our way. If your naming convention is different, you'll have to invent another cunning expression. The success of this method depends on both the quality of the regular expression you can compose, and the extent of reliablility of the assumptions you make. If your junior DBA also made a little mistake that renamed the files, it becomes another story.

Previous Post Next Post