Keith Hill's Blog
    March 02

    Nothing's Perfect Including PowerShell

    Today I needed to count the number of errors in a log file.  Pretty straightforward stuff that I would typically accomplish like so:

    5> Select-String '^\d+,Error' Messages.log | Measure-Object

    And that normally works well for me - except for today.  It turns out that this log file is big, really big - as in 600MB worth of log file!  The command above runs for quite some time and then fails ignominiously with a System.OutOfMemoryException.  Sure enough, a quick execution of "gps -id $pid" revealed that the PowerShell process was consuming 1.7 GB of private memory.  No wonder we hit an OOM exception. 
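    If you want to check those memory numbers yourself, `gps -id $pid` is the aliased form of Get-Process run against the current process, where `$PID` is PowerShell's automatic variable for its own process id. Below is a minimal sketch of a helper that pulls out just the private-memory figure. It assumes the .NET `Process.PrivateMemorySize64` property is the number you want to watch, and the function name `Get-PrivateMemoryMB` is hypothetical:

```powershell
# Hypothetical helper: report this PowerShell process's private memory in MB.
# $PID is the automatic variable holding the current process id;
# "gps -id $pid" is the aliased form of the Get-Process call below.
function Get-PrivateMemoryMB {
    $proc = Get-Process -Id $PID
    # PrivateMemorySize64 is the System.Diagnostics.Process property, in bytes.
    [math]::Round($proc.PrivateMemorySize64 / 1MB, 1)
}

# Run it before and after a big pipeline to see how much memory it held onto.
Get-PrivateMemoryMB
```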

    So back to the drawing board on how to accomplish this in PowerShell.  But first I had to do something about the memory footprint of my current PowerShell session.  In PowerShell Community Extensions we have a Collect function (which just calls [System.GC]::Collect()).  Running it brought the private memory footprint back down to ~76MB, which tells me that PowerShell's pipeline or one of the cmdlets above was hoarding memory while the command ran.  No matter.  One of the best things about PowerShell is the awesome escape hatch it provides: direct access to the .NET Framework.  Fortunately there is a simple class in the .NET Framework, System.IO.StreamReader, that lets you read a text file one line at a time, which is important when you're dealing with huge log files.  Here is the solution I came up with:

    7> $sr = new-object System.IO.StreamReader("$pwd\Messages.log")
    8> $sum = 0; while (($line = $sr.ReadLine()) -ne $null) {if ($line -match '^\d+,Error') {$sum++}}; $sum
    2702996
    9> $sr.Dispose()
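    The three commands above can be wrapped in a reusable function. This is a sketch, not a Pscx or built-in cmdlet: `Measure-MatchingLine` is a hypothetical name, and the try/finally is an addition so the file handle is released even if the loop throws. It also resolves the path through PowerShell first, for the same reason the interactive version uses "$pwd\Messages.log": .NET resolves relative paths against the process's working directory, which does not follow PowerShell's current location.

```powershell
# Hypothetical helper: count the lines in a file that match a regex,
# reading one line at a time so memory use stays flat.
function Measure-MatchingLine {
    param([string] $Path, [string] $Pattern)

    # .NET resolves relative paths against the process working directory,
    # not PowerShell's current location, so resolve the path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $count = 0
    $reader = New-Object System.IO.StreamReader -ArgumentList $fullPath
    try {
        # ReadLine() returns $null once the end of the file is reached.
        while ($null -ne ($line = $reader.ReadLine())) {
            if ($line -match $Pattern) { $count++ }
        }
    }
    finally {
        # Close the file even if something throws mid-read.
        $reader.Dispose()
    }
    $count
}
```

    Against the log from this post, the call would be `Measure-MatchingLine -Path .\Messages.log -Pattern '^\d+,Error'`.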

    I monitored the private memory usage of the PowerShell process while this script ran.  It increased by about 200KB and then didn't budge until the script finished.  No doubt this contributed to the script finishing much faster than my first attempt took to finish, err, run out of memory: 1 min 43 secs versus 7 min 16 secs.
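    If you'd rather not manage the reader yourself, PowerShell's `switch` statement has a `-File` option that feeds a file's lines through the switch clauses, and `-Regex` makes each clause a regular expression. Here is a sketch of the same count done that way, under the hypothetical function name `Measure-ErrorLineSwitch`. This variant hasn't been timed or memory-profiled against the 600MB file, so treat any comparison with the StreamReader loop as untested.

```powershell
# Hypothetical alternative: count error lines with switch -Regex -File,
# which runs each line of the file through the clauses below.
function Measure-ErrorLineSwitch {
    param([string] $Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $count = 0
    switch -Regex -File $fullPath {
        # Switch clauses run in the function's scope, so $count is updated here.
        '^\d+,Error' { $count++ }
    }
    $count
}
```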

    When it comes to reading files, another useful .NET Framework method is the static [System.IO.File]::ReadAllText(string path), which returns a single string containing the file's entire contents.  If you ever need to load the entire contents of a file into a variable for manipulation (say you need to run a regex over the whole file's contents, not just line by line), this method is a good way to go.  Just keep in mind that it holds the whole file in memory, so it's no help with a 600MB log like the one above.  I find ReadAllText() a bit easier to use in this case than Get-Content piped to Out-String.  Its other benefit is that it doesn't add an extra line terminator to the end of the string, which Out-String does.  It seems like Get-Content should have a parameter that tells it to read the entire file into a single string and output that.
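    The difference is easy to see on a small sample file. This sketch writes a throwaway file (the name `readalltext-demo.log` is made up), reads it back both ways, and runs a whole-file regex. Note the `(?m)` multiline option: without it, `^` only matches at the very start of the string. For what it's worth, PowerShell 3.0 and later added a `-Raw` switch to Get-Content that returns the file as one string, so that line assumes a version with the switch.

```powershell
# Write a small sample file with no trailing newline.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readalltext-demo.log'
$content = "1,Error,disk full`r`n2,Info,started`r`n3,Error,timeout"
[System.IO.File]::WriteAllText($path, $content)

# Whole file as one string, contents returned unchanged.
$viaReadAll = [System.IO.File]::ReadAllText($path)

# Lines joined back together; Out-String appends a final line terminator.
$viaOutString = Get-Content -LiteralPath $path | Out-String

# PowerShell 3.0+ only: Get-Content -Raw also returns the file as one string.
$viaRaw = Get-Content -LiteralPath $path -Raw

# With the whole file in one string, (?m) lets ^ match at every line start.
$errorCount = [regex]::Matches($viaReadAll, '(?m)^\d+,Error').Count

Remove-Item -LiteralPath $path
```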