Effective PowerShell Item 8: Output Cardinality – Scalars, Collections and Empty Sets – Oh My!

In the last post, Effective PowerShell Item 7: Understanding "Output", we covered a lot about PowerShell output.  However, there is a bit more you need to understand to use PowerShell effectively.  This post concerns the cardinality of PowerShell output: when does PowerShell output a scalar versus a collection (or array) versus no output (empty set)?  In this post I use the term collection in a broad manner for various types of collections including arrays.

Working with Scalars
Working with scalars in PowerShell is straightforward.  All the examples below generate scalar values:

PS C:\> $num = 1
PS C:\> $str = "Hi"
PS C:\> $flt = [Math]::Pi
PS C:\> $proc = (get-process)[0]
PS C:\> $date = Get-Date
PS C:\> $date.GetType().Fullname
System.DateTime

However, you may be dealing with scalars when you think you are working with collections.  For instance, when you send a collection down the pipe, PowerShell will automatically "flatten" the collection, meaning that each individual element of the collection is sent down the pipe, one after the other.  For example:

PS C:\> filter GetTypeName {$_.GetType().Fullname}
PS C:\> $array = "hi",1,[Math]::Pi,$false
PS C:\> $array | GetTypeName
System.String
System.Int32
System.Double
System.Boolean

So in fact, the downstream pipeline stages do *not* operate on the original collection as a whole.  The vast majority of the time, PowerShell’s collection flattening behavior within the pipe is what you want.  Otherwise, you would wind up with code like this to manually flatten the collection:

PS C:\> $(foreach($item in $array){$item}) | GetTypeName

Note that this would require us to manually flatten every collection by inserting an extra foreach statement in the pipe.  Since pipes are typically used to operate on the elements of a sequence and not the sequence as a whole, it is very sensible that PowerShell does this flattening automatically.  However, there may be times when you need to defeat the flattening.  There’s good news and bad news on this topic.  First the bad news: technically you can’t defeat this behavior.  PowerShell always flattens collections.  The good news is that we can work around PowerShell’s flattening behavior by creating a new collection that contains just one element, our original collection.  This sounds like it would be a real pain to do, but fortunately PowerShell provides us with a nice shortcut.  For example, this is how I would modify the previous example to send an array intact down the pipe and not each element:

PS C:\> ,$array | GetTypeName
System.Object[]

The change is subtle.  Notice the comma just before $array?  That is the unary comma operator and it instructs PowerShell to wrap the object following it, whatever that object is, in a new array that contains a single element – the original object.  So PowerShell is still doing its flattening work, we just introduced another collection to get the result that we wanted.
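
The difference can also be seen by counting what a downstream stage receives. This is only a small sketch; the variable names are mine, and it relies on nothing beyond ForEach-Object and the unary comma described above:

```powershell
$array = "hi", 1, [Math]::Pi, $false

# Without the comma the pipeline enumerates $array, so four objects arrive.
$flatTypes = @($array | ForEach-Object { $_.GetType().Name })

# With the unary comma the one-element wrapper is enumerated instead,
# so the original array arrives downstream as a single object.
$wrappedTypes = @(,$array | ForEach-Object { $_.GetType().Name })

"Flattened: $($flatTypes.Count) objects; wrapped: $($wrappedTypes.Count) object of type $($wrappedTypes[0])"
```

The wrapper array is still flattened; it just happens to contain exactly one element.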

Another feature of PowerShell that is somewhat unique with respect to scalar handling is how the foreach statement handles scalars.  For example, the following script might surprise some C# developers:

PS C:\> $vars = 1
PS C:\> foreach ($var in $vars) { "`$var is $var" }
$var is 1

This is because in languages like C#, the variable $vars would have to represent a collection (IEnumerable) or you would get a compiler error.  This isn’t a problem in PowerShell because if $vars is a scalar, PowerShell will treat $vars as if it were a collection containing just that one scalar value.  Again, this is a good thing in PowerShell; otherwise, if we wrote code like this:

PS C:\> $files = Get-ChildItem *.sys
PS C:\> foreach ($file in $files) { "File is: $file" }
File is: C:\config.sys

we would need to modify it to handle the case where Get-ChildItem finds only one .SYS file.  As it is, our script code does not have to suffer the "line noise" necessary to check for scalar versus collection data shapes.  Now the astute reader may ask: what if Get-ChildItem doesn’t find *any* .SYS files?  Hold that thought for a bit.
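
As a rough illustration of the point above, the same loop body can be fed a scalar or an array without any shape check. Get-Labels is a hypothetical helper name used only for this sketch:

```powershell
# Hypothetical helper: the loop does not care whether $items is a scalar or an array.
function Get-Labels($items) {
    foreach ($item in $items) { "Item: $item" }
}

$fromScalar = @(Get-Labels 42)         # the scalar is iterated exactly once
$fromArray  = @(Get-Labels (1, 2, 3))  # each element is iterated in turn

"Scalar input produced $($fromScalar.Count) line(s); array input produced $($fromArray.Count) line(s)"
```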

Working with Collections
Working with collections in PowerShell is also straightforward.  All the examples below generate collections:

PS C:\> $nums = 1,2,3+7..20
PS C:\> $strs = "Hi", "Mom"
PS C:\> $flts = [Math]::Pi, [Math]::E
PS C:\> $procs = Get-Process

Sometimes you want to always treat the result of a command as a collection, even if it may return a single (scalar) value.  PowerShell provides a convenient operator to ensure this: the array subexpression operator, @().  Let’s look at our Get-ChildItem command again.  This time we will force the result to be a collection:

PS C:\> $files = @(Get-ChildItem *.sys)
PS C:\> $files.GetType().Fullname
System.Object[]
PS C:\> $files.length
1

In this case, only one file was found.  It is important to know whether you are dealing with a scalar or a collection because both arrays and FileInfo objects have a Length property: on an array it is the number of elements, while on a FileInfo it is the file size in bytes.  I have seen this trip up more than a few people.  Given that the unary comma operator always wraps the original object in a new array, what does the array subexpression operator do when it operates on an array?  Let’s see:

PS C:\> $array = @(1,2,3,4)
PS C:\> $array.rank
1
PS C:\> $array.length
4

As we can see, in this case the array subexpression operator has no effect.  Again the astute reader should be asking – what about the case where Get-ChildItem returns nothing?
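
To put the two operators side by side, here is a hedged sketch. The temporary file and its five-byte content are assumptions made purely so the example is self-contained; they stand in for the .SYS file above:

```powershell
$array = 1, 2, 3, 4

$sub     = @($array)   # @() on an array: still the same four elements
$wrapped = ,$array     # unary comma: a new one-element array holding $array

# The Length ambiguity with a single file (temp file is an assumption for this demo).
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, 'hello')   # 5 bytes of content
$file = Get-Item -LiteralPath $path

$sizeInBytes  = $file.Length      # FileInfo.Length: the file size
$elementCount = @($file).Length   # array Length: the number of elements

Remove-Item -LiteralPath $path

"@() length: $($sub.Length); comma length: $($wrapped.Length); file size: $sizeInBytes; element count: $elementCount"
```

If a script reads .Length expecting a count, forcing the result through @() first removes the ambiguity.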

Working with Empty Sets
OK, let’s address this issue of a command returning no output.  IMO this is a somewhat tricky area of PowerShell that you should understand in order to avoid script errors.  First let’s document a few rules:

  1. Valid output can consist of no output i.e. empty set
  2. When assigning output to a variable in PowerShell, $null is used to represent an empty result set.
  3. The foreach statement iterates a scalar value once even if that scalar is $null (which is considered a scalar value)

Seems simple right?  Well these rules combine in somewhat surprising ways that can cause problems in your scripts.  Here’s an example:

PS C:\> function GetSysFiles { }
PS C:\> foreach ($file in GetSysFiles) { "File: $file" }
PS C:\>

So far so good.  GetSysFiles has no output so the foreach statement had nothing to iterate over.  Let’s try a variation.  Let’s say for sake of argument that our function took a long argument list and we wanted to put the function invocation on its own line:

PS C:\> $files = GetSysFiles SomeReallyLongSetOfArguments
PS C:\> foreach ($file in $files) { "File: $file" }
File:

Hmm, now we got output and all we did was introduce an intermediate variable to contain the output of the function.  Honestly this violates the Principle of Least Surprise in my opinion.  Let me explain what is happening.  By using the temp variable we have invoked rule #2 – assigning to a variable results in our empty set being represented by $null in $files.  Seems reasonable so far.  Unfortunately our foreach statement abides by rule #3 so it iterates over the scalar value $null.   Now PowerShell handles references to $null quite nicely.  Notice that our string substitution above in the foreach statement didn’t error when it encountered the $null.  It just didn’t print anything for $null.  However, .NET framework methods aren’t nearly as forgiving:

PS C:\> foreach ($file in $files) { "Basename: $($file.Substring(2))" }
You cannot call a method on a null-valued expression.
At line:1 char:16
+ $file.Substring( <<<< 2)

Basename:

Bummer.  That means that you really need to be careful when using foreach to iterate over the results of something where you aren’t sure whether the results could be an empty set and your script won’t tolerate iterating over $null.  Note that using the array subexpression operator can help here but it is crucial to use it in the correct place – again an issue with the language that shouldn’t exist IMO.  For example, the following placement does *not* work:

PS C:\> foreach ($file in @($files)) { "Basename: $($file.Substring(2))" }
You cannot call a method on a null-valued expression.
At line:1 char:16
+ $file.Substring( <<<< 2)
Basename:

Since $files was already set to $null, the array subexpression operator just creates an array with a single element – $null – which foreach happily iterates over. 
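
A small sketch of why wrapping the variable is too late. It only relies on how @() treats a stored $null, not on how foreach treats a bare $null (which, as far as I know, has differed between PowerShell versions):

```powershell
$files = $null            # what an empty result looks like once stored in a variable

$wrappedLate = @($files)  # wraps the $null itself: a one-element array

$iterations = 0
foreach ($file in $wrappedLate) { $iterations++ }

"Elements: $($wrappedLate.Count); first element is null: $($null -eq $wrappedLate[0]); loop iterations: $iterations"
```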

What I recommend is to put the function call entirely within the foreach statement if the function call is terse.  The foreach statement obviously knows what to do when the function has no output.  If the function call is lengthy, then I recommend that you do it this way:

PS C:\> $files = @(GetSysFiles SomeReallyLongSetOfArguments)
PS C:\> foreach ($file in $files) { "Basename: $($file.Substring(2))" }
PS C:\>

When you apply the array subexpression operator directly to a function that has no output, you will get an empty array and not an array with a $null in it.  If you find this situation as confusing and error prone as I do, please feel free to vote on the following defect submission:

Foreach should not execute the loop body for a scalar value of $null
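
The recommended placement, with @() applied to the call itself rather than to the stored variable, can be sketched as follows. Get-Nothing is a hypothetical stand-in for any command that produces no output:

```powershell
# Hypothetical stand-in for a command that has no output.
function Get-Nothing { }

$captured      = Get-Nothing      # no output is stored as $null
$wrappedAtCall = @(Get-Nothing)   # no output becomes an empty array

$iterations = 0
foreach ($item in $wrappedAtCall) { $iterations++ }

"Captured is null: $($null -eq $captured); wrapped count: $($wrappedAtCall.Count); loop iterations: $iterations"
```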

Update 05/09/2008: Functions can return empty arrays.  You just have to use the comma operator to wrap the result in another array so that when PowerShell flattens the result, you get the original array. For example:

function ReturnArrayAlways {
    $result = @()
    # Do something here that may add 0, 1 or more elements to array $result
    # $result = 1
    # or
    # $result = 1,2
    ,$result
}
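
Here is a runnable variation of the pattern above, as a sketch only. The function name and the -Count parameter are hypothetical and exist just to produce 0, 1 or several elements on demand:

```powershell
# Hypothetical function: always returns an array, whatever the element count.
function Get-ItemsAsArray([int]$Count) {
    $result = @()
    for ($i = 1; $i -le $Count; $i++) { $result += $i }
    ,$result   # wrap so that flattening hands back the original array intact
}

$none = Get-ItemsAsArray 0
$one  = Get-ItemsAsArray 1
$many = Get-ItemsAsArray 3

"none: $($none.Count); one: $($one.Count); many: $($many.Count)"
```

A caller of such a function should not also wrap the call in @(), because that produces an array containing the returned array.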

In summary, watch out for how the foreach statement deals with the scalar value $null which can get synthesized automatically by PowerShell when a function has no output.


9 Responses to Effective PowerShell Item 8: Output Cardinality – Scalars, Collections and Empty Sets – Oh My!

  1. Unknown says:

    [what follows is a note from a J. Random PowerShell user, not Mr. Hill.  I’m posting it here ’cause Mr. Hill’s Item 8 was the top hit for the query powershell+list+flatten, so I figured this would be a good place for other people to find it.  And someone who knows more than I do might be prompted to respond with some useful information.]
     
    more fun with flattening:
     
    I had a function that wanted to return a list of lists (array of arrays).  I had some difficulty with returning a list that had one top level element: PowerShell always wanted to flatten it to a 1 level list.  Rather than create a separate code path for this (which was certainly one solution), I decided on this workaround:
     
    function a {
     . . .
        $result=@{}; $i = 0;
        $listOfLists | %{$result[++$i] = $_}
        return $result
    }
     
    $retval = a $params
    $retval = $retval[1..$retval.count]
     
    (Note: this is not a general workaround – the function in question was guaranteed to always return at least one top level item.  If the list has zero top level items, $retval will be a list of 2 empty lists)
     
    Also useful (and an example of dealing with some list edge conditions)
     
    # Coalesce-Args
    #
    # return the first argument that would be considered TRUE
    # in a boolean context
    #
    # bits of funkiness:
    #
    #     the + ,$null guards against an error when everything’s FALSE
    #
    #     the [object[]] prevents a singleton pipe result from
    #     appearing as an object rather than an array
    function Coalesce-Args
    {
        ([object[]]($args | ?{$_}) + ,$null)[0]
    }
     
    hope someone finds this helpful
     

  2. Keith says:

    It turns out there is an even easier way to do this:
     
    function ReturnArrayAlways {
        $result = @()
        # Do something here that may add 0, 1 or more elements to array $result
        # $result = 1
        # or
        # $result = 1,2
        ,$result
    }
     
    You will always get an array with 0, 1 or more elements back from this function.  I’ll be updating this post to reflect this.
     

  3. Unknown says:

    Hmmm.  I was using the ,$result construct when I had the problem.  I can’t reproduce the problem in a simple case.
     
    PS C:\> function comma2 {$a=((1,2,3),(4,5,6)); return ,$a};$b=comma2;$b.count
    2
    PS C:\> function comma {$a=,(4,5,6); return ,$a};$b=comma;$b.count
    1
    PS C:\> function comma0 {$a=@(); return ,$a};$b=comma0;$b.count
    0
     
    (had the problem occurred I would’ve seen $b.count -eq 3 for $b=comma).  I’ve since changed the function to return a list of hashes and don’t have the old code at hand.  Perhaps I was looking at a bug or glitch; perhaps I was just wedged.  If I bump into it again I’ll try to isolate it and report it.

  4. Christopher Pitts says:

    Hello Keith,

    Hope my question finds you doing well. I am new to PowerShell and have an array formatting question. Do you know why FL or FT will not format my data into a table or list?

    For instance:

    $aryItem1 = 'one'
    $aryItem2 = 'two'
    $aryItem3 = 'three'
    $aryItem4 = 'four'
    $aryItem5 = 'five'

    $myArray = "$aryItem1,$aryItem2,$aryItem3,$aryItem4,$aryItem5"

    $myArray|FL

    ----------Returns----------
    one,two,three,four,five
    ---------------------------

    If I remove the " ", it puts the data in a list output. Also, $myArray|FT returns the below without quotes around the array string of items.

    ----------Returns----------
    one,
    two,
    three,
    four,
    five
    ---------------------------

    Interestingly enough, if I say get-service|FT …. this works as expected. Seems to only work for me on cmdlets.

    Thank you for your time,
    Christopher

    • rkeithhill says:

      Format-List and Format-Table work on individual items in an array. If you pass these an array of strings, they’ll just output the string. What these formatting cmdlets are really meant for are individual objects with properties. Try this:

      C:\PS> $ht = @{Prop1='one'; Prop2='two'; Prop3='three'; Prop4='four'}
      C:\PS> $obj = new-object psobject -Property $ht
      C:\PS> $obj | Format-Table -AutoSize

      Prop4 Prop1 Prop2 Prop3
      ----- ----- ----- -----
      four  one   two   three

      C:\PS> $obj | Format-List

      Prop4 : four
      Prop1 : one
      Prop2 : two
      Prop3 : three

      $obj will automatically output itself using the table formatter because it has fewer than 5 public properties. IIRC, if there are 5 or more public properties, then PowerShell will use the list formatter by default.

  5. Pingback: Powershell: nulls, empty arrays, single-element arrays « Surrounding The Code

  6. Tim! says:

    Note that you should not use both the comma operator to force an array return and the @() array operator to force the returned value into an array.

    function Get-Foo([xml]$Manifest)
    {
        $result = $Manifest.SelectNodes("./Foo")
        return $result
    }
    function Get-FooAsArray([xml]$Manifest)
    {
        $result = $Manifest.SelectNodes("./Foo")
        return $result
    }
    $basic = Get-Foo -Manifest $m
    $wrapped = @(Get-Foo -Manifest $m)
    $array = Get-FooAsArray -Manifest $m
    $wrappedarray = @(Get-FooAsArray -Manifest $m)

    Assume SelectNodes returns $Count nodes. Then:
    $basic is an array with $Count XMLElements, or scalar $null if ($Count -eq 0)
    $wrapped is an array with $Count XMLElements
    $array is an XPathNodeList with $Count elements
    $wrappedarray is an array with one XPathNodeList with $Count elements

    • Tim! says:

      Oops — of course Get-FooAsArray should have the unary comma in the return statement.

    • rkeithhill says:

      Well, I’d argue that you should never use a return statement in a function except to exit early. 🙂 That is, this is more canonical PowerShell:

      function Get-Foo([xml]$Manifest)
      {
          $Manifest.SelectNodes('./Foo')
      }

      The "return" form is unnecessary and implies that the argument in the return statement would be the only value returned from the function. But in PowerShell any uncaptured command output becomes the output of the function.

      Regarding the use of the comma operator and the array subexpression, this *was* an issue in PowerShell V1 e.g.:

      PS> function ReturnEmpty { @() }
      PS> $res = ReturnEmpty
      PS> $res.length # notice no output here!!
      PS> function ReturnEmpty { ,@() }
      PS> $res = ReturnEmpty
      PS> $res.length
      0

      This is no longer necessary in at least PowerShell V3 Beta:

      PS> function ReturnEmpty { @() }
      PS> $res = ReturnEmpty
      PS> $res.length
      0
      PS> function ReturnEmpty { ,@() }
      PS> $res = ReturnEmpty
      PS> $res.length
      0

      Also keep in mind that using @() on an expression does not always create an array "wrapper". It will only do this for a scalar value. If the value is already an array, @() is a no-op.
