The array anti-pattern
Introduction
I recently saw someone share a code snippet on LinkedIn which was meant to demonstrate PHP's list syntax. It was a trivial example, something like this:
function getTotals(int ...$numbers): array
{
    $min = min($numbers);
    $max = max($numbers);
    $total = array_sum($numbers);
    return [$min, $max, $total];
}
[$min, $max, $total] = getTotals(1, 2, 3);
Nifty, right? Well, list syntax is a nice feature at times, for sure. In modern versions of PHP you can even use it with keys:
['foo' => $a, 'bar' => $b] = ['bar' => true, 'foo' => false];
// $a = false, $b = true
But what really stood out to me was the use of array as the return type for this function. And I left a comment on the post saying this:
An array can be anything, any length, any collection of values of any mixed types, so we really want to avoid returning and type hinting this catch-all structure if we can. Return an object of a specific type with public read-only properties of integer types and give it a meaningful name and structure, IntegerAggregate or something in this example.
Now of course we've all done it. I could go through code I've written and mature libraries I use, and find you hundreds of examples of functions or methods either accepting parameters type hinted as array or returning array.
But using array as a type hint is (usually) an anti-pattern. It's something we should avoid doing where we have a viable alternative.
Most of the time we do it, we're doing it out of laziness or poorly thought-through design. We're building a sub-optimal solution and introducing technical debt where we have better options.
What's an anti-pattern?
There are two basic criteria for something we do to be considered an anti-pattern:
- It's a commonly-used process, structure or pattern of action that, despite initially appearing to be an appropriate and effective response to a problem, has more bad consequences than good ones.
- Another solution exists to the problem the anti-pattern is attempting to address. This solution is documented, repeatable, and proven to be effective where the anti-pattern is not.
The problem
The basic problem with using array as a type hint is that arrays are a very broad type in PHP. They represent multiple, disparate data structures; we do not, for example, have different types for dictionaries versus lists (notably, in respect of this limitation PHP 8.1 introduced one of the most idiomatic PHP things you'll ever see, an array_is_list function).
What this means is that when we type hint it, we and the rest of our code never really know what we're going to get. We don't know what length the array will be, we don't know whether it's a list or a map, we don't know what the types of any values will be, or even if they're scalars, objects or other arrays. We're running blind. This means either writing code to explicitly validate the structure of the array we expect to receive, or assuming we'll have that structure, crossing our fingers and hoping for the best until something inevitably breaks.
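To make the "explicitly validate the structure" option concrete, here is a minimal sketch of what a consumer of the opening example's array would have to check by hand. The function name describeTotals and the exception messages are hypothetical, made up for illustration; array_is_list requires PHP 8.1 or later.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical consumer of the [min, max, total] array from the opening example.
 * The `array` type hint guarantees none of these checks, so we repeat them by hand.
 */
function describeTotals(array $totals): string
{
    // Is it a list (keys 0..n-1) rather than a map?
    if (!array_is_list($totals)) {
        throw new InvalidArgumentException('Expected a list, got a map.');
    }

    // Does it have exactly the three elements we assume?
    if (count($totals) !== 3) {
        throw new InvalidArgumentException('Expected exactly 3 values.');
    }

    // Is every value actually an integer?
    foreach ($totals as $value) {
        if (!is_int($value)) {
            throw new InvalidArgumentException('Expected integers only.');
        }
    }

    [$min, $max, $total] = $totals;

    return "min=$min max=$max total=$total";
}
```

Every function that accepts this array would need some version of these checks, or would have to trust that some other function already ran them.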
We see this very commonly in larger code bases with multiple authors (and in smaller ones, even with a single author). A specific array structure is assumed, then something is added or changed, or the code in a function doesn't account for some edge case or error condition. Suddenly something, somewhere is trying to access an array key which doesn't exist, or assumes the wrong data type for a value, and so on. Often these errors only occur under specific circumstances and aren't picked up until something happens in production.
These are our bad consequences.
"So just use static analysis..."
It's true the shortcomings of using array can be overcome by docblocks which can be read and understood by tools like Psalm and PHPStan. For example in the opening example, we can add a comment like @return int[] or @return array<int, int>.
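As a sketch of how far a docblock can go, the opening example could be annotated with the array shape notation that both PHPStan and Psalm understand. The exact annotation below is my own illustration rather than anything from the original LinkedIn post, and the engine itself ignores it entirely: at runtime the return type is still just array.

```php
<?php

declare(strict_types=1);

/**
 * @param int ...$numbers At least one integer.
 *
 * @return array{0: int, 1: int, 2: int} The min, max and total, in that order.
 */
function getTotals(int ...$numbers): array
{
    return [min($numbers), max($numbers), array_sum($numbers)];
}
```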
But this is not always an adequate solution:
- There is no single official tool or specification for PHP static analysis, so we can't necessarily rely on any particular docblock syntax.
- We now need to assume our IDE and SA tools are properly configured to the appropriate level of strictness, always integrated and always checking our code whenever we make a change. Maybe mine are, maybe another team member's aren't. Problems might be caught in a build pipeline if you have an optimal devops flow, but by then we've wasted time.
- An author might forget to docblock a portion of code. Such a problem might not be caught depending on SA configuration, or we might rely on it being caught in a peer review. Either way we're now relying on processes which might be prone to error.
Are you saying never type hint array?
No. What I'm saying is it's often the case that where you're type hinting an array, either as a parameter or return, an object will be a better alternative. We're on PHP 8.2 now, and with constructor property promotion (PHP 8.0) and readonly properties (PHP 8.1) it's trivial to write something like:
class IntegerAggregate
{
    public function __construct(public readonly int $min, public readonly int $max, public readonly int $total)
    {
    }
}
This is barely one line of code. The (previously valid) excuse that there's too much boilerplate and overhead in a class for accessing a few numbers no longer holds water. But look, the difference this makes to that trivial example from LinkedIn is stark:
function getTotals(int ...$numbers): IntegerAggregate
{
    $min = min($numbers);
    $max = max($numbers);
    $total = array_sum($numbers);
    return new IntegerAggregate($min, $max, $total);
}
// $aggregate = getTotals(1, 2, 3);
// echo $aggregate->min;
Your IDE, and more importantly, the PHP engine itself, now knows exactly what you're getting and will prevent unexpected, unpleasant surprises. If we have an IntegerAggregate object, we know we will always have a min property and it will always be an integer. It's a type-safe solution.
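A short sketch of what "the engine itself knows" means in practice: the class is repeated here so the snippet stands alone, and the values are arbitrary. Both failure cases are rejected by PHP at runtime, with no static analysis involved.

```php
<?php

declare(strict_types=1);

class IntegerAggregate
{
    public function __construct(
        public readonly int $min,
        public readonly int $max,
        public readonly int $total,
    ) {
    }
}

$aggregate = new IntegerAggregate(1, 3, 6);

// Reading a declared property always yields an int.
echo $aggregate->min, PHP_EOL;

// Writing to a readonly property from outside the class throws an Error.
$writeRejected = false;
try {
    $aggregate->min = 100;
} catch (Error $e) {
    $writeRejected = true;
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}

// Passing the wrong type throws a TypeError rather than storing bad data.
$typeRejected = false;
try {
    new IntegerAggregate('one', 3, 6);
} catch (TypeError $e) {
    $typeRejected = true;
    echo 'Rejected: wrong argument type', PHP_EOL;
}
```

Compare that with the array version, where nothing stops a caller from overwriting an element or a function from returning a string in the first slot.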
There are occasions where I would feel justified returning the array type. For example:
- If the author's example here had been a list of integers which were not semantically different. The kicker for me in a case like this one is that min, max and total are all different things, the distinction between them matters. We can't treat every item in the list we get back the same. If we could, array with a docblock might be more justifiable.
- Where we're returning a list of unknown types, for example generic repository methods in an ORM. It's kind of not great in that situation, but it's more a limitation of PHP's dynamic nature that we just have to accept sometimes. This is a problem which would be solved if the language were able to support generics but if/until then SA tools and docblocks are really our only option.
- When returning a variable or unknown data structure such as the result of json_decode, if we are not able to rely on the presence of any specific structure and map portions of the result to a better type.
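On that last point, when we can rely on a structure, the mapping can happen once at the boundary. The following is only a sketch: the function name aggregateFromJson and the payload shape {"min": ..., "max": ..., "total": ...} are assumptions made up for illustration.

```php
<?php

declare(strict_types=1);

class IntegerAggregate
{
    public function __construct(
        public readonly int $min,
        public readonly int $max,
        public readonly int $total,
    ) {
    }
}

/**
 * Hypothetical boundary function: decodes a JSON payload assumed to look like
 * {"min": 1, "max": 3, "total": 6} and maps it to a typed object.
 */
function aggregateFromJson(string $json): IntegerAggregate
{
    // JSON_THROW_ON_ERROR makes malformed JSON throw a JsonException.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object.');
    }

    // Validate the untyped array once, here, so no other code has to.
    foreach (['min', 'max', 'total'] as $key) {
        if (!isset($data[$key]) || !is_int($data[$key])) {
            throw new UnexpectedValueException("Missing or non-integer key: $key");
        }
    }

    return new IntegerAggregate($data['min'], $data['max'], $data['total']);
}
```

Everything downstream of this function deals in IntegerAggregate, and the untyped array never leaves it.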
The array type hint is a poisoned chalice, a double-edged sword or whatever similar metaphor you feel is appropriate here. I'm not saying don't ever use it, I'm saying use it sparingly and with caution.
Addendum - response to feedback
Someone else wrote a response to this blog post under the title To Array Is Human. It's worth a read to get the balance of an opposing view. However, I believe the author has made a couple of unjustified, quite sweeping assumptions about my mindset and the kind of developer I am (and aim to be).
Matias says, for example, "the issue discussed was not about type-hinting as arrays in general but...in a very specific context". It's true the original example on LinkedIn was something quite specific and a trivium that would be unlikely to come up in the real world, but my point in cautioning against the approach wasn't to counter one trivial example. It's about the approach and way of thinking towards programming problems which is being encouraged by that approach - namely, that it's okay to pass around and deal in unknown, unstructured state.
I say type hinting array is usually an anti-pattern because something is an anti-pattern when two conditions are met. First, that the thing in question superficially appears to be a suitable solution but has a hidden or non-obvious potential to introduce problems and second, that a better solution in context exists. Type hinting array will more often than not meet these criteria.
You can absolutely find countless examples in both code I've written and major libraries I and every other PHP dev use on a daily basis where array has been type-hinted (or worse, nothing has been type-hinted). Sometimes it's justified, sometimes it isn't and we could have done something better, at least on the isolated level of programming theory.
The idea I am an advocate of dogmatism or following strict rules in programming from which you never deviate is simply untrue; indeed I've expressly stated the opposite in multiple blog posts in the past. If Matias had taken the time to digest my post properly, he would have noticed a few paragraphs above where I've listed some of the contexts and situations I would be happy to use array as a type hint. I didn't say it's something you should never do, either here or on the LinkedIn discussion. I said it's a design smell and usually one which indicates an anti-pattern.
There's also a response from the original author of the post on the LinkedIn community, titled Misconceptions about returning arrays as an antipattern.
The author says "Some developers argue that returning arrays is an antipattern, while others believe it can be a perfectly valid approach in certain situations", but if you read his article you'll see he has failed to differentiate the PHP-specific concept of an array from arrays in other programming languages. He says "In many programming languages, arrays are a fundamental part of the language and provide a convenient way to group related data", but we're not talking about arrays in C, or tuples in Python, or maps versus lists in Java. We're talking about an idiosyncrasy of PHP where all these different structures are bundled into the same type.
So my response would be it's a nonsense to write as if there's any suggestion that using arrays as a parameter or return type in other programming languages may be an anti-pattern. No one's suggesting returning int* from a C function is an anti-pattern.
He then goes on to say "Critics of returning arrays argue...returning arrays exposes the internal implementation of a function and can lead to tightly coupled code that is harder to modify and test."
This is not an argument I've made. The problem with return type-hinting array in PHP is nothing to do with "exposing internal implementation" and everything to do with the fact that it simply doesn't tell client code what exactly it should expect to receive, so it's not really type-safe.
The author concludes that returning array "is not always an antipattern" - I agree. In fact, I never made any claim to the contrary. But I am and always will be an advocate for good practice in terms of type-safe, self-documenting code which can be understood as far as possible within the boundaries of the language itself. In PHP there will always be aspects of dynamic typing and gaps which need to be filled in with static analysis and other third-party tooling, but we should aim to keep our code as safe, clean and readable as practical.
Comments
Thanks for the good blog post. 🐘
I got halfway down this and was ready to be like "but static analysis!!" but you already covered that :) FWIW tools like Codacy can sit in your PR workflow and ensure that everyone on the team is coding to the same standards -- sadly we don't (yet) support Psalm or PHPStan, we will have to get on that!