The Picnic backend tech stack is polyglot, comprising mostly Java and Python. To support the majority of our developers, we have two platform teams. I am an engineer in the Java Platform team, where we manage a wide range of topics, not all of them exclusive to Java. For instance, we run and maintain Keycloak, our identity and access management system, as well as Spinnaker, our recently adopted technology for continuous delivery. We also support our Java developers directly with support libraries that provide core features ranging from security to instrumentation, enabling product teams to implement applications quickly.
Historically, we’ve been able to stay closely in the loop and fix things ourselves. As our development team and the business as a whole have grown, so have the breadth and diversity of our tech stack. As such, we now need to take a different stance: inverting the responsibility. Rather than doing the work for each team ourselves, we enable each team to do it themselves by providing a developer platform. Most importantly, we want to continue to enable excellence: enable developers to build the best applications that they can build.
In this article we walk through the beginnings of our developer platform, and the aspects that we will consider in order to keep delivering on the promise of excellence.
On creating value
Unlike the output of product teams, the value of a developer platform is not immediately tangible to a business. But first, let’s consider which dimensions we could be adding value to. The Iron Triangle, also known as the Triple Constraint, is a common concept in project management. Most of us know it as a triangle whose corners are scope, time, and cost, with quality constrained by all three.
Like any project, a developer platform also delivers value within these dimensions. However, due to its foundational nature, any improvement acts as a multiplier for projects on our platform. A simple example is the following: if we optimize code such that a commonly used library runs just a bit more efficiently, then this doesn’t result in a singular increase in efficiency, but rather increased efficiency across all projects using that library.
Developer platforms have been extensively researched, especially the impact of openness on a platform. Parker, Van Alstyne, and Jiang (2016) have shown that an open, flexible platform like Android has led to more apps being available on its Play Store than on the store of a more closed platform, such as Apple’s App Store. On the other hand, quality has suffered due to the lack of curation, but also due to the openness of the platform itself, as it lowers the barrier to entry and enables a greater variety of code.
Software development in general is not that different in this respect: fewer restrictions allow you to go fast. Some of us have enjoyed the thrills of a greenfield or hobby project, with its honeymoon period where everything is exciting and development is fast. However, things start to slow down eventually. Refactoring becomes more expensive, maintainability decreases, deployments become more difficult and less frequent, and crystallized intelligence lives in silos that do not easily translate to other projects. This can be mitigated, for example, by applying a different software architecture, such as microservices, or by restructuring the organization and applying agile processes. Picnic itself has gone through some of these: we started with a monolithic architecture and have since migrated to a microservices architecture, which has kept our velocity high.
We try to be open by giving developers the autonomy to develop as they see fit, and encourage contributions to the platform when they hit pain points. These pain points help guide us towards solutions that expand the borders of openness, without removing these borders altogether. Openness is therefore a tradeoff that must be balanced: allow everything and eventually you might descend into chaos, while extreme rigidity can lead to stifled velocity.
As mentioned previously, openness is freeing. Restrictions are the opposite: they limit the room in which you can move. But I think restrictions can actually be freeing. They help keep complexity contained, uphold the principle of least surprise (thus minimizing WTFs/minute), and avoid (repeated) discussions of how a problem should be solved. This frees you from these burdens, allowing you to focus on what matters: delivering value to the business. Nonetheless, we should not strive for frivolous restrictions; we should strive for restrictions implied by uniformness.
Uniformness, from the Latin uniformis (“having only one shape or form”), is a restriction by definition. Nonetheless, as argued before, I believe it to be beneficial for several reasons, and it is one of the major keys to delivering value as a multiplier:
- Changes can be applied unilaterally, because the code we operate on is uniform in shape;
- Developers can more easily switch between projects and understand code, as context switches become less impactful. Meyer, Fritz, Murphy and Zimmermann (2014) found that “developers think about productive days in […] which many or big tasks are completed without significant context switching”. Discoverability improves as well, as we can traverse the code in a familiar structure;
- Bad patterns can arguably spread more easily in a uniform code base, since they apply in more places (in other words, copy-pasting becomes easier, which can be a bad pattern in itself). Fortunately, they are also more easily detected and dealt with;
- Time-to-productivity (across projects) is minimized, and instead domain knowledge becomes the limiting factor, fostering a collaborative tech environment.
On achieving uniformness
It should then not come as a surprise that as a platform team we strive for uniformity. Not only does this make our day-to-day business easier, as we can easily check out and validate other teams’ code, it also lets us apply changes that improve the stack as a whole. Exceptions slow us down and limit our multiplying factor. But how do we strive for this? First, we rely on automation wherever we can. We have adopted Error Prone, a static code analysis tool that hooks into the Java compiler, allowing us to prevent unwanted patterns from ever entering our code base and to migrate away from newly discovered (bad) patterns. This works best on a uniform code base (the rules can be less complex), and in turn helps maintain uniformity. Secondly, we design for uniformness, while allowing for extensibility on top:
- We have a single common Maven parent to enforce, for example, license checks, dependency conflict detection, and formatting (enforcing most of Google’s open source practices);
- We provide a Bill of Materials for our libraries and manage most common dependencies in one place, thereby limiting drift;
- We have a set of support libraries that provide an application basis to build on top of, while providing extension points for customizability;
- Our CI/CD definitions are distributed and centrally managed via a Git submodule, configured downstream via environment variables.
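To make "preventing a pattern from entering the code base" concrete, here is a minimal sketch of the kind of bug a compiler-integrated check can reject. It uses a pattern covered by Error Prone's built-in ArrayEquals check, not one of our own rules; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

// Illustrative only: class and method names are hypothetical.
final class ArrayEqualityDemo {
    private ArrayEqualityDemo() {}

    // The unwanted pattern: equals() on an array compares references, not contents.
    static boolean sameByReference(int[] left, int[] right) {
        return left.equals(right);
    }

    // The replacement such a check steers towards: element-wise comparison.
    static boolean sameByContent(int[] left, int[] right) {
        return Arrays.equals(left, right);
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};
        System.out.println(sameByReference(first, second)); // false: two distinct array objects
        System.out.println(sameByContent(first, second));   // true: same elements in the same order
    }
}
```

With the check enabled, the first method would be reported at compile time, so the mistake never reaches review. The same mechanism is what makes it possible to rewrite an existing pattern across many repositories at once.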
On inverting the responsibility
The inversion of responsibility between the platform and every other team does not happen overnight. In fact, it comes with its own set of challenges that we are constantly adapting to and improving upon.
A major challenge that we tackled previously (more about this story can be found in this article) was returning to a state of uniformness after we completed the migration to Spring 5, which had split our support libraries in two. Since then we have supported both WebFlux (fully reactive) and WebMVC applications, which are not always entirely compatible with each other. Our approach to this has been the following:
- If possible, design modules such that they can be used by either. Since most of our new code is reactive, we often design with a “reactive-first” mindset.
- If we need to do something specific to WebFlux or WebMVC (e.g. servlet filters), we introduce a 3-way module structure consisting of commons, webflux, and webmvc submodules. We try to share most of the code between WebFlux and WebMVC in commons. This structure is for instance applied to our security-support library.
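The 3-way structure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual security-support code: all class names are hypothetical, the three "modules" are collapsed into one file, and CompletableFuture stands in for a reactive type so that the example runs on the JDK alone.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// "commons": framework-agnostic logic shared by both stacks (hypothetical rule).
final class TokenRules {
    private TokenRules() {}

    // Returns the token of a "Bearer <token>" header, or empty if absent or malformed.
    static Optional<String> extractBearerToken(String authorizationHeader) {
        String prefix = "Bearer ";
        if (authorizationHeader == null || !authorizationHeader.startsWith(prefix)) {
            return Optional.empty();
        }
        String token = authorizationHeader.substring(prefix.length()).trim();
        return token.isEmpty() ? Optional.empty() : Optional.of(token);
    }
}

// "webmvc": thin blocking adapter; a servlet filter would live at this level.
final class BlockingTokenAdapter {
    boolean isAuthenticated(String authorizationHeader) {
        return TokenRules.extractBearerToken(authorizationHeader).isPresent();
    }
}

// "webflux": thin non-blocking adapter; CompletableFuture is a stand-in for a reactive type.
final class ReactiveTokenAdapter {
    CompletableFuture<Boolean> isAuthenticated(String authorizationHeader) {
        return CompletableFuture.supplyAsync(
                () -> TokenRules.extractBearerToken(authorizationHeader).isPresent());
    }
}

final class ThreeWayModuleDemo {
    public static void main(String[] args) {
        String header = "Bearer abc123";
        System.out.println(new BlockingTokenAdapter().isAuthenticated(header));
        System.out.println(new ReactiveTokenAdapter().isAuthenticated(header).join());
    }
}
```

The point of the split is that the rule itself is written and tested once, while each adapter only translates between the shared logic and its framework's programming model.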
Another challenge is related to testing. Although our applications are uniform in their foundation, what is built on top might not be. Although we have a fairly good grasp of the configurations in use, testing each application individually against a new set of support libraries is no longer feasible. Instead, we need to minimize the blast radius of changes and improve our testing strategy. To this end, we have adopted deprecation policies, and are looking into automatically verifying a set of configurations that applications could adopt.
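As an illustration of what a deprecation policy can look like at the code level, the sketch below uses the JDK's own @Deprecated annotation with its since and forRemoval elements. The class, the methods, and the version number are hypothetical; the article does not describe the concrete policy, so treat this as one possible shape rather than ours.

```java
// Hypothetical support-library class; names and the version number are illustrative.
final class RetryPolicies {
    private RetryPolicies() {}

    /**
     * @deprecated use {@link #maxAttempts(int)} instead; kept for one more release
     *     so that applications can migrate before it is removed.
     */
    @Deprecated(since = "4.2.0", forRemoval = true)
    static int retries(int retryCount) {
        // The old API counted retries; the new one counts total attempts.
        return maxAttempts(retryCount + 1);
    }

    static int maxAttempts(int attempts) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        return attempts;
    }
}

final class DeprecationDemo {
    @SuppressWarnings("removal") // the demo calls the deprecated method on purpose
    public static void main(String[] args) {
        System.out.println(RetryPolicies.retries(2));     // old call site, still works
        System.out.println(RetryPolicies.maxAttempts(3)); // migrated call site
    }
}
```

Keeping the old method as a thin delegate means a library upgrade does not break callers immediately, while the compiler warning tells each team what to change before the removal lands.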
The inversion of responsibility also presents another challenge: ensuring that changes are actually applied. Upgrading often does not have the same priority as implementing a new business feature, and we can no longer do this for other teams due to the number of repositories. For this reason, we started providing a changelog detailing what should be done, and started releasing on a fixed monthly schedule. We are also looking into forms of reinforcement, where we present teams with an overview of the status of their project in terms of costs and the number of dependency versions it is behind, potentially compared to other teams.
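The "number of dependency versions behind" metric could be computed along the lines of the sketch below. It assumes a known, ordered list of support-library releases and a map from project to the version it currently uses; the class name, project names, and version numbers are all made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not an existing tool.
final class VersionDriftReport {
    private VersionDriftReport() {}

    // Counts the releases published after `current`; `releases` is ordered oldest to newest.
    static int versionsBehind(List<String> releases, String current) {
        int index = releases.indexOf(current);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown version: " + current);
        }
        return releases.size() - 1 - index;
    }

    // Builds a per-project overview, sorted by project name.
    static Map<String, Integer> report(List<String> releases, Map<String, String> projectVersions) {
        Map<String, Integer> drift = new TreeMap<>();
        projectVersions.forEach(
                (project, version) -> drift.put(project, versionsBehind(releases, version)));
        return drift;
    }

    public static void main(String[] args) {
        List<String> releases = List.of("3.0.0", "3.1.0", "3.2.0", "3.3.0");
        Map<String, String> projects = Map.of("checkout", "3.3.0", "warehouse", "3.1.0");
        System.out.println(report(releases, projects)); // {checkout=0, warehouse=2}
    }
}
```

Counting releases rather than comparing version strings keeps the metric independent of the versioning scheme, at the cost of needing the release list as input.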
On the future
Uniformity has helped us keep our velocity high, and made it easy to roll out changes. Looking into the future, we will therefore continue to strive for uniformity by extending our use of Error Prone and exploring whether we can leverage it for automated patching following a new release of the support modules.
Moreover, we expect that with the ever-expanding tech stack, we will need to:
- Address the increasingly complex tech space (we are starting to have specialized technologies for each part of running an app);
- Improve discoverability of the organization (which team is responsible for which service), as well as of the code and documentation itself.
We are particularly interested in Backstage and how it can help us contain the tech space and enable greater discoverability, as we keep expanding and improving our developer platform.
Parker, G., Van Alstyne, M. W., & Jiang, X. (2016). Platform ecosystems: How developers invert the firm. Boston University Questrom School of Business Research Paper, (2861574).
Meyer, A. N., Fritz, T., Murphy, G. C., & Zimmermann, T. (2014, November). Software developers’ perceptions of productivity. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (pp. 19–29).