Beyond Programming: The Art of Software Engineering à la Google - takeaways from the book "Software Engineering at Google"

Last update December 18, 2024

Curious about the art of Software Engineering as practiced at Google? From understanding the difference between programming and software engineering to fostering a no-blame culture, managing technical teams and championing teamwork, the book “Software Engineering at Google” offers food for thought for every engineer and technical leader. Let’s explore approaches to software development that help create robust solutions and happy teams, inspired by insights from my recent reading.

If you want to know more about the culture and way of working at Google, the book is available in an online version (for free), or in paperback version: Software Engineering at Google (O’Reilly)

Please note that this article reflects only my understanding of the book, not necessarily the authors’ words. It is limited to the points I found interesting to share and does not cover the whole content. It is also based solely on the book mentioned above, not on actual work experience at Google.

Table of Contents

Software Engineering vs Programming
Teamwork vs Alone time
Blameless Culture
Knowledge Sharing
- Tools and media
- Culture encouraging knowledge sharing
How to lead a Software Engineers team
Engineering for equity
Readable code
Code style
Code Review
Documentation
Testing
Whiteboard markers

Software Engineering vs Programming

It’s Programming if “clever” is a compliment. It’s Software Engineering if “clever” is an accusation.
Titus Winters on X

Programming is aimed at short-lived solutions: it’s creating code that addresses the problem of the moment. Software Engineering (SE) is a broader application of code, tools, policies and processes to solve a problem that can span lifetimes. The scope of responsibilities is different.

Teamwork vs Alone time

Although periods of quiet, uninterrupted time are important for Software Engineers, they also need a connection to their team for various reasons: to test their opinions, share knowledge, catch misconceptions early…

If less-knowledgeable people on your team feel that there’s a barrier to asking you a question, it’s a problem. Finding balance between focus and availability to the team is an art.

Regarding focus time, teams or companies should have clear policies on when and how people can be interrupted, so that alone time is put to its best use. Those policies can be as simple as having a green/red light (available/do not disturb), or another token, on everyone’s desk and asking others to respect their coworkers’ status.

Working in isolation is risky. You should be far less concerned about sounding unintelligent than about wasting a lot of time going in the wrong direction.

Criticism is also part of teamwork. Be humble about your skills, and trust that team members have your best interests at heart and care about the success of the project above all. “You are not your code”: receiving comments about what you wrote isn’t a criticism of you as a person.

Do not underestimate the power of playing the social game. Direct or indirect colleagues with whom you have a good relationship will be more willing to go the extra mile for the success of your projects.

Blameless Culture

Coworkers should feel safe to fail and not feel the need to hide it. There’s a saying in the tech world that if you’re not failing from time to time, you’re not being innovative enough. Failure is an opportunity to learn and improve.

To learn from our mistakes, we should document our failures in “post-mortems” that describe root causes and solutions. Post-mortems shouldn’t be an occasion for finger-pointing, and no one should feel ashamed to openly share failures and explain how they managed to solve things. Properly documenting and presenting them allows others to learn and helps avoid similar problems in the future.

I found the article “Effective teams” written by Addy Osmani, Software Engineer at Google, very relevant and complementary to this point (and other points made in the book).

Knowledge Sharing

Tools and media

Knowledge sharing should be tailored to each company, with common systems ranging from Q&A platforms and newsletters to mentoring, tutorials and classes.

Google has in-house forums (similar to Stack Overflow) where engineers can ask questions and discuss technical topics. This lets them ask questions openly without having to hide sensitive code. Engineers benefit from the views of other software engineers who work in the same environment, and it helps spread knowledge across the company.

Written knowledge is scalable but doesn’t adapt to each individual learner and has to be maintained; experts (tribal knowledge) are not scalable but adapt to their audience. It’s therefore better to mix media, as they complement each other.

Mixing media for knowledge sharing ensures a wider audience finds what it’s looking for.

Group chats help when one doesn’t know exactly whom to ask for help or wants quick feedback, and other members of the chat can learn at the same time. They’re great for active, back-and-forth discussion. Those chats can be topic-driven or team-driven. Topic-driven chats tend to be bigger (cross-team), offer faster answers and gather experts. Team-driven chats are less intimidating, but getting an answer might be slower.
Mailing lists/Newsletters can serve for dedicated topics.
Q&A platforms.
Teaching, which is not limited to experts or seniors: everyone has something to teach!

Teaching your colleagues can happen in different formats:

Office hours: Regular (weekly) event during which someone makes themself available to answer questions on a topic.
Tech talks and classes: On complex and recurring topics that don’t need heavy regular updates and that are difficult to learn without guidance.
Writing documentation: The first time you learn something is the best time to check the existing documentation and improve it. When you’re learning, it’s obvious what’s difficult and unclear; later it won’t be.

Culture encouraging knowledge sharing

A culture of learning requires a safe place where people can openly admit what they don’t know. They shouldn’t fear finger-pointing or punishment if they lack some knowledge.

At Google, newcomers are assigned an official mentor who’s not in their team or management line (so it’s easier to talk about touchy issues). Their responsibility is to answer questions and help the newcomer ramp up. Mentors are volunteers who’ve been at Google more than a year.

At the individual level, learning is an iterative process. Not knowing is an opportunity to learn, whether you’re a senior or a junior. Keep asking questions. Don’t fall into the trap of spending a lot of time trying to find the solution by yourself because your question seems too basic. Don’t be afraid to say you don’t know and to ask for explanations, even as you grow in seniority.

As a senior, openly asking questions and admitting lack of knowledge on a topic shows that it’s OK for anyone to do the same. When given the opportunity to answer questions, patience and kindness foster an environment where people feel safe looking for help.

Learning is not only understanding new things but also developing an understanding of the existing design and implementation (e.g. an existing code base).

When you learn something in a 1-1 discussion, write it down and share it with others. (My personal thought on this: I’d advise people giving explanations to ask the person who requested the information to write the documentation themselves, then review it together to find the spots that are still unclear for that person. No tutorial is as complete as one written by someone who started with little knowledge of the topic.)

Knowledge sharing is part of the requirements to achieve senior levels.

A group can be reduced to its most toxic members. The behavior of just a few individuals can discourage an entire team from asking questions and showing vulnerability. One should never tolerate, or worse praise, the “brilliant jerk”. Being an expert and being kind shouldn’t be mutually exclusive.

A good culture has to be actively nurtured, and people who contribute to that positive culture should be recognized and rewarded.

How to lead a Software Engineers team

At Google, there is a difference between Managers, leaders of people, and Tech Leads, leaders of the technology efforts. They have similar planning skills but different people skills. In small teams, although not ideal, the same person might hold both roles; they’re then called a Tech Lead Manager. Managers must have a technical background.

The Manager is responsible for the performance, productivity and happiness of every team member, while making sure the business needs are met. The Tech Lead (TL), who usually reports to the Manager, is responsible for the technical aspects, like technology decisions, architecture, priorities, velocity, project management… TLs usually also contribute to the project. One of their responsibilities is to choose between doing something quickly themselves or delegating it to a team member.

Even if you’ve sworn to yourself that you’ll never become a manager, at some point in your career you’re likely to find yourself in a leadership position, especially if you’ve been successful in your role.

Quantifying management work is more difficult than quantifying engineering work. What makes you feel like you’ve accomplished something is really different: it’s less tangible in management work.

Managers should fight the urge to manage and act as servant leaders: serve the team, create an atmosphere of humility, respect and trust, advise when necessary, and manage the health of the team (technical AND social). The social aspect is often overlooked, but it’s as important as the technical aspect and often far more difficult to manage.

If a manager makes it obvious that they trust their employees, those employees feel positive pressure to live up to that trust.

Great managers worry about what things get done and trust their team to figure out how to do it.

If your employees are so uninterested in their job that they actually need a traditional management style, that is the real problem.

Publicly praise individual successes, but don’t point fingers at individual failures.

Never ignore low performers. One of the most difficult parts of the job is dealing with people who don’t meet expectations. They will not magically improve or leave by themselves. In the meantime, high performers waste valuable time pulling the low performers along, and those low performers stand in the way of new hires. Deal with them quickly with a coaching plan and a defined time frame with progressive steps, and take the necessary measures if they can’t keep up. During the recruitment process, don’t compromise: wait for the right match.

“People are like plants, they all need water, sun and fertilizer in different quantities. It’s your role as a manager to find out who needs what.” (more direction, encouragements, recognition…)

When moving to higher management positions, the scope of the problems gets larger, problems get more abstract, and prior technical expertise becomes less relevant.

Engineering for equity

There is an imbalance of power between those making development decisions and those who simply live with such decisions (end users), which can disadvantage already marginalized communities. There is a lack of diverse representation in the IT workforce (mainly white and Asian men). Software engineering teams should look like the population they build products for, and recruiters have to contribute to building a more representative workforce.

A Computer Science degree or work experience doesn’t provide all the skills you need to become an engineer (see the difference between being a programmer and a software engineer).

When diversity is not achievable, engineers should focus most on users who are different from themselves and work to understand them. Engineers have to learn how to build for all users. It is an individual as well as a company responsibility to make it happen.

There is no simple solution, nor a single methodology that works. It is a complex, multifactorial issue. Hiring is not the only problem: systemic inequities in progression and retention, the lack of psychological safety in the team or company, and the lack of belonging all make it more difficult to retain diverse talent.

User Research with a group of users most negatively impacted by bias and discrimination can help when diversity in the project team cannot be achieved yet.

Readable code

At Google, code readability is highly important. About 2% of the Software Engineers at Google are “readability reviewers”. After passing an internal “certification”, they can give readability approval in code reviews. Every change requires one.

The reason for this focus is that code is read more often than it’s written. For an engineering workforce as big as Google’s, it’s highly important that engineers work with the same visual code style for a given language. It enables readers to focus on functionality rather than being distracted by code that looks different from what they’re used to.

It does come with some costs: teams without a readability reviewer have to find them outside their team to have their changes approved, it can create additional rounds of review, and it’s a human-driven process (although what can be automated, is, as described later).

Code style

Having well-established style guides and rules allows developers to really focus on the functionality (what the code does) instead of the presentation of the code. If there are clear rules, there is no room for debate on this topic. There is also less distraction for the developers. Those rules are important because there are many developers at Google, which means many different backgrounds, skill levels and experiences.

Code styles ensure the code is easy to read. Code gets read far more often than it gets written, so it’s of the utmost importance that it’s easy to understand for any developer working on it. They value simple to read over simple to write: for example, there’s no point finding shorter ways of writing the same piece of code if the result is unfamiliar to a lot of software engineers.

For each language used at Google, there is a group of experts in that language that are owners of the style guides and make the decisions for that language.

Code Review

Nothing can be merged without being approved.

Code reviews serve several goals, which are made clear to software engineers. In practice, any merge request has to go through three identified quality gates:

The code has to pass a comprehension check (LGTM) by another engineer, often within the same team.
The code has to be checked by a code owner (someone who’s knowledgeable in the functionality or project).
The code has to pass the language readability check, by someone who’s a certified “readability reviewer” (see Readable code earlier in this post).

One person can assume all three roles if they have the readability reviewer certification and are a code owner. The author of the code themself can also assume the reviewer role (the last two gates) and only get the LGTM (Looks Good To Me) from another engineer. This is because Google recognizes that code review is very important but that having too many actors in the process can quickly create bottlenecks.

They noticed that there is a tendency in the software engineering industry to try to get additional input from several engineers in order to feel safer about the changes, but in fact the first review has the most value. Subsequent ones don’t add that much value but make the process longer. In any case, if there is a need for several reviewers, they should each focus on different aspects of the change.

Reviewers should only point out alternative solutions if the author’s approach is deficient or dangerous. It’s not their role to challenge the approach in other cases. If the author can demonstrate that their approach is equally valid to the proposition of the reviewer, the reviewer has to accept the preference of the author.

Changes should be as small as possible to ease the review process. 35% of changes at Google are to a single file. As a rule of thumb, a merge request should not be larger than about 200 lines of code.

Merge requests should have a good change description, explaining what was changed and why (a one-line summary followed by a detailed description).

Everything that can be automated should be: for example, static code analysis, linters, formatters and tests run before the change is submitted for review (the tool prevents the author from sending the changes for review until the issues are fixed).
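To illustrate what such an automated gate could look like, here is a minimal Python sketch of my own. It is not Google’s tooling and not taken from the book: the function name, the messages and the hard 200-line limit are assumptions made for the example. It checks the two points above, the size of the change and the shape of its description.

```python
from typing import List

# Assumed threshold for this sketch, based on the ~200 lines mentioned above.
MAX_CHANGED_LINES = 200


def review_readiness_problems(description: str, changed_lines: int) -> List[str]:
    """Return the reasons (if any) why a change is not ready for review."""
    problems = []

    lines = description.strip().splitlines()
    if not lines:
        problems.append("missing change description")
    else:
        # Expected shape: one summary line, a blank line, then details.
        has_blank_separator = len(lines) >= 3 and not lines[1].strip()
        has_details = any(line.strip() for line in lines[2:])
        if not (has_blank_separator and has_details):
            problems.append("description needs a summary line, a blank line and details")

    if changed_lines > MAX_CHANGED_LINES:
        problems.append(
            f"change touches {changed_lines} lines, more than {MAX_CHANGED_LINES}"
        )

    return problems
```

A tool could refuse to send the change to reviewers as long as this list is not empty, which keeps humans focused on what only humans can judge.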

Documentation

It’s important to identify the different types of documents your company needs and define workflows for each one. Knowledge useful to the company should be centralized: each team shouldn’t have to create and maintain the same documentation several times. Documentation should always have owners, a freshness date (last review/update), policies and rules, be source controlled, have a maintenance lifecycle, have issue tracking…

Make it easy for developers to write documentation by integrating it in the development process when possible (for example a Markdown file in the project). Include anything that can be in the usual development flow and treat documentation as code. Since it’s a well-known fact that developers don’t like to write documentation, make things easier for them to remove some of the burden.

What makes a good “documentation writer” is not the proficiency in writing but the ability to step out and see things from the audience’s perspective.

Your company should have tutorials to walk newcomers through the setup of a new project. It allows a fast start, it gives great value for a moderate investment (technologies used at a company don’t change that often, those tutorials will rarely need a heavy update) and it makes sure everyone starts on the right (and same) foot. It can be in the form of a “Hello World” project that assumes nothing.

The best time to write such documentation or to have it reviewed is when a newcomer joins the team or the project. When someone joins the team, they should write down everything they go through until they are ready to start. This will be the most detailed tutorial possible. Every mistake made during the onboarding phase, every blocker encountered, and why and how it was solved, are precious insights for future newcomers.

Another type of documentation is “conceptual documentation”. It’s an overview of APIs and systems. It doesn’t replace the official documentation but augments it. It should be useful for any engineer, from novice to expert. It has to be clear and focused on common usage at the company. This is usually not stored with the code as it’s not linked to a single functionality or project.

Testing

The book has extensive chapters on testing. Here are a few takeaways.

All tests should be able to run in the CI pipeline, otherwise they don’t serve their purpose.

At Google, code reviews are very important and mandatory. If the code includes tests that prove it works as intended, including in edge cases, the reviewer spends less time and energy reviewing the code. They don’t need to mentally walk each case through the code but can just check that each case has been tested.

When writing new code, if it’s difficult to write tests for it, it might mean the code is not great (too many responsibilities, difficult to manage dependencies, tight coupling…). By having to write tests, engineers have to fix design issues early on.

Google has its own definition of test scopes and sizes: “small”, “medium” and “large”. Small (unit) tests must be fast and deterministic (they are infinitely repeatable and always return the same result, contrary to flaky tests). Small tests run in a single process, often on a single thread, the same as the code being tested. They don’t communicate with other servers or third parties (databases, …), and they cannot sleep, perform I/O operations, make blocking calls…

Medium tests can span multiple processes, use multiple threads, make blocking calls, network calls to localhost (but not outside), use a database… They run on a single machine. They present a risk of being slower and nondeterministic, because of their dependencies.

Large tests don’t have those restrictions, so their risk of flakiness is much higher. They are full-system end-to-end tests and are meant to validate the configuration more than the code. The different types of tests run in isolation from one another.

Some automated tools check that tests respect their constraints (for example, small tests will fail if they attempt to make an external connection).

It’s important to have very few flaky tests, as flakiness causes a loss of confidence in the tests, after which engineers stop reacting to failures. At Google, they noticed that approaching 1% flakiness already makes the tests lose value. They have a rate around 0.15%, which at their scale already represents thousands of tests daily.
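A quick back-of-the-envelope calculation shows why such a small rate still hurts. This is my own sketch, not a formula from the book, and it rests on two assumptions: the 0.15% is read as the chance that a single test run fails for no real reason, and tests flake independently of one another.

```python
def chance_of_spurious_failure(flake_rate: float, tests_run: int) -> float:
    """Probability that at least one of `tests_run` tests fails spuriously.

    Assumes each test flakes independently with probability `flake_rate`.
    """
    return 1.0 - (1.0 - flake_rate) ** tests_run


if __name__ == "__main__":
    # The suite sizes below are hypothetical examples.
    for n in (1, 100, 1000):
        print(n, round(chance_of_spurious_failure(0.0015, n), 3))
```

Under those assumptions, a change that triggers 100 tests has about a 14% chance of hitting at least one spurious failure, and a change that triggers 1,000 tests about 78%. That is how a rate that looks negligible for one test becomes a daily annoyance for a whole suite.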

Tests have to be isolated: they contain all the information necessary to execute (for example, no shared database). In testing, we don’t try to achieve DRY. Tests must be easy to read and straightforward. Conditions and loops are discouraged. Tests have to be really easy to maintain because most engineers will read fewer tests than “main” code, and will thus be less familiar with them.
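Here is a small Python sketch of what that can look like in practice. The `apply_discount` function is a hypothetical piece of code under test that I made up for the example. Each test spells out its own input and expected value, with no loop, no condition and no shared helper, even though that means some repetition.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical code under test: price after a percentage discount, in cents."""
    return price_cents - (price_cents * percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    # Every test is readable on its own: inputs and expected result are inline.

    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(2000, 10), 1800)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(2000, 0), 2000)

    def test_full_discount_makes_item_free(self):
        self.assertEqual(apply_discount(2000, 100), 0)


if __name__ == "__main__":
    unittest.main()
```

A loop over a table of cases would be shorter, but when one of these tests fails, its name and its single line tell the reader exactly what broke.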

The goal of testing is to improve engineers’ productivity. Thus, tests shouldn’t break when there is a small change in the main code. Spending one hour writing a feature and the rest of the day debugging broken tests is not ok.

One example from the book: “Often, she has difficulty figuring out what the tests were trying to do in the first place, and the hacks she adds to fix them make those tests even more difficult to understand in the future. Ultimately, what should have been a quick job ends up taking hours or even days of busywork, killing Mary’s productivity and sapping her morale. Here, testing had the opposite of its intended effect by draining productivity rather than improving it while not meaningfully increasing the quality of the code under test. This scenario is far too common, and Google engineers struggle with it every day. There’s no magic bullet, but many engineers at Google have been working to develop sets of patterns and practices to alleviate these problems, which we encourage the rest of the company to follow. […] Tests were brittle: they broke in response to a harmless and unrelated change that introduced no real bugs. The tests were unclear: after they were failing, it was difficult to determine what was wrong, how to fix it, and what those tests were supposed to be doing in the first place. […] Any time spent updating old tests is time that can’t be spent on more valuable work. Therefore, the ideal test is unchanging: after it’s written, it never needs to change unless the requirements of the system under test change.”

Test doubles, such as mocks and fakes, also introduce some complications. Improper use can lead to complex, easy-to-break tests, lowering the productivity of the engineers. The behavior of a test double must be as close as possible to the original (although perfect fidelity is rarely possible). Because of their imperfection, they must be backed by other larger-scope tests on the real implementation.
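One way to keep a test double honest is to run the same checks against both the double and the real implementation. The Python sketch below is my own illustration, not code from the book: `FileUserStore` stands in for a “real” implementation (it does file I/O, so it belongs in a larger test), `FakeUserStore` is its in-memory double, and all names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileUserStore:
    """Stand-in for the real implementation: persists users in a JSON file."""

    def __init__(self, path: Path):
        self._path = path

    def save(self, user_id: str, name: str) -> None:
        data = self._load()
        data[user_id] = name
        self._path.write_text(json.dumps(data))

    def get(self, user_id: str) -> Optional[str]:
        return self._load().get(user_id)

    def _load(self) -> dict:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())


class FakeUserStore:
    """In-memory double, usable in small tests (no I/O)."""

    def __init__(self):
        self._users = {}

    def save(self, user_id: str, name: str) -> None:
        self._users[user_id] = name

    def get(self, user_id: str) -> Optional[str]:
        return self._users.get(user_id)


def check_user_store_contract(store) -> None:
    """Behavior that both the real store and the fake must share."""
    assert store.get("u1") is None
    store.save("u1", "Ada")
    assert store.get("u1") == "Ada"
    store.save("u1", "Grace")  # saving again overwrites
    assert store.get("u1") == "Grace"


if __name__ == "__main__":
    check_user_store_contract(FakeUserStore())
    with tempfile.TemporaryDirectory() as tmp:
        check_user_store_contract(FileUserStore(Path(tmp) / "users.json"))
```

If the real store changes its behavior, the shared contract check fails for one of the two, which signals that the fake has drifted and can no longer be trusted in the small tests that rely on it.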

Whiteboard markers

I am a part-time Collective Intelligence facilitator. I organize workshops where we use sticky notes and markers, which I often have a hard time finding, so I found this particular anecdote amusing.

It’s common for companies to keep their office supplies under control rather than freely accessible. At Google, they decided to leave whiteboard markers (and other supplies) in unlocked closets.

They decided to trade off the potential loss of markers, which cost hardly anything, for smooth meetings where people’s train of thought is not interrupted by dry or missing markers. In the words of the authors, “it is far more important to optimize for obstacle-free brainstorming than to protect against someone wandering off with a bunch of markers”.