Mastering API Dependencies: A Guide to Resolving Kernel Conflicts Like the TCMalloc Case

From Wwwspill, the free encyclopedia of technology

Introduction

Every API, no matter how well documented, can become a victim of its own success. Hyrum's Law, the principle that any observable behavior of a system will eventually be depended upon by someone, recently played out in the Linux kernel community. The kernel's restartable sequences (rseq) mechanism, designed to improve per-CPU data access, saw performance enhancements in the 6.19 release. These changes stayed within the documented API, yet they broke Google's TCMalloc memory allocator. TCMalloc had been relying on undocumented behavior and, in doing so, had inadvertently prevented other code from using restartable sequences. The kernel's strict no-regressions rule forced developers to find a way to accommodate TCMalloc's unintended usage while preserving the performance gains. This guide walks through the steps for handling similar API dependency conflicts, using this real-world example as a case study.

What You Need

  • Understanding of Hyrum's Law and its implications for API design
  • Familiarity with kernel development processes, especially the no-regressions rule
  • Basic knowledge of restartable sequences (rseq) and memory allocators like TCMalloc
  • Access to kernel documentation and mailing lists for reference
  • Ability to analyze API usage patterns in downstream projects

Steps

  1. Step 1: Recognize That Every Observable Behavior Becomes a Contract

    Hyrum's Law reminds us that even undocumented side effects—such as timing, error codes, or memory layout—will eventually become dependencies. In the rseq case, the documented API specified certain control structures, but TCMalloc began relying on specific behavior of those structures that wasn't promised. Start by auditing your own API: list not only the explicit guarantees but also any implicit behaviors that might be observed. Use tools like static analysis and runtime instrumentation to detect odd usage patterns.
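
One way to approach the runtime-instrumentation part of this audit is to wrap the object you hand to callers and record which fields they actually read. The sketch below is only an illustration: the `AccessRecorder` class, the field names, and the `downstream_consumer` function are all hypothetical and do not reflect the real rseq structure layout.

```python
from types import SimpleNamespace

# Hypothetical documented surface of a hypothetical structure
# (this is NOT the real rseq layout).
DOCUMENTED_FIELDS = {"cpu_id", "flags"}


class AccessRecorder:
    """Wrap an object and remember which undocumented attributes callers read."""

    def __init__(self, target, documented):
        self._target = target
        self._documented = set(documented)
        self.undocumented_reads = set()

    def __getattr__(self, name):
        # Only called for names not found on the recorder itself,
        # i.e. for the wrapped object's fields.
        value = getattr(self._target, name)
        if name not in self._documented:
            self.undocumented_reads.add(name)
        return value


def downstream_consumer(state):
    # Stands in for client code that quietly leans on an internal field.
    return state.cpu_id + state.scratch


state = SimpleNamespace(cpu_id=3, flags=0, scratch=17)
view = AccessRecorder(state, DOCUMENTED_FIELDS)
total = downstream_consumer(view)
print(sorted(view.undocumented_reads))  # ['scratch']
```

Any name that shows up in `undocumented_reads` is a candidate implicit contract: either document it, or plan for the day a change to it breaks someone.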

  2. Step 2: Document Your API Explicitly and Completely

    Once you identify potential implicit contracts, document them. The rseq API had clear documentation, but TCMalloc relied on behavior that was not written down. In your documentation, describe parameter ranges, state transitions, and allowed concurrency. Add explicit notes about what is not guaranteed—for example, “The order of arrival for multiple events is not specified.” This reduces the chance of accidental dependencies forming.
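
Documentation of a non-guarantee can be backed up by making the unspecified behavior visibly unstable, so that callers cannot come to rely on it by accident. The following is a minimal sketch of that idea, not something the rseq API does; the `pending_events` function and its `rng` parameter are assumptions made for illustration.

```python
import random


def pending_events(queue, rng=None):
    """Return the pending events.

    Guaranteed: every queued event appears exactly once.
    NOT guaranteed: the order of the returned events.
    """
    events = list(queue)
    # Shuffle on purpose so that callers cannot start depending on
    # insertion order, which the docstring explicitly leaves unspecified.
    (rng or random).shuffle(events)
    return events


events = pending_events(["open", "read", "close"])
print(sorted(events))  # ['close', 'open', 'read']
```

The trade-off is a small runtime cost in exchange for surfacing order-dependent callers during their own testing rather than after a later release.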

  3. Step 3: Monitor Actual Usage Patterns in Downstream Projects

    Even with perfect documentation, users may still depend on undocumented implementation details. Set up CI tests that run prominent downstream projects (like TCMalloc, glibc, or other allocators) against your API changes. The kernel community often runs regression tests on common userspace projects. In the rseq situation, the break was discovered only after integration. Establish a feedback loop with major consumers to catch regressions early.
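
A downstream check of this kind can be sketched as a small harness that runs each project's test command against the old and the new API and reports only the projects that went from passing to failing. Everything here is a placeholder: the project names, the `API_UNDER_TEST` environment variable, and the one-line commands stand in for real test suites.

```python
import os
import subprocess
import sys


def run_suites(suites, api_version, timeout=60):
    """Run each downstream test command; return {project: passed}."""
    env = dict(os.environ, API_UNDER_TEST=api_version)
    results = {}
    for project, command in suites.items():
        try:
            done = subprocess.run(command, env=env, capture_output=True, timeout=timeout)
            results[project] = done.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[project] = False
    return results


def find_regressions(baseline, candidate):
    """A regression is a project that passed before the change and fails after it."""
    return sorted(p for p, ok in baseline.items() if ok and not candidate.get(p, False))


# Placeholder commands standing in for real downstream test suites.
SUITES = {
    "robust-project": [sys.executable, "-c", "raise SystemExit(0)"],
    "fragile-project": [
        sys.executable, "-c",
        "import os, sys; sys.exit(1 if os.environ['API_UNDER_TEST'] == 'new' else 0)",
    ],
}

baseline = run_suites(SUITES, "old")
candidate = run_suites(SUITES, "new")
print(find_regressions(baseline, candidate))  # ['fragile-project']
```

Comparing against a baseline matters: a project that already failed before the change is not a regression and should not block it.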

  4. Step 4: When a Dependency Breaks, Assess Its Nature

    When a project fails under your API changes, determine whether the failure is due to a legitimate bug or an unintended dependency. For TCMalloc, the code relied on behavior of the rseq control structures that the documented API never promised to keep stable. Classify the regression: is it a misuse of the API, a documentation gap, or a missing feature? Use bug reports and code reviews to understand the exact behavior being relied upon.

  5. Step 5: Engage with the Dependent Project

    Open a dialogue with the maintainers of the affected project. In this case, Google's TCMalloc team needed to be informed that their usage was incorrect. Explain the intended API contract, why the change was necessary, and propose a path forward. Often, the project can adjust its code to conform to the documented API. However, if the dependency is deeply embedded and the user base is large, you may need to consider temporary compatibility layers.

  6. Step 6: Apply the No-Regressions Rule Thoughtfully

    The kernel's no-regressions rule prioritizes not breaking existing userspace programs. This means you cannot simply dismiss an established use like TCMalloc's, even if it technically violates the documented API. Instead, you must find a way to accommodate it. In the rseq scenario, kernel developers added a compatibility mechanism so that TCMalloc's unintended usage continued to work while the new performance improvements stayed enabled for other code. The no-regressions rule forces creativity: you can sometimes preserve backward compatibility by deprecating old behavior gradually, using feature flags, or maintaining multiple code paths.
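
Gradual deprecation, one of the options named above, can be as simple as keeping the old behavior working while reporting each use of it. This is a generic sketch using Python's standard `warnings` module; the `read_field` helper and the `scratch` field are hypothetical.

```python
import warnings

LEGACY_FIELDS = {"scratch"}  # hypothetical field that was never documented


def read_field(record, name):
    """Return record[name], warning when the caller uses a legacy field."""
    if name in LEGACY_FIELDS:
        warnings.warn(
            f"field {name!r} is not part of the documented API and may go away",
            DeprecationWarning,
            stacklevel=2,
        )
    return record[name]


record = {"cpu_id": 3, "scratch": 17}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = read_field(record, "scratch")  # still works, but is reported
    read_field(record, "cpu_id")           # documented field: no warning

print(value, len(caught))  # 17 1
```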

  7. Step 7: Implement a Solution That Balances All Constraints

    Finalize a patch or configuration that satisfies the no-regressions rule, the original performance goals, and the needs of all stakeholders. For the rseq case, the solution likely involved a runtime check that allowed TCMalloc to continue using the old behavior while new users could opt into the faster path. Your solution should be well-tested across multiple userspace projects. Document the fix clearly, noting why the accommodation was made and how downstream projects should migrate to the documented API in the future.
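
The shape of such a runtime check can be illustrated with a toy model. This is not the kernel's actual fix: the `Session` class, its fields, and the `opted_into_fast_path` flag are assumptions chosen to show how a default legacy path and an opt-in fast path can coexist.

```python
from dataclasses import dataclass


@dataclass
class Session:
    opted_into_fast_path: bool = False  # hypothetical opt-in; default keeps old behavior
    cpu_field: int = -1                 # value visible to the client
    last_published: int = -1            # provider's private record of what it last wrote
    writes: int = 0                     # how often the provider rewrote cpu_field


def on_resume(session, current_cpu):
    """Publish the current CPU when control returns to the client."""
    if session.opted_into_fast_path and session.last_published == current_cpu:
        return  # fast path: nothing changed, so skip the redundant write
    # Otherwise rewrite the field. Legacy sessions always land here,
    # because older clients came to depend on the rewrite itself.
    session.cpu_field = current_cpu
    session.last_published = current_cpu
    session.writes += 1


legacy = Session()
on_resume(legacy, 2)
legacy.cpu_field = 99  # legacy client scribbles on the field...
on_resume(legacy, 2)   # ...and expects the provider to restore it

fast = Session(opted_into_fast_path=True)
on_resume(fast, 2)
on_resume(fast, 2)     # same CPU: no second write

print(legacy.cpu_field, legacy.writes, fast.writes)  # 2 2 1
```

Making the fast path opt-in rather than opt-out is the design choice that satisfies a no-regressions rule: existing binaries keep the behavior they were built against without being recompiled.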

Tips

  • Communicate early and often. The TCMalloc conflict escalated because the dependency was discovered late. Proactively share API changes with major projects on kernel mailing lists.
  • Use explicit contracts. Where possible, add defensive checks in your API that reject invalid usage early. For example, validate structure sizes and magic numbers to prevent code from relying on internal fields.
  • Plan for migration. If you must accommodate an unintended dependency, provide a clear deprecation timeline and documentation so that the dependent project can eventually comply with the documented API.
  • Write tests for unintended dependencies. Create test cases that deliberately misuse your API in plausible ways to see if downstream projects are likely to break.
  • Remember Hyrum's Law. No matter how careful you are, someone will rely on your implementation quirks. Accept this and build flexibility into your development process.
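
The defensive checks suggested in the "explicit contracts" tip can be sketched as follows. The header layout, the magic value, and the flag mask are all invented for the example; the point is only that size, magic number, and reserved bits are validated before anything else is trusted.

```python
import struct

# Hypothetical request header: magic (u32), version (u16), flags (u16).
HEADER = struct.Struct("<IHH")
MAGIC = 0xC0FFEE01      # arbitrary value chosen for this example
KNOWN_FLAGS = 0x0003    # only the two lowest flag bits are defined


def parse_header(buf):
    """Validate and decode a header, rejecting anything outside the contract."""
    if len(buf) != HEADER.size:
        raise ValueError(f"expected {HEADER.size} bytes, got {len(buf)}")
    magic, version, flags = HEADER.unpack(buf)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if flags & ~KNOWN_FLAGS:
        # Rejecting unknown bits now keeps them free for future use.
        raise ValueError("reserved flag bits must be zero")
    return version, flags


good = HEADER.pack(MAGIC, 1, 0x0001)
print(parse_header(good))  # (1, 1)
```

Rejecting reserved bits from the first release is what makes them safe to assign later: no existing caller can have been passing garbage in them.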