Output-Based Testing, Contracts, and the True Meaning of Refactoring

Introduction: Beyond the Label of “Refactoring Kata”

Whenever I take on programming exercises like “Audit Everything,” I see them as much more than routine practice. They’re an opportunity to explore the underlying principles that guide day-to-day coding decisions. As I worked through this kata, inspired by Vladimir Khorikov’s Unit Testing: Principles, Practices, and Patterns, I found myself wrestling with some foundational questions: What does “refactoring” actually mean? Where should we draw the line around a “unit” in unit testing? And at the end of the day, how do we really define behavior, contracts, and safe change in our applications?

If you’re passionate about writing meaningful tests and about evolving code with confidence, let’s go deep together.

What Is a “Unit”? Why Its Vagueness Is a Strength

The “unit” in unit tests is gloriously ill-defined by design. Sometimes it’s a method; sometimes, a whole collaboration of classes. The goal isn’t rigidity, but adaptability—allowing us to shape tests around meaningful seams in real code.

In the Audit kata, I view the “unit” as the AuditManager plus any contract it owns—in this case, the IFileSystem interface. I’ve written before about how the interface owned by the client becomes an integral part of the unit boundary. This keeps tests relevant and refactor-friendly—anchored where design intent makes sense.
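As a rough sketch of that boundary, the interface can live right next to AuditManager instead of in a shared infrastructure package. The member signatures below are taken from the fake shown later in this post; the namespace name and the public accessibility are my assumptions for illustration.

```csharp
using System.Collections.Generic;

namespace AuditKata // hypothetical namespace, chosen only for this sketch
{
    // Owned by the client (AuditManager): it lists exactly the file
    // operations the manager needs, nothing more.
    public interface IFileSystem
    {
        string[] GetFiles(string directoryName);
        void WriteAllText(string filePath, string content);
        IEnumerable<string> ReadAllLines(string filePath);
    }
}
```

Because the interface is shaped by what AuditManager needs, any implementation (real or fake) plugs into the same seam, and the tests can stay anchored to it.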

Observable Behavior and “System Under Test”: Blurring the Lines

Martin Fowler defines refactoring as a change to the internal structure of software that makes it easier to understand and cheaper to modify “without changing its observable behavior”. Most developers interpret “observable behavior” as “whatever the user, or the outside world, can see”, which, in this kata, means the files produced.

But that perspective can be dangerously shallow. In reality, tests are your code’s first clients, and they define your contract as surely as runtime consumers do. If you must rewrite your tests (not just deduplicate or clarify), you are, by definition, breaking their contract—even if system outputs remain unchanged. That’s more than a refactoring; it’s a rewrite, with all the attendant risk.

Core Principle:
Not every change that preserves system behavior is a refactoring. But every genuine refactoring, by definition, preserves client-facing behavior completely.

API Refactoring: The Case of Renaming a Method

Let’s examine the popular “Rename Method” refactoring, often showcased as both essential and harmless. When every client is under your control (internal helpers called only from your own application code or your unit tests), renaming is indeed a safe refactoring. Modern IDEs automate it well, updating every reference they can resolve in a single step.

But what if that method is part of your public API? If clients exist outside your ownership, renaming that method is a breaking change, full stop. Even if tools allow you to quickly migrate your own code, external consumers don’t get that luxury. Library maintainers know that even trivial signature tweaks demand version bumps and migration guidance, precisely because the client’s contract is as sacred as the code itself.

If you rename a method used only internally, it’s a refactoring.
If you rename a public method or API and any client outside your control must change, it’s a breaking change—not a refactoring, regardless of how IDE-supported it is.

Context is always king. Code transformations are refactorings only when they are invisible to every client outside your purview.
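One common way to soften a public rename is to keep the old name as a forwarding shim marked with the standard [Obsolete] attribute, so external callers keep compiling and get a warning instead of an error. The sketch below is only an illustration: AuditLog, Format, and FormatRecord are hypothetical names, not part of the kata.

```csharp
using System;
using System.Globalization;

// Hypothetical library type, used only to illustrate a non-breaking rename.
public class AuditLog
{
    // New, preferred name.
    public string FormatRecord(string visitor, DateTime time)
        => visitor + ";" + time.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);

    // Old name kept as a forwarding shim: existing callers still compile,
    // but the compiler warns them and points at the replacement.
    [Obsolete("Use FormatRecord instead; Format is scheduled for removal in a future major version.")]
    public string Format(string visitor, DateTime time)
        => FormatRecord(visitor, time);
}
```

The rename stays invisible to existing clients until the shim is removed, and that removal is the point where the breaking change (and the version bump) actually happens.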

My Output-Based Take: Trading Mocks for Smart Fakes

Most “standard” kata solutions abstract file operations behind an interface and then use a mocking framework to verify how AuditManager collaborates with its dependency. That approach is communication-based: the tests check how something is done, not just what is done. The result is brittleness, because seemingly innocuous refactors (method splits, call order changes) break tests even though the outcome is unchanged.

Instead, I advocate output-based tests built on a carefully crafted fake file system. The fake exposes its test helpers only through an internal ITestDouble interface, so production code sees nothing but IFileSystem. Tests can then treat the fake as a black box, asserting directly on the files produced and their content, never on call order or interaction minutiae.

Example: The Fake File System

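using System;
using System.Collections.Generic;
using System.Linq;

// Test-only helpers for seeding and inspecting the fake.
internal interface ITestDouble
{
  void   AddFile(string  filePath, string content);
  string ReadFile(string filePath);
}

internal sealed class FakeFileSystem
  : IFileSystem, ITestDouble
{
  // In-memory "disk": full file path -> file content.
  private readonly Dictionary<string, string> files = new();

  // Simplification: returns every stored path, whatever directoryName is.
  public string[] GetFiles(string directoryName)
    => files.Keys.ToArray();

  public void WriteAllText(string filePath, string content)
    => files[filePath] = content;

  // Splits on '\n' and trims each entry, so both "\n" and "\r\n"
  // line endings are handled regardless of the OS running the tests.
  public IEnumerable<string> ReadAllLines(string filePath)
    =>
      files[filePath]
       .Split('\n',
              StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

  // Explicit implementations: reachable only through an ITestDouble reference.
  void ITestDouble.AddFile(string filePath, string content)
    => files.Add(filePath, content);

  string ITestDouble.ReadFile(string filePath)
    => files[filePath];
}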

Sample Output-Based Test

[Fact(DisplayName = "When current file is full a new one is created")]
public void Test3()
{
  // Arrange
  ITestDouble fileSystem = new FakeFileSystem();

  fileSystem.AddFile(Path.Combine(DirectoryName, "audit_1.txt"), "");
  fileSystem.AddFile(Path.Combine(DirectoryName, "audit_2.txt"),
    """
    Peter;2019-04-06 16:30:00
    Jane;2019-04-06 16:40:00
    Jack;2019-04-06 17:00:00
    """);

  var sut = new AuditManager(3, DirectoryName, (IFileSystem)fileSystem);

  // Act
  sut.AddRecord("Alice", DateTime.Parse("2019-04-06T18:00:00"));

  // Assert
  var content = fileSystem.ReadFile(Path.Combine(DirectoryName, "audit_3.txt"));

  content.ShouldBe("Alice;2019-04-06 18:00:00");
}

Comparison Table: Communication-Based vs. Output-Based Tests

Aspect	Communication-Based (Mocks)	Output-Based (Smart Fake)
Test Focus	Calls/interactions/arguments	Observable outputs (files, content)
Refactoring Fragility	High (breaks with collaboration tweaks)	Low (breaks only if domain outcomes change)
Contract Smells	Test logic or framework may leak into prod	Test helpers are strictly internal
API Impact	May force exposure for test’s sake	Interface contracts remain design-driven
Tests Change When?	On “how” changes	On “what” changes
Use Case	Message protocol, orchestration	State- and output-driven logic
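
To make the “Tests Change When?” row concrete, here is a minimal sketch under stated assumptions. RecordingFileSystem and Appender are hypothetical stand-ins (this is not the kata's AuditManager), and the "audits" directory name is made up; only the IFileSystem member signatures come from the fake above. Both versions of the append logic produce the same file content, but they talk to the file system differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IFileSystem
{
    string[] GetFiles(string directoryName);
    void WriteAllText(string filePath, string content);
    IEnumerable<string> ReadAllLines(string filePath);
}

// Hypothetical in-memory fake that also logs which members were called,
// so one object can serve both styles of assertion.
public sealed class RecordingFileSystem : IFileSystem
{
    public Dictionary<string, string> Files { get; } = new();
    public List<string> Calls { get; } = new();

    public string[] GetFiles(string directoryName)
    {
        Calls.Add(nameof(GetFiles));
        return Files.Keys.ToArray();
    }

    public void WriteAllText(string filePath, string content)
    {
        Calls.Add(nameof(WriteAllText));
        Files[filePath] = content;
    }

    public IEnumerable<string> ReadAllLines(string filePath)
    {
        Calls.Add(nameof(ReadAllLines));
        return Files[filePath].Split('\n',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    }
}

// Hypothetical system under test in two behavior-preserving variants.
public static class Appender
{
    // Version 1: read the file, then write it back with the new record.
    public static void AppendV1(IFileSystem fs, string path, string record)
    {
        var lines = fs.ReadAllLines(path).ToList();
        lines.Add(record);
        fs.WriteAllText(path, string.Join("\n", lines));
    }

    // Version 2: same outcome, but consults the directory listing first.
    public static void AppendV2(IFileSystem fs, string path, string record)
    {
        var lines = fs.GetFiles("audits").Contains(path)
            ? fs.ReadAllLines(path).ToList()
            : new List<string>();
        lines.Add(record);
        fs.WriteAllText(path, string.Join("\n", lines));
    }
}
```

An assertion on Calls (the communication-based style) that expects exactly ReadAllLines followed by WriteAllText passes for AppendV1 and fails for AppendV2, even though nothing a client can observe has changed. An assertion on Files[path] (the output-based style) passes for both.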

A Note on Vladimir Khorikov’s Work and Refactoring Terminology

A quick but essential aside: my discussion here is not a critique of Vladimir Khorikov’s excellent book or the design journey it showcases. The move towards a functional, decision-oriented AuditManager is one I endorse as a design goal.

However, terminology matters.

It’s too easy—even in pursuit of noble design improvements—to violate the principle that refactoring means invisible change for all clients. If our internal change requires rewriting tests or any client code we do not own, it’s not truly a refactoring. We owe it to our future selves and to newcomers to be clear about these boundaries and the risks of breaking them.

Final Reflections: Precision, Contracts, and Pragmatic Testing

This exploration didn’t just teach me one more way to test a kata. It reminded me that the boundaries between “refactor,” “rewrite,” and “breaking change” are contextual, dynamic, and demand constant care. The safest tests are those built around observable, domain-driven outcomes, with clear contracts at every seam.

Refactoring is about invisibility: If your client can tell, it wasn’t a refactoring.
Context is king: Not every “rename method” is harmless. Not every “output same as before” change is as safe as it looks.
Own your language: Don’t confuse code modernization with purpose-built, client-safe refactorings.

Let’s continue questioning dogma and building codebases that are understandable, evolvable, and honest—with vocabulary that matches our intent.

Softwarecraft

Software is hard. Let's not make it harder.

Rethinking the “Audit Everything” Kata
