Coding Against Data or Data Structures


You're an application developer. You are provided data in some format: an API, a database of one form or another, or something else. You are asked to build a feature using that data. You look at the data, the format it is in, and how you might consume it, and then you build the feature. Then the data changes, the feature breaks because the data is no longer in the expected format, and rework is needed.

The “proper” solution that is often suggested is to write logic that is not overly strict about the format of the data. Code for edge cases, do not bake assumptions into your code, protect against every scenario and alternative format. The code becomes complex and unmanageable, and deciding what is a feature and what is a bug becomes extremely difficult. Defects are raised, investigations happen, and no one really knows what is supposed to happen, if any such objective declaration is even possible.

The solution, of course, is not to write code based upon data, but to write code against data structures that both sides have agreed upon. This is not to say that code should not be resilient, degrade gracefully, and look for alternative formats, but none of these are solutions to the original problem.
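As a minimal sketch of what coding against an agreed-upon structure might look like, the Python below defines a hypothetical `UserRecord` contract and a `parse_user` function that checks incoming data against it once, at the boundary. The record name, its fields, and its limits are assumptions made up for illustration; they do not come from any real system.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


class ContractViolation(ValueError):
    """Raised when incoming data does not match the agreed structure."""


@dataclass(frozen=True)
class UserRecord:
    # Hypothetical contract, assumed to be agreed upon by both systems:
    #   id:           integer, 1 or greater
    #   display_name: string, 1 to 50 characters
    #   email:        string, or absent (None)
    id: int
    display_name: str
    email: Optional[str] = None


def parse_user(raw: Mapping[str, Any]) -> UserRecord:
    """Check raw data against the contract once, at the boundary."""
    user_id = raw.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool) or user_id < 1:
        raise ContractViolation("id must be an integer of 1 or greater")

    name = raw.get("display_name")
    if not isinstance(name, str) or not 1 <= len(name) <= 50:
        raise ContractViolation("display_name must be a string of 1 to 50 characters")

    email = raw.get("email")
    if email is not None and not isinstance(email, str):
        raise ContractViolation("email must be a string when present")

    return UserRecord(id=user_id, display_name=name, email=email)


def greeting(user: UserRecord) -> str:
    """Feature code depends only on the structure, never on the raw data."""
    return f"Hello, {user.display_name}!"
```

With something like this in place, a `ContractViolation` gives the feature-or-bug question an objective answer: the data broke the agreement, and the feature code did not.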

Too often we want “real data” so we know what it will look like “in production”. This is often a sign that we are slipping into the mistake of coding against data that may change, instead of data structures that are agreed upon by both systems.

I very often do not want the “real data”, especially while the features are being developed. Rather, I want an agreed-upon data format, with sample data that reaches the limits of that format in every possible direction. That data structure and the corresponding data samples can then be developed and tested against. The “real data” should never pose new scenarios that were not already encompassed in the definition of the data structure. It does help to have access to “real data”, “production data”, or “user data”, but this should never be a substitute for a well-defined data structure and the corresponding sample data that reaches every boundary of that data structure.
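Sample data that reaches every boundary could be sketched roughly as follows. The two fields, their limits, and the `is_valid` check are hypothetical stand-ins for whatever the two systems actually agree on. The samples that sit just outside the limits are my own addition, included to pin down where the structure ends.

```python
# Hypothetical limits from an agreed contract; names and numbers are
# illustrative assumptions only.
NAME_MIN, NAME_MAX = 1, 50
QTY_MIN, QTY_MAX = 0, 999


def is_valid(record: dict) -> bool:
    """Return True when the record is inside the agreed structure."""
    name = record.get("display_name")
    qty = record.get("quantity")
    return (
        isinstance(name, str)
        and NAME_MIN <= len(name) <= NAME_MAX
        and isinstance(qty, int)
        and not isinstance(qty, bool)  # bool is a subclass of int
        and QTY_MIN <= qty <= QTY_MAX
    )


# Samples sitting exactly on the boundaries of the structure.
VALID_SAMPLES = [
    {"display_name": "a" * NAME_MIN, "quantity": QTY_MIN},  # smallest allowed
    {"display_name": "a" * NAME_MAX, "quantity": QTY_MAX},  # largest allowed
    {"display_name": "Zoë 李", "quantity": 1},               # non-ASCII text
]

# Samples one step outside each boundary.
INVALID_SAMPLES = [
    {"display_name": "", "quantity": 1},                    # name too short
    {"display_name": "a" * (NAME_MAX + 1), "quantity": 1},  # name too long
    {"display_name": "a", "quantity": QTY_MIN - 1},         # quantity too low
    {"display_name": "a", "quantity": QTY_MAX + 1},         # quantity too high
    {"quantity": 1},                                        # missing field
]


def check_samples() -> bool:
    """Confirm every sample falls on the expected side of the contract."""
    return all(is_valid(s) for s in VALID_SAMPLES) and not any(
        is_valid(s) for s in INVALID_SAMPLES
    )
```

If the feature is built and tested against samples like these, real data that stays inside the agreed structure should not present a case the samples have not already covered.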

Check out my blog for more of my musings on technology and the various other topics that I am interested in.