Introducing "status" column to an SQL table changes its definition

Question

I saw this pattern in multiple projects I wrote: I create an SQL model for a certain type of entities and at some point, we realize that there's a need to store multiples types of the same entities. Usually I introduce a "status" column to the table, because the type is mutually exclusive. For example, person can be "dead" or "alive", chat_message can be "to_send", "sent", "to_edit", "to_delete" and "deleted".
The problem is, when I introduce such a column, I need to re-trace all of the queries I made and consider whether it's still valid for all statuses - otherwise I'd need to specify it in the query. It's very easy to introduce a bug at this point, which makes me wonder: is it a common pattern in software engineering? Which approaches would help me avoid it?
If it wasn't for database normalization, my perfect solution would be to copy the schema of the SQL table for each possible value of the "status" field. For instance, when I need to introduce "status" to "people", I would drop the "people" table and instead create variants for "alive people" and "dead people". I don't think I ever saw this kind of solution though and it feels like tech debt to me. What other options do I have?

Joop Eggen · Answer

Moving people to dead_people rarely makes sense. It is a form of an archiving cleanup, so the original table works optimal. One use is having a fast small user table for mass event checks (user rights), and a user table with full info.
With O/R mappings I would suggest discriminator columns and use database inheritance when a "status" column determines which other columns are used.
With that the queries can be still maintainable.
Say you have a query on people and now introduce dead or alive people. Then the business rule determines the scope of the query: what is the goal of that piece of code.
The obvious advice is do not repeat yourself. Queries may often vary in some criteria. Measures to follow DRY can be done in JPA (java).

A concrete example:
A data logger receives binary message from devices. Such a message consists of bytes, containing several signals, bit groups. For the definition of such binary message parsing there is a table. The signal is an entity.
Now comes a time, where custom signals user-defined as expression of signals is introduced. Signal becomes message-specified original signal or expression based custom signal. Consider that now a custom signal can also be defined from other custom signals rather than merely the original hardware signals.
This is a semantic change Signal = HardwareSignal xor CustomSignal.
A discriminator column and separate O/R mappings would do. A clean separation of corresponding fields: reference to a MessageSpecification/SignalSpecification for HardwareSignal only. No unused fields. So no fields can be accessed unintended.
That would be sufficient for a good software maintainance.
All this is not optimal, but I still have not encountered a database + O/R framework that does this all gracefully. Despite Domain Driven Development and more. But who knows.

Tulains Córdova · Answer

For instance, when I need to introduce "status" to "people", I would drop the "people" table and instead create variants for "alive people" and "dead people".

You can instead create a dead_people view and a living_people view. Anyway, your code should query views and not tables to begin with.
I understand the overhead of checking everything everywhere, but with modern MVC approaches and separation of concerns, the places where a certain table or view is queried should be very limited and not widespread. Is it possible you are talking about legacy code? Been there, done that. Creating views, specially for the more widely used cases helps. What helps even more is creating functions or business objects that return the stuff you want and that are the only places where the database is queried.

tale852150 · Answer

It is very common to use a “status” column in tables even if it might add overhead.
The added benefit is you can do “logical or soft” deletes in such cases as you might use a “status” column as representing the status of your row data.
I (almost) always include some kind of “status” or “flag” column but that may due to the types of applications I have worked on.
I have also encountered (but less so) each row being given a start and end date with the “active” record having no end date.  But that will only assist you when dealing with “active record status”.  The benefit here is that record history is kept in the table.  An obvious drawback would be increased space consumption.
As mentioned in the comments you would also probably be best served with a lookup table that contains all your statuses and related codes.  But be careful not to create a Massively Unified Code-Key (MUCK) table.
Which brings me to the conclusion that there’s no silver bullet / one size fits all answer here since the need for a “status” column is driven by the needs of the application.

Introducing "status" column to an SQL table changes its definition

3 Answers

That would be sufficient for a good software maintainance.

Add your own answers!

Ask a Question