Detecting unused database columns using Ecto schemas

Detecting unused database columns using Ecto schemas
Photo by Mikhail Fesenko / Unsplash

When building large database-backed Elixir applications using Ecto, it is inevitable for your database schema to evolve over time. As your database schema changes, it's possible for there to be discrepancies between the tables in the database and your local schema definitions. Tracking inconsistencies between the schema defined in your application's modules and what's actually in the database can be tedious.

Thankfully, there's a way we can look for inconsistencies programmatically.

Finding all schema modules

A normal Ecto schema will look something like the following:

defmodule User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    field :age, :integer, default: 0
    field :password, :string, redact: true
    has_many :posts, Post
  end
end

schema is actually a macro, that, when expanded, will declare a number of functions. We can use the presence of the __schema__ function to filter our application's modules to grab only those that have a schema definition.

To get all modules defined in your application we can do:

{:ok, modules} = :application.get_key(:your_app, :modules)
Obtain a list of all modules in our OTP application

Now we want to filter out modules without the __schema__ function:

schema_modules =
  modules 
  |> Enum.filter(&({:__schema__, 1} in &1.__info__(:functions))) 
Filter out modules that do not have a schema definition

This will give us all modules where the __schema__ field is present. This will include modules that define embedded schemas however. Since embedded schemas don't directly map to a database table we'll want to exclude those modules. Taking a look at the implementation of the schema and embedded macros reveals that only schema definitions will add an additional __meta__ field to the module struct.

  |> Enum.filter(&(:__meta__ in Map.keys(&1.__schema__(:loaded))))
Further filter out embedded schemas

Comparing the schemas with the database

We can now iterate over all schema modules. Let's define a function for this:

defmodule SchemaChecker do
  def schema_modules do
    {:ok, modules} = :application.get_key(:your_app, :modules)
    
    modules 
    |> Enum.filter(&({:__schema__, 1} in &1.__info__(:functions))) 
    |> Enum.filter(&(:__meta__ in Map.keys(&1.__schema__(:loaded))))
  end
  
  def check_module(module) do
    [...]
  end
end

To obtain the list of fields on each schema we can continue using the __schema__ function.

fields = module.__schema__(:fields)

This is almost perfect, but fields in our schema can have different names than the database column. For this we need to get the :field_source, so we change the above to:

fields = 
  module.__schema__(:fields)
  |> Enum.map(&module.__schema__(:field_source, &1))
  |> Enum.map(&Atom.to_string/1)

You'll note that we also convert the atoms to strings. This is to more easily compare the lists later on.

We can now run a simple dummy query against our database with a LIMIT 0 that will return an empty result set but list out all available columns. By computing the difference between our list of fields and the database columns we can determine which columns are no longer referenced by any schema:

table_name = module.__schema__(:source)

{:ok, %{columns: columns}} = DB.Repo.query("SELECT * FROM #{table_name} LIMIT 0")

columns -- fields

You may have to play around with the result set a bit in case you are not using PostgreSQL.

Summary

This post is the result of several database cleanup efforts in my dayjob. I was positively surprised how easy it was to leverage Ecto's powerful API and well-written documentation to get this task done.

You can find the final version at: https://gist.github.com/halfdan/6853a8cc8994eca4c9311ecad04a0eb0