Detecting unused database columns using Ecto schemas
When building large database-backed Elixir applications using Ecto, it is inevitable for your database schema to evolve over time. As your database schema changes, it's possible for there to be discrepancies between the tables in the database and your local schema definitions. Tracking inconsistencies between the schema defined in your application's modules and what's actually in the database can be tedious.
Thankfully, there's a way we can look for inconsistencies programmatically.
Finding all schema modules
A normal Ecto schema will look something like the following:
defmodule User do
use Ecto.Schema
schema "users" do
field :name, :string
field :age, :integer, default: 0
field :password, :string, redact: true
has_many :posts, Post
end
end
schema
is actually a macro, that, when expanded, will declare a number of functions. We can use the presence of the __schema__
function to filter our application's modules to grab only those that have a schema definition.
To get all modules defined in your application we can do:
Now we want to filter out modules without the __schema__
function:
This will give us all modules where the __schema__
field is present. This will include modules that define embedded schemas however. Since embedded schemas don't directly map to a database table we'll want to exclude those modules. Taking a look at the implementation of the schema
and embedded
macros reveals that only schema
definitions will add an additional __meta__
field to the module struct.
Comparing the schemas with the database
We can now iterate over all schema modules. Let's define a function for this:
defmodule SchemaChecker do
def schema_modules do
{:ok, modules} = :application.get_key(:your_app, :modules)
modules
|> Enum.filter(&({:__schema__, 1} in &1.__info__(:functions)))
|> Enum.filter(&(:__meta__ in Map.keys(&1.__schema__(:loaded))))
end
def check_module(module) do
[...]
end
end
To obtain the list of fields on each schema we can continue using the __schema__
function.
fields = module.__schema__(:fields)
This is almost perfect, but fields in our schema can have different names than the database column. For this we need to get the :field_source
, so we change the above to:
fields =
module.__schema__(:fields)
|> Enum.map(&module.__schema__(:field_source, &1))
|> Enum.map(&Atom.to_string/1)
You'll note that we also convert the atoms to strings. This is to more easily compare the lists later on.
We can now run a simple dummy query against our database with a LIMIT 0
that will return an empty result set but list out all available columns. By computing the difference between our list of fields and the database columns we can determine which columns are no longer referenced by any schema:
table_name = module.__schema__(:source)
{:ok, %{columns: columns}} = DB.Repo.query("SELECT * FROM #{table_name} LIMIT 0")
columns -- fields
You may have to play around with the result set a bit in case you are not using PostgreSQL.
Summary
This post is the result of several database cleanup efforts in my dayjob. I was positively surprised how easy it was to leverage Ecto's powerful API and well-written documentation to get this task done.
You can find the final version at: https://gist.github.com/halfdan/6853a8cc8994eca4c9311ecad04a0eb0