What is a Database Schema?

Jared Hillam

December 3, 2019


The other day somebody asked me, "What is a database schema?" This is a great question, and one that I think deserves its own video. I explain it in other videos, but it's a topic that needs its own attention.

In essence, the schema represents any kind of structure that we're defining around the data. This includes tables, views, fields, relationships, packages, procedures, indexes, functions, types, sequences, materialized views, queues, triggers, synonyms, database links, directories, XML schemas, and other elements. Without any of this structure, it's easy to get lost in the data.

Imagine, for example, that I gave you access to a folder housing a bunch of data from different sources, all garbled together. Imagine trying to make heads or tails of such data. With a lot of serious time and effort you could eventually arrive at some kind of structure for understanding it. That structure, and its corresponding data relationships, is what we call the schema of the database. How that structure gets applied is something each database vendor does differently. For instance, there are unstructured data stores that apply a schema only when data gets read. In other words, the data lives in its garbled state and we apply the structure in the query code. This is called schema on read. On the other hand, there are databases that enforce structure as a condition before data gets written. This is called schema on write.
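The contrast can be made concrete with a small Python sketch (the `orders` table and its fields are purely illustrative, not from any real system): a schema-on-write store rejects nonconforming data before it lands, while a schema-on-read store keeps raw records as-is and leaves interpretation to the query.

```python
import json
import sqlite3

# --- Schema on write: structure is enforced before data is stored ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
db.execute("INSERT INTO orders VALUES (1, 19.99)")     # conforms, accepted
rejected = False
try:
    db.execute("INSERT INTO orders VALUES (2, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    rejected = True                                    # bad data never lands
print("rejected at write time:", rejected)

# --- Schema on read: raw records stay garbled; structure lives in the query ---
raw_lines = ['{"id": 1, "amount": 19.99}', '{"id": 2}']  # inconsistent records
total = sum(json.loads(line).get("amount", 0.0) for line in raw_lines)
print("total interpreted at read time:", total)
```

In the first half the database itself refuses the bad record; in the second, the raw lines stay inconsistent and the reader decides how to interpret a missing `amount`.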

How a schema gets designed can influence different behaviors in a database. For example, if a database schema is designed as a series of tables connected by primary keys, then it is likely designed for reading and writing individual records, which is ideal for transactional applications.

However, if a schema has a central table connected to keys supplied by surrounding tables, then it's likely designed to make read output highly efficient. This kind of schema is ideal for high-scale information delivery, and is called a star schema.

When we're interacting with data, the schema becomes an important topic. A good schema can mean the difference between a query that finishes in a few seconds and one that runs for many hours. And this isn't something we can cheat our way around. The Big Data craze proved that lesson to the entire market: many organizations jumped into Hadoop hoping for magic, but consuming oceans of data with no conformed schema just made for painfully slow analytics. To explore this concept further, you may want to take a look at the analogy I gave at the beginning of my video titled "What is NoSQL?"

Back when the Hadoop craze was at its peak, and the emperor was still walking naked without anybody noticing, we had a new customer ask, "Why are we doing all this data modeling? Isn't data modeling an old concept?" To which our Solution Architect said, "I agree that data modeling is a long-tested concept, but it works. Sort of like pi: why do we still use pi? Isn't that an old method of calculating a circle?" In other words, the practice of engineering an analytically focused schema will remain an important part of being able to feed information to a body of business users. Not only does it lower processing time and cost, but it also reduces the code complexity of SQL queries. Incidentally, that customer is today another happy and referenceable Intricity account.

Generating a data warehouse schema is a process that requires quite a bit of iteration. Usually it starts with a series of interviews, which provide inputs into a set of documentation assets. Those assets ultimately lead to an early trial schema, which we call a conceptual data model. This model can act as a backbone for determining development costs for the data warehousing project, among other things.

Once a project is given the go-ahead for development, that early model will be iterated on multiple times. Often it will undergo the scrutiny of a room full of people attempting to ask questions of the model that it can't answer. This strengthens the model's resilience as a representation of the business's fundamental building blocks for analytics. Once the model is vetted, it will be physicalized into a database schema, which the source data will be loaded and conformed to. Intricity specializes in this strategic process of defining future-state Data Warehouses and executing on their deployment. I have an earlier video which walks through the data warehouse strategy in detail. If you would like to see it, click on the link in the video description, and of course if you would like to talk with an Intricity specialist about your situation, I've included a link for that as well.

Intricity 101 Video:

Talk with a Specialist:
