ChatGPT and data management
Written by Jared Hillam
The reaction to ChatGPT caught even the OpenAI team off guard. For the last 73 years, we’ve had a test of machine intelligence called the Turing test (see the abbreviated illustration of the Turing test, adapted from Wikipedia).
I would argue that the reaction to ChatGPT is proof enough that it has passed a practical Turing test. Academics and scientists would likely disagree vehemently with me. Oddly enough, though, these same university professors are frantically trying to determine whether papers were written by their students or by ChatGPT.
In essence, ChatGPT is an extra-informed intern at your side, able to answer questions creatively. If it isn't intelligent, it at least fakes it very well, and since it has already passed many standardized exams, it is at least useful. If it is intelligent, how can that intelligence be used for business intelligence (the oxymoron has not gone unnoticed)? And how does it fit within a modern data-to-information pipeline? This whitepaper explores what GPT currently has up its sleeve and proposes some categories for its thoughtful use.
Openness vs Conscientiousness
Two of the Big Five personality traits are Openness and Conscientiousness, which could loosely be relabeled Creativity and Consistency, respectively. Organizations that lean too heavily on creativity struggle to scale, while organizations that lean too heavily on consistency are easily surpassed by innovative competitors. Running a business requires a balance of both.
Where does ChatGPT sit?
Ask yourself: do you use ChatGPT for creativity or for consistency? Most people will answer creativity without much hesitation. That is precisely why the technology worries university professors: an entire class can ask the same question, and ChatGPT will come up with a bespoke response for every student.
So how does this available creativity impact the data-to-information landscape?
Well, what is the data-to-information landscape? In short, it is the process of taking raw data and turning it into actionable information. To the layperson, it looks as simple as manipulating an Excel spreadsheet, and that isn't entirely wrong. But turning spreadsheet data into actionable information can take 57 tabs of manipulation gymnastics, and where the layperson's picture falls short is that this Excel process is not repeatable.
What organizations use instead is a process that can be leveraged enterprise-wide, across hundreds of sources and thousands of users, on a predictable basis. So the systems that run the pipeline for turning data into information are far-reaching, highly tunable, logically reusable, and very scalable...or at least the organization hopes that's the case. One of the main things organizations expect from these systems is predictability. There shouldn't be six ways of generating the logic for an income statement, which is partly why organizations care so much about reusable logic within their data-to-information landscapes. That's not to say there aren't different ways of arriving at the same answer. Rather, when each of those ways heads in a different direction, the organization loses its ability to manage patterns of logic.
So where does this leave ChatGPT? Where can it participate in a process that requires high consistency? This question leads into exciting, unexplored territory that offers opportunities for both future product innovation and current development.
In today's reality, ChatGPT can be immensely helpful to developers and architects. It doesn't replace them; rather, it becomes a good starting point for development. Imagine, for example, that you're starting a new data warehouse for sales and marketing. The following could get the creative juices flowing during a design session with both groups in the same room:
Next, once the tables have been identified, they can be shared with ChatGPT to get started on table design. Here is one design that uses the sample facts and dimensions from above:
Once the designs are in DDL form, they can be pulled into SqlDBM as a collaborative starting point for tying things together and making edits. The human touch is not eliminated, but we also don't have to start from a blank canvas.
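To make the design-time hand-off more concrete, here is a minimal sketch of how a star-schema design can be captured as data and rendered into DDL that a modeling tool could import. The table and column names (`dim_customer`, `fact_sales`, and so on) are hypothetical examples rather than the tables from the session above. The generator is a plain-Python illustration of the kind of output involved and does not represent ChatGPT's or SqlDBM's own behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass
class Table:
    name: str
    columns: List[Column]
    primary_key: Optional[str] = None
    # Maps a local column to the referenced table. Assumes the referenced
    # table uses the same column name as its primary key.
    foreign_keys: Dict[str, str] = field(default_factory=dict)


def render_ddl(table: Table) -> str:
    """Render a CREATE TABLE statement for one table in the star schema."""
    lines = []
    for col in table.columns:
        null_sql = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {col.sql_type}{null_sql}")
    if table.primary_key:
        lines.append(f"    PRIMARY KEY ({table.primary_key})")
    for col_name, ref_table in table.foreign_keys.items():
        lines.append(f"    FOREIGN KEY ({col_name}) REFERENCES {ref_table} ({col_name})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table.name} (\n{body}\n);"


# Hypothetical dimension and fact tables for a sales warehouse.
dim_customer = Table(
    name="dim_customer",
    columns=[
        Column("customer_key", "INTEGER", nullable=False),
        Column("customer_name", "VARCHAR(200)"),
        Column("region", "VARCHAR(50)"),
    ],
    primary_key="customer_key",
)
fact_sales = Table(
    name="fact_sales",
    columns=[
        Column("sale_id", "INTEGER", nullable=False),
        Column("customer_key", "INTEGER", nullable=False),
        Column("sale_amount", "NUMBER(18,2)"),
    ],
    primary_key="sale_id",
    foreign_keys={"customer_key": "dim_customer"},
)

for t in (dim_customer, fact_sales):
    print(render_ddl(t))
```

Keeping the design as data, instead of as free-form text, is one way to hold on to the consistency the pipeline needs even when the first draft came from a creative tool.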
Notice that I'm asking ChatGPT to iterate at design time. That is precisely where ChatGPT belongs in its current form: helping create consistent assets that will then be processed over and over. The runtime execution of those assets sits (for now, and likely in the long term) outside of something like ChatGPT.
Perhaps the greatest boon with ChatGPT is the acceleration in development. I conducted a brief Slack interview with one of the Intricity developers. Here was some of his feedback:
- [Jared Hillam] How much more are you able to accomplish with ChatGPT? Is it a 20% increase?
- [Intricity Developer] Easily 20% overall. In some areas, it is like 50%.
Writing Python - it saves me so much time. I do not look up anything in the Python docs or search Google anymore - just ask ChatGPT.
Code documentation - I write a SQL stored procedure and then let Chat format it for readability and add documentation as comments - saves tons of time.
Writing code - I will sometimes ask Chat to write the code for me when it is something that can be contained and not too crazy. I'll iterate with it starting from simple concepts and then building from that as an anchor. I feel like I'm finding better ways every day to work with it.
Technical docs - I use Chat to build my technical docs. I talk to it just like I was teaching or training someone and then have it put all the info into a technical doc. Saves me so much time and I will now be able to revamp the templates and training materials that we use for health checks or new configs.
[Intricity Developer cont.] Here is an example of my code before ChatGPT comments on it:
[Intricity Developer cont.] Here is a pic after Chat added comments:
[Intricity Developer cont.] Here is an example of some documentation that I am putting together for the process that I have built using metadata tables and Snowflake scripting to generate Snowflake base config and schema create scripts. I am not done with the doc, but with Chat, I have spent less than half the time it would have taken to get this far.
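The developer's approach of generating setup scripts from metadata tables can be sketched roughly as follows. The developer used Snowflake Scripting. This Python version is a swapped-in illustration, and the metadata rows and the `<ENV>_<DATABASE>` naming rule are hypothetical stand-ins, since the source does not show the actual tables. The emitted statements use standard Snowflake `CREATE DATABASE IF NOT EXISTS` and `CREATE SCHEMA IF NOT EXISTS` syntax.

```python
from typing import Dict, List, Set

# Hypothetical metadata rows standing in for a metadata table:
# one row per (environment, database, schema) that should exist.
SCHEMA_METADATA: List[Dict[str, str]] = [
    {"env": "DEV", "database": "SALES", "schema": "RAW"},
    {"env": "DEV", "database": "SALES", "schema": "MART"},
    {"env": "PROD", "database": "SALES", "schema": "RAW"},
    {"env": "PROD", "database": "SALES", "schema": "MART"},
]


def generate_create_scripts(rows: List[Dict[str, str]]) -> List[str]:
    """Turn metadata rows into idempotent Snowflake create statements.

    Database names follow a hypothetical <ENV>_<DATABASE> convention,
    and each database is emitted only once, before its first schema.
    """
    statements: List[str] = []
    seen_databases: Set[str] = set()
    for row in rows:
        db_name = f"{row['env']}_{row['database']}"
        if db_name not in seen_databases:
            statements.append(f"CREATE DATABASE IF NOT EXISTS {db_name};")
            seen_databases.add(db_name)
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {db_name}.{row['schema']};")
    return statements


for stmt in generate_create_scripts(SCHEMA_METADATA):
    print(stmt)
```

Because the scripts come from the metadata, adding an environment or schema means adding one row instead of hand-writing more SQL. That repeatable behavior is the kind of consistent asset discussed above.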
Developers have long had the habit of going back, out of necessity, to see whether any code had already been written for the problem they were trying to solve. This proved to be a VERY useful tactic for long-term code maintenance, because it established patterns that could be captured broadly and then used for global code management. Consistent repetition creates natural hooks that can be leveraged for downstream automation; this is why we follow naming conventions in development. However, this is where ChatGPT can be a problem, because creativity is not always the right way to solve a problem over time and at scale. GPT already generates a sample faster than a developer can find past code to start from, so it is already creating inconsistent coding patterns. Additional enterprise capabilities need to be added to GPT so that it can reference preexisting patterns where possible. This means that developers building those add-ons need to include steps that ensure consistent patterns are being used.
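Naming conventions are one of the simplest of these "natural hooks" to automate. Below is a minimal sketch of a check that could run as a review step on AI-generated code. It assumes a hypothetical convention in which fact tables start with `FACT_`, dimensions with `DIM_`, and staging tables with `STG_`, followed by uppercase words separated by single underscores. Your organization's actual conventions would replace the pattern.

```python
import re
from typing import List

# Hypothetical convention: <PREFIX>_<NAME>, where PREFIX is FACT, DIM, or STG
# and NAME is uppercase letters/digits separated by single underscores.
TABLE_NAME_PATTERN = re.compile(r"^(FACT|DIM|STG)_[A-Z0-9]+(_[A-Z0-9]+)*$")


def find_nonconforming(table_names: List[str]) -> List[str]:
    """Return the table names that do not follow the naming convention."""
    return [name for name in table_names if not TABLE_NAME_PATTERN.match(name)]


# Names as they might come back from a generated script.
generated = ["FACT_SALES", "DIM_CUSTOMER", "customer_dim", "STG_ORDERS_RAW", "SalesFact"]
print(find_nonconforming(generated))  # ['customer_dim', 'SalesFact']
```

A check like this doesn't make generated code consistent by itself. It does flag drift early, before it spreads into the patterns that downstream automation relies on.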
Mind the Gap
Here we reach an intermission: the point between design/development and runtime. The runtime is where ChatGPT isn't. The runtime is the database, the ETL tool, the orchestration tool, and so on. ChatGPT may have produced the very code these systems run, but ChatGPT isn't any of these systems. These tools are what deliver consistent execution, day in and day out.
Many weeks ago, quite a number of people asked me what was going to happen to Snowflake and Databricks now that ChatGPT is around. The hype around the technology was overshadowing its place in our current technology landscape. At the time, in a LinkedIn video, I compared ChatGPT to a human and the database to an excavator. Asking the human to dig a pit 8 feet deep without using the excavator was the equivalent of asking ChatGPT to query an S3 file store without a query engine.
I would argue that the economics of creativity will be better served by leveraging the technology "excavators" of our current data landscapes. That doesn't mean these technologies are safe in the long run from AI-driven innovation, but the very companies that provide them will have the right incentives to use AI to improve these repetitive processes.
Excel Once Again
On the other side of this intermission of mundane, automated data routines sits the consumption layer, and once again this is a realm where ChatGPT can be useful. One simple exercise is to copy a balance sheet from any public company, paste it into ChatGPT, and ask for its opinion. Here's ChatGPT's opinion on the GE balance sheet:
ChatGPT doesn't generate interfaces...yet. So analytics interactions live either in web tables or...spreadsheets. Spreadsheets may be visually boring, but developers are already building ChatGPT extensions for Excel and Google Sheets, and the way these spreadsheets can be used is interesting. For example, ChatGPT responses can be generated cell by cell, with the adjacent columns supplying the variables that shape each response. This could help sales reps write messaging for a list of prospects, tailored to each prospect's first name and LinkedIn title.
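The cell-by-cell pattern boils down to filling a prompt template from each row's adjacent columns. Here is a minimal sketch of that step. The column names (`first_name`, `title`) and the template wording are hypothetical, and the code only builds the prompts. Sending them to ChatGPT would be handled by whichever Excel or Google Sheets add-on or API client you use, which is not shown here.

```python
from typing import Dict, List

# Hypothetical prompt template; {first_name} and {title} are filled from
# the columns adjacent to the cell that will hold ChatGPT's response.
PROMPT_TEMPLATE = (
    "Write a two-sentence sales opener for {first_name}, "
    "whose LinkedIn title is '{title}'."
)


def build_prompts(rows: List[Dict[str, str]]) -> List[str]:
    """Build one prompt per spreadsheet row, mirroring a cell-by-cell formula."""
    return [PROMPT_TEMPLATE.format(**row) for row in rows]


# Hypothetical rows from a prospect list.
sheet_rows = [
    {"first_name": "Ana", "title": "VP of Finance"},
    {"first_name": "Raj", "title": "Data Engineering Manager"},
]
for prompt in build_prompts(sheet_rows):
    print(prompt)
```

Because every row goes through the same template, the responses vary only where the row data varies. That is what makes this spreadsheet use more controllable than ad hoc chatting.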
Or you can use it to generate generic categories for product types like this:
There are plenty of data manipulation capabilities as well, but the cost can become prohibitive beyond the scale of an individual user. Part of the magic of ChatGPT lies in asking just the right question, so some services provide Excel and Google Sheets plugins that narrow GPT to specific tasks. For example, given a messy list of companies, I asked for their official names.
The spreadsheet may undergo a mini-Renaissance due to its flexibility in allowing multi-directional presentation with back and forth interactions.
What We're Seeing
So what we're seeing is ChatGPT being used during design, development, and consumption. The ChatGPT service is undoubtedly an insanely useful platform. It's like hiring a highly skilled junior developer who fumbles around here and there but, for practical purposes, has become indistinguishable enough from a real human to be an indispensable productivity asset. Designers, developers, and data consumers who aren't using ChatGPT aren't slightly missing the boat; they're being left in the dust. ChatGPT isn't replacing processes that require mundane consistency, but it is speeding up the creative steps needed to get real work done.
Who is Intricity?
Intricity is a specialized team of over 100 Data Management Professionals, with offices across the USA and headquarters in New York City. Our team of experts has delivered implementations in a variety of industries, including healthcare, insurance, manufacturing, financial services, media, pharmaceuticals, retail, and others. Intricity is uniquely positioned as a business partner that deeply understands what makes the data tick. This combined knowledge and acumen has allowed Intricity to beat out its Big 4 competitors time and time again. Intricity’s expertise spans the entire information lifecycle, so when your problem involves data, Intricity will be a trusted partner. Intricity's services cover a broad range of data-to-information engineering needs:
What Makes Intricity Different?
While Intricity conducts highly intricate and complex data management projects, Intricity is first and foremost a business-user-centric consulting company. Our internal slogan is "Simplify Complexity." This means that we take complex data management challenges and not only make them understandable to the business but also make them easier to operate. Intricity does this by using tools and techniques that are familiar to business people but adapted for IT content.
Intricity authors a highly sought-after Data Management Video Series targeted toward business stakeholders at https://www.intricity.com/videos. These videos are used in universities around the world. Here is a small set of universities using Intricity’s videos as a teaching tool:
Talk With a Specialist
If you would like to talk with an Intricity Specialist about your particular scenario, don’t hesitate to reach out to us. You can write us an email: email@example.com
(C) 2023 by Intricity, LLC
This content is the sole property of Intricity LLC. No reproduction can be made without Intricity's explicit consent.
Intricity, LLC. 244 Fifth Avenue Suite 2026 New York, NY 10001
Phone: 212.461.1100 • Fax: 212.461.1110 • Website: www.intricity.com