Intro
I was recently granted the opportunity to model a database for a project involving multiple integrations that required different transaction types. Though I have done database modeling in the past, I hadn't done it for a project of this size and complexity. Along the way, I learned a handful of things about what it takes to be a successful data architect. Looking back, two of the more interesting, yet very simple, lessons that I will use in future modeling work are asking lots of good questions and understanding the life cycle of models.
Asking Lots of Good Questions
Don't get me wrong, you may ask some very bad questions because you can't formulate your thoughts well or you just don't know enough yet. But I'm in the camp that a bad question is better than no question, because it can at least spark a conversation. I suppose the first topic I should've written about is "learn the data" or "learn the domain," because that makes everything a lot easier. My bad.
Anyway, and this is a little embarrassing, my domain knowledge for this project wasn't up to par. Because of that, it was imperative that I asked tons and tons and tons of questions, even if the answers were "common knowledge." There was no time to worry about whether I would look stupid. My questions generally looked like these:
- What is this?
- What is that?
- What does this entity represent?
- Are these entities related? If so, how?
- Do these entities have a one-to-many relationship?
- Do these entities have a many-to-many relationship?
- Do we need to store this data?
- Are we allowed to store this data?
- Is there a need for historical data tracking?
- When a certain action is invoked, what do you expect the data to look like?
- Can this specific scenario happen?
- Are there edge cases in any of these relationships?
- What’s the expected format of the values?
- Who needs to access the data?
- What transactions are going to happen against this data?
And you could easily add 1,000,000 more questions to that list.
During the questioning phase, one tactic that helped me avoid ambiguous answers was putting specific details into the question. Everyone has their own perspective on the project, so it was important to make sure the other person understood exactly what I was asking. As an overly simplified example, the question "What's the format of this piece of data?" can be interpreted differently by a business analyst (BA) and a database engineer (DE). A BA may say "it should be text," whereas a DE may say "it's an integer." Better versions of the question might be "What format should this be in when it's displayed on the web?" or "What data type should this column be?" Simple example, I know, but I hope it gets the point across.
Life Cycle of Models
Something I never really thought about was the life cycle of models. Like the systems development life cycle (SDLC), the data model life cycle (DMLC) is important, and in my view it matters most for documentation and consistency. For instance, suppose a developer uses the logical model to create the physical model, but the logical model is never updated when new changes are made to the physical model. That obviously leaves the two with inconsistent information. Now suppose the original developer gets hit by a bus, and a new developer takes over and notices a gap between the models. How would they know which one is correct? None of this really mattered to me until data analysts started asking me why there were inconsistencies between the database and the logical model. It was a simple fix, but I found it interesting that I probably wouldn't have cared about it if I hadn't been the one doing the modeling.
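To make the idea of a gap between the logical and physical models a bit more concrete, below is a minimal sketch of an automated drift check. It assumes two things: that the logical model can be exported as a simple mapping of tables to columns and types, and that the physical database is SQLite. The LOGICAL_MODEL dictionary and the physical_columns and find_drift functions are hypothetical illustrations. They are not ER/Studio features. A real check would read the model from the modeling tool and query your actual database's catalog.

```python
import sqlite3

# Hypothetical logical model: table name -> {column name: expected declared type}.
# In a real project this would be exported from the modeling tool; here it is hard-coded.
LOGICAL_MODEL = {
    "customer": {"customer_id": "INTEGER", "full_name": "TEXT", "email": "TEXT"},
}


def physical_columns(conn, table):
    """Return {column name: declared type} for a table in the physical (SQLite) database."""
    # Table names come from our own model, not user input, so f-string interpolation is acceptable here.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}


def find_drift(conn, logical_model):
    """List human-readable differences between the logical model and the physical database."""
    issues = []
    for table, expected in logical_model.items():
        actual = physical_columns(conn, table)
        if not actual:
            # PRAGMA table_info returns no rows for a table that does not exist.
            issues.append(f"{table}: table missing from database")
            continue
        for column, col_type in expected.items():
            if column not in actual:
                issues.append(f"{table}.{column}: in logical model but not in database")
            elif actual[column] != col_type:
                issues.append(
                    f"{table}.{column}: type {actual[column]} in database, {col_type} in model"
                )
        # Columns added directly to the physical table that never made it back into the model.
        for column in sorted(actual.keys() - expected.keys()):
            issues.append(f"{table}.{column}: in database but not in logical model")
    return issues


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulate a physical change (a new phone column) that was never added to the logical model.
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT, email TEXT, phone TEXT)"
    )
    for issue in find_drift(conn, LOGICAL_MODEL):
        print(issue)
```

Run against a database where a phone column was added directly to the physical table, the check reports that customer.phone exists in the database but not in the logical model. Running something like this before publishing a model update is one way to catch these inconsistencies before the data analysts do.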
When developing the database models, I used Idera ER/Studio Data Architect. It was simple and intuitive, but one aspect I found especially interesting also relates to the DMLC: the repository. It had never occurred to me how beneficial version control could be for models, especially when you can publish them to the portal for others to view quickly. Alongside publishing, having a central repository where other developers or architects could make changes to the model was eye-opening. It goes to show how stuck I was on the idea that only code benefits from version control.
Conclusion
In any case, I found the entire data modeling process more challenging and rewarding than I originally anticipated. Lessons were definitely learned. I have to admit that I leaned on an actual data architect for a lot of help, but the collaboration is one of the fun parts for me. I'm looking forward to more modeling opportunities when they come.
Happy coding, err, I mean architecting?