As a Data Science Consultant, I often have to coach my clients not to let data shame interfere with successful Customer Intelligence. What is data shame, you ask? It’s the common instinct to avoid data sources or integrations because they are messy, inaccurate, or sparse. It includes shying away from data sources that:

You feel are not as up-to-date as they should be
Require significant clean up or review
You suspect will be difficult to integrate because of the way they were set up
Are difficult to access
Or that you worry your internal team or the involve.ai team might judge you for

We are not judging you for imperfect data!

Nobody’s data is perfect. That’s why the Data Science Consulting team exists. We help you sort through the messes, work out kinks, fill in gaps, and get to a place where data has meaning.

In my experience, there are a few categories of data shame, and none are insurmountable.

Concerns around imperfect data

“Our data is disjointed. We have some in our CRM, some in Excel workbooks, some in Snowflake, and some in places I can’t even get to.”

This is one of involve.ai’s essential value propositions. We are built to unify your disparate data sources into a single view. Even if you don’t care about applying AI to predict behavior, you can use involve.ai to bring your disjointed data sources together. My team does this work every day, and the more data the better, so bring it all to the table.

“We have data, but it’s not what we want to show our end users or roll up in the involve.ai dashboard.”

I have worked with several customers who don’t track or capture the exact data they want to see in their dashboard. We partner with them to determine which formulas we can run on the data they do have, so that the dashboard shows the desired metric in a form that makes sense to the broader team.

“We don’t even know which customers are active.”

Not a problem. We have encountered this a few times, and given enough data, our models can infer that information for you.
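To make the idea concrete, here is a minimal sketch of one simple way to infer which customers are active from data you already have. It is a rule-based stand-in, not the involve.ai model: it assumes you can pull a most recent activity date per customer (a login, a purchase, a support ticket), and the function name, the 90-day window, and the sample customers are all hypothetical.

```python
from datetime import date, timedelta


def infer_active(last_activity, as_of, window_days=90):
    """Return the set of customer IDs with activity in the last window_days.

    last_activity maps a customer ID to the date of its most recent event.
    The 90-day default is an assumption; pick a window that fits your business.
    """
    cutoff = as_of - timedelta(days=window_days)
    return {cid for cid, seen in last_activity.items() if seen >= cutoff}


# Hypothetical data: most recent login or purchase date per customer.
last_activity = {
    "acme": date(2024, 5, 20),
    "globex": date(2023, 11, 2),
    "initech": date(2024, 3, 15),
}

active = infer_active(last_activity, as_of=date(2024, 6, 1))
print(sorted(active))
```

A fixed window like this is only a starting point. A trained model can weigh many more signals, but even a simple rule gives your team a first answer to check against what they know.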

“We’ve acquired companies with their own unique tech stacks, so we have several different tech stacks, data strategies, customer segments, unique user IDs, etc.”

In these cases we can apply a technique called fuzzy matching, which links records that are similar but not identical, to tie together the same customers across each system and subsidiary. We have seen accuracy levels between 70% and 95% for our customers.
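For readers who have not seen fuzzy matching before, here is a minimal sketch using only Python's standard library. It is an illustration of the general technique, not involve.ai's implementation: the company names, the list of suffixes to strip, and the 0.85 similarity threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co"}


def normalize(name):
    """Lowercase the name, drop punctuation, and remove common suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(word for word in cleaned.split() if word not in SUFFIXES)


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when no score reaches the threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, normalize(name), normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Hypothetical customer names from a parent CRM and an acquired company.
crm_names = ["Acme Corporation", "Globex LLC", "Initech"]
acquired_names = ["ACME Corp.", "Globex", "Umbrella Health"]

matches = {name: best_match(name, crm_names)[0] for name in acquired_names}
for name, match in matches.items():
    print(f"{name!r} -> {match!r}")
```

The threshold is the main design choice. A lower value links more records but risks merging different customers, so borderline matches are usually worth a manual review.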

“We haven’t had great processes/accountability and our data is junk.”

I actually love when this is an issue because our team can provide so much value. Typically, we’ll host a working session with your team to better understand the data. Then we can clean it up on the involve.ai end. Then, best part, we can write it back to your initial data source. I love this because there are outside services that charge for that type of work. We do it as part of onboarding and it’s so fulfilling!

Concerns around too little data

Okay, this is a harder problem to solve, but one of the most important. AI runs better and more accurately the more data it ingests, so when we work with customers who have very sparse data sources, our goal is to help them get more data. As many of our customers can attest, you don’t necessarily need a ton of data to start, and in fact may have a cleaner start with fewer sources. However, over time, adding data sources will have a huge impact.

If you’re in this boat, we typically recommend trying out a few common data collection tools. For example, SurveyMonkey for NPS or CSAT tracking, Google Analytics or Pendo for product usage, or even manually pulling together CSV files for things like financial records. If you use a tool that you love, please share in the comments.
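If you do start collecting NPS responses and exporting them by hand, turning the file into a score takes very little code. The sketch below uses the standard NPS formula (percentage of promoters scoring 9 or 10, minus percentage of detractors scoring 0 to 6). The column names and sample rows are hypothetical and do not reflect any particular tool's export format.

```python
import csv
import io


def net_promoter_score(scores):
    """Return NPS on a -100 to 100 scale from a list of 0-10 survey scores."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical export. With a real file you would use
# open("survey.csv", newline="") in place of io.StringIO.
raw = io.StringIO(
    "customer_id,score\n"
    "acme,10\n"
    "globex,9\n"
    "initech,7\n"
    "umbrella,3\n"
    "hooli,8\n"
)

scores = [int(row["score"]) for row in csv.DictReader(raw)]
nps = net_promoter_score(scores)
print(nps)
```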

To summarize

When you talk openly with our Data Science Consulting team about your data sources and their potential problems, we can better map and weight them, and help you improve them or find alternatives.

Data Science Consultants are partners in your Customer Intelligence journey and we are eager to help you move from data shame to data-driven.

Related resources

Defining your data

Preparing your data
