Short Form Data Quality & Validation

Data Hygiene, Data Quality, & Validation:

In our last post on the Visiqua blog, we discussed the first point of interaction, and often the first point of failure, for many marketers: the landing page. Today we’ll look past it, into monitoring, measuring, and optimizing the traffic you send to the landing page. It is critically important to set up testing and data quality management systems before launch so that we can learn from the first campaign. After all, “There is no failure, only failure to learn.”

Assuming you have created a landing page with your core objectives in mind, as we previously discussed, you are now converting users and need to process that data. In that post, we alluded to the need to score traffic and validate the data; now let’s look at how to do that. The process varies with two main campaign types: 1) broadly targeted, conversion-focused campaigns, and 2) highly targeted, high-consideration purchase campaigns that focus on educating the user before converting them.

Today we will focus on data hygiene for broadly targeted campaigns. For these campaigns, a short form located above the fold is the best practice for converting users. However, it limits your data hygiene and validation options. As we achieve scale, scoring the data coming into the form is the first step we can take to begin measuring quality and optimizing the campaign. There are numerous effective click- and user-based traffic scoring options, including (but not limited to) Are You A Human, FraudLogix, and Forensiq. These tools are primarily designed to protect you from bot traffic, form stuffing, and other large-scale automated threats to data quality. Traffic scoring is not perfect, but it is a good place to start. To build your baseline, implement your tool of choice, apply it to a subset of traffic you already know to be good (such as search), and derive a set of target scores from it. This contributes to the full picture of your data, and the blocking features in these products will help you weed out the worst of the worst so that it doesn’t clog up your CRM.
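As a rough illustration of the baseline step, the Python sketch below turns scores from a known-good sample (such as search) into target values. It assumes your scoring tool returns a numeric score where higher means more likely human; the `ScoreBaseline` structure, the function names, and the 10th-percentile floor are hypothetical choices for illustration, not part of any vendor's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class ScoreBaseline:
    """Target scores derived from a known-good traffic sample (hypothetical structure)."""
    mean_score: float
    floor: float  # visits scoring below this are flagged for review


def build_baseline(good_scores: list[float], floor_percentile: int = 10) -> ScoreBaseline:
    """Derive target scores from traffic you already trust, e.g. search.

    Assumes higher scores mean "more likely human"; invert the logic if your
    vendor's scale runs the other way.
    """
    if len(good_scores) < 2:
        raise ValueError("need at least two scored visits to build a baseline")
    if not 1 <= floor_percentile <= 99:
        raise ValueError("floor_percentile must be between 1 and 99")
    # quantiles(n=100) returns the 99 percentile cut points; index k - 1 is the k-th percentile.
    cuts = quantiles(good_scores, n=100)
    return ScoreBaseline(mean_score=mean(good_scores), floor=cuts[floor_percentile - 1])


def classify(score: float, baseline: ScoreBaseline) -> str:
    """Label a new visit relative to the known-good baseline."""
    return "pass" if score >= baseline.floor else "review"
```

For example, calling `build_baseline` on your search scores and then `classify(new_score, baseline)` would send visits scoring below the bottom tenth of your known-good traffic to review. A percentile keeps the target tied to how your own trusted traffic actually scores, rather than to a fixed cutoff that may not suit your vendor's scale.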

Once you have settled on a traffic scoring methodology, integrated it into your campaign, and created baselines, it’s time to look at what happens to the data once you acquire it. If you collect only a minimal set of fields, such as first name, last name, and email, you will have little to cross-reference. And because the registration flow is short, time on site and other engagement metrics will be of little use. So, what can you do?

At the most basic level, make sure you have syntax validation in place for your form fields: check for the correct format, allowed characters, number of characters, and so on, and force the user to fill the fields out properly before the form can convert. This won’t necessarily keep bogus data out of your database, but it will stop partial submits. Clean, complete fields also give you an important data point: do the names at least loosely match the email address in a large share of submissions? If they don’t, it is time to dig into the data.

Once basic syntax validation is in place, look to third parties that can help validate the data passing through the form. For a short three-field form like the one described here, that means validating the email address; providers that do this include StrikeIron and BriteVerify. Given the limitations of a short form, this will be your best indicator of overall quality. Analyze the results at the source level, build a baseline for an acceptable percentage of bad emails, and then shut down the sources that exceed it.
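Here is a minimal sketch of the syntax checks and the name-to-email comparison described above, written in Python; the same rules would typically also run client-side so the form cannot convert until they pass. The regular expressions, the length limits, and the substring-matching heuristic are illustrative assumptions. The email pattern checks only basic shape and says nothing about whether the mailbox exists, which is what a third-party validation service is for.

```python
import re
from typing import List, Sequence, Tuple

# Illustrative rules for a three-field form. ASCII-only for brevity; widen the
# character set if your audience includes accented or non-Latin names.
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z' -]{0,49}$")
# Shape check only: something@something.tld. It cannot tell whether the mailbox exists.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def validate_form(first: str, last: str, email: str) -> List[str]:
    """Return the names of fields that fail syntax checks; empty means the form may submit."""
    errors = []
    if not NAME_RE.match(first.strip()):
        errors.append("first_name")
    if not NAME_RE.match(last.strip()):
        errors.append("last_name")
    if not EMAIL_RE.match(email.strip()):
        errors.append("email")
    return errors


def _letters(text: str) -> str:
    """Lowercase and keep only a-z, so "O'Neil" and "oneil" compare equal."""
    return re.sub(r"[^a-z]", "", text.lower())


def name_matches_email(first: str, last: str, email: str) -> bool:
    """Loose heuristic: does the email's local part contain the first or last name?"""
    local = _letters(email.split("@", 1)[0])
    return any(len(part) >= 2 and part in local for part in (_letters(first), _letters(last)))


def match_rate(rows: Sequence[Tuple[str, str, str]]) -> float:
    """Share of (first, last, email) submissions whose names loosely match the email."""
    if not rows:
        return 0.0
    return sum(name_matches_email(*row) for row in rows) / len(rows)
```

A `match_rate` for one traffic source that is well below your others is the kind of signal that suggests digging into that source's data.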

Above, we covered data scoring and validation. This is a very general framework, but it should give you a good sense of which moving parts to evaluate and which tools to put in place before an aggressive public rollout. These tools will not solve every data quality issue, but in concert with a good measurement regime, they will keep out the worst of the worst, and they are a good early indicator of overall quality. Indicators from each of them, such as the percentage of bot traffic, the share of bounced emails, or a lack of correlation between fields, give us a solid foundation for optimization at the traffic source level.
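To make the source-level comparison concrete, here is a hedged Python sketch that rolls up these indicators per traffic source and lists sources whose rates exceed a baseline. The `Lead` fields are assumed to come from your traffic-scoring tool, your email-validation provider, and a name/email heuristic; the threshold values and the minimum lead count are placeholders you would replace with baselines measured on your own known-good traffic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Lead:
    source: str
    flagged_bot: bool    # e.g. from your traffic-scoring tool
    email_bad: bool      # e.g. invalid or bounced, per your email-validation provider
    name_mismatch: bool  # e.g. from a name-vs-email heuristic


@dataclass
class SourceStats:
    leads: int = 0
    bots: int = 0
    bad_emails: int = 0
    mismatches: int = 0

    def rate(self, count: int) -> float:
        """Fraction of this source's leads that count represents."""
        return count / self.leads if self.leads else 0.0


def summarize(leads: Iterable[Lead]) -> Dict[str, SourceStats]:
    """Count each quality indicator per traffic source."""
    stats: Dict[str, SourceStats] = defaultdict(SourceStats)
    for lead in leads:
        s = stats[lead.source]
        s.leads += 1
        s.bots += int(lead.flagged_bot)
        s.bad_emails += int(lead.email_bad)
        s.mismatches += int(lead.name_mismatch)
    return dict(stats)


def sources_to_pause(
    stats: Dict[str, SourceStats],
    max_bad_email_rate: float = 0.10,  # placeholder: set from your own baseline
    max_bot_rate: float = 0.05,        # placeholder
    max_mismatch_rate: float = 0.50,   # placeholder
    min_leads: int = 50,               # skip sources with too little data to judge
) -> List[str]:
    """Return sources whose indicator rates exceed the baseline, sorted by name."""
    flagged = []
    for source, s in sorted(stats.items()):
        if s.leads < min_leads:
            continue
        if (
            s.rate(s.bad_emails) > max_bad_email_rate
            or s.rate(s.bots) > max_bot_rate
            or s.rate(s.mismatches) > max_mismatch_rate
        ):
            flagged.append(source)
    return flagged
```

The `min_leads` guard keeps a small, noisy source from being paused on a handful of bad submissions.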

In the next Visiqua blog update, we’ll look at higher-consideration purchases, optimization, and managing lead quality in content marketing.

POSTED ON Feb 23 16