
How Dark Traffic Can Lead to Inaccurate Google Analytics Data

Posted by Charlie on 23-Apr-2020 13:25:14

Every business seeks accurate data. Accurate data can be transformed into useful insights that guide your overall strategy. You may rely on Google Analytics for accurate and reliable website data. If so, you need to be aware of something called ‘dark traffic’. Although dark traffic mostly goes under the radar, it’s likely having a significant effect on the quality and accuracy of your data.

But what is dark traffic?

Dark traffic is simply traffic to your site which is misattributed to direct. To better understand dark traffic, you need to know what counts as direct traffic. Direct traffic is usually thought of as people reaching your site by typing a specific URL into their browser or clicking on a bookmark. However, direct traffic is a lot more complex than this.

Google will attribute visits as direct when they’ve come from email, messaging apps like WhatsApp, secure (HTTPS) sites linking to non-secure (HTTP) pages, mobile apps, certain image links, shortened untagged links, untagged documents and almost any kind of installed software.

An easy way to remember how Google Analytics defines direct traffic is the following:

Google will set your traffic source to direct when there’s no attribution or referral information.

Attribution and referral information is lost whenever someone navigates to your site using a link or URL with no UTM parameters or referral information from a previous site.
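
The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simplified order of precedence (UTM parameters first, then the referrer, then direct). The function name `classify_source` is hypothetical, and Google Analytics applies more rules than this, for example recognising search engines and ad click IDs.

```python
from urllib.parse import urlparse, parse_qs


def classify_source(landing_url, referrer=None):
    """Return a (source, medium) pair for a single visit.

    Simplified assumption: UTM parameters win, then the referrer,
    and anything left over is reported as direct.
    """
    params = parse_qs(urlparse(landing_url).query)
    if "utm_source" in params:
        source = params["utm_source"][0]
        medium = params.get("utm_medium", ["(not set)"])[0]
        return source, medium
    if referrer:
        # No UTMs, but the browser passed along the previous site.
        return urlparse(referrer).netloc, "referral"
    # No UTMs and no referrer: nothing left to attribute the visit to.
    return "(direct)", "(none)"


# A click from Twitter's link shortener carries a referrer...
print(classify_source("https://example.com/blog/post", "https://t.co/abc"))
# ...but the same URL pasted into an email client usually does not.
print(classify_source("https://example.com/blog/post"))
```

The second call is the dark traffic case: the visit has a real origin, but nothing in the request reveals it.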

In light of this we can see that a huge number of visitors will be attributed to direct. However, a significant proportion of these visitors will have engaged with or come from a source which is being completely lost. This means the marketing efforts that led to these visits are ignored, devaluing your work and understating the success of a campaign, medium or source.

Dark Traffic Scenario

You’ve created an incredible piece of content. You’ve spent a lot of time optimising the copy, design and more. There are infographics people will find engaging, and you’re looking at a common issue from a new and insightful perspective.

It’s a great piece of content that’s going to get some traction once you start to share and publish it around the web. 

You start to share this content around social media, through email campaigns and other channels. 

The posts themselves and the emails you sent do really well. You can see in your ESP (Email Service Provider) you got a lot of clicks and your social media posts have been shared and retweeted a number of times. 

After a few weeks of this, with your content gaining even more traction, you look at your Google Analytics to gain a better understanding of your users’ journeys and sources.

But one of the biggest sources of traffic to this page is direct.

In this scenario, you have no idea where a significant proportion of visitors to this page have come from. This means you don’t know whether your social posts on Twitter or LinkedIn, for example, actually led to any real engagement on your website.

One way the true source of these visits might have been lost is the following: 

  1. A user clicks on the link on Twitter
  2. They find the article interesting and want to share it with their colleagues 
  3. They copy the URL from the browser
  4. They then share the post URL in an email to a number of colleagues
  5. The user’s colleagues open the email and click the link 
  6. The colleagues visit the site (but they all appear as direct traffic)

The problem here is that once the user copied the URL and shared it through an application (such as email) the referral data was lost. So, although the traffic which follows is actually attributable to the work done on social, Google Analytics has no way of knowing this. A decision could then be made to stop working on social, even though it led to all these visits.

In this case, a wrong decision was made because important information was lost and dark traffic obscured the true value of social for this piece of content. And this is just one example of how dark traffic can be created and its impact on a campaign.

Further down, we’ll look at how you can avoid this happening to you.

The Impact of Dark Traffic on Data Insights

We use Google Analytics to analyse the performance of all our marketing channels and campaigns together. This way we can compare results and see where budget and time should best be spent for the best results. The trouble is that if the majority of your traffic is being attributed to direct it can be impossible to accurately analyse this data and identify trends or insights.

The consequence of this is that time and money might be spent on a channel which isn’t performing as well as another. In addition, from a user experience perspective, understanding where people enter the site from and what led them to do so is essential to providing the best experience possible and increasing conversion rates on your site (for whatever action brings value to your business).

Misattribution could be damaging your Google Ads campaigns

Not only will dark traffic be negatively affecting your reporting and strategy development, it could also be actively harming your Google Ads campaigns.

If you use any kind of automated bidding strategy (that relies on conversion/action data) you’re not going to be providing accurate data for Google’s machine learning algorithms to optimise your campaigns with. This might lead to Google Ads missing a whole range of conversions, including all data/signals around those conversions like user interests, demographics, time, placement and more – all of which Google could have then optimised for moving forward.

What about Multi-Channel Funnels and Conversion Paths?

Google Analytics provides a number of reports that supposedly look at the full picture of a user’s journey to your site and to a conversion. However, they’re still unable to decipher direct traffic.

If you look at your report and see paths showing Direct ×2, ×3, ×4 or ×5, it’s quite likely a number of these direct visits should actually be attributed to paid, organic or referral traffic. In order to make these reports accurate you need to clean your data and stick to best practices. Only then does the full value of these reports become apparent.

[Image: multi-channel conversion paths in Google Analytics]

How to Avoid Dark Traffic in Google Analytics

Due to the nature of dark traffic, it’s hidden and hard to locate in most cases. However, there are a number of tactics you can try to uncover this data and shed some light on the true performance of your campaigns.

Step 1 – Prevention

The most effective way to avoid dark traffic in the first place is through the use of UTM parameters. These are attributes which you manually define and place on the end of a link. These parameters tell Google the exact source of a visit (the source you originally defined).

When following best practices, you should include the following on your links:

  • Source – google, facebook, twitter, local
  • Medium – cpc, paidsocial, email, organic
  • Campaign – spring_sale, product_launch

Example: https://www.innovationvisual.com/?utm_source=twitter&utm_medium=social&utm_campaign=dark_traffic

The UTM parameters are everything that follows the “?”. This character marks the start of the URL’s query string, so everything after it is treated as a parameter rather than part of the page address (meaning it won’t affect where you send people).

The UTMs which were added in this example are:

?utm_source=twitter&utm_medium=social&utm_campaign=dark_traffic

By ensuring you apply UTMs to your links, wherever they might be shared or posted, you can keep control over your data. If someone were to share this link on WhatsApp or Messenger, or open it through an email, Google would still attribute the visit source to Twitter and not direct.
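
If you tag a lot of links, building them by hand invites typos. The following is a minimal Python sketch of a helper that appends the three parameters described above. The function name `add_utm` is hypothetical and only the standard library is used.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse


def add_utm(url, source, medium, campaign):
    """Append utm_source, utm_medium and utm_campaign to a URL."""
    parts = urlparse(url)
    # Keep any query parameters the URL already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


tagged = add_utm("https://www.innovationvisual.com/", "twitter", "social", "dark_traffic")
print(tagged)
# https://www.innovationvisual.com/?utm_source=twitter&utm_medium=social&utm_campaign=dark_traffic
```

Because the parameters live in the URL itself, they travel with the link when it is copied and pasted, which is what keeps the attribution intact.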

In addition, you can use link shortening services to make these links more user friendly and less likely to be altered down the line (when shared). Tools like Bit.ly and HubSpot both do this.

Step 2 – Location

Once you’re making sure the links you share are fully optimised and follow best practices with regard to UTMs, you can start to locate traffic in GA which might be anomalous or inaccurate due to dark traffic.

It’s important to note that any changes you make won’t be retroactive in GA, and data you’ve collected up to this point cannot be changed. You might want to create a new view or use annotations to record when changes were made on the account.

To locate dark traffic, you’ll first want to implement a segment looking at only Direct traffic. Once this segment is applied (there should be one built in) you can start to try and identify potential pages in direct which have been bolstered by dark traffic. It’s important to note it’s unlikely you’ll find everything, but the aim of this exercise is to identify any noticeable pages and potential trends.

You’ll now want to navigate to Behaviour > Site Content > All Pages. The pages and data you see will be limited to Direct because of the segment. Next, filter out any top-level pages which people might actually remember and come direct to, such as /blog, /contact and /about. Filter out as many of these as possible until only longer, harder-to-remember URLs remain. People are unlikely to have come direct to these, meaning the traffic will mostly be the result of dark traffic.

[Image: direct-only traffic using a segment]

It’s important to remember that some pages will receive a mix of genuine direct traffic and dark traffic. But if a page is getting a huge amount of traffic, all in direct, and it has a long, hard-to-remember URL, it’s likely suffering from a lot of dark traffic.
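
If you export the report, the same filtering can be done outside GA. Here is a minimal Python sketch, assuming a list of (page path, sessions) rows taken from the All Pages report with the Direct segment applied. The figures, the `MEMORABLE` set and the `min_sessions` threshold are all made up for illustration and would need tuning for your own site.

```python
# Hypothetical export: (page path, direct sessions). Figures are invented.
direct_pages = [
    ("/", 1200),
    ("/blog", 310),
    ("/contact", 95),
    ("/blog/how-dark-traffic-skews-analytics-data", 540),
    ("/services/analytics/google-analytics-audit", 130),
]

# Short, memorable paths that people plausibly type or bookmark.
MEMORABLE = {"/", "/blog", "/contact", "/about"}


def likely_dark(pages, min_sessions=100):
    """Keep long, hard-to-remember pages with notable direct sessions."""
    suspects = []
    for path, sessions in pages:
        # Treat "/blog/" and "/blog" as the same page.
        normalised = path.rstrip("/") or "/"
        if normalised in MEMORABLE or sessions < min_sessions:
            continue
        suspects.append((path, sessions))
    # Largest direct totals first, as these are the most suspicious.
    return sorted(suspects, key=lambda row: row[1], reverse=True)


for path, sessions in likely_dark(direct_pages):
    print(sessions, path)
```

The output is only a shortlist of pages worth investigating, not proof that their direct traffic is dark.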

Additional ways to locate dark traffic in GA:

  • Forms, chatbots and surveys – try asking users on some of the pages you located earlier how they found the page and discovered the site. You might find some interesting trends. Tools like Hotjar can be great for this kind of data collection. At the end of the day, only the user knows for sure how they got to the site. We’re just making educated guesses.
  • Include easy and accessible sharing options for content, including for applications like Messenger, Slack and WhatsApp. This covers all bases and, because you define the links being shared, you maintain control over your attribution.
  • Compare data between platforms. If your social media posts are blowing up, and you know from first-hand data that people are talking about, sharing and promoting your content, but you’re not seeing that traffic in the social channels in Google Analytics, it may be being misattributed. You can then try to fix the issue at the source; for example, a key link might not have UTMs applied.
  • Try removing return visitors. This might remove people who’ve bookmarked a page or written down a URL somewhere, reducing the number of genuine direct visitors when you’re trying to locate dark traffic.
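
The platform comparison in the list above can be roughed out in Python. This is a sketch under stated assumptions: the click and session figures are invented, the function name `attribution_gaps` is hypothetical, and clicks and sessions are not the same metric, so a ratio below 1 is normal. Only large gaps are worth chasing.

```python
# Hypothetical figures: clicks reported by each platform versus
# sessions Google Analytics attributed to the matching source.
platform_clicks = {"twitter": 800, "linkedin": 450, "email": 1500}
ga_sessions = {"twitter": 720, "linkedin": 90, "email": 300}


def attribution_gaps(clicks, sessions, threshold=0.5):
    """Return sources where GA recorded under `threshold` of platform clicks."""
    gaps = {}
    for source, clicked in clicks.items():
        if clicked == 0:
            continue
        ratio = sessions.get(source, 0) / clicked
        if ratio < threshold:
            gaps[source] = round(ratio, 2)
    return gaps


print(attribution_gaps(platform_clicks, ga_sessions))
# {'linkedin': 0.2, 'email': 0.2}
```

In this made-up data, LinkedIn and email are flagged, which would be the cue to check whether the links used there carry UTMs.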

Step 3 – Improving GA Reports

Additionally, you can improve the GA reports you rely on to provide clearer data on direct and dark traffic coming to your site.

One way of doing this is by removing the idea of direct traffic altogether. At the end of the day, direct traffic is just traffic for which neither Google nor anyone else knows the source or referral data. Making it… dark.

With this in mind we can segment dark traffic into 3 groups. Credit to Sayf Sharif at Seer for the idea.

An underused and undervalued area of Google Analytics is located in Channel Settings > Channel Grouping. Here you can define your own channels and the rules behind them. We’ll use it to split direct into 3 groups.

Surface Dark – Channel 1: This is the traffic that comes to the homepage, i.e. innovationvisual.com. This is easy to remember and less likely to be so dark. It’s likely to be people who’ve bookmarked the page, typed in the URL and so on.

Shallow Dark – Channel 2: This is traffic which comes to category pages and sections, along with a mix of slightly longer-tail pages. Fewer people will have bookmarked or memorised these pages.

Deep Dark – Channel 3: These are visitors to deep pages on the site. This includes blog posts with long URLs. Some direct traffic to these pages will be return visitors (which you can filter out), but the majority will likely be new users who found the content through ‘dark’ means.
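
To make the three groups concrete, here is a minimal Python sketch that sorts a direct visit’s landing page into one of them. It assumes URL depth (the number of path segments) is a reasonable stand-in for how memorable a page is. That threshold is our assumption, the function name `dark_channel` is hypothetical, and in GA itself the split would be expressed as channel grouping rules rather than code.

```python
def dark_channel(landing_path):
    """Map a direct visit's landing page to one of the three groups."""
    # Ignore any query string, then count the non-empty path segments.
    path = landing_path.split("?")[0]
    segments = [s for s in path.split("/") if s]
    if not segments:
        return "Surface Dark"   # homepage
    if len(segments) == 1:
        return "Shallow Dark"   # e.g. /blog or /services
    return "Deep Dark"          # e.g. /blog/some-long-post-title


for page in ["/", "/blog", "/blog/how-dark-traffic-skews-analytics-data"]:
    print(page, "->", dark_channel(page))
```

A site with a flat URL structure would need a different rule, such as path length or a list of known section pages.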

This strategy doesn’t tell you exactly where the dark traffic came from. But it does highlight areas which might be affected more, and it does shed light on a number of interesting visitor behaviours.

[Image: long-tail URLs that are potentially dark traffic]

In the end, what’s the best outcome for Dark Traffic in Google Analytics?

Your data will never be perfect, but it does need to be good enough that you can trust it and make judgements and decisions based on its insights. Without this, you’ll likely see your campaigns suffer because of the actions taken on those insights.

Dealing with dark traffic is a huge part of this optimisation process, and it does require some time to evaluate and amend where possible. But by following best practices and keeping a vigilant eye on the dark, you can hopefully reduce its impact.

Is it time to review your Google Analytics setup?

Dark traffic isn’t the only enemy of quality data in Google Analytics. Poor event tracking, spam traffic and internal traffic can all skew your data. And as we all know, data is one of the most valuable assets to any business.

If you’re worried about the quality of your data, including the effect of dark traffic on your website analytics, get in touch with one of our analytics and tracking experts at Innovation Visual and improve your data, today.

Topics: Strategy, Google Analytics