Build an Automated Website Tag Monitor in CloudWright

February 09, 2020

When a team is building out exploratory products, it frequently falls to Product Managers to ensure that customers successfully adopt a tool and remain active users. Some PMs have development experience, but product teams more often rely on other internal teams to provide monitoring tools.

CloudWright is an ideal platform for building tools to support product teams. Because CloudWright integrates with the SaaS tools already used in your organization, the touchpoint for internal users can be familiar interfaces like spreadsheets and messaging tools.

In this article, we'll use CloudWright to build an automated website tag monitor to support a product team building a new analytics product.

Automated Tag Monitoring

Our example company is building a web analytics product — customers implement this solution by including a <link href=...> tag on their website. Our Product Manager's highest priority is making sure that customers include this tag on their website:

  • Until the tag is implemented, the customer isn't getting any value from the product
  • If a customer removes the tag, they'll likely churn — the worst-case outcome!

Unfortunately, opening the browser console, loading sources, and digging around every day is time-intensive and difficult for a product manager, and it is impractical across 1,000+ customer websites. Instead, we'll use CloudWright to build an automated tag health checker:

[Image: Scraper workflow]

  • our product managers will add the URLs they'd like to monitor into a Google Sheet
  • each day, a CloudWright Application will check whether each URL still loads our tag
  • if the tag is missing, we'll email the account owner and flag the tag as missing in our spreadsheet

We'll use a simple three-column Google Sheet: (1) the URL to monitor, (2) the customer account owner to notify about problems, and (3) a tag status column we'll populate automatically:

[Image: the tracking Google Sheet]

In this article we'll walk through how to build, test, and deliver this application to the product team.

Modules

Our application will interact with two services: Google Sheets (using the popular gspread library) and Gmail. It will reach these services through CloudWright Modules; you can learn more about how to create and use Modules in the CloudWright docs.

[Image: Modules]

We'll also include the BeautifulSoup package (published as beautifulsoup4) directly from PyPI, to help us parse HTML. We can include all three of these dependencies in the module configuration dialog.

Code

We'll walk through the important parts of the CloudWright application in this section — you can find the full source on GitHub.

We start by using the gspread module to open our Google Sheet tracking document (the user our module authenticates as will need edit permissions on this document):

sheets = CloudWright.get_module("gspread")
sheet = sheets.open_by_key("1SS9UxMkH8Pagpcnsz537CdFBsqo15vRsdWJ13-_lpkc").sheet1

We can read the entire document with the get_all_values method and iterate over every row (skipping the header). Our spreadsheet has only three columns: the URL to check, the account owner (who receives email alerts), and the tag's status:

values = sheet.get_all_values()
for row_num in range(1, len(values)):
    url, owner, old_status = values[row_num]
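One detail worth spelling out is the row numbering: get_all_values returns a zero-indexed list of rows, while gspread's update_cell takes one-indexed sheet coordinates. A minimal sketch of the same loop over a plain list of lists follows; the helper name iter_monitored_rows and the sample rows are illustrative assumptions, not part of the application.

```python
def iter_monitored_rows(values):
    """Yield (sheet_row, url, owner, status) for each data row.

    `values` is a list of rows shaped like the output of get_all_values():
    the header first, then one list of cell strings per row.
    """
    for row_num in range(1, len(values)):  # index 0 is the header
        # Take the first three cells so extra columns don't break unpacking.
        url, owner, old_status = values[row_num][:3]
        # Sheet rows are 1-indexed, so list index 1 is sheet row 2.
        yield row_num + 1, url, owner, old_status


# Hypothetical sample data standing in for the real spreadsheet.
sample = [
    ["URL", "Owner", "Tag present"],
    ["https://example.com", "pm@example.com", "YES"],
    ["https://example.org", "pm@example.com", ""],
]
rows = list(iter_monitored_rows(sample))
```

Yielding the sheet row alongside the data keeps the index arithmetic in one place, so the code that later writes the status does not need to remember the offset.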

The only important logic in this application lives in the contains_tag function, where we use BeautifulSoup to scan an HTML page for tags of a certain type. In this case, we look for link elements with an href that matches our domain (for this example, we'll just look for the Google Analytics tag):

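def contains_tag(soup):
    # .get avoids a KeyError on <link> tags that have no href attribute
    matches = lambda link: link.get('href', '').startswith('https://www.google-analytics.com')
    return next((x for x in soup.find_all('link') if matches(x)), None)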
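To see the scan in isolation, here is a minimal, dependency-free sketch of the same idea. It swaps BeautifulSoup for the standard library's html.parser so it runs anywhere, and it is not the application's actual code; the names LinkTagFinder and page_has_tag and the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser


class LinkTagFinder(HTMLParser):
    """Collects the href of every <link> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def page_has_tag(html, domain="www.google-analytics.com"):
    """Return True if any <link> href in `html` mentions `domain`."""
    finder = LinkTagFinder()
    finder.feed(html)
    return any(domain in href for href in finder.hrefs)


# Hypothetical pages: one with the tag, one without (including a link with no href).
with_tag = '<html><head><link rel="preconnect" href="https://www.google-analytics.com"></head></html>'
without_tag = '<html><head><link rel="stylesheet" href="/site.css"><link rel="icon"></head></html>'
print(page_has_tag(with_tag), page_has_tag(without_tag))
```

The second sample page includes a link element with no href, which is the case a bare attribute lookup would trip over.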

Fetching each URL and parsing it into a BeautifulSoup object is straightforward. After calling our helper method, we'll update the cell's status with an indicator of whether we found the tag or not:

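    # Inside the row loop; needs `import urllib.request` and
    # `from bs4 import BeautifulSoup` at the top of the file.
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)

    # Name the parser explicitly so bs4 doesn't guess one and warn
    if contains_tag(BeautifulSoup(response.read(), "html.parser")):
        status = "YES"
    else:
        status = "NO"
    ...
    # Sheet rows are 1-indexed, so list index row_num is sheet row row_num+1
    sheet.update_cell(row_num+1, 3, status)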
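With a long list of customer sites, some URLs will inevitably time out, return an error, or be mistyped, and a single failed request should not stop the whole run. One way to handle this, sketched below, is to wrap the fetch and record a separate status. The helper names fetch_html and tag_status, the 10-second timeout, and the "ERROR" status value are all assumptions added for illustration; the application above only writes "YES" or "NO".

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if it cannot be fetched."""
    try:
        request = urllib.request.Request(url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs such as a missing scheme.
        return None


def tag_status(html, has_tag):
    """Map a fetch result to a status string for the spreadsheet."""
    if html is None:
        return "ERROR"  # assumed third status, not used by the original code
    return "YES" if has_tag(html) else "NO"
```

Keeping a fetch failure distinct from a missing tag avoids alerting an account owner about churn risk when the customer's site was simply unreachable for a moment.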

Finally, we want to push alerts about customers at risk of churn to our product team instead of waiting for them to check the spreadsheet. If the tag is newly missing, we'll use the gmail module to send a message to the account owner listed on the spreadsheet:

    gmail = CloudWright.get_module("gmail")
    ...
    if old_status != "NO":
        gmail.send_email(f"{url} tag removed", 
            "Site analytics tag removed!", 
            owner) 
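The old_status check is what keeps the daily run from emailing the same owner every morning: an alert fires only when a row transitions into "NO". A minimal sketch of that rule as a standalone function (the name should_alert is an illustrative assumption):

```python
def should_alert(old_status, new_status):
    """Alert only on a transition into "NO", not on every daily run."""
    return new_status == "NO" and old_status != "NO"


# A tag that just disappeared triggers an alert...
print(should_alert("YES", "NO"))
# ...but one that was already flagged yesterday does not.
print(should_alert("NO", "NO"))
```

Note that under this rule a freshly added row with an empty status also alerts on its first failed check, which is probably the desired behavior for a customer who has not installed the tag yet.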

Since we want this script to run automatically, we can set up a daily cron Trigger through the Trigger dialog:

[Image: Trigger]

With the code and trigger in place, we're ready to actually test our application.

Testing it out

Our trigger will handle the daily automation we want in production, but to test our application while developing, we can just use the Dry Run console. We can set up our spreadsheet with some example entries, kick off a dry-run, and watch our application sequentially check and update each tag status:



We can also check that the emails we expect our account owners to receive have arrived:

[Image: Email]
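The same end-to-end behavior can also be exercised without touching the real spreadsheet or mailbox. The sketch below is one possible approach, not a CloudWright feature: FakeSheet, run_checks, and the injected check and notify callables are all hypothetical stand-ins for the gspread worksheet, the row loop, the tag scan, and the gmail module.

```python
class FakeSheet:
    """In-memory stand-in exposing the two worksheet methods the loop uses."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def get_all_values(self):
        return [list(r) for r in self.rows]

    def update_cell(self, row, col, value):
        # Mirror the 1-indexed coordinates of the real worksheet.
        self.rows[row - 1][col - 1] = value


def run_checks(sheet, check, notify):
    """Check every data row, write the status, alert on newly missing tags."""
    values = sheet.get_all_values()
    for row_num in range(1, len(values)):
        url, owner, old_status = values[row_num][:3]
        status = "YES" if check(url) else "NO"
        sheet.update_cell(row_num + 1, 3, status)
        if status == "NO" and old_status != "NO":
            notify(owner, url)


sheet = FakeSheet([
    ["URL", "Owner", "Tag present"],
    ["https://example.com", "a@example.com", "YES"],
    ["https://example.org", "b@example.com", "NO"],
])
alerts = []
# Pretend every site has lost its tag and record alerts instead of emailing.
run_checks(sheet, check=lambda url: False,
           notify=lambda owner, url: alerts.append((owner, url)))
```

Here only the first owner is alerted, because the second row was already marked "NO" before the run.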

With the email and spreadsheet integrations working as expected, we're good to go — we can deploy our application and hand off the spreadsheet to the product team. Since the touchpoint is a simple Google Sheet, PMs can take ownership of populating the URL list and watching for missing integrations.

Conclusion

By deploying your code as scalable cloud infrastructure with robust integrations to cloud services, CloudWright makes it easy to build internal tools that plug into the software your teams already use, and to hand those tools off to non-technical teams.

If you'd like to use CloudWright to build your own product monitoring tools, we'd love to help — you can get started using CloudWright today.