Marketo Field Cleanup with Python
If I were a betting man, I would wager that your Marketo instance has custom fields that aren’t being used — probably quite a few. Although these fields may not appear to cause any issues, they contribute to the clutter within Marketo, making navigation more challenging. Moreover, they create the risk of unintended data ending up in incorrect fields. The question arises – why retain elements that serve no purpose?
While there isn’t an easy way to find out which fields have data in Marketo without going one by one, we can use a somewhat manual process (export lists) plus an automatic process (python) to do this rather easily. The most time consuming part of this process is exporting the data. The volume of leads and fields dictates the export size; for instance, in a case where there were over a million leads and around 700 fields, I had to export 80,000 leads per list, resulting in 14 exports. Ultimately, this process revealed that more than 200 fields were not actively in use. While you may be able to use Excel and a pivot table if your database is small enough, it may take a while and I think most instances would benefit from using a python script instead.
Marketo Field Use Process
Create Export Lists
You will need to create static lists to add leads to in the following steps.
1. In the database section of Marketo, find the total number of leads in your database
2. Navigate to the Field Management section in the admin area and export a list of all fields.
For determining the number of lists, you can use the following formula:
Number of lists = ((Total Leads x Total Fields) / 70,000,000)
Here’s an example.
If there were 800,000 leads and 600 fields, you would multiply them to get 480,000,000. You take that number and divide it by 70,000,000 to get 6.85 which we would round up to 7.
This calculation ensures that you have sufficient lists to accommodate the leads and fields in your Marketo instance. Adjust the formula accordingly based on your specific circumstances.
Smart Campaign to Add Leads to List
Now that the lists are in place, follow these steps to use a smart campaign to add leads to those lists:
1. Smart List:
- Drag in a filter for “Email Address” and set it as ‘is not empty.’ This ensures that you capture all leads in your database.
2. Flow
- Drag in the ‘Add to List’ action.
- Add choices for all the lists you created in the previous step
- Set the condition to ‘Random Sample’ based on the number of lists you have. For example:
- If you have 5 lists, set the value to 20 (since 100/5 = 20)
- If you have 10 lists, set the value to 10 (since 100/10 = 10)
- If you have 15 lists, set the value to 7 (since 100/15 = 6.666)
- Add choices for each of your lists and set the last list as the default option
- Run the smart campaign
By using this smart campaign, you’re randomly distributing leads across the lists, ensuring a representative sample from your entire database in each list.
Export leads in Marketo
Exporting is the most time consuming part but you can do other things while it works in the background. Go to each list, one at a time, and export the list and choose ‘All Columns’. It will usually take a while (up to 10 minutes in some of my cases) but when it is done, you can save the file to your computer. Once you have exported all of the lists, go to the next step.
Run Python Script
Add all of the files to a folder that contains only the files you exported as it will make the next step easier. Below is the script I use. It will go through each file, get each row of data, and look at each column. If the column has a value with a length of greater than zero, it will add 1 to the count for that field in the dictionary. Once it is finished going through the files, it will create a csv based on the dictionary to give you a count of how many fields have a value.
import csv, os
def field_counter():
#change this to the folder where you put your export files and remember to leave a trailing slash at the end
folder_path = 'I:/Marketo Export Folder/'
files = os.listdir(folder_path)
#add your files location and name the csv whatever you'd like
csv_create = open('Your folder location/file.csv','w',newline='',encoding='utf-8')
csv_writer = csv.DictWriter(csv_create, fieldnames=['Field','Count'])
csv_writer.writeheader()
#this dictionary is where will will store all the data to count field usage
field_json = {}
for file in files:
open_csv = open(f'{folder_path}{file}','r',encoding='utf-8')
csv_reader = csv.DictReader(open_csv)
for row in csv_reader:
for mkto_field, value in row.items():
if mkto_field in field_json:
if len(value.strip()) > 0:
field_json[mkto_field] += 1
else:
if len(value.strip()) > 0:
field_json[mkto_field] = 1
else:
field_json[mkto_field] = 0
open_csv.close()
for field, count in field_json.items():
csv_writer.writerow({'Field': field, 'Count': count})
csv_create.close()
field_counter()
When the script is finished running, you can review the data in Excel or Sheets. I always sort the column by the number of fields in use, descending. I create a column that divides the field in use column by the top column (the first row will always be 100%) to see what percent of my leads have a value for that field. If any field is used in less than 25% of the leads, I look for ways to consolidate and delete those fields. Once you have your fields identified you can either hide them or better yet, have Marketo Support delete them entirely. This will make your instance much cleaner.