I’m no Excel wizard. Before I joined Openprise, if someone had asked me to put together data from four different spreadsheets into a single resource by joining data, I would have been baffled as to how to even start.
Over the last few months, I’ve been learning how to use Openprise. As both an exercise for me and a way of creating a new resource for our customers, I recently took on the challenge of building out a new resource: easy-to-use metro area data for Europe.
As a bit of background, Openprise already had the ability to map US zip codes and Canadian postal codes and cities to metro areas. To do this, we use the census data that is freely available from the countries’ governments. Those tables are available as part of the Open Data we make available at no charge to all of our customers.
One of our customers asked if we had, or could develop, metro area data for European countries. Admittedly, it’s not as easily available as the US and Canada data, but it is possible to create—and I did it using Openprise.
First, the resources. I had four different tables that needed to be put together, all free and easily available with a bit of research:
- NUTS codes plus metropolitan areas: NUTS is Nomenclature des unités territoriales statistiques, the statistical territorial divisions that Europe uses to define various urban and rural areas. This table had the NUTS code, a five-digit alphanumeric code, paired with metropolitan areas for European countries.
- NUTS codes plus postal codes: This table had the NUTS code paired with the postal code of the regions for European countries.
- Postal codes paired with cities for each country: For countries such as Ireland, only the postal code prefixes are used, but for many the full postal code is used. These are available as a separate table for each country.
- City names and their equivalents in several languages: Because several European countries are multilingual and others have multiple names for the same place, having aliases for these cities provides a way of ensuring that Köln, Koln, Koeln, and Cologne are all recognized as being the same city.
Ultimately, the data needed to be knitted together like this so that the result could use postal code, city, or alternate city name to infer the metro area:
My first Openprise job was a test that used the data for Germany. I selected Germany as my test because the data contained two challenges that would be seen in other country data:
- Accent marks in city names (such as München) and the need for alternate city names
- Leading zeros in postal codes
This second item is an issue because many spreadsheets will automatically convert digits to number fields. When that happens, leading zeroes get dropped, so you end up with 34 as a postal code instead of the critical 00034 you would need in order to actually mail something or determine the location.
Importantly, because several countries have 5-digit postal codes it was critical that each postal code was mapped to the correct country so that, for example, a 5-digit postal code in Italy did not accidentally get assigned to a metro area in Germany.
Using append, infer, combine, and other Openprise templates, I developed a model that incorporated the four data sources and folders for each type of data, so I could easily format the spreadsheets for other countries’ data and use the same data sources for each Openprise job, creating one per country.
When my Germany job was complete and I had checked and validated the data, I was able to clone the job and make adjustments for other countries as needed. For example, Italian postal codes can have up to three leading zeroes (ex: 00034) and Irish postal codes only provide the prefix (ex: “begins with” filters and infer rules were needed). French postal codes needed to take the CEDEX suffix on some values into consideration. When all twelve of my country jobs were created, tested, and validated, I used some test data to see if I could get the correct metro area assignments for the inputs and tested and validated the output.
The result is a new resource now available to Openprise customers in the Open Data Catalog: “Reference – Europe Metro Areas,” which includes metro area mapping to city name and postal code for Austria, Finland, France, Germany, Ireland, Italy, the Netherlands, Norway, Spain, Sweden, Switzerland, and the United Kingdom.
While creating this taught me new Openprise templates, it also taught me something else: how much data I can put together using Openprise.
Instead of it taking me basically forever to try to use Excel or other data processing tools to put together these four data sources to get what I wanted, I was able to put it together myself in a matter of hours—and the only reason it took that long was that I was teaching myself Openprise and testing all the data as I went, spot-checking that the right places were in the right metro areas and that those without metro areas assigned were legitimately rural. For my more experienced colleagues, I’m sure this would have taken minutes. Because the Openprise jobs are already configured, they can run again any time—or set to run continuously if the data tables were updated frequently. There’s no need to recreate everything if there are data changes, very unlike Excel spreadsheets which would need constant adjustments with data updates.
Most importantly, this experience taught me just how many different data points I could put together if I had the raw data. Product information siloed from marketing data? If you can create two tables with even only a single reference point in common, like email address, you can put together that data. Retail information about what is purchased where? If you can combine your point of sale data with your location data, you can analyze purchasing trends. Finance information isolated from your CRM? If you can bring it into a spreadsheet with the right account information, it can be related and analyzed in Openprise.
In case you’re wondering, Liechtenstein has only 14 postal codes. You know, for trivia night.