Data Integration

The role of data integration in helping to produce an effective official statistical system is becoming increasingly apparent. Bringing together information from different sources allows a broader range of questions to be answered. Through integration it becomes possible to examine underlying relationships between various cross-sections of society, improving our knowledge and understanding of a particular subject.

What is data integration?

Data integration can take many forms and can operate on data at different levels. Here, data integration means taking two or more different data sources and finding the records in each that relate to the same individual or entity, so that the information held about them can be brought together. It relies on common data fields being present on the different files.
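
As a rough illustration of linking on common data fields, the following Python sketch matches records from two files when their shared fields agree exactly. It is not a description of Statistics New Zealand's linking method: the field names (surname, birth_date), the function names and the sample records are all invented for the example, and exact matching is a deliberate simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical common fields assumed to be present on both files.
COMMON_FIELDS = ("surname", "birth_date")


def link_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build a comparison key from the common fields, ignoring case and surrounding spaces."""
    return tuple(record[field].strip().lower() for field in COMMON_FIELDS)


def link_records(file_a: List[Dict[str, str]],
                 file_b: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return one merged record for each pair whose common fields agree exactly."""
    # Index the second file by its key so each lookup is a dictionary access.
    index: Dict[Tuple[str, ...], List[Dict[str, str]]] = {}
    for rec in file_b:
        index.setdefault(link_key(rec), []).append(rec)

    linked = []
    for rec in file_a:
        for match in index.get(link_key(rec), []):
            # Where both files hold the same field, the value from file_a is kept.
            linked.append({**match, **rec})
    return linked


# Invented sample data, for illustration only.
tax_file = [
    {"surname": "Ngata", "birth_date": "1970-03-12", "income_band": "B"},
    {"surname": "Smith", "birth_date": "1985-11-02", "income_band": "C"},
]
education_file = [
    {"surname": "ngata ", "birth_date": "1970-03-12", "qualification": "Diploma"},
    {"surname": "Brown", "birth_date": "1990-06-30", "qualification": "Degree"},
]

linked = link_records(tax_file, education_file)
```

Only the record whose surname and birth date agree on both files is linked; the other records stay unmatched. In practice, common fields are often recorded inconsistently, so exact agreement will miss some true matches and may create false ones.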

Why integrate?

Linking information from different sources allows relationships to be examined that could not previously be considered. Data integration offers a less time-consuming and less costly alternative to other investigative methods, such as surveys. It also reduces respondent burden by making more effective use of existing data sources.

Privacy issues

Data integration raises privacy and confidentiality concerns. Integrating data uses information beyond the purposes for which it was originally obtained. Privacy implications exist because the range of information available about an individual will be greater than would have been considered in any of the original data collections. Furthermore, data intended for statistical use does not require identifying information, but to enable linking to proceed, it is necessary to use identifiers.

In recognition of genuine privacy concerns and sensitivity from the public regarding the linking of records, the Government has directed that:

"Where datasets are integrated across agencies from information collected for unrelated purposes, Statistics New Zealand should be custodian of these datasets in order to ensure public confidence in the protection of individual records".

For data integration projects undertaken by Statistics New Zealand, managing privacy issues while achieving the benefits of linking data is done by ensuring that:

  • The linked data is only used for the production of official statistics and approved statistical research.
  • The project is authorised by the custodians of the data sources being brought together and maintains the integrity of the source data collections.
  • Linking of records is carried out by staff at Statistics New Zealand in accordance with the Statistics Act, the Privacy Act, other relevant legislation and Statistics New Zealand's data integration protocol.
  • The project does not put at risk public trust in the methods used by Statistics New Zealand.
  • There is an assessment of privacy risks through a Privacy Impact Assessment.

Data integration projects

Approval for any data linking rests with the Government Statistician and is not delegated.

In addition to its data integration projects, Statistics New Zealand has for many years used tax-sourced administrative data to produce a range of outputs.

Looking ahead

Data integration is a relatively new area of statistics. As a result, much of the work at Statistics New Zealand to date has involved extensive investigation for upcoming projects. Areas to concentrate on include:

  • Investigating the issue of mismatches.
  • Looking at how matching variables change across years and how this affects longitudinal linking.
  • Assessing the quality of integrated records and of subsequent analysis based on them.
  • Dealing with ongoing privacy issues.
  • Developing knowledge in all areas of data integration theory.

Page last modified on Tuesday, 14 October 2008