This year the Fundamental Principles of Official Statistics, prepared by the United Nations Statistical Commission, turn 30. Over the years, their importance in society has grown. The principles provide a sound basis for the management and dissemination of information, especially in the public sector.
How on earth can reliable data be identified?
These days, the web offers a huge array of answers to just about any data need. While the amount of available data has exploded, the intentional dissemination of incorrect information has also grown immensely. So how can one know which data can be trusted?
Critical data literacy and source criticism are often offered as tools for identifying reliable data. But what do they mean in practice, and what should be taken into account when assessing the reliability of data? And how can the responsibility and openness of a data producer be assessed?
Answers can be found in the UN's Fundamental Principles of Official Statistics, which have their 30th anniversary this year. There are ten principles in total, and they guide statistical authorities. The principles state that “to facilitate a correct interpretation of the data, the statistical agencies are to present information according to scientific standards on the sources, methods and procedures of the statistics.”
In practice, this means that statistics should be accompanied by detailed information on how they were compiled. In accordance with this principle, statistical authorities have for decades produced reliable data together with descriptions of the methods and data sources used. The same principle can also be used to assess the reliability of other data generated in society.
A good description makes data easier to use
A responsible data producer can be recognised by the method description attached to the disseminated data. A clear description lets users assess the quality of the data.
The quality and reliability of data consist of many parts. Firstly, it must be known how the data have been collected.
Where were the source data obtained and how were they selected? Were the data produced by random sampling, or is the data source, for example, exhaustive register data? It is also good to know who maintains the register, how and when the data were obtained from it, or how the sample was designed.
When we know where the data come from, we need to know how the data were compiled. How were the data processed and what methods were used to obtain the results? It is important to know the grounds for the selected methods and procedures.
A responsible data producer also gives information about limitations on the use of the data and about possible errors in the data. For example, data produced with a sample survey always contain sampling error, and a responsible data producer gives users an estimate of it, for example in the form of a margin of error.
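To make the idea of a margin of error concrete, here is a minimal sketch of how one could be computed for a share estimated from a sample survey. It assumes simple random sampling, a normal approximation and a 95% confidence level (z = 1.96), and it ignores the finite population correction and non-response. The function name and the figures are illustrative assumptions, not taken from any actual statistics.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion.

    Assumes simple random sampling and a normal approximation.
    p_hat is the estimated share (0..1), n the sample size,
    z the critical value (1.96 corresponds to roughly 95% confidence).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    # Standard error of a proportion under simple random sampling.
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return z * standard_error


# Hypothetical example: 40% of 1,000 respondents answered "yes".
share = 0.40
sample_size = 1000
moe = margin_of_error(share, sample_size)
print(f"Estimate: {share:.1%} ± {moe:.1%}")
```

With these made-up figures the margin of error is about three percentage points. Real surveys often use more complex sampling designs, in which case the producer's own method description is the right source for the error estimate.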
Another important part of data quality is how the data meet users’ needs. Users should know how and when the data are released and where the descriptions of the data compilation are available.
Use a checklist to assess whether the data are reliable
The requirements described above can be summarised as a checklist. When looking for data on the web for a survey, thesis or report, consider the following:
- Who is the producer of the data?
- Are the data accompanied by a description of the data sources used, such as data collection or registers?
- Do the data describe the phenomenon as a whole, or were they collected from a random sample of respondents?
- Is the processing of the data described in detail? For example, have missing data been replaced or have duplicate data been removed?
- Have mathematical models or other scientific methods been used in the processing and analysis of data?
- How are the data distributed to users? Are the prepared tables, reports and indicators clear and easy to use?
If you get clear answers to the questions above, you can consider the data reliable to use.
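The checklist question about data processing can be illustrated with a small sketch of what a documented processing step might look like. The example removes duplicate records and replaces missing values, and records what was done so that it can be reported to users. The function name, the record layout and the choice of mean replacement are assumptions made for illustration only; they are not a method described in the principles.

```python
from statistics import mean


def clean_with_log(records):
    """Remove duplicates and replace missing values, keeping a log.

    records is a list of (respondent_id, value) pairs, where value
    may be None when the answer is missing. Returns the cleaned
    records and a dictionary describing the processing.
    """
    log = {"input_rows": len(records)}

    # Keep only the first record for each respondent id.
    seen = set()
    unique = []
    for respondent_id, value in records:
        if respondent_id in seen:
            continue
        seen.add(respondent_id)
        unique.append((respondent_id, value))
    log["duplicates_removed"] = len(records) - len(unique)

    # Replace missing values with the mean of the observed values.
    # Mean replacement is only one simple option among many.
    observed = [value for _, value in unique if value is not None]
    if not observed:
        raise ValueError("no observed values to compute a replacement from")
    fill = mean(observed)
    cleaned = [(rid, fill if value is None else value) for rid, value in unique]
    log["missing_replaced"] = len(unique) - len(observed)
    log["replacement_value"] = fill
    return cleaned, log


# Hypothetical input: respondent 2 appears twice and has no answer.
raw = [(1, 10.0), (2, None), (2, None), (3, 20.0)]
cleaned, log = clean_with_log(raw)
print(cleaned)
print(log)
```

The point of the log is that the counts of removed duplicates and replaced values can be published alongside the data, which is the kind of description the checklist asks for.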
The author works in Statistics Finland's Partnership and Ecosystem Relations service area.