Europepolls

The largest open dataset of voting intention polling data for the European Union

europepolls is a dataset of country-level historical voting-intention polling data for the European Union, plus Norway, Switzerland and the UK. Polling data is typically available from around 2007 onwards, but some countries have records going as far back as the 1980s or 1990s. The raw data was mined from Wikipedia and has been cleaned and edited so that the format is uniform across all countries.

europepolls aims to foster research into polling, inference and prediction of voting intentions. A particular aim is to increase the use of multiple data modalities, such as text, video and audio, through modern deep learning approaches. We believe that access to open voting-intention polling data, and specifically to properly inferred voting-intention trendlines, can have a big impact on estimating correlations between socioeconomic events and voting intentions. This in turn can lead to better predictions of voting intentions and, hopefully, better governance.

:bookmark: Years covered.

Below is a detailed table of the years covered by the dataset. The mean year in which records start is 2007, but there are notable deviations: the UK and Spain have publicly available records on Wikipedia from 1983 and 1986 respectively, while Bulgaria and Latvia only have records from 2014, and Lithuania only from 2016. A "-" in the First Year column means that no start year is given for that country.

| Country        | First Year | Final Year |
|----------------|------------|------------|
| Austria        | 2013       | 2022       |
| Belgium        | -          | 2022       |
| Bulgaria       | 2014       | 2022       |
| Croatia        | 2011       | 2022       |
| Cyprus         | 2008       | 2022       |
| Czech Republic | 2013       | 2022       |
| Denmark        | 2011       | 2022       |
| Estonia        | 2011       | 2022       |
| Finland        | 2011       | 2022       |
| France         | 2007       | 2022       |
| Germany        | 2009       | 2022       |
| Greece         | 2004       | 2022       |
| Hungary        | 2006       | 2022       |
| Ireland        | 2007       | 2022       |
| Italy          | 2006       | 2022       |
| Latvia         | 2014       | 2022       |
| Lithuania      | 2016       | 2022       |
| Luxembourg     | -          | 2022       |
| Malta          | 2013       | 2022       |
| Netherlands    | -          | 2022       |
| Norway         | 2013       | 2022       |
| Poland         | 1991       | 2022       |
| Portugal       | 1999       | 2022       |
| Romania        | 2012       | 2022       |
| Slovakia       | 2012       | 2022       |
| Slovenia       | 2008       | 2022       |
| Spain          | 1986       | 2022       |
| Sweden         | 2006       | 2022       |
| Switzerland    | -          | 2022       |
| UK             | 1983       | 2022       |
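
As a quick check on the mean start year quoted above, the First Year column can be averaged directly. The sketch below copies the values from the table by hand and skips the countries marked "-"; the dictionary and its name are illustrative and not part of the dataset's own tooling.

```python
from statistics import mean, median

# First year of records per country, copied by hand from the table above.
# None marks the countries whose First Year is listed as "-".
FIRST_YEAR = {
    "Austria": 2013, "Belgium": None, "Bulgaria": 2014, "Croatia": 2011,
    "Cyprus": 2008, "Czech Republic": 2013, "Denmark": 2011, "Estonia": 2011,
    "Finland": 2011, "France": 2007, "Germany": 2009, "Greece": 2004,
    "Hungary": 2006, "Ireland": 2007, "Italy": 2006, "Latvia": 2014,
    "Lithuania": 2016, "Luxembourg": None, "Malta": 2013, "Netherlands": None,
    "Norway": 2013, "Poland": 1991, "Portugal": 1999, "Romania": 2012,
    "Slovakia": 2012, "Slovenia": 2008, "Spain": 1986, "Sweden": 2006,
    "Switzerland": None, "UK": 1983,
}

# Keep only the countries that have a start year.
known = [year for year in FIRST_YEAR.values() if year is not None]

print(f"{len(known)} countries with a start year")
print(f"mean start: {mean(known):.1f}")      # 2007.1
print(f"median start: {median(known):.0f}")  # 2010
print(f"earliest start: {min(known)}")       # 1983
```

Because a handful of countries start in the 1980s and 1990s, the median start year (2010) is a few years later than the mean.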

How does this compare with other available datasets? To the best of our knowledge, the two other main aggregators of polling data for the European Union are politico.eu and europeelects.eu. While politico.eu is useful for visually inspecting an inferred polling time series that it calculates in-house, its raw polling data is not publicly available. europeelects.eu, on the other hand, publishes its historical data, but its records only go back to 2018.

:blue_book: Format.

Each entry has the following format.


| Date               | Polling Firm | Commissioner | Sample Size | Named Parties | Others  |
|--------------------|--------------|--------------|-------------|---------------|---------|
| `pandas.Timestamp` | `str`        | `str`        | `float`     | `float`       | `float` |

A brief explanation of each entry is given below.

  • Date: The date on which the poll was conducted. When the raw data reports both a start and an end date for the fieldwork, we report the end date.
  • Polling Firm: The name of the firm conducting the poll.
  • Commissioner: The name of the public or private entity which commissioned the poll.
  • Sample Size: The number of respondents in the poll.
  • Named Parties: All parties polling high enough to be listed by name in the poll, each with its polling percentage.
  • Others: The combined polling total of all parties polling too low to be listed by name in the results.
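
To make the column descriptions above concrete, here is a minimal sketch of one entry as a Python dataclass. The class name `PollEntry`, its field names, the use of a dict for the named-party shares, and the helper `reported_date` are all hypothetical illustrations, not the dataset's actual loading code; the helper simply encodes the rule that the end date is kept when both a start and an end date are reported.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class PollEntry:
    """One poll, mirroring the columns described above (hypothetical names)."""
    poll_date: date        # end date of fieldwork when a range is reported
    polling_firm: str
    commissioner: str
    sample_size: float     # listed as float in the format table
    named_parties: Dict[str, float] = field(default_factory=dict)  # party -> share
    others: float = 0.0    # combined share of parties not listed by name


def reported_date(start: Optional[date], end: Optional[date]) -> date:
    """Pick the date to record: the end date whenever one is available."""
    if end is not None:
        return end
    if start is not None:
        return start
    raise ValueError("poll has neither a start nor an end date")


# Placeholder values for illustration only, not real polling data.
entry = PollEntry(
    poll_date=reported_date(date(2022, 3, 10), date(2022, 3, 14)),
    polling_firm="Example Firm",
    commissioner="Example Outlet",
    sample_size=1000.0,
    named_parties={"Party A": 48.0, "Party B": 40.0},
    others=12.0,
)
print(entry.poll_date)  # 2022-03-14
```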

Undecided voters are excluded where applicable, and the result for each party is then normalized to give its polling percentage over all valid votes.
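
The normalization step amounts to dropping the undecided category and rescaling each remaining share v_i to 100 · v_i / Σ v_j over the valid categories. The sketch below assumes a poll's raw figures arrive as a mapping from category name to percentage, with undecided respondents under a hypothetical `"Undecided"` key; the function name and input layout are illustrative, not how the dataset was actually built.

```python
from typing import Dict, Iterable


def normalize_shares(
    raw: Dict[str, float], excluded: Iterable[str] = ("Undecided",)
) -> Dict[str, float]:
    """Drop excluded categories and rescale the remaining shares to sum to 100."""
    skip = set(excluded)
    # Keep only the categories that count as valid votes.
    valid = {name: share for name, share in raw.items() if name not in skip}
    total = sum(valid.values())
    if total <= 0:
        raise ValueError("no valid shares left to normalize")
    return {name: 100.0 * share / total for name, share in valid.items()}


# Placeholder figures, not taken from the dataset.
raw_poll = {"Party A": 30.0, "Party B": 25.0, "Others": 5.0, "Undecided": 10.0}
shares = normalize_shares(raw_poll)
print({name: round(share, 1) for name, share in shares.items()})
# {'Party A': 50.0, 'Party B': 41.7, 'Others': 8.3}
```

Here the valid shares total 60, so Party A's raw 30% becomes 50% of valid votes.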