
This document was translated by a machine.

We want to make our country more efficient. We believe humans and machines should complement each other. Artificial Intelligence is the technology that will enable such symbiosis. This document has been translated using a mix of state-of-the-art machine translation and human-driven AI. The raw machine translation output has been edited by an automated system trained on millions of professionally corrected sentences. Finally, a human went through the document to make sure that no information had been lost.

This means leaving behind some stylistic improvements and potential errors. However, this AI-augmented approach to translation allowed us to prepare this English version at a fraction of the cost and time of the legacy translation process (this translation was made in a few days including the human review; we didn’t publish it right away because we had to convert it to reStructuredText in order to share it on GitHub and we had a ton of things to do before that!).

If you want to contribute feedback and changes to the Three Year Plan for ICT in the Public Administration, visit the GitHub repository.

We remind you that only the Italian version approved every year by the Italian Government has legal value.

9. Data & Analytics Framework

The Data & Analytics Framework (DAF) is part of the activities aimed at enhancing the national public information heritage. The DAF aims to develop and simplify the interoperability of public data between PAs, standardize and promote the dissemination of open data, optimize data analysis processes and generate knowledge. The idea is to open the world of public administration to the benefits offered by modern platforms for managing and analysing big data, acting along the following main directions:

  • To significantly amplify the value of the PA’s information assets through big data technologies, which help create knowledge for decision makers and drastically reduce analysis times. The horizontal scalability of these technologies makes it possible to extract information by cross-referencing multiple databases and to process data in real time, offering more analytical perspectives on a given phenomenon in a timely fashion;
  • To foster and optimize data exchange between PAs, minimizing transaction costs for access and use. It will be possible to move beyond the scheme of one-to-one agreements, which leads to multiple copies of the same data, and to allow standardized access to always up-to-date data;
  • To encourage the diffusion of open data and make it more effective. The DAF allows public data to be centralized and redistributed through APIs, ensuring standardized formats and methods of reuse based on up-to-date data;
  • To foster exploratory data analysis by teams of data scientists, both within individual PAs and at the central level, in order to improve knowledge of social phenomena. The analysis techniques used will also allow the development of “intelligent” applications that exploit regularities in the data to provide services to citizens, businesses and public administrations;
  • Finally, the framework will allow the promotion of scientific research initiatives on issues of particular interest to the PA, encouraging collaboration with universities and research bodies.

The DAF will be structured in accordance with the provisions of the CAD and the Interoperability Model, within the scope of the Intangible Infrastructures and consistently with the requirements of the Monitoring Functions of the Plan.

The DAF is based on a big data platform, composed of: a data lake, a set of data engines and tools for data communication.

The data lake stores data such as: (i) the databases that PAs generate in order to carry out their institutional mandate, in compliance with personal data protection regulations; (ii) data generated by the Public Administration’s IT systems, such as logs and usage data, that do not fall under the previous definition; (iii) authorized data from the web and social networks of potential public interest.

The Big Data Engines harmonize and process the raw data stored in the data lake, in both batch and real-time mode, and are used to implement machine learning models.

Lastly, the data communication tools promote the use of the processed data by stakeholders, including through APIs that expose data and functionality to third-party applications.
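As a rough illustration of how the three components described above fit together, the following Python sketch models a data lake, a batch data engine and a data communication layer entirely in memory. All names here (`DataLake`, `harmonize`, `api_totals`, the `municipality`, `comune` and `value` fields, and the source labels) are hypothetical and chosen only for illustration; the Plan does not prescribe any specific implementation or technology.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for the raw storage layer."""
    raw: dict = field(default_factory=dict)

    def ingest(self, source: str, records: list) -> None:
        # Raw records are stored as received, without any transformation.
        self.raw.setdefault(source, []).extend(records)


def harmonize(lake: DataLake) -> list:
    """Toy 'data engine': batch-normalize field names and value types."""
    rows = []
    for source, records in lake.raw.items():
        for rec in records:
            # Sources may label the same concept differently (assumed here).
            name = rec.get("municipality") or rec.get("comune", "")
            rows.append({
                "source": source,
                "municipality": str(name).strip().title(),
                "value": float(rec.get("value", 0)),
            })
    return rows


def api_totals(rows: list) -> dict:
    """Toy 'data communication' layer: the payload an API might return."""
    totals = {}
    for row in rows:
        key = row["municipality"]
        totals[key] = totals.get(key, 0.0) + row["value"]
    return totals


lake = DataLake()
lake.ingest("pa_a", [{"municipality": " roma ", "value": "10"}])
lake.ingest("pa_b", [{"comune": "ROMA", "value": 5},
                     {"comune": "milano", "value": 2}])
print(api_totals(harmonize(lake)))  # {'Roma': 15.0, 'Milano': 2.0}
```

The raw records stay untouched in the lake, so the harmonization step can be changed and re-run without asking the source administrations to resend their data.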

The implementation and subsequent management of the DAF are entrusted to the BDT-PA, the PA’s Big Data Team, which consists of data scientists, big data architects and domain experts. The team is responsible for the conceptual design and evolution of the big data platform, the construction of models for interconnecting different data sources, data analysis and the development of machine learning models, the coordination of the development of data applications, and the organisation of scientific “competitions” on issues of interest to the PA.

9.1. The current situation

To date, the public administration has no framework for the analysis, standardization and interchange of public data that also supports the definition and monitoring of data-driven policies. Since 2013, AgID has assessed the feasibility of using this type of tool in the specific domain of Public Administration through numerous experimental initiatives conducted in collaboration with national research institutes and various Italian universities within the project Italia.gov.it, the digital administration engine.

In recent years, big data technologies have matured to such an extent that they are used not only in the production environments of major IT companies (e.g. Google, Facebook, Twitter, LinkedIn), but also in those of banks, insurance companies, lottery and betting operators, and trading companies. Consequently, new professional profiles have emerged, such as data scientists and big data architects, whose skills are considered necessary for the governance and use of big data.

As regards the exchange of data between PAs, the present scenario still sees the widespread practice of direct accords and agreements between PAs to regulate the exchange of data necessary for the conduct of institutional activities. This practice is not scalable and limits the sharing of public sector information.
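To make the scalability problem concrete, the short Python sketch below compares two simplified scenarios. It assumes a worst case in which every pair of administrations needs its own bilateral agreement, against a scenario in which each administration connects once to a shared platform. Both function names are hypothetical, and real agreement counts will depend on which administrations actually need each other's data.

```python
def bilateral_agreements(n: int) -> int:
    """Agreements needed if every pair of n administrations signs its own."""
    return n * (n - 1) // 2


def platform_connections(n: int) -> int:
    """Connections needed if each administration joins a shared platform once."""
    return n


for n in (10, 100, 1000):
    print(n, bilateral_agreements(n), platform_connections(n))
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```

Under these assumptions the number of bilateral agreements grows quadratically with the number of administrations, while connections to a shared platform grow linearly.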

9.2. Strategic objectives

  • Enhance the wealth of public administration information by facilitating access to data by PAs and encouraging the creation of both central and federated analysis teams.
  • Focus on quality and standardization of data. The DAF is the operational tool that makes it possible to coordinate the efforts described in paragraph 4.1 “PA Data”, focusing on the processes of generation, management, updating and dissemination of data.
  • Facilitate the development and diffusion of open data and the API economy, through which civil society can reuse, in compliance with the law, the wealth of public information and create new business opportunities. To this end, the DAF will develop standardized APIs on up-to-date databases to help build applications and services for citizens.
  • Encourage collaborations with universities and research bodies. They will be given access to a sandbox containing meaningful samples of appropriately anonymized data, to stimulate research and create useful knowledge for the community.
  • Encourage data exchange between Public Administrations, overcoming the limitations of the current practice of access to data based on conventions between individual administrations.
  • Rationalize the resources involved in data exchange and in analytics, including data warehouse and business intelligence initiatives. These initiatives are often uncoordinated, involve high licensing and hardware costs, and tend to address the same need several times over.
  • Provide tools that measure in a timely manner the progress of the implementation of the Plan and allow integrative or corrective actions based on data-driven logic to be identified.
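The sandbox objective above mentions samples of appropriately anonymized data. The Python sketch below shows one possible building block, under stated assumptions: it draws a random sample, drops direct identifiers, and replaces a tax code with a keyed hash (HMAC-SHA256). This is pseudonymization rather than full anonymization, since the remaining attributes may still allow re-identification and would need further treatment. The record fields (`tax_code`, `name`, `birth_year`, `municipality`) and function names are hypothetical and do not come from the Plan.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def sandbox_sample(records, secret_key, fraction, seed=0):
    """Return a random sample with direct identifiers dropped or replaced."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = max(1, round(len(records) * fraction)) if records else 0
    sample = rng.sample(records, k)
    return [
        {
            # The tax code is replaced; the name is not copied at all.
            "person_ref": pseudonymize(rec["tax_code"], secret_key),
            "birth_year": rec["birth_year"],
            "municipality": rec["municipality"],
        }
        for rec in sample
    ]


records = [
    {"tax_code": f"CODE{i}", "name": f"Person {i}",
     "birth_year": 1980 + i, "municipality": "Roma"}
    for i in range(10)
]
for row in sandbox_sample(records, secret_key=b"example-key", fraction=0.3):
    print(row)
```

Using a keyed hash instead of a plain hash means that someone who knows the set of possible tax codes cannot rebuild the mapping without the key, which must be kept outside the sandbox.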

9.3. Lines of action

The DAF, as noted above, is based on the development of a big data platform and on the establishment of a team of data scientists, big data architects and data engineers. The Big Data Team of the PA, set up within the Digital Team, has the task of actively managing the conceptual design and implementation of the infrastructure, along with all phases of the data life cycle, from ingestion to analysis and application development. In addition, the BDT-PA will develop technology and project partnerships between the PAs involved.

The BDT-PA designs and defines the implementation and use of the PA big data platform by:

  • Identifying the governance model, which provides a leadership and control role for the Digital Team, in collaboration with AgID and paying attention to privacy;
  • Planning any regulatory adjustments that would facilitate the implementation of the project;
  • Defining the data sources of the data lake and how they are populated. These will be included in the guidelines produced under the Interoperability Model;
  • Defining the logical architecture of the platform and identifying the implementing technologies;
  • Identifying the information needs relevant to the definition of data-driven policies and building the related analytical tools;
  • Using public and private cloud for storage and computing;
  • Involving the scientific community in the promotion of initiatives aimed at conducting research on issues of interest to the PA;
  • Defining the directives for use and consultation.

Over the next few months, the BDT-PA will release the DAF Development Plan. It will provide for an incremental roll-out based on the agreements that the Digital Team is establishing with the PAs. In particular, an experimental phase will be planned that will involve a group of selected central and local PAs and will aim to develop data exchange models and use cases with services for PAs, citizens and businesses.

The data in the DAF will also be used to synthesise useful knowledge for the monitoring activities described in Chapter 10 “Management of Change”: in this regard, AgID and the Digital Team will provide tools that will complement the tool kit described in the action line “Tools for Monitoring the Implementation of the Plan” of Chapter 10.

Subject: Definition and implementation of the development plan for the experimental phase of the Data & Analytics Framework
Time Frames: By December 2017
Players: AgID, Digital Team
Description:

Identification of the governance model of the DAF and of the PAs that will take part in the experimental phase. Definition of the platform architecture and its evolutionary roadmap. Definition of use cases for the development of services for Public Administrations, citizens and businesses. This activity is coordinated with the Italian Data Protection Authority (Garante per la protezione dei dati personali).

Implementation of the technological infrastructure, consistent with the development plan of the DAF pilot phase, which implements all the components necessary for the operation of the platform.

Result:

DAF Development Plan (release date: June 2017).

Big Data Cluster and Component Testing and Use Cases (release date: December 2017).

Subject: Data Ingestion in the DAF - Experimental Phase
Time Frames: From June 2017 to December 2017
Players: AgID, Digital Team
Description: Definition of the data to be included in the project during the experimental phase and putting into operation the extraction and ingestion procedures. Defining the relations between the DAF manager and the PAs involved in the initiative.
Result:

Regulating relationships with PAs.

Standard operational definition in compliance with privacy standards.

Definition of data ingestion procedures in the platform.

Supply of DAF (release date: December 2017).

Subject: Putting DAF into Production
Time Frames: From January 2018
Players: Digital Team, AgID, PA
Description:

The Digital Team and AgID will set up procedures for the future owner of the DAF, which will manage the operation and evolution of the project.

The owner of the DAF will take care of interactions with PAs to define plans for incorporating their databases and use cases. The PAs involved from time to time will define how their data are ingested and how the DAF is used for their own activities.

Result:

Handover plan.

Current operation (release date: to be defined).

Subject: Implementation of support tools for monitoring the Plan
Time Frames: From April 2017
Players: AgID, Digital Team
Description: The Digital Team and AgID provide tools that, based on the information contained in the DAF, provide useful information for the Plan monitoring activities described in Chapter 10 “Managing Change”.
Result: Plan Monitoring Support Tools (from January 2018)