Garante per la protezione dei dati personali (Italy) - 9921184

Authority:	Garante per la protezione dei dati personali (Italy)
Jurisdiction:	Italy
Relevant Law:	Article 5(2) GDPR Article 24 GDPR Article 25 GDPR Article 32 GDPR
Type:	Investigation
Outcome:	Violation Found
Started:
Decided:	08.06.2023
Published:
Fine:	n/a
Parties:	Istituto Nazionale di Statistica
National Case Number/Name:	9921184
European Case Law Identifier:	n/a
Appeal:	Unknown
Original Language(s):	Italian
Original Source:	Garante (in IT)
Initial Contributor:	Bernardo Armentano

The Italian DPA investigated the processing of personal data carried out by the Italian National Institute of Statistics for the creation of a permanent census. The DPA found violations of Articles 5(2), 24, 25 and 32 GDPR and issued an order for the implementation of corrective measures.

English Summary

Facts

In 2020, the Italian DPA issued Provision n. 10 authorizing the processing of personal data necessary for the creation of a permanent census as part of the national statistical program carried out by the Italian National Institute of Statistics, the controller. While authorizing the creation of the permanent census, the DPA ordered the controller to implement the following security and data protection measures:

a) adopt a pseudonymisation technique [Article 5(1)(f) GDPR]; b) apply the principles of data minimization [Article 5(1)(c) GDPR], purpose limitation [Article 5(1)(b) GDPR] and storage limitation [Article 5(1)(e) GDPR]; and c) implement privacy by design and by default [Article 25 GDPR].

The provision also addressed specific critical issues in relation to the attribution of a unique code that identifies the individual in the various databases, requiring the controller to implement a rotating decoupling mechanism of pseudonymised codes.

Considering the complexity of these measures, the DPA found that it was necessary to follow their implementation so that compliance with data protection principles was ensured. Therefore, it opened an investigation on the controller based on Article 58(1) (a), (e) and (f) GDPR.

Holding

While recognizing the difficulties in implementing the measures and the efforts undertaken by the controller, the DPA found some violations of the GDPR.

First, the DPA noted that the controller was classifying the collected data in three categories: "identifying data" (residence, contact information, etc.), "thematic data" (income, education, work, etc.) and "sensitive thematic data" (sexual orientation, health, etc.). However, it found that the controller was not implementing specific security measures according to this classification. In the DPA's view, it was necessary to consider the nature of the data in the risk analysis to implement safeguards that are adequate to the identified risks. By failing to do so, the controller violated the principle of accountability and the obligations of privacy by design and by default provided for by Articles 5(2) and 25 GDPR.

Second, the DPA noted that the controller uses a "specific integration domain" (SID), from where the data necessary for the pursuit of a specific statistical purpose are retrieved. In essence, a subset of information can be extracted from a master table based on specific selection criteria. To make the integration domains independent from each other, the controller uses hierarchical pseudonymisation. When the data is collected, it is assigned a universal pseudonym. This pseudonym is then translated through a cryptographic function into different pseudonyms for the different integration domains. A specific unique encryption key is assigned to each specific integration domain and each domain has a different encryption key.
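The derivation of domain pseudonyms described above can be sketched as follows. This is a minimal illustration, not the controller's implementation: it assumes that a keyed hash (HMAC-SHA256) stands in for the unspecified "cryptographic function", and the function name `derive_domain_pseudonym`, the domain labels and the example master pseudonym are all hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_domain_pseudonym(master_pseudonym: str, domain_key: bytes) -> str:
    """Derive a domain-specific pseudonym from the universal (master) pseudonym.

    Assumption: HMAC-SHA256 stands in for the unspecified cryptographic function.
    """
    digest = hmac.new(domain_key, master_pseudonym.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical integration domains, each with its own random secret key.
domain_keys = {
    "domain_a": secrets.token_bytes(32),
    "domain_b": secrets.token_bytes(32),
}

master = "0000123456"  # hypothetical master pseudonym
pseudonym_a = derive_domain_pseudonym(master, domain_keys["domain_a"])
pseudonym_b = derive_domain_pseudonym(master, domain_keys["domain_b"])
```

Because each domain uses a different key, the same master pseudonym yields unrelated values in the two domains, so records cannot be joined across domains without access to the keys.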

In the current testing phase, the requests formulated by individual statisticians (analysts) are sent to the competent central directors, who are responsible for assessing and approving the requests when they are pertinent. According to the DPA, this system leaves the application of the minimization principle to the evaluation of the directors and, therefore, to mere organizational measures. Moreover, the DPA found critical issues regarding the impossibility of automatically detecting anomalies in the SIDs such as the singularity issue, which refers to the possibility of combining unique attributes of a single subject. The DPA deemed it necessary to implement automatic detection measures, so that the controller can be immediately aware of the risks to which data subjects are exposed. Therefore, it found a violation of Articles 5(2), 24 and 25 GDPR.

Third, the DPA found that the pseudonymisation system implemented by the controller calculates a hash code for each statistical unit that flows into the integrated microdata system, which is a suitable technique to ensure the uniqueness of pseudonyms on a formal level. However, considering that the creation of the secret key used for the generation of pseudonyms is linked to an extremely volatile parameter (the exact time with milliseconds), the "controlled reversibility" of the pseudonymisation technique is inapplicable on a concrete level. According to the DPA, the technique would only be applicable if the controller kept track of the exact time at which each transformation took place. The DPA also stresses that the pseudonymisation carried out by the controller does not yet imply the use of random secret keys diversified by statistical unit. Consequently, in the event of a security incident, even on a single statistical unit, the integration domain would be completely compromised once the syntax of the data is known.
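The point about "controlled reversibility" can be illustrated with a short sketch. It assumes, purely for illustration, that each statistical unit receives a random salt which is recorded in a separate store, so that the transformation can later be reproduced. The names `salt_store`, `pseudonymise` and `matches` are hypothetical, and HMAC-SHA256 is an assumed choice of function, not something stated in the decision.

```python
import hashlib
import hmac
import secrets

# Hypothetical separate store for the per-unit salts. In practice it would be
# kept apart from the pseudonymised data and under stricter access control.
salt_store = {}


def pseudonymise(master_pseudonym, domain_key):
    """Generate a pseudonym using a random per-unit salt that is recorded."""
    # The salt is created once per unit and then reused, so the result is reproducible.
    salt = salt_store.setdefault(master_pseudonym, secrets.token_bytes(16))
    mac = hmac.new(domain_key + salt, master_pseudonym.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


def matches(master_pseudonym, pseudonym, domain_key):
    """Controlled check: does this pseudonym belong to this unit?"""
    if master_pseudonym not in salt_store:
        return False  # no recorded salt, so the transformation cannot be reproduced
    expected = pseudonymise(master_pseudonym, domain_key)
    return hmac.compare_digest(expected, pseudonym)
```

A key derived from the millisecond timestamp behaves like a salt that was never recorded: without the exact value, the transformation cannot be repeated and the check above cannot be performed.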

Finally, the DPA also found that the prescribed hierarchical decoupling mechanism of the codes in the various databases and its rotation over time had not yet been implemented. Therefore, the DPA considered that it was not possible to verify that two different pseudonyms deriving from the same master pseudonym code possibly contained in two different integration domains may refer to the same statistical unit. The DPA concluded that this violated Articles 25 and 32 GDPR.

For the above reasons, the DPA held that the processing of personal data by the controller infringed Articles 5(2), 24, 25 and 32 GDPR and ordered the following corrective measures to bring the activities into compliance:

a) comprehensively complete the source classification process, associating each class with the corresponding level of measurable risk of re-identification of the interested parties whose data are processed in each survey and the technical-organisational privacy by design and security measures capable of mitigating this risk;

b) implement technical measures which, within each integration domain, and before proceeding with the survey, identify the possible presence of singularities on specific combinations of attributes configurable ex-ante and their percentage incidence in the sample, and which signal these occurrences to the analyst with an alert;

c) develop an internal policy for the management of these singularities by the analysts responsible for carrying out the statistical survey, which possibly includes the possibility of reconfiguring the sample through the use of generalization techniques in order to eliminate, or in any case reduce, the incidence of these singularities;

d) refine the process of "controlled reversibility" of derived pseudonyms, using a secret key that is more controllable than the millisecond time (e.g. a cryptographic salt) and storing these keys separately, as required by the definition of pseudonymization in Article 4(5) GDPR;

e) use the master SIM pseudonym also for automated consistency checks on newly created statistical units (for example, uniqueness and accuracy checks) in place of the traditional "ictu oculi" checks on direct identifying attributes (such as name, surname, tax code), still widely used and a likely source of errors in the data entry process.
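The singularity check required by measure b) can be sketched as follows. This is a minimal illustration under stated assumptions: a "singularity" is taken to mean a combination of the configured attributes that occurs exactly once in the sample, and the function name, field names and sample values are invented for the example.

```python
from collections import Counter


def find_singularities(records, attributes):
    """Return the attribute combinations that occur exactly once,
    and their incidence in the sample as a percentage."""
    combos = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(combos)
    singular = [combo for combo, n in counts.items() if n == 1]
    incidence = 100.0 * len(singular) / len(records) if records else 0.0
    return singular, incidence


# Hypothetical sample; field names and values are invented for illustration.
sample = [
    {"municipality": "A", "birth_year": 1980, "sector": "health"},
    {"municipality": "A", "birth_year": 1980, "sector": "health"},
    {"municipality": "B", "birth_year": 1975, "sector": "education"},
    {"municipality": "B", "birth_year": 1990, "sector": "education"},
]

singular, incidence = find_singularities(sample, ["municipality", "birth_year"])
if singular:
    # Signal the occurrences to the analyst before the survey proceeds.
    print(f"ALERT: {len(singular)} singular combination(s), {incidence:.1f}% of the sample")
```

The list of attributes is passed in as a parameter, which mirrors the requirement that the combinations to be checked are configurable ex ante.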

English Machine Translation of the Decision

The decision below is a machine translation of the Italian original. Please refer to the Italian original for more details.

[doc. web no. 9921184]

Provision of 8 June 2023

Register of measures
no. 337 of 8 June 2023

THE GUARANTOR FOR THE PROTECTION OF PERSONAL DATA

IN today's meeting, which was attended by prof. Pasquale Stanzione, president, prof.ssa Ginevra Cerrina Feroni, vice president, dr. Agostino Ghiglia and the lawyer Guido Scorza, components, and the cons. Fabio Mattei, general secretary;

HAVING REGARD TO Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (hereinafter "Regulation");

HAVING REGARD TO Legislative Decree 30 June 2003, n. 196 containing the "Code regarding the protection of personal data" (hereinafter "Code");

HAVING REGARD TO Legislative Decree 6 September 1989, n. 322, containing the "Rules on the National Statistical System and on the reorganization of the National Institute of Statistics" and in particular, the art. 6-bis of the same decree;

HAVING REGARD to the "Ethical rules for processing for statistical or scientific research purposes carried out within the National Statistical System", Annex A.4 to the Code (hereinafter "Ethical rules") and in particular, Articles 4-bis and 6, paragraph 2, of the same Deontological Rules;

HAVING REGARD TO the provision of 23 January 2020, with which the Guarantor completed the process of authorizing the processing of personal data necessary for the creation of the permanent census (web doc. 9261093)

HAVING REGARD to the documentation in the deeds;

GIVEN the observations made by the Secretary General pursuant to art. 15 of the Regulation of the Guarantor n. 1/2000 on the organization and functioning of the office of the Guarantor for the protection of personal data, in www.gpdp.it doc. web no. 1098801;

SPEAKER the lawyer Guido Scorza

WHEREAS

1. Premise

The Office, by virtue of the service order n. XX, note of the XX, in the XX days, carried out inspections pursuant to art. 58, par. 1, lit. a), e) and f) of the Regulation and of the articles 157 and 158 of the Code, as well as pursuant to articles 21 and 22 of Regulation no. 1/2019 of the Guarantor for the protection of personal data (G.U. n. 106 of 8 May 2019), at the National Institute of Statistics (hereinafter "Istat" or "Institute"), in order to verify compliance with the provisions on the protection of personal data, with specific reference to the methods of implementation of the measures referred to in the Provision adopted by the Guarantor for the protection of personal data n. 10 of 23 January 2020 (web doc. 9261093).

With this provision, the Authority completed the process of authorizing the processing of personal data necessary for the creation of the permanent census, requiring Istat, among other things, to adopt pseudonymisation techniques suitable for ensuring the effective implementation of the principles of personal data protection, in particular minimization, purpose limitation and storage limitation, in compliance with the obligations of privacy by design and by default (articles 5, paragraph 2, letter b), c) and e) and 25 of the Regulation).

In the aforementioned provision, the Guarantor, among other things, highlighted specific critical issues in relation to the attribution of a unique code that identifies the individual in the various Istat databases and therefore ordered Istat to introduce a hierarchical decoupling mechanism for the pseudonymised codes in the various databases and their rotation over time.

In fact, this measure makes it possible to achieve a double objective: on the one hand, to make the application of the aforementioned principles effective, since each pseudonymous code used in the individual surveys could neither be directly linked to codes used in other surveys nor last longer than the data retention period envisaged for each statistical work; on the other hand, to make it possible to create integration domains, i.e. the interconnection of multiple tables in cases where this constitutes the institutional purpose of the survey and is permitted by law.

With the note of the XX (prot. n. XX) and related attached technical documentation, the Institute represented a multi-year plan for the evolution of the SIM (Microdata Integration System), which, although capable of strengthening the profiles of processing security, had not fully grasped the need for decoupling and rotation of pseudonymous codes expressed in the provision.

Following numerous informal discussions with the Office of the Guarantor, the Institute, on the XX date, in compliance with the aforementioned provision, sent the Guarantor a new version of the document called "Technological and organizational solutions to achieve full compliance of the Microdata Integration (SIM)”.

In considering that the provisions of the aforementioned provision have been complied with, with a note of the XX, prot. no. XX, it was noted that the proposed solutions were particularly articulated and complex and in some passages not without potentially critical issues for which further and specific investigations were deemed necessary.

The Institute was therefore invited to continue to represent to the Authority the solutions that will concretely, from time to time, be envisaged to complete the implementation of the program presented, sharing the need to continue, in line with the tasks of each institution, in the effective collaboration immediately undertaken, so that compliance with the principles applicable to the processing of personal data in the delicate sector of public statistics is concretely ensured.

With notes of the XX (prot. n. XX) and of the XX (prot. n. XX), Istat provided updates on the progress of the work to implement the changes to the SIM, as prescribed by the Guarantor in the aforementioned provision.

Nonetheless, also taking into account the complexity of the project and the difficulty of representing in writing in a complete and effective manner the activities actually carried out also from a technological point of view, the Office ordered the aforementioned inspection.

2. The inspection activity carried out

2.1. Inspection checks

The inspection took place on XX. During it, the Institute illustrated the activities carried out following the aforementioned provision, starting with a representation, documented on site, of the methods for classifying the sources of personal data acquired by the Institute in order to map all the variables present in them in a complete and exhaustive way.

2.2 Classification of data

Istat has shown that it has implemented the classification of some external administrative sources, e.g. INPS databases, and some internal archives produced in previous statistical works, taking into account the risks for fundamental rights and freedoms related to the processing of the personal data in question.

On this point, it was further specified that the taxonomy of the data used in the distinct classes, which concerns not only the variables but also the statistical units (natural or legal persons), can be defined either ex ante, i.e. exhaustively and applicable to every future situation, or ex post, i.e. varying case by case. The Institute is moving towards an ex ante scheme by creating an initial classification of the sources, currently applicable to the register of disabilities, which will progressively cover all the variables contained in the different registers.

It also emerged that the variables used for the classification are divided into "identification data", such as location data (residence), contact data (telephone) and other identification data (vehicle number plate); "thematic data", such as variables related to education/training, income, work, family and lifestyle; and "sensitive thematic data". In this regard, it was specified that, with reference to thematic variables of a sensitive nature, at the moment there is only a general classification attributable to the particular categories of personal data, pursuant to art. 9 of the Regulation. There is no second hierarchical level, but a classification in this sense (sexual orientation, health and disability) could be envisaged in the future. The thematic variables are very numerous, while the identifying ones are fewer. This classification concerns both external sources and internal sources of input produced by other statistical works. The classification process is currently manual and this activity has only been carried out on the disability register. About a quarter of external administrative and input sources will be completed by the end of 2022. By 2023, this activity will be completed for 90% of internal and external sources.

The classification of the variables will be instrumental not only for the correct application of the aforementioned provision but also allows for the improvement of the quality of the processing register (art. 30 of the Regulation), the internal organizational measures relating to the authorization profiles as well as the compilation of the PSN forms and the implementation of security measures, favoring the automation of processes and reducing the margins of error. The classification introduced makes it possible to apply different technical and organizational measures based on the type of data. In particular, identification data may be separated from the rest of the record layout (as envisaged in the definition of pseudonymisation), while for sensitive data it will be possible to apply specific security measures, based on the risk assessment, including encryption.

Istat has undertaken to produce a document illustrating which measures it has implemented or intends to implement with respect to the classifications already carried out. In this regard, it should be noted that on XX, Istat sent a further document on the progress of work on the SIM (prot. no. 18647288), which however does not contain specific indications regarding the classification of the data.

2.3. Specific integration domain

The creation of specific integration domains constitutes the cornerstone of the measures indicated by the Institute to implement the aforementioned provision of the Guarantor with particular reference to the pseudonymisation of data. With a note of the XX, the Institute declared that it had "already started, within the system currently in use, the procedures aimed at the creation of the specific integration domains for the processing of personal data relating to statistical works: IST-02742 "Work register"; IST-02634 "Extended Register of Employment in Enterprises (Asia Employment)"; IST-01382 "Annual register of wages, hours and individual labor costs – RACLI"; and IST-02645 "Quantification of populations in territorial areas potentially at risk".

In this regard, during the inspection, Istat clarified that by "specific integration domain" we mean the place where the data necessary for the pursuit of a specific statistical purpose are present. In essence, it is an extraction of a subset of information from the master table with individual record layouts based on specific selection criteria. The phase of creating an integration domain takes the form of a formal request made by the directors in charge of this in which specific elements are inserted such as: assumptions of lawfulness of the processing (e.g. PSN), purpose - statistical purpose, necessary information contents and deadline within which the statistical purpose must be pursued. Hierarchical pseudonymisation is envisaged to ensure that the integration domains are independent/autonomous of each other.

As a preliminary step, when the data is acquired by Istat, the system provides for the assignment of a universal pseudonym. This pseudonym (SIM) is translated with a cryptographic function into different pseudonyms valid in the different integration domains. A specific unique encryption key is assigned to each specific integration domain (hereinafter DSI), and each domain has a different encryption key. In the current testing phase, the request formulated by individual statisticians (analysts) is sent to the competent central directors who approve it, assessing the pertinence and non-excess of the data requested with respect to the statistical purposes and taking into account the methodologies currently used. In other words, the application of the minimization principle is left, in relation to this processing operation, to the assessments of the managers and therefore to mere organizational measures.

However, it was specified that these organizational measures are accompanied by specific security measures. In fact, the new system envisages detailed "logging" measures aimed at tracing what is done on the systems at all times, also in order to bring out any anomalies (criticalities) such as "singularities" - "uniqueness" of records. In the event of an anomaly, the archive is placed in a "quarantine" condition to allow the analysis of critical issues.

The Institute has acknowledged that at the moment the system, by default, does not carry out any checks on the data extraction criteria. However, it is potentially capable of signaling the presence of critical issues (singularities), but the decision regarding their tolerability is left to the analyst. In this way, the analyst, at present, with the prior authorization of the competent director, can still proceed with the creation of the DSI.

Istat also specified that at the moment no technical measures are envisaged by default aimed at ensuring guarantees in this regard (automations to identify singularities or other critical issues), undertaking to make every assessment known to the Authority.

2.4. Data pseudonymization

The hierarchical pseudonymisation of the statistical units constitutes the set of technical and organizational measures indicated by the Institute to implement the aforementioned provision of the Authority in order to guarantee that the integration domains are independent/autonomous from each other.

In describing the current system and the evolutions achieved with the ongoing experimentation, Istat reiterated that when a single interested party is entered into the system, he receives a universal pseudonym or master (SIM). It is a progressive numeric code (a sequential number) and that aspect is being changed. About 110 million individual codes have now been created. Within the individual DSIs, each SIM is transformed into a new unique code. For this step, a unique encryption key is used within each DSI. The result is unique and cryptographically irreversible. However, the association between the SIM and the code is kept which allows, if necessary, to rejoin the code to the original SIM. The Institute is evaluating the possibility of introducing different encryption keys assigned to each pseudonym contained in the same DSI.

With specific reference to the period of validity of the coding of the pseudonyms and of the domain itself, it has been declared that the domain has its own predefined duration, at the end of which it ceases to exist as well as all the pseudonyms that are part of the same domain. The encryption key also ceases to exist at that point.

On XX, during the inspection, access was requested to the IT systems, in particular to the Disability Register, on which the Institute declared that it had implemented, on an experimental basis, the measures referred to in provision no. 10 of 23 January 2020. Specifically, it was asked to verify the entire process from the moment in which the personal data referring to the statistical unit enters the Istat systems for the generation of the master SIM code and, subsequently, how the new pseudonym code in the DSI is generated. The experimentation of the DSI relating to the Disability Register was carried out using the current SIM platform.

With regard to the indexing of the statistical units, i.e. from the preliminary phase to the actual statistical analysis, necessary to ascertain the uniqueness of the records processed, it has been specified that the platform calculates a hash code obtained exclusively from the variables pertaining to the class of identification data (cf. Minutes of the XX). Due to the small portion of data considered in the calculation of the hash, this operation makes it possible to verify the desired uniqueness, provided that such data, already in the collection phase, are in turn effectively unique.
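A uniqueness check of this kind can be sketched as follows. The decision does not name the hash function or how the identification variables are combined, so SHA-256, the normalisation step, the field names and both function names are assumptions made for illustration only.

```python
import hashlib


def identification_hash(record, id_fields):
    """Hash only the identification variables, taken in a fixed order."""
    # Assumed normalisation: trim whitespace and ignore letter case.
    canonical = "|".join(str(record[f]).strip().upper() for f in id_fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def duplicate_hashes(records, id_fields):
    """Return the hashes that appear more than once, i.e. non-unique units."""
    seen = set()
    duplicates = set()
    for record in records:
        h = identification_hash(record, id_fields)
        if h in seen:
            duplicates.add(h)
        seen.add(h)
    return duplicates
```

Two records with the same identification variables produce the same hash even when their thematic variables differ, which is why the check only works if the identification data are themselves unique at collection time.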

Given these premises, the Institute confirmed that in the event of a security incident, even on a single statistical unit (once the syntax of the data is known), the database would be completely compromised, adding that, having verified the economic sustainability of the measure, the future project could include a reflection on further security measures aimed at avoiding this risk and that the introduction of a different salt for each single statistical unit is among the options currently considered.

It was specified that the attribution of the hash does not coincide with the attribution of the master SIM code or of the pseudonym in the DSI. It has a purely operational value, since the hash is only used for consistency checks, i.e. it is used to confirm the accuracy of the data entered by comparing it with statistical units already present in the databases that will make up the domain of integration. For the disability register, the Institute has decided to renounce the treatment of new and not yet registered statistical units. Therefore, if the consistency check carried out via hash should show that a particular interested party has never been registered, it will not be entered in the register. Nevertheless, despite this intention, in practice the hash is not used for this consistency check as the verification of the presence of the statistical unit takes place in a "traditional" way through the comparison of direct identifiers. Therefore the hash (as part of the experimentation concerning the Disability Register - Ist-02748) is currently attributed only for a possible verification of the uniqueness of the records processed for the benefit of the quality of the data.

The Institute has represented that, in relation to the Disability Register, the identification data "PD_ID_DS" are separated from the sensitive thematic data "PD_Tem_DS". The encryption measure called "Transparent_data_encryption" (with AES_256 protocol with salt) is applied to the sensitive data contained in this domain, which makes the data unintelligible to anyone who is not authorized to access it. In this regard, it has been specified that sensitive and identifying data have not been physically separated, but only on a logical level, and that they are placed on the same "machine" which can be accessed via two password levels associated with two different users.

The assignment of the master code (SIM) takes place through a verification procedure, known as record linkage, with respect to the presence of a pre-existing progressive code for the statistical units already registered in the Register. For unregistered units, even if the procedure would allow for the generation of a new progressive code, this generation is not provided for.

In this regard, it was represented by the Institute that in an experimental phase a single DSI concerning the Disability Register was created, which currently contains the following databases (Individual Base Register, INPS Archive of disability certifications, Register of beneficiaries of pension treatments related to disability).

The circumstance that only one experimental integration domain was created made it impossible to view, during the inspection, the procedure which should make it possible to verify that two different pseudonyms contained in two different DSIs possibly refer to the same statistical unit.

2.5. Further documentation sent

With a note of the XX (prot. n. XX), following the aforementioned inspection, the Institute sent further documentation aimed at updating this Authority on the progress of the works on the SIM referring to the period July - October 2022.

Preliminarily, Istat reiterated that it would respect the commitment made "to provide for the adoption by 31/12/2022 of adequate pseudonymisation measures in compliance with provision no. 10 of 23/01/2020 for the statistical works IST-02742 "Work register"; IST-02634 “Extended Register of Employment in Enterprises (Asia Employment)”; IST-01382 “Annual register of wages, hours and individual labor costs – RACLI”; IST-02645 "Quantification of populations in territorial areas potentially at risk", believing "to be able in a short time to give concreteness and effectiveness, for the aforementioned statistical works, to the new pseudonymisation measures outlined in the aforementioned document, at least for what concerns the following aspects:

a. Classification of data, contained in administrative sources acquired from outside or in the products of other statistical works, foreseen as input for the statistical purposes of the aforementioned works.

b. Population of the primary integration domains according to data classification, through migration procedures of data already present in the current system to the new, suitably structured schemes.

c. Release of a user interface for managing data requests from users. The use of this interface must produce: i) a pdf document to be signed by the competent director/s for the necessary authorisations; ii) the metadata documenting the entire set of information contents to be inserted in the specific integration domain and which allow the automatic management of data release in the domain.

d. Release of a procedure which, following the request and the related authorizations, can automatically proceed with the release of the data requested in the specific integration domain".

In this regard, the Institute also stated that:

in carrying out the above, it will duly consider the "observations that emerged during the inspection regarding the robustness and coherence of the cryptographic codings adopted for the reunification of thematic data with universal pseudonyms between the various primary domains (point b) and for the generation of domain pseudonyms starting from the universal ones with the requirement of hierarchical reversibility (point d)";

"on the basis of the observations that emerged during the inspection, the list of functional requirements for the final version was integrated in order to perfect the privacy by default approach in the management phase of the request for information content to be included in the specific domain: i) pre-definition of transformations of variables that reduce the identification potential of the original variables (e.g. "year of birth" instead of "date of birth"); ii) identification of methods for quantifying the "singularities" in the requested data and for recognizing the variables most responsible for these singularities".

Istat has also attached to the note referred to above the updated timetable on the progress of the works and a document called "XX".
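The effect of the variable transformations mentioned in the functional requirements above (for example, "year of birth" instead of "date of birth") can be shown with a small worked example. The sample dates and the helper `count_singular` are invented for illustration and do not come from the documentation.

```python
from collections import Counter
from datetime import date


def count_singular(values):
    """Count the values that occur exactly once in the sample."""
    counts = Counter(values)
    return sum(1 for n in counts.values() if n == 1)


# Hypothetical sample of dates of birth.
birth_dates = [
    date(1980, 3, 14),
    date(1980, 11, 2),
    date(1975, 6, 30),
    date(1975, 1, 9),
    date(1990, 5, 5),
]

# Generalisation: keep only the year of birth.
birth_years = [d.year for d in birth_dates]

singular_before = count_singular(birth_dates)  # every full date is unique
singular_after = count_singular(birth_years)   # only 1990 remains unique
```

In this sample the generalisation reduces the number of singular values from five to one. Comparing such counts variable by variable is one simple way to see which variables contribute most to the singularities.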

3. The alleged violations

On the basis of the elements acquired in the context of the preliminary investigation and the inspection activity carried out, the Office - with an act of the XX (prot. n. XX), which must be understood as reproduced in full here - initiated, pursuant to art. 166, paragraph 5 of the Code, with reference to the specific situations of illegality referred to therein, a procedure for the adoption of the provisions pursuant to art. 58, par. 2 of the Regulation, against Istat, inviting it to produce defensive writings or documents to the Guarantor or to ask to be heard by the Authority (art. 166, paragraphs 6 and 7, of the Code, as well as art. 18, paragraph 1, law n. 689 of 24 November 1981).

In particular, with the aforementioned act, the Office notified the Institute that, on the basis of the elements acquired and the facts that emerged from the inspection and the preliminary investigation, as well as the subsequent assessments, the processing of personal data carried out within the DSI relating to the Disability Register - Ist-02748 was found to be in violation of articles 5, par. 2, 24, 25 and 32 of the Regulation. The Office also found that Istat had put in place, from both a qualitative and a quantitative point of view, a set of measures falling far short of those described in the Document, whose "general" time schedule indicates December 2022 as the deadline for concluding "the redesign of the SIM system", thereby placing itself in a position likely to violate the provisions formulated by the Guarantor in provision no. 10 of 23 January 2020.

More specifically, the following was found in relation to each of the disputed violations.

3.1. Data classification

As a preliminary point, it was clarified that data classification is necessary to associate each class of data (e.g. identification, common and special-category data) with a risk level; the more detailed the classification, the more precise the definition of risk can be. On the basis of the classification (which, according to the documentation transmitted, is quite detailed), the Institute could in fact implement differentiated security measures, such as the use of particular cryptographic techniques or the adoption of specific authorization profiles. This activity is particularly necessary in the presence of a large amount of data, such as that processed by the Institute. Each statistical unit is in fact characterized by a significant number of variables, some of an extremely sensitive nature; only a complete classification can guarantee effective protection, obtained through technical and organizational measures designed to become progressively more effective as the level of risk to which the data subjects are exposed increases.

In this regard, it should also be noted that the Institute itself, in the aforementioned note of the XX, and in particular in the document "Technological and organizational solutions to achieve full compliance of the Microdata Integration System (SIM)", update of 4 February 2021, in describing the activities carried out to implement the aforementioned provision of the Guarantor of 23 January 2020, declared that:

“The main novelties of the project concern the introduction of the data life cycle as an element characterizing the entire solution and the adoption of additional security measures at the ecosystem level identified in the risk analysis on statistical processes with a high impact in relation to the processing of personal data. These measures are characterized by active masking of data, dynamic protection against access operations and crossing of identifiers in incompatible domains of use, automatic classification of metadata right from the early stages of the acquisition process and the introduction of distributed consensus techniques to complete the standard security measures already adopted”;

“The acquisition phase is based on inferential rules and is able to select the right treatment for each type of microdata and for each family of metadata. It will be possible to proceed with the protection of personal information already in the data acquisition phase by defining a first pseudonymisation domain and, within this context, the treatment specifications for each family of metadata.

The control of any anomalies will take place in real time, signaling any violations of constraints through specific application interfaces. The domain classification carried out starting from the content in terms of metadata of the source, will allow the updating of the catalog of Sources and in particular of the relative compatibility of sets of metadata due to the possible presence of singularities which will be documented and traceable”.

On the merits, following the inspection, it was found, first, that the classification activity concerned only the statistical work on the Disability Register - Ist-02748 and, second, that at present it is not based on a precise assessment of the risks related to the processing carried out and is consequently not backed by specific measures aimed at effectively implementing the data protection principles in relation to the risks identified, in violation of the obligations of privacy by design and by default (art. 25 of the Regulation). In other words, at the current state of the activities, the classification of the sources appears to serve specific statistical needs rather than to act as a preliminary step towards identifying measures appropriate to the risks related to the processing of certain types of data.

3.2. Data minimization in specific integration domains

As a preliminary point, having examined in practice the DSI relating to the Disability Register - Ist-02748, it should be noted that the effective application of the data minimization principle must be guaranteed in these contexts both through a precise verification of the information necessary for the statistical purpose to be pursued through the DSI, and of its level of aggregation (whenever the processing of information referring to a single statistical unit is disproportionate), and through a suitable pseudonymisation technique. The data controller is therefore required to identify, in accordance with the state of the art, by design and by default, measures that allow the effective application of the minimization principle.

It is therefore ascertained that, in the management of specific integration domains, specific technical measures aimed at guaranteeing the effective application of the minimization principle are lacking, contrary to the principles of accountability and of data protection by design and by default. In fact, although the Institute has implemented some organizational security measures, it has not ensured that data minimization is built in from the outset in the technological set-up of the infrastructure used to create the DSI. In particular, the infrastructure is not able to detect anomalies such as singularities (to be understood as a unique combination of attributes referable to a single subject). The automatic detection of anomalies would allow the Institute to be immediately aware of the risks to which the data subjects covered by the survey are exposed and to take the necessary strategic decisions accordingly, such as using the statistics in a more aggregated form or, where this is not in line with the statistical purpose pursued, accepting the risk, subject to the adoption of specific protection measures and to documenting the reasons that led the Institute not to pursue further risk mitigation.
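The notion of singularity used here (a unique combination of attributes referable to a single subject) lends itself to automatic detection. The sketch below is one possible approach under stated assumptions, not a description of any system examined in the decision: the variable names, the sample rows and the "drop one variable" heuristic for ranking the variables most responsible for singularities are all hypothetical.

```python
from collections import Counter


def _combination(row, variables):
    """Tuple of the values a record takes on the selected variables."""
    return tuple(row[v] for v in variables)


def singular_records(rows, variables):
    """Return the records whose combination of values is unique in the data."""
    counts = Counter(_combination(row, variables) for row in rows)
    return [row for row in rows if counts[_combination(row, variables)] == 1]


def variable_contribution(rows, variables):
    """For each variable, how many singularities disappear if it is dropped."""
    baseline = len(singular_records(rows, variables))
    contribution = {}
    for variable in variables:
        remaining = [v for v in variables if v != variable]
        contribution[variable] = baseline - len(singular_records(rows, remaining))
    return contribution


# Hypothetical records, for illustration only.
rows = [
    {"birth_year": 1980, "municipality": "Roma", "benefit": "A"},
    {"birth_year": 1980, "municipality": "Roma", "benefit": "A"},
    {"birth_year": 1980, "municipality": "Roma", "benefit": "B"},
    {"birth_year": 1975, "municipality": "Aosta", "benefit": "A"},
]
variables = ["birth_year", "municipality", "benefit"]

singular = singular_records(rows, variables)
contribution = variable_contribution(rows, variables)
print(len(singular), contribution)
```

A report of this kind, produced before a data request is confirmed, would give the controller the information needed to choose between further aggregation and a documented acceptance of the residual risk.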

The fact that the Institute has an appropriate legal basis for the processing of the personal data necessary for carrying out the statistical work contained in the national statistical program does not exclude the fact that this must guarantee compliance with the principle of minimization also by avoiding the processing of any singularity and favoring that of aggregate information, when this is compatible with the statistical purpose pursued.

Given all of the above, the violation by the Institute of articles 5, par. 2, 24 and 25 of the Regulation is ascertained.

3.3. Data pseudonymization

With respect to the requirement to introduce a hierarchical pseudonymisation system between different DSIs, also aimed at implementing the principles of purpose limitation and storage limitation, since the Institute has so far created a single DSI, referring to the Disability Register, it was not possible to verify, almost 3 years after the adoption of the aforementioned provision, whether two different pseudonyms contained in two different DSIs may refer to the same statistical unit.

Furthermore, taking into account how data pseudonymisation is implemented within the single DSI developed and the security measures in place, the violation of articles 25 and 32 of the Regulation is ascertained, as the platform calculates a hash code obtained only from the variables in the class of identification data and by default does not include random elements in the calculation (e.g. salt). Therefore, in the event of a security incident, even one affecting a single statistical unit, the compromise of the database would be total, since the syntax of the data is known.
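The weakness described above can be shown with a minimal sketch. It assumes, purely for illustration, that the identification data is a single string with a known syntax and that the hash is SHA-256; the decision does not name the hash function used by the platform. The identifiers are made-up examples, and the keyed variant (HMAC with a random secret key) is one common way of introducing a secret random element, not necessarily the remedy the Authority had in mind.

```python
import hashlib
import hmac
import secrets


def unsalted_pseudonym(identifier: str) -> str:
    """Plain hash of the identifier: anyone can recompute it."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Keyed hash: recomputation requires the secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical identifier and candidate list built from the known syntax.
identifier = "RSSMRA80A01H501U"
candidates = ["BNCLCU75C03F205X", "RSSMRA80A01H501U"]

# An attacker holding an unsalted pseudonym simply hashes every candidate.
leaked = unsalted_pseudonym(identifier)
recovered = [c for c in candidates if unsalted_pseudonym(c) == leaked]

# With a secret random key the same enumeration finds no match.
key = secrets.token_bytes(32)
protected = keyed_pseudonym(identifier, key)
guessed = [c for c in candidates if unsalted_pseudonym(c) == protected]
print(recovered, guessed)
```

Because the space of valid identifiers is small and structured, the enumeration against the unsalted hash scales to the whole database, which is the scenario the finding refers to.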

4. The defense briefs

With a note of the XX, the Institute sent its defense briefs and the relative attached documentation, not asking to be heard at a hearing, pursuant to art. 166, paragraph 5 of the Code.

In the aforementioned documents, the Institute has preliminarily illustrated the general context from which the objections of the Office of the Guarantor of the XX arise.

In particular, it was represented that the provision of the Guarantor n. 10 of 23 January 2020 slightly preceded the pandemic period, which began “with the issue of the resolution of the Council of Ministers of 31 January, G.U. no. 26/2020” and then continued for many months. This is a "situation which, both at a regulatory level and in some court rulings, has been recognized as a circumstance to be taken into account in the assessment of responsibility in the fulfillment of an obligation, as it is attributable to the concept of force majeure".

Nevertheless "without prejudice to what will be stated in detail below regarding what this Institute has in any case actually put in place for the purposes of transposing the indications contained in the aforementioned provision of 01.23.2020 - a circumstance which demonstrates the fulfillment by the Administration of every useful effort and the observance of diligent conduct - it seems reasonable, for the purposes of a fair assessment of the overall matter covered by this assessment, to pay attention to the inevitable organizational difficulties resulting from the health emergency".

In this period, the Institute, like other Administrations, had to deal actively with internal organizational difficulties, having to "adapt, also from a technical point of view, the entity's processes and procedures, which naturally also had an impact on a process as complex as the one in question". In addition, in the specific case of Istat, "the Administration, [...] was also called upon to give its contribution to the country in that dramatic situation, providing its information and knowledge support for the purpose of monitoring the progress of the epidemic and the effectiveness of the contrasting health measures adopted by the Government. A heavy additional task on top of the ordinary and extensive activities of Istat, which gave rise to further organizational stresses that may have had an impact on organizational efficiency and influenced its optimization". In the light of this, Istat has highlighted that, taking into account the framework applicable to the disputes in question, "the following ruling by the Supreme Court appears to be relevant to the present case: "on the subject of administrative sanctions, unforeseeable circumstances and force majeure, although not expressly mentioned by the law of 24 November 1981, n. 689, must be considered implicitly included in the provision of art. 3 of it and exclude the agent's liability" (Civil Cassation, section II, April 29, 2010, n. 10343)". Istat also represented that "the tender procedures for the supplier of application services and for the software platform to be used in the project have been launched and completed" and that during 2021 "the work of drafting user requirements and of conducting the experimentation mentioned in point 2 of the aforementioned work progress note continued, using the technologies already available at the Institute".

With specific reference to the objections raised by the Office in the deed of initiation of the sanctioning procedure pursuant to art. 166, paragraph 5 of the Code, Istat represented in particular the following.

4.1. Data classification

Istat, in producing a document called "Annex 1 (classificazioniSources)", declared that:

“The [data classification] work completed […] by 31 December 2022 concerns 34 administrative archives containing personal data out of a total of 129 (26%). The same annex also provides a detailed account of the archives containing personal data which must be classified by 31/12/2022";

“In compliance with the principles of privacy by design and privacy by default pursuant to art. 25 of the Regulation, the assessment of the risks related to the treatments carried out was implemented, taking into consideration the classification referred to in Annex 1. The latter was prepared by the Institute in order to guarantee the protection of the rights of the interested party obtained through the application of technical and organizational measures appropriate to the level of risk (see Annex 2 "Risk analysis")”;

“These activities demonstrate the Institute's ongoing commitment to fulfill the provisions of art. 25 of the Regulation […] and by the guidelines n. 4/2019 of the European Data Protection Board".

4.2. Data minimization in specific integration domains

In relation to compliance with the data minimization principle, Istat highlighted "the technical and organizational measures already implemented or being implemented within the Institute", declaring that:

"the minimization of data represents one of the factors of greater complexity, since it is in the application of this principle, more than others, that the balance must be struck between the production of official quality statistics (principle of accuracy) and respect for the rights and freedoms of the interested parties (principle of minimisation). In fact, in line with the Specific Integration Domain (DSI) approach and taking into account that Istat adopts a register-based statistical production approach, in which the official statistics produced require the exhaustiveness of the units included in the administrative sources, the minimization can possibly be applied only to the variables of each record, and only where this does not prevent the achievement of the statistical purpose. It is precisely the construction of the DSI that makes it possible to eliminate all the singularities that are not relevant to the statistical purpose pursued and which will not be used in the subsequent phases of the statistical process. In order to incorporate the observations received from the Authority regarding the issue of singularity, Istat has in any case activated a study to identify methods and strategies in order to arrive at an assessment of this risk in an on-the-fly mode, so as to make the outcome available to the applicants before confirming the request”;

the classification of sources (Annex 1) and the related risk analysis (Annex 2), applied to the version of the DSI currently in use (referring to the statistical work IST-02748 "Disability archive"), has allowed Istat to develop, in addition to the organizational structures already present, also "the definition of the information contents to be entered in the specific domain", which will be "defined and signed by the designated Director of the DSI through a specific technical request function which insists on the data catalog, in which all the archives are mapped and classified, at the single trace/dataset and single variable level, necessary for the achievement of the respective statistical purposes”;

“The Director who will insert the request supported by the relative legal basis, [...], will be able to define, [...], new variables as a function of the available ones, to be used in place of the original ones and reduce their re-identification potential. Before confirming the data request for the domain, the Director will receive a report on the classification of the requested data, as an indication for assessing the risk associated with data processing in the domain itself. In other words, the data request model delivered to the Authority during the inspection is accompanied, as requested by the Authority, by automated technical measures, in order to go beyond the merely organizational measures applied".

4.3. Data pseudonymization

Istat, in once again representing the "complexity of this pseudonymisation project, not only due to the intrinsic difficulty of the process from an elaboration point of view, but above all because it involves considerable performance problems", highlighted that "its implementation required various tests and a PoC (Proof of Concept) in Oracle environment to evaluate and estimate the computational needs and system parameters, which are reported in Annex 3. These measures will be implemented in the statistical work related to the "Disability register - Ist-02748" by June 2023 and the Authority will continue to be updated by Istat with periodic reports".

In relation to the specific domain pseudonyms, the Institute also represented that it had taken into account the findings that emerged during the inspections regarding the fragility of the keys generated for the "Disability Archive" domain and in this regard produced a document called "Pseudonymisation in the integrated microdata system (sim) (Annex 3) [to the defense briefs]". The document describes in detail all the phases through which Istat associates a universal pseudonym, or master, with each statistical unit, which is subsequently translated with a cryptographic function into several distinct pseudonyms, each valid within a single integration domain. Each domain has a different encryption key and Istat maintains the association between the master and the derived pseudonyms, thus making it possible, if necessary, to reunite each derived pseudonym with its own master.

The hypothesized procedure is considered consistent with the indications provided by the ENISA agency in the recently published guidelines "Pseudonymisation Techniques and Best Practices" and "Recommendations on shaping technology according to GDPR provisions - An overview on data pseudonymisation", creating a form of "controlled reversibility” between pseudonyms. In other words, thanks to the use of a secret key known to the authorized subjects, it allows an easy "direct" transformation from the master pseudonym to the derived pseudonyms, while preventing the "reverse" transformation from the derived pseudonyms to the master pseudonym by unauthorized subjects who are not in possession of the secret key.

From an operational point of view, the generation of the master pseudonym takes place in two steps. The first consists in calculating the hash-based message authentication code (HMAC-SHA256) of each unique combination of attributes referable to each statistical unit; the second consists in ordering these codes lexicographically and then transforming them into a sequential identifier, which is ultimately the master pseudonym. To guarantee the uniqueness of the transformation and to avoid a given secret key being used twice, the key is created by a random number generator which uses the current time in milliseconds as a seed.
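The "controlled reversibility" described above can be sketched as follows. The decision only states that a cryptographic function with a different key per domain is used and that the association between master and derived pseudonyms is maintained; the choice of HMAC-SHA256 as the derivation function, the class `DomainPseudonymiser`, its method names and the domain labels are assumptions made for this example.

```python
import hashlib
import hmac
import secrets


class DomainPseudonymiser:
    """Hypothetical holder of per-domain keys and of the master association."""

    def __init__(self):
        self._domain_keys = {}
        self._to_master = {}  # (domain, derived pseudonym) -> master pseudonym

    def register_domain(self, domain: str) -> None:
        # Each integration domain gets its own random secret key.
        self._domain_keys[domain] = secrets.token_bytes(32)

    def derive(self, master: int, domain: str) -> str:
        """Direct transformation: master pseudonym -> domain pseudonym."""
        key = self._domain_keys[domain]
        derived = hmac.new(key, str(master).encode("utf-8"), hashlib.sha256).hexdigest()
        self._to_master[(domain, derived)] = master
        return derived

    def reunite(self, derived: str, domain: str) -> int:
        """Reverse transformation, available only to the holder of the table."""
        return self._to_master[(domain, derived)]


pseudonymiser = DomainPseudonymiser()
pseudonymiser.register_domain("disability")
pseudonymiser.register_domain("employment")

in_disability = pseudonymiser.derive(42, "disability")
in_employment = pseudonymiser.derive(42, "employment")
print(in_disability != in_employment, pseudonymiser.reunite(in_disability, "disability"))
```

In this sketch a user who sees only the two domain pseudonyms cannot tell that they refer to the same unit, while the party holding the keys and the association table can reunite them.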
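The two-step generation of the master pseudonym can be sketched as below. The attribute tuples and the field separator are hypothetical. Note also one deliberate difference from the text: the example draws the key from the operating system's cryptographic random source (`secrets.token_bytes`) rather than from a generator seeded with the current time, which is an assumption of this sketch and not the procedure described by the Institute.

```python
import hashlib
import hmac
import secrets


def master_pseudonyms(units, key: bytes) -> dict:
    """Map each unique attribute combination to a sequential master pseudonym.

    Step 1: HMAC-SHA256 of each unique combination of attributes.
    Step 2: lexicographic ordering of the codes, then sequential numbering.
    """
    digests = {}
    for unit in set(units):
        # The unit separator avoids ambiguity between adjacent fields.
        message = "\x1f".join(unit).encode("utf-8")
        digests[unit] = hmac.new(key, message, hashlib.sha256).hexdigest()
    ordered = sorted(digests, key=digests.get)
    return {unit: position for position, unit in enumerate(ordered, start=1)}


# Hypothetical identification data; the third row repeats the first.
units = [
    ("RSSMRA80A01H501U", "Mario", "Rossi"),
    ("BNCLCU75C03F205X", "Luca", "Bianchi"),
    ("RSSMRA80A01H501U", "Mario", "Rossi"),
]
key = secrets.token_bytes(32)
masters = master_pseudonyms(units, key)
print(sorted(masters.values()))
```

Because the sequential identifier depends on the ordering of keyed codes, the numbering carries no information about the underlying attributes for anyone who does not hold the key.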

5. Submitted additional documentation

With a note of the XX (prot. n. XX), sent after the defense briefs, the Institute sent the Office the "State of Progress of Works as at 31 March 2023" relating to the "Microdata Integration System (SIM)", declaring that it "continued the classification of sources (Annex 1_ClassificazioneFonti), which currently consists of 43 administrative archives containing personal data out of a total of 129 (33%), according to the risk analysis carried out by Istat [...]" and also representing that "the pseudonymisation platform is in an advanced stage of development, as described in annexes 2 and 2A" of the aforementioned note.

Specifically, the Institute stated that it has "released a preliminary version of the system (Microdata Release System - SI(RI)M) [...]", and that "it is at the same time automating activities as much as possible through the pseudonymisation platform (Integrated System for the Management of Administrative and Statistical Microdata - SIGMA) which will include everything contained in SI(RI)M as well as all the sources gradually classified by Istat and the other requirements envisaged by the project".

The objective of the "SI(RI)M - Microdata Release System [...]" platform is to allow the creation of specific DSIs relating to the statistical works IST-02742 "Work Register"; IST-02634 “Extended Register of Employment in Enterprises (Asia Employment)”; IST-01382 “Annual register of wages, hours and individual labor costs – RACLI”; IST-02645 "Quantification of populations in territorial areas potentially at risk", in addition of course to the work IST-02748 "Disability Archive"; the platform "could also gradually be used for the population of other DSIs (deriving from the sources classified in Annex 1), until the definitive platform (SIGMA) is put into operation".

The definitive platform called "SIGMA" is under development and will introduce "a high degree of automation of the hierarchical data pseudonymisation process in order to minimize management activities that require manual intervention, taking into account the indications formulated by this Authority". This new system - a first release of which is expected at the end of June 2023 - "will allow the following functions:

variable classification management […],

pseudonymization according to the algorithm described in the document "Annex 3_pseudonymization", produced to this Authority on the XX date,

feeding of the primary domains on the basis of the classification of the variables managed through the data catalogue, of which more information is provided below,

creation of specific integration domains,

management of the temporal validity of the pseudonym at multiple levels”.

In relation to the individual aspects of the System, the Institute represented the following.

5.1. Data classification

In this regard, Istat, in indicating the information that must be present in the classification of data, declared that:

"Data classification constitutes the main element in order to be able to carry out all the processes connected to the data themselves, from the construction of the primary domains to making data available to the end user in the specific integration domains [...] and which "acts as a supplier of detailed information on the archives and on the statistical products available in the system, as well as on the variables contained in them".

The Institute then described the phases that make up the function of populating the data catalogue.

5.2. Specific domains of integration and pseudonymisation

The Institute represented that the system, under construction, divides the data into primary domains by applying primary pseudonymisation on the basis of the algorithm already described above and presented in the document attached to the defense briefs, "starting from the tables accompanied by the so-called current SIM code. In this phase there is a mapping table between the SIM code calculated in the past and the CSU SIM code (universal SIM code) calculated using the new pseudonymisation algorithm. This mapping table will help manage the system's backwards compatibility. The data divided into the primary domains will be accompanied only by the new pseudonym while the old one will remain only in the mapping table". In the later phases of the project, when the acquisition of flows from the outside is re-engineered, the new CSU pseudonym will be calculated directly. “However, the association with the old code in the mapping table for backwards compatibility will remain. Once the aforementioned backwards compatibility is no longer needed, the currently used SIM code will also be deleted from the mapping table”. Istat has also classified the domains into: identification data domain; domain of data referred to in Article 9 of the GDPR; domain of data pursuant to art. 10 GDPR; domain of other thematic data. The first three domains are encrypted using TDE (Transparent Data Encryption) provided by the Oracle environment.

Istat also referred to "the adoption of additional security measures for the data pursuant to art. 9 GDPR".

In relation to the specific domains of integration, the Institute illustrated the methods for creating the aforementioned domains. Specifically, "the acquisition of data from the primary domains area by selecting from the catalog only the data of interest for the specific domain" is envisaged, together with the implementation "of the secondary pseudonymisation mechanism and generation of the CSD (Domain Specific Code)", as described in the document called "Annex 3_pseudonymisation", which is an integral part of the defense briefs produced.

The Institute then represented the security measures envisaged by the system, in particular the detailed log and tracing functions at all levels, some performed at an application level, others at a diagnostic level, which will be available "the former for application users while the latter for system administrators". Finally, it was shown that "the system includes an interface that allows you to automate and monitor the process of requesting data and populating the Specific Integration Domains, allowing data requests to be entered by the structures involved (referents of the statistical works) and their formalization, against the endorsement with signature of the requesting management and the authorization to release. All the operations required for populating the specific integration domains will be traced at the application level and will therefore be verifiable in real time".

5.3. Data retention

In this regard, the Institute has declared that "thanks to the pseudonymisation procedure referred to above, it is possible to apply different retention periods based on the specific domains connected to the statistical purposes". In particular, "Time validity will be managed at several levels:

retention time of the individual supplies of the acquired archives;

period of validity of the CSU universal SIM code which can be modified at the request of the system administrator;

time of validity of the specific integration domains which will be entered by the thematic referents at the time of the data request.

In the first case, the system will automatically detect the deadline reached and will notify the system administrator to proceed with the verification and deletion of data. These operations, having a great impact on the database, will be performed in a controlled and supervised manner.

As for the validity of specific domains, once the expiration is detected, the system will notify the data manager. The latter may extend the deadline only if in possession of the necessary authorisations, compatible with the provisions of the PSN in force. Otherwise, the system will proceed with the deletion of the data provided for the DSI in question, always tracing all the operations performed".
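The handling of expired specific integration domains described above can be sketched as a simple check. Everything in the example is hypothetical: the `IntegrationDomain` structure, the representation of an authorised extension as a dictionary of new deadlines, and the in-memory audit log stand in for mechanisms that the decision describes only at a functional level.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IntegrationDomain:
    name: str
    valid_until: date
    records: list = field(default_factory=list)


def enforce_retention(domains, today, authorised_extensions, audit_log):
    """Extend expired domains that hold an authorisation, delete the others."""
    for domain in domains:
        if today <= domain.valid_until:
            continue  # still within its period of validity
        new_deadline = authorised_extensions.get(domain.name)
        if new_deadline is not None and new_deadline > today:
            domain.valid_until = new_deadline
            audit_log.append((today, domain.name, "extended"))
        else:
            domain.records.clear()
            audit_log.append((today, domain.name, "deleted"))


# Hypothetical domains and deadlines, for illustration only.
domains = [
    IntegrationDomain("DSI-A", date(2023, 6, 30), [1, 2]),
    IntegrationDomain("DSI-B", date(2023, 6, 30), [3]),
    IntegrationDomain("DSI-C", date(2024, 6, 30), [4]),
]
audit_log = []
enforce_retention(domains, date(2023, 7, 1), {"DSI-B": date(2024, 6, 30)}, audit_log)
print([entry[1:] for entry in audit_log])
```

Every branch that changes the state of a domain writes to the log, mirroring the statement that all operations performed are traced.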

Finally, Istat represented that "improvements in terms of security measures are being evaluated which allow for: 1. verifying the quality of the data entered in order to avoid assigning characteristics to incorrect subjects, to guarantee the principle of accuracy; 2. increase the levels of control over the operations carried out by operators and administrators on the SIGMA system”.

5. The legislation on the protection of personal data

The processing of personal data, including that carried out in the context of official statistics, must take place in compliance with the Regulation and the Code.

"Personal data" means "any information relating to an identified or identifiable natural person ("data subject")". Furthermore, "an identifiable natural person is one who can be identified, directly or indirectly, with particular reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more characteristic elements of his physical, physiological, genetic, psychic, economic, cultural or social identity" (art. 4, paragraph 1, n. 1 of the Regulation).

Pseudonymisation means: “the processing of personal data in such a way that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and subject to technical measures and organizational arrangements aimed at ensuring that such personal data are not attributed to an identified or identifiable natural person" (recital 26 and art. 4, point 5 of the Regulation). Pseudonymisation is an extremely important measure in the field of statistical research, in particular in order to ensure effective application of the principle of minimization (Articles 5, paragraph 1, letter c) and 89 of the Regulation). In this regard, the Article 29 Working Party has highlighted that it serves "to reduce the correlation of a data set to the original identity of a data subject, and therefore represents a useful security measure" (WP216, Opinion 05/2014 on anonymisation techniques, adopted on 10 April 2014), but it is certainly also, it should be reiterated, a data minimization measure which implements and makes operational the principle of necessity (see par. 3.5 of Guidelines 4/2019 on Article 25 Data Protection by Design and by Default, Version 2.0, adopted on 20 October 2020).

Among the principles applicable to the processing, it is worth highlighting here that of accountability, according to which "the data controller must comply and be able to demonstrate both compliance with the principles and fulfilments established by the Regulation" (Articles 5, paragraph 2, and 24 of the Regulation).

Connected to this is another duty placed on the data controller, namely that of ensuring that the rules on the protection of the data subjects' personal data are applied by design and by default (privacy by design and by default, Article 25 of the Regulation).

On the basis of the renewed regulatory framework on the protection of personal data, in fact, data controllers are required to make a weighted assessment of all the choices connected to the processing of personal data, demonstrable on a logical level through specific reasons, aimed at identifying the measures that are necessary and proportionate to making the relevant principle effective in each case.

In compliance with the obligation of data protection from the design stage, data controllers must also take an active role in applying the principles, with the goal of obtaining a real protective effect. What is required, therefore, is not the mere application of generic measures not directly related to the purpose of protection, but measures that are qualitatively and quantitatively effective with respect to the objective and designed to be reviewed, if necessary, in relation to any increase or reduction in the risks for the data subjects.

Where possible, these measures should include specific indicators aimed at unequivocally demonstrating their effectiveness. From this point of view, the aforementioned obligation to document the choices relating to the processing of personal data is considered fully fulfilled only if the controller is able to demonstrate, through performance indicators (qualitative and, where possible, quantitative), the effectiveness of the measures (see Guidelines 4/2019 on Article 25 Data Protection by Design and by Default, adopted on 13 November 2019 by the EDPB; provision of the Guarantor of 23 January 2020, web doc. 9261093).

Finally, the principle of integrity and confidentiality is relevant, under which data must be "processed in such a way as to guarantee adequate security of personal data, including protection, through adequate technical and organizational measures, from unauthorized or illegal processing and from accidental loss, destruction or damage". Connected to it is the controller's obligation to implement adequate technical and organizational measures to guarantee a level of security appropriate to the risk, taking into account the state of the art and implementation costs, as well as the nature, scope, context and purpose of the processing, and the risk of varying likelihood and severity for the rights and freedoms of natural persons (articles 5, paragraph 1, letter f) and 32 of the Regulation).

6. The outcome of the preliminary investigation: the violations ascertained

The Office notified Istat of its finding that the processing of personal data carried out within the DSI relating to the Disability Register (Ist-02748), the only DSI to which the measures required by the provision of 23 January 2020 had been applied, was carried out in violation of articles 5, par. 2, 24, 25 and 32 of the Regulation. It also found that Istat had put in place, both qualitatively and quantitatively, measures falling far short of those described in the document submitted to comply with that provision, whose "general" time schedule set December 2022 as the deadline for concluding "the redesign of the SIM system", thereby placing itself in a position where it was likely to breach the requirements set by the Guarantor in that provision.

As a preliminary point, it is reiterated that the Institute presented a very articulated and complex project setting out, in general terms, the measures necessary to comply with the requirements of the 2020 provision. That provision focuses, in summary, on the need for the Institute to adopt an approach more consistent with the principle of accountability from a methodological point of view, including by introducing technical and organizational measures that ensure by default the effective application of data protection principles, and on data pseudonymisation techniques, in particular to give effect to the principles of minimization and storage limitation.

In this context, the only specific activities and measures that the Institute, after three years, intended to adopt in order to implement the aforementioned provision of 23 January 2020 are: the creation of a single specific integration domain, which is unable to detect anomalies ex ante; the classification of the data destined to flow into it; and pseudonymisation measures that produce a hash code computed only from the variables belonging to the class of identification data and that do not, by default, include random elements (e.g. a salt) in the calculation.

The Guarantor naturally takes note of the organizational difficulties faced by the Institute during the emergency caused by the SARS-CoV-2 pandemic and of the extraordinary commitment required of it to contribute, within its remit, to supporting the country in dealing adequately with that situation.

Nonetheless, this circumstance cannot be regarded, in the present case, as force majeure capable of excluding the Institute's liability; it can only mitigate it.

The Institute must therefore be considered responsible for its conduct, subject to the mitigation just acknowledged, since it acted in a manner that departs from the aforementioned data protection principles, a circumstance that cannot be attributed to the health emergency (Cass. 9738 of 2000; Court of Cassation, 14168 of 2002).

It is therefore confirmed that the Institute has acted in violation of the principle of accountability and of the obligation to protect data by design and by default, also placing itself in a position where it was likely to breach the requirements set by the Guarantor in the aforementioned provision of 23 January 2020.

In fact, regardless of the concrete difficulties that the Institute certainly faced during the pandemic, which justify the delay in completing the project presented to the Guarantor in compliance with the provision of 23 January 2020, the inspection showed that Istat, even in the testing phase of the first DSI, did not take care to ensure the effective application of data protection principles, as it did not take steps to identify specific and differentiated solutions consistent with the technological state of the art and with the particular context.

Indeed, it appeared that Istat intended to carry over into the new DSI setting technical or organizational processes and measures already in use, without revising and updating them to the technological state of the art, as the aforementioned provision had indicated pursuant to art. 58, par. 2, lit. a) of the Regulation.

It emerged that the Institute's information systems (for the aspects examined) do not ensure, by default, the correct application of data protection principles (in particular, purpose limitation, storage limitation and minimisation).

It should be noted that the Institute has not adopted the proactive conduct required by the principle of accountability. It was unable to justify its choices other than by mere assertion, without supporting them with specific assessments of their effectiveness in qualitative or quantitative terms.

This is in breach of the data controller's obligation to comply with the principles applicable to the processing and to be able to prove it.

In this context, taking into account the inspections carried out and the defense briefs provided by the data controller, which were summarised in paragraph 4 above, the following is highlighted in relation to the violations ascertained.

6.1 Classification of data

Istat has declared that by the end of this year (2023) it will complete the data classification activity and that, in compliance with the principles of privacy by design and privacy by default pursuant to art. 25 of the Regulation, it has carried out an assessment of the risks related to the processing performed.

While taking positive note of the declared commitment, it is noted that the rationale for data classification, from a data protection point of view, lies in the possibility of applying specific and different measures (for example pseudonymisation, generalisation, etc.) according to the type of data considered and the related estimated risk, taking into account the state of the art, implementation costs, and the nature, scope, context and purposes of the processing.

At present, in relation to the data classes identified, the Institute has produced a document called Annex 2, "Risk analysis, risk assessment related to the processing carried out". In its section "Description of the security measures adopted", the document lists generic technical and organizational measures aimed at mitigating the risks related to the processing of the aforementioned data and, in particular, at guaranteeing the confidentiality and integrity of the data pursuant to articles 5, par. 1, lit. f) and 32 of the Regulation. It does not describe specific solutions aimed at the effective application of data protection principles (art. 25 of the Regulation and Guidelines 4/2019 on Article 25 Data Protection by Design and by Default, Version 2.0, adopted on 20 October 2020).

More precisely, these measures, where they do not merely repeat specific obligations already laid down by the Regulation (e.g. "identification of authorized subjects; updating of authorisations; instructions of authorized subjects; training of authorized subjects; designation of managers", art. 29 of the Regulation and art. 2-quaterdecies of the Code), are not sufficiently specific, because they do not identify which data protection principles each of them is meant to implement, as required by art. 25 of the Regulation.

The classification carried out was therefore not followed by any risk assessment by the Institute, which envisaged neither specific privacy by design measures nor security measures differentiated across the "identifying", "thematic" and "sensitive thematic" data classes identified.

Furthermore, it should be noted that in the data entry phase the Institute makes extensive use of "ictu oculi" (visual) consistency checks on direct identifiers (such as name, surname, tax code), thereby exposing these identifiers to likely errors, to the detriment, in particular, of the principle of data accuracy.

In light of all the above, it is ascertained that, following the classification of the data in the administrative archives, Istat has not envisaged specific measures differentiated according to the risks associated with the different classes of data processed, in violation of the principle of accountability and of the obligations of privacy by design and by default pursuant to articles 5, par. 2 and 25 of the Regulation.

6.2 Data minimization in specific integration domains

Istat has shown that the implementation of the principle of minimization in the DSI is left, in addition to the organizational measures already in place, to the definition of information content to be included in the DSI.

The pseudonym that should be used in each survey is not in fact the only element that uniquely identifies the statistical units, which can readily be identified through other combinations of attributes (which do not contribute to the creation of the pseudonym) that are unique within the domain itself.

The Institute announced that "in order to incorporate the observations received from the Authority regarding the issue of singularity, [...] has in any case activated a study to identify methods and strategies in order to arrive at an assessment of this risk in on-the-fly mode, so as to make the outcome available to applicants before confirming the request".

Furthermore, the information contents of the DSI will be defined and signed off by the designated Director of the DSI "through a specific technical request function which operates on the data catalog, in which all the archives are mapped and classified, at the level of the single layout/dataset and of single variable, necessary for the achievement of the respective statistical purposes". The Institute also clarified that only the Director responsible for the request will be able to define, under his or her own responsibility, new variables to be used in place of the original ones, also in order to reduce the risk of re-identification of data subjects. Before confirming a request for data for the DSI, necessary for the pursuit of a specific statistical purpose, the Director will receive a report on the classification of the requested data, as an indication for assessing the risk associated with processing the data in the domain itself.

This demonstrates that, in any case, the assessment of the risks associated with the identified variables is left to considerations made by the Director in charge of the DSI.

In this context, while favorably acknowledging the proposed measures, it is confirmed that the Institute has not yet, at present, arranged for data minimization to be applied from the outset through the design of the technological infrastructure used to create the DSI.

Furthermore, the critical issues identified remain unchanged as regards Istat's inability to detect automatically any anomalies present in the DSIs, such as singularities (understood as combinations of attributes that are unique and referable to a single subject). Automatic detection would make the Institute immediately aware of the risks to which the data subjects covered by the survey are exposed and allow it to take the necessary strategic decisions accordingly.

It should therefore be noted, also in the light of the documentation last sent, that the Institute, despite the amount of data it manages in pursuit of its institutional purposes, and leaving aside organizational measures (such as analysts requesting authorization from area managers on whether it is advisable to proceed with the survey anyway), has not yet envisaged technical interventions aimed at detecting the presence of singularities in an automated way and at mitigating the resulting risks of re-identification, so as to implement the principle of data minimisation effectively and from the outset.

It is therefore ascertained that, in the management of specific integration domains, there are no specific technical measures aimed at guaranteeing the effective application of the principle of minimisation, in violation of articles 5, par. 2, 24 and 25 of the Regulation.
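By way of illustration only, the automated detection of singularities discussed in this section could look like the following minimal sketch. It assumes records are held as dictionaries and that the attribute combination to check is configured in advance; the function name, field names and sample values are hypothetical and are not taken from Istat's systems.

```python
from collections import Counter


def find_singularities(records, attributes):
    """Return the records whose combination of `attributes` is unique,
    together with their percentage incidence in the set."""
    # Build the attribute combination (quasi-identifier) for each record.
    keys = [tuple(record[name] for name in attributes) for record in records]
    counts = Counter(keys)
    # A singularity is a combination that occurs exactly once.
    singles = [rec for rec, key in zip(records, keys) if counts[key] == 1]
    incidence = 100.0 * len(singles) / len(records) if records else 0.0
    return singles, incidence


# Hypothetical sample: two records share a combination, two are unique.
sample = [
    {"sex": "F", "birth_year": 1980, "municipality": "Rome"},
    {"sex": "F", "birth_year": 1980, "municipality": "Rome"},
    {"sex": "M", "birth_year": 1975, "municipality": "Milan"},
    {"sex": "F", "birth_year": 1990, "municipality": "Turin"},
]
unique_records, share = find_singularities(
    sample, ["sex", "birth_year", "municipality"]
)
```

A check of this kind could be run before data are released into a domain, with the incidence figure used to raise an alert to the analyst.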

6.3 Pseudonymisation of data

The pseudonymisation system implemented by the Institute calculates a hash code for each statistical unit that flows into the integrated microdata system. On a formal level, the hashing technique is suitable for guaranteeing the uniqueness of pseudonyms. However, the fact that the secret key used to generate the pseudonyms is derived from an extremely volatile parameter (the current time in milliseconds) makes the "controlled reversibility" of the pseudonymisation technique inapplicable in practice. In other words, it escapes the Institute's "control", unless the exact time, to the millisecond, at which each transformation took place is retained.

Furthermore, this remains, at present, only a planned development: the pseudonymisation carried out by Istat does not yet involve the use of random secret keys differentiated by statistical unit. Consequently, in the event of a security incident, even one affecting a single statistical unit, the integration domain would be completely compromised once the syntax of the data is known.
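To illustrate the concern, the following is a minimal sketch that assumes a keyed hash (HMAC-SHA256) is used as the pseudonymisation function. The secret key is drawn once from a cryptographically secure random source and stored separately, rather than being derived from the current time. The function names and the fictitious identifier are hypothetical; this is not a description of Istat's implementation.

```python
import hashlib
import hmac
import secrets


def generate_secret_key(num_bytes: int = 32) -> bytes:
    """Draw a random key from the operating system's secure random source.
    The key must be stored separately from the pseudonymised data."""
    return secrets.token_bytes(num_bytes)


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Keyed hash of an identifier; reproducible only by whoever holds the key."""
    return hmac.new(
        secret_key, identifier.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Hypothetical usage with a fictitious identifier: the key is generated once
# and retained, so the same identifier always maps to the same pseudonym.
key = generate_secret_key()
p1 = pseudonymise("RSSMRA80A01H501U", key)
p2 = pseudonymise("RSSMRA80A01H501U", key)
```

Because the key is stored rather than recomputed from a volatile value, its holder can re-derive the pseudonym of a known identifier at any time, which is one possible way of keeping reversibility under control; a different key yields unrelated pseudonyms.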

Finally, it should be noted that, because at the time of the inspection the Institute had created only one integration domain (the one relating to the Disability Register), it was not possible to verify, three years after the adoption of the aforementioned provision, the introduction of the prescribed mechanism for hierarchical decoupling of the codes in the various databases and their rotation over time. It was therefore not possible to verify whether two different pseudonyms derived from the same master pseudonym code, and contained in two different integration domains (which do not currently exist), could refer to the same statistical unit (see point 4 of the provision of 23 January 2020).
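A hierarchical decoupling mechanism with rotation could be sketched as follows, purely as an assumption about one possible design: each integration domain receives a pseudonym derived from the master pseudonym with a key specific to the domain and to a rotation period. The derivation scheme, the function name, the period labels and the example key are hypothetical; the provision does not prescribe this construction.

```python
import hashlib
import hmac


def derive_domain_pseudonym(master_pseudonym: str, domain_id: str,
                            period: str, master_key: bytes) -> str:
    """Derive a pseudonym valid for one integration domain and one period."""
    # Step 1: derive a key specific to (domain, period) from the master key.
    context = f"{domain_id}|{period}".encode("utf-8")
    domain_key = hmac.new(master_key, context, hashlib.sha256).digest()
    # Step 2: apply that key to the master pseudonym.
    return hmac.new(
        domain_key, master_pseudonym.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Example-only key; a real key would be random and stored separately.
master_key = b"example-only-key-do-not-use-in-production"
a = derive_domain_pseudonym("master-0001", "IST-02748", "2023", master_key)
b = derive_domain_pseudonym("master-0001", "IST-02742", "2023", master_key)
a_next = derive_domain_pseudonym("master-0001", "IST-02748", "2024", master_key)
```

In this sketch the same statistical unit carries different codes in different domains and in different periods, so the codes cannot be joined directly, while the holder of the master key can still recompute any of them.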

While taking favorable note of the further commitments undertaken by the Institute, set out in the documentation last sent (see note of the XX), it is confirmed that, at present, the data processing is carried out by the Institute in violation of articles 25 and 32 of the Regulation in this respect as well.

7. Conclusions

In the light of the assessments referred to above, and taking into account the statements made by the data controller during the investigation (for the truthfulness of which it may be held liable pursuant to art. 168 of the Code), the elements provided by the data controller in the defense briefs, although worthy of consideration, do not overcome most of the findings notified by the Office with the deed initiating the procedure, since none of the cases provided for by art. 11 of the Regulation of the Guarantor n. 1/2019 applies.

For these reasons, the unlawfulness of the processing of personal data carried out by the National Statistical Institute is noted, in violation of articles 5, par. 2, 24, 25 and 32 of the Regulation, in the terms described above.

Given the above and considering, in particular, that:

the provision of 23 January 2020 was adopted close to the pandemic period which began “with the issue of the resolution of the Council of Ministers of 31 January, G.U. no. 26/2020", leading to inevitable delays in the implementation of the articulated and complex "SIM Project" as described in the document called "Technological and organizational solutions to achieve full compliance of the Microdata Integration System (SIM)";

the Institute has produced further technical documentation aimed in particular at describing the specific pseudonymisation measures it intends to implement to comply with the aforementioned prescriptive provision of 23 January 2020;

the Institute has promoted, already following the inspection, the introduction of various corrective measures aimed at overcoming certain disputes raised by the Office;

the Institute immediately demonstrated a high degree of cooperation with the Authority;

the circumstances of the specific case lead to qualifying the violations as a "minor violation", pursuant to recital 148 of the Regulation and the WP 253 Guidelines, concerning the application and provision of administrative pecuniary sanctions for the purposes of Regulation (EU) no. 2016/679.

Therefore, in relation to the case in question, it is considered sufficient to issue a reprimand to the data controller pursuant to articles 58, par. 2, lit. b), and 83, par. 2, of the Regulation, for having violated articles 5, par. 2, 24, 25 and 32 of the Regulation, in the terms described above.

8. Corrective Measures

Among the powers that art. 58, par. 2 of the Regulation confers on the Guarantor is the power to "order the controller or processor to bring processing operations into compliance with the provisions of this Regulation, where appropriate, in a specified manner and within a specified period" (lit. d).

In the light of the assessments referred to above, and taking into account the commitments undertaken by the Institute with the note of the XX as well as the documentation subsequently received, it is deemed necessary to order the Institute, pursuant to the aforementioned art. 58, par. 2, lit. d) of the Regulation, to bring its processing operations into compliance with the Regulation within 180 days of notification of this provision and, in particular, to:

complete the source classification process, associating with each class the corresponding measurable level of risk of re-identification of the data subjects whose data are processed in each survey, and the technical and organisational privacy by design and security measures capable of mitigating this risk (point 6.1);

implement technical measures which, within each integration domain and before proceeding with the survey, identify the presence of any singularities on specific combinations of attributes configurable ex ante, and their percentage incidence in the sample, and which alert the analyst to such occurrences (point 6.2);

develop an internal policy for the management of these singularities by the analysts responsible for carrying out the statistical survey, which possibly provides for the possibility of reconfiguring the sample through the use of generalization techniques in order to eliminate, or in any case reduce, the incidence of such singularities (point 6.2);

refine the process of "controlled reversibility" of derived pseudonyms, using a secret key that is more controllable than the time in milliseconds (e.g. a cryptographic salt) and storing these keys separately, as required by the definition of pseudonymisation in art. 4, point 5 of the Regulation (point 6.3);

use the master SIM pseudonym also for automated consistency checks on newly created statistical units (for example, uniqueness and accuracy checks) instead of the traditional "ictu oculi" checks on direct identification attributes (such as name, surname, tax code), which are still widely used and a likely source of errors in the data entry process (point 6.3).
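The generalization techniques mentioned among the measures above can be illustrated with a minimal sketch, assuming that age is one of the attributes contributing to singularities and that ten-year bands are an acceptable level of detail. The function names, field names and sample rows are hypothetical.

```python
from collections import Counter


def generalise_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the band that contains it, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def count_unique(rows, attributes):
    """Count attribute combinations that occur exactly once."""
    counts = Counter(tuple(row[name] for name in attributes) for row in rows)
    return sum(1 for n in counts.values() if n == 1)


# Hypothetical sample rows.
rows = [
    {"age": 31, "municipality": "A"},
    {"age": 34, "municipality": "A"},
    {"age": 52, "municipality": "B"},
]
before = count_unique(rows, ["age", "municipality"])

# Replace exact ages with bands, then count singularities again.
generalised = [{**row, "age": generalise_age(row["age"])} for row in rows]
after = count_unique(generalised, ["age", "municipality"])
```

Comparing the two counts shows how much a given generalization reduces the incidence of singularities, which is the kind of evidence an internal policy could require before a sample is reconfigured.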

ALL THIS HAVING BEEN CONSIDERED, THE GUARANTOR

a) pursuant to art. 57, par. 1, lit. a) of the Regulation, declares the unlawfulness of the processing of personal data carried out by the National Institute of Statistics, Via Cesare Balbo, 16 – 00184 Rome - tax code 80111810588 and VAT number 02124831005, for the violation of articles 5, par. 2, 24, 25 and 32 of the Regulation, in the terms set out in the reasoning above;

b) pursuant to art. 58, par. 2, lit. b) of the Regulation, reprimands the aforementioned Institute, as controller of the processing in question, for having violated articles 5, par. 2, 24, 25 and 32 of the Regulation, as described above;

ORDERS

the National Institute of Statistics:

a) pursuant to art. 58, par. 2, lit. d), of the Regulation, to bring its processing operations into compliance with the Regulation by adopting the corrective measures indicated in paragraph 8 of this provision no later than 180 days from notification of this provision and, in particular, to:

exhaustively complete the source classification process, associating with each class the corresponding measurable level of risk of re-identification of the data subjects whose data are processed in each survey, and the technical and organizational privacy by design and security measures capable of mitigating this risk (point 6.1);

implement technical measures which, within each integration domain and before proceeding with the survey, identify the possible presence of singularities on specific combinations of attributes configurable ex ante, and their percentage incidence in the sample, and which alert the analyst to such occurrences (point 6.2);

develop an internal policy for the management of these singularities by the analysts responsible for carrying out the statistical survey, which possibly provides for the possibility of reconfiguring the sample through the use of generalization techniques in order to eliminate, or in any case reduce, the incidence of such singularities (point 6.2);

refine the process of "controlled reversibility" of derived pseudonyms, using a secret key that is more controllable than the time in milliseconds (e.g. a cryptographic salt) and storing these keys separately, as required by the definition of pseudonymisation in art. 4, point 5 of the Regulation (point 6.3);

use the SIM master pseudonym also for automated consistency checks on newly created statistical units (for example, uniqueness and accuracy checks) instead of the traditional "ictu oculi" checks on direct identification attributes (such as name, surname, tax code), which are still widely used and a likely source of errors in the data entry process (point 6.3).

b) pursuant to art. 58, par. 1, lit. a), of the Regulation and of the art. 157 of the Code, to communicate which initiatives have been undertaken in order to implement the provisions of the aforementioned par. 8, and to provide adequately documented feedback in any case, within and no later than 20 days from the expiry of the term indicated above. Failure to respond to a request made pursuant to art. 157 of the Code is punished with an administrative sanction, pursuant to the combined provisions of articles 83, par. 5 of the Regulation and 166 of the Code.

It is also believed that the ancillary sanction of publication on the Guarantor's website of this provision should be applied, provided for by art. 166, paragraph 7 of the Code and art. 16 of the Regulation of the Guarantor n. 1/2019, also in consideration of the type of personal data subject to unlawful processing.

Finally, it should be noted that the conditions set out in art. 17 of Regulation no. 1/2019, concerning internal procedures having external relevance aimed at carrying out the tasks and exercising the powers delegated to the Guarantor, are met.

Pursuant to art. 78 of the Regulation, art. 152 of the Code and art. 10 of Legislative Decree no. 150/2011, an appeal against this provision may be lodged before the ordinary judicial authority, under penalty of inadmissibility, within thirty days from the date of communication of the provision itself, or within sixty days if the appellant resides abroad.

Rome, 8 June 2023

PRESIDENT
Stanzione

THE SPEAKER
Scorza

THE SECRETARY GENERAL
Mattei

[doc. web no. 9921184]

Provision of 8 June 2023

Register of measures
no. 337 of 8 June 2023

THE GUARANTOR FOR THE PROTECTION OF PERSONAL DATA

IN today's meeting, which was attended by prof. Pasquale Stanzione, president, prof.ssa Ginevra Cerrina Feroni, vice president, dr. Agostino Ghiglia and the lawyer Guido Scorza, members, and cons. Fabio Mattei, general secretary;

HAVING REGARD TO Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (hereinafter "Regulation");

HAVING REGARD TO Legislative Decree 30 June 2003, n. 196 containing the "Code regarding the protection of personal data" (hereinafter "Code");

HAVING REGARD TO Legislative Decree 6 September 1989, n. 322, containing the "Rules on the National Statistical System and on the reorganization of the National Institute of Statistics" and in particular, the art. 6-bis of the same decree;

HAVING REGARD TO the "Ethical rules for processing for statistical or scientific research purposes carried out within the National Statistical System", Annex A.4 to the Code (hereinafter "Ethical rules") and in particular, Articles 4-bis and 6, paragraph 2, of the same Ethical rules;

HAVING REGARD TO the provision of 23 January 2020, with which the Guarantor completed the process of authorizing the processing of personal data necessary for the creation of the permanent census (web doc. 9261093);

HAVING REGARD to the documentation in the deeds;

GIVEN the observations made by the Secretary General pursuant to art. 15 of the Regulation of the Guarantor n. 1/2000 on the organization and functioning of the office of the Guarantor for the protection of personal data, in www.gpdp.it doc. web no. 1098801;

SPEAKER the lawyer Guido Scorza

WHEREAS

1. Premise

The Office, pursuant to service order n. XX (note of the XX), carried out inspections on the days XX pursuant to art. 58, par. 1, lit. a), e) and f) of the Regulation and articles 157 and 158 of the Code, as well as articles 21 and 22 of Regulation no. 1/2019 of the Guarantor for the protection of personal data (G.U. n. 106 of 8 May 2019), at the National Institute of Statistics (hereinafter "Istat" or "Institute"), in order to verify compliance with the provisions on the protection of personal data, with specific reference to how the measures set out in Provision n. 10 of 23 January 2020 adopted by the Guarantor (web doc. 9261093) had been implemented.

With this provision, the Authority completed the process of authorizing the processing of personal data necessary for the creation of the permanent census, requiring Istat, among other things, to adopt pseudonymisation techniques suitable for ensuring the effective implementation of the principles of personal data protection, in particular minimization, purpose limitation and storage limitation, in compliance with the obligations of privacy by design and by default (articles 5, par. 1, lit. b), c) and e), and 25 of the Regulation).

In the aforementioned provision, the Guarantor, among other things, highlighted specific critical issues in relation to the attribution of a unique code that identifies the individual in the various Istat databases, and therefore ordered Istat to introduce a mechanism for the hierarchical decoupling of pseudonymised codes in the various databases and for their rotation over time.

This measure makes it possible to achieve a double objective. On the one hand, it makes the application of the aforementioned principles effective, since each pseudonymous code used in an individual survey cannot be directly linked to codes used in other surveys, nor can it have a lifetime longer than the data retention period envisaged for each statistical work. On the other hand, it makes it possible to create integration domains, i.e. the interconnection of multiple tables, in cases where this constitutes the institutional purpose of the survey and is permitted by law.

With the note of the XX (prot. n. XX) and the related technical documentation attached, the Institute presented a multi-year plan for the evolution of the SIM (Microdata Integration System) which, although capable of strengthening the security of the processing, had not fully grasped the need for decoupling and rotation of pseudonymous codes expressed in the provision.

Following numerous informal discussions with the Office of the Guarantor, the Institute, on the XX date, in compliance with the aforementioned provision, sent the Guarantor a new version of the document called "Technological and organizational solutions to achieve full compliance of the Microdata Integration System (SIM)".

While the requirements of the aforementioned provision were considered to have been complied with, it was noted, by note of the XX (prot. no. XX), that the proposed solutions were particularly articulated and complex and, in some passages, not free of potentially critical issues requiring further specific investigation.

The Institute was therefore invited to keep the Authority informed of the solutions that would in practice be adopted, from time to time, to complete the implementation of the program presented. The Authority shared the need to continue, in line with each institution's tasks, the effective collaboration undertaken from the outset, so that compliance with the principles applicable to the processing of personal data in the delicate sector of public statistics is ensured in practice.

With notes of the XX (prot. n. XX) and of the XX (prot. n. XX), Istat provided updates on the progress of the work to implement the changes to the SIM, as prescribed by the Guarantor in the aforementioned provision.

Nonetheless, also taking into account the complexity of the project and the difficulty of representing in writing in a complete and effective manner the activities actually carried out also from a technological point of view, the Office ordered the aforementioned inspection.

2. The inspection activity carried out

2.1. Inspection checks

The inspection took place on XX. During it, the Institute illustrated the activities it had carried out following the aforementioned provision, starting with a presentation, documented on site, of the methods used to classify the sources of personal data acquired by the Institute so as to map all the variables they contain completely and exhaustively.

2.2 Classification of data

Istat has shown that it has implemented the classification of some external administrative sources, e.g. INPS databases, and some internal archives produced in previous statistical works, taking into account the risks for fundamental rights and freedoms related to the processing of the personal data in question.

On this point, it was further specified that the taxonomy of the data used in the distinct classes, which concerns not only the variables but also the statistical units (natural or legal persons), can be defined both ex ante, i.e. exhaustively and applicable to each future situation, that ex post, i.e. variable each time, case by case, and that the Institute is moving towards an ex ante scheme by creating an initial classification of the sources currently applicable to the register of disabilities which will progressively concern all the variables contained in the different registers.

It also emerged that the variables used for the classification are divided into "identification data", such as, for example, location data (residence), contact data (telephone), other identification data (vehicle number plate) or "thematic data" such as, for example, variables related to education/training, income, work, family, lifestyle, and “sensitive thematic data”. In this regard, it was specified that, with reference to thematic variables of a sensitive nature, at the moment there is only a general classification attributable to the particular categories of personal data, pursuant to art. 9 of the Regulation. There is no second hierarchical level but a classification in this sense could also be envisaged in the future (sexual orientation, health and disability). The number of thematic variables is very numerous, the identifying ones are more contained. This classification concerns both external and internal sources of input produced by other statistical works. The classification process is currently manual and this activity has only been carried out on the disability register. About a quarter of external administrative and input sources will be completed by the end of 2022. By 2023, this activity will be completed for 90% of internal and external sources.

The classification of the variables is instrumental not only to the correct application of the aforementioned provision but also to improving the quality of the processing register (art. 30 of the Regulation), the internal organizational measures relating to the authorization profiles, the compilation of the PSN forms and the implementation of security measures, favoring the automation of processes and reducing the margin of error. The classification introduced makes it possible to apply different technical and organizational measures based on the type of data. In particular, identification data may be separated from the rest of the record layout (as envisaged in the definition of pseudonymisation), while for sensitive data it will be possible to apply specific security measures, based on the risk assessment, including encryption.
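As an illustration only, the idea of attaching different measures to different data classes can be sketched as follows. The three classes come from the text above; the variable names, the measure labels and the `split_record` helper are hypothetical and do not describe Istat's actual system.

```python
from enum import Enum


class DataClass(Enum):
    IDENTIFICATION = "identification"
    THEMATIC = "thematic"
    SENSITIVE_THEMATIC = "sensitive_thematic"


# Hypothetical catalogue: each variable is assigned to exactly one class.
VARIABLE_CLASSES = {
    "residence": DataClass.IDENTIFICATION,
    "telephone": DataClass.IDENTIFICATION,
    "vehicle_plate": DataClass.IDENTIFICATION,
    "education": DataClass.THEMATIC,
    "income": DataClass.THEMATIC,
    "disability_status": DataClass.SENSITIVE_THEMATIC,
}

# Hypothetical measure labels; a real mapping would follow a risk assessment.
MEASURES = {
    DataClass.IDENTIFICATION: ["separate_from_record", "restricted_profile"],
    DataClass.THEMATIC: ["standard_access_control"],
    DataClass.SENSITIVE_THEMATIC: ["encrypt_at_rest", "restricted_profile"],
}


def measures_for(variable):
    """Return the measures attached to the class of a classified variable."""
    return MEASURES[VARIABLE_CLASSES[variable]]


def split_record(record):
    """Separate identification data from the rest of the record layout."""
    identifiers, payload = {}, {}
    for name, value in record.items():
        if VARIABLE_CLASSES[name] is DataClass.IDENTIFICATION:
            identifiers[name] = value
        else:
            payload[name] = value
    return identifiers, payload
```

In this sketch an unclassified variable raises a `KeyError`, which reflects the point that measures can only be applied to data that has actually been classified.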

Istat has undertaken to produce a document illustrating which measures it has implemented or intends to implement with respect to the classifications already carried out. In this regard, it should be noted that on XX Istat sent a further document on the progress of work on the SIM (prot. no. 18647288), which however does not contain specific indications regarding the classification of the data.

2.3. Specific integration domain

The creation of specific integration domains constitutes the cornerstone of the measures indicated by the Institute to implement the aforementioned provision of the Guarantor, with particular reference to the pseudonymisation of data. With a note of the XX, the Institute declared that it had "already started, within the system currently in use, the procedures aimed at the creation of the specific integration domains for the processing of personal data relating to statistical works: IST-02742 “Work register”; IST-02634 “Extended Register of Employment in Enterprises (Asia Employment)”; IST-01382 “Annual register of wages, hours and individual labor costs – RACLI”; and IST-02645 “Quantification of populations in territorial areas potentially at risk”".

In this regard, during the inspection, Istat clarified that "specific integration domain" means the place where the data necessary for the pursuit of a specific statistical purpose are held. In essence, it is an extraction of a subset of information from the master table, with individual record layouts, based on specific selection criteria. An integration domain is created through a formal request made by the directors in charge, which specifies elements such as: the conditions of lawfulness of the processing (e.g. PSN), the statistical purpose, the necessary information contents and the deadline within which the statistical purpose must be pursued. Hierarchical pseudonymisation is envisaged to ensure that the integration domains are independent/autonomous of each other.

As a preliminary step, when the data is acquired by Istat, the system provides for the assignment of a universal pseudonym. This pseudonym (SIM) is translated with a cryptographic function into different pseudonyms valid in the different integration domains. A specific unique encryption key is assigned to each specific integration domain (hereinafter DSI), and each domain has a different encryption key. In the current testing phase, the request formulated by individual statisticians (analysts) is sent to the competent central directors who approve it, assessing the pertinence and non-excess of the data requested with respect to the statistical purposes and taking into account the methodologies currently used. In other words, the application of the minimization principle is left, in relation to this processing operation, to the evaluations of the managers and therefore to mere organizational measures.
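The derivation of domain-specific pseudonyms from a universal one can be sketched as below. The text only speaks of "a cryptographic function" and a per-domain key; the use of HMAC-SHA256 here is an assumption made for illustration, as are the function names.

```python
import hashlib
import hmac
import secrets


def new_domain_key() -> bytes:
    """Generate a fresh random key, one per integration domain (DSI)."""
    return secrets.token_bytes(32)


def domain_pseudonym(sim: int, domain_key: bytes) -> str:
    """Derive the pseudonym valid inside one domain from the universal SIM.

    HMAC-SHA256 is an assumed choice: it is keyed and deterministic, and it
    cannot be inverted without the key.
    """
    message = str(sim).encode("utf-8")
    return hmac.new(domain_key, message, hashlib.sha256).hexdigest()


# The same SIM yields unrelated pseudonyms in two domains with distinct keys.
key_a = new_domain_key()
key_b = new_domain_key()
p_a = domain_pseudonym(42, key_a)
p_b = domain_pseudonym(42, key_b)
```

Because each domain has its own key, pseudonyms from two domains cannot be joined directly, which is the independence property that hierarchical pseudonymisation is meant to provide.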

However, it was specified that these organizational measures are accompanied by specific security measures. In fact, the new system envisages detailed "logging" measures aimed at tracing what is done on the systems at any time, also in order to bring out any anomalies (criticalities) such as "singularities" - "uniqueness" of records. In the event of an anomaly, the archive is placed in a "quarantine" condition to allow the analysis of critical issues.

The Institute has acknowledged that at the moment the system, by default, does not carry out any checks on the data extraction criteria. However, it is potentially capable of signaling the presence of critical issues (singularities), but the decision regarding their tolerability is left to the analyst. In this way, the analyst, at present, with the prior authorization of the competent director, can still proceed with the creation of the DSI.

Istat also specified that at the moment no technical measures are envisaged by default aimed at ensuring guarantees in this regard (automations to identify singularities or other critical issues), undertaking to make every assessment known to the Authority.
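One possible form of the automated check for "singularities" that is described as missing is sketched below. It counts how many records share each combination of requested variables and flags those below a threshold. The threshold `k`, the field names and the quarantine rule are assumptions, not features of the system inspected.

```python
from collections import Counter


def find_singularities(records, keys, k=2):
    """Return the records whose combination of `keys` occurs fewer than k times."""
    def combo(record):
        return tuple(record[key] for key in keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] < k]


def should_quarantine(records, keys, k=2):
    """Hypothetical rule: quarantine the extraction if any singularity exists."""
    return bool(find_singularities(records, keys, k))


sample = [
    {"municipality": "A", "birth_year": 1980},
    {"municipality": "A", "birth_year": 1980},
    {"municipality": "B", "birth_year": 1955},
]
unique_rows = find_singularities(sample, ["municipality", "birth_year"])
```

A check of this kind, run by default before a domain is populated, would replace the analyst's discretionary judgment with a technical control.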

2.4. Data pseudonymization

The hierarchical pseudonymisation of the statistical units constitutes the set of technical and organizational measures indicated by the Institute to implement the aforementioned provision of the Authority in order to guarantee that the integration domains are independent/autonomous from each other.

In describing the current system and the developments achieved with the ongoing experimentation, Istat reiterated that when a data subject is entered into the system, he or she is assigned a universal or master pseudonym (SIM). It is currently a progressive numeric code (a sequential number), an aspect that is being changed. About 110 million individual codes have been created so far. Within each DSI, every SIM is transformed into a new unique code, using an encryption key unique to that DSI. The result is unique and cryptographically irreversible. However, the association between the SIM and the code is retained, which makes it possible, if necessary, to link the code back to the original SIM. The Institute is evaluating the possibility of introducing different encryption keys for each pseudonym contained in the same DSI.

With specific reference to the period of validity of the coding of the pseudonyms and of the domain itself, it has been declared that the domain has its own predefined duration, at the end of which it ceases to exist as well as all the pseudonyms that are part of the same domain. The encryption key also ceases to exist at that point.
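The lifecycle described above (a domain with a predefined duration, a retained association between domain code and SIM, and destruction of key and pseudonyms at expiry) could be modelled as in the following sketch. The class name, the in-memory mapping and the use of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import date


class IntegrationDomain:
    """Hypothetical model of a DSI with a predefined duration."""

    def __init__(self, name, expires_on):
        self.name = name
        self.expires_on = expires_on
        self._key = secrets.token_bytes(32)  # key unique to this domain
        self._code_to_sim = {}               # retained association code -> SIM

    def _ensure_active(self, today):
        if self._key is None or today >= self.expires_on:
            raise RuntimeError(f"domain {self.name} has expired")

    def pseudonym_for(self, sim, today):
        """Derive the domain code for a SIM and record the association."""
        self._ensure_active(today)
        code = hmac.new(self._key, str(sim).encode("utf-8"), hashlib.sha256).hexdigest()
        self._code_to_sim[code] = sim
        return code

    def resolve(self, code, today):
        """Link a domain code back to the original SIM, if still permitted."""
        self._ensure_active(today)
        return self._code_to_sim[code]

    def expire_if_due(self, today):
        """At the end of the domain's life, destroy the key and all pseudonyms."""
        if today >= self.expires_on:
            self._key = None
            self._code_to_sim.clear()
```

Note that in this sketch the reversibility comes entirely from the retained mapping, not from the keyed function itself, which matches the description of a result that is cryptographically irreversible yet still linkable to the SIM.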

On XX, during the inspection, access was requested to the IT systems, in particular to the Disability Register, on which the Institute declared that it had implemented, on an experimental basis, the measures referred to in provision no. 10 of 23 January 2020. Specifically, it was asked to verify the entire process, from the moment in which the personal data referring to the statistical unit enter the Istat systems and the master SIM code is generated to the subsequent generation of the new pseudonym code in the DSI. The experimentation of the DSI relating to the Disability Register was carried out using the current SIM platform.

With regard to the indexing of the statistical units, i.e. the phase preliminary to the actual statistical analysis, necessary to ascertain the uniqueness of the records processed, it was specified that the platform calculates a hash code obtained exclusively from the variables belonging to the class of identification data (cf. Minutes of the XX). Because only a small portion of the data is considered in the calculation of the hash, this operation currently makes it possible to verify the desired uniqueness, provided that such data are themselves effectively unique already at the collection phase.

Given these premises, the Institute confirmed that in the event of a security incident, even one affecting a single statistical unit, the database would be completely compromised once the syntax of the data is known. It added that, subject to verifying the economic sustainability of the measure, the future project could consider further security measures aimed at avoiding this risk, and that the introduction of a different salt for each statistical unit is among the options currently being considered.
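The difference between a hash computed with one shared salt and a hash computed with a different salt for each statistical unit can be illustrated as follows. The field names, the canonical encoding and the choice of SHA-256 are assumptions; the text does not state which hash function or data syntax the platform uses.

```python
import hashlib
import secrets

# Hypothetical identification variables entering the hash.
ID_FIELDS = ("tax_code", "residence")


def canonical(record) -> bytes:
    """Serialise the identification variables in a fixed, known syntax."""
    return "|".join(str(record[f]).strip().upper() for f in ID_FIELDS).encode("utf-8")


def hash_shared_salt(record, salt: bytes) -> str:
    """One salt for the whole database: equal inputs give equal digests."""
    return hashlib.sha256(salt + canonical(record)).hexdigest()


def hash_per_unit_salt(record):
    """A fresh random salt per statistical unit, stored next to the digest."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + canonical(record)).hexdigest()


unit = {"tax_code": "abc123", "residence": "Roma"}
shared = secrets.token_bytes(16)
same_1 = hash_shared_salt(unit, shared)
same_2 = hash_shared_salt(unit, shared)
salt_1, digest_1 = hash_per_unit_salt(unit)
salt_2, digest_2 = hash_per_unit_salt(unit)
```

With a shared salt, anyone who learns the salt and the data syntax can recompute digests for candidate identities across the whole database. With per-unit salts each digest must be attacked separately, but equal inputs no longer produce equal digests, so a uniqueness check by simple digest comparison would need a different design.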

It was specified that the attribution of the hash does not coincide with the attribution of the master SIM code or of the pseudonym in the DSI. It has a purely operational value, since the hash is intended only for consistency checks, i.e. to confirm the accuracy of the data entered by comparing them with statistical units already present in the databases that will make up the integration domain. For the disability register, the Institute has decided not to process new statistical units that are not yet registered. Therefore, if the consistency check carried out via hash shows that a particular data subject has never been registered, he or she will not be entered in the register. In practice, however, the hash is not used for this consistency check, as the presence of the statistical unit is verified in a "traditional" way through the comparison of direct identifiers. Therefore the hash (as part of the experimentation concerning the Disability Register - Ist-02748) is currently attributed only for a possible verification of the uniqueness of the records processed, for the benefit of data quality.

The Institute has represented that, in relation to the Disability Register, the identification data "PD_ID_DS" are separated from the sensitive thematic data "PD_Tem_DS". The encryption measure called "Transparent_data_encryption" (with AES_256 protocol with salt) is applied to the sensitive data contained in this domain, which makes the data unintelligible to anyone who is not authorized to access it. In this regard, it has been specified that sensitive and identifying data have not been physically separated, but only on a logical level, and that they are placed on the same "machine" which can be accessed via two password levels associated with two different users.

The assignment of the master code (SIM) takes place through a verification procedure, known as record linkage, with respect to the presence of a pre-existing progressive code for the statistical units already registered in the Register. For unregistered units, even if the procedure would allow for the generation of a new progressive code, this generation is not provided for.

In this regard, it was represented by the Institute that in an experimental phase a single DSI was created concerning the Disability Register, which currently contains the following databases (Individual Base Register, INPS Archive of disability certifications, Register of beneficiaries of pension treatments related to disability).

The fact that only one experimental integration domain was created made it impossible to view, during the inspection, the procedure which should allow for the verification that two different pseudonyms contained in two different DSIs possibly refer to the same statistical unit.

2.5. Further documentation sent

With a note of the XX (prot. n. XX), following the aforementioned inspection, the Institute sent further documentation aimed at updating this Authority on the progress of the works on the SIM referring to the period July - October 2022.

Preliminarily, Istat reiterated that it would respect the commitment made "to provide for the adoption by 31/12/2022 of adequate pseudonymisation measures in compliance with provision no. 10 of 23/01/2020 for the statistical works IST-02742 "Work register"; IST-02634 “Extended Register of Employment in Enterprises (Asia Employment)”; IST-01382 “Annual register of wages, hours and individual labor costs – RACLI”; IST-02645 "Quantification of populations in territorial areas potentially at risk", believing "to be able in a short time to give concreteness and effectiveness, for the aforementioned statistical works, to the new pseudonymisation measures outlined in the aforementioned document, at least for what concerns the following aspects:

a. Classification of data, contained in administrative sources acquired from outside or in the products of other statistical works, foreseen as input for the statistical purposes of the aforementioned works.

b. Population of the primary integration domains according to data classification, through migration procedures of data already present in the current system to the new, suitably structured schemes.

c. Release of a user interface for managing data requests from users. The use of this interface must produce: i) a pdf document to be signed by the competent director/s for the necessary authorisations; ii) the metadata documenting the entire set of information contents to be inserted in the specific integration domain and which allow the automatic management of data release in the domain.

d. Release of a procedure which, following the request and the related authorizations, can automatically proceed with the release of the data requested in the specific integration domain".

In this regard, the Institute also stated that:

in carrying out the above, it will duly consider the "observations that emerged during the inspection regarding the robustness and coherence of the cryptographic codings adopted for the reunification of thematic data with universal pseudonyms between the various primary domains (point b) and for the generation of domain pseudonyms starting from the universal ones with the requirement of hierarchical reversibility (point d)”;

"on the basis of the observations that emerged during the inspection, the list of functional requirements for the final version was integrated in order to perfect the privacy by default approach in the management phase of the request for information content to be included in the specific domain: i ) pre-definition of transformations of variables that reduce the identification potential of the original variables (e.g. "year of birth" instead of "date of birth"); ii) identification of methods for quantifying the "singularities" in the requested data and for recognizing the variables most responsible for these singularities".

Istat has also attached to the note referred to above the updated timetable on the progress of the works and a document called "XX".
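The first of the functional requirements quoted above, the pre-definition of transformations that reduce the identification potential of a variable, can be sketched as follows. Only the "year of birth" instead of "date of birth" example comes from the text; the table of transformations and the function name are hypothetical.

```python
from datetime import date

# Hypothetical table: original variable -> (released name, transformation).
TRANSFORMS = {
    "date_of_birth": ("year_of_birth", lambda d: d.year),
}


def generalise(record):
    """Apply the predefined transformations before data enter a domain."""
    released = {}
    for name, value in record.items():
        if name in TRANSFORMS:
            new_name, transform = TRANSFORMS[name]
            released[new_name] = transform(value)
        else:
            released[name] = value
    return released


released = generalise({"date_of_birth": date(1980, 5, 17), "income": 10})
```

Applying such transformations by default, rather than on request, is one way of building data minimisation into the request workflow itself.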

3. The alleged violations

On the basis of the elements acquired in the context of the preliminary investigation and the inspection activity carried out, the Office - with an act of the XX (prot. n. XX), which must be understood as reproduced in full here - initiated, pursuant to art. 166, paragraph 5 of the Code, with reference to the specific situations of illegality referred to therein, a procedure for the adoption of the provisions pursuant to art. 58, par. 2 of the Regulation, against Istat, inviting it to produce defensive writings or documents to the Guarantor or to ask to be heard by the Authority (art. 166, paragraphs 6 and 7, of the Code, as well as art. 18, paragraph 1, law n. 689 of 24 November 1981).

In particular, with the aforementioned deed, the Office notified the Institute that, on the basis of the elements acquired and the facts that emerged following the inspection and preliminary investigation, as well as the subsequent assessments, the processing of personal data carried out within the DSI relating to the Disability Register - Ist-02748 was found to have been carried out in violation of articles 5, par. 2, 24, 25 and 32 of the Regulation, and that Istat had put in place, from both a qualitative and quantitative point of view, a series of measures falling far short of those described in the Document, whose time schedule "generally" indicates the month of December 2022 as the deadline for concluding “the redesign of the SIM system”, thereby placing itself in a position where it would probably violate the provisions formulated by the Guarantor in provision no. 10 of 23 January 2020.

More specifically, the following was found in relation to each of the disputed violations.

3.1. Data classification

Preliminarily, it was clarified that data classification is necessary to associate each class of data (e.g. identification, common and particular data) with a risk level; therefore, the more detailed the classification, the more precise the definition of risk can be. On the basis of the classification (which, according to the documentation transmitted, is quite detailed), the Institute could in fact implement differentiated security measures, such as the use of particular cryptographic techniques or the adoption of specific authorization profiles. This activity is particularly necessary in the presence of a large amount of data, such as that processed by the Institute. Each statistical unit is in fact characterized by a significant number of variables, some of an extremely sensitive nature, and only a complete classification can guarantee effective protection, obtained through technical and organizational measures designed to be progressively more effective as the level of risk to which the data subjects are exposed increases.

In this regard, it should also be noted that the Institute itself, in the aforementioned note of the XX, in particular in the document "Technological and organizational solutions to achieve full compliance of the Microdata Integration System (SIM)", update of 4 February 2021, in describing the activities carried out to implement the aforementioned provision of the Guarantor of 23 January 2020, declared that:

“The main novelties of the project concern the introduction of the data life cycle as an element characterizing the entire solution and the adoption of additional security measures at the ecosystem level identified in the risk analysis on statistical processes with a high impact in relation to the processing of personal data. These measures are characterized by active masking of data, dynamic protection against access operations and crossing of identifiers in incompatible domains of use, automatic classification of metadata right from the early stages of the acquisition process and the introduction of distributed consensus techniques to complete the standard security measures already adopted”:

“The acquisition phase is based on inferential rules and is able to select the right treatment for each type of microdata and for each family of metadata. It will be possible to proceed with the protection of personal information already in the data acquisition phase by defining a first pseudonymisation domain and, within this context, the treatment specifications for each family of metadata.

The control of any anomalies will take place in real time, signaling any violations of constraints through specific application interfaces. The domain classification carried out starting from the content in terms of metadata of the source, will allow the updating of the catalog of Sources and in particular of the relative compatibility of sets of metadata due to the possible presence of singularities which will be documented and traceable”.

On the merits, following the inspection, it was found, firstly, that the classification activity concerned only the statistical work on the Disability Register - Ist-02748 and, secondly, that at present it is not based on a precise assessment of the risks related to the processing carried out and is consequently not backed by specific measures aimed at effectively implementing the principles of data protection in relation to the risks identified, in violation of the obligations of privacy by design and by default (art. 25 of the Regulation). In other words, as the activities currently stand, the classification of the sources appears to serve specific statistical needs rather than to be a preliminary step towards identifying appropriate measures with respect to the risks related to the processing of certain types of data.