Electronic Health Records for clinical research

Professor Dr Dipak Kalra speaks to The Innovation Platform about the latest developments in Electronic Health Records data research.

The European Institute for Innovation through Health Data (i~HD) was formed as one of the key sustainable entities arising from the Electronic Health Records for Clinical Research (EHR4CR) and SemanticHealthNet projects, in collaboration with several other European projects and initiatives supported by the European Commission. The vision of i~HD is to become the European organisation of reference for guiding and catalysing the best, most efficient and trustworthy uses of health data and interoperability, for optimising health and knowledge discovery.

i~HD has been established in recognition that there is a need to tackle areas of challenge in the successful scaling up of innovations that critically rely on high-quality and interoperable health data, to sustain and propagate the results of health ICT research, and to specifically address obstacles to using health data that are not being addressed by other current initiatives.

The Innovation Platform speaks to i~HD President Prof Dr Dipak Kalra about the challenges and differing viewpoints surrounding data protection when applied to the reuse of EHRs for research.

What are the latest developments in reusing electronic health records data for research?

Electronic health records have been used for research for over 30 years, beginning as soon as they started to contain useful clinical (non-administrative) information. However, they were usually analysed as standalone repositories to conduct specific research studies, or data sets were extracted to be pooled with similar site extracts for larger-scale population studies. Electronic health records data has also for many years been extracted to populate disease registries, which have in turn been used for research and for healthcare quality improvement, epidemiology and public health. There are examples of very large longitudinal data collections derived from electronic health records, the most notable being the Clinical Practice Research Datalink in the UK.

The attractiveness of reusing electronic health records has grown as the proportion of data captured in a structured and coded form has increased, and as the use of data standards (such as interoperability standards) has made large-scale and sophisticated clinical research more feasible. The innovations in this field have progressively shifted from harnessing single repositories to constructing networks of clinical data repositories that can be analysed in a combined way to yield massive population numbers: ‘Big Data’ for health. Many countries (including Germany, France, Switzerland, and the UK) are investing in research infrastructure programmes to connect data from multiple healthcare organisations and so create national or regional scale data resources. These are being targeted at publicly funded research and at industry, such as the pharma and AI sectors.

However, there is also important innovation taking place at a European level. A major investor in scaling up European Big Data capability has been the ‘Innovative Medicines Initiative’ (IMI). This is a public-private Joint Undertaking between the European Commission (via its Horizon 2020 programme) and the European Federation of Pharmaceutical Industries and Associations (EFPIA). There are two IMI projects in particular that should be highlighted from a Big Data perspective. The European Medical Informatics Framework (EMIF) project ran from 2013 to 2018, and developed an architecture, an ICT platform, data harmonisation and data discovery tools, and a governance framework (a Code of Practice) to establish a federated network of data sources across Europe, to support distributed querying and analysis.1 This has been followed by a new deployment and scale-up project called the European Health Data Evidence Network (EHDEN), which builds on the EMIF legacy in collaboration with the global Observational Health Data Sciences and Informatics (OHDSI) community. A landmark Big Data study conducted by this community has recently been published in The Lancet.2 This combination of European and national federated data initiatives is transforming how we all perceive the opportunities and possibilities for conducting Big Health Data research, without having to re-engineer the source systems in every hospital and general practice.

Over the past 20 years there has been parallel interest in reusing electronic health records to help fine-tune clinical trial protocols and then to recruit suitable patients, using the electronic health records data to find eligible cases. Apart from individual hospitals that have achieved this by building a clinical trial platform connected to their own electronic health records system, there has been notable and unique innovation in this area pioneered by the EHR4CR project, which is discussed below.

Can you tell us about EHR4CR?

The EHR4CR project (2011-2016) was a first-of-its-kind IMI project to design an approach that leverages, in a publicly acceptable and viable way, the profiles of patients within their electronic health records to optimise the design and conduct of clinical trials. As with many IMI projects, the consortium comprised a mixture of academic centres, healthcare provider organisations (in this case, hospitals), other not-for-profit organisations, and pharma companies that are part of EFPIA. This multi-stakeholder consortium was complemented by patient, ethics and regulatory representatives through advisory boards.

The mission of EHR4CR was to develop a method to remotely query multiple hospital electronic health records systems in order to estimate more accurately than current practice the likely numbers of eligible patients at each connected hospital. During the project, after a careful requirements analysis and surveys with members of the public and data protection experts, a robust and scalable platform was developed that uses de-identified data from hospital electronic health records systems, in full compliance with the ethical, regulatory and data protection policies and requirements of each participating country.3

The platform connects securely to a dedicated data warehouse held at (and under the control of) each participating hospital. This design enables the connection of multiple hospital electronic health records systems and clinical data warehouses across Europe.4 Through remote queries constructed by clinical protocol designers (e.g. within a pharma company), the federated platform enables them to estimate with good accuracy the number of eligible patients for a candidate clinical trial protocol, to assess its feasibility, to fine-tune it if necessary, and then to locate the most suitable hospital sites with the highest patient numbers.5 This solution is compliant with the EU General Data Protection Regulation and respects the position of hospitals and patients.
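The federated pattern described above can be illustrated with a minimal Python sketch. It is not the EHR4CR or InSite implementation: the names `HospitalSite`, `count_eligible` and `federated_feasibility`, the record fields and the sample data are all hypothetical. What it aims to show is that the eligibility criteria travel to each site, the de-identified records stay in the site's own warehouse, and only aggregate counts come back to the protocol designer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PatientRecord:
    """A de-identified record in a site's local warehouse (hypothetical fields)."""
    age: int
    diagnoses: frozenset


# An eligibility criterion is any yes/no test on a single record.
Criterion = Callable[[PatientRecord], bool]


class HospitalSite:
    """Stands in for one hospital; its records never leave this object."""

    def __init__(self, name: str, records: List[PatientRecord]) -> None:
        self.name = name
        self._records = records

    def count_eligible(self, criteria: List[Criterion]) -> int:
        # Evaluate the query locally and return only an aggregate count.
        return sum(1 for r in self._records if all(c(r) for c in criteria))


def federated_feasibility(sites: List[HospitalSite],
                          criteria: List[Criterion]) -> Dict[str, int]:
    """Send the same criteria to every site and rank sites by eligible count."""
    counts = {site.name: site.count_eligible(criteria) for site in sites}
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))


# Illustrative protocol: adults aged 40 to 75 with a hypertension diagnosis.
criteria = [
    lambda r: 40 <= r.age <= 75,
    lambda r: "hypertension" in r.diagnoses,
]
sites = [
    HospitalSite("Site A", [
        PatientRecord(52, frozenset({"hypertension"})),
        PatientRecord(30, frozenset({"hypertension"})),
    ]),
    HospitalSite("Site B", [
        PatientRecord(61, frozenset({"hypertension", "diabetes"})),
        PatientRecord(70, frozenset({"hypertension"})),
        PatientRecord(66, frozenset({"asthma"})),
    ]),
]
result = federated_feasibility(sites, criteria)
print(result)
```

Tightening or loosening a criterion and re-running the query is the sketch's equivalent of fine-tuning a protocol before choosing sites.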

An in-depth business modelling work plan established the value of this solution to the pharma industry, including a Cost Benefit Assessment and a Business Impact Assessment.6,7 This led to a commercialisation strategy that has enabled the success across Europe of the InSite Platform, which is rapidly growing a network of connected hospital sites.

EHR4CR has shown that this platform can significantly improve the efficiency of designing and conducting clinical trials, reducing time and costs by optimising protocol feasibility assessment, accelerating patient recruitment and enabling the participation of European hospitals in more clinical trials. The IMI programme created the success conditions for this innovation, by bringing together a wide-ranging consortium and ensuring there was sufficient budget and time to go from requirements to a commercially viable product.

As a follow-on, some members of that consortium have built on this success by extending the functionality of the platform further, to enable the reuse of patient-specific electronic health records data within clinical trials. The EHR2EDC project (2017-2019), co-sponsored by EIT Health and by the participating companies, has designed and implemented this extension to the InSite Platform: after a patient gives consent to be in a study, a core data set on that patient is exported from the hospital electronic health records system via InSite to the clinical trial investigator's system. In most cases today, this transfer of background medical data is undertaken manually, which is time-consuming and error-prone (and therefore also costly). An evaluation currently in progress, to be published during 2020, is expected to show the relevance and accuracy of this electronically exported data, to make the case for business value to pharma and to clinical investigational sites. As with EHR4CR, GDPR compliance has been a critical acceptance factor to hospitals and patients.
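The consent-gated export step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EHR2EDC or InSite code: the function `export_core_dataset`, the exception `ConsentRequiredError`, the field names and the sample record are all hypothetical. It shows two ideas from the paragraph above: nothing is exported without consent, and only the agreed core data set leaves the hospital system.

```python
from typing import Any, Dict, Iterable


class ConsentRequiredError(Exception):
    """Raised when an export is attempted without recorded patient consent."""


def export_core_dataset(ehr_record: Dict[str, Any],
                        core_fields: Iterable[str],
                        consent_given: bool) -> Dict[str, Any]:
    """Return only the core data set fields, and only if consent is recorded."""
    if not consent_given:
        raise ConsentRequiredError("patient has not consented to this study")
    # Fields outside the core data set are never copied to the trial system.
    return {f: ehr_record[f] for f in core_fields if f in ehr_record}


# Hypothetical hospital record and core data set definition.
ehr_record = {
    "year_of_birth": 1958,
    "sex": "F",
    "systolic_bp": 148,
    "home_address": "not needed by the trial",
}
core_fields = ["year_of_birth", "sex", "systolic_bp"]

exported = export_core_dataset(ehr_record, core_fields, consent_given=True)
print(exported)
```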

What is the work and role of i~HD?

It was recognised during EHR4CR that there are considerable opportunities to scale up the reuse of electronic health records data for research, and also for healthcare quality improvement. It was also recognised that these opportunities will require multi-stakeholder alignment, the building of trust between stakeholders, and a strong governance system to ensure that good practices (including data protection and ensuring that data reuse is for societally acceptable purposes) are developed and promoted. It was felt that a new, neutral, not-for-profit body would be the best kind of organisation to engage multiple stakeholders in such a trusted environment. The European Institute for Innovation through Health Data (i~HD) was created in 2016 with the mission to foster and to catalyse the best uses of health data for research and for care, and to accelerate the Learning Health System within Europe.

i~HD is a multi-stakeholder organisation and works to develop good practices and methods that break down present-day barriers to scaling up the trustworthy uses of health data. Some of its priorities have been guiding compliance with the GDPR when reusing real world data for research, promoting interoperability standards, promoting the importance of data quality and of assessing it, and helping to convey to the public the value of health data being reused for research.

Moreover, i~HD has built upon codes of practice and standard operating rules that were developed during EHR4CR and EMIF, which were led by its staff. It is tracking the emerging advice and legislation from European Member States about the safeguards they would find acceptable on topics such as the legal basis for reusing health data and pseudonymisation. It has developed good practices in data sharing and contributed to publications on this topic.8 It runs tutorials on the impact of, and compliance with, the GDPR when conducting research, and participates in several Horizon 2020 and IMI projects as a data protection and GDPR compliance expert partner. It also performs system testing on research platforms and hospital electronic health records systems, in partnership with another institute, EuroRec, to verify the security, data protection and data provenance features of such systems.

i~HD staff have played leading roles in the development of ISO interoperability standards for electronic health records information and interact with other standards bodies on semantic interoperability and information security. i~HD has been a partner in Horizon 2020 projects that promote the importance of interoperable patient summaries and the forthcoming International Patient Summary standard. On data quality, it has formed a strategic partnership with the University of Valencia to formalise a data quality assessment methodology suitable for hospital electronic health records. i~HD has begun projects that will perform these assessments in selected clinical conditions, to assess the fitness of the data for health outcomes assessment and for research.

Through its multi-stakeholder events, i~HD has stimulated better awareness of the importance of establishing the reliability of real world evidence derived from real world data. It has also highlighted the importance of engaging patients and the public in better understanding how and why data can be reused, and of improving our understanding of the conditions under which society finds this acceptable. One example initiative, in which i~HD partners with the European Patients’ Forum, is ‘Data Saves Lives’.

In terms of patient data and privacy, do you think enough is currently being done in this area?

There remain several areas of uncertainty and differing viewpoints when it comes to data protection as applied to the reuse of electronic health records for research. Firstly, most kinds of data reuse do not rely upon consent as the legal basis for the reuse of electronic health records. This is because it would not be practical to try to collect fully informed consent from every European citizen for the uses of their data, in part because it is not possible to predict the kinds of clinical research questions that the future might need us to answer from the data. It is therefore not possible to provide detailed information about the possible uses of the data when seeking consent.

Secondly, it would not be practical to offer each citizen the ability to select which research studies, topics or organisations they would be willing to permit their data to be used for, since the numbers of these are too great and will keep changing. Thirdly, it would not be practical to offer hundreds of millions of people the individual ability to change their minds (as required by the GDPR) about their data reuse, potentially at any moment.

For these reasons, Big Data research networks and repositories are more reliant on other GDPR-defined legal bases for data reuse, and often utilise pseudonymised or anonymised data. This is, however, also not straightforward, since there is as yet no single pan-European standard for acceptable anonymisation that would place a data set outside the scope of the GDPR. The GDPR regards pseudonymised data as still being personal data and inside its scope. Pseudonymised data may be reused for research, but under suitable safeguards which are still being determined, mainly at a Member State level. The appropriate legal bases for research reuse of data are themselves to some extent also the subject of Member State decisions, creating temporary uncertainty. Researchers are therefore at times unsure of their ground when it comes to making use of EHR data, and ethics committees and data protection officers within healthcare provider organisations are (anecdotally) taking conservative decisions when it comes to approving requests to reuse EHR data.
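Why pseudonymised data remains personal data can be seen in a small Python sketch. It assumes keyed hashing (HMAC-SHA256) as the pseudonymisation technique, which is one common approach and not something the interview specifies; the function name `pseudonymise`, the key and the identifiers are hypothetical. It is not a statement of what any Member State regards as a sufficient safeguard.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (one possible technique)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Hypothetical key, assumed to be held only by a trusted party.
key = b"held-by-a-trusted-party-only"

p1 = pseudonymise("PATIENT-0001", key)
p2 = pseudonymise("PATIENT-0001", key)
p3 = pseudonymise("PATIENT-0002", key)

# The same patient always maps to the same pseudonym, so records can be linked
# over time, and whoever holds the key can recompute the mapping and re-identify.
print(p1 == p2, p1 == p3)
```

Because the mapping is repeatable for anyone holding the key, the data subject is still indirectly identifiable, which is consistent with the GDPR position described above.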

In parallel, efforts are being made to engage patient communities and society at large in dialogue about health data use. Many surveys are being undertaken to gain better insights into societal preferences. Many pilots are in progress across Europe to experiment with scalable ways of capturing personal preferences and implementing these (for example using technologies like blockchain). These attempt to demonstrate a more personalised way of adhering to patient preferences and permissions during data analysis or sharing. We are still a long way from scalable solutions, and it is important in the meantime that neutral and trustworthy bodies oversee this societal multi-stakeholder engagement so that the eventual solutions are well accepted by all.


  1. Lovestone, S., and the EMIF Consortium. 2019. The European medical information framework: A novel ecosystem for sharing healthcare data across Europe. Learn Health Sys 2019;e10214. https://doi.org/10.1002/lrh2.10214
  2. Suchard, M., Schuemie, M., and Krumholz, H. et al. 2019. Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. Lancet 394:1816-1826 2019. https://doi.org/10.1016/S0140-6736(19)32317-7
  3. Coorevits, P., Sundgren, M., and Klein, G.O. et al. 2013. Electronic health records: new opportunities for clinical research. J Intern Med 19 Aug 2013. https://doi.org/10.1111/joim.12119
  4. De Moor, G., Sundgren, M., and Kalra, D. et al. 2015. Using electronic health records for clinical research: The case of the EHR4CR project. J Biomed Inform 53:162-173 Feb 2015
  5. Claerhout, B., Kalra, D., and Mueller, C. et al. 2019. Federated electronic health records research technology to support clinical trial protocol optimization: Evidence from EHR4CR and the InSite platform. J Biomed Inform 90 2019. https://doi.org/10.1016/j.jbi.2018.12.004
  6. Beresniak, A., Schmidt, A., and Proeve, J. et al. 2016. Cost-benefit assessment of using electronic health records data for clinical research versus current practices: Contribution of the Electronic Health Records for Clinical Research (EHR4CR) European Project. Contemp Clin Trials 46:85-91 Jan 2016
  7. Dupont, D., Beresniak, A., and Sundgren, M. et al. 2017. Business analysis for a sustainable, multi-stakeholder ecosystem for leveraging the Electronic Health Records for Clinical Research (EHR4CR) platform in Europe. Int J Med Inform 97:341-352 2017. https://doi.org/10.1016/j.ijmedinf.2016.11.003
  8. Ohmann, C., Banzi, R., and Canham, S. et al. 2017. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 7(12):e018647 14 Dec 2017. https://doi.org/10.1136/bmjopen-2017-018647. PubMed PMID: 29247106; PubMed Central PMCID: PMC5736032

Prof Dr Dipak Kalra
The European Institute for Innovation through Health Data (i~HD)

Please note, this article will also appear in the first edition of our new quarterly publication.
