The Covid-19 pandemic has brought the role of clinical data registries to the fore once again. Such registries use observational study methods to collect data about the treatment, outcomes, and quality of life of patients over time. In a health emergency, such real-world data may be the only speedy way to provide information about treatments and to uncover new, unexpected effects of the disease, such as the development of diabetes. However, even though Covid-19 registries started with a huge advantage – the WHO issued a core template for everyone to follow – this was not enough to allow firm conclusions to be drawn about treatment efficacy, for which randomised controlled trials are necessary.
In oncology, registries aggregating large datasets have long played an important role in the evaluation of outcomes for cancer patients. But the usefulness of registry data in assessing treatments is seriously limited by a lack of standardisation, not only internationally but also at national level.
Perhaps the first point to make is that the real-world data obtained by registries will never replace randomised controlled trials (RCTs), either in the evaluation of new treatments or in assessing their effectiveness in real life; pragmatic RCTs can, after all, be implemented in real-life settings. It is not a question of whether one is better than the other; rather, the two are complementary. RCTs produce results in a controlled environment and are still most often concerned with bringing drugs to market, though EORTC carries out trials that, for example, randomise patients between existing therapies in order to ascertain which gives a better outcome.
Registries allow access to the kind of large-scale data that would be impossible to obtain through trials. Trials, however, collect data in depth, whereas registries collect it in breadth. The large, complex datasets contained in registries are often disorganised and unsystematic, as well as being difficult to access. And there can be reluctance to commit data to such a large repository. The problems are thus human as well as political and technical.
All registries would agree that achieving common data standards will be very challenging, but you could say the same about clinical trials. If you want to put two EORTC trials together in one database, that also requires effort. We have made progress, and there are now datasets that we can put together, but standardisation remains a difficult task. And collecting data for its own sake is not only expensive and time-consuming but may also lead to no real result. Something clearly needs to be done if we are to optimise the use of registries in investigating treatment outcomes on an international scale.
This could happen at European level if policymakers were sufficiently aware of the problem. At EORTC we hope that the creation of the European Data Space, one of the Commission’s priorities for 2019 – 2025, may help. The Data Space will promote better exchange of, and access to, different types of health data, including data from registries, in order to support not just healthcare delivery, but also health research and policymaking. But if registry data continue to be collected in individual member states in such disparate ways, better exchange and access will count for nothing.
A recent EORTC workshop tried to come up with some possible solutions. For a start, the sheer number of registries needs tackling. For the same disease, there are often several different kinds of datasets, including those set up by companies, which may focus on just one product or class of products. While using existing data sources such as registries has clear benefits – there is no need to start again from square one, and they are therefore very much cheaper to set up and run – the downside is a reduction in the ability to clean, annotate, and verify the data that are included.
The limitations of most existing registries include their tendency to reflect individual doctors’ therapeutic practices, and this means that no regulatory body can take a decision based solely on registry data. Running small, pragmatic trials in real life is quicker and easier, and provides more robust results. Another problem with registry data is estimating how many patients need to be enrolled, and for how long, so as to come up with a worthwhile result. A registry-based trial may appear to be cheaper than a traditional RCT, but what if it needs to run for 20 years to come up with a result? All this is not to say that registries, with their vast collections of data spanning many years, are not valuable research tools. The question is how to put them to their best use.
EORTC currently has no registries. These are, at the moment, organised by national or regional health bodies, whereas we work internationally. In addition, our database technology is set up to manage regulatory-quality randomised clinical trials. Using that technology for less exacting undertakings would be wasteful and inefficient. We could perhaps set up a consortium of registries to work with us. But is this a realistic possibility, and how would we go about it? Without future partnering of this kind, cohort studies will continue to be expensive, but is there sufficient interest in going further with registry collaboration, given all the inherent problems?
An alternative could be a federated data model, where the data stay in situ (e.g. in a hospital). The question to be answered would be managed centrally, and the answers provided by individual members of the network. These models are gaining ground because they are easier to manage in terms of data acquisition permission, while still remaining close to real-world data. The price is a reduction in control over the semantics of the scientific question, as the query is resolved by a local, on-site IT solution, where interpretation may diverge.
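The federated idea can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: the `Query`, `Site`, and `run_federated` names, the record fields, and the example data are assumptions, not a description of any existing network. The question is defined centrally, each site resolves it against its own records, and only aggregate counts leave the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A centrally defined question (hypothetical): response rate on one treatment."""
    treatment: str


class Site:
    """A hypothetical network member, e.g. a hospital; its records stay local."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # patient-level data, never returned to the caller

    def answer(self, query):
        # The query is resolved on site; only aggregate counts are returned.
        treated = [r for r in self._records if r["treatment"] == query.treatment]
        return {"n": len(treated), "responders": sum(r["responded"] for r in treated)}


def run_federated(query, sites):
    """Send the same query to every site and combine the aggregate answers."""
    answers = [site.answer(query) for site in sites]
    n = sum(a["n"] for a in answers)
    responders = sum(a["responders"] for a in answers)
    return {"n": n, "responders": responders, "rate": responders / n if n else None}


# Invented example data for two sites.
site_a = Site("A", [
    {"treatment": "drug_x", "responded": True},
    {"treatment": "drug_x", "responded": False},
    {"treatment": "drug_y", "responded": True},
])
site_b = Site("B", [
    {"treatment": "drug_x", "responded": True},
    {"treatment": "drug_x", "responded": True},
])

result = run_federated(Query("drug_x"), [site_a, site_b])
print(result)
```

The loss of semantic control described above shows up in `Site.answer`: each site decides locally what counts as "responded", so two sites may answer the same query under slightly different definitions.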
At EORTC we know that registries are a valuable repository of information for research and for optimising patient treatment and care. But if they are to provide the best information, they need help in standardisation. We cannot expect them to do this all on their own, and thus we look elsewhere for the necessary support and guidance.
Although registries and cohort studies are methods that allow us to reach patients and data outside clinical trials for monitoring, describing, and understanding many aspects of health care, the causality aspect will require benchmarking. We need to introduce a note of caution here – observed effects tend to be lower with routinely collected data due to variability and noise. Today, we are still in the learning process when considering the use of all these complementary approaches, including how best to plug trials into cohorts. This will be fine-tuned over time as technologies for access to data become more efficient. But the general principle will always remain: the conclusions that one can reach from an experiment addressing a specific question will depend on the selected design and the selected data points.