Supercharging Scientists in Digital Pathology – The Critical Role of Infrastructure
In this article, we provide an assessment of a typical digital pathology workflow with respect to scientist productivity and the impact of IT infrastructure.
Keep reading to learn more about:
- How the handling and manipulation of large image files can be made quicker.
- Faster data generation from image analysis.
- Why seamless, secure remote access and dedicated computing resources are so important to scientists working in digital pathology.
- Strategies for optimizing digital pathology workflows, including the selection of computational resources, multitasking capabilities, and collaborative tools.
- An introduction to QDPconnect by Sciento, a managed cloud-based solution designed to improve the productivity of digital pathology scientists.
- and much more…
Maintaining Scientist Productivity
Within the field of pathology, the transition from traditional glass slide reading to fully digitised images is in full swing, cascading on to the use of AI-powered analytical software to aid the interrogation of electronic images.
The highly interactive nature of digital pathology makes it particularly susceptible to any deficiencies in the underlying IT systems that support the digital workflow. Images are designed to be viewed, judgments made rapidly by a trained eye, and algorithms trained and retrained using specific regions of interest chosen in real-time by the scientist. When the underlying IT infrastructure fails to perform at the same rate as the scientist, productivity can wane, leading to slow data generation and frustration for the individual as well as the wider project team.
Identifying and alleviating pinch-points in a digital pathology workflow
An image analysis scientist will typically follow an established digital pathology workflow on a day-to-day basis. Evaluating each sequential task in a typical workflow reveals where an optimal IT infrastructure, and its related support mechanisms, can deliver gains in scientist productivity.
In this article we identify some of the rate-limiting steps in a standard quantitative digital pathology workflow and show how applying specific IT approaches, in particular cloud computing, can help to remove bottlenecks.
Provisioning a Suitable Desktop Interface
Simple remote access:
- Secure and easy remote access from the office or home to desktops containing image analysis software is a fundamental component of a quantitative digital pathology platform. The use of Virtual Private Networks (VPNs) to provide encrypted connections is paramount, together with firewalls, secure networks, and password protection. The use of screen viewers to log on to remote static workstations should be avoided due to latency and screen resolution/size issues.
Individual desktops:
- Processing images on a workstation can be computationally intensive, which can impact other users on a shared system; the system can become unresponsive and software can crash. The solution is to provision dedicated “Virtual Machine” (VM) desktops to individuals, each working independently of one another. The scale of cloud computing means that VMs can be made readily available to every user and can also be stopped or terminated when not in use to reduce costs, as illustrated in the sketch below.
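As a minimal sketch of how a dedicated VM desktop might be paused when idle and resumed on demand, the example below assumes an AWS environment and the boto3 SDK; the instance ID is hypothetical and any cloud provider with equivalent APIs could be substituted.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical instance ID of a scientist's dedicated VM desktop
DESKTOP_ID = "i-0123456789abcdef0"

def pause_desktop(instance_id: str) -> None:
    """Stop (not terminate) the VM: its disks and installed software
    persist, but compute charges cease while the scientist is away."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

def resume_desktop(instance_id: str) -> None:
    """Restart the same VM with the scientist's environment intact."""
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

pause_desktop(DESKTOP_ID)  # e.g. triggered at the end of the working day
```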
User-controlled access to computer type:
- Matching the underlying computer server functionality to the analysis task promotes efficient workflow operation, and this should ideally be in the control of the scientist. For example, Deep Learning (DL) related tasks require a computer with a Graphics Processing Unit (GPU), whereas simple annotating tasks may only require modest CPU capacity. Provisioning computers with sufficient Central Processing Unit (CPU) power and memory (RAM) for image viewing and analysis programs avoids delays in launching software and prevents crashes or long periods of hanging.
- Having a menu of compute resources immediately available for a scientist to select as necessary increases workflow productivity (see the sketch below). Cloud computing approaches allow real-time monitoring of individual VM performance, which can be used to identify bottlenecks that are then addressed by, for example, simply provisioning more CPU, memory, or network bandwidth.
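One way such a menu of compute resources might be implemented, sketched here under the assumption of an AWS/boto3 environment (the task profiles and instance types are illustrative, not a recommendation): a stopped VM's machine size is switched to match the task before it is restarted, so the same desktop gains or sheds CPU, RAM, or GPU capacity as required.

```python
import boto3

ec2 = boto3.client("ec2")

# Illustrative menu mapping task profiles to machine sizes (assumed names)
COMPUTE_MENU = {
    "annotation":     "t3.large",     # light CPU work
    "image_analysis": "m5.4xlarge",   # CPU- and RAM-heavy analysis
    "deep_learning":  "g4dn.xlarge",  # GPU-equipped training
}

def resize_vm(instance_id: str, task: str) -> None:
    """Stop the VM, switch its machine size to match the chosen task,
    then start it again; disks and installed software persist."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": COMPUTE_MENU[task]}
    )
    ec2.start_instances(InstanceIds=[instance_id])

resize_vm("i-0123456789abcdef0", "deep_learning")  # hypothetical ID
```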
Access to software:
- Ideally, scientists should have the ability to select the appropriate software tool for the task in hand, and it is not unusual for established digital pathology teams to have access to two or more image analysis software packages, e.g., Visiopharm, Indica Labs HALO, QuPath, as well as Python or Matlab scripted approaches. The various software tools should be installed on the same desktop for ease of access and to allow interoperability.
- Limited access to software because of licence restrictions can be overcome through mechanisms that distribute pooled licences effectively, together with safeguards to prevent potential licence infringements.
Working Quickly with Images
Image loading, panning, and zooming speed:
- Opening images quickly for viewing is paramount; when this is not achieved it is a source of frustration and delay for image analysis scientists, increasingly so now that immunofluorescent multiplex images approach 100 GB in file size. Viewing across whole slide images and changing the level of magnification is the most fundamental, interactive, and repeated of all the digital pathology operations, and the one where scientists are most focused mentally, making micro-decisions on the image content. The cumulative impact of these much-repeated operations makes this a pivotal pinch-point for workflow productivity. Opening image files within seconds maintains momentum in individual image assessment and also allows rapid comparison of multiple image sets.
- There are multiple solutions to this common problem, including changing storage drive types (fast SSD drives vs slower disk drives), changing the provisioned IOPS (Input/Output Operations Per Second), and using cached storage options for fast access (see the sketch below). Again, cloud computing offers huge scope in this area, with monitoring and live infrastructure changes possible while the system is in operation.
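As an example of such a live infrastructure change, the sketch below assumes an AWS/boto3 setup where images sit on an attached block volume; the volume ID and the IOPS/throughput figures are hypothetical. The volume is upgraded to a faster SSD class while the VM stays online.

```python
import boto3

ec2 = boto3.client("ec2")

def speed_up_storage(volume_id: str) -> None:
    """Move a volume to a faster SSD class and raise its provisioned
    IOPS and throughput without detaching it or stopping the VM."""
    ec2.modify_volume(
        VolumeId=volume_id,
        VolumeType="gp3",   # SSD-backed volume class
        Iops=16000,         # provisioned I/O operations per second
        Throughput=1000,    # MiB/s
    )

speed_up_storage("vol-0123456789abcdef0")  # hypothetical volume ID
```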
Performing Image Analysis
Rapid annotations:
- Although simply annotating images can require small amounts of compute, other tasks, such as real-time tuning of parameters within the software, can require larger amounts of CPU resources. Having the correct CPU size set up in advance, or the ability to switch to greater compute power seamlessly, maintains the pace of algorithm development. Using web-based consoles to give the scientist the ability to upgrade their compute power at will, from the options available in a cloud-based infrastructure, is an elegant solution.
Using Deep Learning:
- Learning iterations per second is an indicator of an algorithm's training rate, and keeping this rate high is critical to ensuring DL algorithms are trained quickly. GPUs vary in size, and the ability to swap standard and fast GPUs in and out of a workflow is important when experimenting with training speed and balancing cost against speed of training.
- Cloud computing options allow access to large-scale GPU capabilities, and the nature of cloud computing means that GPUs can be rented by the hour and the GPU VMs terminated when no longer required (see the sketch below). Furthermore, the flexibility to access next-generation compute power from the cloud is a cost-effective way to future-proof against the growing demands of AI-based approaches.
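A minimal sketch of renting a GPU VM by the hour, again assuming AWS/boto3; the machine image and instance types are placeholders, and a standard GPU can be swapped for a faster one simply by changing one parameter.

```python
import boto3

ec2 = boto3.client("ec2")

def rent_gpu_vm(ami_id: str, gpu_type: str = "g4dn.xlarge") -> str:
    """Launch a GPU VM for a training session; pass a larger type
    (e.g. 'p3.2xlarge') to trade higher cost for faster training."""
    response = ec2.run_instances(
        ImageId=ami_id, InstanceType=gpu_type, MinCount=1, MaxCount=1
    )
    return response["Instances"][0]["InstanceId"]

def release_gpu_vm(instance_id: str) -> None:
    """Terminate the GPU VM so hourly charges stop immediately."""
    ec2.terminate_instances(InstanceIds=[instance_id])

vm = rent_gpu_vm("ami-0123456789abcdef0")  # hypothetical training image
# ... run the DL training job on the new VM ...
release_gpu_vm(vm)
```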
Deploying algorithms:
- Applying algorithms to images (processing) can be time-consuming when large batches of images are involved. In these circumstances, data can be generated faster if parallel, rather than sequential, approaches are deployed, allowing multiple images to be processed simultaneously. The scale of cloud computing can be brought to bear here: it is possible (subject to software licensing) to start hundreds of individual processing VMs, each processing an individual image in parallel, with the processed data fed back into the software once processing is complete. Scripting can automate this process and terminate the VMs automatically once the ‘jobs’ are complete; a sketch of this pattern follows.
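A minimal sketch of this fan-out pattern, assuming AWS/boto3, a worker machine image with the analysis software pre-installed, and a hypothetical `run-analysis` command-line wrapper: one VM is launched per image, and each VM shuts itself down on completion, which with ‘terminate’ behaviour ensures no idle machines are left running up costs.

```python
import boto3

ec2 = boto3.client("ec2")

WORKER_AMI = "ami-0123456789abcdef0"  # hypothetical pre-built worker image

def process_batch(image_names: list[str]) -> list[str]:
    """Launch one worker VM per image. Each VM processes its assigned
    image at boot, then powers off; 'terminate' on shutdown means the
    VM is destroyed automatically once its job is complete."""
    instance_ids = []
    for name in image_names:
        user_data = (
            "#!/bin/bash\n"
            f"run-analysis --image {name}\n"  # hypothetical CLI wrapper
            "shutdown -h now\n"
        )
        response = ec2.run_instances(
            ImageId=WORKER_AMI,
            InstanceType="m5.2xlarge",
            MinCount=1, MaxCount=1,
            UserData=user_data,
            InstanceInitiatedShutdownBehavior="terminate",
        )
        instance_ids.append(response["Instances"][0]["InstanceId"])
    return instance_ids

process_batch([f"slide_{i}.ndpi" for i in range(100)])  # 100 parallel jobs
```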
Multi-tasking:
- Real efficiencies and improved productivity occur when scientists' computing resources do not limit them to a single task and they have the ability to multitask. All of the tasks performed by image analysis software can be conducted on virtual machine desktops, controlled from a scientist’s host computer, without interrupting or slowing the functionality of other general software packages, e.g. email, word processing, etc.
- Furthermore, the ability to scale to multiple virtual machines provides scope for multitasking within the digital pathology workflow itself. Reviewing images on one virtual machine desktop and training a DL algorithm on a second, whilst processing images on a third, is a clear example of the type of highly productive workflow that is achievable with an optimal supporting IT infrastructure.
Management of Images & Data
Image storage:
- Badly structured, constantly filling, slow-to-respond storage diverts the attention of scientists away from their primary goal of data generation and leads to workflow hiatus. Conversely, a well-curated storage system with effectively unlimited capacity, together with automated processes for archiving data, removes day-to-day, repeated decision-making. Such automated archiving of image sets can also provide an organisation with options for highly cost-effective long-term storage, such as deep glacier approaches with costs in the region of $1 per TB per month (see the sketch below). Cloud storage also provides the reassurance of high levels of data integrity and availability.
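As an illustration of such automated archiving, the sketch below assumes the images live in AWS S3 (the bucket name, prefix, and 90-day threshold are hypothetical): a one-off lifecycle rule moves completed image sets to the deep archive tier automatically, removing any day-to-day archiving decisions.

```python
import boto3

s3 = boto3.client("s3")

# One-off rule: anything under 'completed/' is archived after 90 days
s3.put_bucket_lifecycle_configuration(
    Bucket="pathology-images",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-completed-studies",
            "Status": "Enabled",
            "Filter": {"Prefix": "completed/"},
            "Transitions": [{
                "Days": 90,
                "StorageClass": "DEEP_ARCHIVE",  # ~$1 per TB per month tier
            }],
        }]
    },
)
```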
Workflow collaboration:
- Digital pathology is a highly collaborative discipline, with various specialties potentially contributing to a workflow (histologists, pathologists, and image analysts) and with both cross-site and cross-company interactions possible. An efficient remote access system provides the appropriate entry point for scientists. A reliable, secure, and well-managed image management system provides controlled access to the appropriate images and data. A system that is responsive when opening images, manipulating them, and sharing software overlays allows the appropriate feedback to be collected on live calls.
Use of New and Improved Software
A sandbox environment:
- Software updates and new version releases are a necessary part of continual improvement. Having separate “test” and “production” environments to stress test any new updates or versions is best practice before committing those updates to a live, production environment. Such environments are also essential for the assessment and controlled deployment of new software approaches.
Quick access to older software versions:
- It is also best practice to be able to recapitulate data generation from primary images, for which there may well be a requirement to retain legacy software versions. An IT system that can quickly launch legacy software versions from stored operating files makes this potentially onerous task achievable (see the sketch below).
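One way such a quick launch might look, sketched under the assumption of an AWS/boto3 environment in which archived machine images are tagged by software name and version (the tag names and values are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

def launch_legacy_desktop(software: str, version: str) -> str:
    """Find the archived machine image tagged with the required legacy
    software version and launch a desktop from it for recapitulation."""
    images = ec2.describe_images(
        Owners=["self"],
        Filters=[
            {"Name": "tag:software", "Values": [software]},
            {"Name": "tag:version", "Values": [version]},
        ],
    )["Images"]
    response = ec2.run_instances(
        ImageId=images[0]["ImageId"],
        InstanceType="m5.2xlarge", MinCount=1, MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]

launch_legacy_desktop("halo", "3.2")  # hypothetical tag values
```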
Clinical trials support:
- The use of digital pathology and image analysis in clinical trials is increasing, and many such trials last upwards of 2-3 years. Ensuring that a consistent analysis software version is used throughout this period often requires the ability to run IT systems in parallel: one system with the trial software version and another with contemporaneous software versions to support the ongoing digital pathology effort.
Informed Helpdesk Support
Specialist IT support:
- Optimal image analysis software performance involves an intimate relationship between the software and the supporting IT infrastructure. The most effective support is delivered by IT professionals who understand the nuances of image analysis software. Digital pathology is also a highly dynamic field, so a forward-looking support function is needed to keep pace with change.
Responsive support:
- Having the support of IT professionals who respond quickly to scientists' needs is an obvious benefit to any workflow. Having an IT system built in a way that facilitates that response time is also an important consideration. A cloud-based infrastructure deployed from cloned templates is an efficient way to distribute software updates quickly and reliably, and it allows issues to be troubleshot across a wide range of users rather than fixed on an individual-computer basis (see the sketch below).
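A minimal sketch of the cloned-template approach, assuming AWS/boto3 (the instance ID, naming scheme, and machine size are illustrative): a tested ‘golden’ desktop is captured as a reusable image, and every scientist's desktop is then launched from that same validated template, so an update is made once and distributed everywhere.

```python
import boto3

ec2 = boto3.client("ec2")

def publish_template(golden_instance_id: str, version: str) -> str:
    """Capture a tested 'golden' desktop as a reusable machine image,
    so every new VM desktop starts from the same validated software."""
    response = ec2.create_image(
        InstanceId=golden_instance_id,
        Name=f"qdp-desktop-{version}",  # hypothetical naming scheme
    )
    return response["ImageId"]

def launch_desktop(image_id: str) -> str:
    """Provision a scientist's desktop from the cloned template."""
    response = ec2.run_instances(
        ImageId=image_id, InstanceType="m5.2xlarge", MinCount=1, MaxCount=1
    )
    return response["Instances"][0]["InstanceId"]

template = publish_template("i-0123456789abcdef0", "v2.4")  # hypothetical
launch_desktop(template)
```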
Conclusion
That an efficient digital pathology workflow is highly dependent on the effectiveness of the underlying IT infrastructure may be obvious. What may be less obvious is the extent and degree to which that impact can be felt.
From entry points into the workflow, through the speed of image manipulation, to the speed of algorithm development and deployment, and finally to data storage, there are numerous points of dependency. These may be categorised as the largely predictable, intrinsic needs of a digital workflow. What is less evident is the scope for large productivity gains in the workflow through additional features such as:
- Flexibility for scientists to self-select compute resources
- Allowing the scientist to multitask outside and within the digital pathology workflow
- Streamlining user experience in collaborative interactions within that workflow
This expansive list represents a huge opportunity to increase the productivity of digital pathology scientists, provided the correct IT infrastructure and support are made available. Conversely, it may be an unwelcome realisation for resource managers that lab productivity suffers significantly when IT infrastructure and support are not optimised.
Sciento Technologies has designed and built QDPconnect, a managed cloud-based software solution to improve the productivity of scientists working in the field of digital pathology. It is an established solution, built originally to support the needs of OracleBio, a major provider of quantitative digital pathology services to Pharma and Biotech worldwide.
Get in touch if you’d like to learn more about Sciento and QDPconnect.