Data Science Pattern Matching


Digital Transformation



Finding patterns in large data sets to reduce environmental impacts in pharmaceutical manufacturing.


During 2019 and early 2020, the manufacturing plant faced rising levels of toxins and asked Wyoming to use data analysis and data modelling to identify the root cause, and from there to prepare monitoring and prediction dashboards to inform remediation activity. The wastewater treatment plant (WWTP) is a complex environment with several processing stages, several large vessels (circa 13,000 cubic metres) and two communities of bacterial organisms that break down waste to safe levels of toxins.

The WWTP tracks approximately 150 metrics such as temperature, acidity, dissolved oxygen and flow rate. In certain combinations, these metrics lead to stable plant operation (i.e. low toxins); in other combinations, there can be excursions beyond desired ranges and some instability in plant operation, which can yield high levels of toxins.

Data quality was noted as a challenge: metrics were subject to different collection and transcription methods and so required normalisation prior to use. Some data came directly from a distributed control system (DCS), some came from standalone adjacent systems, some came from inline meters placed at key points in the plant, and other data came from manual probes inserted into certain vessels and pools by staff on an irregular basis.


The first part of the process was to address inconsistencies in the data and create a base of reliable data to feed into the modelling process. Some data had multiple instances per day, others had a single instance each day and some were collected less frequently. The team at Wyoming established protocols to create consistency across the variable set and present cleaned data into the model each day.
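As an illustration of what such a consistency protocol might look like, the sketch below collapses irregular readings into one value per metric per day. The rules it uses (average same-day readings, carry the last known value forward on days with no reading) are assumptions for the example rather than the protocols the team actually adopted, and the metric names and values are invented.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean


def to_daily(readings, start, end):
    """Collapse irregular readings into one value per metric per day.

    readings: iterable of (timestamp, metric_name, value) tuples.
    Days with several readings are averaged; days with none carry the
    last known value forward (None until a first reading exists).
    """
    buckets = defaultdict(list)
    for ts, metric, value in readings:
        buckets[(metric, ts.date())].append(value)

    metrics = sorted({metric for metric, _ in buckets})
    n_days = (end - start).days + 1
    table = {}
    for metric in metrics:
        last = None
        series = []
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            values = buckets.get((metric, day))
            if values:
                last = mean(values)
            series.append(last)
        table[metric] = series
    return table


# Illustrative readings only: one metric sampled twice a day, one sampled irregularly.
readings = [
    (datetime(2020, 1, 1, 6), "temperature_c", 30.0),
    (datetime(2020, 1, 1, 18), "temperature_c", 32.0),
    (datetime(2020, 1, 2, 9), "temperature_c", 31.0),
    (datetime(2020, 1, 3, 9), "temperature_c", 29.0),
    (datetime(2020, 1, 1, 10), "ph", 7.2),
    (datetime(2020, 1, 3, 11), "ph", 6.8),
]
daily = to_daily(readings, date(2020, 1, 1), date(2020, 1, 3))
```

Carrying values forward is only one option; interpolation, or leaving gaps for the model to handle, may suit slowly sampled manual probes better.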

To identify the factors that lead to stable or unstable operation, Wyoming analysed 18 months of daily metrics and built a model to predict the forward trend for toxins rising or falling. To start the process, principal component analysis (PCA) was used to identify which variables had the greatest impact on the output variables (the levels of toxins). An initial subset of variables was then used to focus attention and reduce model complexity. PCA was regularly reviewed and the metric mix in the model was adjusted over time as new patterns emerged.
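To make the PCA step more concrete, here is a minimal sketch that standardises a few metrics and estimates the loadings of the first principal component by power iteration, then ranks the metrics by the size of their loading. It is not the model used on the project: the metric names and values are invented, only the first component is computed, and how component loadings were related back to toxin levels is not described in this case study.

```python
from math import sqrt
from statistics import mean, pstdev


def standardise(column):
    """Scale one metric to zero mean and unit variance (assumes it is not constant)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def first_component(columns, iterations=200):
    """Estimate first principal component loadings by power iteration."""
    z = [standardise(col) for col in columns]
    n, p = len(z[0]), len(z)
    # Correlation matrix of the standardised metrics.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    # Arbitrary non-zero start vector; it must not be orthogonal to the answer.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Illustrative daily values only.
metrics = {
    "temperature_c": [30, 31, 32, 33, 34, 35, 36, 37],
    "dissolved_oxygen_mg_l": [8.0, 7.6, 7.4, 6.9, 6.6, 6.1, 5.9, 5.4],
    "flow_rate_m3_h": [5, 3, 6, 4, 4, 6, 3, 5],
}
loadings = first_component(list(metrics.values()))
# Metrics with the largest absolute loading dominate the first component.
ranked = sorted(zip(metrics, loadings), key=lambda item: abs(item[1]), reverse=True)
```

In this toy data, temperature and dissolved oxygen move together (in opposite directions) and dominate the first component, while flow rate contributes very little. A production workflow would normally use an established library and examine several components.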

The data workflow followed the steps below:

To power the insight, a data workflow was developed to source, normalise and prepare the data for modelling and analytics output. First, custom data connectors were built to pull data from each source, clean it and normalise the formats, creating a single dataset.

Next, PCA was run over the data to identify the variables with the greatest significance, and a cluster analysis was then built to separate the data into groups. This supported various analyses, including sliding window moving average, regression and moving average convergence divergence (MACD).
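Two of the analyses named above can be sketched in a few lines. The example below computes a sliding window moving average and a MACD (the difference between a fast and a slow exponential moving average, compared against its own smoothed signal line) for a single metric. The window lengths and the toxin values are assumptions for illustration; the default spans of 12, 26 and 9 are the conventional MACD settings, not the settings used in this project.

```python
def moving_average(values, window):
    """Simple sliding window mean; one output per full window."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for one metric."""
    macd_line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Illustrative daily toxin readings: stable, then rising.
toxin = [10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 15, 16]
smoothed = moving_average(toxin, window=3)
macd_line, signal_line, histogram = macd(toxin, fast=3, slow=6, signal=3)
```

A positive MACD line means the short-term average has moved above the long-term average, which for a toxin metric is an early sign of a rising trend. A positive histogram means that rise is still gaining pace relative to its recent history.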

A variety of statistical modelling tools were used in a unique workflow to transform and analyse the data metrics and create robust predictions, visualised through a dashboard. Key outputs were surfaced on the dashboards to give a look-ahead view of the likely track of key metrics, giving early warning to operators of rising toxins.

A large amount of optimisation was carried out to establish and validate the correct settings for the models at various stages in the workflow and to manage the risks of collinearity and over-fitting. Considerable effort was taken to ensure the models were robust and operating correctly.
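One simple way to screen for collinearity is to flag pairs of metrics whose Pearson correlation is very high, since they carry largely the same information. The sketch below does this; the 0.9 threshold, the metric names and the values are assumptions for illustration, and the case study does not state which checks the team actually ran.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(metrics, threshold=0.9):
    """List metric pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(metrics.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Illustrative values: two temperature probes that track each other closely.
metrics = {
    "inlet_temp_c": [30, 31, 32, 33, 34, 35],
    "vessel_temp_c": [30.5, 31.4, 32.6, 33.5, 34.4, 35.6],
    "flow_rate_m3_h": [5, 3, 6, 4, 4, 6],
}
flagged = collinear_pairs(metrics)
```

Pairwise correlation only catches two-variable redundancy; measures such as the variance inflation factor are better suited when one metric is a combination of several others.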


Once the analysis team were comfortable that the model was operating correctly and generating insight with good predictive strength, the work turned to delivering this into a production setting. This required a high level of user experience (UX) design work to create simple, clear and consistent gauges, charts and alerts, so that production users could quickly understand the information on display without fear of misinterpretation. A further complexity was localising the dashboard for a culture very different from that of the dashboard developers: not only translating the text but also adopting the phrasing and nomenclature more commonly in use at the site.

The dashboard showed several metrics, each with a current value and a predicted trend covering the next several days. In addition to a concise view of key metrics, this gave site operators, for the first time, a robust prediction of where each metric will likely be in three to ten days' time, giving them a means to continue stable operation or to adjust inputs and address unstable operation.
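A look-ahead of this kind can be illustrated, in its simplest form, by fitting a straight line to recent daily values and extending it three and ten days forward. This is a deliberately basic sketch: the 14-day window, the data and the use of a plain least-squares line are assumptions, and the production model combined several techniques rather than a single trend line.

```python
def fit_line(values):
    """Ordinary least squares fit of value against day index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def look_ahead(values, window=14, horizons=(3, 10)):
    """Extrapolate the recent linear trend to each horizon, in days ahead."""
    recent = values[-window:]
    slope, intercept = fit_line(recent)
    last_index = len(recent) - 1
    return {h: intercept + slope * (last_index + h) for h in horizons}


# Illustrative toxin readings rising by 0.5 units per day.
toxin = [10.0 + 0.5 * day for day in range(14)]
forecast = look_ahead(toxin)
```

With the latest reading at 16.5, the line projects roughly 18.0 in three days and 21.5 in ten. On a dashboard, values like these would sit alongside the current reading so operators can see whether a metric is heading out of its desired range.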

Now, with the dashboard powered by predictive analytics, the production team no longer reacts to unexpected excursions from safe operating ranges; they can anticipate deviation and make more subtle interventions.

The cost of remediating excess toxins has gone down, as the very large emergency chemical interventions have stopped completely. Previously, it took the team hours to understand where they were; now it is done in minutes. The plant is now operating in a more controllable and safe manner.

About the client

Our client, a global contract manufacturing organisation (CMO), operates a pharmaceutical manufacturing plant which generates bulk drug materials for other manufacturers. As part of their manufacturing process, toxins are present which have to be treated in a wastewater treatment plant (WWTP) connected to the manufacturing site. From there, treated wastewater flows to the public authority waste treatment network.

