The buzz around potential applications for AI continues to grow. The trouble is that the excitement is often not tempered by a solid understanding of the practical requirements for innovating with AI. So let me cut through the noise with a simple but heartfelt message: to scale rapidly and reap the rewards of evolving customer opportunities, large-scale AI-enabled projects must be built on firm foundations that allow multidisciplinary development teams to thrive.
This is a recurring theme for me as I collaborate with clients to bring AI-powered innovations to market. Speaking at and writing for the recent AI Summit Silicon Valley, I put the emphasis on practical ways to help move AI from the drawing board to profitability. There’s no getting away from the fact that AI is hard – but I passionately believe that by systematically identifying and overcoming common roadblocks we can help build momentum for ambitious companies.
In this article, I want to dive a little deeper into this theme and focus on the challenges faced by organizations aiming to launch large-scale services, possibly with IoT and AIoT devices attached. They could be at an early ideation or planning stage, or struggling with scale-up. Either way, they’ll need to find a way past a key conundrum: the need for data to fuel an AI engine is well understood, but the organization and infrastructure needed to make it a reality certainly aren’t.
Let’s look at the landscape for a bit of perspective. While much of the excitement around AI has been driven by academic achievements, the engineering practice of AI has also developed apace, getting innovations in algorithms and data into production where they generate value. In the last couple of years, search engine trends have revealed an explosion of interest in the practical engineering needed for AI, known as MLOps.
Delivering the next wave of AI-enabled services will require continual data handling, model retraining, deployment and insight generation between device and cloud. This adds complexity and effort that should be abstracted away from data science teams as much as possible. They need to focus on algorithmic and service innovation without being bottlenecked by infrastructure or by labor-intensive processes like data cleaning.
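To make the idea of abstracting this loop away from data scientists a little more concrete, here is a minimal Python sketch of an automated cycle that decides for itself when to retrain. Everything in it is hypothetical: the `RetrainPolicy` class, the `run_cycle` function, the stage names and both thresholds are illustrative assumptions, not a real tool or recommended values.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    """Hypothetical policy deciding when to start automated retraining.

    Both thresholds are illustrative assumptions, not recommendations.
    """
    min_new_samples: int = 10_000  # retrain once this much fresh labeled data has arrived
    min_accuracy: float = 0.90     # ...or once monitored accuracy drops below this floor

    def should_retrain(self, new_samples: int, live_accuracy: float) -> bool:
        # Either condition is enough on its own; nobody has to trigger it by hand.
        return new_samples >= self.min_new_samples or live_accuracy < self.min_accuracy


def run_cycle(policy: RetrainPolicy, new_samples: int, live_accuracy: float) -> list[str]:
    """Return the (hypothetical) stages one device-to-cloud cycle would run."""
    stages = ["ingest", "validate"]
    if policy.should_retrain(new_samples, live_accuracy):
        stages += ["retrain", "evaluate", "deploy"]
    stages.append("report")  # insight generation happens every cycle
    return stages
```

In a real system each stage name would map onto a pipeline job; the point of the sketch is only that the trigger lives in the infrastructure rather than in a data scientist's to-do list.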
This problem was recognized as long ago as 2015, and the ML community has been gradually responding to the evolving challenge ever since. As applications of ML become more sophisticated, MLOps techniques need to keep pace. For the type of large-scale, multidisciplinary projects that we undertake here at CC in particular, MLOps is critical to success. The sub-fields of ModelOps and DataOps are key too. This article from MIT SMR gives a good description of each.
Broad foundations start with strategic alignment
To take advantage of the potential of AI, three broad foundations need to be built: strategic alignment; data-driven organization and processes; and a framework for experimenting and iterating in the wild. I mentioned above the importance of strong organization and infrastructure when innovating with AI. To get both of these pillars in place and succeed with an ambitious vision for AI, it is vital to forge strategic alignment among end users and key stakeholders as early as possible.
It’s easy to align on the possibilities of AI, but aligning on specific use cases, and on the engineering complexity and investment they entail, may be harder. It’s worth getting this in place early and maintaining a dialogue as ambitions evolve. Capturing it from the outset in a multidisciplinary AI strategy, updated as the project matures, will pay dividends.
Data-driven organization and processes
As this blogger puts it, ‘ML is not just code, it’s code plus data’, which means we need to be data-centric. The benefits of large volumes of representative, timely and accurate data for training and inference are increasingly widely understood, but how should that be reflected in the organization and processes we put in place? How do we avoid data science and AI teams spending as much as 80% of their time on tasks other than algorithm development?
So that data science and AI teams can focus on innovation, we need to organize around them and reduce the friction that slows experimentation. We see this through two lenses: model management (which maps quite well to traditional software DevOps practices) and data management (which is newer to some organizations). It sounds straightforward, but even reproducing results can be challenging without the right processes in place. Getting this right turns data collection and management into a source of competitive advantage.
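One small, concrete step towards reproducible results is to fingerprint every training run by hashing the exact data and configuration it used. The sketch below assumes datasets can be read as raw bytes and that configurations are JSON-serializable; `run_fingerprint` is a hypothetical helper, not part of any particular MLOps product.

```python
import hashlib
import json


def run_fingerprint(data_files: dict[str, bytes], config: dict) -> str:
    """Hypothetical helper: hash the exact training data and config behind a result."""
    h = hashlib.sha256()
    for name in sorted(data_files):  # sort so file ordering cannot change the hash
        h.update(name.encode("utf-8"))
        h.update(hashlib.sha256(data_files[name]).digest())  # fixed-length digest per file
    # sort_keys gives a canonical JSON form, so key order in the config does not matter
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return h.hexdigest()
```

If two runs share a fingerprint they were trained on identical inputs, so any difference in their results points at nondeterminism in training rather than a silent change in the data.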
Looking at what works well in our own development work and among our top-performing clients, the following principles emerge:
- Encourage data scientists to drive requirements and engage with other disciplines. Defining data requirements, managing data and profiling it are key initial steps that inform the design and development of AI technology. They are also key to making sure that development workload and resources are planned and assigned properly.
- Start AI engineering as close to the data source as possible and build the right data pipeline and architecture (with implications for compute and connectivity choices, as discussed by my colleague Joe Corrigan here). In particular, this should include data quality control. This avoids introducing limitations early in the data pipeline that may only be identified later in model development.
- Avoid collecting data in parallel with model development unless there is good reason to. Doing so has implications for R&D timelines and device design (propagating even simple changes between the two can be complex and costly).
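As an illustration of quality control applied close to the source, here is a minimal sketch of checks a device or gateway might run before forwarding sensor data. The `Reading` type, the valid range and the three rejection reasons are assumptions chosen for the example, not a prescribed set of checks.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical sensor sample."""
    timestamp: float        # seconds since epoch
    value: Optional[float]  # e.g. a temperature; None if the sensor returned nothing


def quality_check(readings: list[Reading], lo: float, hi: float) -> tuple[list[Reading], dict[str, int]]:
    """Keep plausible readings and count why the others were rejected."""
    kept: list[Reading] = []
    rejected = {"missing": 0, "out_of_range": 0, "out_of_order": 0}
    last_ts = -math.inf
    for r in readings:
        if r.value is None or math.isnan(r.value):
            rejected["missing"] += 1
        elif not lo <= r.value <= hi:
            rejected["out_of_range"] += 1  # physically implausible for this (assumed) sensor
        elif r.timestamp <= last_ts:
            rejected["out_of_order"] += 1  # duplicate or late-arriving sample
        else:
            kept.append(r)
            last_ts = r.timestamp
    return kept, rejected
```

Counting rejections by reason, rather than silently dropping bad samples, also gives the data science team an early signal about sensor or firmware problems.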
All of the above will make your data scientists more effective, but it still represents a significant investment of effort. Larger projects may benefit from dedicated data engineers, freeing data scientists to experiment with data. These principles are also important for scaling: capturing the right data and processing it effectively, without overinflated communications and compute costs, is important to growth.
This applies not just to growth in users and devices but also to reach: expanding into different contexts requires retraining. If you train an autonomous car in the US, you will need to do some level of retraining for other geographies; simply redeploying the same model is unlikely to be ideal. So that’s yet more data that needs to flow through the pipeline.
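Deciding when a new context needs retraining starts with noticing that its data looks different. One commonly used check, chosen here purely for illustration, is the population stability index (PSI), which compares the distribution of a feature in the new context with its distribution in the training data. A minimal sketch, assuming equal-width bins over the combined range and a small floor on bin proportions to avoid taking the log of zero:

```python
import math


def population_stability_index(reference: list[float], current: list[float], bins: int = 10) -> float:
    """PSI between two samples of one feature; larger values mean more drift."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against every value being identical

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)  # the maximum value goes in the last bin
            counts[idx] += 1
        eps = 1e-6  # assumed floor so empty bins do not produce log(0)
        return [max(c / len(sample), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, but thresholds should be tuned per feature and per use case.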
A lot of this thinking is captured in the now well-established CRISP-DM framework, which is increasingly becoming an industry standard for developing data-centric processes.
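CRISP-DM breaks a data project into six phases and, importantly, expects you to loop back between them. The sketch below encodes the phases and the transitions from its standard process diagram as a small state machine. Treating deployment as feeding back into business understanding is my reading of the diagram’s outer cycle, and `is_valid_path` is a hypothetical helper.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed next phases, including CRISP-DM's feedback loops.
NEXT: dict[Phase, set[Phase]] = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},  # assumed: outer cycle restarts the process
}


def is_valid_path(path: list[Phase]) -> bool:
    """Check that every step in a project history follows an allowed transition."""
    return all(b in NEXT[a] for a, b in zip(path, path[1:]))
```

The value of writing it down this way is that the backward arrows become explicit: returning from modeling to data preparation is part of the process, not a failure of it.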
A framework for experimenting and iterating in the wild
Your rate of innovation flows directly from the speed and quality of each experimental iteration you can run. This means that, in addition to the data infrastructure outlined above, you will need MLOps/ModelOps infrastructure and processes: the engineering framework that allows you to create and deploy AI technology in the wild.
This is where the best practices of modern software development need to come together to enable the design, evaluation and deployment of models. In our view, the processes and infrastructure should seek to accelerate the learning cycle that moves product and service development on. How this plays out in practice depends strongly on the use case; for example, cloud, on-premises distributed compute and hybrid applications will each generate different needs.
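A recurring building block of such a framework is a gate that decides whether a newly trained model should replace the one in production. The in-memory `ModelRegistry` below is a hypothetical toy, not the API of any real registry product, and the `min_gain` margin is an assumed value; it simply shows evaluation results driving deployment decisions while keeping a record of every candidate.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: one production model plus a history."""
    production: Optional[str] = None
    production_score: float = float("-inf")
    history: list[tuple[str, float, bool]] = field(default_factory=list)

    def consider(self, candidate: str, score: float, min_gain: float = 0.005) -> bool:
        """Promote the candidate only if it beats production by at least min_gain."""
        promoted = score >= self.production_score + min_gain
        self.history.append((candidate, score, promoted))  # keep every experiment on record
        if promoted:
            self.production, self.production_score = candidate, score
        return promoted
```

Requiring a small margin rather than any improvement at all avoids churning production models over differences that are within evaluation noise.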
Increasingly, tools such as Modzy and Nubix are becoming available to form part of the overall solution. Whatever is used, as I see it, top-performing organizations ensure that their systems are instrumented to provide useful measurements and hence feedback. Feedback is key to the learning that moves model design and deployment forward.
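Instrumentation only produces feedback if predictions can be matched with what actually happened. Here is a minimal sketch of that matching, assuming each prediction carries an ID that later ground truth can refer to; `FeedbackMonitor` and its window size are illustrative assumptions rather than any particular tool’s interface.

```python
from collections import deque
from typing import Optional


class FeedbackMonitor:
    """Hypothetical monitor matching predictions to later ground truth."""

    def __init__(self, window: int = 1000):
        self.pending: dict[str, str] = {}       # prediction ID -> predicted label
        self.outcomes = deque(maxlen=window)    # most recent hits (True) and misses (False)

    def log_prediction(self, pred_id: str, label: str) -> None:
        self.pending[pred_id] = label

    def log_outcome(self, pred_id: str, true_label: str) -> None:
        predicted = self.pending.pop(pred_id, None)
        if predicted is not None:  # ignore outcomes for predictions we never logged
            self.outcomes.append(predicted == true_label)

    def rolling_accuracy(self) -> Optional[float]:
        """Accuracy over the last `window` matched outcomes, or None if there are none yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)
```

A rolling measurement like this is the kind of signal that can feed decisions about when to retrain or which model to deploy.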
So, there you have it: three broad foundational areas that provide the basis for large-scale AI-enabled projects to thrive, scale rapidly and respond to evolving customer needs. As I said at the outset, they are practical, systematic responses to challenges faced by many. If that includes you, and you need help achieving strategic alignment, becoming a data-driven organization or establishing a framework for experimenting and iterating with AI in the wild, please do get in touch. It’ll be great to hear from you.