It is a dynamic and threshold-breaking time for innovation across Conversational Artificial Intelligence, Computer Vision and Recommender Systems (RecSys), with NVIDIA breaking new ground through the launch of TensorRT 8 alongside multiple RecSys competition successes. More on this news in depth shortly!
But first, let’s set the scene on exactly why this matters so much today. As we move into an Era of Convergence that blends algorithms, engineering and culture alike, reflecting both deeper integration and the faster pace of socio-technical change, it becomes imperative to manage the demands this inevitably creates in order to make the most of the vast opportunities.
Deep learning is a case in point. It applies to a diverse and growing range of industries, from medical devices through to conversational IVR and automated driving, and to a wide range of applications in production, including image and video analysis, natural language processing (NLP) and recommender systems. But as the number of applications increases, so do the demands! The drive for both performance and accuracy has led to growth in model size and complexity too. For example, while large-scale transformer-based language models such as BERT have delivered significant accuracy gains on many NLP tasks, they also demand substantial compute during inference because of their 12- or 24-layer stack of multi-head attention blocks (BERT-Base and BERT-Large respectively). Context matters too! Safety-critical applications, such as those in the automotive sector, understandably have rigid requirements on the latency and throughput expected of deep learning models. And this extends to consumer-facing applications such as recommender systems.
TensorRT is an AI Software Development Kit (SDK) designed to accelerate deep learning inference for exactly the use cases described above, and it has been applied extensively and successfully. In five years, over 350,000 developers across 27,500 companies, in sectors including finance, retail, automotive and healthcare, have downloaded TensorRT almost 2.5 million times, with applications deployed in hyperscale data centres and on embedded and automotive product platforms. This also reflects growth in the percentage of enterprises employing AI, some 270% over the last four years (Gartner), with demand for real-time AI applications also rising significantly.
Supporting all major frameworks, TensorRT helps process large volumes of data with low latency, thanks to a combination of powerful optimizations, efficient memory use and calibrated use of reduced precision. This is high-performance deep learning inference by design, and it has just got better with an eighth-generation release!
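To illustrate what "calibrated use of reduced precision" can mean, here is a minimal sketch of symmetric INT8 quantization with simple max-abs calibration. A scale factor is chosen from sample activations so that the largest observed magnitude maps to 127, and FP32 values are then rounded to 8-bit integer codes. This is an illustrative assumption in plain Python, not TensorRT's own calibrator or API. TensorRT ships its own calibration methods, and the function names below are hypothetical.

```python
# Minimal sketch of symmetric INT8 post-training quantization using simple
# max-abs calibration. Illustrative only: NOT TensorRT's calibrator or API.


def calibrate_scale(calibration_values):
    """Pick a scale so the largest observed magnitude maps to INT8 code 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round FP32 values to INT8 codes, clipping to the symmetric range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map INT8 codes back to approximate FP32 values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Hypothetical activations observed on a small calibration set.
    calibration = [0.5, -1.27, 0.9, 1.0, -0.2]
    scale = calibrate_scale(calibration)
    codes = quantize([0.5, -1.27, 2.0], scale)
    print(scale, codes, dequantize(codes, scale))
```

Values outside the calibrated range, such as 2.0 in the usage above, are clipped to the largest code. This is why the choice of representative calibration data matters for accuracy.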
With speed and accuracy both imperative, the recent launch of TensorRT 8 delivers the fastest AI inference performance, slashing inference time for language queries by half. It is a significant enabler for developer innovation, from conversational AI to ad recommendations and search engines, deployed from the cloud to the edge. The release is record-breaking, bringing BERT-Large inference latency down to an incredible 1.2 milliseconds. This allows companies to double or triple their model size and, in so doing, achieve dramatic improvements in accuracy while preserving quality and responsiveness.
“NVIDIA TensorRT 8 is the most advanced AI inference solution in the market today, enabling hundreds of thousands of developers to deploy AI-based, real-time applications from the cloud to the edge. By achieving BERT-Large inference in a millisecond through transformer-based optimizations, enterprises can now offer conversational AI experiences to their customers that are smarter and faster than ever before.”
Sid Sharma, Head of Product Marketing, AI Software, NVIDIA
If you are part of the NVIDIA Developer Program, TensorRT 8 is free to access now, with the latest versions of its parsers, plug-ins and samples available on the dedicated TensorRT GitHub repository. The full press release is also available from NVIDIA.
One of the most visible success stories of Artificial Intelligence in practice is the rise of recommender systems, a particular form of information filtering. RecSys offer personalised support to users by learning from their previous behaviour alongside similarities between users, and then predicting their current preferences. This helps millions of people find what they want to watch, buy and play. It can also enhance user experience and satisfaction. On the largest commercial platforms, recommendations can account for an eye-opening 30% of revenue, with a 1% improvement in quality translating into billions of dollars.
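The idea of learning from past behaviour and user similarities can be sketched as a simple user-based collaborative filter. Users are compared by cosine similarity over the items they have both rated, and an unseen item's score is the similarity-weighted average of other users' ratings for it. This is a minimal illustration on hypothetical toy data. It is not how Merlin or any production recommender is implemented; deep learning recommenders typically learn embeddings rather than computing similarities directly like this.

```python
import math

# Hypothetical toy ratings (user -> {item: rating}); not data from any challenge.
RATINGS = {
    "alice": {"film_a": 5.0, "film_b": 3.0, "film_c": 4.0},
    "bob": {"film_a": 4.0, "film_b": 3.0, "film_c": 4.0, "film_d": 5.0, "film_e": 2.0},
    "carol": {"film_a": 1.0, "film_b": 5.0, "film_d": 1.0, "film_e": 5.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else 0.0


def recommend(user, k=3, ratings=RATINGS):
    """Rank items the user has not rated yet by predicted preference."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    ranked = sorted(candidates, key=lambda i: predict(user, i, ratings), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(recommend("alice"))
```

Here alice's ratings are closer to bob's than to carol's. Bob's high rating of film_d therefore outweighs carol's low one, which places film_d ahead of film_e in alice's list.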
Getting to this point means overcoming many challenges when training large-scale recommender systems, including vast datasets, extensive repeated experimentation, huge embedding tables, complex data pre-processing and feature engineering pipelines, and distributed training. Once RecSys are in production, the challenges extend to monitoring and re-training alongside real-time inference. Deep learning methods provide superior prediction at scale for building recommenders, and NVIDIA's Merlin application framework and ecosystem was built to facilitate all phases of development, from experimentation to production, accelerated on NVIDIA GPUs. It is perhaps no surprise, then, that teams from NVIDIA recently won three top recommender systems challenges in just five months.
For the ACM RecSys Challenge, Twitter gave participants heterogeneous input data of approximately 40 million data points a day for 28 days. It asked them to predict which tweets users would like, retweet, quote or reply to, while also providing fair recommendations. Weeks 1 to 3 were used for training, with week 4 held out for evaluation and testing. NVIDIA's seven-strong team demonstrated that an inference job that took nearly 24 hours on a CPU core could run on a single NVIDIA A100 Tensor Core GPU in just seven minutes. That is roughly 200 times faster: a staggering improvement and a great example of multi-goal optimisation too! The result also marked NVIDIA's second ACM RecSys Challenge win.
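The week-based split described above is a temporal holdout. The model is trained only on earlier interactions and evaluated on later ones, which mirrors how a deployed recommender must predict future engagement without seeing it first. Below is a minimal sketch, assuming a hypothetical `Engagement` record with a 1-based `day` field. This is not the challenge's actual schema or the team's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engagement:
    """One hypothetical user-tweet interaction; field names are illustrative."""
    user_id: int
    tweet_id: int
    day: int  # 1-based day within the 28-day collection window


def temporal_split(events, train_days=21):
    """Send days 1..train_days (weeks 1-3) to training and later days to the holdout."""
    train = [e for e in events if e.day <= train_days]
    held_out = [e for e in events if e.day > train_days]
    return train, held_out


if __name__ == "__main__":
    # Toy events: one engagement per day over the 28-day window.
    events = [Engagement(user_id=d % 5, tweet_id=d, day=d) for d in range(1, 29)]
    train, held_out = temporal_split(events)
    print(len(train), len(held_out))  # 21 7
```

Splitting by time rather than at random keeps information from week 4 out of training. This gives a more honest estimate of how the model would perform on genuinely new activity.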
Another example comes from Booking.com, whose challenge centred on devising a strategy for recommending a traveller's next destination in real time. It used a dataset based on millions of real, anonymized accommodation reservations and associated data points, and the NVIDIA team led a field of 40 in correctly predicting the final city a vacationer in Europe would choose to visit. Finally, the SIGIR eCommerce 2021 challenge addressed the growing need for reliable predictions within the boundaries of a single shopping session, since customer intentions can differ depending on the occasion. Over 20 teams from both industry and academia took part. Participants were given 37 million data points from online shopping sessions and asked to predict which products users would buy. It was another fantastic result for NVIDIA, and equally valuable for the positive contagion effect it creates for collaboration and impact across research and practice.
Looking ahead, the outputs of these competitions will contribute to research and to frameworks such as Merlin, which has been designed to meet the computational demands of training and inference for large-scale deep learning recommender systems. In turn, this insight and innovation will enable data scientists, machine learning engineers and researchers alike to build high-performing recommenders at scale, with all the benefits this brings.
"Our vision and core principles for our software, like Merlin, include being easy to use, production-ready and open to the technologies that data scientists are using. Participating and winning 3 industry challenges in 5 months creates an invaluable feedback loop, driving new ideas and innovations. These new methods are incorporated into our open source software with the ultimate goal of democratizing recommenders."
Kari Briski, Senior Director of Product Management, AI Software, NVIDIA
Dr. Sally Eaves is a highly experienced Chief Technology Officer, Professor in Advanced Technologies and Global Strategic Advisor on Digital Transformation. She specialises in applying emergent technologies, notably AI, FinTech, Blockchain and 5G, to business transformation and social impact at scale. An international keynote speaker and author, Sally was an inaugural recipient of the Frontier Technology and Social Impact award, presented at the United Nations in 2018. She has been described as the ‘torchbearer for ethical tech’ and founded Aspirational Futures to enhance inclusion, diversity and belonging in the technology space and beyond.