Speeding Up AI Training with Large Language Model (LLM) Innovation

Sally Eaves 05/08/2022

We live in an Age of Actualisation for Natural Language Processing (NLP) and especially transformer-based Large Language Models (LLMs).

NLP models are on the rise, advancing a fascinating field of AI with applications that shape the very future of human lives, tackling an array of tasks including conversation, text completion, and even coding. This has been fuelled by advances in accuracy, scalability and production readiness. Indeed, research conducted by John Snow Labs last year revealed that 60% of tech leaders indicated their NLP budgets grew by at least 10% compared to 2020, with a third (33%) reporting a rise of at least 30%.

Special attention has been afforded to LLMs for their role in re-imagining two key branches of AI: language and vision, with NVIDIA's NeMo Megatron a superb example, providing an end-to-end framework from data curation through to training, inference and evaluation. LLMs are trained on huge amounts of data and learn from text, with applications across text summarisation, real-time content generation, customer service chatbots and question answering for conversational AI interfaces, as summarised below:

Key LLM Benefits

1) Summarization and Paraphrasing ("is there a better way to say this?")

- Real-World Examples: Newsletters, Web Search Engines, Media Monitoring

2) Classification ("is this a request or a question?")

- Real-World Examples: Supply Chain Order Queries, Banking Queries Directed to Agents

3) Semantic Similarity ("has this been answered before?" – see the sketch after this list)

- Real-World Examples: Credit Scoring, Clinical Trial Matching, Modular FAQs, Scoring Helplines
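
To make the semantic similarity benefit concrete, here is a minimal sketch of the "answered before?" check: candidate questions are embedded as vectors and compared by cosine similarity. The embed() stub below is a hypothetical placeholder; in practice an LLM or sentence encoder would produce the vectors.

```python
# A minimal sketch of semantic similarity matching for FAQ-style queries.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder: a deterministic pseudo-embedding for
    # illustration only. A real system would call a sentence encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

faq = ["How do I reset my password?", "Where is my order?"]
query = "I forgot my login password"

# Score the query against every known FAQ entry and pick the closest.
scores = [cosine_similarity(embed(query), embed(q)) for q in faq]
best = max(range(len(faq)), key=lambda i: scores[i])
print(f"Closest FAQ entry: {faq[best]!r} (score {scores[best]:.2f})")
```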

The size and complexity of LLMs continue to accelerate. Traditionally, training a large language model could be compared to the sport of bodybuilding – you need to eat an awful lot (big data) plus work out a lot (large model)! Indeed, Deep Learning is more dependent on computing power than many other areas because models have more parameters than data points. Other pain points in developing LLMs include time, expense, the level of deep technical expertise required, distributed infrastructure and the need for a full-stack approach. But these challenges are being addressed head on, with NVIDIA's NeMo Megatron upgraded to help train LLMs more efficiently, making them more powerful and more applicable across a range of scenarios. Let's explore all the key developments!

Reflecting back on the pioneering research paper "Attention Is All You Need", the scale, speed and trajectory of innovation is clear, with NVIDIA now established as a leading and indeed pioneering contributor in the AI field. As someone who has worked in research development myself, most recently as a Principal Investigator on the Digital Transformation 'Breakthrough' Study, I have experienced first-hand the challenge of training LLMs on vast datasets on supercomputers, which can take weeks and sometimes even months. So what if we could speed up training by up to 30% – or around ten days of development time?

Updates to NeMo Megatron by NVIDIA allow exactly that! – with the ability to distribute training across as many GPUs as you want, reducing both the memory and compute required for training, and making deploying the model much more accessible and significantly faster too. This translates into a 175B-parameter model being trained in 24 days instead of 34 using 1,024 NVIDIA A100 Tensor Core GPUs.
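
As a back-of-the-envelope illustration (not NeMo Megatron code), here is how a cluster of that size could be carved up by the three parallelism dimensions; the specific degrees below are hypothetical, chosen only so the product matches 1,024 GPUs.

```python
# Hypothetical parallelism degrees for illustration only - not NVIDIA's
# published training configuration.
tensor_parallel = 8     # shard each layer's weight matrices across GPUs
pipeline_parallel = 16  # split the stack of layers into sequential stages
data_parallel = 8       # replicate the model-parallel group across batches

gpus_required = tensor_parallel * pipeline_parallel * data_parallel
print(f"GPUs required: {gpus_required}")  # 8 * 16 * 8 = 1024

# Rough parameter shard per GPU for a 175B-parameter model (weights only,
# before optimizer state and activations are accounted for):
params = 175e9
per_gpu_params = params / (tensor_parallel * pipeline_parallel)
print(f"~{per_gpu_params / 1e9:.2f}B parameters per GPU")
```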

[Figure: Amount of activation memory required]

This development also offers exciting opportunities for companies to offer customised GPT-3 models 'as a service', for example a bespoke, sector-specific LLM. I also believe this can help advance the capacity for LLMs to interact together – the 'cascade perspective' described here – and to better plan and reason about change through pattern recognition, an issue highlighted in further breaking research. So how has this innovation been actualised? The advance is underpinned by two technologies plus a new hyperparameter tool:

Sequence Parallelism (SP)

Reduces activation memory requirements beyond what the usual tensor and pipeline parallelism methods achieve, by noticing that the previously non-parallelised regions of a transformer layer (such as LayerNorm and dropout) are independent along the sequence dimension and can therefore be split across GPUs.
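
A minimal single-process sketch of the underlying idea, assuming a [sequence, batch, hidden] activation layout: LayerNorm acts independently on each token position, so the sequence dimension can be sharded across ranks rather than replicating the full activation on each one. The torch.chunk call stands in for a real distributed scatter.

```python
# Single-process sketch of sequence parallelism for a per-token op.
import torch
import torch.nn as nn

seq_len, batch, hidden = 2048, 4, 1024
tp_ranks = 8  # hypothetical tensor-parallel group size

x = torch.randn(seq_len, batch, hidden)
layer_norm = nn.LayerNorm(hidden)

# Without sequence parallelism: every rank holds the full activation.
full = layer_norm(x)

# With sequence parallelism: each rank holds only seq_len / tp_ranks tokens.
shards = torch.chunk(x, tp_ranks, dim=0)
sharded_out = [layer_norm(s) for s in shards]  # each rank computes its slice

# The sharded results concatenate back to the full output, so correctness
# is preserved while per-rank activation memory drops by a factor of tp_ranks.
assert torch.allclose(full, torch.cat(sharded_out, dim=0), atol=1e-6)
print(f"full activation: {tuple(full.shape)}, "
      f"per-rank slice: {tuple(sharded_out[0].shape)}")
```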

Selective Activation Recomputation (SAR) and the Hyperparameter Tool

Selects activations with high memory but low compute requirements to recompute when memory constraints are too tight, avoiding the inefficiency of full activation recomputation. The hyperparameter tool introduced in NeMo Megatron automatically identifies optimal training and inference configurations, targeting either the highest model throughput or the lowest inference latency, and eliminating the time otherwise spent searching for an optimal design – all with no code changes required.
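
Here is a minimal sketch of the selective-recomputation idea using PyTorch's generic torch.utils.checkpoint utility rather than NeMo Megatron's internal implementation: only the memory-hungry but cheap-to-recompute attention part of the block is checkpointed, while the MLP keeps its activations as normal.

```python
# Sketch: recompute only the attention activations, keep the MLP's.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class SelectivelyCheckpointedBlock(nn.Module):
    def __init__(self, hidden: int, heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads)
        self.mlp = nn.Sequential(nn.Linear(hidden, 4 * hidden),
                                 nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))

    def _attention(self, x):
        # The attention map is large (O(seq^2)) but cheap to recompute,
        # making it a good candidate for recomputation.
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x):
        # Checkpointing frees the attention activations after the forward
        # pass; they are recomputed on demand during the backward pass.
        x = x + checkpoint(self._attention, x, use_reentrant=False)
        # The MLP's activations are stored as usual (no recomputation).
        return x + self.mlp(x)

# Usage: gradients flow normally; only attention is recomputed on backward.
block = SelectivelyCheckpointedBlock(hidden=256, heads=8)
x = torch.randn(128, 2, 256, requires_grad=True)  # [seq, batch, hidden]
block(x).sum().backward()
```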

[Figure: Amount of computation overhead]

And with my personal passion for democratising access to AI, it is meaningful to note that BLOOM – which stands for BigScience Large Open-science Open-access Multilingual Language Model and is the “world’s largest open-science, open-access multilingual language model” – has been trained using Megatron-DeepSpeed. This has been supported by the work of 1,000+ volunteer researchers as part of the BigScience Project, coordinated by Hugging Face and funded by the French government – and it already enables text generation in 46 languages and 13 programming languages! With the performance updates to Megatron now available, alongside the business benefits, I hope this will also help further advance change in the culture of AI development and support democratising access to cutting-edge AI technology for researchers around the world. Exciting times indeed!

Final Thoughts

The advances to NeMo Megatron will significantly speed up LLM training and inference, making both easier and reproducible on a wide range of GPU cluster configurations – I am excited to see all the new applications this will bring, and meanwhile the latest technical blog on all the changes can be viewed here. You can also explore early access options and freely test-drive the enterprise hands-on lab through NVIDIA too!

And as a final thought, don’t miss the chance to connect with AI developers and innovators at #GTC22 this Sept 19-22 and help shape the future of Artificial Intelligence, Large Language Models, Computer Graphics, Accelerated Computing and more. You can register for free with my unique code for a chance to win an RTX 3080 Ti – just click here!

Thank you for reading – all feedback most welcomed, Sally

About the Author

Prof. Sally Eaves is a highly experienced chief technology officer, professor in advanced technologies, and a Global Strategic Advisor on digital transformation specializing in the application of emergent technologies, notably AI, 5G, cloud, security, and IoT disciplines, for business and IT transformation, alongside social impact at scale.

An international keynote speaker and author, Sally was an inaugural recipient of the Frontier Technology and Social Impact award, presented at the United Nations, and has been described as the "torchbearer for ethical tech", founding Aspirational Futures to enhance inclusion, diversity, and belonging in the technology space and beyond. Sally is also the chair for the Global Cyber Trust at GFCYBER.
