Generative artificial intelligence (AI) is the art of creating new things inspired by the old.
To understand what generative AI does, ask yourself what you would do if you wanted to create a new hit song.
You wouldn't want to start from scratch. Instead, you'd listen to 100 popular songs and analyze their rhythms, melodies, and chord progressions.
You would learn how these elements are structured and combined. Then, with this knowledge, you would use your creativity to mix and match them in new and exciting ways, resulting in a fresh, original song that may be the next big hit.
That's essentially what generative AI does.
It devours massive amounts of training data, whether it's music, images, text, or scientific data. It learns the underlying patterns and relationships within that data. Then, it uses this knowledge to generate entirely new content that shares characteristics with the training data, but with a unique twist.
Generative AI operates like a master mimic, following a three-step process to create entirely new content.
1. Find the building blocks. As mentioned, generative AI begins by analyzing vast amounts of training data. The goal is to identify the underlying building blocks and structures that define the data. Think of it like learning the grammar of a language, the architectural styles that make up different cityscapes, or the way different rooms in a house contain different objects (kitchens hold different items than bedrooms do). For instance, generative AI trained on cat images would learn to recognize distinct features like ears, whiskers, and fur patterns.
2. Decipher the unwritten rules. Next, it calculates the conditional probability distribution of these structures. In other words, it finds the order of, and relationships among, the various structures and building blocks. Imagine that kitchen again: knives are more likely to be found near cutting boards than, say, pillows, and a sofa typically sits in front of the TV in the living room. Generative AI quantifies such relationships, which allows it to understand the underlying rules of how likely certain elements are to co-occur and in what order.
3. Perform a bold and creative remix. Finally, armed with this knowledge of building blocks and the rules of their order and structure, generative AI can create entirely new samples. It combines the learned structures and their conditional probabilities in novel ways, producing something entirely new yet rooted in the representations extracted from the training data. This could be a photorealistic image of a cat with unique markings, a catchy song inspired by popular hits, or even a compelling scientific hypothesis based on existing research.
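To make these three steps concrete, here is a deliberately tiny sketch in Python: a first-order Markov chain that learns chord-to-chord conditional probabilities from a few made-up progressions and then samples a new one. The chord data, the single-step dependency, and the function names are illustrative assumptions; real generative models learn far richer structure, but the learn-the-rules-then-remix logic is the same.

```python
import random
from collections import defaultdict

# Step 1: the "building blocks" are individual chords taken from a few
# hypothetical training progressions (illustrative data, not real songs).
training_songs = [
    ["C", "G", "Am", "F", "C", "G", "F", "C"],
    ["Am", "F", "C", "G", "Am", "F", "G", "C"],
    ["C", "F", "G", "C", "Am", "G", "F", "C"],
]

# Step 2: "decipher the unwritten rules" by counting how often each chord
# follows another, i.e., estimating the conditional probability P(next | current).
transition_counts = defaultdict(lambda: defaultdict(int))
for song in training_songs:
    for current, nxt in zip(song, song[1:]):
        transition_counts[current][nxt] += 1

def next_chord(current):
    """Sample the next chord in proportion to its learned conditional probability."""
    options = transition_counts[current]
    chords, counts = zip(*options.items())
    return random.choices(chords, weights=counts, k=1)[0]

# Step 3: "remix" -- generate a new progression that follows the learned rules
# but is not a copy of any training song.
def generate_song(start="C", length=8):
    song = [start]
    for _ in range(length - 1):
        song.append(next_chord(song[-1]))
    return song

print(generate_song())   # e.g., ['C', 'G', 'Am', 'F', 'G', 'C', 'F', 'C']
```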
State of the Art
Success stories are growing in number, and hype is building around the state-of-the-art use of generative AI by engineers. The field is evolving rapidly, with innovative use cases pushing the boundaries of engineering into new realms. Here are some ways engineers can benefit from the creativity of generative AI.
1. Engineering design of infrastructure. Generative AI helps engineers explore a vast design space while accounting for constraints such as materials, cost, style, and space. AI-driven design tools can generate innovative candidate designs that optimize performance and efficiency, leading to infrastructure that is cost-effective as well as structurally sound.
2. Discovering new materials. By analyzing extensive material databases, generative AI helps discover new compositions with specific properties, such as materials for stronger and lighter batteries, advanced CO2 filtration systems, and effective water-purification methods. In conjunction with atomic, molecular, and interfacial simulations, these generative models can predict the properties of new materials, significantly accelerating the discovery process and enabling engineers to develop materials that meet precise specifications and performance criteria.
3. Data summarization. Companies are using generative AI to process vast amounts of historical reports, project updates, emails, and decisions, providing a holistic view of the engineering process and the consequences of human decision making. These solutions help engineers answer questions such as "How will production change when operations are changed in a particular manner?" and "What are the effects on customer feedback due to the changes in operations?" By analyzing connections across many projects and reports over long timescales, generative AI helps engineers understand the impact of their decisions, enabling better strategic planning, optimization, and truly data-driven decision making.
4. Code generation. Generative AI automates repetitive coding tasks, enabling engineers to focus on more complex problems. It can generate visualizations and perform data analysis, speeding up the development process. Such tools also allow rapid prototyping to test concepts without extensive coding knowledge and assist in understanding and navigating existing codebases, making the integration of new features smoother.
5. Product design. Generative AI aids in designing user interfaces and experiences that improve usability and usage monitoring, resulting in products that are more intuitive and user-friendly. It also improves product designs that streamline workflows and automate repetitive tasks, increasing efficiency and reducing the likelihood of human error.
6. Product testing. Generative AI generates new test cases for software and hardware to rigorously identify potential bugs and functional issues. It can be tuned to produce challenging edge and failure cases, including extreme conditions and unusual inputs, that are often missed during manual, conventional testing. This thorough testing enhances the reliability, performance, and robustness of the final product.
Digging Deeper Into the Inner Workings
There are six distinct approaches to implementing generative AI, each rooted in its own set of underlying principles geared toward offering innovative solutions.
Generative adversarial networks (GANs). Imagine two neural networks locked in a creative duel. One network, the "composer," aims to create a new hit song, while the other, the "critic," evaluates its efforts. The critic's role is to point out if the song sounds outdated or boring, pushing the composer to improve. Through this continuous feedback loop, both networks become more refined in their roles. The composer learns to create increasingly sophisticated and appealing music, while the critic sharpens its evaluative skills. Ultimately, this competition drives the composer to produce truly exceptional new music, embodying the dynamic and innovative spirit of GANs.
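As a rough illustration of this duel, the sketch below pits a tiny generator (the "composer") against a tiny discriminator (the "critic") on a toy 1-D task using PyTorch. The network sizes, the Gaussian stand-in for "training data," and the training settings are assumptions chosen for brevity, not a production GAN.

```python
import torch
import torch.nn as nn

# Minimal GAN sketch: the generator learns to turn random noise into samples
# that resemble a simple 1-D target distribution, while the critic learns to
# tell real samples from generated ones.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
critic    = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def real_samples(n):
    # Stand-in "training data": numbers drawn from a Gaussian centered at 4.
    return torch.randn(n, 1) * 0.5 + 4.0

for step in range(2000):
    # Critic update: reward correct real/fake judgments.
    real = real_samples(64)
    fake = generator(torch.randn(64, 8)).detach()
    c_loss = loss_fn(critic(real), torch.ones(64, 1)) + \
             loss_fn(critic(fake), torch.zeros(64, 1))
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()

    # Generator update: reward fooling the critic.
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(critic(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(generator(torch.randn(5, 8)).detach().squeeze())  # samples should cluster near 4
```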
Variational autoencoders (VAEs). VAEs operate like masterful distillers of data. They use a series of filters to progressively compress and refine information, extracting its most essential elements, the building blocks. For instance, when analyzing Western classical music, a VAE can isolate the piano sound as its core essence. In contrast, when processing Indian classical music, it can identify the distinctive presence of the sitar. This method excels in capturing the fundamental features of diverse data types.
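A minimal sketch of this distillation, assuming toy 16-dimensional data and a 2-dimensional latent code, might look as follows in PyTorch. The encoder compresses each input to a small code, the reparameterization step samples from that code, and the decoder reconstructs the input; all sizes and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE sketch: an encoder "distills" each input into a small latent code
# (a mean and a spread), and a decoder reconstructs the input from that code.
class TinyVAE(nn.Module):
    def __init__(self, input_dim=16, latent_dim=2):
        super().__init__()
        self.enc = nn.Linear(input_dim, 32)
        self.mu = nn.Linear(32, latent_dim)        # center of the latent code
        self.log_var = nn.Linear(32, latent_dim)   # spread of the latent code
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                 nn.Linear(32, input_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: sample a latent code in a differentiable way.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var

vae = TinyVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)

x = torch.randn(64, 16)                 # stand-in "training data"
for _ in range(200):
    recon, mu, log_var = vae(x)
    # Loss = reconstruction error + a penalty keeping the latent codes well behaved.
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    loss = F.mse_loss(recon, x) + 0.01 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# New samples come from decoding random latent codes.
print(vae.dec(torch.randn(3, 2)))
```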
Autoregression. Autoregressive models learn to predict each component of data by understanding the relationship between it and its neighboring components. Imagine looking at an image where the body of a duck is surrounded by water pixels, or a mountain is adjacent to blue sky and clouds. Autoregressive models excel at these predictions, using the context provided by nearby pixels to determine what comes next. This method is especially powerful in sequential data, such as text or time series, where the prediction of the next element is based on its preceding elements. By focusing on these local relationships, autoregressive models can generate coherent and contextually appropriate sequences, whether in images, audio, or text.
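The sketch below illustrates the idea with an assumed toy vocabulary of scene elements: a small PyTorch model predicts the next "token" from the two tokens that precede it. The vocabulary, the three training "scenes," and the fixed two-token context are illustrative assumptions, not a full autoregressive model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Autoregression sketch: predict each element of a sequence from the elements
# immediately before it.
vocab = ["water", "duck", "sky", "cloud", "mountain"]
to_id = {w: i for i, w in enumerate(vocab)}

# Toy "scenes" used as training sequences (illustrative data only).
sequences = [["water", "duck", "water"],
             ["sky", "cloud", "sky"],
             ["mountain", "sky", "cloud"]]

class NextToken(nn.Module):
    def __init__(self, vocab_size, context=2, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(context * dim, vocab_size)

    def forward(self, ctx):                         # ctx: (batch, context) token ids
        return self.out(self.emb(ctx).flatten(1))   # logits over the next token

model = NextToken(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Build (context, target) pairs: the first two tokens predict the third.
ctx = torch.tensor([[to_id[s[0]], to_id[s[1]]] for s in sequences])
tgt = torch.tensor([to_id[s[2]] for s in sequences])

for _ in range(300):
    loss = F.cross_entropy(model(ctx), tgt)
    opt.zero_grad(); loss.backward(); opt.step()

# Given "water", "duck", pick the most probable continuation.
probs = F.softmax(model(torch.tensor([[to_id["water"], to_id["duck"]]])), dim=-1)
print(vocab[int(probs.argmax())])   # likely "water"
```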
Diffusion. Diffusion models approach data generation by learning how noise affects good data and then reversing this process. The training begins by quantifying how adding noise alters high-quality data. Once this understanding is in place, the model learns to progressively clean a noisy, blurry mess into a clear and accurate representation, whether it's an image, text, or audio. This method is notably efficient, requiring less data to train while still producing excellent results across various tasks. The simplicity and effectiveness of diffusion models make them a powerful tool for generating high-quality data from seemingly chaotic beginnings.
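Here is a heavily simplified sketch of that noise-and-reverse idea on 1-D toy data in PyTorch. The linear noise schedule, the 10-step sampler, and the stand-in data are assumptions chosen to keep the example short; practical diffusion models use carefully designed schedules and far larger networks.

```python
import torch
import torch.nn as nn

# Diffusion sketch: corrupt clean data with a known amount of noise, then train a
# small network to predict that noise so the corruption can be undone step by step.
denoiser = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def clean_data(n):
    return torch.randn(n, 1) * 0.1 + 2.0          # stand-in "high-quality data" near 2.0

for _ in range(2000):
    x0 = clean_data(64)
    t = torch.rand(64, 1) * 0.9                   # noise level between 0 and 0.9
    noise = torch.randn_like(x0)
    xt = torch.sqrt(1 - t) * x0 + torch.sqrt(t) * noise   # forward (noising) step
    pred = denoiser(torch.cat([xt, t], dim=1))    # predict the added noise from (xt, t)
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Reverse (denoising) process: start from near-pure noise and walk back toward clean data.
x = torch.randn(5, 1)
for step in range(9, 0, -1):
    t = torch.full((5, 1), step / 10)
    pred_noise = denoiser(torch.cat([x, t], dim=1))
    x0_est = (x - torch.sqrt(t) * pred_noise) / torch.sqrt(1 - t)
    t_next = torch.full((5, 1), (step - 1) / 10)
    # Re-noise the estimate at the next, lower noise level (zero noise at the last step).
    x = torch.sqrt(1 - t_next) * x0_est + torch.sqrt(t_next) * torch.randn_like(x)
print(x.squeeze())                                # samples should cluster near 2.0
```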
Recurrent neural networks (RNNs). RNNs are designed to handle sequences by quantifying long-range dependencies and the order of the basic building blocks of a sequence. This method processes a sequence one element at a time, making it particularly adept at understanding and generating sequential data. Imagine listening to Indian classical music: an RNN can determine that a sitar sound will likely be followed by a flute, and then a tabla beat will come in alongside the sitar. By capturing these intricate dependencies and sequences, RNNs excel in tasks where order and context are critical, such as language translation, speech recognition, and music composition.
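A toy version of this musical example, assuming a three-word "instrument vocabulary" and a single repeating training phrase, could look like the following PyTorch sketch. The model reads the phrase one element at a time, then generates a new sequence by feeding each prediction back in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# RNN sketch: a recurrent network reads a sequence one step at a time, carrying a
# hidden "memory" that lets it learn which element tends to follow which.
instruments = ["sitar", "flute", "tabla"]
to_id = {w: i for i, w in enumerate(instruments)}

# Toy training phrase echoing the example in the text (illustrative data only).
phrase = ["sitar", "flute", "tabla", "sitar", "flute", "tabla", "sitar", "flute"]
ids = torch.tensor([to_id[w] for w in phrase])

class TinyRNN(nn.Module):
    def __init__(self, vocab, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.RNN(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x, h=None):
        out, h = self.rnn(self.emb(x), h)
        return self.out(out), h

model = TinyRNN(len(instruments))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)   # predict the next step
for _ in range(300):
    logits, _ = model(inputs)
    loss = F.cross_entropy(logits.squeeze(0), targets.squeeze(0))
    opt.zero_grad(); loss.backward(); opt.step()

# Generate a short sequence one element at a time, feeding each output back in.
seq, h = [to_id["sitar"]], None
for _ in range(5):
    logits, h = model(torch.tensor([[seq[-1]]]), h)
    seq.append(int(logits[0, -1].argmax()))
print([instruments[i] for i in seq])   # e.g., ['sitar', 'flute', 'tabla', ...]
```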
Transformers. Transformers revolutionize sequence processing by attending to all the elements of a sequence (and even multiple sequences) in parallel, rather than one element at a time. This allows them to find dependencies and the order of basic elements both within and between sequences. Think of a stage performance with multiple actors: instead of a single spotlight following one actor, multiple spotlights illuminate different actors simultaneously, adjusting their intensity based on each actor's dialogue and interactions. This parallel processing enables transformers to capture the dynamics and relationships of all actors on stage at once. As a result, transformers are incredibly powerful in handling complex tasks like natural language processing, where understanding the context of entire paragraphs or even documents at once is crucial. Their efficiency and ability to model long-range dependencies have made them the backbone of state-of-the-art AI applications, including language models like GPT and BERT.
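The heart of this parallel "spotlight" mechanism is scaled dot-product self-attention, sketched below for a single head in PyTorch. The toy sequence, the dimensions, and the random weight matrices are assumptions; real transformers stack many such layers with multiple heads.

```python
import torch
import torch.nn.functional as F

# Transformer sketch: scaled dot-product self-attention, the operation that lets
# every position in a sequence "shine a spotlight" on every other position at once.
def self_attention(x, Wq, Wk, Wv):
    """x: (sequence_length, model_dim); Wq/Wk/Wv: (model_dim, head_dim)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / k.shape[-1] ** 0.5     # how strongly each position attends to each other one
    weights = F.softmax(scores, dim=-1)       # each row sums to 1: the "spotlight intensities"
    return weights @ v                        # blend values in proportion to attention

# A toy "sentence" of 4 positions, each an 8-dimensional embedding (assumed data).
torch.manual_seed(0)
x = torch.randn(4, 8)
Wq, Wk, Wv = (torch.randn(8, 8) for _ in range(3))

out = self_attention(x, Wq, Wk, Wv)
print(out.shape)   # torch.Size([4, 8]): every position updated using all positions in parallel
```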
Challenges in Generative AI Development and Use
Generative AI holds immense promise, but several hurdles need to be overcome for its widespread adoption.
1. Data dilemma. Generative AI models thrive on vast amounts of data for self-supervised learning. However, acquiring this data can be a significant challenge.
- The sheer volume of data needed can be staggering. Imagine training a model to generate realistic images—it might require millions of labeled images encompassing diverse objects, scenes, and lighting conditions.
- Data quality is paramount. Biased, noisy, or incomplete data can lead to models that perpetuate these flaws in their outputs. For instance, a model trained on poorly labeled medical images might generate inaccurate diagnoses.
- Data collection can be expensive and time-consuming. Gathering sensor data from real-world environments presents challenges like sensor limitations, calibration issues, and potential errors. Additionally, the long-term nature of data collection can lead to inconsistencies as data formats and sources evolve.
2. Curse of dimensionality. Spatial data, like images or 3D models, often exists in high dimensions. While this richness allows for capturing complex details, it also presents difficulties for generative AI.
- Finding the needle in the haystack. Identifying the relevant features and relationships within these high-dimensional datasets becomes complex. Imagine trying to teach a model to generate realistic buildings—it needs to understand not just the shapes of walls and windows, but also their spatial relationships and interactions with light and shadows.
3. Computational crunch. Training large generative models requires immense computing power, a demand exacerbated by factors such as the following:
- The training process can take months and require vast amounts of processing power, measured in hundreds of petaflop-days (a rough back-of-the-envelope estimate is sketched after this list).
- The sheer volume of data used further strains computational resources. Large neural networks with billions of parameters require significant processing power to learn and adapt.
- Tuning the many hyperparameters (settings that control the model's behavior) adds another layer of complexity to the training process.
4. Overfitting and mode collapse. These issues can occur when generative models:
- Become too focused on memorizing the training data instead of learning underlying patterns. This can lead to models that struggle to generate novel or creative outputs.
- Get stuck in a rut, repeatedly generating the same type of output. Imagine training a model to generate images of birds, but it always produces pictures of pigeons, neglecting the variety of other bird species.
5. Bias inheritance. Generative models can inadvertently inherit biases present in their training data. This can lead to discriminatory or unfair outputs. For instance, a model trained on a dataset with mostly male programmers might generate code that reflects a gender bias.
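To give a feel for the petaflop-days mentioned under the computational crunch above, here is a rough back-of-the-envelope estimate using the widely cited rule of thumb that training a transformer costs roughly 6 floating-point operations per parameter per training token. The model and dataset sizes below are illustrative assumptions, not figures for any particular system.

```python
# Rough training-compute estimate using the common "6 FLOPs per parameter per token" rule
# of thumb. The parameter count and token count are illustrative assumptions.
parameters = 10e9            # 10 billion parameters (assumed)
tokens = 300e9               # 300 billion training tokens (assumed)

total_flops = 6 * parameters * tokens            # ~1.8e22 floating-point operations
petaflop_day = 1e15 * 86_400                     # one petaflop sustained for one day

print(f"{total_flops / petaflop_day:.0f} petaflop-days")   # ~208 petaflop-days
```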
Siddharth Misra, SPE, is an associate professor at Texas A&M University who has published two books and developed nine technologies related to machine learning and electromagnetic sensing for energy and earth resource exploration. In 2018, he received the US Department of Energy Early Career Award, and in 2020 he was honored with four international awards for his contributions to exploration geophysics and subsurface engineering. Misra holds a bachelor of technology degree in electrical engineering from the Indian Institute of Technology, Bombay, and a PhD degree in petroleum and geosystems engineering from The University of Texas at Austin. He can be reached at misra@tamu.edu.