Neural Networks Help Classify Reservoirs by Recognizing Cuttings Lithologies


Drill-cuttings and core images often present classification problems. Developing an unbiased, objective system that can overcome the issues behind these difficulties is an important goal. Advances during the past decade in using convolutional neural networks (CNNs) for visual recognition of distinctly different objects mean that object recognition can now be achieved to a significant extent. The benefit of such a system would be improved reservoir understanding: All available images would be classified in a consistent manner, keeping characterization consistent as well.


A CNN is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data. CNNs are powerful artificial-intelligence systems that use deep learning to perform both generative and descriptive tasks, often using machine vision that includes image and video recognition, along with recommender systems and natural-language processing. A neural network is a system of hardware or software patterned after the operation of neurons in the human brain. Traditional neural networks are not ideal for image processing and must be fed images in reduced-resolution pieces. CNNs have their “neurons” arranged more like those of the visual cortex, the area of the brain responsible for processing visual stimuli in humans and other animals.

Currently, researchers can train their deep-learning models in such a way that overfitting can be avoided. The network is composed of a typical shallow neural network whose inputs are generated by a feature-learning procedure. This procedure is essentially a preprocessing of the raw image that breaks it down successively into key features that can be used to differentiate it from other object classes. A CNN uses a system much like a multilayer perceptron that has been designed for reduced processing requirements. The layers of a CNN consist of an input layer, an output layer, and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers, and normalization layers.

In a CNN, the convolution layer does most of the computational heavy lifting. When dealing with high-dimensional inputs such as images, connecting neurons to all neurons in the previous volume is impractical. Instead, each neuron is connected only to a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron. The parameters of this layer are a set of learnable filters. Every filter is small spatially but extends through the full depth of the input volume. As the filter slides over the width and height of the input volume, a 2D activation map is produced that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when some type of visual feature is recognized. This results in an entire set of filters in each convolution layer (e.g., 12 filters), and each of them will produce a separate 2D activation map. These activation maps are stacked along the depth dimension and produce the output volume.
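The sliding-filter operation can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: It assumes a single-channel 5×5 image, a stride of one, no padding, and two hand-made 3×3 filters (in a real CNN the filter values are learned and each filter spans the full input depth).

```python
def conv2d_valid(image, kernel):
    """Slide one 2D filter over a 2D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-made filters; a trained network would learn these values.
filters = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

# One activation map per filter; together they form the output volume.
activation_maps = {name: conv2d_valid(image, k) for name, k in filters.items()}
```

Here the vertical-edge filter responds where the brightness changes from left to right, while the horizontal-edge filter returns zeros everywhere because the toy image has no horizontal edge.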

Convolutional layers usually are followed by a nonlinear activation function such as a rectified linear unit (ReLU). The ReLU layer does not change the size of its input. The activation maps generated from the convolution layer are mapped onto the ReLU transfer function. The convolution and ReLU layers act as feature-extraction layers. A ReLU layer performs a threshold operation on each element, in which any input value less than zero is set to zero and any input value greater than zero keeps its value.
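The threshold operation is simple enough to show directly. The sketch below applies it element by element to a small made-up activation map; the values are illustrative only.

```python
def relu(activation_map):
    """Set negative entries to zero; leave positive entries unchanged."""
    return [[max(0.0, value) for value in row] for row in activation_map]

# Made-up 2x2 activation map with mixed signs.
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
rectified = relu(feature_map)
```

The output has the same shape as the input, consistent with the statement that the ReLU layer does not change the size of its input.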

The resulting volume then is subject to feature selection by the next operation, referred to by the authors as “max pooling.” Essentially, max pooling seeks to identify the most important features within a volume. Its function is to reduce the spatial size of the representation progressively, which reduces the number of parameters to learn and the amount of computation in the network and, thus, helps to control overfitting.

Fig. 1 provides an example of how the CNN processes images internally to lead to a certain classification prediction. The initial convolutional layers typically identify edges (both horizontal and vertical); thus, the outline of the individual cuttings is well defined. As one progresses deeper through the network, subtler features are delineated. For example, the ReLU layer displays a black-and-white contrast. This is because inputs less than zero are given zero activations by the ReLU transfer function; thus, these pixels are black.
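The max-pooling step described above can be sketched as follows. The 2×2 window and stride of two are assumptions chosen for illustration (the article does not state the pooling size used), and the activation values are made up.

```python
def max_pool2d(activation_map, size=2, stride=2):
    """Keep only the largest value in each size-by-size window."""
    out_h = (len(activation_map) - size) // stride + 1
    out_w = (len(activation_map[0]) - size) // stride + 1
    return [
        [
            max(activation_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Made-up 4x4 activation map.
activation_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool2d(activation_map)
```

With these settings a 4×4 map shrinks to 2×2, so every later layer has a quarter as many inputs to process.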

Fig. 1—Example image of calcareous sandy clay being processed by the CNN.

Data Set and Modeling

The data set used in this paper consisted of drill-cuttings photographs from several wells, subdivided into a dozen lithological classes. The data set was then reduced to four classes with enough images to train the CNN. Twenty randomly chosen images from each lithology class were used in the training process of the methods discussed in the complete paper, and the remainder were used for validation. Multiple modeling methods were attempted, including the following.
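A per-class split of this kind might look like the sketch below. The file names, image counts, and random seed are hypothetical placeholders; only the rule of 20 randomly chosen training images per class comes from the text.

```python
import random

def split_per_class(images_by_class, n_train=20, seed=0):
    """Pick n_train random images per class for training; keep the rest for validation."""
    rng = random.Random(seed)
    train, validation = {}, {}
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        validation[label] = shuffled[n_train:]
    return train, validation

# Hypothetical file lists; the real image counts per class are not given here.
images_by_class = {
    "calcareous sandy clay": [f"csc_{i:03d}.jpg" for i in range(45)],
    "glauconitic silty clay": [f"gsc_{i:03d}.jpg" for i in range(26)],
}
train, validation = split_per_class(images_by_class)
```

Because the training count is fixed, a class with few images is left with a very small validation set, which makes its validation score less reliable.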

Support-Vector-Machine (SVM) Classification Using Features Extracted From AlexNet. Before embarking upon the long process of training a CNN, it is worth testing whether the image set can be classified by quicker means. One such technique is to use a pretrained CNN developed on a different, or even a similar, image set. The new images of interest are fed into the pretrained network, and the activations on the first fully connected layer are extracted for classification using an SVM. One popular pretrained network is AlexNet. In this paper, drill-cuttings images were fed into AlexNet and the activations from the first fully connected layer were output to be used as input to an SVM. The SVM with linear kernels was then trained to classify the lithologies. The advantage of this approach is that it can be completed within seconds. The disadvantage is that the SVM performed very well on the training set but poorly on the validation set: It had a 100% match rate on the training set but only 62% on the validation set.
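The final step of this workflow, fitting a linear SVM to feature vectors and comparing match rates on the training and validation sets, can be sketched as below. Several things here are assumptions made for illustration: The feature vectors are synthetic 16-value stand-ins rather than AlexNet activations, only two classes are used, and the SVM is fitted by plain subgradient descent on the regularized hinge loss rather than by whatever solver the authors used.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels -1/+1): subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # margin violated, so the hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def match_rate(X, y, w, b):
    """Fraction of samples whose predicted sign equals the label."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += (1 if score >= 0 else -1) == yi
    return hits / len(X)

def fake_features(rng, center, count, dim=16):
    """Synthetic stand-in for activations extracted from a pretrained network."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(count)]

rng = random.Random(0)
X_train = fake_features(rng, 1.0, 20) + fake_features(rng, -1.0, 20)
y_train = [1] * 20 + [-1] * 20
X_val = fake_features(rng, 1.0, 30) + fake_features(rng, -1.0, 30)
y_val = [1] * 30 + [-1] * 30

w, b = train_linear_svm(X_train, y_train)
train_rate = match_rate(X_train, y_train, w, b)
val_rate = match_rate(X_val, y_val, w, b)
```

Comparing `train_rate` with `val_rate` is the same check that exposed the gap reported in the text; the synthetic data here is much easier to separate than real cuttings images, so it will not reproduce that gap.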

Transfer Learning Using AlexNet. In transfer learning, a pretrained CNN is used as a starting point for training on a different image set. It is assumed that the pretrained network already has filters trained for edge and contrast detection and that application to a new image set would require minimal additional training. Training on large image sets is computationally intensive, often requiring high-end graphics cards. In the case of AlexNet, 60 million parameters must be trained, and the larger the image size, the longer this will take. The training time to develop AlexNet was 6 days on two graphics cards running simultaneously. AlexNet was retrained on the drill-cuttings images. The training time was on the order of minutes, and, while the classification level was good on the training set (87%), the network did not perform well on the validation set (67%).
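The 60-million figure can be checked with layer-by-layer arithmetic. The layer shapes below are not given in this article; they are assumed from the original two-graphics-card AlexNet design, in which three of the convolution layers see only half of the previous layer's channels and the output layer has 1,000 classes.

```python
def conv_params(n_filters, kernel, in_channels):
    """Filter weights plus one bias per filter."""
    return n_filters * kernel * kernel * in_channels + n_filters

def dense_params(n_inputs, n_outputs):
    """Weight matrix plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# Assumed shapes from the original two-GPU AlexNet (not stated in the article).
alexnet_layers = {
    "conv1": conv_params(96, 11, 3),
    "conv2": conv_params(256, 5, 48),    # sees half of conv1's 96 channels
    "conv3": conv_params(384, 3, 256),
    "conv4": conv_params(384, 3, 192),   # sees half of conv3's 384 channels
    "conv5": conv_params(256, 3, 192),   # sees half of conv4's 384 channels
    "fc6": dense_params(6 * 6 * 256, 4096),
    "fc7": dense_params(4096, 4096),
    "fc8": dense_params(4096, 1000),
}
total_parameters = sum(alexnet_layers.values())
fc_parameters = sum(alexnet_layers[name] for name in ("fc6", "fc7", "fc8"))
fc_share = fc_parameters / total_parameters
```

Under these assumptions the total comes to roughly 61 million, with the fully connected layers accounting for well over 90% of it.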

Bayesian Optimized Network. A new network was created from scratch. Initially, a single sequence of convolution/batch normalization/ReLU/max pooling was used. After the convolutional sequences, the end filter cube was flattened into a fully connected ReLU layer connected to a softmax layer for classification. More convolutional-layer sequences were added until a reasonably good classification percentage was achieved on the training set. The total number of parameters in this network is approximately 830,000. This set of network weights was further refined by Bayesian optimization on several hyperparameters of the CNN. Because the number of images is fairly small, the range for the initial learning rate was set low and the range for regularization was set high to prevent overfitting. The result was 100% success on the training set and 82% on the validation set. With the exception of glauconitic silty clay, all classes are predicted fairly well. This highlights the paucity of images available for glauconitic silty clay and establishes that more images need to be collected for this class.
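How the number of learnable weights grows as sequences are stacked can be traced with simple bookkeeping. Every size in this sketch is hypothetical (64×64 three-channel input, 3×3 filters, size-preserving convolutions, 2×2 pooling, a 64-unit fully connected layer, four classes); the article does not give the actual architecture, so the result is not expected to match the authors' network.

```python
def count_parameters(input_hw, in_channels, conv_filters,
                     kernel=3, fc_units=64, n_classes=4):
    """Count learnable weights for stacked conv/batch-norm/ReLU/max-pool sequences."""
    h, w = input_hw
    channels = in_channels
    total = 0
    for n_filters in conv_filters:
        total += n_filters * kernel * kernel * channels + n_filters  # conv weights + biases
        total += 2 * n_filters                                       # batch-norm scale and shift
        channels = n_filters
        h, w = h // 2, w // 2  # padded conv keeps the size; 2x2 max pooling halves it
    flattened = h * w * channels               # the end filter cube, flattened
    total += flattened * fc_units + fc_units   # fully connected ReLU layer
    total += fc_units * n_classes + n_classes  # softmax classification layer
    return total, flattened

# Hypothetical three-sequence network.
total, flattened = count_parameters((64, 64), 3, [8, 16, 32])
```

In this toy configuration almost all of the weights sit in the fully connected layer that follows the flattening step, which is why the pooling layers have such a large effect on network size.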

Ensemble CNN. While a single network may not be able to learn or predict consistently well, taking the consensus from an ensemble provides much higher confidence in the final classification. To this end, four Bayesian optimized networks were created that had different final parameter sets. All four were then used to generate a consensus on the final classification. The classification success on the training set is 98%, while the validation set saw 73.5%, notably lower than the 82% of the single Bayesian optimized network.
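One simple way to form such a consensus is a majority vote, sketched below. The article does not say how the authors combined the four networks, so the voting rule is an assumption, and the predictions are made up.

```python
from collections import Counter

def consensus(predictions):
    """Return the most common label and the fraction of networks that chose it."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]  # ties go to the label seen first
    return label, votes / len(predictions)

# Made-up predictions from four networks for one image.
network_predictions = [
    "calcareous sandy clay",
    "calcareous sandy clay",
    "glauconitic silty clay",
    "calcareous sandy clay",
]
label, agreement = consensus(network_predictions)
```

The agreement fraction gives a rough indication of how much weight to place on the consensus label for a given image.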

Generally speaking, one or two networks may misclassify; however, if all the networks misclassify, then further investigation is warranted. Images are sometimes misclassified by the human interpreter and require a second look. When this is performed, the image can be classified correctly and the network retrained. The ensemble is thus updated and its ability to correctly classify in the future is enhanced.

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 196675, “Visual Recognition of Drill-Cuttings Lithologies Using Convolutional Neural Networks To Aid Reservoir Characterization,” by Muhammad Kathrada, SPE, and Benjamin Jacob Adillah, Petronas, prepared for the 2019 SPE Reservoir Characterization and Simulation Conference and Exhibition, Abu Dhabi, 17–19 September. The paper has not been peer reviewed.