Automatic Lithofacies Classification with t-SNE and K-Nearest Neighbors Algorithm

Keywords: Facies prediction, Well logging, t-SNE

A critical step in hydrocarbon exploration is identifying and predicting the lithofacies that make up the reservoir. One of the cheapest and most efficient ways to carry out this step is to interpret well log data, which are often recorded continuously and are available for most drilled wells. The main methodologies used to correlate log data with data from well cores are based on statistical analysis, machine learning models and artificial neural networks. This study tests an unsupervised dimensionality-reduction algorithm combined with a nearest-neighbor classifier to predict lithofacies automatically, and compares its performance with predictions made by artificial neural networks. We used t-Distributed Stochastic Neighbor Embedding (t-SNE) to map the well log data into a lower-dimensional feature space, and then predicted the facies in that space with a k-nearest neighbors (KNN) algorithm. The method was evaluated on the public dataset of the Hugoton and Panoma fields. Facies prediction with a traditional artificial neural network reached an accuracy of 69%, whereas the t-SNE + KNN approach reached 79%. Because the data are high-dimensional and not linearly correlated, the better performance of t-SNE + KNN can be explained by the algorithm's ability to identify hidden patterns near the fuzzy boundaries between classes in the dataset. It is worth stressing that machine learning algorithms offer relevant benefits to the hydrocarbon exploration sector. They can identify hidden patterns in high-dimensional datasets and capture complex, non-linear relationships, and they avoid the need to define mathematical relations among the model's input data in advance.
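The abstract describes a two-stage pipeline. Standardized log samples are first mapped into a low-dimensional space with t-SNE, and each facies is then assigned by KNN voting in that space. The sketch below illustrates the idea using NumPy only. It is not the authors' code. Standard t-SNE does not learn a mapping that can be applied to new samples, so the sketch assumes that labeled (training) and unlabeled (test) samples are embedded together, with KNN voting using only the labeled points. All function names and settings are illustrative assumptions rather than values from the study. These include perplexity 30, k = 5, the learning rate, the iteration counts, PCA initialization and an early exaggeration of 4. The synthetic blobs merely stand in for real log curves. In practice, scikit-learn's `sklearn.manifold.TSNE` and `sklearn.neighbors.KNeighborsClassifier` would usually replace the hand-written parts.

```python
import numpy as np


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_steps=50):
    """Symmetric t-SNE input affinities P; one Gaussian bandwidth per point, found by bisection."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    target = np.log(perplexity)  # desired Shannon entropy in nats
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        d = D[i, others] - D[i, others].min()  # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 sigma_i^2)
        for _ in range(max_steps):
            p = np.exp(-d * beta)
            total = p.sum()
            entropy = np.log(total) + beta * np.sum(d * p) / total
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # too flat: narrow the kernel
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, others] = p / total
    P = (P + P.T) / (2.0 * n)  # p_ij = (p_j|i + p_i|j) / 2n
    return np.maximum(P, 1e-12)


def tsne(X, n_components=2, perplexity=30.0, n_iter=400, learning_rate=100.0):
    """Exact O(n^2) t-SNE: gradient descent on KL(P||Q) with momentum and adaptive gains.
    Defaults are illustrative assumptions, not the study's settings."""
    P = joint_probabilities(X, perplexity)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ vt[:n_components].T
    Y = Y / Y[:, 0].std() * 1e-4  # PCA initialization rescaled to a tiny spread
    velocity = np.zeros_like(Y)
    gains = np.ones_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        sq = np.sum(Y ** 2, axis=1)
        num = 1.0 / (1.0 + sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T)  # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)  # gradient of KL(P||Q) w.r.t. Y
        same_sign = (grad > 0) == (velocity > 0)
        gains = np.maximum(np.where(same_sign, gains * 0.8, gains + 0.2), 0.01)
        velocity = momentum * velocity - learning_rate * gains * grad
        Y = Y + velocity
    return Y


def knn_predict(Z_train, y_train, Z_query, k=5):
    """Majority vote among the k nearest labeled points (Euclidean distance in the embedding)."""
    d = np.sum((Z_query[:, None, :] - Z_train[None, :, :]) ** 2, axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])


def tsne_knn_facies(logs, facies, train_mask, k=5, perplexity=30.0):
    """Hypothetical pipeline: embed all depth samples jointly, then label held-out ones by KNN."""
    X = (logs - logs.mean(axis=0)) / logs.std(axis=0)  # standardize each log curve
    Z = tsne(X, perplexity=perplexity)
    return knn_predict(Z[train_mask], facies[train_mask], Z[~train_mask], k=k)


if __name__ == "__main__":
    # Synthetic stand-in for well logs: 3 "facies" x 4 log curves (not real data).
    rng = np.random.default_rng(0)
    centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 8]], dtype=float)
    logs = np.vstack([c + rng.normal(size=(40, 4)) for c in centers])
    facies = np.repeat(np.arange(3), 40)
    train_mask = rng.random(len(facies)) < 0.7
    pred = tsne_knn_facies(logs, facies, train_mask)
    print("held-out accuracy:", np.mean(pred == facies[~train_mask]))
```

Embedding every sample at once is a transductive choice. Separate t-SNE runs for the training and test sets would place the two sets in unrelated coordinate systems, which would make the nearest-neighbor distances between them meaningless.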

Author Biographies

Guilherme Loriato Potratz, Pontifícia Universidade Católica do Rio de Janeiro

Departamento de Engenharia Elétrica

Smith Washington Arauco Canchumuni, Pontifícia Universidade Católica do Rio de Janeiro

Departamento de Engenharia Elétrica

Jose David Bermudez Castro, Pontifícia Universidade Católica do Rio de Janeiro

Departamento de Engenharia Elétrica

Júlia Potratz, Pontifícia Universidade Católica do Rio de Janeiro

Departamento de Engenharia Elétrica, Programa de Pós-Graduação em Métodos de Apoio à Decisão

Marco Aurélio C. Pacheco, Pontifícia Universidade Católica do Rio de Janeiro

Departamento de Engenharia Elétrica