This paper presents the methodology and results of developing a prototype system for the automated extraction of structured information from unstructured textual reports of Production Geophysical Surveys (PGS) of oil wells. The core of the solution is a Qwen large language model (LLM), enhanced with a Retrieval-Augmented Generation (RAG) mechanism that supplies the model with context from external knowledge bases. A comparative analysis of two baseline LLMs (Qwen2.5-7B-Instruct and ruGPT-3.5-13B) showed that Qwen had a significant advantage in both accuracy and processing speed. The key result of this work is the integration of the RAG approach, which increased the accuracy of classifying geological and technical complications across nine predefined classes from 45% for the baseline Qwen model to 83%. The developed software system executes a full processing pipeline: text preprocessing (tokenization, normalization), named entity recognition, complication classification, and generation of structured data ready for integration into corporate information systems. The average processing time per report was 30 seconds. The proposed solution is designed to automate engineering analysis, support well intervention planning, and improve the operational efficiency of oil field management by transforming unstructured textual data into actionable, structured insights.
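The retrieval-augmented classification step can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions: retrieval here is a plain bag-of-words cosine similarity (the abstract does not specify the retriever), `llm` is any callable mapping a prompt to text and merely stands in for a Qwen2.5-7B-Instruct call, and the knowledge-base passages and class names (`casing leak`, `tubing obstruction`, `gas breakthrough`) are hypothetical placeholders, not the paper's nine classes.

```python
import json
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase (normalize) and split into word tokens; a simple stand-in for preprocessing."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (assumed lexical retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(report: str, context: list[str], classes: list[str]) -> str:
    """Augment the classification request with the retrieved context."""
    ctx = "\n".join(f"- {c}" for c in context)
    labels = ", ".join(classes)
    return (
        f"Reference material:\n{ctx}\n\n"
        f"Report:\n{report}\n\n"
        f"Answer with exactly one complication class from: {labels}."
    )


def classify_report(report: str, knowledge_base: list[str], classes: list[str],
                    llm: Callable[[str], str], k: int = 2) -> dict:
    """Retrieve context, query the model, and return a structured record."""
    context = retrieve(report, knowledge_base, k)
    answer = llm(build_prompt(report, context, classes)).strip()
    # Reject any answer that is not one of the predefined classes.
    label = answer if answer in classes else "unknown"
    return {"complication": label, "context": context}


if __name__ == "__main__":
    # Hypothetical knowledge-base passages and class names, for illustration only.
    kb = [
        "Casing leak: fluid inflow detected behind the casing above the perforation interval.",
        "Tubing obstruction: tool could not pass below a given depth.",
        "Gas breakthrough: gas inflow from the upper reservoir.",
    ]
    classes = ["casing leak", "tubing obstruction", "gas breakthrough"]

    def stub_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real Qwen2.5-7B-Instruct call.
        return "casing leak"

    report = "Survey shows fluid inflow behind the casing above the perforations."
    print(json.dumps(classify_report(report, kb, classes, stub_llm), indent=2))
```

Constraining the model's answer to a fixed label set is one simple way to keep the output usable as structured data. A production system would more likely use a dense embedding retriever and a real model client in place of the lexical scorer and the stub shown here.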