The intent is to produce a series of practical papers describing how data analysis and interpretation techniques in the oil and gas industry have evolved over the last 20 years.

 

*A quick note on the term “bias”: We use the term “bias” often in this article, but in two different contexts, and it is important to distinguish between human bias and mathematical bias. Human biases are what come to mind when we think of introducing unintentional error into a neural network, and this type of bias usually carries a negative connotation. Mathematical biases, on the other hand, are critical components of machine learning algorithms that allow them to adjust and learn dynamically.

Another way of thinking about the difference between human and mathematical biases is that human biases are static (e.g. “amplitude conformance to structure indicates a contact”), whilst mathematical biases are dynamic (e.g. the biases built into a simple 2-layer neural network). In this article we will refer to both – either as “human bias” or “mathematical bias”.

Bias is not always a bad thing; it is an essential component of the learning process, be it organic or inorganic.
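The “dynamic” nature of a mathematical bias can be made concrete with a small sketch. The snippet below builds a toy 2-layer network and takes a single gradient step on its output bias term; the layer sizes, random seed, and learning rate are arbitrary illustrative choices, not a recommended configuration:

```python
import numpy as np

# Minimal illustrative 2-layer network: y = W2 @ relu(W1 @ x + b1) + b2.
# The vectors b1 and b2 are "mathematical biases": free parameters that the
# learning algorithm adjusts, unlike a fixed human assumption about the data.

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer, shifted by bias b1
    return W2 @ h + b2                 # output layer, shifted by bias b2

x, target = np.ones(3), np.array([1.0])
before = float(forward(x)[0])

# One gradient-descent step on b2 alone (squared-error loss):
# dL/db2 = 2 * (y - target); the bias moves so as to reduce the error.
grad_b2 = 2.0 * (forward(x) - target)
b2 -= 0.1 * grad_b2
after = float(forward(x)[0])

# The prediction moves toward the target purely by adjusting the bias.
assert abs(after - target[0]) < abs(before - target[0])
```

The only point of the sketch is that b1 and b2 are parameters the learning algorithm itself moves to reduce error – precisely what a static human assumption cannot do.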

 

Insight #1: Interpretation vs. Labeling

The more geoscientists examine geophysical subsurface data and make drilling decisions based on it, the more we have come to realize that data – particularly large amounts of related data – tells much deeper and more varied stories than we previously thought or even recognized. Tools are becoming available that utilize Artificial Intelligence (AI) and Machine Learning (ML) to tease out these heretofore unseen pieces of information about relationships between different parameters hidden in the datasets. This is as far as these technologies can take geoscientific questions, however – it remains the role of the geoscientist to interpret and make sense of the ML outputs, and to use their knowledge and experience to put them into a context that describes the presence (or absence) of active petroleum systems.

The term “Interpretation” means something very specific (and usually deeply personal) to geoscientists - an interpretation is how geoscientists differentiate themselves; great interpretations have made careers and poor ones have tanked them. But in the context of geoscience, what exactly is an interpretation?
While the nuances of the definition largely rest upon the experience level of the person supplying the answer, every interpretation product has one thing in common: some of it – to differing degrees – is speculative. That is to say, in the face of a paucity of data we – the scientists – are forced to use our knowledge and/or wisdom to fill in the blanks (think noisy data, inversion attributes, low-resolution signals, etc.) and generate a complete story that supports the presence or absence of a working petroleum system. In this sense the old adage “geophysics is half science, half art” is very true. The point is that the speculative element of interpretation has been critical for a geoscientist to link geophysical representation to developed geological concepts and to achieve their project goals. The landscape has changed, however, as energy companies have matured AI and Machine Learning tools in a competition to become and remain profitable in the face of the multiple economic, social, and geopolitical issues that have gripped the industry in the past few years.

The best new techniques appearing in geoscience toolkits today use Machine Learning and AI technologies to shed light on some of these historically dim places by allowing the analysis of massive quantities of data with understood geological codependencies. Coincident with the rapid acceptance and deployment of these tools by major E&P companies, however, must come a pivot in the way that we as geoscientists approach interpretation, and in what this word means to us moving forward.

In this paper we focus on the new balance between human and machine roles in the interpretation of 3D seismic data.

I.  The Interpretation in ML-Augmented Analyses

Perhaps the best place to start defining what geophysical interpretation means in 2020 and beyond is to have a quick discussion about the AI/ML tools available to geoscientists and how they work. It is in this context we will frame the rest of the discussion. Before we continue, read the three sentences below and commit them to memory; they are critical to a meaningful understanding of this article.

1) A label is how a geoscientist tells a network what data is relevant.*

2) When data is labeled, it is added to the training set, which is what the network uses to learn.

3) If it is not in the training set, the network cannot predict it.

*Labeling data requires a solid understanding of geologic and geophysical concepts, as well as awareness of when such concepts are overextended.
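Sentence 3 in particular is worth internalizing, and a toy model makes it concrete. The sketch below uses a 1-nearest-neighbour classifier as a stand-in for a trained network; the feature values and class names are invented purely for illustration:

```python
# Toy illustration of rule 3: a model can only predict labels that appear
# in its training set. Here a 1-nearest-neighbour stand-in for a network is
# "trained" on examples labeled "fault" and "background" only.

training_set = [
    ((0.9, 0.8), "fault"),       # (attribute values, label)
    ((0.1, 0.2), "background"),
]

def predict(features):
    """Return the label of the closest labeled training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training_set, key=lambda ex: dist(ex[0], features))[1]

assert predict((0.5, 0.6)) == "fault"
assert predict((0.0, 0.1)) == "background"

# Even a sample that is really, say, a salt body can only ever come back
# as one of the two classes the model has seen during training.
assert all(predict(p) in {"fault", "background"}
           for p in [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)])
```

No matter how distinctive an unlabeled feature class is in the data, the model can only ever answer with a label it was trained on.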

 

Deep learning products being deployed in the oil and gas industry, such as Bluware’s InteractivAI™, require, at the most fundamental level, a seismic data set and a set of “labels” provided by the geoscientist in order to operate. A “label” in this context is what many geoscientists would think of as the first step in creating an “interpretation”, with one key difference: when labeling data for consumption by a Machine Learning network, one must be vigilant that only the data relevant to the question being asked is included. Currently, networks are designed to handle one task at a time; applications that produce several machine learning results in one run are running parallel networks, each of which can still only answer one question at a time. In traditional data interpretation, the geoscientist is expected to lean on their own opinions and experience early in the process and provide their narrative of what the data is saying, usually before examining all of the available data. When using a Machine Learning workflow, this “leading-with-the-answer” approach can be dangerous.

Interpretation workflows developed and published over the last 25 years suggest or assume that the interpreter has a working model in mind at the beginning of the interpretation process, which inevitably leads to multiple human biases influencing the final interpretation. This was all but unavoidable before the development of commercially viable ML capabilities. When working with ML tools, the main phase of the “interpretative process” has shifted from the beginning to the end. During any discussion of labeling data for ML, this is an extremely important point for two reasons:

                1) As visually oriented scientists, we have the habit of including all occurrences of a geologic feature in an analysis rather than focusing only on a single type. For example, the structural geologist tasked with creating a regional fault model may get distracted by the details of interpreting polygonal faults from dewatering, which are features largely irrelevant to a regional kinematic model. If the goal of a study is to produce a model of kinematically relevant thick-skinned faulting, labeling irrelevant information like dewatering structures makes the network’s job harder and its output more challenging to interpret. The key takeaway is that if a feature is not directly relevant to the question you are trying to answer, it should not be labeled as a feature of interest or included in the training set for the network.

                2) We are scientists, so it is inherent for us to fill in the blanks of a story when they present themselves. This is the “half art” part of what we do; it also happens to be what Machine Learning excels at. When preparing a label set to define training data for a Machine Learning network, one must be precise about what data is being labeled. In other words, if you cannot actually see it, do not label it!
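Point 1 above can be sketched as a simple filtering step before training. The pick records, type names, and relevance set below are hypothetical, meant only to show the principle of excluding irrelevant feature classes from the training set:

```python
# Hypothetical sketch: filter fault picks so that only the class relevant
# to the study question (here, thick-skinned faults) reaches the training
# set. The records and type names are invented for illustration.

picks = [
    {"id": 1, "type": "thick-skinned"},
    {"id": 2, "type": "polygonal"},      # dewatering-related; irrelevant here
    {"id": 3, "type": "thick-skinned"},
    {"id": 4, "type": "polygonal"},
]

RELEVANT_TYPES = {"thick-skinned"}  # defined by the question being asked

# Only relevant picks become training data; the rest are simply not labeled.
training_picks = [p for p in picks if p["type"] in RELEVANT_TYPES]

assert [p["id"] for p in training_picks] == [1, 3]
```

The irrelevant picks are not marked “wrong” – they are simply never labeled, and so never enter the training set at all.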

Using faults again as an example, consider the unexciting but impactful scenario outlined in Figure 1 below: an interpreter may want to carry the green fault through the zone circled in yellow, even though no visible seismic expression of that fault is present in the data. By doing so, the interpreter has inadvertently highlighted low-amplitude, acoustically transparent data as being relevant to the network. On a basic level, this is one primary way that human bias can be unintentionally introduced into a network, so it is best practice to avoid taking too much artistic liberty when labeling the original training set.

Fig. 1: A visual comparison of an “interpretation” and a “label” in the context of a deep learning network.

A second example is shown in Figure 2 below. In approach (a), the geoscientist has carried a fault label through reflectors that, for one reason or another, do not appear to be offset. While seemingly a minor detail, the inclusion of conflicting data in the training set can actually be quite significant to the network.

A more appropriate approach when using machine-learning tools is shown in image (b), where the geoscientist chose to stop the fault label where the visible image of the fault stopped and pick it up again on the other side. The takeaway here is that more labels do not necessarily mean better results when training ML networks.

Fig. 2: The basic difference between a possible interpretation (a) and a label for Machine Learning (b). The “Geodata Scientist” will recognize the importance of excluding the central part of the interpretation due to the fault not being well imaged in that area and the presence of a single continuous reflector; we would not want either of these to be included in the training set for the network.

As geoscientists, we can all agree that the through-going interpretation in image (a) is probably correct, but as data scientists we can see why it is non-ideal for inclusion in a training set, as in image (b). It is here we propose the creation of a new discipline known as Geodata Science. Geodata Science is data science with an emphasis on the unique types of data that describe Earth materials and are used in the oil and gas industry, among others. The successful Geodata Scientist will be intimately familiar with subsurface concepts and data while also understanding the mathematics of Machine Learning and how networks can be optimized for the very specific needs of the energy industry.
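The contrast between approaches (a) and (b) in Figure 2 can be sketched in a few lines. The amplitude values and visibility cutoff below are invented for illustration; a real workflow would label on the seismic image itself rather than on a toy array:

```python
# Hypothetical sketch of the Figure 2 contrast: labels along a picked fault
# are kept only where the fault is actually expressed in the data, proxied
# here by a fault-expression value above an assumed visibility cutoff.

fault_expression = [0.9, 0.8, 0.05, 0.04, 0.7, 0.85]  # samples along the pick
VISIBLE = 0.3  # assumed cutoff below which the fault is not imaged

# Approach (a): an interpretation carries the fault through everything.
interpretation = [1] * len(fault_expression)

# Approach (b): a label stops where the image stops and resumes after.
labels = [1 if v >= VISIBLE else 0 for v in fault_expression]

assert interpretation == [1, 1, 1, 1, 1, 1]
assert labels == [1, 1, 0, 0, 1, 1]
```

The gap in the label array is deliberate: the unimaged samples carry no fault evidence, so they are withheld from the training set rather than speculatively filled in.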

II. The Augmented Role of Human Interpretation

The integration of the human interpretive experience is critical in the final part of the Geodata Science workflow, when the labeled data is categorized according to the current understanding of Earth processes. In this final phase of the cognitive process, human intelligence converts observations into insights and relates them to the other critical pieces of data and interpretations in the project. This means the interpretive role of the geoscientist is no longer focused primarily on moving as fast as possible through as much data as possible, but rather on the value decisions made during, and especially at the end of, the ML-augmented interpretive process.

Previously – and by “previously” we mean as little as 10 years ago – it was the geoscientist's “modus operandi” to look at the available data, come up with a working hypothesis, then work the data with that hypothesis in mind, making adjustments as necessary, in that order. It was common to instinctively accept the mathematical biases in data sets as well as the human biases inherent in the interpretation of that data. Depending on the individual interpreter and their habits, the “final” interpretation was sometimes quite different from the initial assumptions; for others it was exactly the same. The interpretation process actually begins at the stage of formulating the working hypothesis we just described, if not before. This used to be acceptable and common, because it was necessary: historically, the initial stages of interpretation usually involved drawing on past experiences to make an educated guess about what the current situation is or will be. The main variable at this stage that determines whether a study is scientifically sound is the human condition – some people have a harder time than others accepting that their original assumptions are incorrect, and thus have a more difficult time adjusting their overall understanding of a subject.

With the introduction and deployment of Machine Learning tools to geoscientists, the need to start with a narrative is largely eliminated (and actually discouraged). A true data-driven workflow does not and cannot begin with a final interpretation model in mind because – as we noted above – this is the primary way that human bias will work its way into the final result. In the modern ML/AI-augmented geoscience workflow, the interpretive step comes at the end, after the new tools have been able to run as intended and do what they are designed to do. Machine Learning outputs are not answers so much as packets of information about data relationships for the geoscientist to consume, in addition to the other geodata sets to be integrated into the analysis. This requires an adjustment in the way geoscientists work, to be sure – but one that is easily made if framed correctly.

The table below compares traditional and ML-augmented geoscience workflows at a conceptual level, highlighting the point in the process at which the well-meaning interpretive skills of the geoscientist inject human bias. It is important to note that these biases are hierarchical in the sense that, once introduced, their effect persists through every subsequent step of the process.

Fig. 3: Conceptual traditional vs. ML-augmented G&G workflows showing the point along each at which human bias is introduced and carried through to the final interpretation. In the traditional workflow, bias is based on a partial understanding of the data; in the ML-augmented workflow, an “educated bias” is based on recognition of the most likely interpretation.

III.  Putting it all Together

Our summary is brief because the topic is still young: with few exceptions, corporate strategies still rely on conventional methods of 3D seismic interpretation, and a properly balanced integration of AI/ML technologies into daily practice is in its infancy. With continuous staff reductions, the technology-led changes in corporate behavior require a review of well-established assumptions and an adjustment of the corporate perspective on the role and value of the geoscientist.

The rank-and-file geoscientist needs to be aware of human biases and introspective when thinking about what “interpretation” means and where, and to what degree, it belongs in a modern ML-augmented geoscience workflow.

Conversely, the substitution of geoscientists with pure data scientists is unlikely to lead to tangible geological insights at the end of the interpretive process. These two roles need to be developed organically to have an impact on the quality of the business decisions.

The amount of data that a dwindling number of geoscientists are expected to make sense of necessitates the use of AI/ML-driven technologies, so it is imperative that those who operate them have a full and honest grasp of where and when (or even if) their own personal narratives come into play. As the role, impact, and extent to which these technologies will ultimately be deployed remains unknown, the only conclusion we can offer at this time is that change is inevitable, and remaining relevant in the oil and gas industry will require a pivot away from the traditional ways we think about exploration, production, and the roles of the petroleum geoscientist in general.