From the editors: We would like to thank Nic Ryan, principal data scientist at DataFriends, for taking the time to answer questions related to digitalization, data science, machine learning, and artificial intelligence.
The term “digitization” refers to collecting information and storing it as data. Approximately 90–95% of the data in an organization is not stored in databases but in formats such as emails and documents, which makes it hard to access. Suppose a person has a heap of data stored in PDF documents that they would like to understand, wrangle, plot, and use. Digitization would involve parsing this data from the PDF files (a surprisingly challenging task) and storing it in a database that people can access. The term “digitalization” refers to using the data produced by digitization to make business operations easier and decisions better informed.
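To make the digitization step concrete, here is a minimal Python sketch. It assumes the text has already been pulled out of a PDF (that extraction is the hard part and would need a third-party library, which is not shown). The report layout, the `digitize` function, and the `production` table are all hypothetical. The sketch parses each matching line and stores it in an in-memory SQLite database using the standard-library `sqlite3` module.

```python
import re
import sqlite3

# Hypothetical text, assumed to have already been extracted from a PDF report.
# Real PDF extraction (the hard part) would need a third-party library.
REPORT_TEXT = """Daily production report
Well A-1  2019-03-01  oil_bbl=1250
Well A-2  2019-03-01  oil_bbl=980
Well A-1  2019-03-02  oil_bbl=1190
"""

# Hypothetical line format: well name, ISO date, oil volume in barrels.
LINE_PATTERN = re.compile(r"Well\s+(\S+)\s+(\d{4}-\d{2}-\d{2})\s+oil_bbl=(\d+)")


def digitize(text: str, conn: sqlite3.Connection) -> int:
    """Parse report lines into a 'production' table; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS production (well TEXT, day TEXT, oil_bbl INTEGER)"
    )
    rows = [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in map(LINE_PATTERN.search, text.splitlines())
        if m  # skip the header and any line that does not match the pattern
    ]
    conn.executemany("INSERT INTO production VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # throwaway database for the example
print(digitize(REPORT_TEXT, conn))  # prints 3

total = conn.execute(
    "SELECT SUM(oil_bbl) FROM production WHERE well = 'A-1'"
).fetchone()[0]
print(total)  # prints 2440
```

Once the data sits in a table, anyone with database access can query it, which is where digitalization can begin.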
According to Nic Ryan, digitalization in the oil and gas industry is undoubtedly similar to digitalization in other industries:
- Data is getting easier to collect and store
- Data-based decisions are increasingly seen as important, with data considered the ultimate authority
- Better algorithms are available for making data-based decisions
- Advanced computing capacity is available to do the heavy lifting
It is really all about using and democratizing data for decision-making across an organization, which requires a cultural shift driven from the top down.
In response to recent technological advancements, oil executives are considering digital technologies that could transform operations and create additional profits from existing capacity. McKinsey & Co. finds that the effective use of digital technologies in the oil and gas sector could reduce capital expenditures by up to 20% and cut operating costs by 3–5% upstream and by about half that downstream.
Digitalization is going to impact every industry in the next 5–10 years. In fact, some suggest that companies using artificial intelligence (AI) will take $1.2 trillion from their less informed peers by 2020. To remain competitive, companies in the oil and gas industry must leverage their internal data assets more intelligently than their peers.
Nic defines “data science” as leveraging the internal data assets of a company to help the company achieve its strategic aims. The “science” comes in through the scientific method: hypotheses, experiments, and measurements. Data science is such a broad discipline that it is hard to say what is typical for a data analytics engineer; it combines mathematics, statistics, programming, domain expertise, and business. Some people work mostly on shipping data products into production databases, so they have more in common with software developers. Others are like statisticians, running A/B tests on an email campaign to work out which creative will perform best. Still others build dashboards and data visualizations to inform the business. The work varies greatly from one data analytics engineer to another.
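As an illustration of the statistician-style work described above, the sketch below compares the click rates of two email creatives with a two-proportion z-test. This is one common choice for simple A/B tests, not necessarily what any particular team uses; it relies on a normal approximation that assumes reasonably large samples, and the click counts and the `two_proportion_z_test` helper are made up for the example.

```python
import math


def two_proportion_z_test(clicks_a, sent_a, clicks_b, sent_b):
    """Return (z, two-sided p-value) for H0: both creatives have the same click rate."""
    p_a = clicks_a / sent_a
    p_b = clicks_b / sent_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign results: creative A vs. creative B, 5,000 emails each.
z, p = two_proportion_z_test(clicks_a=200, sent_a=5000, clicks_b=260, sent_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in click rates is unlikely to be due to chance alone; it does not say whether the difference is large enough to matter to the business.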
Let us drill a bit deeper into AI. According to Nic, this term is not very helpful. He describes AI as getting computers to do tasks, including making decisions, that humans used to do. AI has become an umbrella term for “machine learning,” “deep learning,” and “data science.” The terms have become conflated and are used interchangeably, which makes them a good way to check whether someone is playing "buzzword bingo" in meetings.
Machine learning is a broad subset of AI. In the past, the usual approach was to gather teams of software developers to carefully craft the logic for these decision systems. The results were less than adequate because some problems, such as image recognition and stock price prediction, were so complex that humans writing "if," "then," and "else" statements never got them right. With machine learning, the problem is instead set up by showing the computer a heap of inputs (pixels, columns, etc.) along with the corresponding outputs (cat pictures vs. dog pictures, actual future stock prices) and saying, "Computer, learn some kind of mapping from these inputs to this set of outputs that can be applied to unseen inputs."
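The inputs-to-outputs idea can be shown with one of the simplest learners, logistic regression trained by stochastic gradient descent. This is a toy sketch, not a description of any production system: the two numeric features and their labels (1 for "cat", 0 for "dog") are hypothetical stand-ins for real image pixels, and the learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(inputs, labels, lr=0.1, epochs=2000):
    """Learn weights w and bias b so that sigmoid(w . x + b) approximates the labels."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - y  # gradient of the log-loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Apply the learned mapping: 1 ("cat") if the probability exceeds 0.5, else 0 ("dog")."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


# Hypothetical training examples: two features per example, label 1 = cat, 0 = dog.
train_x = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.1), (0.2, 0.9), (0.3, 0.8), (0.1, 0.7)]
train_y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(train_x, train_y)

# Unseen inputs: the learned mapping is applied to examples not in the training set.
for x in [(0.95, 0.15), (0.15, 0.85)]:
    print(x, "->", predict(w, b, x))
```

Nothing in the code encodes an if/then/else rule for cats; the weights that separate the two classes are learned from the examples and then applied to inputs the model has not seen.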
AI is not well suited to jobs that require creativity, critical thinking, empathy, leadership, and other qualities conventionally thought of as “human.” Striking the right balance, with automation freeing humans from repetitive tasks, will be beneficial for the future: noncreative and impersonal responsibilities can be handled by automation while humans are free to reach their full potential.
Now let us consider the skills that a young professional (YP) needs to become proficient in data science and how he/she can develop those skills.
Programming. This is an essential skill in data science. YPs should be able to program, which complements their ability to perform statistics and allows them to analyze large data sets. The ability to perform quantitative analysis is also a requirement, as data science is about understanding complex systems and the behavior they exhibit. Quantitative analysis may be performed through experimental design and analysis, the modeling of complex systems, and the application of machine learning.
SQL, a language used to interact with databases, is rarely encountered in academia but is found everywhere in industry. YPs can take online courses to learn SQL, which can give them an edge over competitors who know Python and R well but not SQL. YPs can also read outside their discipline about different statistical tools, practice analyzing data, and talk to professionals in the data science field to learn more about the specifics.
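For readers who have not used SQL, the sketch below runs a typical industry-style query through Python's built-in `sqlite3` module. The `wells` and `production` tables and their values are hypothetical; the query joins the two tables and aggregates daily oil volumes by field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE wells (well_id TEXT PRIMARY KEY, field TEXT);
    CREATE TABLE production (well_id TEXT, day TEXT, oil_bbl REAL);

    INSERT INTO wells VALUES ('A-1', 'North'), ('A-2', 'North'), ('B-1', 'South');
    INSERT INTO production VALUES
        ('A-1', '2019-03-01', 1250), ('A-1', '2019-03-02', 1190),
        ('A-2', '2019-03-01', 980),
        ('B-1', '2019-03-01', 640), ('B-1', '2019-03-02', 700);
    """
)

# Total and average oil per field: a JOIN plus GROUP BY, a very common query pattern.
query = """
    SELECT w.field,
           SUM(p.oil_bbl) AS total_bbl,
           AVG(p.oil_bbl) AS avg_bbl
    FROM production AS p
    JOIN wells AS w ON w.well_id = p.well_id
    GROUP BY w.field
    ORDER BY total_bbl DESC
"""
results = conn.execute(query).fetchall()
for field, total_bbl, avg_bbl in results:
    print(f"{field}: total {total_bbl:.0f} bbl, average {avg_bbl:.1f} bbl per well-day")
```

The same SELECT ... JOIN ... GROUP BY pattern carries over, with minor dialect differences, to the production databases YPs are likely to meet in industry.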
Product Knowledge. This skill allows data scientists to understand the complex system that generates the data they are analyzing. This knowledge allows them to generate hypotheses, define metrics, and perform debugging analyses.
Communication and Teamwork. These are the most important skills a YP needs. They are critical for a career in data science because they allow data scientists to communicate their insights, prepare and give meaningful presentations, and work well in teams. A data scientist cannot work in isolation and needs to share his/her knowledge with other team members such as designers, engineers, and product managers.
Data science teams are made up of professionals who work together to solve an organization’s hardest data problems. Each professional on the team brings a different set of skills, and together these skills allow the team to complete a data science project. A data science team is made up of data analysts, data engineers, and data scientists.
Data scientists are the professionals who develop mathematical and statistical models applied to data. These models allow a data scientist to tackle a business problem by translating it into a data question. Data scientists may be statisticians (professionals who focus on applying statistical approaches to data) or data managers (professionals who focus on running data science teams).
Data engineers are professionals who handle large amounts of data through their knowledge of computer science. Their day-to-day activities focus on implementation: taking requests from data scientists and fulfilling them through coding. In essence, data engineers take the models developed by data scientists and implement them in code. Data engineers may be data architects (professionals who focus on structuring the technology that manages data models) or database administrators (professionals who focus on managing data storage solutions).
Data analysts are professionals who analyze data and provide reports and visualizations that explain what the data means. One does not always need a PhD to become an expert in a field, and data analytics is no exception. It is a broad field, and there is plenty of room for self-taught experts to contribute. If YPs can use analytics, programming, and other self-taught skills to solve problems in their current role, they can highlight those results when seeking a data analyst position.
Many other industries have been using data analytics for years, so there is already plenty of discussion about what YPs do in those industries. The main point YPs in the oil and gas industry need to understand is that data quality is very important: if the data is poor, even many gigabytes of it will not help solve a problem. A hardworking engineer with a deep understanding of the oil and gas industry, who knows a programming language and can demonstrate those skills, can achieve huge success in these changing times in a digitalized oil and gas industry. The oil and gas industry needs many more data scientists today than it did a year ago, so people with the right qualifications and experience are exactly what the industry needs.
| Nic Ryan is the principal data scientist of his consulting company DataFriends and is a regular speaker at data science events. He has held different roles on data science teams, progressing quickly from excited newbie to managing 3 teams across 2 countries. He has experience working in diverse industries such as insurance, banking, agriculture, and online advertising. Ryan is passionate about the data science community and regularly produces content aimed at inspiring the next generation of data scientists. He even gives live code demos. |
| Aman Gill is an engineer-in-training working as an account manager with GE Water and Process Technologies in the Alberta oil sands industry. |
| Jaspreet Singh Sachdeva is a PhD candidate at the University of Stavanger in Norway. He is working on rock-fluid interactions and rock mechanics at the National IOR Centre of Norway. |
| Luis Enrique Valencia is a geoscientist with more than 9 years of experience in the oil and gas industry. He held positions in Chevron's exploration and development divisions in Venezuela and Trinidad & Tobago. |
| Victor Torrealba is a postdoctoral fellow at the Ali I. Al-Naimi Petroleum Engineering Research Center at King Abdullah University of Science and Technology in Saudi Arabia. His research is focused on chemical enhanced oil recovery and simulation of naturally fractured reservoirs. |