Data & Analytics

Large Language Models Boost Supervisory Efficiency With AI-Generated Reports

Routine status reporting is often a challenge: it is tedious and time-consuming for both employees and supervisors. Using large language models, a system was developed to generate coherent, artificial-intelligence-driven reports. The goal is to improve overall insight while reducing the time required to read individual reports.

LLM_hero.jpg
Murphy’s Subsurface Data Science team reviews weekly updates.
Source: Murphy Oil

In modern workplaces, technology and communication systems are pivotal for maintaining productivity. The effective integration of these tools ensures timely and accurate information, which is crucial for data-driven decisions. Traditional methods of manually writing and reviewing reports, however, are typically slow and inefficient, presenting significant obstacles to effective workflow management. Such outdated practices can disrupt projects, delay decision-making, increase the likelihood of miscommunication, and cause important details to be overlooked. For example, a manager who must manually compile and review reports may miss critical insights, delaying strategic initiatives and hurting overall team performance. Employees’ weekly reports are often complex and difficult to follow, making it hard to grasp key points and plan actions.

Many efforts have been made to mitigate the inefficiencies of traditional report-generation methods through the adoption of artificial intelligence (AI), which can automate the creation of text-based reports. Many of these AI solutions, however, focus on generic narrative text rather than data-driven summaries and therefore require substantial customization to fit specific business contexts.

Here, we address the inefficiencies of traditional report-generation methods by using AI to automate the creation of text reports. Our method uses advanced AI models, such as OpenAI’s GPT-3.5 Turbo, to produce structured text reports and incorporates automated workflows to compile, format, and generate these reports consistently, thereby minimizing manual intervention. The performance of these models and processes is evaluated continually and refined to enhance accuracy and relevance. This approach improves efficiency by reducing both the time and the effort required for generating reports, including handling large volumes of data. Moreover, it supports proactive problem-solving by identifying critical issues and trends through automated reporting, enabling more efficient and effective decision-making.

Literature Review
Efficient communication and information management are essential for enhancing organizational productivity and influencing decision-making processes and overall performance (Daft and Lengel 1986). In contemporary workplaces, traditional methods of generating and reviewing weekly reports have become increasingly outdated, often leading to inefficiencies and the waste of resources (Brown and Duguid 2000). Automation technologies, particularly those powered by AI, have emerged as effective solutions to these challenges. By automating repetitive tasks and leveraging AI-driven insights, organizations can streamline their workflows and boost productivity (Bughin et al. 2017). Microsoft Power Automate, a low-code automation platform, allows users to create workflows that automate routine tasks. In conjunction with Power Apps, it provides organizations the flexibility to develop customized solutions tailored to their specific needs.

Additionally, AI-powered language models, such as OpenAI GPT-3.5 Turbo, have shown remarkable proficiency in natural language processing tasks, including text generation and summarization (Brown et al. 2020). Integrating these models into organizational workflows enables supervisors to gain comprehensive insights into employee activities while minimizing the time spent on manual report review. Moreover, the adoption of AI technologies in organizational settings has led to significant benefits, including improved efficiency, cost savings, and enhanced decision-making capabilities (Brynjolfsson and McAfee 2017). However, the implementation of AI also presents challenges, including data privacy concerns, the need for continuous model training, and the requirement for administrative support (Mittelstadt et al. 2016).

In summary, the literature indicates that integrating AI technologies such as Microsoft Power Automate and AI-powered language models offers substantial potential for enhancing the generation and review of weekly reports in organizational settings. By leveraging these technologies, organizations can improve communication, optimize information management processes, and significantly enhance productivity and performance.

Data Structure and Processing
The data set for this application contains comprehensive information related to employee activities, supervisor details, and weekly update content, encompassing 22 variables. These variables include crucial details such as names and email addresses of employees and supervisors, and the free text content of individual weekly updates. This rich dataset is foundational to the automated report-generation system, enabling the transformation of fragmented information into cohesive reports. Each record in the dataset is identified by unique IDs for notes and updates, ensuring accurate tracking and referencing.

Additionally, the dataset logs the number of supervisors and employees involved, the start and end dates of each reporting period, and specific details such as titles and categories of each update. This aids in categorizing and prioritizing tasks and documenting the narrative of weekly activities and accomplishments. Timestamps for the creation of updates and metrics such as response, feedback, and update counts provide quantitative measures of engagement, while highlight counts and timestamps offer insights into which updates received the most attention.

The data-processing phase is critical for transforming the raw dataset into a structured format suitable for automated report generation. Initially, weekly updates for each employee are aggregated. Information from emails and unstructured report content is compiled into a single record per employee per week, ensuring all relevant information is captured cohesively. To enhance report usability and relevance, data is segmented into three hierarchical levels reflecting the different roles within the organization.

Level L1 represents senior supervisors, consolidating updates across the entire organization. Level L2 pertains to first-level supervisors, focusing on activities relevant to their direct reports. Level L3 includes detailed reports for individual contributors, outlining their specific activities and updates.

The final step in data processing involves creating a new database view that encapsulates the aggregated and segmented data. This view includes essential fields such as name, the reporting week, hierarchical level, email address, and consolidated content of updates. The length of each update is calculated to indicate the comprehensiveness of the report. By structuring the data into this view, the system ensures consistent and accurate report generation, adhering to standardized formats that facilitate straightforward analysis and interpretation.
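The aggregation and view-building steps above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline; the field names (employee, week, level, content) are hypothetical stand-ins for a few of the dataset’s 22 variables.

```python
from collections import defaultdict

# Hypothetical update records, mimicking a subset of the dataset's fields.
updates = [
    {"employee": "a.tran", "week": "2024-W20", "level": "L3", "content": "Finished log QC."},
    {"employee": "a.tran", "week": "2024-W20", "level": "L3", "content": "Started depth conversion."},
    {"employee": "b.lee",  "week": "2024-W20", "level": "L3", "content": "Updated velocity model."},
]

def build_weekly_view(records):
    """Aggregate updates into one row per employee per week,
    mirroring the consolidated database view described above."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["employee"], r["week"], r["level"])].append(r["content"])
    view = []
    for (employee, week, level), contents in grouped.items():
        consolidated = " ".join(contents)
        view.append({
            "employee": employee,
            "week": week,
            "level": level,
            "content": consolidated,
            # Length serves as a rough proxy for report comprehensiveness.
            "content_length": len(consolidated),
        })
    return view

rows = build_weekly_view(updates)
```

In the production system this consolidation happens in the database view itself; the sketch simply makes the one-record-per-employee-per-week logic concrete.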

Methodology
The methodology establishes a systematic workflow to automate the generation of weekly reports. The process begins with data acquisition, where the system accesses a database containing a newly created view. This is followed by the development of a user-friendly interface within Microsoft’s Power Apps, allowing users to explore detailed insights based on their hierarchical level, such as employees’ names, email addresses, and desired reporting time frames and contents. The system seamlessly integrates with Microsoft Power Automate, where a sequence of actions retrieves input data from Power Apps to initiate the report-generation process.

Using the OpenAI GPT-3.5 Turbo model, the system generates structured reports based on specified parameters, ensuring the content is relevant and well-organized. Subsequently, these reports are distributed via email, ensuring timely and efficient delivery to supervisors and employees. In the Sub-Surface App Hub, users can access the “Weekly AI-generated Update” section to retrieve updates (Fig. 1). This functionality ensures that the system fetches the correct profile information and associates updates with the appropriate users. When entering the weekly update section, users are provided options to input their report data, which includes selecting the week, specifying the update type, and choosing the relevant project. Users then add detailed content for their update, capturing all necessary information accurately.

LLM_Fig1.jpg
Fig. 1—Sub-Surface App Hub Interface: Users can access various AI-powered tools and submit their weekly updates.
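The report-generation step described above can be sketched as a prompt-construction function plus a chat-completion call. The prompt wording below is illustrative, not the production prompt; the API call itself is shown commented out because it requires the `openai` package and an API key.

```python
def build_report_messages(employee, week, updates_text):
    """Assemble a chat prompt for the report-generation step.
    The headings and phrasing here are hypothetical examples."""
    system = (
        "You are an assistant that writes structured weekly status reports. "
        "Summarize the updates under the headings: Highlights, Progress, Next Steps."
    )
    user = f"Employee: {employee}\nWeek: {week}\nRaw updates:\n{updates_text}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_report_messages(
    "a.tran", "2024-W20", "Finished log QC. Started depth conversion."
)

# The actual generation call (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
# report_text = response.choices[0].message.content
```

Keeping the prompt assembly separate from the API call makes the parameters (employee, week, aggregated content) easy to supply from Power Automate.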

Additionally, users can view a list of previously submitted updates, where each entry displays the update title, content, date range, and associated supervisor. Options are available to open discussion threads, delete, or edit updates directly from the interface (Fig. 2).

LLM_Fig2.jpg
Fig. 2—Weekly report management: Interfaces for inputting, exporting, and managing weekly reports.

The Export Weekly Report feature allows users to select the week, person, report type, and security level before generating and sending the report. This integration with Power Apps captures all inputs and stores them in the database server, from which Microsoft Power Automate then retrieves the information.

Initially, the workflow (Fig. 3) retrieves user profile information to verify the accuracy of the data processed. It then filters variables such as user email, report dates, and security level. The SQL extraction step is crucial, querying the database to pull the required details of employees and specific report information.
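A minimal sketch of this extraction step, using an in-memory SQLite database as a stand-in for the reporting database; the table and column names are illustrative assumptions, and the parameterized query mirrors the filtering by user email, reporting dates, and security level.

```python
import sqlite3

# In-memory stand-in for the reporting database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE weekly_view (
        employee_email TEXT, week_start TEXT, week_end TEXT,
        level TEXT, content TEXT
    )
""")
conn.execute(
    "INSERT INTO weekly_view VALUES (?, ?, ?, ?, ?)",
    ("a.tran@example.com", "2024-05-13", "2024-05-17", "L3", "Finished log QC."),
)

def extract_report_rows(conn, email, week_start, level):
    """Parameterized query mirroring the SQL extraction step:
    filter by user email, reporting-period start, and security level."""
    cur = conn.execute(
        "SELECT employee_email, content FROM weekly_view "
        "WHERE employee_email = ? AND week_start = ? AND level = ?",
        (email, week_start, level),
    )
    return cur.fetchall()

rows = extract_report_rows(conn, "a.tran@example.com", "2024-05-13", "L3")
```

Parameterized queries also guard against injection when the email and date values arrive from user input in Power Apps.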

Following data extraction, the contents are formatted in Microsoft Power Automate and integrated into a professional and readable report layout.

LLM_Fig3.jpg
Fig. 3—Workflow diagram: Steps involved in data collection, processing, AI report generation, distribution, and feedback collection.

During the development phase, several large language models were evaluated, including OpenAI’s GPT-2 and GPT-3.5 Turbo, Facebook’s BART, and Google’s Gemini. Results from OpenAI GPT-2 and BART often exhibited issues with incoherence and irrelevant texts, necessitating specific prompts for each report, which increased setup time. The Gemini model met the requirements for length and information but lacked accuracy and consistency. Ultimately, the OpenAI GPT-3.5 Turbo model was selected for its superior performance in generating well-structured and accurate reports based on general parameters suitable for all cases.

The workflow includes format processing and data extraction steps to handle output data effectively. Each report entry is thoroughly iterated, retrieving corresponding staff photos to enhance the visual appeal and personalization of the reports. The process concludes with the “Send an Email” action, distributing the finalized reports to the intended recipients.
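The distribution step can be sketched with the Python standard library’s email tooling. The sender address, subject format, and SMTP host below are placeholders; the production workflow uses the Power Automate “Send an Email” action rather than raw SMTP.

```python
from email.message import EmailMessage

def compose_report_email(recipient, week, report_html):
    """Build the distribution email for one finalized report,
    with a plain-text fallback plus an HTML body."""
    msg = EmailMessage()
    msg["From"] = "reports@example.com"  # placeholder sender
    msg["To"] = recipient
    msg["Subject"] = f"Weekly AI-Generated Report ({week})"
    msg.set_content("This report is best viewed in an HTML-capable email client.")
    msg.add_alternative(report_html, subtype="html")
    return msg

msg = compose_report_email(
    "supervisor@example.com", "2024-W20", "<h1>Weekly Report</h1>"
)

# Sending would use smtplib (or the Power Automate email action):
# import smtplib
# with smtplib.SMTP("smtp.example.com") as s:
#     s.send_message(msg)
```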

Results and Discussion
Preliminary testing of the automated report generation system has demonstrated promising results, notably reducing the time supervisors spend reviewing weekly reports. The AI-generated reports are both accurate and informative, providing supervisors with a comprehensive understanding of each employee’s activities, engagement, and performance.

The integration of Microsoft Power Automate and Power Apps has been instrumental in achieving these results. Microsoft Power Automate streamlines the workflow by automating repetitive tasks, ensuring consistent data processing and report generation without manual intervention. The Sub-Surface App Hub enhances user interaction with a customizable interface, simplifying the process for supervisors to access and review reports.

Tester feedback has been overwhelmingly positive, with high satisfaction ratings for the system’s usability and efficiency. The AI-powered tool presents data in a clear and concise manner, making it easier to identify key performance indicators and areas for improvement. It also manages large volumes of data and generates reports swiftly, reducing the administrative burden. This efficiency has led to more timely interventions when issues are identified.

LLM_Fig4.jpg
Fig. 4—Cost and performance comparison of various GPT models.

Fig. 4 presents a comprehensive comparison of different GPT models based on relative cost and performance metrics. Fig. 4a displays the performance comparison using Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and Perplexity scores, which are essential in assessing the effectiveness of natural language processing (NLP) models. Higher BLEU and ROUGE scores indicate better text generation, while lower Perplexity scores denote superior predictive accuracy.
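To make two of these metrics concrete, the sketch below computes perplexity from per-token log-probabilities and a modified n-gram precision, the core ingredient of BLEU. This is a teaching sketch in pure Python, not the evaluation code used for Fig. 4.

```python
import math
from collections import Counter

def perplexity(token_logprobs):
    """Perplexity of a sequence given per-token natural-log probabilities;
    lower values mean the model found the text more predictable."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

def ngram_precision(candidate, reference, n=1):
    """Modified n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, clipped by reference counts."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    hits = 0
    for g in cand:
        if ref_counts[g] > 0:
            hits += 1
            ref_counts[g] -= 1
    return hits / len(cand)

# A model assigning probability 0.5 to each of four tokens has perplexity 2.
p = perplexity([math.log(0.5)] * 4)
```

Full BLEU combines clipped precisions over several n-gram orders with a brevity penalty, and ROUGE is the recall-oriented counterpart, but the clipped-match idea above is the common core.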

OpenAI GPT-4o outperforms the other models across all metrics, showcasing significant improvements in model architecture and training data. However, OpenAI GPT-3.5 Turbo emerges as the optimal choice because of its substantial improvement over its predecessor, OpenAI GPT-2, in both BLEU and ROUGE scores and its balanced cost-efficiency.

Facebook BART and Google Gemini exhibit competitive performance, particularly in ROUGE and Perplexity scores, indicating their robustness in specific NLP tasks.

Fig. 4b shows the cost analysis per 1,000 tokens, measured in US dollars, providing a comparison of the economic efficiency of each model. Despite the higher performance of newer models such as OpenAI GPT-4o, they come with increased costs, highlighting the trade-off between performance and economic efficiency.

In conclusion, Fig. 4 illustrates the progression and advancements in GPT models from OpenAI GPT-2 to OpenAI GPT-4o, reflected in improved BLEU and ROUGE scores and decreased Perplexity scores. This demonstrates continuous innovation in NLP model development and the contributions of different research teams in the field, with OpenAI GPT-3.5 Turbo standing out as the most cost-effective and high-performing option.

To determine the monthly cost per employee for using different language models in the proposed system, a detailed analysis was conducted based on token-based pricing. Tokens are the units used to measure the length of text processed by language models such as OpenAI GPT-3.5 Turbo and OpenAI GPT-4o; a token can range from a single character to a whole word. For example, the sentence “The quick brown fox jumps over the lazy dog.” is roughly 10 tokens. In this setup, each report input consists of approximately 300 to 400 words, with the output ranging from 400 to 500 words, totaling about 1,200 tokens per report. This length strikes a balance between being comprehensive and concise, neither overwhelming the reader with excessive information nor providing too little detail. Moreover, managing the report length helps control costs, as pricing is based on the number of tokens used.
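The cost arithmetic can be sketched as below. The per-1,000-token prices are illustrative placeholders, not quoted rates (actual OpenAI pricing varies by model and over time), and the input/output split and four reports per month are assumptions consistent with the figures above.

```python
# Illustrative placeholder prices per 1,000 tokens (USD), not quoted rates.
PRICE_PER_1K = {
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
    "gpt-4o":        {"input": 0.005,  "output": 0.015},
}

def monthly_cost_per_employee(model, input_tokens, output_tokens, reports_per_month=4):
    """Monthly cost for one employee, given token counts per report."""
    p = PRICE_PER_1K[model]
    per_report = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
    return per_report * reports_per_month

# ~1,200 tokens per report, split between prompt and completion:
cost = monthly_cost_per_employee("gpt-3.5-turbo", input_tokens=500, output_tokens=700)
```

Because output tokens are priced higher than input tokens, capping the generated report length has an outsized effect on the monthly bill.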

OpenAI GPT-3.5 Turbo and OpenAI GPT-4o are selected for AI-generated reports (Fig. 5) because of their advanced natural language processing capabilities, which ensure high-quality, coherent, and contextually accurate outputs. OpenAI GPT-3.5 Turbo is cost-effective and efficient, suitable for generating standard reports for routine tasks and large-scale deployments at a lower cost. In contrast, OpenAI GPT-4o, while more expensive, offers enhanced accuracy and detail, making it ideal for more complex and nuanced reporting needs. The use of these models enables a balance between cost-efficiency and high performance, meeting varying organizational needs. For instance, standard reports can be generated using OpenAI GPT-3.5 Turbo, while more detailed reports can use OpenAI GPT-4o, optimizing both budget and quality of output.

LLM_Fig5.jpg
Fig. 5—Sample AI-generated weekly report: This figure demonstrates the format and content of the AI-generated weekly reports, providing comprehensive insights into employee activities and performance.

Conclusion
The implementation of an automated report generation system using Microsoft Power Automate, Power Apps, and AI-driven models such as OpenAI GPT-3.5 Turbo has shown potential for significant benefits in terms of efficiency, accuracy, and user satisfaction. Preliminary testing has shown a notable reduction in the time supervisors spend reviewing weekly reports, providing comprehensive insights into employees’ activities, engagement, and performance. Furthermore, the economic viability of using OpenAI GPT-3.5 Turbo has been underscored by cost analysis, demonstrating that organizations can achieve substantial efficiency gains and cost savings through the strategic implementation of AI-driven automation in report generation. However, further refinement and real-world application are necessary to fully validate these benefits.

Future Work
Moving forward, we aim to expand the scope of automation to other reporting areas and documentation types. One specific area of development is the integration of a text-to-speech feature powered by AI. This technology, capable of providing multiple built-in voices, aims to make reports accessible to a broader audience, including individuals with visual impairments and those who prefer auditory learning. Additionally, enabling report narration in various languages will help cater to a diverse global workforce, thereby enhancing inclusivity, collaboration, and engagement.

Another key area is improving the depth and detail of the summaries generated. This may involve prompting users to provide more comprehensive inputs, rather than brief one- or two-word entries, to enhance the quality of the generated reports. Moreover, to further refine the system, we plan to implement sentiment analysis to evaluate employee performance based on project updates. Sentiment analysis, a branch of NLP, involves analyzing text to determine the emotional tone behind it. By understanding the sentiment of project updates, we can gain insights into employee morale and the overall team sentiment. This feature, while potentially sensitive, could help in identifying stress or dissatisfaction early, enabling timely interventions.
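The idea can be illustrated with a deliberately simple keyword-based scorer; a production system would use a trained NLP sentiment model rather than hand-picked word lists, and the keywords below are invented for illustration.

```python
# Illustrative keyword lists; a real system would use a trained model.
POSITIVE = {"completed", "resolved", "improved", "ahead", "success"}
NEGATIVE = {"blocked", "delayed", "issue", "behind", "failed"}

def sentiment_score(update_text):
    """Return a score in [-1, 1]: positive values suggest upbeat updates,
    negative values may flag blockers worth a supervisor's attention."""
    words = [w.strip(".,") for w in update_text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# Mixed update: one positive and one negative signal cancel out.
score = sentiment_score("Velocity model completed, but the QC run is blocked.")
```

Aggregating such scores per employee per week would give the morale trend line described above, though any deployment would need the privacy safeguards the text flags.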

Furthermore, leveraging AI for sentiment analysis in performance evaluations can mitigate biases inherent in traditional assessment methods, leading to fairer assessments and better-aligned organizational support. Additionally, exploring tools such as Microsoft Copilot for drafting inputs could streamline the reporting process, although this feature is still in the early stages of adoption. As noted in studies by McKinsey and the Harvard Business Review, organizations with highly engaged employees experience significantly higher profitability, and automated sentiment analysis can provide a more objective evaluation of employee performance. By integrating these advanced features, organizations can not only enhance the utility and reach of their reporting systems but also foster a more inclusive, supportive, and productive work environment.



Acknowledgements
We would like to express our sincere gratitude to everyone who contributed to the success of this research. Special thanks are extended to Murphy Oil Corp. for its continuous support and encouragement. Additionally, we appreciate all users who provided valuable feedback during the preliminary testing phase. Your input has been crucial in refining our solution and ensuring its practical applicability. Lastly, we thank our company for providing the necessary resources and an enabling environment to conduct this research, which has been instrumental in achieving the outcomes detailed here.