Speakers

AI and the Future of Work
Sponsored by Wharton Human-AI Research
May 21-22, 2025

Jon M. Huntsman Hall
3730 Walnut Street, 8th Floor
Philadelphia, PA 19104

Keynote Speaker

Elie Schoppik

Elie Schoppik

Head of Technical Training, Anthropic

"Coding with Claude Code"

Discover how Claude Code is revolutionizing software development through AI-powered pair programming. In this hands-on session, you’ll learn how to leverage Anthropic’s command-line tool that allows developers to delegate coding tasks directly from their terminal. We’ll explore practical workflows that demonstrate how Claude Code can accelerate development cycles, reduce debugging time, and help anyone technical focus on high-level problem-solving rather than implementation details.

The session will cover:

  • Getting started with Claude Code in your development environment
  • Effective prompting techniques for optimal code generation
  • Real-world use cases and implementation examples
  • Best practices for integrating AI pair programming into existing workflows
  • Limitations and responsible use considerations
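For attendees who want to experiment before the session, the sketch below shows one way to delegate a small task to Claude Code from a script rather than an interactive terminal session. It is illustrative only: it assumes the claude CLI is already installed and authenticated on your machine, that the -p (print) flag for non-interactive output works as described in Anthropic's current documentation, and the utils.slugify target is a made-up example.

```python
# Minimal sketch: hand a coding task to Claude Code non-interactively.
# Assumes the `claude` CLI is installed and authenticated; the `-p` (print)
# flag is taken from Anthropic's documentation and should be verified
# against the version you have installed. `utils.slugify` is a made-up target.
import subprocess

task = "Write a pytest test for utils.slugify that covers unicode input."

result = subprocess.run(
    ["claude", "-p", task],   # print mode: respond once and exit
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout)          # Claude Code's response, e.g. a draft test file
```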

Presenters

Manmohan Aseri

Manmohan Aseri

Assistant Professor, University of Maryland

“Automating Truth at Scale: Incentive Issues in Developing LLM Fact Checkers”

With the rise of LLMs, generating misinformation has become remarkably easy, making it increasingly challenging for humans to keep up with fact-checking. However, these same LLM tools also hold significant potential to serve as automated fact-checkers. The effectiveness of LLM-based fact-checking depends on access to high-quality factual data, either through training sets or dynamic retrieval, which is provided by content providers such as news organizations and publishers. A key challenge, however, lies in the misaligned incentives between LLM developers and content providers. While LLMs seek partnerships to gain access to data, content providers risk losing audiences and revenue by sharing their content. In this paper, we examine the incentive issues in developing LLM-based fact-checkers. Using a game-theoretic model, we show that low-reputation content providers have a natural incentive to collaborate with LLMs and share their data. In contrast, high-reputation content providers are less inclined to do so, necessitating financial incentives from LLM developers to encourage data sharing. Additionally, although partnerships may lead both LLMs and content providers to reduce individual fact-checking quality due to decreased competition, the overall fact-checking quality of the LLM improves through collaboration. We further propose mechanisms to better align the incentives of LLMs and content providers to achieve high-quality LLM-based fact-checking. Our results provide useful insight for practitioners and policymakers regarding containing the spread of misinformation by automated detection.

Sarah Bana

Sarah Bana

Assistant Professor, Chapman University

"AI-Enabled Job Markets and Market Participation: Jobseekers’ ’Rational Expectations’ about Competition vs ‘AI Aversion’”

As AI matching algorithms become increasingly important in modern job search, we examine how jobseekers respond to learning that job recommendations come from AI systems. While existing research has focused on improving algorithmic matching quality, far less attention has been paid to jobseeker responses. Through a field experiment with 4,562 jobseekers on a labor market platform, we find a series of patterns consistent with jobseekers forming expectations about how AI recommendations change expected payoffs. When AI involvement is disclosed, participation drops by 27.4 percent (2.4 percentage points) compared to when no information about the recommendation source is provided. While this could reflect preferences regarding AI (i.e., AI aversion), our evidence points to a more fundamental mechanism: jobseekers respond strategically to how AI changes the competitive environment. Specifically, jobseekers appear to update their beliefs about match quality, evaluation criteria, and the reach of recommendations when they learn AI is involved. These findings have important consequences for firms and platform designers as they navigate inviting high-quality candidates under different expectations of competition.
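For readers checking the arithmetic, the two headline figures together imply a baseline participation rate of roughly 8.8 percent; that baseline is inferred from the reported 2.4 percentage-point and 27.4 percent figures, not a number stated in the abstract.

```python
# Back-of-envelope: baseline participation implied by a 2.4 percentage-point
# drop that equals a 27.4% relative decline. Inference, not a reported figure.
drop_pp = 2.4           # percentage points
relative_drop = 0.274   # 27.4%

baseline = drop_pp / relative_drop
print(f"Implied baseline participation:  {baseline:.1f}%")            # ~8.8%
print(f"Participation with AI disclosed: {baseline - drop_pp:.1f}%")  # ~6.4%
```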

Seth Benzell

Seth Benzell

Assistant Professor, Chapman University

“Robots Are Us: Some Economics of Capital Accumulation in the Age of AI”

Will smart machines do to humans what the internal combustion engine did to horses – make them obsolete? If so, can putting people out of work or, at least, good work leave them unable to buy what smart machines produce? Our model’s answer is yes. Over time and under the right conditions, supply reduces demand, leaving everyone worse off in the long-run. Carefully crafted redistribution policies can prevent such immiserating growth. But blunt policies, such as limiting intellectual property rights or restricting labor supply, can make matters worse.

Eric Bogert

Eric Bogert

Assistant Teaching Professor, Northeastern University

"Effects of AI Feedback on Learning, Inequality, and Intellectual Diversity"

We investigate how individuals’ AI use affects three interrelated long-term outcomes. First, we show that individuals are far more likely to seek AI feedback in situations in which they experienced success rather than failure. This AI feedback-seeking strategy turns out to be detrimental to learning: feedback on successes decreases future performance, while feedback on failures increases it. Second, higher-skilled decision-makers seek AI feedback more often, are far more likely to seek it after a failure, and benefit more from it than lower-skilled individuals. As a result, access to AI feedback increases, rather than decreases, the skill gap between high- and low-skilled individuals. Finally, we leverage 42 major platform updates as natural experiments to show that access to AI feedback causes a decrease in the intellectual diversity of the population, as individuals tend to specialize in the same areas. Together, these results indicate that learning from AI feedback is not automatic and that using AI correctly is itself a skill. Furthermore, despite its individual-level benefits, access to AI feedback can have significant population-level downsides, including loss of intellectual diversity and a widening skill gap.

David Byrne

David Byrne

Principal Economist, Federal Reserve Board

“Generative AI at the Crossroads: Light Bulb, Dynamo, or Microscope?”

With the advent of generative AI (genAI), the potential scope of artificial intelligence has increased dramatically, but the future effect of genAI on productivity remains uncertain. We consider the evidence that genAI is widely used (“light bulb”), spurs new products, processes, and organizations (“dynamo”), and makes R&D more efficient (“microscope”). Which type(s) of technology genAI turns out to be will determine its impact on productivity growth. GenAI-induced productivity gains will depend on the range of tasks enhanced by the technology, the extent of adoption for those tasks, and the success of firms in integrating the adopted technology; it is too early to draw confident conclusions on those questions. Moreover, the effects of the technology on the innovation process will play a crucial role. Some labor-saving innovations, such as the light bulb, temporarily raise productivity growth as adoption spreads, but the effect fades when the market is saturated; that is, the level of output per hour is permanently higher but the growth rate is not. In contrast, two types of technologies stand out as having longer-lived effects on productivity growth. First, there are technologies known as general-purpose technologies (GPTs). GPTs are (1) widely adopted, (2) spur abundant knock-on innovations (new goods and services, process efficiencies, and business reorganization), and (3) improve continuously, refreshing this innovation cycle; the electric dynamo is an example. Second, there are inventions of methods of invention (IMIs). IMIs increase the efficiency of the research and development process, generating new ideas more quickly and cheaply; the compound microscope is an example. We show that genAI has the characteristics of both a GPT and an IMI—an encouraging sign. Even so, for genAI to boost productivity growth, its contribution will have to outpace that of past IT innovations, which are already baked into the trend, including machine learning.

Wei Chen

Wei Chen

Associate Professor, University of Connecticut

“Generative AI and Organizational Structure in the Knowledge Economy”

The adoption of Generative Artificial Intelligence (GenAI) is fundamentally reshaping organizations in the knowledge economy. GenAI can significantly enhance workers’ problem-solving abilities and productivity, yet it also presents a major reliability challenge: hallucinations, or errors presented as plausible outputs. This study develops a theoretical model to examine GenAI’s impact on organizational structure and the role of human-in-the-loop oversight. Our findings indicate that successful GenAI adoption hinges primarily on maintaining hallucination rates below a critical level. After adoption, as GenAI advances in capability or reliability, organizations optimize their workforce by reducing worker knowledge requirements while preserving operational effectiveness through GenAI augmentation—a phenomenon known as deskilling. Unexpectedly, enhanced capability or reliability of GenAI may actually narrow the span of control, increasing the demand for managers rather than flattening organizational hierarchies. To effectively mitigate hallucination risks, many firms implement human-in-the-loop validation, where managers review GenAI-enhanced outputs before implementation. While the validation increases managerial workload, it can, surprisingly, expand the span of control, reducing the number of managers needed. Furthermore, human-in-the-loop validation influences GenAI adoption differently based on validation costs and hallucination rates, deterring adoption in low-error, high-cost scenarios, while promoting it in high-error, low-cost cases. Finally, productivity improvements from GenAI yield distinctive organizational shifts: as productivity increases, firms tend to employ fewer but more knowledgeable workers, gradually expanding managerial spans of control. Our research directly addresses calls for theoretical frameworks to understand how GenAI technologies reshape organizational structures and the future of work, while providing practical guidance for organizations navigating this transformation.

Jung Ho Choi

Jung Ho Choi

Assistant Professor, Stanford University

"Human + AI in Financial Reporting: Early Evidence from the Field”

This paper studies the usage of AI in financial reporting. We partner with a technology startup that provides AI tools to accountants. We combine field data from 74 companies with hundreds of thousands of transactions and survey data from 277 accountants in the US to understand how this growing technology is incorporated into accounting processes, especially in transaction analysis, at both the task and transaction levels. We provide the following early evidence: 1) 38% of survey participants report having fully or partially integrated AI into their workflows, including financial reporting tasks. 2) The usage of AI is associated with a 66% increase in the number of clients supported in the same week, with accountants reallocating approximately 10% of their time from routine tasks to business communication and quality assurance. 3) AI adoption is associated with a 4-day reduction in the time required for monthly financial reporting. 4) Experienced accountants tend to use these tools more frequently but remain cautious about relying on them for complex tasks (e.g., accruals). 5) The LLM-powered AI software generates high-confidence scores for validated transactions, efficiently auto-categorizes accounts without specific rules, and integrates data from multiple sources using machine learning.

Avinash Collis

Avinash Collis

Assistant Professor, Carnegie Mellon University

"LLM Time Machines: Valuing Digital Goods Over Time”

Digital goods generate a significant amount of consumer welfare, yet the magnitude of these welfare gains is hard to estimate due to the lack of prices since most of these goods are free to consumers. Moreover, to fully understand their impact and track the welfare gains over time, we must assess how their value has evolved since their introduction. This is particularly challenging as newly introduced digital goods have limited consumer awareness, and current perceptions often bias retrospective estimates of previous consumer valuations. We investigate the feasibility of using large language models (LLMs) to estimate the valuations of digital goods via incentive-compatible single binary discrete choice experiments. We benchmark LLMs against valuations obtained from these choice experiments on representative samples of US populations from 2016-24. We find that valuations generated by LLMs are similar to valuations estimated using humans and follow similar patterns over time. Moreover, LLMs can be potentially used to extrapolate, going back or forward in time. We conclude by offering some guidance on using LLMs to generate longitudinal data on the valuations of digital goods and other types of goods.
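A minimal sketch of the econometrics behind a single binary discrete choice valuation: each respondent (human or LLM persona) sees one randomized offer and accepts or rejects it, and a logit of acceptance on the offer recovers a median valuation as minus the intercept over the slope. The data below are simulated and the setup is a generic textbook version, not the authors' design or estimates.

```python
# Sketch: median valuation from single binary discrete choice data.
# Simulated data; not the paper's design or results.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
true_value = 40.0                           # hypothetical median valuation ($/month)
offer = rng.uniform(5, 100, n)              # randomized payment to forgo the good
accept = (offer + rng.logistic(0, 10, n) > true_value).astype(int)

logit = sm.Logit(accept, sm.add_constant(offer)).fit(disp=False)
intercept, slope = logit.params
print(f"Estimated median valuation: ${-intercept / slope:.1f}")   # close to $40
```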

Anastassia Fedyk

Anastassia Fedyk

Assistant Professor of Finance, UC Berkeley, Haas School of Business

“Data Innovation Complementarity and Firm Growth”

This paper examines how complementarity between a firm’s general innovation and data-security innovation affects firm outcomes in the modern data economy. Our theoretical model shows that when the importance of data and its protection increases, firms with high complementarity between data-security and non-data-security innovation enter a virtuous cycle. They take advantage of this complementarity to improve their broader predictive capabilities and extend their productivity frontier. Empirically, we propose a novel firm-level measure of data innovation complementarity based on the intersection of patent inventors who work on both data-security-related and non-data-security related patents. Leveraging the staggered introduction of Data Breach Notification Laws (DBNLs) across U.S. states as a quasi-exogenous shock, we provide robust empirical evidence that heightened incentives to protect in-house generated data activates this feedback loop. We find that firms with complementary data innovation processes experience significant increases in (overall) innovation and profitability, by not only enhancing their in-house data security measures but also integrating these innovations across other domains. In contrast, firms without complementary data experience negative effects from the data protection laws. Our results highlight the dual role of data in driving firm-level market power and innovation dynamics.
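As a rough illustration of the inventor-overlap idea, the sketch below computes, for each firm, the share of its patent inventors who appear on both data-security and non-data-security patents. The column names and toy records are mine, not the paper's actual construction.

```python
# Sketch: firm-level complementarity as the share of inventors holding both
# data-security and non-data-security patents. Toy data, illustrative columns.
import pandas as pd

patents = pd.DataFrame({
    "firm":          ["A", "A", "A", "A", "B", "B", "B"],
    "inventor":      ["i1", "i1", "i2", "i3", "i4", "i5", "i5"],
    "data_security": [1,    0,    0,    1,    0,    0,    0],
})

def complementarity(firm_patents: pd.DataFrame) -> float:
    per_inventor = firm_patents.groupby("inventor")["data_security"].agg(["min", "max"])
    works_on_both = (per_inventor["max"] == 1) & (per_inventor["min"] == 0)
    return works_on_both.mean()

print(patents.groupby("firm").apply(complementarity))
# Firm A: one of three inventors spans both types (0.33); firm B: none (0.0)
```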

Francesco Filippucci

Francesco Filippucci

Economist, Organisation for Economic Co-operation and Development (OECD)

“Aggregate Productivity Gains from Artificial Intelligence: a Sectoral Perspective”

Artificial Intelligence delivers large productivity gains in specific tasks, but its impact on aggregate productivity remains debated. This paper discusses strategies for aggregating micro-level productivity gains and argues that plausible assumptions about AI exposure and adoption rates imply substantial aggregate productivity gains from AI. However, the implied productivity gains also vary strongly between sectors, likely affecting the sectoral composition of the economy. Accounting for such structural change, we project that AI could contribute between 0.3-0.9 percentage points to annual TFP growth over the next decade. Sectoral differences in AI-driven productivity growth diminish aggregate productivity gains through a Baumol effect, especially if elasticities of substitution in consumption across sectors are low and factor reallocation is limited.
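A stylized back-of-envelope version of the aggregation logic described above: each sector's productivity gain is the product of task exposure, adoption, and an assumed within-task gain, and sectors aggregate through value-added weights. Every number below is illustrative, not the paper's calibration.

```python
# Stylized aggregation of micro-level AI gains into an aggregate TFP effect.
# All shares and parameters are illustrative, not the paper's calibration.
within_task_gain = 0.45     # assumed productivity gain on AI-performed tasks

sectors = {
    #  name               value-added share, task exposure, adoption rate
    "information":        (0.06, 0.60, 0.50),
    "finance":            (0.08, 0.55, 0.40),
    "manufacturing":      (0.11, 0.25, 0.20),
    "health_education":   (0.15, 0.30, 0.15),
    "other":              (0.60, 0.20, 0.15),
}

aggregate = 0.0
for name, (share, exposure, adoption) in sectors.items():
    sector_gain = exposure * adoption * within_task_gain
    aggregate += share * sector_gain
    print(f"{name:16s} sector-level gain: {sector_gain:5.1%}")

print(f"\nAggregate level effect: {aggregate:.1%}")
print(f"Spread over a decade:   {aggregate / 10:.2%} per year")
```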

Simon Friis

Simon Friis

Postdoctoral Fellow, Harvard Business School

“Performance vs. Principle: Mapping Resistance to AI in the U.S. Labor Market”

Even when artificial intelligence is apt for a task, people often vigorously resist its use. In a survey of 2,357 U.S. adults evaluating 940 occupations, we decompose resistance into performance-based concerns, which fade as AI surpasses human capabilities at lower cost, and principle-based objections, which persist even under such “perfect AI.” Eliminating performance concerns nearly doubles public support for AI-driven automation (from 30% to 58% of occupations), yet a meaningful subset—12%—remains categorically off-limits because using AI in these roles is viewed as morally repugnant regardless of benefits. This misalignment between where AI can excel and where society permits its use defines a previously uncharted moral frontier, refining forecasts of AI-driven labor displacement and underscoring the need for socially informed AI policy.

Daniel Goldstein

Daniel Goldstein

Senior Principal Research Manager, Microsoft Research

“Facilitating Meetings with LLMs: An Experimental Study of Group Decision Making”

From hiring committees, to medical teams, to intelligence communities, the success of group decision making hinges upon a group’s ability to collect information from each person involved, and to structure and process that information to make a decision. While a broad body of literature has explored how large language models (LLMs) can assist individual users in a wide range of domains, including writing, programming, art, and education, our current understanding of how these models can assist groups of people using them together is relatively nascent. Here, we present a pre-registered (AsPredicted #192061) randomized experiment exploring the utility of LLMs as facilitators for group decision-making. In contrast to existing applications that generate retrospective summaries of a discussion, we demonstrate the promise of using LLMs to actively guide and shape the discussion as it occurs.

Brad Greenwood

Brad Greenwood

Maximus Corp. Professor of Business, Costello College of Business, George Mason University

“The Effect of Gunshot Detection Technologies on Policing Practices: An Empirical Examination of the Chicago Police Department”

Gunshot Detection Technologies (GDTs) use acoustic sensors and machine learning algorithms to identify and pinpoint the location of gunfire. Despite their promise as a tool to enhance crime prevention and resolution, GDTs have faced significant criticism from the public, legal advocates, and the popular press, who argue that these technologies exacerbate racial bias in policing. Surprisingly, although GDTs’ impact on gun crime has been studied at length, their effects on broader policing practices remain underexplored. We address that gap in this work, investigating the effect of GDT deployment in the city of Chicago on a variety of policing outcomes (viz. citizen stops, arrests, complaints lodged against officers, levels of crime). Exploiting the phased introduction of GDTs across Chicago police districts, we employ a difference-in-differences design to estimate the effect on each outcome. Contrary to claims that GDTs lead to over-policing, particularly in minority neighborhoods, we find that GDT deployment significantly reduces the volume of citizen stops, both in general and among racial subgroups, with no detectable heterogeneity in effect magnitudes. Further, although we find no systematic correlation with crime levels, we do observe that GDT deployment increases the likelihood of an arrest being made, conditional on a crime taking place. Together, these results suggest that GDTs may act as a substitute for in-person monitoring on the part of police officers, yielding greater operational efficiency. However, we also observe that GDT deployment leads to a systematic rise in citizen complaints, particularly claims that officers have engaged in improper search. This last result is consistent with the idea that police may use GDTs’ notifications as a pretext for overly aggressive investigation. Collectively, our findings suggest a nuanced dynamic: GDTs appear to drive reductions in the extent of policing, yet a rise in the intensity of investigations that do take place. Theoretical and practical implications are discussed within.
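For readers less familiar with the design, here is a minimal sketch of a two-way fixed effects difference-in-differences regression on simulated district-month data, with standard errors clustered by district. It illustrates the general estimator only, not the paper's specification or data.

```python
# Sketch: two-way fixed effects diff-in-diff on simulated district-month data.
# Illustrates the estimator only; not the paper's specification or data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for d in range(20):
    rollout = rng.integers(12, 36)                    # phased GDT introduction
    for m in range(48):
        treated = int(d < 10 and m >= rollout)        # half the districts ever adopt
        stops = 100 + 2 * d - 0.3 * m - 8 * treated + rng.normal(0, 5)
        rows.append({"district": d, "month": m, "gdt": treated, "stops": stops})
df = pd.DataFrame(rows)

model = smf.ols("stops ~ gdt + C(district) + C(month)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["district"]}
)
print(model.params["gdt"])   # recovers roughly the simulated -8 effect
```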

Marius Guenzel

Marius Guenzel

Assistant Professor of Finance, The Wharton School

“AI Personality Extraction from Faces: Labor Market Implications”

Human capital—encompassing cognitive skills and personality traits—is critical for labor market success, yet the personality component remains difficult to measure at scale. Leveraging advances in artificial intelligence and comprehensive LinkedIn microdata, we extract the Big 5 personality traits from facial images of 96,000 MBA graduates, and demonstrate that this novel “Photo Big 5” predicts school rank, compensation, job seniority, industry choice, job transitions, and career advancement. Using administrative records from top-tier MBA programs, we find that the Photo Big 5 exhibits only modest correlations with cognitive measures like GPA and standardized test scores, yet offers comparable incremental predictive power for labor outcomes. Unlike traditional survey-based personality measures, the Photo Big 5 is readily accessible and potentially less susceptible to manipulation, making it suitable for wide adoption in academic research and hiring processes. However, its use in labor market screening raises ethical concerns regarding statistical discrimination and individual autonomy.

Anne Hansen

Anne Hansen

Senior Financial Economist, Richmond Fed

"Simulating the Survey of Professional Forecasters”

We simulate economic forecasts of professional forecasters using large language models (LLMs). We construct synthetic forecaster personas using a unique hand-gathered dataset of participant characteristics from the Survey of Professional Forecasters. These personas are then provided with real-time macroeconomic data to generate simulated responses to the SPF survey. Our results show that LLM-generated predictions are similar to human forecasts, but often achieve superior accuracy, particularly at medium- and long-term horizons. We argue that this advantage arises from LLMs’ ability to extract latent information encoded in past human forecasts while avoiding systematic biases and noise. Our framework offers a cost-effective, high-frequency alternative that complements traditional survey methods by leveraging both human expertise and AI precision.

Chenchuan He

Chenchuan He

PhD Student, University of Delaware

“PersonaCoder: How Personality Influences Coding Performance of LLM Agents”

This study investigates the impact of personality traits on the coding performance of LLM-based AI agents. We systematically assigned personality attributes to AI agents using the Myers-Briggs Type Indicator (MBTI), a widely recognized psychological framework that categorizes individuals into 16 personality types based on four dichotomous dimensions: Extraversion (E) vs. Introversion (I), Sensing (S) vs. Intuition (N), Thinking (T) vs. Feeling (F), and Judging (J) vs. Perceiving (P). Drawing upon prior research on personality traits in human programmers, we hypothesized that agents with Sensing (S) and Thinking (T) traits would exhibit superior coding performance due to their systematic, detail-oriented, and logic-driven cognitive styles. Additionally, we incorporated a reflection mechanism into the agents’ workflows, enabling them to iteratively refine their solutions based on feedback from failed test cases.
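A minimal sketch of the reflection mechanism described above: the agent drafts code, runs the test suite, and on failure is re-prompted with the error output. The generate_code function is a hypothetical stand-in for a persona-conditioned LLM call, and the loop is a generic pattern rather than the authors' implementation.

```python
# Sketch of a reflection loop for an LLM coding agent. `generate_code` is a
# hypothetical stand-in for a persona-conditioned LLM call; replace it with a
# real API call. Generic pattern, not the authors' implementation.
import subprocess


def generate_code(prompt: str) -> str:
    """Placeholder for an LLM call conditioned on an MBTI persona (e.g., ISTJ)."""
    raise NotImplementedError


def run_tests(path: str = "tests/") -> tuple[bool, str]:
    """Run the test suite and return (passed, captured output)."""
    proc = subprocess.run(["pytest", path, "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def solve_with_reflection(task: str, max_rounds: int = 3) -> str:
    prompt = task
    for _ in range(max_rounds):
        solution = generate_code(prompt)
        with open("solution.py", "w") as f:
            f.write(solution)
        passed, log = run_tests()
        if passed:
            return solution
        # Reflection step: feed the failing test output back to the agent.
        prompt = f"{task}\n\nYour previous attempt failed these tests:\n{log}\nRevise the code."
    return solution
```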

Alex Imas

Alex Imas

Professor, University of Chicago Booth School of Business

"Measuring Systemic Discrimination in High Dimensional Data"

This paper presents novel methodological frameworks for measuring discrimination in high-dimensional data contexts, with particular relevance to AI systems and algorithmic decision-making. We develop an “iterated audit” approach that distinguishes between direct discrimination (conditional on observable characteristics) and systemic discrimination (arising from interactions across decision nodes). Our framework specifically addresses challenges in quantifying discrimination when variables subject to disparities are high-dimensional, such as text data or complex feature spaces typical in machine learning applications. We demonstrate how discrimination can emerge indirectly from biased data collection and training systems, even when algorithms are ostensibly “blinded” to protected characteristics. Through two field experiments, we validate our approach: first examining gender-based disparities in recommendation letter language and their labor market impacts, and second investigating racial discrimination in sequential hiring decisions. Traditional methods fail to capture discriminatory patterns in high-dimensional signals, as evidenced by our finding that standard text analysis approaches explain minimal variation in outcomes. Our methodology provides researchers and practitioners with robust tools to identify discrimination in complex AI systems where conventional approaches would miss significant systemic patterns of inequity.

Karthik Babu Nattamai Kannan

Karthik Babu Nattamai Kannan

Assistant Professor of IT and Operations Management, Southern Methodist University

“GenAI Usage Disclosure in Open-Source Projects: An Exploratory Analysis”

The integration of generative AI (GenAI) into software development has prompted growing concerns around transparency, ethics, and intellectual property, especially within open-source communities. In response, the Apache Software Foundation (ASF) issued guidelines in mid-2023 advocating for explicit disclosure of GenAI tooling in project contributions. This study examines how contributors to selected ASF projects have adhered to these disclosure guidelines between August 2023 and December 2024. Using a dataset of 8,720 pull requests across five ASF repositories that implemented standardized disclosure prompts, we categorize contributor responses as “Yes,” “No,” “Maybe,” or non-response. The analysis reveals that while the majority of contributors (over 90%) explicitly denied using GenAI tools, only 0.58% acknowledged such use, with 9.2% expressing uncertainty. These patterns suggest high compliance with disclosure protocols but also point to possible underreporting or definitional ambiguities surrounding GenAI usage. The findings highlight the need for clearer guidance, community dialogue, and further research into contributors’ motivations and perceptions related to GenAI disclosures in open-source software development.

Anna Kawakami

Anna Kawakami

PhD Student, School of Computer Science, Carnegie Mellon University

"AI Failure Loops: The Confluence of Overconfidence in AI and Underconfidence in Worker Expertise.”

A growing body of literature has focused on understanding and addressing workplace AI design failures. However, past work has largely overlooked the role of the devaluation of worker expertise in shaping the dynamics of AI development and deployment. In this paper, we examine the case of feminized labor: a class of devalued occupations historically mislabeled as “women’s work,” such as social work, K-12 teaching, and home healthcare. Drawing on literature on AI deployments in feminized labor contexts, we conceptualize AI Failure Loops: a set of interwoven, socio-technical failure modes that help explain how the systemic devaluation of workers’ expertise negatively impacts, and is impacted by, AI design, evaluation, and governance practices. These failures demonstrate how misjudgments on the automatability of workers’ skills can lead to AI deployments that fail to bring value to workers and, instead, further diminish the visibility of workers’ expertise. We discuss research and design implications for workplace AI, especially for devalued occupations.

Jin Kim

Jin Kim

Postdoctoral Researcher, Northeastern University

“People Reduce Workers’ Compensation for Using Artificial Intelligence (AI)”

The growing use of AI in the workplace may have unintended financial consequences for workers. Across 11 studies (N = 3,851), we consistently find that people reduce compensation for workers who use AI tools compared to those who do not. This “AI Penalization” effect was robust across hypothetical and real workers, different types of work and employment statuses, different forms and timing of compensation, and various methods of eliciting compensation. The effect appears to be partly driven by people’s perception that AI-assisted workers deserve less credit for their work than unassisted workers. These findings highlight the importance of recognizing the financial disadvantages AI adoption may create for workers.

Caleb Kwon

Caleb Kwon

Assistant Professor, McCombs School of Business, the University of Texas at Austin

“Human-Algorithm Interactions in Labor Scheduling Decisions”

We examine whether corporate officers should allow store managers to override AI scheduling tools that fully automate the generation of employee work schedules. Examining administrative data from a large grocery retailer spanning more than 500 stores, 100,000 employees, 46 million shifts, and 1.5 million store-date observations, we find that managers’ overrides increase store labor productivity. We identify two key channels through which these overrides create value. First, managers have private information about demand and their employees. The former is demonstrated by their ability to better align labor with realized demand (proxied by customer foot traffic). The latter is shown by their concentration of overrides on newer employees, for whom information from interviews, resumes, and early interactions is absent from the AI tool’s training data. Second, managers flexibly navigate and expand the AI tool’s constraints on employee availability and task allocation, thereby increasing the feasible set of scheduling solutions. The value of overrides also increases with managerial tenure, which suggests that managers accumulate valuable knowledge about store demand and their employees over time. Overall, our findings suggest that store managers add value to labor scheduling even when stores are equipped with fully automated AI scheduling systems.

Yong Lee

Yong Lee

Associate Professor, University of Notre Dame

“Advancing AI Capabilities and Evolving Labor Outcomes”

This study examines the evolving capabilities of AI and their impact on the labor market, tracking continuously advancing AI capabilities alongside real-time shifts in employment status and work hours across occupations and demographic groups. Leveraging task-level data, we construct a dynamic Occupational AI Capabilities Score based on a task-level assessment using multiple state-of-the-art AI models, including OpenAI’s ChatGPT 4o, Anthropic’s Claude 3.5, and Meta’s Llama 3. Unlike previous studies that rely on static automation risk or exposure estimates, this approach introduces a five-stage framework that evaluates how AI’s capability to perform tasks in occupations changes as technology advances from traditional machine learning to agentic AI. Using this staged framework, we illustrate the extent to which occupations and employment are exposed to AI using data from the Occupational Employment and Wage Statistics. The Occupational AI Capabilities Scores are then linked to Current Population Survey data, allowing for empirical analysis of employment, unemployment, and work-hour trends. Additionally, occupation-level event studies describe how AI capabilities reshape job structures in specific professions over time. We find that increasing AI capabilities significantly increases unemployment and decreases work hours in occupation-level long-differenced regressions. In individual-level regressions comparing occupations highly exposed to AI with the rest, before and after the release of ChatGPT, we find similar results on unemployment and work hours. We find stronger effects among highly educated and younger workers. Overall, the results suggest that AI-driven labor shifts are occurring both at the extensive margin (unemployment) and the intensive margin (work hours), but with varied effects across occupations and demographic groups. Ongoing monitoring of AI’s labor market impact will be essential. Given the speed of AI advancements, this paper highlights the importance of real-time tracking and dynamic modeling of AI capabilities in the labor market in order to anticipate labor market changes and adapt workforce policies accordingly.
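A minimal sketch of how a task-to-occupation score of this kind can be assembled: average the capability stage assigned by each model for a task, then take a task-importance-weighted mean within each occupation. The column names, weights, and numbers are illustrative, not the study's data or exact weighting scheme.

```python
# Sketch: occupation-level AI capabilities score from task-level model ratings.
# Toy data and weighting; not the study's construction.
import pandas as pd

ratings = pd.DataFrame({
    "occupation": ["Paralegal"] * 4 + ["Electrician"] * 4,
    "task":       ["t1", "t1", "t2", "t2", "t3", "t3", "t4", "t4"],
    "model":      ["gpt", "claude"] * 4,
    "stage":      [4, 5, 3, 3, 1, 2, 2, 1],      # 1-5 capability stage per model
    "importance": [0.7, 0.7, 0.3, 0.3, 0.5, 0.5, 0.5, 0.5],
})

# 1) Average across models within each task.
task_scores = ratings.groupby(["occupation", "task"], as_index=False).agg(
    score=("stage", "mean"), importance=("importance", "first")
)

# 2) Importance-weighted mean across tasks within each occupation.
def weighted(g: pd.DataFrame) -> float:
    return (g["score"] * g["importance"]).sum() / g["importance"].sum()

print(task_scores.groupby("occupation").apply(weighted))
```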

Frank Li

Frank Li

Assistant Professor, University of British Columbia

“Unpacking the AI Transformation: The Impact of AI Strategies on Firm Performance”

Artificial intelligence (AI) technologies hold great potential for large-scale economic impacts. Aligned with this trend, recent studies explore the impact of AI technologies on firm performance. However, they predominantly measure firms’ AI capabilities with inputs (e.g., labor/job postings) or outputs (e.g., patents), neglecting the strategic direction toward AI in business operations and value creation. In this paper, we empirically examine how firms’ AI strategic orientation affects firm performance from a dynamic capabilities perspective. We create a novel firm-year AI strategic orientation measure by employing a large language model to analyze business descriptions in 10-K filings and identify an increasing trend and changing status of AI strategies among U.S. public firms. Our long-difference analysis shows that AI strategic orientation is associated with greater operating cost, capital expenditure, and firm value but not sales, underscoring the importance of strategic direction toward AI in creating business value. By further dissecting firms’ AI strategic orientation into AI awareness, AI product orientation, and AI process orientation, we find that AI awareness is generally not related to performance, that AI product orientation is associated with short-term increases in operating expenses and long-term firm value, and that AI process orientation is associated with long-term increases in costs and sales. Moreover, we find a moderating effect of environmental dynamism. This study contributes to the recent AI strategy and management literature by demonstrating the strategic role of AI orientation in firm performance.

Jingjing Li

Jingjing Li

Andersen Alumni Associate Professor of Commerce, University of Virginia

"Revolutionizing Physician Workflows and Well-being with Generative AI: A Field Study on the Adoption and Impacts of DAX Copilot in a Major Health System”

Physician burnout and mental well-being are critical challenges in the U.S. healthcare system. A major contributor is the administrative burden of documentation, which often detracts from physicians’ attention during patient visits and requires them to work after hours—commonly referred to as “pajama time.” Generative AI presents a promising solution to this challenge. In particular, Microsoft’s Dragon Ambient eXperience (DAX) Copilot listens to physician-patient encounters and automatically generates structured medical notes that are seamlessly integrated with Electronic Health Record (EHR) systems. This innovation promises to streamline physician workflows and improve physicians’ focus on patient care. Our research employs a multi-method field study to systematically investigate two key questions: (1) What contextual factors influence the impact of DAX on physician workflow, productivity, well-being, and patient care? (2) What barriers and enablers shape DAX adoption, and how can implementation strategies be optimized for sustained, effective use?

Benjamin Lira Luttges

Benjamin Lira Luttges

Doctoral Candidate, University of Pennsylvania

“Learning from Examples: AI Assistance Can Enhance Rather Than Hinder Skill Development”

It is widely believed that outsourcing cognitive work to AI boosts immediate productivity at the expense of long-term human capital development. An overlooked possibility is that AI tools can support skill development by providing just-in-time, high-quality, personalized examples. In this investigation, lay forecasters predicted that practicing writing cover letters with an AI tool would impair learning compared to practicing writing letters without the tool. However, in a highly-powered pre-registered experiment, participants randomly assigned to practice writing with AI improved more on a writing test one day later compared to writers assigned to practice without AI. Notably, writers given access to the AI tool improved more despite exerting less effort, whether measured by time on task, keystrokes, or subjective ratings. We replicated and extended these results in a second pre-registered experiment, showing that writers given access to the AI tool again outperformed those who practiced on their own — but performed no better than writers merely shown an AI-generated cover letter that they could not edit. Collectively, these findings constitute an existence proof that by providing personalized examples of high-quality work, AI tools can improve, rather than undermine, learning.

Isabella Loaiza

Isabella Loaiza

Postdoctoral Associate, MIT Sloan

“Measuring New Work in the Age of AI”

This study examines how technological progress reshapes work by not only automating tasks but also creating new ones, though unevenly across occupations and wage levels. While much of the existing research focuses on the displacement of tasks by AI, this paper addresses a critical gap: systematically analyzing how tasks emerge, evolve, and disappear over time. Using historical O*Net data, the study develops a novel methodology to standardize and track task-level changes, leveraging a model called TaskBERT to extract key components of each task. This approach offers a more optimistic perspective on the future of work, highlighting opportunities for workers in an AI-driven economy.

Simon Lowe

Simon Lowe

Economist, The Burning Glass Institute

“Mapping AI Use Cases: A Large-Scale Analysis of Job Postings”

The rapid expansion of artificial intelligence (AI) across industries has intensified interest in understanding how this transformative technology reshapes labor markets, skill requirements, and economic development. While online job advertisements are commonly used to track labor demand, they also provide a unique window into the real-world AI applications that companies are actively developing and deploying. By analyzing the textual content of millions of job postings using state-of-the-art natural language processing (NLP) techniques, this study constructs and applies a novel taxonomy of AI use cases, offering a systematic framework for assessing how AI adoption varies across industries, organizations, occupations, and geographies—and how these trends evolve over time. This granular approach not only identifies the companies and industries most actively recruiting for AI roles but also reveals the evolving skill profiles and educational backgrounds that employers seek. Our taxonomy allows us to track shifts in demand for specific technical competencies—such as advanced mathematics, data engineering, and specialized machine learning frameworks—offering a detailed perspective on how AI is reshaping occupational structures.

Benjamin Manning

Benjamin Manning

PhD Student, MIT

“AI Agents Can Enable Superior Market Designs”

Many theoretically appealing market designs are under-utilized because they demand preference data that humans find costly to provide. This paper demonstrates how large language models (LLMs) can effectively elicit such data from natural language descriptions. In our experiment, human subjects provide free-text descriptions of their tastes over potential roles they could be assigned. An LLM converts these descriptions into cardinal utilities that capture participants’ preferences. We use these utilities and participants’ stated preferences to facilitate three allocation mechanisms—random serial dictatorship, Hylland-Zeckhauser, and a conventional job application type game. A follow-up experiment confirms that participants themselves prefer LLM-generated matches over simpler alternatives under high congestion. These findings suggest that LLM-proxied preference elicitation can enable superior market designs where they would otherwise be impractical to implement.
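Of the three mechanisms, random serial dictatorship is the simplest to state in code: agents pick their most-preferred remaining role in a random order. The sketch below runs it on toy LLM-style cardinal utilities; the names and numbers are hypothetical.

```python
# Sketch: random serial dictatorship over roles, using cardinal utilities such
# as those an LLM might elicit from free-text preference descriptions.
# Toy utilities; not the experiment's data.
import random

utilities = {          # utilities[agent][role]
    "ana":   {"analyst": 0.9, "writer": 0.4, "coordinator": 0.2},
    "ben":   {"analyst": 0.7, "writer": 0.8, "coordinator": 0.1},
    "chloe": {"analyst": 0.3, "writer": 0.5, "coordinator": 0.9},
}

def random_serial_dictatorship(utilities, seed=0):
    rng = random.Random(seed)
    agents = list(utilities)
    rng.shuffle(agents)                       # random priority order
    remaining = set(next(iter(utilities.values())))
    assignment = {}
    for agent in agents:
        best = max(remaining, key=lambda role: utilities[agent][role])
        assignment[agent] = best
        remaining.remove(best)
    return assignment

print(random_serial_dictatorship(utilities))
```

Hylland-Zeckhauser layers prices and probabilistic shares on top of the same utility inputs, which is exactly where machine-elicited cardinal utilities become valuable.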

Kristina Mcelheran

Kristina McElheran

Assistant Professor, University of Toronto

“Industrial AI in America: Microfoundations of the Productivity J-curve(s)”

We examine the prevalence and productivity dynamics of artificial intelligence (AI) in American manufacturing. Working with the Census Bureau to collect detailed large-scale data for 2017 and 2021, we focus on AI-related technologies with industrial applications. We find causal evidence of J-curve-shaped returns, where short-term performance losses precede longer-term gains. Consistent with costly adjustment taking place within core production processes, industrial AI use increases work-in-progress inventory, investment in industrial robots, and labor shedding, while harming productivity and profitability in the short run. These losses are unevenly distributed, concentrating among older businesses while being mitigated by growth-oriented business strategies and within-firm spillovers. Dynamics, however, matter: earlier (pre-2017) adopters exhibit stronger growth over time, conditional on survival. Notably, among older establishments, abandonment of structured production-management practices accounts for roughly one-third of these losses, revealing a specific channel through which intangible factors shape AI’s impact. Taken together, these results provide novel evidence on the microfoundations of technology J-curves, identifying mechanisms and illuminating how and why they differ across firm types. These findings extend our understanding of modern General Purpose Technologies, explaining why their economic impact—exemplified here by AI—may initially disappoint, particularly in contexts dominated by older, established firms.

Milan Miric

Milan Miric

Associate Professor, USC Marshall School of Business, University of Southern California

“Government Policy and Innovation Outcomes: Evidence from 2006 Chinese Domestic Innovation Initiative on Automation-AI Technologies”

This paper investigates the impact of China’s 2006 Domestic Innovation Initiative—the first full-fledged nationwide innovation policy in China—on the quantity and quality of automation-AI patents filed by universities and firms. Employing a two-way fixed effects approach on a comprehensive dataset of 1.4 million patents, we find the initiative significantly increased the share of university automation-AI patents by 4.7 percentage points compared to firms but had a minor negative effect on patent quality. Subsample analyses show the initiative’s effects were most pronounced in economically less-developed inland regions, which saw the greatest increases in both the share and quality of university automation-AI patents. In addition, the most active universities in patenting experienced significant quantity increases but quality declines, while the least active universities saw no change in quantity but major improvements in quality. These findings highlight the important role of political incentives set by the government in innovation outcomes.

Alex Moehring

Alex Moehring

Assistant Professor, Purdue University

"Designing Human-AI Collaboration: A Sufficient-Statistic Approach”

We develop a sufficient-statistic approach to designing collaborative human-AI decision-making policies in classification problems, where AI predictions can be used to either automate decisions or selectively assist humans. The approach allows for endogenous and biased beliefs, and effort crowd-out, without imposing a structural model of human decision-making. We deploy and validate our approach in an online fact-checking experiment. We find that humans under-respond to AI predictions and reduce effort when presented with confident AI predictions. This under-response stems more from human overconfidence in the precision of their own signal than from under-confidence in the AI. The optimal policy automates decisions where the AI is confident and delegates uncertain decisions while fully disclosing the AI prediction. Although automation is valuable, the benefit from assisting humans with AI predictions is negligible.
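The optimal policy described above has a simple shape: automate when the AI is confident enough, otherwise delegate to the human with the AI prediction fully disclosed. A minimal sketch of that routing rule follows; the 0.9 threshold is arbitrary here, whereas the paper derives the cutoff from its estimated sufficient statistics.

```python
# Sketch: route each case to automation or to a human based on AI confidence.
# The 0.9 threshold is arbitrary; the paper derives the cutoff, this does not.
def route(ai_prob_true: float, threshold: float = 0.9) -> dict:
    confident = ai_prob_true >= threshold or ai_prob_true <= 1 - threshold
    if confident:
        return {"decision_maker": "automated",
                "label": ai_prob_true >= 0.5}
    return {"decision_maker": "human",
            "disclosed_ai_prediction": ai_prob_true}   # full disclosure to the human

for p in (0.97, 0.62, 0.08):
    print(p, route(p))
```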

Robin Na

Robin Na

PhD Candidate, MIT

“Large Language Models for Research Synthesis and Evaluation”

LLMs, in conjunction with the integrative experiment design, can be used as scalable tools not only to synthesize research papers at scale but also to evaluate the informativeness of each set of papers based on its contribution to more accurate predictions. As a case study in behavioral science, we utilize a public goods game (PGG) prediction benchmark from integrative experiments of PGGs that simultaneously vary 14 parameters, including group sizes, game lengths, return ratios on contribution, peer communication, and the cost and effectiveness of punishment. The effectiveness of punishment shows high heterogeneity depending on these parameters, and the yet-unpublished nature of the manuscript and dataset makes it a held-out dataset for humans and LLMs to predict the effectiveness given the values of these 14 parameters. Structured analysis and prediction results from 1,259 published papers reveal several insights into the current state of relevant PGG literature.

Frank Nagle

Frank Nagle

Assistant Professor, Harvard Business School

"Generative AI and the Nature of Work”

Recent advances in artificial intelligence (AI) technology demonstrate a considerable potential to complement human capital intensive activities. While an emerging literature documents wide-ranging productivity effects of AI, relatively little attention has been paid to how AI might change the nature of work itself. How do individuals, especially those in the knowledge economy, adjust how they work when they start using AI? Using the setting of open source software, we study individual level effects that AI has on task allocation. We exploit a natural experiment arising from the deployment of GitHub Copilot, a generative AI code completion tool for software developers. Leveraging millions of panel observations on work activities over a two year period, we use a program eligibility threshold to investigate the impact of AI technology on the task allocation of software developers within a quasi-experimental regression discontinuity design. We find that having access to Copilot induces such individuals to shift task allocation towards their core work of coding activities and away from non-core project management activities. We identify two underlying mechanisms driving this shift – an increase in independent rather than collaborative work, and an increase in exploration activities rather than exploitation. The main effects are greater for individuals with relatively lower ability. Our results are robust to alternate identification strategies, bandwidth and kernel selections, and variable definitions. Overall, our estimates point towards a large potential for AI to transform work processes and to potentially flatten organizational hierarchies in the knowledge economy.
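A minimal sketch of the regression discontinuity logic on simulated data: within a bandwidth of the eligibility threshold, regress the outcome on treatment, the centered running variable, and their interaction, so the treatment coefficient is the jump at the cutoff. Illustrative only; the paper's identification and robustness checks are far more involved.

```python
# Sketch: local-linear regression discontinuity around an eligibility cutoff.
# Simulated data; illustrates the estimator, not the paper's analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 4_000
running = rng.uniform(-50, 50, n)               # distance to the eligibility threshold
eligible = (running >= 0).astype(int)
coding_share = 0.40 + 0.001 * running + 0.05 * eligible + rng.normal(0, 0.05, n)
df = pd.DataFrame({"running": running, "eligible": eligible, "coding_share": coding_share})

bandwidth = 20
local = df[df["running"].abs() <= bandwidth]
rd = smf.ols("coding_share ~ eligible * running", data=local).fit()
print(rd.params["eligible"])    # jump at the cutoff, roughly the simulated 0.05
```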

Terrence Neumann

Terrence Neumann

PhD Candidate, University of Texas at Austin

"Should You Use LLMs to Simulate Opinions? Quality Checks for Early-stage Deliberation”

Automating opinion surveys could significantly transform how governments and industries gather and interpret public sentiment, impacting fields such as marketing, content moderation, policymaking, and public relations. However, most studies testing this idea have relied on expensive, highly specialized human survey data to validate LLM performance, and their findings have been mixed. This uncertainty makes it difficult for managerial decision-makers to know whether investing in LLM-based methods is worthwhile in early-stage research. To address these challenges, we propose a set of quality checks designed to evaluate an LLM’s suitability for simulating human opinions. These checks focus on three main points: (1) logical constraints; (2) model stability; and (3) alignment with stakeholder expectations. By applying these checks, managers can gauge the likely efficacy of LLMs prior to costly investment in human data for validation. The checks also enable teams to iteratively refine their prompting approaches before scaling up. We demonstrate our approach in the area of AI-assisted content moderation—an application where many believe LLMs’ ability to replicate human opinions could be particularly valuable. None of the models we tested passed all of our quality checks, highlighting multiple failure points. We discuss the significance of these shortcomings, and we offer suggestions for how companies can use these tests to improve prompt engineering and mitigate risks when considering LLMs for opinion simulation.
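A minimal sketch of the model-stability idea: re-ask the model the same question several times and measure how often the modal answer recurs. The classify_with_llm function is a hypothetical placeholder for whatever LLM call a team is evaluating; the check itself is generic rather than the authors' exact procedure.

```python
# Sketch of a model-stability quality check: repeat the same query and measure
# agreement. `classify_with_llm` is a hypothetical placeholder for a real LLM call.
from collections import Counter


def classify_with_llm(post: str, persona: str) -> str:
    """Placeholder: return e.g. 'remove' or 'keep' for a content-moderation item."""
    raise NotImplementedError


def stability(post: str, persona: str, n_repeats: int = 10) -> float:
    answers = [classify_with_llm(post, persona) for _ in range(n_repeats)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / n_repeats        # 1.0 = perfectly stable across repeats


# Usage idea: flag items whose stability falls below a chosen bar (say 0.8)
# before investing in human validation data.
```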

Patryk Perkowski

Patryk Perkowski

Assistant Professor of Strategy and Entrepreneurship, Sy Syms School of Business, Yeshiva University

"Generative AI as Routine-Biased Technical Change? Evidence from a Field Experiment in Central Banking”

Theories of routine-biased technical change posit that technological advances complement workers in non-routine tasks and substitute for workers in routine tasks. We examine whether generative AI exhibits characteristics of routine-biased technical change through a field experiment at the National Bank of Slovakia. We randomly assign generative AI access to central bank employees completing workplace tasks that mirror the theoretical task-based framework. Our results indicate that generative AI access leads to large improvements in both quality and efficiency for the majority of participants. In line with theories of routine-biased technical change, we find a strong complementarity between generative AI and non-routine work, both on average and for most participants. We also find some support for generative AI as both cognitive-biased and specialist-biased, though smaller in magnitude than our tests of routine-bias. While workers in routine-intensive jobs experience larger individual performance gains, generative AI is less effective for the routine task content of their work, revealing a potential mismatch between worker-level and task-level impacts. Additionally, we find differences in how the benefits of generative AI relate to workplace skills: low-skill workers benefit most in terms of quality while high-skill workers benefit in terms of efficiency. Our findings provide early empirical support for generative AI as routine-biased technical change, with important implications for how generative AI will impact workers and labor markets more broadly.

Beverly Rich

Beverly Rich

Partner, Messner Reeves LLP

“AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice”

Generative AI is poised to reshape legal practice, yet its limitations—such as factual hallucinations—raise concerns, especially in complex legal tasks. This article evaluates two promising innovations: Retrieval Augmented Generation (RAG), which grounds AI outputs in legal sources, and reasoning models that better structure complex thought. In the first randomized controlled trial of these technologies, upper-level law students completed six legal tasks using either a RAG-powered tool (Vincent AI), a reasoning model (OpenAI’s o1-preview), or no AI. Both tools significantly improved work quality and preserved efficiency gains, outperforming earlier models like GPT-4. Vincent boosted productivity by 38–115% with minimal hallucinations, while o1-preview improved analytical depth but introduced some factual errors. These findings suggest that combining RAG and reasoning models could drive the next generation of AI-assisted lawyering.

Christine Riordan

Christine Riordan

Assistant Professor, University of Illinois at Urbana-Champaign

“From Rhythms to Stop, Drop and Roll: Unpacking the Impact of Algorithmic Management on Discretion in Hotel Housekeeping Work”

A key concern regarding the spread of algorithmic management (AM) technologies in settings outside of gig work is the extent to which it reshapes workers’ discretion, or latitude to make decisions that control their work process. The authors develop a grounded model of AM’s impact on discretion with qualitative data collected from hotel housekeeping. The model hinges on three dimensions: how AM impacts the task structure of workers’ labor process, the varying degrees of uncertainty found in different tasks, and the salience or frequency of tasks. The authors show how changes to discretion vary in accordance with these dimensions, and illustrate both the consequences and moderators of these changes. The findings underscore the importance of paying close attention to characteristics of the labor process to fully understand AM’s implications for labor, especially in settings where AM augments, rather than substitutes for, managerial decision-making.

Arun Sundararajan

Arun Sundararajan

Harold Price Professor of Entrepreneurship, NYU Stern School of Business

"Generative AI, Productivity, and Incentives for AI Twins”

Generative AI is an example of a technology whose automation potential depends in part on participation by humans in training and improving the technology. Although the primary use of generative AI today is as a general-purpose source of knowledge, our focus is on its ability to replicate the process or style of an individual human creator or worker. This kind of “AI twin” could complement its creator’s efforts, enabling them to produce higher-quality output in their individual voice or style more efficiently. However, increasingly intelligent AI twins could also, over time, replace individual humans or lower their ability to command premium wages. We analyze this trade-off using a principal-agent model in which agents have the opportunity to make investments into training an AI twin that lead to a lower cost of agent effort, a higher probability of success, or both. We situate our model within a framework in which the tasks performed vary in the extent to which AI output can be altered or improved by the human (the “editability” of the output) and also vary in the extent to which a non-expert can assess the quality of human or AI output (its “verifiability.”) Our synthesis of recent empirical studies indicates that productivity gains from the use of generative AI are higher overall when task editability is higher, while non-experts enjoy greater relative productivity gains for tasks with higher verifiability.

Luca Vendraminelli

Luca Vendraminelli

Postdoctoral Researcher, Stanford University

“Data is Gold!: Occupational Repositioning, AI Stack Bundling, and the Processes of Technology Sourcing in a Large Firm”

With rapid advancements in Artificial Intelligence (AI) technologies characterized by sophisticated data and engineering needs and uncertainty about how to implement AI effectively, the nature of considerations undergirding strategic sourcing decisions has become considerably more complex than what prior research would predict. In this study, we examine the dynamics of AI sourcing decisions in a large firm, and ask: despite high uncertainty and a strong emphasis on controlling proprietary data, how and why do firms that begin strategic projects with the intent to ‘make’ AI models internally move toward ‘buying’ external vendor solutions? By drawing on four years of fieldwork conducted at Weave (a pseudonym), a multinational fashion company, we examine two strategic projects aimed at internally building AI models. These projects were initiated as a “make” attempt, under the premise that Weave’s “data is gold,” i.e., a unique asset that would enable the firm to internally develop superior AI models and gain competitive advantage, while avoiding reliance on external technology vendors. Yet, despite this commitment, both projects eventually transitioned toward buying AI solutions from external vendors. We identify and examine the processes that underlie such make-to-buy transitions. In doing so, we highlight how and why firms that initially seek to develop AI models in-house to retain data control may find themselves buying AI solutions from external vendors. These findings contribute to research on strategic sourcing by demonstrating how occupational positioning and intra-organizational politics shape technology sourcing decisions in firms.

Binglu Wang

Binglu Wang

PhD Candidate, Kellogg School of Management, Northwestern University

"LLM Feedback on Science: A Large-Scale Randomized Field Experiment"

The transformative impact of Large Language Models (LLMs) across various sectors has demonstrated their potential to enhance productivity and facilitate complex decision-making processes, significantly altering innovation workflows. Despite their widespread adoption in commercial and consumer domains, the integration of LLMs into the unique processes of scientific discovery and innovation has been minimally explored. This study probes the potential of LLMs to bolster scientific feedback through a large-scale randomized field experiment, involving 34K recent preprints by 50K scholars across 8 STEM domains and 136 global regions. Our findings reveal that LLM-generated feedback significantly influences the revision of preprints and encourages the continued use of LLMs in academic writing. Crucially, our analysis exposes marked regional and expert-level disparities, underscoring the potential of LLMs to not only benefit innovation processes but also enhance equitable access to scientific resources. These preliminary insights advocate for expanded research into the scalable application of LLMs to make innovation and science more democratic and inclusive worldwide.

Gavin Wang

Gavin Wang

Assistant Professor, University of Texas at Dallas

"AI Exposure on Workers’ Career Path: Evidence from U.S. Workforce”

This study investigates the long-term impact of firm-level artificial intelligence (AI) adoption on employees’ career trajectories, focusing on salary progression and level promotions within U.S. public firms. By merging Burning Glass job posting data, Revelio Labs employee resume data, and Compustat firm data, we analyze the career outcomes of employees exposed to AI compared to matched control groups. The findings reveal that AI exposure is associated with a 50% increase in salary and a 0.3-level seniority rise within five years. However, the benefits are unevenly distributed, showing a “level-biased technological change” in which higher-level employees are better off after AI exposure, while lower-level employees are worse off. Mechanism tests suggest that domain knowledge, which complements AI, is a critical factor in these outcomes. Higher-level employees with more domain knowledge benefit from AI exposure, while AI exposure hinders lower-level employees from acquiring domain knowledge. The study highlights potential inequalities within organizations and underscores the need for targeted training and policy interventions to support lower-level employees in adapting to an AI-driven labor market.

Catherine Wu

Catherine Wu

PhD Student, NYU Stern

“Generative AI, Productivity, and Incentives for AI Twins”

Generative AI is an example of a technology whose automation potential depends in part on participation by humans in training and improving the technology. Although the primary use of generative AI today is as a general-purpose source of knowledge, our focus is on its ability to replicate the process or style of an individual human creator or worker. This kind of “AI twin” could complement its creator’s efforts, enabling them to produce higher-quality output in their individual voice or style more efficiently. However, increasingly intelligent AI twins could also, over time, replace individual humans or lower their ability to command premium wages. We analyze this trade-off using a principal-agent model in which agents have the opportunity to make investments into training an AI twin that lead to a lower cost of agent effort, a higher probability of success, or both. We situate our model within a framework in which the tasks performed vary in the extent to which AI output can be altered or improved by the human (the “editability” of the output) and also vary in the extent to which a non-expert can assess the quality of human or AI output (its “verifiability”). Our synthesis of recent empirical studies indicates that productivity gains from the use of generative AI are higher overall when task editability is higher, while non-experts enjoy greater relative productivity gains for tasks with higher verifiability.

Jie Zheng

Jie Zheng

PhD Student, Purdue University

“Generative AI, Human Expertise, and Scaling Law”

How effectively organizations can integrate private data and human judgment to emulate human expertise remains underexplored. Our study, conducted in collaboration with a consulting company, consists of two stages. In the first stage, we perform a human evaluation experiment to collect experts’ assessments of consulting reports specifically the negotiation recommendation reports provided to clients. We involve two groups of experts, analysts and executives, in the evaluation process, capturing different types of expertise and judgment within the organization. In the second stage, we design an LLM evaluation experiment to examine how scaling impacts the alignment of AI with human expertise. We select LLM models of varying sizes, ranging from 7 billion to 70 billion parameters, using the DeepSeek-R1 models. To scale private data, we include historical reports related to the current evaluating report in the prompt, providing LLMs with contextual knowledge that mimics the proprietary information used by experts. For scaling expert judgment data, we incorporate previous evaluated reports and evaluation results of the current evaluating report’s expert evaluator and prompt the LLMs to align with the specific judgment style of the evaluator. Our early findings indicate that incorporating both private data and human judgment data can significantly enhance expertise, as measured by the alignment between AI evaluation and expert evaluation.
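
To make the two scaling levers concrete, the sketch below shows one way contextual private data (related historical reports) and expert-judgment data (the assigned evaluator’s past scores and comments) might be assembled into a single evaluation prompt. The structure and field names are hypothetical illustrations, not the study’s actual prompt design.

```python
# Illustrative sketch only: assembling private-data context and an evaluator's
# prior judgments into one prompt. Field names are hypothetical, not from the study.

def build_evaluation_prompt(current_report: str,
                            related_reports: list[str],
                            evaluator_history: list[dict]) -> str:
    """Compose a prompt that scales private data (related reports) and
    expert-judgment data (the evaluator's past scores and comments)."""
    context = "\n\n".join(
        f"[Historical report {i + 1}]\n{r}" for i, r in enumerate(related_reports)
    )
    judgments = "\n".join(
        f"- Past report: {h['summary']} | Score: {h['score']} | Comment: {h['comment']}"
        for h in evaluator_history
    )
    return (
        "You are evaluating a negotiation recommendation report.\n\n"
        f"Contextual knowledge from related internal reports:\n{context}\n\n"
        f"The assigned expert evaluator's previous judgments:\n{judgments}\n\n"
        "Align your assessment with this evaluator's judgment style.\n\n"
        f"Report to evaluate:\n{current_report}\n\n"
        "Return a score from 1-10 and a brief rationale."
    )
```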

Eric Zhou

Eric Zhou

PhD Candidate, Boston University

“Creative Markets in the Age of Generative AI: Strategic Shifts and Labor Market Health”

The rise of generative artificial intelligence (AI) has sparked significant debate about its impact on creative labor markets, echoing concerns raised nearly 200 years ago with the advent of photography. While early photographic technologies displaced traditional portraiture forms, they also spurred artistic innovation, as painters embraced abstraction and new approaches to human expression. Today, generative AI poses a similar challenge but at an unprecedented scale, raising concerns about labor displacement, intellectual property, and the future of human creativity. This paper investigates how creators respond to competitive pressures and ethical concerns associated with generative AI by adopting, protesting, or opting out of AI training systems and the long-term implications of these strategic shifts on the health and composition of creative markets. We conduct artist- and artifact-level analyses using large-scale data from a creative labor market platform to understand strategic responses to both the introduction of AI and protections offered by the platform against AI. We investigate to what extent these strategic shifts impact the overall artist and artwork composition within the broader creative labor market and its implications on the long-term health of such markets.

Posters

Suranjeet Chowdhury Avik

Suranjeet Chowdhury Avik

Graduate Teaching Assistant, Mississippi State University

“Balancing AI Task Delegation and Skill Erosion: The Role of Algorithmic Accountability”

The rising integration of AI in workplace environments has redefined how work is shared between humans and machines. Although delegating tasks to AI promises increased efficiency, it has led to discussions about skill erosion, or the loss of competencies owing to reduced human involvement in task execution. An important, albeit overlooked, aspect of this interplay is algorithmic accountability: integrating clear procedures and strong systems of responsibility into AI operations. This research experiment will analyze how levels of AI task delegation and the presence of algorithmic accountability jointly influence skill erosion. The impetus arises from the conflicting need for organizational efficiency and sustainability of the workforce in industries increasingly dependent on AI technology for decision-making. Empirical understanding of whether algorithmic accountability mechanisms can mitigate skill erosion arising from AI task delegation is still lacking. The central research question is: How does algorithmic accountability moderate the relationship between AI task delegation and skill erosion in professional environments?

Youn Baek

Youn Baek

PhD Student, NYU Stern

“Bridges or Barriers? How Deep Learning Shapes AI Startup Innovation"

This paper examines how deep learning shaped the innovation landscape in the AI industry by conferring a distinct advantage to big tech firms over startups. Building on theories of absorptive capacity and knowledge spillovers, I argue that while large companies’ research efforts can generate positive externalities, their proprietary data environments also create structural barriers for smaller firms. Leveraging the sudden rise of deep learning in 2012 as an exogenous shock, I examine the impact of this shift in AI innovation toward big data and large-scale computing. The findings indicate that, following the breakthrough in deep learning, big tech companies significantly increased their participation in AI development. However, this surge in AI research by big tech firms coincided with a decline in the adoption of deep learning research in AI startup patents. Further analysis indicates that access to data, rather than computing power, is the primary constraint for startups. Moreover, policies aimed at encouraging labor mobility—often assumed to bridge knowledge gaps between firms—appear ineffective in offsetting startups’ disadvantages when big tech firms maintain a strong data advantage.

Pierre Bouquet

Pierre Bouquet

PhD Student, Massachusetts Institute of Technology

"News Sentiment as a Dynamic Predictor of Job Automation Risk"

As artificial intelligence increasingly disrupts job and task structure, it is essential for companies and society, in general, to anticipate which tasks are at risk of automation and how these risks can guide workforce management strategies to proactively reskill employees, restructure roles, and optimize operations. To address these challenges, we introduce a machine learning pipeline that leverages news sentiment as a dynamic proxy for job automation risk assessment. By processing two million news articles, the model computes exposure scores at the task, job, and sector levels, enabling both historical trend analysis and real-time monitoring. Our findings demonstrate that these exposure scores align with prior studies that use rigorous, expert-driven methods. Through its dynamic evaluation, this approach models the impact of AI innovations and can help inform strategies for workforce transformation.
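
A minimal sketch of the aggregation idea in this pipeline: task-level sentiment about automation is rolled up to jobs using task-importance weights (and analogously to sectors). All data structures and weights below are hypothetical placeholders; the actual pipeline applies a machine learning sentiment model to roughly two million articles.

```python
# Hypothetical sketch: aggregate sentiment-based exposure from tasks to jobs.
from collections import defaultdict
from statistics import mean

def task_exposure(article_sentiments: dict[str, list[float]]) -> dict[str, float]:
    """Average automation-related sentiment per task (higher = more exposed)."""
    return {task: mean(scores) for task, scores in article_sentiments.items() if scores}

def job_exposure(task_scores: dict[str, float],
                 job_task_weights: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weight task exposure by each task's importance within a job."""
    out: dict[str, float] = defaultdict(float)
    for job, weights in job_task_weights.items():
        total = sum(weights.values()) or 1.0
        out[job] = sum(task_scores.get(t, 0.0) * w for t, w in weights.items()) / total
    return dict(out)
```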

Allen S. Brown

Allen S. Brown

PhD Candidate, Tepper School of Business, Carnegie Mellon University

“More Than Productivity: Psychological Safety in the Age of Generative AI”

Generative AI (GenAI) technologies are widely lauded for their potential to boost productivity and are rapidly gaining traction at work. Yet the effect of GenAI implementations on important perceptions about the experience of work, such as psychological safety, is unresolved in the existing literature. Psychological safety is instrumental in facilitating learning behaviors, enhancing coordination, and improving the overall affective experience of work. On the one hand, AI- and algorithmically-driven management systems might threaten psychological safety by increasing concerns about autonomy, surveillance, and opacity. On the other hand, GenAI can also serve as a non-threatening, neutral partner capable of fostering psychologically safe environments by offering quick, non-judgmental feedback. We reconcile these conflicting perspectives by showing that a GenAI, versus human, partner’s role (e.g., evaluative vs. nonevaluative) critically influences individual perceptions of psychological safety. We provide evidence for our theory across three preregistered experiments. Studies 1 and 2 leverage complementary, experimental methodologies and support our model. Study 3 demonstrates that autonomy satisfaction predicts the experience of psychological safety during interactions with GenAI agents. Together, this research establishes an important boundary condition for the study of aversive perceptions about AI and demonstrates strategies for using GenAI tools to facilitate psychologically safe climates at work. GenAI tools can threaten psychological safety when they are associated with performance evaluation at work, but interventions that center autonomy and self-direction can reduce these threats.

Ruyu Chen

Ruyu Chen

Postdoctoral Fellow, Stanford University

“The Impact of Industrial GPT on B2B Procurement – Evidence from a Field Experiment"

This study examines the impact of Industrial AI agents on Business-to-Business (B2B) procurement, with a focus on the Maintenance, Repair, and Operations (MRO) sector. MRO procurement, characterized by product complexity and specification challenges, remains a critical yet inefficient area of B2B operations. Using large language models, ZKH—a leading digital platform for industrial supplies—has developed an AI-powered agent that acts as an intelligent assistant. By analyzing a database of over 17 million SKUs and billions of product parameters, the agent provides smart recommendations, real-time specification support, and streamlined navigation through complex product catalogs. We conduct a field experiment to evaluate the effects of the Industrial AI agent on early-stage buyer engagement, final-stage purchase decisions, procurement duration, and new product adoption. Using data on 254 purchasing companies over a three-month trial period, we provide preliminary evidence of the impact of Industrial AI agent adoption on B2B procurement.

Eunsol Cho

Eunsol Cho

PhD Candidate, New York University

"Decision Complexity and Trust in LLM Advisors"

This study explores how decision complexity affects trust in advice from large language models (LLMs). In an online experiment, participants chose how much of $100 to invest in a lottery, with complexity varied by the number of possible outcomes. Trusting behavior, measured by changes in investment after consulting a GPT-driven advisor, was highest under the greatest complexity condition. Trusting beliefs and intentions also increased with perceived complexity, though participants did not always follow the LLM’s advice even when reported trust was high.

Erik Engberg

Erik Engberg

PhD Student, Örebro University

"AI Unboxed and Jobs: A Novel Measure and Firm-Level Evidence from Three Countries”

To help open the black box of how AI relates to the labor market, we construct a measure of occupational exposure to AI and investigate its relation to firm-level labor demand. To construct our model, we measure AI progress based on over 140 performance benchmarks that have been used in AI research, in combination with data on occupational work content, to estimate how occupations’ AI exposure varies year by year over the period 2010-2023. Furthermore, we create nine sub-indices which reflect exposure to different areas of AI technology, such as language modeling or computer vision. We take the social dimension of work into account, assuming that more socially demanding occupations, such as clergy or CEO, are less exposed to AI. According to the model, white collar occupations are most exposed to AI, and especially white-collar work that entails relatively little social interaction. Examples include software developers, mathematicians, and technical writers. Economists rank high, at the 97th percentile of AI exposure. While high exposure indicates that AI is likely applicable to the occupation, the model does not predict what the net effect on the demand for human labor will be. Further research is needed to shed light on the circumstances that determine the degree to which AI exposure leads to different outcomes such as automation of tasks, augmentation of human labor, or the creation of new tasks. To explore the relationship between AI exposure and labor market developments, and to illustrate the usefulness of the measure, we apply it to near-universal data on firms and individuals from Sweden, Denmark, and Portugal. Our exposure index is available for download as open data, for the U.S. occupational taxonomy SOC, as well as the international and EU equivalent, ISCO.
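
The exposure logic described above can be sketched in a few lines: yearly benchmark progress per AI capability is combined with an occupation’s reliance on those capabilities, then discounted for the occupation’s social intensity. The capability names, weights, and the specific discount form below are hypothetical placeholders, not the authors’ actual data or functional form.

```python
# Hypothetical sketch of a benchmark-progress x work-content exposure index.

def occupation_exposure(ability_weights: dict[str, float],
                        benchmark_progress: dict[str, float],
                        social_intensity: float) -> float:
    """ability_weights: occupation's reliance on each AI-relevant capability
    (sums to 1); benchmark_progress: capability progress in a given year (0-1);
    social_intensity: 0 (low) to 1 (high), discounting exposure for social work."""
    raw = sum(w * benchmark_progress.get(a, 0.0) for a, w in ability_weights.items())
    return raw * (1.0 - social_intensity)

# Example: a software-developer-like occupation in a given year
print(occupation_exposure(
    {"language_modeling": 0.5, "computer_vision": 0.2, "reasoning": 0.3},
    {"language_modeling": 0.8, "computer_vision": 0.7, "reasoning": 0.6},
    social_intensity=0.2,
))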

Nikhil George

Nikhil George

PhD Candidate, Carnegie Mellon University

"How Informative are Job Posting Skill Measures? Evidence from Selection Decisions"

Information embedded within job postings is increasingly central to modern labor markets, serving not only potential employees but also powering matching algorithms, screening tools, and broader workforce analytics. This growing reliance necessitates a thorough validation of skill and job requirement measures derived from job posting content. We provide the first empirical validation of this informativeness by predicting selection for internal jobs within a major firm using a novel measure of skill distance constructed solely from job postings. Our analysis reveals that job posting-derived skill distances strongly predict selection outcomes: the probability of selection is 84% higher when the sought job is in the lowest quintile of skill distance compared to a position in the highest quintile. Furthermore, in 70% of cases where multiple candidates applied, the selected candidate was the one with the shortest skill distance. Notably, these posting-derived skill measures consistently outperform traditional employee characteristics in explaining selection decisions. Building on this validation, we demonstrate how internal application intensity is significantly correlated with the average skill distance to emerging opportunities, yielding novel insights directly relevant to human capital management and scholarly inquiry on employee mobility in contemporary labor markets. Beyond validating the predictive power of job posting content for selection, our analysis unveils rich possibilities for future scholarly inquiry and the development of advanced analytics leveraging the readily available and informative skill content within job postings.
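
For intuition, one simple, generic way to compute a skill distance between two postings is a Jaccard distance over their extracted skill sets; the paper’s measure is built from posting content and its exact construction may differ, so treat this only as an illustration.

```python
# Generic skill-distance sketch (Jaccard distance), not the paper's exact measure.

def skill_distance(origin_skills: set[str], target_skills: set[str]) -> float:
    """1 - Jaccard similarity: 0 = identical skill sets, 1 = no overlap."""
    if not origin_skills and not target_skills:
        return 0.0
    overlap = len(origin_skills & target_skills)
    union = len(origin_skills | target_skills)
    return 1.0 - overlap / union

print(skill_distance({"python", "sql", "etl"}, {"python", "sql", "dashboards"}))  # 0.5
```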

Ruru Hoong

Ruru Hoong

PhD Candidate, Harvard Business School

“Calibrated Coarsening in Human-AI Interaction, Theory and Experiments”

Artificial intelligence (AI) signals are increasingly being deployed as human decision-making aids across many critical applications. However, human cognitive biases can prevent even informative AI predictions from improving outcomes. We propose coarsening at optimal thresholds—partitioning the signal space into fewer cells—as a way to improve decision-making outcomes while (i) keeping humans in the loop, (ii) modifying signals without deception, and (iii) adapting to various forms of cognitive biases or decision-making contexts. Within a Bayesian persuasion framework with privately-informed receivers, we study the optimal structure of signals. We empirically show in a randomised experiment with loan underwriters that the provision of AI signals optimally coarsened at the right thresholds improves overall decision-making outcomes.
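
Mechanically, coarsening maps a continuous AI signal into one of a few cells defined by thresholds. The sketch below uses arbitrary placeholder thresholds; the paper derives the thresholds optimally within its Bayesian persuasion framework.

```python
# Minimal sketch of coarsening a continuous AI score at chosen thresholds.
import bisect

def coarsen(score: float, thresholds: list[float]) -> int:
    """Map a continuous signal in [0, 1] to one of len(thresholds) + 1 cells."""
    return bisect.bisect_right(thresholds, score)

# e.g. a three-cell partition of a default-risk score shown to loan underwriters
thresholds = [0.3, 0.7]           # hypothetical cut points
print(coarsen(0.15, thresholds))  # 0 -> "low"
print(coarsen(0.55, thresholds))  # 1 -> "medium"
print(coarsen(0.90, thresholds))  # 2 -> "high"
```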

Rubing Li

Rubing Li

PhD Student, New York University

“Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models”

When encountering increasingly frequent performance improvements or cost reductions from a new large language model (LLM), developers of applications leveraging LLMs must decide whether to take advantage of these improvements or stay with older tried-and-tested models. Low perceived switching frictions can lead to choices that do not consider more subtle behavior changes that the transition may induce. Our experiments use a popular game-theoretic behavioral economics model of trust to show stark differences in the trusting behavior of OpenAI’s and DeepSeek’s models. We highlight a collapse in the economic trust behavior of the o1-mini and o3-mini models as they reconcile profit-maximizing and risk-seeking with future returns from trust, and contrast it with DeepSeek’s more sophisticated and profitable trusting behavior that stems from an ability to incorporate deeper concepts like forward planning and theory-of-mind. As LLMs form the basis for high-stakes commercial systems, our results highlight the perils of relying on LLM performance benchmarks that are too narrowly defined and suggest that careful analysis of their hidden fault lines should be part of any organization’s AI strategy.
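
For readers unfamiliar with the behavioral economics model referenced here, the canonical investment (“trust”) game works as sketched below; the endowment and multiplier are standard illustrative defaults and are assumptions, since the abstract does not specify the parameters used.

```python
# Sketch of the canonical investment ("trust") game payoffs.

def trust_game(endowment: float, sent: float, multiplier: float, returned: float):
    """Sender invests `sent`; it is multiplied; the trustee returns `returned`."""
    assert 0 <= sent <= endowment
    assert 0 <= returned <= sent * multiplier
    sender_payoff = endowment - sent + returned
    trustee_payoff = sent * multiplier - returned
    return sender_payoff, trustee_payoff

# e.g. an LLM acting as sender that invests everything and receives half back
print(trust_game(endowment=10, sent=10, multiplier=3, returned=15))  # (15.0, 15.0) -> (15, 15)
```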

Dalbert Ma

Dalbert Ma

PhD in Strategy & Entrepreneurship, London Business School

“How Disruptive will Generative AI be? A Micro-Level Analysis of Evidence and Expectations”

This study investigates how executives assess Generative AI’s impact on competitive advantage and industry disruption. Using mixed-methods analysis of UK Director data, we find that GenAI’s disruptive potential is heightened in modular, less-regulated sectors, while incumbents anticipate leveraging complementary assets for differentiation. The presence of pattern recognition tasks and proprietary data amplifies both displacement risks and differentiation opportunities.

Xincheng Ma

Xincheng Ma

PhD Student, HKUST Business School

"Less is More: Monologue vs. Dialogue in AI-Enabled Telemarketing”

As AI-enabled dialogue systems increasingly replace traditional robocalls in telemarketing, their promise of enhanced personalization through simulated conversation and social presence has garnered widespread interest. However, does this interactivity truly improve marketing effectiveness—or might it backfire? Drawing on perceived social presence (PSP) and cognitive load theories, we conduct a large-scale randomized field experiment involving 90,000 customers of a leading Chinese retailer. Participants were randomly assigned to receive one of three types of AI calls: a traditional monologue robocall, a socially engaging dialogue (Dialogue1), and an efficiency-optimized dialogue with the core offer front-loaded (Dialogue2). Contrary to common assumptions, results show that the monologue and Dialogue2 conditions significantly outperformed Dialogue1, revealing that enhanced social presence alone does not improve outcomes. Behavioral and transcript analyses uncover a dual-path mechanism: while Dialogue2 increases early rejection (ad avoidance), it also boosts conversions among retained users by reducing cognitive disorientation and temporal frustration. Mediation analysis confirms that over one-fifth of Dialogue2’s effectiveness is attributable to lowered psychological frictions. Our findings suggest that in low-volition, high-interruption contexts like telemarketing, the benefits of PSP are offset by its cognitive costs. Clarity, not conversation, drives compliance—making telemarketing a boundary condition where PSP’s traditional advantages no longer apply.

Nicholas Rounding

Nicholas Rounding

PhD Student, Maastricht University

“Artificial Intelligence in the Workplace: Improving Productivity and Communication Quality for Call Center Agents”

This paper analyzes the impact of artificial intelligence (AI) on worker productivity. Using a unique dataset from a large financial service provider, we evaluate the introduction of a labor-augmenting AI tool in the coaching regime of call center agents. The data includes both traditional productivity metrics, such as call handling time, and complex AI-generated measures of communication style. We exploit the company’s staggered introduction of the AI tool to estimate causal effects. The AI-enhanced coaching reduces agents’ call handling time by approximately 7%, with larger effects for low-tenured than for high-tenured agents. These productivity gains are primarily driven by improvements in problem-solving skills and communication styles.

Jafar Sabbah

Jafar Sabbah

Bayes Fellow, City St. George’s, University of London

"Co-Efficacy at Work: The Case of Co-Creative Efficacy in Human-AI Collaboration”

Self-efficacy has long been recognized as a critical driver of work performance, shaping how individuals set goals, persevere through challenges, and deliver outcomes (Bandura, 1977; Gist & Mitchell, 1992). In creative work contexts, creative self-efficacy (CSE)—the belief in one’s ability to produce creative outcomes—has similarly been shown to predict performance (Tierney & Farmer, 2002). However, with the rise of GenAI tools in the workplace, creativity is increasingly becoming a collaborative effort between humans and AI systems. In such joint human–GenAI settings, traditional constructs like CSE may no longer fully capture the psychological dynamics at play. This paper introduces a new construct, creative co-efficacy (CCE)—defined as the belief in the joint capabilities of humans and GenAI to achieve creative outcomes—and empirically investigates its antecedents and predictive power. Across three studies, including survey-based research and experimental analysis, we demonstrate that CCE is distinct from CSE, shaped by factors such as prior AI experiences, prompting skills, and cognitive flexibility. Moreover, CCE predicts how humans engage with GenAI in creative tasks and is associated with stronger creative performance when collaboration is structured effectively. This work contributes to a growing body of research on human-AI teaming and highlights the need for new psychological constructs to understand performance in AI-augmented work environments.

Dr. Travis Smith

Travis Smith

Senior Manager, Pharmacy Cancer Research, Mayo Clinic

“Cost-Effectiveness Analysis of AI-Assisted Workflow for Clinical Trial Document Preparation”

This study evaluates the cost-effectiveness of implementing an AI-assisted workflow for pharmacy summary sheet creation in clinical trials. Traditional manual workflows for preparing these documents are labor-intensive, requiring extensive review of protocol documents, identification of key data points, and meticulous accuracy checks. Given the increasing complexity of clinical trial protocols, AI-driven automation presents an opportunity to enhance efficiency while maintaining compliance and accuracy. A cost-effectiveness analysis was conducted comparing manual and AI-assisted workflows for pharmacy study summary sheet creation. The study measured labor costs, time efficiency, and the incremental cost-effectiveness ratio (ICER). The research utilized workflow time data from clinical trials classified as moderate or high complexity, where manual summary preparation by lead pharmacists was compared to an AI-assisted process using Microsoft Copilot. AI-generated summaries were subsequently reviewed and refined by investigational drug services (IDS) pharmacists, and time efficiency gains were recorded.
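
For reference, the ICER mentioned above is the standard ratio of incremental cost to incremental effect; the specific effectiveness unit shown in the comment below (pharmacist hours saved per summary sheet) is an illustrative choice, not necessarily the study’s stated measure.

```latex
% Standard ICER definition; the effect unit (e.g., pharmacist hours saved per
% summary sheet) is an illustrative assumption.
\[
  \mathrm{ICER} \;=\;
  \frac{C_{\text{AI-assisted}} - C_{\text{manual}}}
       {E_{\text{AI-assisted}} - E_{\text{manual}}}
\]
```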

Angesom Teklu

Angesom Teklu

PhD Student, Pardee RAND Graduate School

"Force Multipliers in Policy Research: A Field Experiment on AI-Driven Productivity and Quality Enhancement”

This study examines the potential of AI tools to serve as “force multipliers” in policy research by enhancing productivity and work quality among policy researchers. We are designing a randomized controlled field experiment wherein participants will be assigned to either an AI-assisted group or to a control group that adheres to standard research practices. The primary objective is to assess the causal impact of AI augmentation on key dimensions of research performance. Our experimental design incorporates multiple pre- and post-intervention assessments to capture changes in both productivity and cognitive workload. Productivity metrics include the time required to synthesize research findings, the number of policy recommendations generated, and overall report turnaround times. To evaluate research quality, we employ peer evaluations of research reports along with quantitative analyses of citation quality and accuracy in policy analyses. Additionally, we assess cognitive load and researcher satisfaction through structured surveys addressing perceived workload and overall job satisfaction. Randomization at both the individual and team levels will allow us to rigorously isolate the effects of AI integration from potential confounding factors. Although the full study is scheduled for implementation in the coming months, preliminary designs, pilot insights, and expected outcomes will be presented at this conference. This work contributes to the broader discourse on transforming workplace practices through AI and offers prescriptive insights for enhancing both efficiency and innovation in public policy research.

Meixian Wang

Meixian Wang

PhD Student, Temple University

“The Impact of Realism and AI Disclosure on Virtual Influencer Effectiveness: A Large Field Experiment”

Influencer marketing has become an essential component of brand strategies, and the adoption of AI-generated virtual influencers is accelerating with advancements in generative AI. However, these emerging practices raise challenges related to transparency and ethical concerns. By conducting a large field experiment involving over 1.8 million consumers, we examine the interplay between virtual influencers’ anthropomorphism levels and the disclosure of their AI identity. The results show that virtual influencers with higher anthropomorphism levels enhance engagement metrics, while AI identity disclosure reduces link clicks and three-second video plays, particularly for highly realistic virtual influencers. Our underlying mechanism analysis based on an online experiment reveals that this reduction is driven by an “expectation violation effect”, where disclosure violates consumers’ expectations and evokes negative feelings. Importantly, the negative effects reverse for consumers with prior experience interacting with virtual influencers—among this group, highly realistic virtual influencers with disclosure actually lead to greater engagement than their less anthropomorphic counterparts. Our findings provide theoretical and practical insights into the design and development of AI agents, emphasizing the need to strategically manage anthropomorphism and transparency to optimize consumer engagement.

Nicholas Wolczynski

Nicholas Wolczynski

PhD Candidate, UT Austin

“The Value of AI Advice: Personalized and Value-Maximizing AI Advisors Are Necessary to Reliably Benefit Experts and Organizations”

Despite advances in AI’s performance and interpretability, AI advisors can undermine experts’ decisions and increase the time and effort experts must invest to make decisions. Consequently, AI systems deployed in high-stakes settings often fail to consistently add value across contexts and can even diminish the value that experts alone provide. Beyond harm in specific domains, such outcomes impede progress in research and practice, underscoring the need to understand when and why different AI advisors add or diminish value. To bridge this gap, we stress the importance of assessing the value AI advice brings to real-world contexts when designing and evaluating AI advisors. Building on this perspective, we characterize key pillars — pathways through which AI advice impacts value — and develop a framework that incorporates these pillars to create reliable, personalized, and value-adding advisors. Our results highlight the need for system-level, value-driven development of AI advisors that advise selectively, are tailored to experts’ unique behaviors, and are optimized for context-specific trade-offs between decision improvements and advising costs. They also reveal how the lack of inclusion of these pillars in the design of AI advising systems may be contributing to the failures observed in practical applications.

Sojung Yoon

Sojung Yoon

PhD Candidate, University of Minnesota

“Algorithm as Boss or Coworker? Randomized Field Experiment on Algorithmic Control and Collaboration in Gig Platform”

The rapid integration of artificial intelligence into the workforce, particularly in the gig economy, presents both opportunities and challenges. Algorithmic control is often used to align individual worker behaviors with organizational objectives. While algorithmic control facilitates efficient management of workers, it also leads to intrusive exertion of control, also known as the “algorithm-as-boss” phenomenon. In this study, we attempt to understand the tradeoffs and outcomes of different algorithmic control configurations for gig workers. Partnering with a major delivery labor union, we ran a randomized field experiment involving 130 gig workers who were randomly assigned to three conditions: tight algorithmic control (i.e., no option to decline the AI-curated recommendation), loose algorithmic control (i.e., a choice to decline the AI-curated recommendation one at a time), and no control (i.e., free to view all AI-curated recommendations and choose a task). We analyzed the impact of different algorithmic control configurations on outcomes related to the platform’s operational efficiency (i.e., delivery time) and workers’ compensation (i.e., delivery fee). Our study reveals that although workers under loose algorithmic control did not produce the best operational efficiency, they earned higher profits than those under tight control and reported greater self-efficacy and autonomy. Furthermore, our post-experiment surveys regarding workers’ perceived autonomy and self-efficacy reveal no significant difference between workers under tight and no control, indicating that the absence of algorithmic control is not necessarily better than tight control. Overall, results suggest that a nuanced approach to algorithmic control is needed in managing gig workers.

Dongmiao Zhang

Dongmiao Zhang

Postdoctoral Researcher, Carnegie Mellon University

“Who Benefits from AI Adoption? Skill Complementarity and Labour Market Dynamics”

Recent studies on artificial intelligence (AI) and labour market outcomes often focus on the automation effects of AI. However, which skills might complement AI technologies and how AI adoption shapes employment and wage dynamics remain under-explored at the occupation level. Moving beyond the classic measurement of skills such as education, tenure or specific skill categories, I assess how many skills are combined in an occupation and their respective complexity. I refer to this as “complexity intelligence” and propose that occupations with high complexity intelligence will complement AI technologies. Most notably, the findings show that complex occupations are more likely to adopt AI technologies. In addition, there is a positive correlation between AI adoption and employment growth. AI adoption is associated with an increase in wage growth on average, with a larger increase for complex occupations.
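
One illustrative reading of “complexity intelligence” as described above is the breadth of skills an occupation combines weighted by each skill’s complexity. The index in the paper may be constructed differently, and the skill names and scores below are placeholders.

```python
# Hypothetical sketch of a breadth-times-complexity occupational index.
from statistics import mean

def complexity_intelligence(skill_complexities: dict[str, float]) -> float:
    """Combine breadth (number of skills) with average skill complexity."""
    if not skill_complexities:
        return 0.0
    return len(skill_complexities) * mean(skill_complexities.values())

print(complexity_intelligence(
    {"negotiation": 0.8, "data analysis": 0.9, "scheduling": 0.4}
))  # 3 skills * 0.7 average complexity = 2.1
```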

Ziyi (Iggy) Zhao

Ziyi (Iggy) Zhao

PhD Candidate, Temple University

“Revolutionizing Workplace Problem-Solving: Collaboration with Out-of-the-Box AI Solutions”

Generative artificial intelligence (GenAI) is reshaping the landscape of work by redefining how professionals solve complex problems, moving beyond traditional search methods to generate innovative and context-specific solutions. This transformative shift has a profound impact on job roles, responsibilities, and the skills required for effective human-AI collaboration. IT professionals face the critical task of navigating these changes, balancing the potential productivity and innovation gains of GenAI adoption with the need to manage technological risks and ethical considerations. This mixed-methods study directly addresses how IT professionals evaluate and use GenAI-generated solutions in complex problem-solving tasks. Specifically, by examining user interactions with GenAI in the context of programming, the research highlights mechanisms through which human-AI collaboration can significantly improve workplace efficiency and productivity. The four interaction patterns identified through 1,800 minutes of video of dynamic interactions with GenAI (Autopilot, Reviser, Auditor, and Collaborator) illustrate the different strategies that professionals employ when interacting with GenAI. As a result, this research underscores the importance of cultivating “prompt representation” that enables professionals to iteratively guide GenAI toward accurate and contextually tailored outputs.