Reducing Carbon Emissions of Federated Large Language Model Inference through Classification of Task-Specificity

Geerd-Dietger Hoffmann (Green Coding Solutions); Verena Majuntke (HTW Berlin)

Abstract

The resource consumption of software and communication infrastructure is a growing concern, particularly with the emergence of Large Language Models (LLMs). While the energy consumption and carbon emissions of LLM training have been a focus of research, the impact of LLM inference, which scales with the number of requests, remains underexplored. In this paper, we show that the energy efficiency and carbon emissions of LLM inference vary with both the model and the task category of the prompt (e.g., math, programming, general knowledge), and that smaller specialized models can achieve comparable accuracy while using fewer resources. We analyze these differences across eight open-source LLMs processing prompts from different task categories. Our findings lead to a novel approach: classifying prompts via embeddings and routing each to the most energy-efficient and least carbon-intensive LLM in a federation of LLMs while keeping accuracy high. We validate the effectiveness of our method through empirical measurements.
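
To make the routing idea concrete, the following is a minimal sketch, not the authors' implementation: it classifies a prompt by comparing its embedding against per-category centroids and then picks the model assigned to that category. It assumes the sentence-transformers library for embeddings; the seed prompts, model identifiers, and per-request gCO2e figures are invented placeholders.

```python
# Illustrative sketch of embedding-based prompt routing. Assumes the
# `sentence-transformers` package; all category examples, model names,
# and gCO2e numbers below are hypothetical placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

# A few labeled example prompts per task category (hypothetical seed data).
CATEGORY_EXAMPLES = {
    "math": ["Solve 3x + 5 = 20 for x.", "What is the derivative of x^2?"],
    "programming": ["Write a Python function to reverse a list.",
                    "Fix the off-by-one error in this loop."],
    "general_knowledge": ["Who wrote 'Faust'?", "What is the capital of Peru?"],
}

# Hypothetical federation: per category, the least carbon-intensive model
# that still reaches acceptable accuracy (names and figures invented).
ROUTING_TABLE = {
    "math": ("small-math-llm", 0.4),          # (model id, gCO2e per request)
    "programming": ("small-code-llm", 0.5),
    "general_knowledge": ("general-llm", 1.2),
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Precompute one centroid embedding per category from its example prompts.
centroids = {
    cat: encoder.encode(examples).mean(axis=0)
    for cat, examples in CATEGORY_EXAMPLES.items()
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(prompt: str) -> str:
    """Assign the prompt to the category with the most similar centroid."""
    v = encoder.encode(prompt)
    return max(centroids, key=lambda cat: cosine(v, centroids[cat]))

def route(prompt: str) -> tuple[str, float]:
    """Return (model id, estimated gCO2e) for the classified category."""
    return ROUTING_TABLE[classify(prompt)]

if __name__ == "__main__":
    model, gco2e = route("Write a unit test for a binary search function.")
    print(f"Routed to {model} (~{gco2e} gCO2e per request)")
```

In practice, the centroid classifier could be replaced by any lightweight classifier over embeddings; the key point is that classification cost is negligible compared with the inference cost saved by routing to a smaller specialized model.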