A Resource-Efficient Computing Strategy, Overall and for AI/Gen-AI Workloads

With reference to my earlier post (https://chats01.com/2025/01/29/gen-ai-ai-led-innovation-valuation-strategy/), where I touched upon possible high-level strategies to optimize LLM models, the argument was framed primarily in terms of FLOPS per query (assuming a query consumes 100 tokens at an acceptable tolerance of 90%): an SLM (GPT-2 class) typically ranges up to 10 Bn FLOPS per query, while an LLM (GPT-3/4 class) ranges up to 5 Tn. The open question remains whether optimizing the Gen AI model alone can drive efficient resource allocation for a computing strategy and create a strong business case from the CIO to the CFO.
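As a back-of-the-envelope illustration of that gap (using the rough, assumed figures above rather than measured numbers), the per-query compute difference can be sketched as:

```python
# Back-of-the-envelope comparison using the assumed figures from the earlier
# post: a query of ~100 tokens, an SLM at up to ~10 billion FLOPs per query
# versus an LLM at up to ~5 trillion FLOPs per query.
TOKENS_PER_QUERY = 100
SLM_FLOPS_PER_QUERY = 10e9   # GPT-2-class small language model (assumed)
LLM_FLOPS_PER_QUERY = 5e12   # GPT-3/4-class large language model (assumed)

print(f"SLM: {SLM_FLOPS_PER_QUERY / TOKENS_PER_QUERY:.1e} FLOPs per token")
print(f"LLM: {LLM_FLOPS_PER_QUERY / TOKENS_PER_QUERY:.1e} FLOPs per token")
print(f"Compute gap per query: ~{LLM_FLOPS_PER_QUERY / SLM_FLOPS_PER_QUERY:.0f}x")
```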

With the recent rise of DeepSeek and its "DeepThink" capability, it is clear that the entry barrier for early and new adopters of AI / Gen-AI led innovation is diminishing. The approach of leveraging simple, efficient models (the R1 model in this case) to break long, complex queries into smaller, manageable tasks is hardly novel, yet the first-mover advantage always prevails.

The cost is believed to be a fraction of that of earlier models, and to counter it the vendors of those earlier models will race prices toward the bottom. But at the end of the day this is not a "model" price race. At one end of the spectrum there is much to be done in defining a compelling business use case, while at the other there is a need to drive a resource-efficient computing strategy. I will focus on the latter in this article.

Which strategies can propel resource-efficient computing overall, and even more so for AI / Gen-AI workloads?

Consider the deployment architecture of Gen AI models: (a) build an open-source model on-prem, training and testing with your own enterprise data, which means a blank-slate model with no past experience (the downside is that it may lose broader correlation and inference capability, but on the flip side it reduces AI hallucinations and gets accurate results faster for your enterprise); OR (b) leverage a model deployed on cloud infrastructure from Microsoft Azure, AWS and other mega players who have custom-trained models from vendors such as OpenAI and Anthropic; OR (c) rely directly on vendors such as OpenAI, Anthropic, Mistral and Meta, using their models on their runtime and testing your data for real-time use cases. Which option is more cost-efficient has to be determined case by case, as the sketch below illustrates.
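A minimal sketch of how that case-by-case comparison could be framed; every function, cost figure and volume below is a hypothetical placeholder to be replaced with your own quotes, utilization data and token prices:

```python
# Illustrative (hypothetical) per-query cost comparison of the three
# deployment options; all numbers are placeholders, not real prices.
def on_prem_cost_per_query(hardware_capex, amortisation_years,
                           annual_opex, queries_per_year):
    """Amortised CAPEX plus OPEX spread over the expected query volume."""
    annual_capex = hardware_capex / amortisation_years
    return (annual_capex + annual_opex) / queries_per_year

def cloud_cost_per_query(gpu_hours_per_month, price_per_gpu_hour,
                         queries_per_month):
    """Managed cloud hosting (e.g. Azure/AWS endpoints) billed by GPU hour."""
    return (gpu_hours_per_month * price_per_gpu_hour) / queries_per_month

def vendor_api_cost_per_query(tokens_per_query, price_per_1k_tokens):
    """Direct vendor API (OpenAI, Anthropic, Mistral, etc.), token-metered."""
    return (tokens_per_query / 1000) * price_per_1k_tokens

print(f"On-prem    : ${on_prem_cost_per_query(500_000, 3, 120_000, 20_000_000):.4f}")
print(f"Cloud      : ${cloud_cost_per_query(700, 4.0, 1_500_000):.4f}")
print(f"Vendor API : ${vendor_api_cost_per_query(100, 0.03):.4f}")
```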

Let's get one level deeper and select chip providers that implement advanced chip interconnect solutions, such as (a) compute interconnect technology, (b) die-to-die connectivity solutions, (c) network-on-chip technology, (d) high-speed optical interconnect technology, and (e) fibre-to-chip packaging using lasers, eliminating the use of glue, and more. There are a lot of niche players delivering in this segment alongside the existing mega chip giants, and selecting one over the other will only strengthen the business case.

Use an accelerated infrastructure model across diverse compute requirements, such as (a) near-memory compute and expansion, (b) dedicated data processing units, (c) network acceleration through coherent and optical DSPs, DCI optical modules and Ethernet PHYs, and (d) storage acceleration through fibre channels, SSD controllers and storage accelerators, and more, to find out what the critical path in your compute strategy is (a simple way to reason about this follows below).
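One simple way to reason about the critical path is to express each stage of the pipeline as an effective throughput and see which stage caps the whole system; a minimal sketch with made-up stage names and numbers:

```python
# Minimal bottleneck analysis: express each stage of the inference/training
# pipeline as an effective throughput (samples or tokens per second) and
# identify the stage that caps the whole system. Stage names and numbers
# are purely illustrative.
stage_throughput = {
    "storage_read (fibre channel / SSD)": 12_000,
    "network_fabric (Ethernet PHY / DSP)": 9_500,
    "near_memory_compute": 15_000,
    "gpu_compute": 11_000,
}

bottleneck = min(stage_throughput, key=stage_throughput.get)
print(f"End-to-end throughput is capped at {stage_throughput[bottleneck]:,}/s")
print(f"Critical path / first candidate for acceleration: {bottleneck}")
```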

Prevent data centre utilization leakages by (a) visualizing and driving observability around all components of the data centre, (b) planning capacity more efficiently, and (c) driving agent-less, automated discovery and inventory of data centre hardware. This is nothing but the Cloud FinOps / SRE / IT operations observability principle applied at the hardware level, helping you justify CAPEX; a toy leakage report is sketched below.
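A toy version of such a leakage report, assuming the inventory and utilization figures come from your discovery and observability tooling (all pools, numbers and the threshold below are placeholders):

```python
# Illustrative utilization-leakage report: compare provisioned capacity
# against observed peak usage per hardware pool and flag candidates for
# consolidation. Inventory data and the threshold are placeholder values.
inventory = [
    # (pool, provisioned units, observed peak utilization 0..1)
    ("gpu_pool_a",      64, 0.38),
    ("cpu_pool_web",   200, 0.71),
    ("storage_tier_1",  90, 0.22),
]

LEAKAGE_THRESHOLD = 0.50  # flag anything running below 50% peak utilization

for pool, provisioned, peak_util in inventory:
    idle_units = provisioned * (1 - peak_util)
    flag = "REVIEW" if peak_util < LEAKAGE_THRESHOLD else "ok"
    print(f"{pool:15s} provisioned={provisioned:4d} peak={peak_util:.0%} "
          f"idle~{idle_units:.0f} [{flag}]")
```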

Dynamically manage AI workloads for optimal GPU utilization through (a) AI clustering, (b) GPU virtualization, (c) GPU fractioning, (d) workload scheduling, (e) node pooling, and (f) container orchestration. Again, these are simple FinOps principles, illustrated in the toy scheduler below.
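As a sketch of the idea behind GPU fractioning and workload scheduling (not a real orchestrator), a toy first-fit packer of fractional GPU requests could look like this:

```python
# Toy first-fit scheduler for fractional GPU requests: each workload asks for
# a fraction of a GPU, and the scheduler packs them onto as few GPUs as
# possible. This only illustrates GPU fractioning and node pooling; real
# orchestration would sit in Kubernetes or a similar layer.
workloads = [("inference-a", 0.25), ("inference-b", 0.50),
             ("batch-embed", 0.75), ("inference-c", 0.25),
             ("fine-tune",   1.00)]

gpus = []  # each entry is the remaining free fraction of one GPU

for name, request in workloads:
    for i, free in enumerate(gpus):
        if free >= request:          # first GPU with enough free capacity
            gpus[i] = free - request
            break
    else:                            # no fit found: allocate a new GPU
        gpus.append(1.0 - request)
    print(f"{name:12s} -> placed; GPUs in use: {len(gpus)}")

print(f"Total GPUs used: {len(gpus)} for {len(workloads)} workloads")
```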

Leverage AI-agent-based infrastructure operations, where observability anomalies, capacity utilization, threat detection and the like no longer wait on human eyes; instead, LLM-based contextual processing of the training data takes a decision on the actual live data, then integrates and orchestrates the next set of actions, until there is an unacceptable risk in the system, at which point the item ends up in the human intelligence queue. This is nothing new, and comes from traditional developers stitching things together with software frameworks like LangChain, but it has a more value-added outcome when attached to infrastructure and data centre operations; a minimal sketch of the loop follows.
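A minimal sketch of that escalation loop, with the LLM call stubbed out rather than tied to any particular framework, and the risk threshold being an assumed policy value:

```python
# Sketch of the agentic operations loop described above: anomalies flow
# through an LLM-backed triage step, low-risk actions are automated, and
# anything above a risk threshold is escalated to the human queue. The
# classify_with_llm function is a stub standing in for whatever framework
# (LangChain or otherwise) actually calls the model.
RISK_THRESHOLD = 0.7  # above this, a human must decide (assumed policy)

def classify_with_llm(anomaly: dict) -> tuple[str, float]:
    """Placeholder for an LLM call returning (proposed_action, risk_score)."""
    if anomaly["type"] == "capacity":
        return ("scale_out_node_pool", 0.2)
    return ("isolate_host", 0.9)

def handle(anomaly: dict, human_queue: list) -> None:
    action, risk = classify_with_llm(anomaly)
    if risk >= RISK_THRESHOLD:
        human_queue.append((anomaly, action, risk))   # human intelligence queue
        print(f"Escalated {anomaly['id']}: {action} (risk {risk:.1f})")
    else:
        print(f"Auto-executed {action} for {anomaly['id']} (risk {risk:.1f})")

queue: list = []
handle({"id": "a-101", "type": "capacity"}, queue)
handle({"id": "a-102", "type": "threat"}, queue)
print(f"Items awaiting human review: {len(queue)}")
```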

Use end-to-end data centre infrastructure management specialists who can bring economies of scale and drive performance metrics through (a) industrial engineering, (b) sustainability-delivered contracting, (c) onsite power generation, cooling, battery and back-up, (d) IT racks and accessories, (e) prefabricated data centre modules, and more. This is in contrast to data centre management that focuses only on compute, storage, network, cables and switches, and it makes the entire ecosystem run end-to-end.

What other strategies do you have in mind as a CIO to present before the CFO?
