Continuing my earlier note on "Efficient computing – Strategy for CIOs", where I covered deployment architecture, chip interconnect technologies, accelerated infrastructure options, data centre utilization observability, dynamically managing AI/HPC workloads, leveraging AI-agent-based infrastructure, and the DCIM specialist's role – here is a note double-clicking on the deployment architecture segment: a "Strategy for deploying AI/HPC/Gen-AI your own way!!" The goal is flexibility – bring a model of your own choice, deploy it on-prem on infrastructure of your own choice, and shed the fear of vendor lock-in or the cyber threat of letting your enterprise data be used outside your walls!
Step 1: Get convinced of the business case for your AI/HPC workload requirements, and of the need to augment capacity or scale up GPU usage over time, since an on-prem strategy demands substantial resource allocation. When calculating NPV and IRR, make sure the weighted average cost of capital (WACC) is lower than the enterprise ROIC, not a ROIC biased by the ambitious expected returns of the AI initiative – otherwise you may end up destroying capital and value down the line.
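To make Step 1 concrete, here is a minimal sketch of the NPV/IRR sanity check. All cash-flow figures, the WACC, and the IRR solver are illustrative assumptions, not benchmarks for any real AI initiative.

```python
# Hedged sketch: simple NPV and IRR helpers to sanity-check an AI capex case.
# All cash-flow figures below are made-up placeholders, not benchmarks.

def npv(rate, cashflows):
    """Net present value; cashflows[0] is the upfront (usually negative) outlay."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return via bisection (assumes a single sign change)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative GPU-cluster case: $5M upfront, $1.6M/yr net benefit for 5 years.
flows = [-5_000_000] + [1_600_000] * 5
wacc = 0.10  # enterprise weighted average cost of capital (assumed)
print(round(npv(wacc, flows)))  # discount at the enterprise WACC, not a hoped-for ROIC
print(round(irr(flows), 3))     # the IRR must clear the WACC for the case to hold
```

The key discipline is in the last two lines: discount at the enterprise WACC and compare the IRR against it, rather than against an optimistic project-level ROIC.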
Step 2: Now that we are convinced about the AI initiative, let us right-size the GPU capacity. For example:
(a) 2–4 GPU deployments are more than enough for most large enterprises (recording at least a million ops transactions a day) running basic ML tasks for operational use cases, agentic AI workflows for business processes, scientific calculations on large data sets for top-line use cases such as pricing and product strategy, or D2C needs like content creation and video editing;
(b) 8–12 GPU deployments suit advanced ML on huge datasets, including retrieving insights from multi-year archived data, weather modelling, complex molecular dynamics, and OT/IoT integration with detailed visuals for automated factory-floor or transport operations;
(c) 100-ish GPU deployments serve large-scale machine learning like that behind GPT-4, complex scientific simulations across various industries, genetic analysis for medical research, clinical trials and drug launches, and real-time, high-precision image and video processing; and
(d) we can keep going up for needs of higher accuracy, recall or specificity on more global data sets, optimizing away AI bias, and so on and so forth for holistic enterprise or research needs.
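The tiers above can be sketched as a simple lookup for planning conversations. The tier names and boundaries are taken from the examples in this note and are illustrative, not vendor sizing guidance.

```python
# Hedged sketch: the sizing tiers above as a lookup table.
# Tier names and GPU-count boundaries are illustrative placeholders.

GPU_TIERS = {
    "basic_ml_agentic": (2, 4),         # (a) operational ML, agentic workflows, D2C content
    "advanced_ml_simulation": (8, 12),  # (b) huge datasets, weather, molecular dynamics
    "frontier_scale": (100, None),      # (c) GPT-4-class training, large simulations
}

def recommend_gpus(workload: str) -> str:
    """Return a rough GPU-count range for a named workload tier."""
    lo, hi = GPU_TIERS[workload]
    return f"{lo}-{hi} GPUs" if hi else f"{lo}+ GPUs"

print(recommend_gpus("advanced_ml_simulation"))  # "8-12 GPUs"
```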
Step 3: Now that we have decided what type of GPU deployment, we need to figure out the model that meets our needs. If we look around, we will see every need has a model: built from a blank slate, ready to use from a provider (flexible for your own deployment), or already available as-a-service on cloud. And every model has a version for purpose of use and compute scale (SLM, LLM etc). Enterprise architecture needs to evaluate and select from providers like ChatGPT, Gemini, Llama, Mistral, Claude and now DeepSeek, and many more, with the right versions fit for purpose.
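One way enterprise architecture can run this evaluation is a weighted scoring matrix. The candidate names come from the list above; the criteria, weights and 1–5 scores below are invented placeholders for illustration, not an actual assessment of any model.

```python
# Hedged sketch: a weighted scoring matrix for model selection.
# Criteria weights and the 1-5 scores are illustrative assumptions only.

WEIGHTS = {"fit_for_purpose": 0.4, "deployability_on_prem": 0.3,
           "license_terms": 0.2, "compute_footprint": 0.1}

CANDIDATES = {
    "Llama":   {"fit_for_purpose": 4, "deployability_on_prem": 5,
                "license_terms": 4, "compute_footprint": 3},
    "Mistral": {"fit_for_purpose": 4, "deployability_on_prem": 5,
                "license_terms": 4, "compute_footprint": 4},
    "Claude":  {"fit_for_purpose": 5, "deployability_on_prem": 1,
                "license_terms": 3, "compute_footprint": 5},
}

def score(model_scores):
    """Weighted sum of criterion scores."""
    return sum(WEIGHTS[k] * v for k, v in model_scores.items())

best = max(CANDIDATES, key=lambda m: score(CANDIDATES[m]))
print(best, round(score(CANDIDATES[best]), 2))
```

The point is not the specific numbers but the habit: make the selection criteria explicit and weighted before the provider conversations start.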
Step 4: Now that we have decided on the model, let us think about the factors for on-prem design.
- Servers with high or low GPU density (density being the number of GPUs per chassis)
- Downstream need for powerful CPUs to handle data transfer and management between GPUs (often missed)
- Chip interconnect technologies for faster processing and movement of data (also often missed – you will find more on this in my earlier post)
- Leveraging data center infrastructure management (DCIM) providers for onsite power generation, backup services, robust cooling systems etc, or going colo
- Clustered deployments for horizontal scalability
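A back-of-envelope facility-power estimate ties several of these factors together. The TDP, host overhead and PUE figures below are assumptions for illustration; check vendor specs and your DCIM or colo provider for real numbers.

```python
# Hedged back-of-envelope: facility power for an on-prem GPU cluster.
# All constants are assumptions, not vendor figures.

GPU_TDP_KW = 0.7        # ~700 W per H100-class accelerator (assumed)
HOST_OVERHEAD_KW = 2.0  # CPUs, NICs, storage per chassis (assumed)
PUE = 1.4               # power usage effectiveness incl. cooling (assumed)

def facility_kw(num_gpus: int, gpus_per_chassis: int = 8) -> float:
    """Rough total facility draw in kW for a given GPU count."""
    chassis = -(-num_gpus // gpus_per_chassis)  # ceiling division
    it_load = num_gpus * GPU_TDP_KW + chassis * HOST_OVERHEAD_KW
    return it_load * PUE

# A ~100-GPU deployment (tier (c) from Step 2): roughly 128 kW of facility power.
print(round(facility_kw(96), 1))
```

Even this crude estimate shows why the power, cooling and DCIM decisions above cannot be an afterthought once GPU counts move past the single-chassis range.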
Step 5: After all this is a complex matrix for building the physical architecture of on-prem – which GPU systems (Supermicro A+ series, NVIDIA A100 or H100 GPUs on a single board), which CPUs (Intel Xeon Scalable or AMD EPYC processors), which interconnect technology (die-to-die connectivity solutions, network-on-chip, high-speed optical interconnects, fiber-to-chip packaging), and finally what DCIM strategy (your own CAPEX, colo, or outsourcing to DCIM providers)?
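The DCIM leg of that matrix often reduces to a total-cost comparison between building your own facility and colocation. A minimal sketch, where every figure is a placeholder assumption:

```python
# Hedged sketch: own-build CAPEX vs colocation over a planning horizon.
# Every number is an invented placeholder for illustration only.

def own_build_tco(capex: int, annual_opex: int, years: int) -> int:
    """Total cost of an owned facility: upfront build plus yearly running cost."""
    return capex + annual_opex * years

def colo_tco(monthly_fee: int, years: int) -> int:
    """Total cost of colocation: recurring fee over the same horizon."""
    return monthly_fee * 12 * years

own = own_build_tco(capex=4_000_000, annual_opex=600_000, years=5)
colo = colo_tco(monthly_fee=120_000, years=5)
print("own-build" if own < colo else "colo", own, colo)
```

A real comparison would also weigh cost of capital, utilization risk and exit flexibility, but even this simple horizon view keeps the CAPEX-vs-colo debate grounded in numbers.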
Step 6: Phew!! Do I really need all this extra complexity on top of running my business?
The call to action: whether or not an AI/HPC strategy is needed, and if needed how to go about it, is best answered with an all-encompassing technology and financial strategy across all the steps laid out above, so that the CIO strategy makes perfect sense in front of the board!! This will save millions in sunk cost from getting biased by one provider or from building complex, unmanageable and costly computing environments.