Speculative decoding speeds up LLM inference by letting a fast draft model propose tokens that a larger model…
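The propose-then-verify loop can be sketched in a few lines. This is a greedy-verification variant with hypothetical stand-in models (`draft_model` and `target_model` below are toy arithmetic rules, not real networks):

```python
# Toy sketch of speculative decoding (greedy-verification variant).
# `draft_model` / `target_model` are hypothetical stand-ins, not a real API.
def draft_model(ctx):
    return (sum(ctx) * 7 + 3) % 50              # cheap, usually agrees

def target_model(ctx):
    return (sum(ctx) * 7 + 3) % 50 if sum(ctx) % 4 else (sum(ctx) + 1) % 50

def speculative_step(ctx, k=4):
    """Draft proposes k tokens; target verifies the whole run in one pass.

    Returns the longest draft prefix the target agrees with, plus one
    target-chosen token -- identical output to pure target decoding.
    """
    proposal, c = [], list(ctx)
    for _ in range(k):                          # cheap sequential drafting
        t = draft_model(tuple(c))
        proposal.append(t)
        c.append(t)

    accepted, c = [], list(ctx)
    for t in proposal:                          # target checks each position
        if target_model(tuple(c)) == t:         # agreement: keep draft token
            accepted.append(t)
            c.append(t)
        else:                                   # first mismatch: take target's token, stop
            accepted.append(target_model(tuple(c)))
            break
    else:                                       # all k accepted: one bonus token
        accepted.append(target_model(tuple(c)))
    return accepted

print(speculative_step((1, 2, 3), k=4))         # up to k+1 tokens per target pass
```

The payoff: one verification pass of the large model can emit up to k+1 tokens instead of one, with output identical to decoding the target alone.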
LoRA fine-tunes LLMs by training small low-rank adapters on top of frozen weights, slashing memory and compute while…
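A minimal sketch of the LoRA forward pass with plain Python lists (sizes and the `alpha`/`r` values are illustrative): the frozen weight `W` is never modified, only the low-rank factors `A` and `B` would train.

```python
# LoRA forward sketch: y = x @ (W + (alpha/r) * B @ A), with W frozen.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha=8, r=2):
    scale = alpha / r
    delta = [[scale * v for v in row] for row in matmul(B, A)]   # low-rank update
    W_eff = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return matmul(x, W_eff)

d = 4
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base (identity here)
A = [[0.1] * d for _ in range(2)]   # r x d, small random values in practice
B = [[0.0] * 2 for _ in range(d)]   # d x r, initialised to zero
x = [[1.0, 2.0, 3.0, 4.0]]

# With B = 0 the adapter is a no-op, so training starts exactly at the base model.
print(lora_forward(x, W, A, B))
```

Only `A` and `B` carry gradients: `d*r + r*d` parameters instead of `d*d`, which is where the memory savings come from.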
FlashAttention is an IO‑aware, exact attention algorithm that tiles work into GPU SRAM and fuses kernels to cut…
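The core trick, an online softmax computed tile by tile so the full score row is never materialised, can be sketched for a single query (pure Python, illustrative sizes; real kernels do this per tile in SRAM):

```python
import math

# Online-softmax attention over key/value tiles: the running max `m` and
# denominator `l` are rescaled as each tile arrives, so no full score row
# is ever stored -- the idea behind FlashAttention's tiling.
def tiled_attention(q, keys, values, tile=2):
    m = float("-inf")                 # running max of scores seen so far
    l = 0.0                           # running softmax denominator
    acc = [0.0] * len(values[0])      # running weighted sum of values
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_tile]
        m_new = max(m, max(scores))
        corr = math.exp(m - m_new)    # rescale old partial sums to the new max
        l *= corr
        acc = [a * corr for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - m_new)
            l += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
out = tiled_attention(q, keys, values, tile=2)
print(out)
```

The result is exact, matching untiled softmax attention to floating-point precision, which is why FlashAttention is an exact algorithm rather than an approximation.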
PagedAttention organizes LLM KV caches into fixed-size pages to reduce fragmentation, enable continuous batching, and support long…
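The bookkeeping can be sketched as a per-sequence block table mapping logical token positions to fixed-size physical pages (a toy allocator, with an assumed page size of 4 slots):

```python
# Toy page table for a paged KV cache: logical positions map to fixed-size
# physical pages, so sequences grow without large contiguous allocations.
PAGE_SIZE = 4

class PagedKVCache:
    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        self.block_tables = {}                   # seq_id -> list of physical page ids

    def append(self, seq_id, pos):
        """Reserve the slot for token `pos` of `seq_id`; return (page, slot)."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // PAGE_SIZE >= len(table):       # crossed a page boundary: grab a page
            table.append(self.free_pages.pop())
        return table[pos // PAGE_SIZE], pos % PAGE_SIZE

    def free(self, seq_id):                      # finished sequence returns its pages
        self.free_pages.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_pages=8)
for pos in range(6):                             # 6 tokens -> 2 pages of 4 slots
    cache.append("seq0", pos)
print(cache.block_tables["seq0"])
```

Because pages are freed and reused at this fixed granularity, the only waste is the partially filled last page of each sequence, rather than fragmentation across arbitrary-length contiguous buffers.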
A vector database indexes high‑dimensional embeddings for fast similarity search with metadata filters and hybrid retrieval—foundational for RAG,…
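The two primitives, similarity search and metadata filtering, fit in a few lines as a brute-force in-memory sketch (real systems replace the linear scan with an ANN index such as HNSW or IVF):

```python
import math

# Minimal in-memory vector store: cosine-similarity top-k with a metadata
# filter -- the primitives a vector database builds real indexes around.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(store, query, k=2, where=None):
    hits = [(cosine(query, vec), doc_id)
            for doc_id, (vec, meta) in store.items()
            if where is None or all(meta.get(f) == v for f, v in where.items())]
    hits.sort(reverse=True)                      # highest similarity first
    return [doc_id for _, doc_id in hits[:k]]

store = {
    "a": ([1.0, 0.0], {"lang": "en"}),
    "b": ([0.9, 0.1], {"lang": "de"}),
    "c": ([0.0, 1.0], {"lang": "en"}),
}
print(search(store, [1.0, 0.05], k=2, where={"lang": "en"}))  # -> ["a", "c"]
```

In a RAG pipeline the query vector comes from embedding the user's question, and the returned document IDs are resolved to text chunks fed into the prompt.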
Model quantization reduces precision (e.g., INT8/INT4) for weights and activations to shrink memory and speed inference, enabling cheaper,…
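The simplest scheme, symmetric per-tensor INT8, can be sketched directly: one scale maps the float range onto integers in [-127, 127], and dequantisation multiplies back.

```python
# Symmetric INT8 quantisation sketch: one scale per tensor; values are
# rounded into [-127, 127] and recovered (approximately) by dequantising.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127 or 1.0   # avoid 0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.5, 0.0]
q, scale = quantize_int8(weights)
print(q, dequantize(q, scale))      # 1 byte per weight instead of 4
```

The round-trip error is bounded by half a quantisation step (`scale / 2`), which is why per-channel scales, and INT4 with group-wise scales, are used when a single per-tensor scale loses too much precision.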
A KV cache stores past attention keys/values so LLMs reuse them at each step, cutting latency, enabling continuous…
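The saving is easy to make concrete by counting projection calls in a toy decode loop (the `project` function is a hypothetical per-token key/value projection):

```python
# Why a KV cache helps: without it, step t recomputes K/V for the whole
# prefix (quadratic total work); with it, each step appends once (linear).
calls = {"n": 0}

def project(token):                 # hypothetical per-token K/V projection
    calls["n"] += 1
    return [float(token)], [float(token) * 2]

def decode_no_cache(tokens):
    for t in range(1, len(tokens) + 1):
        [project(tok) for tok in tokens[:t]]    # recompute the prefix every step

def decode_with_cache(tokens):
    cache = []
    for tok in tokens:
        cache.append(project(tok))              # append once, reuse at later steps
    return cache

seq = [1, 2, 3, 4, 5]
calls["n"] = 0; decode_no_cache(seq);   no_cache = calls["n"]
calls["n"] = 0; decode_with_cache(seq); cached = calls["n"]
print(no_cache, cached)                         # 15 vs 5 projection calls
```

The trade-off is memory: the cache grows with sequence length and batch size, which is exactly what paging and grouped-query attention are designed to tame.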
Grouped-Query Attention shares keys/values across groups of query heads, shrinking KV caches and bandwidth to speed LLM inference…
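The sharing pattern is just an index mapping; a sketch with assumed head counts (8 query heads, 2 KV heads) shows where the 4x KV-cache reduction comes from:

```python
# Grouped-Query Attention sketch: 8 query heads share 2 KV heads, so the
# KV cache stores 2 head-slices per token instead of 8 (4x smaller here).
NUM_Q_HEADS, NUM_KV_HEADS = 8, 2
GROUP = NUM_Q_HEADS // NUM_KV_HEADS   # query heads per shared KV head

def kv_head_for(q_head):
    return q_head // GROUP            # heads 0-3 read KV head 0; 4-7 read KV head 1

kv_cache = {h: [] for h in range(NUM_KV_HEADS)}   # per KV head, not per query head
mapping = [kv_head_for(h) for h in range(NUM_Q_HEADS)]
print(mapping)                        # [0, 0, 0, 0, 1, 1, 1, 1]
```

Multi-head attention is the `NUM_KV_HEADS == NUM_Q_HEADS` extreme and multi-query attention is `NUM_KV_HEADS == 1`; GQA sits between them, trading a little quality headroom for most of MQA's bandwidth savings.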