🌟 Introduction
Artificial Intelligence has rapidly shifted from being a research-heavy concept to becoming an essential tool for IT professionals, security researchers, and enterprises alike. At the heart of this shift are Large Language Models (LLMs) — advanced systems capable of generating human-like text, assisting in DevOps, supporting cybersecurity tasks, and even automating workflows.
👉 If you haven’t read my earlier post Breaking Down AI: From Prompts to Responses, I recommend checking it out first to build a strong foundation before diving into LLMs.
But what exactly are LLMs, how do they work, and what do IT professionals need to understand before relying on them? Let’s break it down.
⚙️ How LLMs Work
At their core, LLMs are built on a transformer architecture, first introduced in 2017. The key innovation is the attention mechanism, which allows the model to understand the relationships between words in a sequence instead of treating them independently.
Here’s a simplified breakdown:
- 🔡 Tokens & embeddings: Text is broken into smaller units called tokens. Each token is converted into a numerical representation (embedding).
- 📚 Training: Models are pre-trained on massive datasets, learning grammar, structure, facts, and reasoning patterns. Fine-tuning may follow for specific tasks.
- 🤖 Inference: When you input a prompt, the model predicts the most likely next token, generating responses step by step.
For IT professionals, think of it as a predictive engine that consumes context and outputs structured text, much like a high-powered autocomplete with deep contextual awareness.
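To make the “predictive engine” idea concrete, here is a minimal sketch in Python. It assumes the open-source Hugging Face transformers library, PyTorch, and the small GPT-2 model (my choices for illustration, not requirements): it tokenizes a prompt and asks the model for the single most likely next token.

```python
# Minimal next-token prediction sketch, assuming transformers and torch
# are installed (pip install transformers torch).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small model, illustration only
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The firewall blocked the incoming"
inputs = tokenizer(prompt, return_tensors="pt")     # text -> token IDs (embeddings happen inside the model)
print(inputs["input_ids"])                          # the numerical representation of the prompt

with torch.no_grad():
    logits = model(**inputs).logits                 # a score for every token in the vocabulary

next_id = logits[0, -1].argmax()                    # greedy choice: the most likely next token
print(tokenizer.decode(next_id))                    # output varies by model, e.g. " traffic"
```

Generating a full response is just this loop repeated: append the predicted token to the input and predict again, token by token.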
🏢 Key Players and Models
The LLM ecosystem is expanding rapidly, with both proprietary and open-source players shaping the space:
- 🔷 OpenAI (ChatGPT): Creator of ChatGPT and the GPT series (GPT-4, GPT-5), widely recognized for driving mainstream adoption of LLMs.
- 🌐 Anthropic (Claude): Focused on alignment and safety, offering models like Claude that prioritize responsible AI usage.
- 🔎 Google DeepMind (Gemini): Successor to Bard, designed as a multimodal AI capable of handling text, images, and more.
- 📘 Meta (LLaMA): Open-source model family, providing researchers and developers with flexible, customizable LLMs.
- ⚡ Mistral: Known for lightweight, efficient open-source models optimized for speed and local deployment.
- 🏭 Cohere & AI21: Enterprise-focused NLP providers offering APIs for text generation, classification, and search.
- 💻 Microsoft (GitHub Copilot): AI-powered coding assistant built on OpenAI’s models, integrated directly into IDEs like VS Code for developer productivity.
- 🐉 DeepSeek: Emerging Chinese AI brand, gaining attention for high-performance LLMs and cost-efficient alternatives to Western offerings.
- 🔍 Perplexity AI: A research-focused AI search engine powered by LLMs, designed to deliver real-time, citation-backed answers for professionals and general users alike.
The key divide remains open-source vs proprietary. Open-source models like LLaMA and Mistral empower IT professionals to self-host and customize deployments, while proprietary solutions such as ChatGPT, Claude, Copilot, and Perplexity provide turnkey capabilities but raise cost and privacy considerations.
💡 Real-World Applications of LLMs
LLMs are no longer limited to academic papers — they’re practical tools shaping IT workflows today. Some core applications include:
- 💬 Chatbots and assistants: Automating customer support and internal IT helpdesks.
- 👨‍💻 DevOps and coding support: Suggesting code snippets, generating configs, or even building scripts on demand.
- 📑 Knowledge extraction: Quickly summarizing logs, security advisories, or technical documentation.
- 📝 Content generation: From documentation drafts to report writing.
- 🔐 Security operations: Assisting with threat analysis, log correlation, and even incident response triage.
The common thread: LLMs act as force multipliers for IT teams, reducing repetitive tasks while enhancing decision-making speed.
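As one concrete example of the knowledge-extraction use case, the sketch below sends a log excerpt to a cloud LLM API for summarization. It assumes the official openai Python SDK (v1+), an API key in your environment, and placeholder names for the model and log file; treat it as a starting point, not a production pattern.

```python
# Hedged sketch: summarizing a server log via a cloud LLM API.
# Assumes the openai SDK is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

with open("auth.log") as f:          # hypothetical log file
    log_excerpt = f.read()[:4000]    # keep the prompt within the context window

response = client.chat.completions.create(
    model="gpt-4o-mini",             # assumed model name; use whatever your provider offers
    messages=[
        {"role": "system", "content": "You summarize server logs for an IT team."},
        {"role": "user", "content": f"Summarize notable events and anomalies:\n{log_excerpt}"},
    ],
)
print(response.choices[0].message.content)
```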
⚠️ Challenges and Limitations
Despite their capabilities, LLMs come with risks IT professionals must recognize:
- ❌ Hallucinations: Models may generate incorrect but confident-sounding answers.
- ⚖️ Bias and ethics: Training data can embed harmful or misleading biases.
- 💸 Resource cost: Training and running LLMs requires massive compute resources. Even inference can be costly depending on model size.
- 🛡️ Security concerns: LLMs introduce new attack surfaces — from prompt injection to potential data leakage if sensitive inputs are shared.
In practice, LLMs should be treated as assistive tools, not authoritative sources. Always validate outputs before execution, especially in production environments.
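To make “validate outputs before execution” concrete, here is a deliberately simple, hypothetical guardrail: an allowlist check applied to a model-suggested shell command before it runs. The helper and allowlist are illustrative inventions; a real pipeline would add sandboxing, human review, and logging.

```python
# Hypothetical guardrail: never execute LLM-suggested commands blindly.
import shlex
import subprocess

ALLOWED_BINARIES = {"ls", "df", "uptime", "systemctl"}   # example allowlist

def validate_command(command: str) -> bool:
    """Return True only if the command's binary is on the allowlist."""
    parts = shlex.split(command)
    return bool(parts) and parts[0] in ALLOWED_BINARIES

llm_suggestion = "df -h /var/log"     # imagine this came back from a model

if validate_command(llm_suggestion):
    result = subprocess.run(shlex.split(llm_suggestion), capture_output=True, text=True)
    print(result.stdout)
else:
    print("Rejected: command not on the allowlist; review manually.")
```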
🖥️ Running LLMs: Local vs Cloud
One of the biggest choices for IT professionals is deciding where to run an LLM.
- ☁️ Cloud-hosted APIs:
- ✅ Pros (The Benefits):
- Minimal Upfront Cost: You don’t need to purchase any expensive hardware; you can get started immediately with just an API key. This makes it perfect for hobbyists, small projects, or testing the viability of an application.
- Zero Infrastructure Management: The provider handles all the complex infrastructure, scaling, maintenance, and updates. You simply make API calls and receive responses, allowing you to focus entirely on your project.
- Access to the Best Models: You can use the most powerful and advanced LLMs as soon as they are released, without having to worry about whether your hardware can run them.
- Instant Scalability: The provider’s infrastructure can scale on demand to handle fluctuating workloads. Your project can go from 1 to 1000 users without any changes to your backend.
- ❌ Cons (The Challenges):
- Data Privacy Concerns: Your input and output data is sent to a third-party server for processing. While major providers like OpenAI and Google have robust privacy policies, the data is still handled by an external entity, which may be a deal-breaker for sensitive security tasks.
- Variable and Potentially High Costs: While pay-per-use pricing is cheap at low volume, the cost can skyrocket with high-volume or long-running tasks. A simple mistake in a script could lead to a large, unexpected bill.
- Dependency on a Third Party: You are reliant on the provider’s uptime, service terms, and pricing. A change in their policy or a service outage could directly impact your project.
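To illustrate the “variable and potentially high costs” point, here is a rough, assumption-laden cost model. The per-token prices are placeholders I invented for the sketch; plug in your provider’s actual rates before drawing conclusions.

```python
# Back-of-the-envelope cost sketch for pay-per-token APIs.
# Prices below are placeholders, NOT real figures; check your provider's pricing page.
PRICE_PER_1K_INPUT_TOKENS = 0.0005    # USD, placeholder
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015   # USD, placeholder

def estimate_monthly_cost(requests_per_day: int,
                          input_tokens: int,
                          output_tokens: int) -> float:
    """Rough monthly cost for a steady request volume."""
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return per_request * requests_per_day * 30

# Example: an internal helpdesk bot handling 2,000 requests a day.
print(f"~${estimate_monthly_cost(2000, 1500, 400):,.2f} per month")
```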
- 💻 Local deployment (Self-Hosted):
- ✅ Pros (The Benefits):
- Ultimate Data Privacy and Security: This is the most significant advantage. Your data—including all the sensitive information from your pentesting targets—never leaves your local network or a server you control. This is critical for organizations dealing with proprietary, private, or legally regulated data (e.g., in healthcare or finance).
- No Ongoing Usage Fees: Once you’ve paid for the initial hardware, your operational costs are limited to electricity and cooling. For high-volume, continuous usage, a self-hosted model can become significantly more cost-effective over time compared to pay-per-token API fees.
- Full Customization and Control: You can fine-tune the model with your own domain-specific data, such as internal security reports, company codebases, or custom scripts. You also have full control over the model’s behavior, parameters, and output, without being limited by a provider’s content filters or API restrictions.
- No Vendor Lock-in or External Reliance: You are not dependent on a third-party’s service uptime, price changes, or model deprecations. Your system will function as long as your own network and hardware are running.
- ❌ Cons (The Challenges):
- High Upfront Hardware Costs: This is the biggest barrier. Running a powerful LLM requires a high-end GPU with a large amount of VRAM (Video RAM). For a small but capable model like Llama 3 8B, you would need a GPU with around 16 GB of VRAM at full precision (less with quantization). For larger, more capable models, you would need a minimum of 24 GB of VRAM (e.g., an NVIDIA RTX 3090/4090) or multiple GPUs, which can cost thousands of dollars.
- Significant Technical Complexity: You are responsible for all aspects of deployment and maintenance. This includes setting up the server, managing dependencies, configuring the API, and ensuring the model remains updated and performs optimally. This requires in-house expertise in machine learning and infrastructure.
- Limited Access to Cutting-Edge Models: The most powerful, state-of-the-art models (like GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro) are proprietary and cannot be self-hosted. You are limited to the open-source models available on platforms like Hugging Face.
- Performance Trade-offs: The performance of a self-hosted model depends entirely on your hardware. While a powerful GPU can provide low latency, a consumer-grade setup may be slower than a major cloud provider’s optimized infrastructure.
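For the self-hosted route, a common low-friction starting point is a local runner such as Ollama. The sketch below assumes Ollama is installed and serving, with a model like llama3 already pulled; other local runners expose similar HTTP APIs.

```python
# Minimal sketch of querying a self-hosted model, assuming Ollama is running
# locally (ollama serve) and a model such as llama3 has been pulled.
import requests

payload = {
    "model": "llama3",                              # any locally pulled model
    "prompt": "Explain the difference between TCP and UDP in two sentences.",
    "stream": False,                                # return one JSON object instead of a stream
}

resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])                      # the data never leaves your machine
```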
🔮 The Future of LLMs
We are still at the early stages of LLM evolution. Some emerging directions include:
- 🖼️ Multimodal AI: Models that understand not just text, but also images, video, and audio.
- 🤝 Agentic AI: Systems capable of performing autonomous tasks, such as managing infrastructure or running penetration tests.
- 🏥 Domain-specific LLMs: Specialized models trained for cybersecurity, legal, or healthcare applications.
- 📜 Regulation: Governments and organizations are actively working on AI governance to ensure responsible usage.
For IT professionals, this means the toolset will only get more powerful — but also more complex to manage responsibly.
✅ Final Thoughts
LLMs are more than just hype. They represent a paradigm shift in how IT professionals approach automation, knowledge work, and even security. Whether you choose to adopt a cloud-hosted solution for convenience or experiment with local deployments for privacy, the key is to understand both their potential and their limitations.
The takeaway: treat LLMs as partners, not replacements. By experimenting with them today, you’ll be better prepared for the rapidly changing AI-driven landscape tomorrow.