MySphere Posts

Technical Brief: Core Generative AI Architectures and Implementation

1. Technical Overview

This documentation provides a high-level technical synthesis of the foundational concepts driving modern Generative AI (GenAI). It covers the transition from basic text processing to autonomous agent orchestration, specifically aligned with the IBM watsonx ecosystem. The focus is on understanding how these components interact to build scalable, enterprise-grade AI solutions.

Level: Intermediate
Keywords: LLM, Parameter-Efficient Fine-Tuning (PEFT), Vector Databases, Inference, Neural Networks, Agentic Workflows.

2. Technologies & Concepts Covered

  • AI Agents & A2A Protocol: Autonomous systems that use LLMs as “reasoning engines” to execute tasks. The Agent-to-Agent (A2A) protocol facilitates standardized communication between specialized agents.
  • RAG (Retrieval-Augmented Generation): An architectural pattern that optimizes LLM output by querying external, authoritative data sources (Vector DBs) before generating a response.
  • Tokenization: The preprocessing step where text is converted into numerical representations (tokens) that the transformer architecture can process.
  • RLHF (Reinforcement Learning from Human Feedback): A fine-tuning stage that aligns model behavior with human values and instructions using reward models.
  • Diffusion Models: A class of generative models that create data (usually images) by iteratively removing noise from a signal.
  • LoRA (Low-Rank Adaptation): A PEFT technique that freezes pre-trained model weights and injects trainable rank decomposition matrices, drastically reducing VRAM requirements for fine-tuning.
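To make the VRAM claim concrete, here is a back-of-the-envelope sketch in pure Python (no ML framework; the hidden size and rank below are illustrative assumptions, not values from any specific model):

```python
# Back-of-the-envelope sketch: trainable parameter counts for full
# fine-tuning vs. LoRA on a single d x d weight matrix. Pure arithmetic;
# d and r are illustrative, not taken from any specific model.

def full_finetune_params(d: int) -> int:
    # Full fine-tuning updates every entry of the d x d matrix.
    return d * d

def lora_params(d: int, r: int) -> int:
    # LoRA freezes W and trains only B (d x r) and A (r x d);
    # the effective weight is W + B @ A.
    return d * r + r * d

d, r = 4096, 8
print(full_finetune_params(d))   # 16777216 trainable parameters
print(lora_params(d, r))         # 65536 trainable parameters (~0.4%)
```

Because only the two small matrices receive gradients and optimizer state, the memory needed for fine-tuning shrinks roughly in proportion to this parameter ratio.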

3. Practical Applications

  • Enterprise Search: Implementing RAG to allow AI assistants to answer queries based on private company documentation without retraining the model.
  • Task Automation: Utilizing AI Agents to perform multi-step operations, such as booking flights or generating reports by interacting with third-party APIs.
  • Model Optimization: Applying LoRA to adapt a general-purpose LLM to a specific legal or medical vocabulary with minimal computational overhead.
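The RAG pattern behind the enterprise-search use case can be sketched end to end in a few lines. This is a toy illustration only: real systems use learned embeddings and a vector database, not the crude word-overlap scoring used here, and the documents are invented.

```python
import re

# Toy RAG sketch: retrieve the most relevant document by word overlap,
# then inject it into the prompt. Real systems use learned embeddings
# and a vector database instead of this scoring.

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Cafeteria hours run from 8am to 6pm on weekdays.",
]

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str]) -> str:
    # Pick the document sharing the most words with the query.
    q = tokenize(query)
    return max(docs, key=lambda d: len(q & tokenize(d)))

def build_prompt(query: str) -> str:
    context = retrieve(query, documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy?")
print("refund policy" in prompt)  # True
```

The key point is architectural: the model is never retrained; only the prompt changes, with retrieved context prepended at query time.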

4. Technical Prerequisites

  • Fundamental understanding of Machine Learning (ML) pipelines.
  • Familiarity with Python and RESTful API integration.
  • Basic knowledge of Transformer architectures and Large Language Models (LLMs).
  • Experience with cloud-based AI environments (e.g., IBM Cloud, watsonx.ai).

5. Next Steps

  • Certification: Prepare for the watsonx AI Assistant Engineer v1 – Professional exam to validate your expertise in agentic workflows.
  • Deep Dive: Review the official Agent2Agent (A2A) protocol documentation for multi-agent system design.
  • Implementation: Experiment with LoRA adapters on open-source models via the watsonx.ai platform.

AI

1. Technical Overview

The Model Context Protocol (MCP) is an open-standard communication protocol designed to replace the fragmented landscape of custom API integrations for Large Language Models (LLMs). While traditional REST APIs require developers to write specific “glue code” for every data source, MCP provides a universal interface.

It enables AI agents to perform dynamic discovery, allowing them to identify available tools and data schemas at runtime. By standardizing how models access local and remote resources, MCP shifts the integration burden from manual endpoint configuration to a scalable, plug-and-play architecture.

2. Technologies & Tools

  • Model Context Protocol (MCP): The core specification for standardized AI-to-data communication.
  • LLM Orchestration: Integration with models like Claude (Anthropic) and other MCP-compliant agents.
  • Transport Layers: Support for communication via `stdio` (local) or `HTTP` (remote).
  • JSON-RPC: The underlying messaging format used for requests and notifications.
  • SDKs: Official support for TypeScript/Node.js and Python for building MCP servers and clients.
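To illustrate the JSON-RPC layer, here is a minimal sketch of a tool-discovery exchange. The `tools/list` method name follows the MCP specification; the `search_docs` tool and its schema in the sample response are invented for illustration.

```python
import json

# Minimal sketch of the JSON-RPC 2.0 messages behind MCP tool discovery.
# "tools/list" is the MCP tool-discovery method; the sample response
# below (the "search_docs" tool) is invented for illustration.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}

# A server replies with each tool's name and input schema -- this is
# what lets a client discover available tools at runtime.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # must echo the request id
    "result": {
        "tools": [
            {
                "name": "search_docs",  # hypothetical tool
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                },
            }
        ]
    },
}

wire = json.dumps(request)  # the bytes that travel over stdio or HTTP
print(json.loads(wire)["method"])  # tools/list
```

The same message shape is used regardless of transport, which is what makes swapping `stdio` for `HTTP` transparent to the client.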

3. Practical Applications

  • Dynamic Resource Discovery: AI agents can query an MCP server to see what files, databases, or tools are available, without hardcoding them in advance.
  • Unified SaaS Integration: Accessing data from platforms like GitHub, Slack, or Google Drive through a single protocol rather than managing multiple distinct API authentication and response formats.
  • Context Injection: Automatically fetching real-time documentation or system logs to augment the LLM’s context window during a session.
  • Automated Tool Execution: Enabling agents to execute complex functions (e.g., database writes or code execution) through standardized “Tools” defined in the MCP schema.

4. Technical Prerequisites

  • Programming Proficiency: Experience with Python or Node.js.
  • API Fundamentals: Understanding of JSON, RESTful architectures, and authentication (OAuth, API Keys).
  • LLM Familiarity: Basic knowledge of prompt engineering and how agents utilize external tools (Function Calling).
  • Environment Management: Familiarity with Docker or virtual environments for hosting MCP servers.

5. Next Steps

1. Review the Specification: Study the official MCP documentation to understand the Client-Server-Host relationship.

2. Build an MCP Server: Use the Python or TypeScript SDK to expose a local data source as an MCP resource.

3. Test with a Client: Connect your server to an MCP-compliant host (such as Claude Desktop) to verify dynamic tool discovery.

4. Refactor Legacy Code: Identify static API integrations in your current AI workflows and migrate them to MCP for better scalability.

AI

Have you ever felt like you’re just “vibe coding”?

You throw a prompt at an AI, cross your fingers, and hope the output actually fits your project. It feels more like a game of trial and error than actual engineering. The problem is that “good enough” prompts lead to inconsistent results. You spend more time fixing AI hallucinations and cleaning up messy logic than you would have spent writing the code yourself. It’s frustrating, it’s not scalable, and it definitely won’t fly in an enterprise environment. If you’re tired of the guesswork, it’s time to change the game.

This video introduces you to **Spec-Driven Development (SDD)**—the professional way to build with AI. Instead of relying on random prompts, you’ll learn how to use formal specifications as the “source of truth.” We explore how tools like **watsonx.ai** turn structured requirements into predictable, maintainable, and high-quality code that aligns perfectly with your architecture.

AI

Can AI agents succeed without humans? Anna Gutowska explains the importance of Human-in-the-Loop (HITL) systems for safe and ethical AI decision-making. Learn how HITL balances automation, compliance, and oversight to ensure AI agents align with goals and user needs!

AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://ibm.biz/BdpXRC

#aiagents #humanintheloop #aiarchitecture

AI

About the video:

Ready to become a certified watsonx AI Assistant Engineer v1 – Professional? Register now and use code IBMTechYT20 for 20% off of your exam → https://ibm.biz/BdpXdV

Learn more about A2A protocol (Agent2Agent) here → https://ibm.biz/BdpBwz

Learn more about Model Context Protocol (MCP) here → https://ibm.biz/BdpBwf

Are your AI agents struggling to collaborate? 🤔 Martin Keen and Anna Gutowska reveal how advanced frameworks enable seamless agent communication and integration with tools. Discover how A2A connects agents and MCP links them to resources for smarter, streamlined workflows. 🚀

AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://ibm.biz/BdpBwP

#a2a #aiagents #aiworkflows

Watch on YouTube

MCO

Today, I set out to pull an image from Docker Hub on a RHEL 9 system using Podman. This step was part of my journey to install Watson Code Assistant for Z. While Podman is a great alternative to Docker on RHEL, the process had its quirks—especially when working with enterprise environments and specialized tools like Watson Code Assistant. In this post, I’ll share what worked, what didn’t, and some tips to make the setup smoother for anyone tackling the same challenge.

I can simulate the error using podman pull:

podman pull docker.io/library/orientdb:3.2.28

Trying to pull docker.io/library/orientdb:3.2.28…

WARN[0000] Failed, retrying in 1s ... (1/3). Error: initializing source docker://orientdb:3.2.28: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": read tcp 123.123.3.60:35924->98.90.233.146:443: read: connection reset by peer

WARN[0001] Failed, retrying in 1s ... (2/3). Error: initializing source docker://orientdb:3.2.28: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": read tcp 123.123.3.60:57270->52.2.233.225:443: read: connection reset by peer

WARN[0003] Failed, retrying in 1s ... (3/3). Error: initializing source docker://orientdb:3.2.28: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": read tcp 123.123.3.60:59134->3.93.227.105:443: read: connection reset by peer

Error: unable to copy from source docker://orientdb:3.2.28: initializing source docker://orientdb:3.2.28: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": read tcp 123.24.3.60:57266->44.220.224.219:443: read: connection reset by peer

In most cases, the standard approach to enable image pulling behind a corporate proxy is to set the appropriate environment variables. This typically involves exporting your proxy settings like so:

export http_proxy="http://<proxy-host>:<proxy-port>"
export https_proxy="http://<proxy-host>:<proxy-port>"
export no_proxy="localhost,127.0.0.1"

The standard approach did not work in my case, so I configured the proxy globally on RHEL 9.

Create or edit the file /etc/environment (this affects all users and most services):

sudo vi /etc/environment

Add the following lines (replace with your actual proxy details):

http_proxy="http://proxy.example.com:8080"
https_proxy="http://proxy.example.com:8080"
ftp_proxy="http://proxy.example.com:8080"

If your proxy requires authentication, use these lines instead:

http_proxy="http://username:password@proxy.example.com:8080"
https_proxy="http://username:password@proxy.example.com:8080"

The no_proxy entry (local networks, internal hosts) is very important! Change the example below to match your network:

no_proxy="localhost,127.0.0.1,::1,.example.com,10.0.0.0/8,192.168.0.0/16,172.16.0.0/12"
NO_PROXY="localhost,127.0.0.1,::1,.example.com,10.0.0.0/8,192.168.0.0/16,172.16.0.0/12"

Save the file and source it so the current shell picks up the settings (other services and sessions will only see them after a re-login or reboot, since /etc/environment is read by PAM at login):

source /etc/environment

podman pull docker.io/library/orientdb:3.2.28
Trying to pull docker.io/library/orientdb:3.2.28…
Getting image source signatures
Copying blob sha256:2d2472ac6840da0115175cae8b0be8d1b8c2b6b74acb5fc6bf185b0c9333b8a3
Copying blob sha256:9b076355b79badd38bc5732aebeb48133934a0adae078e4a6bf52c7d9d7a4a82
Copying blob sha256:0dde1d053504a51dc52d89eb36d703df02afbbc274b25ac00c02fe219e2d6f7c
Copying blob sha256:bd259c2f39c587be8bdd17660976c6158388173b58e226f2b5095d399cf658f2
Copying blob sha256:a22bcaede3cb82201c2804d7a050cbf18f994bd6f0b34f3ec133a47cc3c24ca9
Copying blob sha256:c050069391baee7bb13200b3297c944c954a22f0428769272d51e6cba8118a36
Copying blob sha256:42b80092d7e24557b10ea1e44542f6f887201fe9b56381a4a477cfbf9f2fc099
Copying config sha256:26cbda2db34c77dd8240b721da4177c6b43d6148f50d1ff15b81ce6c5c8869a9
Writing manifest to image destination
26cbda2db34c77dd8240b721da4177c6b43d6148f50d1ff15b81ce6c5c8869a9
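As a side note, /etc/environment uses plain KEY="value" lines rather than shell syntax, and straight quotes matter: curly quotes pasted from a browser will silently break the values. Here is a small sketch of how such lines parse, operating on a sample string instead of the real file:

```python
# Sketch: parse /etc/environment-style KEY="value" lines, e.g. to
# sanity-check quoting before logging out and back in. Operates on a
# sample string rather than reading the real file.

sample = """
http_proxy="http://proxy.example.com:8080"
no_proxy="localhost,127.0.0.1,::1,.example.com"
"""

def parse_environment(text: str) -> dict[str, str]:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

env = parse_environment(sample)
print(env["http_proxy"])  # http://proxy.example.com:8080
```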

podman

Retrieving OpenShift Cluster Logs

Logs are invaluable for understanding the state of your OpenShift cluster and diagnosing problems. OpenShift provides several ways to access these logs efficiently:

Get node logs

Display node journal:

oc adm node-logs <node>

Tail 10 lines from node journal:

oc adm node-logs --tail=10 <node>

Get kubelet journal logs only:

oc adm node-logs -u kubelet.service <node>

Search the node journal for the word "kernel":

oc adm node-logs --grep=kernel <node>

List /var/log contents:

oc adm node-logs --path=/ <node>

Get /var/log/audit/audit.log from node:

oc adm node-logs --path=audit/audit.log <node>

Pod Logs

Pod logs provide insights into application behavior.

  • Retrieve logs for a specific pod:
oc logs <pod-name> -n <namespace>
  • For pods with multiple containers, specify the container name:
oc logs <pod-name> -c <container-name> -n <namespace>
  • Stream logs in real-time:
oc logs -f <pod-name> -n <namespace>

openshift

“Day 2” operations refer to everything that happens after the cluster is installed, which could be a lot or a little, depending on how you plan to use the cluster.

Effective troubleshooting and monitoring in OpenShift require understanding the right way to retrieve logs and manage issues. While it might be tempting to SSH directly into cluster nodes, OpenShift provides tools and workflows to handle logs more securely and efficiently.

Why You Should Avoid SSHing to Nodes

SSHing directly into cluster nodes might seem like a quick way to debug issues, but it introduces several risks and challenges:

1. Security Risks
  • Inconsistent Access Control: Granting SSH access bypasses OpenShift’s centralized role-based access control (RBAC).
  • Increased Attack Surface: Open SSH ports expose nodes to potential attacks.
2. Configuration Drift
  • Manual changes made via SSH can lead to discrepancies between the actual state and the desired state managed by OpenShift.
  • Untracked modifications can complicate troubleshooting and recovery processes.
3. Cluster Stability
  • Direct changes to system files or services can inadvertently disrupt critical cluster operations.
  • Node taints and labels, critical for scheduling, might be accidentally altered.
4. Unsupported Practices

OpenShift’s design assumes that all management and troubleshooting occur through API-driven tools. Manual SSH access may invalidate support agreements or create unsupported states.

Uncategorized

Verifying the Health of Your OpenShift 4 Cluster

Managing an OpenShift 4 cluster effectively involves regular health checks to ensure smooth operation and reliability. An unhealthy cluster can lead to downtime, reduced performance, and compromised workloads.

Node Health

Healthy nodes are crucial for running workloads effectively.

Check the node status:

oc get nodes

Verify that all nodes show Ready in the STATUS column.

You can use oc get nodes -o wide to get more details about the nodes in the cluster.

For more details on a specific node:

oc describe node <node-name>

Verify resources allocated to a node:

oc describe node <node-name> | grep -A 10 "Allocated resources"

Get Allocated resources for all nodes:

oc describe nodes | grep -A 10 "Allocated resources"

Check Cluster Operators

Cluster Operators are responsible for managing the lifecycle of key components of an OpenShift cluster. To verify their status:

Run the following command:

oc get clusteroperators

A useful addition to the previous command, especially when you are upgrading your cluster:

watch -n5 oc get clusteroperators

Pod Health

Ensuring that pods are running as expected is a key part of cluster health.

List pods that are neither Running nor Completed:

oc get pods -A -o wide | grep -v -E 'Completed|Running'


openshift

Introduction

In the context of integration servers, such as IBM App Connect Enterprise (ACE), improper configurations can lead to significant performance issues. One such issue is CPU thrashing, which can be triggered by settings such as setting 256 additional instances for a message flow. This article explains what CPU thrashing is, its causes, effects, and how to mitigate it, focusing on its relevance to IBM ACE.

What is CPU Thrashing?

CPU thrashing occurs when a computer’s CPU becomes overloaded due to excessive context switching between tasks or threads, resulting in poor efficiency and reduced performance. Instead of doing useful work, the CPU spends most of its time managing switching between processes or threads, which impedes progress on actual computations.

Causes of CPU Thrashing

  • Excess Threads or Processes:

When there are many active threads, such as in the case of 256 additional instances in a message flow in IBM ACE, the CPU may have difficulty managing them. Each thread requires context switching, which involves saving and restoring the state of the CPU (registers, program counter, etc.).

If the number of threads exceeds the CPU capacity (e.g., available cores), the system spends more time swapping than performing tasks.

  • Resource Contention:

Threads competing for shared resources (such as memory, I/O, or locks) can cause the CPU to wait, increasing context-switching overhead.

In IBM ACE, if 256 threads access a database simultaneously, contention for connections can lead to thrashing as threads repeatedly block and unblock.

  • Inefficient Memory Management:

Thrashing is often associated with memory issues such as paging or excessive swapping. When the system runs low on physical memory (RAM), it relies on virtual memory, causing frequent disk I/O operations to exchange data. This keeps the CPU busy managing memory instead of executing application logic.

In IBM ACE, a high number of threads can increase memory demand, potentially triggering thrashing if the JVM heap or system memory is insufficient.

  • Inefficient Scheduling:

The operating system scheduler can prioritize threads inappropriately, especially under high load, causing quick switching between tasks without completing meaningful work.

In scenarios with many threads, the scheduler may have difficulty allocating CPU time effectively, leading to thrashing.

CPU Thrashing Effects

  • Performance Degradation: Applications run significantly slower as the CPU spends more time on overhead than on productive work.
  • High CPU Utilization with Low Throughput: CPU utilization can be as high as 100%, but little actual work is completed (for example, message processing in IBM ACE slows down).
  • Increased Latency: Response times for tasks (such as message flows) increase due to delays in thread execution.
  • System Instability: In extreme cases, thrashing can lead to timeouts, crashes, or even system unavailability, especially if CPU or memory resources are exhausted.

Relevance to IBM ACE with 256 Additional Instances

In the context of an IBM ACE configuration with 256 additional instances (found in a customer configuration):

Thread Overhead: Each instance represents one thread, so 256 additional instances means up to 257 threads per message flow. If multiple flows are deployed or the server handles multiple concurrent tasks, the total number of threads can overwhelm the CPU, leading to thrashing.

Resource Contention: If these threads access shared resources (such as database connections, file systems, or JVM heap), contention can force the CPU to switch contexts frequently, reducing efficiency.

Memory Pressure: A high number of threads increases memory usage (for example, for thread stacks and message processing). If the memory is insufficient, the system may resort to swapping, aggravating the thrashing.

Mitigation Strategies

Reduce the Number of Threads:

Decrease the number of additional instances in IBM ACE (for example, from 256 to a tested value, such as 10 or 20), based on your workload and server capacity. Conduct performance tests to find the optimal number.
Example: If a single flow with 256 instances causes thrashing, try reducing it to 50 and monitor CPU and throughput.

Optimize Resource Usage:

Adjust message flow to minimize resource-intensive operations (for example, reduce database queries, use connection caching).
Make sure that external systems (such as databases) can handle concurrent requests from many threads.

Increase Hardware Resources:

Add more CPU cores or memory to the server to support high thread counts. For example, a server with 4 cores may struggle with 256 threads, but one with 16 cores may handle it better.
Increase the JVM heap size in IBM ACE to reduce memory-related thrashing, but monitor garbage collection overhead.

Workload Management:

Configure workload management policies in IBM ACE to dynamically limit thread allocation based on load.
Prioritize critical flows to avoid resource contention.

Monitoring and Profiling:

Use IBM ACE monitoring tools or system-level tools (such as top, vmstat, or perf on Linux) to detect thrashing. Look for high CPU utilization, excessive context switching, or paging/swapping activity.
Check for signs such as low throughput despite high CPU utilization.
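The first mitigation, capping concurrency at a tested value instead of running one thread per pending message, can be sketched in plain Python. This is illustrative only: IBM ACE manages its own instance pools, and the worker count and workload below are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of bounded concurrency: a fixed, deliberately small
# worker pool drains a backlog instead of spawning hundreds of threads.
# IBM ACE manages its own instances; this only demonstrates the principle.

def process_message(msg: int) -> int:
    # Stand-in for the per-message work of a message flow.
    return msg * 2

messages = list(range(1000))  # a backlog of 1000 pending messages

# Capping workers near the core count avoids the context-switching
# overhead of hundreds of mostly idle threads competing for the CPU.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_message, messages))

print(len(results))  # 1000
print(results[:3])   # [0, 2, 4]
```

The throughput of the pool is limited by the hardware either way; the smaller pool simply wastes less CPU time on scheduling and context switches.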

Conclusion

CPU thrashing is a critical issue that can compromise the performance of servers such as IBM ACE, especially in configurations with a high number of threads, such as 256 additional instances. It occurs due to excessive context switching, resource contention, or memory pressure, leading to low throughput, high latency, and possible system instability. To mitigate it, it is essential to reduce the number of threads, optimize message flows, increase hardware resources, and monitor performance. With careful tuning and testing, it is possible to avoid thrashing and ensure efficient performance in IBM ACE.

ACE