Snake_Byte[15] Fourier, Discrete and Fast Transformers

The frequency domain of mind (a mind, it must be stressed, is an unextended, massless, immaterial singularity) can produce an extended, spacetime domain of matter via ontological Fourier mathematics, and the two domains interact via inverse and forward Fourier transforms.

~ Dr. Cody Newman, The Ontological Self: The Ontological Mathematics of Consciousness

I am Optimus Transformer Ruler Of The AutoCorrelation Bots

First i trust everyone is safe. i haven't written a technical blog in a while, so i figured i would write a Snake_Byte on one of my favorite equations, The Fourier Transform:

    \[\hat{f} (\xi)=\int_{-\infty}^{\infty}f(x)e^{-2\pi ix\xi}dx\]

More specifically we will be dealing with the Fast Fourier Transform, which is an implementation of The Discrete Fourier Transform. The Fourier Transform operates on continuous signals, and while i do believe we will have analog computing devices (again) in the future, we have to operate on 0's and 1's at this juncture, thus we have a discrete version thereof. The discrete version:

    \[f[k] = \sum_{j=0}^{N-1} x[j]\left(e^{-2\pi i k/N}\right)^j, \qquad 0 \leq k < N\]

which the FFT evaluates recursively by splitting the sum into even- and odd-indexed halves:

    \[f[k] = f_e[k] + e^{-2\pi i k/N}f_o[k]\]
    \[f[k+N/2] = f_e[k] - e^{-2\pi i k/N}f_o[k]\]

The Discrete Fourier Transform (DFT) is a mathematical operation. The Fast Fourier Transform (FFT) is an efficient algorithm for the evaluation of that operation (actually, a family of such algorithms). However, it is easy to get these two confused. Often, one may see a phrase like “take the FFT of this sequence”, which really means to take the DFT of that sequence using the FFT algorithm to do it efficiently.
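To make that distinction concrete, here is a direct O(N^2) evaluation of the DFT sum above — a toy sketch of mine, not production code — that computes exactly what the FFT computes, just slowly:

import numpy as np

def naive_dft(x):
    # Direct O(N^2) evaluation of the DFT sum; the FFT computes the same thing efficiently.
    N = len(x)
    k = np.arange(N).reshape(-1, 1)   # column of output indices
    j = np.arange(N)                  # row of input indices
    return (np.asarray(x) * np.exp(-2j * np.pi * k * j / N)).sum(axis=1)

x = np.random.rand(16)
print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True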

The Fourier basis is one kernel among any number of transform kernels, and ideally the kernel is matched to the signal. The Fourier Transform kernel is built from \sin(\theta) and \cos(\theta) terms, which makes it really useful for audio and radar analysis.

The FFT computes the sequence in O(n\log{}n) operations instead of the O(n^2) required by the direct DFT, and as one would imagine this is a substantial gain. The most commonly used FFT algorithm is the Cooley-Tukey algorithm, named after J. W. Cooley and John Tukey. It is a divide and conquer algorithm for the machine calculation of complex Fourier series: it breaks the DFT into smaller DFTs. Other FFT algorithms include Rader's algorithm, the Winograd Fourier transform algorithm, the Chirp Z-transform algorithm, etc. The only rub comes as a function of the delay and throughput of the particular implementation.
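To make the divide and conquer idea concrete, here is a minimal sketch of the radix-2 Cooley-Tukey recursion (again my own toy version, assuming the input length is a power of two); it is exactly the even/odd split written above:

import numpy as np

def radix2_fft(x):
    # Minimal recursive radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two.
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    f_e = radix2_fft(x[0::2])   # DFT of the even-indexed samples
    f_o = radix2_fft(x[1::2])   # DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([f_e + twiddle * f_o,
                           f_e - twiddle * f_o])

x = np.random.rand(8)
print(np.allclose(radix2_fft(x), np.fft.fft(x)))  # True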

There have been amazing textbooks written on this subject and i will list them at the end of the blarg [1,2,3].

So let's get on with some code. First we do the usual housekeeping on import libraries as well as doing some majik for inline display if you are using Jupyter Notebooks. Of note is fftpack, which wraps FFTPACK, a package of Fortran subroutines for the fast Fourier transform. It includes complex, real, sine, cosine, and quarter-wave transforms. It was developed by Paul Swarztrauber of the National Center for Atmospheric Research, and is included in the general-purpose mathematical library SLATEC.

# House keeping libraries imports and inline plots:
import numpy as np
from scipy import fftpack
%matplotlib inline
import matplotlib.pyplot as plt

We now set up a signal: a sinusoid with a given frequency and sampling rate. We use linspace to set up the time axis over the signal length.

#frequency in cycles per second or Hertz
#(concert A would be 440 Hz; 20 Hz just keeps the plot readable)

Frequency = 20
# Sampling rate or the number of measurements per second
# (CD-quality digital audio samples at 44.1 kHz; 100 Hz keeps the example small)

Sampling_Frequency = 100

# set up the signal space:
time = np.linspace(0,2,2 * Sampling_Frequency, endpoint = False)
signal = np.sin(Frequency * 2 * np.pi * time)

Next we plot the sinusoid under consideration:

# plot the signal:
fig, ax = plt.subplots()
ax.plot(time, signal)
ax.set_xlabel('Time [seconds]')
ax.set_ylabel('Signal Amplitude')

Next we apply the Fast Fourier Transform and transform into the frequency domain:

X_Hat = fftpack.fft(signal)
Frequency_Component = fftpack.fftfreq(len(signal)) * Sampling_Frequency

We now plot the transformed sinusoid depicting the frequencies we generated:

# plot frequency components of the signal:
fig, ax = plt.subplots()
ax.stem(Frequency_Component, np.abs(X_Hat)) # absolute value of spectrum
ax.set_xlabel ('Frequency in Hertz [HZ] Of Transformed Signal')
ax.set_ylabel ('Frequency Domain (Spectrum) Magnitude')
ax.set_xlim(-Sampling_Frequency / 2, Sampling_Frequency / 2)
ax.set_ylim(-5,110)

To note, you will see two frequency components; this is because the transform of a real-valued signal has both positive and negative frequency components (it is conjugate symmetric), which is what we see using the stem plots as expected. This is because the kernel, as mentioned before, contains both \sin(\theta) and \cos(\theta).

So something really cool happens when using the FFT. It is called the convolution theorem, as well as Dual Domain Theory. Convolution in the time domain yields multiplication in the frequency domain. Mathematically, the convolution theorem states that under suitable conditions the Fourier transform of a convolution of two functions (or signals) is the point-wise (Hadamard) product of their Fourier transforms. More generally, convolution in one domain (e.g., the time domain) equals point-wise multiplication in the other domain (e.g., the frequency domain).

Where:

    \[x(t)*h(t) = y(t)\]

    \[X(f)\,H(f) = Y(f)\]
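Since we already have fftpack imported above, a quick numerical sanity check (my own toy example) confirms the discrete version of the theorem: the DFT of a circular convolution equals the point-wise product of the individual DFTs.

import numpy as np
from scipy import fftpack

N = 64
x = np.random.rand(N)
h = np.random.rand(N)

# circular convolution computed the slow way, directly in the time domain
y = np.zeros(N)
for k in range(N):
    for j in range(N):
        y[k] += x[j] * h[(k - j) % N]

# convolution theorem: transform of the convolution == point-wise product of transforms
print(np.allclose(fftpack.fft(y), fftpack.fft(x) * fftpack.fft(h)))  # True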

So there you have it. A little taster on the powerful Fourier Transform.

Until Then,

#iwishyouwater <- Cloudbreak this past year

Muzak To Blarg by: Voyager Essential Max Richter. Phenomenal. November is truly staggering.

References:

[1] The Fourier Transform and Its Applications by Dr Ronald N Bracewell. i had the honor of taking the actual class at Stanford University from Professor Bracewell.

[2] The Fast Fourier Transform and Its Applications by E. Oran Brigham. Great book on butterfly and overlap-add derivations thereof.

[3] Adaptive Digital Signal Processing by Dr. Claude Lindquist. A phenomenal book on frequency domain signal processing and kernel analysis. A book ahead of its time. Professor Lindquist was a mentor and had a direct effect and affect on my career and the way i approach information theory.

Snake_Byte:[14] Coding In Philosophical Frameworks

DALL-E Generated Philosopher

Your vision will only become clear when you can look into your heart. Who looks outside, dreams; who looks inside, awakes. Knowing your own darkness is the best method for dealing with the darknesses of other people. We cannot change anything until we accept it.

~ C. Jung

(Caveat Emptor: This blog is rather long in the snake's tooth and actually more like a CHOMP instead of a BYTE. tl;dr)

First, Oh Dear Reader, i trust everyone is safe. Second, it sure feels like we are living in an age of Deus Ex Machina, doesn't it? Third, with this in mind i wanted to write a Snake_Byte that i have been "thoughting" about for quite some time but never really knew how to approach, if truth be told. i can't take full credit for this ideation nor do i actually want to claim any ideation. Jay Sales and i were talking for a long time after i believe i gave a presentation on creating Belief Systems using BeliefNetworks or some such nonsense.

The net of the discussion was we both believed that in the future we will code in philosophical frameworks.

Maybe we are here?

So how would one go about coding an agent-based distributed system that allowed one to create an agent or a piece of evolutionary code to exhibit said behaviors of a philosophical framework?

Well we must first attempt to define a philosophy and ensconce it into a quantized explanation.

Stoicism seemed to me at least the best first mover here as it appeared to be the tersest by definition.

So first, for those not familiar with said philosophy, Marcus Aurelius was probably the most famous practitioner of Stoicism. i have put some references that i have read at the end of this blog [1].

Stoicism is a philosophical school that emphasizes rationality, self-control, and inner peace in the face of adversity. In thinking about this, i figured that to build an agent-based software system that embodies Stoicism, we would need to consider several key aspects of this philosophy.

  • Stoics believe in living in accordance with nature and the natural order of things. This could be represented in an agent-based system through a set of rules or constraints that guide the behavior of the agents, encouraging them to act in a way that is in harmony with their environment and circumstances.
  • Stoics believe in the importance of self-control and emotional regulation. This could be represented in an agent-based system through the use of decision-making algorithms that take into account the agent’s emotional state and prioritize rational, level-headed responses to stimuli.
  • Stoics believe in the concept of the “inner citadel,” or the idea that the mind is the only thing we truly have control over. This could be represented in an agent-based system through a focus on internal states and self-reflection, encouraging agents to take responsibility for their own thoughts and feelings and strive to cultivate a sense of inner calm and balance.
  • Stoics believe in the importance of living a virtuous life and acting with moral purpose. This could be represented in an agent-based system through the use of reward structures and incentives that encourage agents to act in accordance with Stoic values such as courage, wisdom, and justice.

So given a definition of Stoicism, we then need to create a quantized or discrete model of those behaviors that encompass a "Stoic Individual". i figured we could use the evolutionary library called DEAP (Distributed Evolutionary Algorithms in Python). DEAP contains genetic algorithm and genetic programming utilities as well as evolutionary strategy methods for this type of programming.

Genetic algorithms and genetic programming are both techniques used in artificial intelligence and optimization, but they have some key differences.

This is important as people confuse the two.

Genetic algorithms are a type of optimization algorithm that use principles of natural selection to find the best solution to a problem. In a genetic algorithm, a population of potential solutions is generated and then evaluated based on their fitness. The fittest solutions are then selected for reproduction, and their genetic information is combined to create new offspring solutions. This process of selection and reproduction continues until a satisfactory solution is found.

On the other hand, genetic programming is a form of machine learning that involves the use of genetic algorithms to automatically create computer programs. Instead of searching for a single solution to a problem, genetic programming evolves a population of computer programs, which are represented as strings of code. The programs are evaluated based on their ability to solve a specific task, and the most successful programs are selected for reproduction, combining their genetic material to create new programs. This process continues until a program is evolved that solves the problem to a satisfactory level.

So the key difference between genetic algorithms and genetic programming is that genetic algorithms search for a solution to a specific problem, while genetic programming searches for a computer program that can solve the problem. Genetic programming is therefore a more general approach, as it can be used to solve a wide range of problems, but it can also be more computationally intensive due to the complexity of evolving computer programs [2].
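For contrast with the flat list-of-attributes genetic algorithm we will use below, here is a minimal sketch of what a genetic programming individual looks like in DEAP — a program tree built from a primitive set. This is illustrative only (my own toy primitives) and is not wired into the Stoic example:

import operator
from deap import gp

# a primitive set with one input argument and a few arithmetic building blocks
pset = gp.PrimitiveSet("MAIN", 1)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(operator.neg, 1)
pset.addTerminal(1.0)
pset.renameArguments(ARG0="x")

# generate a random full tree of depth 1 to 3 and compile it into a callable
expr = gp.genFull(pset, min_=1, max_=3)
tree = gp.PrimitiveTree(expr)
print(str(tree))             # e.g. mul(add(x, 1.0), neg(x))
func = gp.compile(tree, pset)
print(func(2.0))             # evaluate the evolved program at x = 2.0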

So returning back to the main() function as it were, we need to create a genetic program that models Stoic behavior using the DEAP library.

First we need to define the problem and the relevant fitness function. This is where the quantized part comes into play. Since Stoic behavior involves a combination of rationality, self-control, and moral purpose, we could define a fitness function that measures an individual's ability to balance these traits and act in accordance with Stoic values.

So lets get to the code.

To create a genetic program that models Stoic behavior using the DEAP library in a Jupyter Notebook, we first need to install the DEAP library. We can do this by running the following command in a code cell:

pip install deap

Next, we can import the necessary modules and functions:

import random
import operator
import numpy as np
from deap import algorithms, base, creator, tools

We can then define the problem and the relevant fitness function as described above.

Here’s an example of how we might define a “fitness function” for this problem:

# Define the fitness function.  NOTE: i am open to other ways of defining this and other models.
# The definition of what constitutes a behavior needs to be quantized or discretized, and
# trying to do that yields a lossy function most times. It is also self-referential.

def fitness_function(individual):
    # Calculate the fitness based on how closely the individual's behavior matches stoic principles
    fitness = 0
    # Add points for self-control, rationality, focus, resilience, and adaptability can haz Stoic?
    fitness += individual[0]  # self-control
    fitness += individual[1]  # rationality
    fitness += individual[2]  # focus
    fitness += individual[3]  # resilience
    fitness += individual[4]  # adaptability
    return fitness,

# Define the genetic programming problem
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

# Initialize the genetic algorithm toolbox
toolbox = base.Toolbox()

# Define the genetic operators
toolbox.register("attribute", random.uniform, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attribute, n=5)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", fitness_function)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.1, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

# Run the genetic algorithm
population = toolbox.population(n=10)
for generation in range(20):
    offspring = algorithms.varAnd(population, toolbox, cxpb=0.5, mutpb=0.1)
    fits = toolbox.map(toolbox.evaluate, offspring)
    for fit, ind in zip(fits, offspring):
        ind.fitness.values = fit
    population = toolbox.select(offspring, k=len(population))
    
# Print the best individual found
best_individual = tools.selBest(population, k=1)[0]

print ("Best Individual:", best_individual)
 

Here, we define the genetic programming parameters (i.e., the traits that we're optimizing for) using the toolbox.register function. We also define the evaluation function (fitness_function), genetic operators (mate and mutate), and selection operator (select) using DEAP's built-in functions.

We then define the fitness function that the genetic algorithm will optimize. This function takes an “individual” (represented as a list of five attributes) as input, and calculates the fitness based on how closely the individual’s behavior matches stoic principles.

We then define the genetic programming problem via the quantized attributes, and initialize the genetic algorithm toolbox with the necessary genetic operators.

Finally, we run the genetic algorithm for 20 generations, and print the best individual found. The selBest function is used to select the top individual fitness agent or a “behavior” if you will for that generation based on the iterations or epochs. This individual represents an agent that mimics the philosophy of stoicism in software, with behavior that is self-controlled, rational, focused, resilient, and adaptable.

Best Individual: [0.8150247518866958, 0.9678037028949047, 0.8844195735244268, 0.3970642186025506, 1.2091810770505023]

This denotes the best individual with those best balanced attributes, or in this case the Most Stoic.

As i noted, this is a first attempt at this problem; i think there is a better way with a full GP solution as well as a tunable fitness function. In a larger distributed system you would then use this agent as a framework amongst other agents you would define.
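For instance, one way to make the fitness tunable — a sketch of my own, with made-up weights that are in no way canonical Stoic values — is to weight each trait and clamp the attributes back into [0, 1], so Gaussian mutation cannot push a trait above 1 the way the adaptability score drifted above:

# hypothetical weights for self-control, rationality, focus, resilience, adaptability
TRAIT_WEIGHTS = [0.3, 0.3, 0.1, 0.2, 0.1]

def weighted_fitness(individual):
    # clamp each trait into [0, 1] so mutation cannot push it out of range
    individual[:] = [min(max(trait, 0.0), 1.0) for trait in individual]
    return sum(w * t for w, t in zip(TRAIT_WEIGHTS, individual)),

# assumes the toolbox defined in the listing above; swap in the new evaluator
toolbox.register("evaluate", weighted_fitness)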

i at least got this out of my head.

until then,

#iwishyouwater <- Alexey Molchanov and Dan Bilzerian at Deep Dive Dubai

Muzak To Blog By: Phil Lynott "The Philip Lynott Album". If you don't know who this is, there is a statue of him in Ireland that i walked a long way with my co-founder, Lisa Maki, a long time ago to pay homage to the great Irish singer of the amazing band Thin Lizzy. Alas, they took Phil to be cleaned that day. At least we got to walk and talk and i'll never forget that day. This is one of his solo efforts and i believe he is one of the best artists of all time. The first track is deeply emotional.

References:

[1] A list of books on Stoicism -> click HERE.

[2] Genetic Programming (On the Programming of Computers by Means of Natural Selection), by Professor John R. Koza. There are multiple volumes, i think four, and i have all of them, but this is a great place to start along with the DEAP documentation. Just optimizing a transcendental function and seeing what GP comes out with using basic arithmetic is mind blowing.

Snake_Byte:[13] The Describe Function.

DALLE-2 Draws Describe

First i trust everyone is safe. Second i hope people are recovering somewhat from the SVB situation. We are at the end of an era, cycle or epoch; take your pick. Third i felt like picking a Python function that was simple in nature but very helpful.

The function is pandas.DataFrame.describe(). i've previously written about other introspection libraries like DABL; however, this one is rather simple and in place. Actually i had never utilized it before. i was working on some other code as a hobby in the areas of transfer learning and was playing around with some data and decided to use the breast cancer data from the sklearn library, which is much like the iris data used for canonical modeling and comparison. Most machine learning is data cleansing and feature selection, so let's start with something we know.

Breast cancer is the second most common cancer in women worldwide, with an estimated 2.3 million new cases in 2020. Early detection is key to improving survival rates, and machine learning algorithms can aid in diagnosing and treating breast cancer. In this blog, we will explore how to load and analyze the breast cancer dataset using the scikit-learn library in Python.

The breast cancer dataset is included in scikit-learn's datasets module, which contains a variety of well-known datasets for machine learning. The features describe the characteristics of the cell nuclei present in the image. We can load the dataset using the load_breast_cancer function, which returns a dictionary-like object containing the data and metadata about the dataset.

It has been surmised that machine learning is mostly data exploration and data cleaning.

from sklearn.datasets import load_breast_cancer
import pandas as pd

#Load the breast cancer dataset
data = load_breast_cancer()

The data object returned by load_breast_cancer contains the feature data and the target variable. The feature data contains measurements of 30 different features, such as radius, texture, and symmetry, extracted from digitized images of fine needle aspirate (FNA) of breast mass. The target variable is binary, with a value of 0 indicating a malignant tumor and a value of 1 indicating a benign tumor.

We can convert the feature data and target variable into a pandas dataframe using the DataFrame constructor from the pandas library. We also add a column to the dataframe containing the target variable.

#Convert the data to a pandas dataframe
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = pd.Series(data.target)

Finally, we can use the describe method of the pandas dataframe to get a summary of the dataset. The describe method returns a table containing the count, mean, standard deviation, minimum, quartile, and maximum values for each feature, as well as for the target variable.

#Use the describe() method to get a summary of the dataset
print(df.describe())

The output of the describe method is as follows:

mean radius  mean texture  ...  worst symmetry      target
count   569.000000    569.000000  ...      569.000000  569.000000
mean     14.127292     19.289649  ...        0.290076    0.627417
std       3.524049      4.301036  ...        0.061867    0.483918
min       6.981000      9.710000  ...        0.156500    0.000000
25%      11.700000     16.170000  ...        0.250400    0.000000
50%      13.370000     18.840000  ...        0.282200    1.000000
75%      15.780000     21.800000  ...        0.317900    1.000000
max      28.110000     39.280000  ...        0.663800    1.000000

[8 rows x 31 columns]

From the summary statistics, we can see that the ranges of the features vary widely, with the mean radius ranging from 6.981 to 28.11 and the mean texture ranging from 9.71 to 39.28. We can also see that the target variable is roughly balanced, with about 62.7% of the tumors in the dataset being benign (target = 1).
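If you want to confirm that split directly rather than reading it off the mean of the target column, a one-liner on the dataframe above does it (a small addition of mine, plain pandas):

# class proportions: 1 = benign, 0 = malignant
print(df['target'].value_counts(normalize=True))  # roughly 0.63 benign vs 0.37 malignant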

Pretty nice utility.

Then again in looking at this data one would think we could get to first principles engineering and root causes and make it go away? This directly affects motherhood which i still believe is the hardest job in humanity. Makes you wonder where all the money goes?

Until then,

#iwishyouwater <- Free Diver Steph who is also a mom hunting pelagics on #onebreath

Muzak To Blog By: Peter Gabriel's "Peter Gabriel 3: Melt" (remastered). He is coming out with a new album. Games Without Frontiers and Intruder are timeless. i applied long ago to work at Real World Studios and received the nicest rejection letter.

Snake_Byte[12]: Dabl A High-Level Data Analysis Library in Python

Not To Be Confused With The Game

It enables us to dabble in vicarious vice and to sit in smug judgment on the result.

Online Quote Generator

First, i hope everyone is safe. Second, i haven't written a Snake_Byte [ ] in quite some time, so here goes. This is a library i ran across late last night, and for what it achieves, even just for data exploration, it is well worth the pip install dabl cost of it all.

Data analysis is an essential task in the field of machine learning and artificial intelligence. However, it can be a challenging and time-consuming task, especially for those who are not familiar with programming. That’s where the dabl library comes into play.

dabl, short for Data Analysis Baseline Library, is a high-level data analysis library in python, designed to make data analysis as easy and effortless as possible. It is an open-source library, developed and maintained by the scikit-learn community.

The library provides a collection of simple and intuitive functions for exploring, cleaning, transforming, and visualizing data. With dabl, users can perform data analysis tasks such as classification, regression, and general data exploration with just a few lines of code.

One of the main benefits of dabl is that it helps users get started quickly by providing a set of default actions for each task. For example, to perform a regression analysis, users can simply hand their data to the SimpleRegressor and dabl will take care of the rest.

Another advantage of dabl is that it provides easy-to-understand visualizations of the results, allowing users to quickly understand the results of their analysis and make informed decisions based on the data. This is particularly useful for non-technical users who may not be familiar with complex mathematical models or graphs.

dabl also integrates well with other popular data analysis libraries such as pandas, numpy, and matplotlib, making it a convenient tool for those already familiar with these libraries.

So let us jump into the code shall we?

This code uses the dabl library to analyze the Titanic dataset. The dataset is loaded using the pandas library and cleaned with dabl.clean, dabl.detect_types reports the column types it inferred, and dabl.plot visualizes the features against the survived target. Finally, a dabl.SimpleClassifier is fit to the cleaned data, and dabl reports the performance of the candidate models it tries.

import dabl
import pandas as pd
import matplotlib.pyplot as plt

# Load the Titanic dataset from the disk
titanic = pd.read_csv(dabl.datasets.data_path("titanic.csv"))
#check shape columns etc
titanic.shape
titanic.head
#all that is good tons of stuff going on here but now let us ask dabl whats up:
titanic_clean = dabl.clean(titanic, verbose=1)

#a cool call to detect types
types = dabl.detect_types(titanic_clean)
print (types)
#lets do some eye candy
dabl.plot(titanic, 'survived')
#lets check the distribution
plt.show()
#let us try a simple classifier if it works it works
# Perform classification analysis
fc = dabl.SimpleClassifier(random_state=0)
X = titanic_clean.drop("survived", axis=1)
y = titanic_clean.survived
fc.fit(X, y)                     

Ok so lets break this down a little.

We load the data set: (make sure the target directory is the same)

# Load the Titanic dataset from the disk
titanic = pd.read_csv(dabl.datasets.data_path("titanic.csv"))

Of note, we loaded this into a pandas dataframe. Assuming we can use python and load a comma-separated values file, let's now do some exploration:

#check shape columns etc
titanic.shape
titanic.head

You should see the following:

(1309, 14) 

Which is [1309 rows x 14 columns]

and then:

pclass  survived                                             name  \
0          1         1                    Allen, Miss. Elisabeth Walton   
1          1         1                   Allison, Master. Hudson Trevor   
2          1         0                     Allison, Miss. Helen Loraine   
3          1         0             Allison, Mr. Hudson Joshua Creighton   
4          1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)   
...      ...       ...                                              ...   
1304       3         0                             Zabour, Miss. Hileni   
1305       3         0                            Zabour, Miss. Thamine   
1306       3         0                        Zakarian, Mr. Mapriededer   
1307       3         0                              Zakarian, Mr. Ortin   
1308       3         0                               Zimmerman, Mr. Leo   

         sex     age  sibsp  parch  ticket      fare    cabin embarked boat  \
0     female      29      0      0   24160  211.3375       B5        S    2   
1       male  0.9167      1      2  113781    151.55  C22 C26        S   11   
2     female       2      1      2  113781    151.55  C22 C26        S    ?   
3       male      30      1      2  113781    151.55  C22 C26        S    ?   
4     female      25      1      2  113781    151.55  C22 C26        S    ?   
...      ...     ...    ...    ...     ...       ...      ...      ...  ...   
1304  female    14.5      1      0    2665   14.4542        ?        C    ?   
1305  female       ?      1      0    2665   14.4542        ?        C    ?   
1306    male    26.5      0      0    2656     7.225        ?        C    ?   
1307    male      27      0      0    2670     7.225        ?        C    ?   
1308    male      29      0      0  315082     7.875        ?        S    ?   

     body                        home.dest  
0       ?                     St Louis, MO  
1       ?  Montreal, PQ / Chesterville, ON  
2       ?  Montreal, PQ / Chesterville, ON  
3     135  Montreal, PQ / Chesterville, ON  
4       ?  Montreal, PQ / Chesterville, ON  
...   ...                              ...  
1304  328                                ?  
1305    ?                                ?  
1306  304                                ?  
1307    ?                                ?  
1308    ?                                ?  

Wow, tons of stuff going on here and really this is cool data from an awful disaster. Ok, let's let dabl exercise some muscle here and ask it to clean it up a bit:

titanic_clean = dabl.clean(titanic, verbose=1)
types = dabl.detect_types(titanic_clean)
print (types)

i set verbose=1 in this case, and dabl.detect_types() shows the types detected, which i found helpful:

Detected feature types:
continuous      0
dirty_float     3
low_card_int    2
categorical     5
date            0
free_string     4
useless         0
dtype: int64

However, look what dabl did for us:

                      continuous  dirty_float  low_card_int  categorical  \
pclass                     False        False         False         True   
survived                   False        False         False         True   
name                       False        False         False        False   
sex                        False        False         False         True   
sibsp                      False        False          True        False   
parch                      False        False          True        False   
ticket                     False        False         False        False   
cabin                      False        False         False        False   
embarked                   False        False         False         True   
boat                       False        False         False         True   
home.dest                  False        False         False        False   
age_?                      False        False         False         True   
age_dabl_continuous         True        False         False        False   
fare_?                     False        False         False        False   
fare_dabl_continuous        True        False         False        False   
body_?                     False        False         False         True   
body_dabl_continuous        True        False         False        False   

                       date  free_string  useless  
pclass                False        False    False  
survived              False        False    False  
name                  False         True    False  
sex                   False        False    False  
sibsp                 False        False    False  
parch                 False        False    False  
ticket                False         True    False  
cabin                 False         True    False  
embarked              False        False    False  
boat                  False        False    False  
home.dest             False         True    False  
age_?                 False        False    False  
age_dabl_continuous   False        False    False  
fare_?                False        False     True  
fare_dabl_continuous  False        False    False  
body_?                False        False    False  
body_dabl_continuous  False        False    False 
Target looks like classification
Linear Discriminant Analysis training set score: 0.578
 

Ah sweet! So data science, machine learning or data mining is 80% cleaning up the data. Take what you can get and go with it, folks. dabl even informs us that the target looks like a classification problem. As the name suggests, classification means classifying the data on some grounds. It is a type of supervised learning. In classification, the target column should be a categorical column. If the target has only two categories, like the one in the dataset above (survived / did not survive), it's called a binary classification problem. When there are more than 2 categories, it's a multi-class classification problem. The "target" column is also called the "class" in a classification problem.

Now let's do some analysis. Yep, we are just getting to some statistics. There are univariate and bivariate statistics in play here.

Bivariate analysis is the simultaneous analysis of two variables. It explores the concept of the relationship between two variables: whether there exists an association and the strength of this association, or whether there are differences between the two variables and the significance of these differences.

The main three types we will see here are:

  1. Categorical v/s Numerical 
  2. Numerical V/s Numerical
  3. Categorical V/s Categorical data

Also of note Linear Discriminant Analysis or LDA is a dimensionality reduction technique. It is used as a pre-processing step in machine learning. The goal of LDA is to project the features in higher dimensional space onto a lower-dimensional space in order to avoid the curse of dimensionality and also reduce resources and dimensional costs. The original technique was developed in the year 1936 by Ronald A. Fisher and was named Linear Discriminant or Fisher’s Discriminant Analysis. 

(NOTE: there is another LDA, Latent Dirichlet Allocation, which is used in semantic engineering and is quite different.)

dabl.plot(titanic, 'survived')

The following plots happen auto-magically: continuous feature pair plots for the discriminant analysis.

Continuous Feature PairPlots

In the plots you will also see PCA (Principal Component Analysis). PCA was invented in 1901 by Karl Pearson, as an analog of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, and proper orthogonal decomposition (POD) in mechanical engineering. PCA is used extensively in many fields, and my first usage of it was in 1993 for three-dimensional rendering of sound.

Discriminating PCA Directions

What is old is new again.

The main difference is that Linear Discriminant Analysis is a supervised dimensionality reduction technique that also achieves classification of the data simultaneously. LDA focuses on finding a feature subspace that maximizes the separability between the groups. Principal Component Analysis, on the other hand, is an unsupervised dimensionality reduction technique: it ignores the class label and focuses on capturing the direction of maximum variation in the data set.

LDA

Both reduce the dimensionality of the dataset and make it more computationally resourceful. LDA and PCA both form a new set of components.
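To see that difference in code rather than prose, here is a minimal sketch of mine — it uses scikit-learn and its iris data, not the Titanic frame or the dabl output above — projecting the same data with both techniques:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, ignores the class label, keeps directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, uses y to find directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)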

The last plot is categorical versus target.

So now let's try, as dabl suggested, a SimpleClassifier and fit the model to the data. (hey, some machine learning!)

fc = dabl.SimpleClassifier(random_state=0)
X = titanic_clean.drop("survived", axis=1)
y = titanic_clean.survived
fc.fit(X, y) 

This should produce the following outputs with accuracy metrics:

Running DummyClassifier(random_state=0)
accuracy: 0.618 average_precision: 0.382 roc_auc: 0.500 recall_macro: 0.500 f1_macro: 0.382
=== new best DummyClassifier(random_state=0) (using recall_macro):
accuracy: 0.618 average_precision: 0.382 roc_auc: 0.500 recall_macro: 0.500 f1_macro: 0.382

Running GaussianNB()
accuracy: 0.970 average_precision: 0.975 roc_auc: 0.984 recall_macro: 0.964 f1_macro: 0.968
=== new best GaussianNB() (using recall_macro):
accuracy: 0.970 average_precision: 0.975 roc_auc: 0.984 recall_macro: 0.964 f1_macro: 0.968

Running MultinomialNB()
accuracy: 0.964 average_precision: 0.988 roc_auc: 0.990 recall_macro: 0.956 f1_macro: 0.961
Running DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0)
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974
=== new best DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0) (using recall_macro):
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974

Running DecisionTreeClassifier(class_weight='balanced', max_depth=5, random_state=0)
accuracy: 0.969 average_precision: 0.965 roc_auc: 0.983 recall_macro: 0.965 f1_macro: 0.967
Running DecisionTreeClassifier(class_weight='balanced', min_impurity_decrease=0.01,
                       random_state=0)
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974
Running LogisticRegression(C=0.1, class_weight='balanced', max_iter=1000,
                   random_state=0)
accuracy: 0.974 average_precision: 0.991 roc_auc: 0.993 recall_macro: 0.970 f1_macro: 0.972
Running LogisticRegression(C=1, class_weight='balanced', max_iter=1000, random_state=0)
accuracy: 0.975 average_precision: 0.991 roc_auc: 0.994 recall_macro: 0.971 f1_macro: 0.973

Best model:
DecisionTreeClassifier(class_weight='balanced', max_depth=1, random_state=0)
Best Scores:
accuracy: 0.976 average_precision: 0.954 roc_auc: 0.971 recall_macro: 0.971 f1_macro: 0.974

This actually calls the sklearn routines in aggregate. Looks like a depth-1 decision tree wins on recall_macro, with our old friend logistic regression right behind it. Keep it simple, Sam, it ain't gotta be complicated.

In conclusion, dabl is a highly recommended library for those looking to simplify their data analysis tasks. With its intuitive functions and visualizations, it provides a quick and easy way to perform data analysis, making it an ideal tool for both technical and non-technical users. Again, the real strength of dabl is in providing simple interfaces for data exploration. For more information:

dabl github. <- click here

Until Then,

#iwishyouwater <- hold your breath on a dive with my comrade at arms @corepaddleboards. great video and the clarity was astounding.

Muzak To Blog By: “Ballads For Two”, Chet Baker and Wolfgang Lackerschmid, trumpet meet vibraphone sparsity. The space between the note is where all of the action lives.

Snake_Byte[11] Linear Algebra, Matrices and Products – Oh My!

Algebra is the metaphysics of arithmetic.

~ John Ray
Looks Hard.

First, as always, i hope everyone is safe. Second, as i mentioned in my last Snake_Byte [], let us do something a little more technical and scientific. For context, the catalyst for this was a surprising discussion that came from how current machine learning interviews are being conducted and how the basics of the distance between two vectors have been overlooked. So this is a basic example, and in the following Snake_Byte [] i promise to get into something a little more, say, carnivore.

With that let us move to some linear algebra. For those that don’t know what linear algebra is, i will refer you to the best book on the subject, Professor Gilbert Strang’s Linear Algebra and its Applications.

i am biased here; however, i do believe the two most important areas of machine learning and data science are linear algebra and probability, with optimization techniques coming in a close third.

So dear reader, please bear with me here. We will review a little math; maybe for some, this will be new, and for those that already know this, you can rest your glass-balls.

We let x\in\mathbb{R}^N denote an N-dimensional vector taking real numbers as its entries. For example:

\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}

where \{a_i\} are the entries indexed by i. In this case N = 3, i.e., x\in\mathbb{R}^3.

An M-by-N matrix is denoted as X\in\mathbb{R}^{M\times N}. The transpose of a matrix is denoted as X^T. A matrix X can be viewed according to its columns and its rows:

\begin{bmatrix}  0 & 1 & 2 \\ 3 & 4 & 5\\ 6 & 7 & 8 \\ \end{bmatrix}

where \{a_{ij}\} are the entries indexed by row i and column j.

An array is a data structure in python programming that holds a fixed number of elements, and these elements should be of the same data type. The main idea behind an array is storing multiple elements of the same type. Most data structures make use of an array to implement their algorithms. There are two important parts of the array:

  • Element: Each item stored in the array is called an element.
  • Index: Every element in the array has its own numerical value to identify the element.

Think of programming a loop, tuple, list, array, range or matrix:

from math import exp
x, y = 1.0, 2.0        # placeholder values so the containers below are concrete
x1, x2, x3 = 1, 2, 3   # more placeholders
v1 = [x, y]            # list of variables
v2 = (-1, 2)           # tuple of numbers
v3 = (x1, x2, x3)      # tuple of variables

v4 = [exp(-i*0.1) for i in range(150)] #ye ole range loop

and check this out for a matrix:

import numpy as np
a = np.matrix('0 1; 2 3')
print (a)
output: [[0 1]
 [2 3]]

which folks is why we like the Snake Language. Really that is about it for vectors and matrices. The theory is where you get into proofs and derivations which can save you a ton of time on optimizations.

So now let’s double click on some things that will make you sound cool at the parties or meetups.

A vector can be multiplied by a number. This number a is usually denoted as a scalar:

a\cdot (v_1,v_2) = (av_1,av_2)

Now given this, one of the most fundamental operations in all of machine learning is the inner product, also called the dot product or scalar product, of two vectors, which is a number. Almost all machine learning algorithms have some form of a dot product somewhere within the depths of all the mathz. Nvidia GPUs are optimized for (you guessed it) dot products.

So how do we set this up? Multiplication of a scalar a and a vector (v_0,\dots,v_{n-1}) yields:

(av_0,\dots,av_{n-1})

Ok good so far.

The inner or dot product of two n-vectors is defined as:

(u_0,\dots,u_{n-1})\cdot(v_0,\dots,v_{n-1}) = u_0v_0 + \cdots + u_{n-1}v_{n-1}

which, if you are paying attention yields:

\begin{equation*}\mathbf{u}\cdot\mathbf{v} = \sum_{j=0}^{n-1}{u_jv_j}\end{equation*}

Geometrically, the dot product of U and V equals the length of U times the length of V times the cosine of the angle between them:

\textbf{U}\cdot\textbf{V}=|\textbf{U}||\textbf{V}|\cos\theta

ok so big deal huh? yea, but check this out in the Snake_Language:

# dot product of two vectors
 
# Importing numpy module
import numpy as np
 
# Taking two scalar values
a = 5
b = 7
 
# Calculating dot product using dot()
print(np.dot(a, b))
output: 35

hey now!

# Importing numpy module
import numpy as np
 
# Taking two 2D array
# For 2-D arrays it is the matrix product
a = [[2, 1], [0, 3]]
b = [[1, 1], [3, 2]]
 
# Calculating dot product using dot()
print(np.dot(a, b))
output:[[5 4]
       [9 6]]

Mathematically speaking, the inner product is a generalization of the dot product. As we said, constructing a vector is done using the command np.array. Inside this command, one needs to enter the array. For a column vector, we write [[1],[2],[3]], with an outer [] and three inner []'s, one for each entry. If the vector is a row vector, then one can omit the inner []'s by just calling np.array([1, 2, 3]).

Given two column vectors x and y, the inner product is computed via np.dot(x.T,y), where np.dot is the command for inner product, and x.T returns the transpose of x. One can also call np.transpose(x), which is the same as x.T.

# Python code to perform an inner product with transposition
import numpy as np
x = np.array([[1],[0],[-1]])
y = np.array([[3],[2],[0]])
z = np.dot(np.transpose(x),y)
print (z)  # [[3]]


Yes, now dear reader, you can impress your friends with your linear algebra and python prowess.

Note: In this case the raw dot product is scale dependent; for actual purposes of real computation you usually normalize by something called the norm of a vector, and a quick sketch of that is below. i won't go into the deeper mechanics of this unless asked for further explanations on the mechanics of linear algebra. i will gladly go into pythonic examples if so asked and will be happy to write about said subject. Feel free to inquire in the comments below.
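A minimal example of mine with toy numbers — dividing the dot product by the vector norms gives the scale-independent cosine of the angle between them:

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([10.0, 20.0, 30.0])   # same direction, ten times the scale

print(np.dot(u, v))                # 140.0 -- the raw dot product grows with the scale
cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cosine)                      # 1.0 (up to floating point) -- same direction regardless of scale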

Until Then,

#iwishyouwater <- Nathan Florence with Kelly Slater at the Box. Watch.

tctjr.

Muzak to Blog By: INXS. i had forgotten how good of a band they were and how deep the catalog is. Michael Hutchence, the lead singer, hanged himself in a hotel room. Check out the songs "By My Side", "Don't Change", "Never Tear Us Apart" and "To Look At You". They weren't afraid to take production chances.

Note[2]: i resurrected some very old content from a previous site i owned and imported the older blogs. Some hilarious. Some sad. Some infuriating. i'm shining them up. Feel free to look back in time.

Snake_Byte[10] – Module Packages

Complexity control is the central problem of writing software in the real world.

Eric S. Raymond
AI-Generated Software Architecture Diagram

Hello dear readers! First i hope everyone is safe. Secondly, it is the monday-iest WEDNESDAY ever! Ergo it's time for a Snake_Byte!

Grabbing a tome off the bookshelf, we randomly open it, and the subject matter today is Module Packages. So there will not be much if any code, but more discussion as it were on the explanations thereof.

Module imports are the mainstay of the snake language.

A Python module is a file that has a .py extension, and a Python package is any folder that has modules inside it (or, if you're still in Python 2, a folder that contains an __init__.py file).

What happens when you have code in one module that needs to access code in another module or package? You import it!

In python a directory is said to be a package, thus these imports are known as package imports. What happens in an import is that the dotted name is mapped onto a directory path on your local machine (your come-pooter) or that cloud thing everyone talks about these days.

It turns out that this hierarchy simplifies the search path complexities that come with organizing files and trends toward simplifying search path settings.

Absolute imports are preferred because they are direct. It is easy to tell exactly where the imported resource is located and what it is just by looking at the statement. Additionally, absolute imports remain valid even if the current location of the import statement changes. In addition, PEP 8 explicitly recommends absolute imports. However, sometimes they get so complicated you want to use relative imports.

So how do imports work?

import dir1.dir2.mod
from dir1.dir2.mod import x

Note the "dotted path" in these statements is assumed to correspond to a path through the directories on the machine you are developing on. In this case it leads to mod.py. So here directory dir1 contains subdirectory dir2, which contains the module mod.py. Historically the dotted path syntax was created for platform neutrality, and from a technical standpoint paths in import statements become object paths.

In general the leftmost name in the dotted path is located on the module search path (unless it is a top-level file in the home directory), and the rest of the path leads to exactly where the file resides.

In Python 3.x package imports changed slightly, and the changes only apply to imports within files located in package directories. The changes include:

  • The module import search path semantics were modified to skip the package's own directory by default; these plain imports are essentially absolute imports.
  • The syntax of from statements was extended to allow them to explicitly request that imports search the package's directory only. This is the relative import mentioned above.

so for instance:

from . import spam  # relative to this package

Instructs Python to import a module named spam located in the same package directory as the file in which this statement appears.

Similarly:

from .spam import name

states: from a module named spam located in the same package as the file that contains this statement, import the variable name.

Something to remember is that an import without a leading dot always causes Python to skip the relative components of the module import search path and look instead in the absolute directories that sys.path contains. You can only force the dot nomenclature, i.e. relative imports, with the from statement.
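To make that concrete, here is a hypothetical package layout (the names pkg, spam, and eggs are mine, purely illustrative) and what the relative and absolute forms look like from inside it:

# hypothetical layout:
#
#   pkg/
#       __init__.py
#       spam.py      # defines a variable called name
#       eggs.py      # the file shown below
#
# inside pkg/eggs.py:

from . import spam         # relative: searches only this package's directory
from .spam import name     # relative: pulls one attribute from a sibling module
import pkg.spam            # absolute: searched via sys.path, PEP 8's preference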

Packages are standard now in Python 3.x. It is now very common to see very large third-party extensions deployed as sets of package directories rather than flat lists of modules. Also, caveat emptor: importing only what you need rather than AllTheThings can save memory. Read the documentation. Many times importing AllTheThings results in major memory usage, an issue when you are going to production with highly optimized python.

There is much more to this import stuff. Transitive Module Reloads, Managing other programs with Modules (meta-programming), Data Hiding etc. i urge you to go into the LazyWebTM and poke around.

in addition a very timely post:

PyPI is running a survey on packages:

Take the survey here -> PyPI Survey on Packages

Here some great comments and suggestions via Y-Combinator News:

Y-Combinator News Commentary on PyPI Packages.

That is all for now. i think next time we are going to delve into some more scientific or mathematical snake language bytes.

Until Then,

#iwishyouwater <- Wedge top 50 wipeouts. Smoookifications!

@tctjr

MUZAK TO BLOG BY: NIN – "The Downward Spiral (Deluxe Edition)". A truly phenomenal piece of work. NIN's second album; Trent Reznor told Jimmy Iovine upon delivering the concept album, "I'm Sorry I had to…". In 1992, Reznor moved to 10050 Cielo Drive in Benedict Canyon, Los Angeles, where actress Sharon Tate formerly lived and where he made the record. i believe it changed the entire concept of music and created a new genre. From an engineering point of view, Digidesign's TurboSynth and Pro Tools were used extensively.

Snake_Byte[9] XKCD PLOTS

An algorithm must be seen to be believed.

~ D. Knuth

First i trust everyone is safe. Second it's WEDNESDAY so we got us a Snake_Byte! Today i wanted to keep this simple, fun and return to a set of fun methods that are included in the de facto standard for plotting in python, which is Matplotlib. The method(s) are called XKCD style plotting via plt.xkcd().

If you don't know what this is referencing, it is xkcd, sometimes styled XKCD, which is a webcomic created in 2005 by American author Randall Munroe. The comic's tagline describes it as "a webcomic of romance, sarcasm, math, and language". Munroe states on the comic's website that the name of the comic is not an initialism but "just a word with no phonetic pronunciation". i personally have read it since its inception in 2005. The creativity is astounding.

Which brings us to the current Snake_Byte. If you want to have some fun and creativity in your marketechure[1] and spend fewer hours on those power points bust out some plt.xkcd() style plots!

First thing is you need to install matplotlib:

pip install matplotlib

in this simple example we need numpy:

pip install numpy
import numpy as np
import matplotlib.pyplot as plt

plt.xkcd()
plt.plot(np.sin(np.linspace(0, 10)))
plt.plot(np.sin(np.linspace(10, 20)))
plt.title('Sorta Lissajous')
Sorta Lissajous

So really that is all there with all the bells and whistles that matplotlib has to offer.

The following script was based on Randall Munroe’s Stove Ownership.

(Some will get the inside industry joke.)

with plt.xkcd():
    # Based on "Stove Ownership" from XKCD by Randall Munroe
    # https://xkcd.com/418/

    fig = plt.figure()
    ax = fig.add_axes((0.1, 0.2, 0.8, 0.7))
    ax.spines.right.set_color('none')
    ax.spines.top.set_color('none')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_ylim([-30, 10])

    data = np.ones(100)
    data[70:] -= np.arange(30)

    ax.annotate(
        'THE DAY I TRIED TO CREATE \nAN INTEROPERABLE SOLUTION\nIN HEALTH IT',
        xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

    ax.plot(data)

    ax.set_xlabel('time')
    ax.set_ylabel('MY OVERALL MENTAL SANITY')
    fig.text(
        0.5, 0.05,
        '"Stove Ownership" from xkcd by Randall Munroe',
        ha='center')
Interoperability In Health IT

So dear readers there it is: an oldie but goodie and it is so flexible! Add it to your slideware or marketechure or just add it because it's cool.

Until Then,

#iwishyouwater <- Mentawis surfing paradise. At least someone is living.

Muzak To Blog By: NULL

[1] Marchitecture is a portmanteau of the words marketing and architecture. The term is applied to any form of electronic architecture perceived to have been produced purely for marketing reasons and has in many companies replaced actual software creation.

Snake_Bytes[8] Intro_To_Mito

Got a Tape Backup Bob?

Software Is The Language Of Automation

Jensen Huang

First, i trust everyone is safe. Second: Hey Now! Wednesday is already here again! What was it Willy Wonka said? "So Much Time And So Little To Do?" Or better yet "Time Is Fun When You Are Having Flies!" Snake_Byte[8] Time!

This is a serendipitous one because i stumbled onto a library that uses a library that i mentioned in my last Snake_Bytes which was pandas. It’s called MitoSheets and it auto-generates code for your data wrangling needs and also allows you to configure and graph within your Jupyter_Lab_Notebooks. i was skeptical.

So we will start at the beginning which is where most things start:

i am making the assumption you are either using a venv or conda etc. i use a venv so here are the installation steps:

pip install mitoinstaller
python -m mitoinstaller install

Note the two-step process; you need both commands to install the entire library.

Next crank up ye ole Jupyter Lab:

import mitosheet
mitosheet.sheet()

It throws up a wonky splash screen to grab your digits and email to push you information on the Pro_version i imagine.

Then you can select a file. i went with the nba.csv file from the last blog, Snake_Bytes[7] Pandas Not The Animal. Find it here:

Then lo and behold it spit out the following code:

from mitosheet import *; register_analysis("id-ydobpddcec");
    
# Imported nba.csv
import pandas as pd
nba = pd.read_csv(r'nba.csv')

register_analysis("id-ydobpddcec") is locked to the respective file.

So how easy is it to graph? Well, it was trivial. Select graph then X & Y axis:

Team Members vs Team Graph
Graph Configuration

So naturally i wanted to change the graph to purple and add some grid lines with a legend to test the export, and here was the result:

They gotcha!

As Henry Ford said, you can have any color car as long as it is black. In this case you are stuck with the above graph, which, while useful, is not going to catch anyone's eye.

Then i tried to create a pivot table and it spit out the following code:

from mitosheet import *; register_analysis("id-ydobpddcec");
    
# Imported nba.csv
import pandas as pd
nba = pd.read_csv(r'nba.csv')

# Pivoted into nba
tmp_df = nba[['Team', 'Position', 'Number']]
pivot_table = tmp_df.pivot_table(
    index=['Team'],
    columns=['Number'],
    values=['Position'],
    aggfunc={'Position': ['count']}
)
pivot_table.set_axis([flatten_column_header(col) for col in pivot_table.keys()], axis=1, inplace=True)
nba_pivot = pivot_table.reset_index()

Note the judicious use of our friend the pandas library.

Changing the datatype is easy:

from Salary to datetime, ascending
from mitosheet import *; register_analysis("id-ydobpddcec");
    
# Imported nba.csv
import pandas as pd
nba = pd.read_csv(r'nba.csv')

# Changed Salary to dtype datetime
import pandas as pd
nba['Salary'] = pd.to_datetime(nba['Salary'], unit='s', errors='coerce')

It also lets you clear the current analysis:

Modal Dialog

So i started experimenting with the filtering:

Player Weight < 180.0 lbs
from mitosheet import *; register_analysis("id-ydobpddcec");
    
# Imported nba.csv
import pandas as pd
nba = pd.read_csv(r'nba.csv')

# Filtered Weight
nba = nba[nba['Weight'] < 180]

The views for modification are on the right side of the layout of the table which is very convenient. The automatic statistics and visualizations are helpful as well:

Unique Ascending Values
Weight Frequencies < 180.0 lbs

The max, min, median, and std are very useful and thoughtful:

Rule Based Summary Statistics

The following in and of itself could be enough to pip install the library:

DataFrame Gymnastics

You can even have multiple dataframes that can be merged. Between those items and the summary stats, for those that are experienced this could be a low enough price of entry to pip install the library. For those that really don't know how to code, this allows you to copypasta code and learn some pretty basic yet very powerful immediate insights into data. Also, if you are a business analyst, a developer could get you going in no time with this library.

i don't particularly like the lockouts on the paywall for features. In today's age of open source, humans will get around that issue and just use something else, especially the experienced folks. However, what caught my attention was the formatting and immediate results with a code base that is useful elsewhere, so i think the Mito developer team is headed in the right direction. i really can see this library evolving and adding sklearn and, who knows, GitHub Copilot. Good on them.

Give it a test drive.

Until Then,

#iwishyouwater <- #OuterKnown Tahiti Pro 2022 – Best Waves

@tctjr

Muzak To Blog By: Tracks from "Joe's Garage" by Frank Zappa. "A Little Green Rosetta" is hilarious as well as a testament to Zappa's ability to put together truly astounding musicians. i love the Central Scrutinizer, and "Watermelon in Easter Hay" i believe is one of the best guitar pieces of all time. Even Zappa said it was one of his best pieces, and to this day Dweezil Zappa is the only person allowed to play it. One of my readers, when i reviewed the Zappa documentary, called the piece "intoxicating". Another exciting aspect of this album is that he used live guitar solos and dubbed them into the studio work, except for "Watermelon in Easter Hay". The other Muzak was by a band that put Atlanta on the map: Outkast. Speakerboxxx is phenomenal and Andre 3000 is an amazing musician. "Prototype" and "Pink & Blue". Wew.

Snake_Byte[7]: Pandas (Not The Animal)

Groupings Of Pandas In A Frame

DISCLAIMER: This blog was written some time ago. Software breaks once in a while and there was a ghost in my LazyWebTM machine. We are back to our regularly scheduled program. Read on Dear Reader, and your humble narrator apologizes.

The other day i was talking to someone about file manipulations and whatnot and mentioned how awesome Pandas and the magic of df.DoWhatEverYaWant(my_data_object) via a dataframe was, and they weren’t really familiar with Pandas. So, being that no one knows everything, i figured i would write a Snake_Byte[] about Pandas. i believe i met the author of pandas, Wes McKinney, at a PyData conference years ago at Facebook. Really nice human who has created one of the most used libraries for data wrangling.

One of the most nagging issues with machine learning, in general, is access to high-integrity canonical training sets, or even just high-integrity data sets writ large.

By my estimate over the years having performed various types of learning systems and algorithm development, machine learning is 80% data preparation, 10% data piping, 5% training, and 5% banging your head against the keyboard. Caveat Emptor – variable rates apply, depending on the industry vertical.

It is well known that there are basically three main attributes to the integrity of the data: complete, atomic, and well-annotated.

Complete data sets mean analytical results for all required influent and effluent constituents as specified in the effluent standard for a specific site on a specific date.

Atomic data sets are data elements that represent the lowest level of detail. For example, in a daily sales report, the individual items that are sold are atomic data, whereas roll-ups such as invoices and summary totals from invoices are aggregate data.

Well-annotated data sets are the categorization and labeling of data for ML applications. Training data must be properly categorized and annotated for a specific use case. With high-quality, human-powered data annotation, companies can build and improve ML implementations. This is where we get into issues such as Gold Standard Sets and Provenance of Data.

Installing Pandas:

Note: Before you install Pandas, you must bear in mind that it supports only Python versions 3.7, 3.8, and 3.9.

i am also assuming you are using some type of virtual environment.

As per the usual installation packages for most Python libraries:

pip install pandas

You can also choose to use a package manager in which case it’s probably already included.
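For example, if conda happens to be your package manager of choice, the explicit install is just as short:

# conda equivalent of the pip install above
conda install pandas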

#import pandas pd is the industry shorthand
import pandas as pd
#check the version
pd.__version__
[2]: '1.4.3'

Ok we have it set up correctly.

So what is pandas?

Glad you asked. i have always thought of pandas as enhancing numpy, as pandas is built on numpy. numpy is the fundamental library for scientific computing in Python. It provides high-performance multidimensional arrays and tools to deal with them. A numpy array is a grid of values (of the same type) indexed by a tuple of non-negative integers; numpy arrays are fast, easy to understand, and let users perform calculations across entire arrays. pandas, on the other hand, provides high-performance, fast, easy-to-use data structures and data analysis tools for manipulating numeric data and, most importantly, time series manipulation.
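To make that concrete, here is a minimal sketch contrasting a bare numpy array with a pandas Series carrying a labeled index (the labels are purely illustrative):

import numpy as np
import pandas as pd

# A plain numpy array: values addressed by position only
arr = np.array([0.1, 0.2, 0.3])

# The same values wrapped in a pandas Series with an explicit, labeled index
s = pd.Series(arr, index=['a', 'b', 'c'])

print(arr[0])    # access by position
print(s['b'])    # access by label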

So let’s start with the pandas Series object, which is a one-dimensional array of indexed data and can be created from a list or an array:

data = pd.Series([0.1,0.2,0.3,0.4, 0.5])
data
[5]: 0    0.1
     1    0.2
     2    0.3
     3    0.4
     4    0.5
     dtype: float64

The cool thing about this output is that Series creates and wraps both a sequence and the related indices; ergo we can access both the values and index attributes. To double check this we can access values:

[6]: data.values
[6]: array([0.1, 0.2, 0.3, 0.4, 0.5])

and the index:

[7]: data.index
[7]: RangeIndex(start=0, stop=5, step=1)

You can access the associated values via the [ ] square brackets just like numpy; however, pandas.Series is much more flexible than the numpy counterpart that it emulates. They say imitation is the highest form of flattery.
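For instance, sticking with the data Series we created above, both label-style access and numpy-style slicing work; a quick sketch:

# Access a single value by its index label
data[2]      # 0.3

# Slice a range of values, numpy style
data[1:4]    # the values at positions 1, 2 and 3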

Before we go grab some data from the LazyWebTM, let’s dig a little deeper into the Series object:

If one really thinks about the aspects of pandas.Series, it is really a specialized version of a python dictionary. For those unfamiliar, a dictionary (dict) is a python structure that maps arbitrary keys to a set of arbitrary values. Super powerful for data manipulation and data wrangling. Taking this a step further, pandas.Series is a structure that maps typed keys to a set of typed values. The typing is very important: the type-specific compiled code within numpy arrays makes them much more efficient than a python list, and in the same vein pandas.Series is much more efficient than a python dictionary. pandas.Series has an insane amount of commands:
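Here is a minimal sketch of that dictionary analogy, building a Series straight from a dict (the cities and numbers are placeholder values, not real data):

import pandas as pd

# A plain python dict of arbitrary keys mapped to values
populations = {'Atlanta': 500_000, 'Austin': 960_000, 'Boston': 650_000}

# Wrap it in a Series: the keys become the index, the values become the data
pop_series = pd.Series(populations)

pop_series['Austin']   # dictionary-style access by key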

Find Series Reference Here.

Next, we move to what i consider the most powerful aspect of pandas the DataFrame. A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. DataFrames are one of the most common data structures used in modern data analytics because they are a flexible and intuitive way of storing and working with data.

# Python code demonstrating how to create a
# DataFrame from a dict of lists.
# By default rows are addressed by an integer index.

import pandas as pd

# initialise the data as a dict of lists.
data = {'Name':['Bob', 'Carol', 'Alice', ''],
        'Age':[18, 20, 22, 24]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)
 [8]:
    Name  Age
0    Bob   18
1  Carol   20
2  Alice   22
3          24       

Let’s grab some data. nba.csv is a flat file of NBA statistics for players:

Get the NBA data file here.

i don’t watch or follow sports so i don’t know what is in this file. Just did a google search for csv statistics and this file came up.

# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
 
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
 
 
print(first, "\n\n\n", second)
[9]:
Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object 


Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object

How nice is this? Easy Peasy. It seems almost too easy.
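And since this Snake_Byte is titled Groupings Of Pandas In A Frame, here is a minimal groupby sketch on the same data frame (using the Team and Salary columns shown above):

# Average salary per team, NaNs ignored automatically
team_salary = data.groupby('Team')['Salary'].mean()
print(team_salary.head())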

For reference here is the pandas.DataFrame reference documentation.

Just to show how far-reaching pandas is now in the data science world: for all of you who think you may need to use Spark, there is a package called PySpark. In PySpark, a DataFrame is equivalent to a relational table in Spark SQL and can be created using various functions. Once created, it can be manipulated using the various domain-specific-language (DSL) functions, much like your beloved SQL.
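A minimal sketch of what that looks like in PySpark, assuming a local SparkSession and the same nba.csv, just to show how close the DSL feels to pandas:

from pyspark.sql import SparkSession

# Spin up a local Spark session
spark = SparkSession.builder.appName("nba").getOrCreate()

# Read the same csv into a Spark DataFrame and filter, SQL-style
sdf = spark.read.csv("nba.csv", header=True, inferSchema=True)
sdf.filter(sdf.Weight < 180).select("Name", "Team", "Weight").show(5)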

Which might be another Snake_Byte in the future.

i also found pandas being used in ye ole #HealthIT #FHIR world for, as we started this off with, csv manipulation. Think of this Snake_Byte as an Ouroboros.

This GitHub repo converts csv2fhir (can haz interoperability?):

with pd.read_csv(file_path, **csv_reader_params) as buffer:
    for chunk in buffer:

        chunk: DataFrame = execute(chunk_tasks, chunk)

        # increment the source row number for the next chunk/buffer processed
        # add_row_num is the first task in the list
        starting_row_num = chunk["rowNum"].max() + 1
        chunk_tasks[0] = Task(name="add_row_num", params={"starting_index": starting_row_num})

        chunk: Series = chunk.apply(_convert_row_to_fhir, axis=1)

        for processing_exception, group_by_key, fhir_resources in chunk:
            yield processing_exception, group_by_key, fhir_resources

So this brings us to the end of this Snake_Byte. Hope this gave you a little taste of a great python library that is used throughout the industry.

Muzak To Blog By:

Mike Patton & The Metropole Orchestra – Mondo Cane – June 12th 2008 (Full Show) <- A true genius at work!

One other mention on the Muzak To Blog By must go to the fantastic Greek Composer, Evángelos Odysséas Papathanassíou (aka Vangelis) who recently passed away. We must not let the music be lost like tears in the rain, Vangelis’ music will live forever. Rest In Power, Maestro Vangelis. i have spent many countless hours listening to your muzak and now the sheep are truly dreaming. Listen here -> Memories Of Green.

Snake_Byte[6] Algorithm Complexity

The Lighter Side of Complexity - The Complexity Project
Your software design?

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction.

E.F. Schumacher

First, i hope everyone is safe.

Second, i had meant this for reading over Thanksgiving but transparently i was having technical difficulties with LaTeX rendering, and it appears that both MathJax and native LaTeX are not working on my site. For those interested, i even injected the MathJax code into my .php header. Hence i had to rewrite a bunch of stuff, alas with no equations. Although, for some reason unbeknownst to me, my table worked.

Third, hey, it’s time for a Snake_Byte[]!

In this installment, i will be discussing Algorithm Complexity and will be using a Python method that i previously wrote about in Snake_Byte[5]: Range.

So what is algorithm complexity? Well, you may remember “Big Oh” notation from your mathematics or computer science classes. For those that don’t know, this involves both space and time complexity, not to be confused with Space-Time Continuums.

Let’s hit the LazyWeb and particularly Wikipedia:

“Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others collectively called Bachmann–Landau notation or asymptotic notation.”

— Wikipedia’s definition of Big O notation

Hmmm.   Let’s try to parse that a little better shall we?

So you want to figure out how slow or, hopefully, how fast your code is, using fancy algebraic terms and terminology. So you want to measure the algorithmic behavior as a function of two variables: time complexity and space complexity. Time is both the throughput as well as how fast the algorithm operates from t(0) to t(n-1). Then we have space complexity, which is literally how much memory (either in memory or persistent memory) the algorithm requires as a function of the input. As an added bonus you can throw around the word asymptotic:

From Dictionary.com

/ (ˌæsɪmˈtɒtɪk) / adjective. of or referring to an asymptote. (of a function, series, formula, etc) approaching a given value or condition, as a variable or an expression containing a variable approaches a limit, usually infinity.

Ergo asymptotic analysis means how the algorithm responds “to” or “with” values that approach ∞.

So “Hey what’s the asymptotic response of the algorithm?”

Hence we need a language that will allow us to say that the computing time, as a function of n, grows ‘on the order of n³,’ or ‘at most as fast as n³,’ or ‘at least as fast as n log n,’ etc.

There are five symbols used in the language of comparing the rates of growth of functions: ‘o’ (read ‘is little oh of’), O (read ‘is big oh of’), ‘θ’ (read ‘is theta of’), ‘∼’ (read ‘is asymptotically equal to’ or, irreverently, as ‘twiddles’), and Ω (read ‘is omega of’). It is interesting to note there are discrepancies amongst the ranks of computer science and mathematics as to the accuracy and validity of each. We will just keep it simple and say Big-Oh.

So let f(x) and g(x) be two functions of x, where each of the five symbols above is intended to compare the rapidity of growth of f and g. If we say that f(x) = o(g(x)), then informally we are saying that f grows more slowly than g does when x is very large.

Let’s address the time complexity piece. i don’t want to get philosophical on What is Time?, so for now and for this blog i will bound it just like an arrow: t(0) – t(n-1).

That said, the analysis of the algorithm is for an order of magnitude, not the actual running time. There is a Python module called time that we can use to do an exact analysis of the running time. Remember, this is to save you time upfront to gain an understanding of the time complexity before and while you are designing said algorithm.
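For example, a minimal sketch of that exact-measurement approach using the standard library’s time module (perf_counter is the usual choice for wall-clock timing):

import time

n = 1_000_000

start = time.perf_counter()
total = sum(range(n))      # the operation we want to measure
elapsed = time.perf_counter() - start

print(f"summing {n} integers took {elapsed:.4f} seconds")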

Most arithmetic operations are constant time; multiplication usually takes longer than addition and subtraction, and division takes even longer, but these run times don’t depend on the magnitude of the operands. Very large integers are an exception; in that case, the run time increases with the number of digits.

So indexing operations, whether reading or writing elements in a sequence or dictionary, are also constant time, regardless of the size of the data structure.

A for loop that traverses a sequence or dictionary is usually linear, as long as all of the operations in the body of the loop are constant time.

The built-in function sum is also linear because it does the same thing, but it tends to be faster because it is a more efficient implementation; in the language of algorithmic analysis, it has a smaller leading coefficient.

If you use the same loop to “add” a list of strings, the run time is quadratic because string concatenation is linear.

The string method join is usually faster because it is linear in the total length of the strings.
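A quick sketch of that difference, concatenating strings in a loop versus joining them once:

words = ["snake"] * 10_000

# Quadratic in the worst case: each += may copy the ever-growing string
s = ""
for w in words:
    s += w

# Linear in the total length of the strings
s2 = "".join(words)

assert s == s2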

So let’s look at an example using the aforementioned range built-in function:

This is much like the linear for loop traversal above. The lowest complexity is O(1), a constant number of operations regardless of the input; things get more interesting when we have nested loops:


k = 0
for i in range(n):
    for j in range(m):
        print(i)
        k=k+1

In this case, for nested loops, we multiply the time complexities, thus O(n*m). It works the same way when a loop with time complexity O(n) calls a function with time complexity O(m). When calculating complexity we omit constant factors, regardless of whether something executes 5 or 100 times.
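The same multiplication rule in code: an O(n) loop calling a helper that is itself O(m), a quick sketch:

def helper(m):
    # O(m) work inside the helper
    for j in range(m):
        pass

def outer(n, m):
    # an O(n) loop calling an O(m) helper => O(n*m) overall
    for i in range(n):
        helper(m)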

When you are performing an analysis look for worst-case boundary conditions or examples.

Linear O(n):

def first_zero(t, n):
    # O(n): a single pass over the sequence
    for i in range(n):
        if t[i] == 0:
            return 0
    return 1

Quadratic O(n**2):

def count_ops(n, m):
    # O(n*m), quadratic when m is about the same size as n
    res = 0
    for i in range(n):
        for j in range(m):
            res += 1
    return res

There are other types of time complexity, like exponential time and factorial time. Exponential time is O(2**n) and factorial time is O(n!).
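The classic illustration of exponential time is the naive recursive Fibonacci, which re-computes the same subproblems over and over; a minimal sketch:

def fib(n):
    # Exponential time: the call tree roughly doubles at each level, O(2**n)
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)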

For space complexity, memory has a limit, especially if you have ever chased down a heap allocation or garbage collection bug. Like we said earlier, there is no free lunch: you either trade space for time or time for space. Data-driven architectures respond to the input size of the data, thus the dimensionality of the input space needs to be addressed. If you have a constant number of variables, you have constant space complexity: O(1). If you need to declare an array, for instance with numpy, with n elements, then you have linear space complexity: O(n). Remember, the constant case is independent of the size of the problem, while the linear case grows with it.
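A quick sketch of that distinction: the first function keeps a constant number of variables no matter the input, while the second allocates an array that grows with it:

import numpy as np

def running_total(n):
    # O(1) space: a single accumulator regardless of n
    total = 0
    for i in range(n):
        total += i
    return total

def squares(n):
    # O(n) space: the array grows with the input size
    return np.arange(n) ** 2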

For a great book on Algorithm Design and Analysis i highly recommend:

The Algorithm Design Manual by Steven S. Skiena (click it takes you to amazon)

It goes in-depth into growth rates and dominance relations, etc., as they relate to graph algorithms, search and sorting, as well as cryptographic functions.

There is also Algorithms Unlocked by Cormen and the Algorithms Illuminated series by Roughgarden, which are great and less mathematically rigorous if that is not your forte.

Well, i hope this gave you a taste. i had meant this to be a much longer and more in-depth blog; however, i need to fix this LaTeX issue so i can properly address the matters at hand.

Until then,

#iwishyouwater <- Alexey Molchanov new world freedive record. He is a really awesome human.

Muzak To Blog By: Maddalena (Original Motion Picture Soundtrack) by the Maestro Ennio Morricone – Rest in Power Maestro i have spent many hours listening to your works.