Snake_Byte[13]: The Describe Function.

DALLE-2 Draws Describe

First i trust everyone is safe. Second i hope people are recovering somewhat from the SVB situation. We are at the end of an era, cycle or epoch; take your pick. Third i felt like picking a Python function that is simple in nature but very helpful.

The function is pandas.DataFrame.describe(). i’ve previously written about other introspection libraries like DABL; however, this one is rather simple and built right in. Actually i had never utilized it before. i was working on some other code as a hobby in the area of transfer learning, was playing around with some data, and decided to use the breast cancer data from the sklearn library, which is much like the iris data used for canonical modeling and comparison. Most machine learning is data cleansing and feature selection, so let’s start with something we know.

Breast cancer is the second most common cancer in women worldwide, with an estimated 2.3 million new cases in 2020. Early detection is key to improving survival rates, and machine learning algorithms can aid in diagnosing and treating breast cancer. In this blog, we will explore how to load and analyze the breast cancer dataset using the scikit-learn library in Python.

The breast cancer dataset is included in scikit-learn's datasets module, which contains a variety of well-known datasets for machine learning. The features describe the characteristics of the cell nuclei present in the image. We can load the dataset using the load_breast_cancer function, which returns a dictionary-like object containing the data and metadata about the dataset.

It has been surmised that machine learning is mostly data exploration and data cleaning.

from sklearn.datasets import load_breast_cancer
import pandas as pd

#Load the breast cancer dataset
data = load_breast_cancer()

The data object returned by load_breast_cancer contains the feature data and the target variable. The feature data contains measurements of 30 different features, such as radius, texture, and symmetry, extracted from digitized images of fine needle aspirates (FNA) of breast masses. The target variable is binary, with a value of 0 indicating a malignant tumor and a value of 1 indicating a benign tumor.
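
If you would rather verify that mapping than take my word for it, the loaded object carries the class labels in target_names; a quick check (my own sketch, using the same data object as above):

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data.target_names)   # ['malignant' 'benign'] -> index 0 is malignant, 1 is benign
print(data.target[:10])    # the first ten labels as 0/1 integers
print(data.data.shape)     # (569, 30) -> 569 samples, 30 features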

We can convert the feature data and target variable into a pandas dataframe using the DataFrame constructor from the pandas library. We also add a column to the dataframe containing the target variable.

#Convert the data to a pandas dataframe
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = pd.Series(data.target)

Finally, we can use the describe method of the pandas dataframe to get a summary of the dataset. The describe method returns a table containing the count, mean, standard deviation, minimum, 25th, 50th and 75th percentiles, and maximum values for each numeric column, including the target.

#Use the describe() method to get a summary of the dataset
print(df.describe())

The output of the describe method is as follows:

mean radius  mean texture  ...  worst symmetry      target
count   569.000000    569.000000  ...      569.000000  569.000000
mean     14.127292     19.289649  ...        0.290076    0.627417
std       3.524049      4.301036  ...        0.061867    0.483918
min       6.981000      9.710000  ...        0.156500    0.000000
25%      11.700000     16.170000  ...        0.250400    0.000000
50%      13.370000     18.840000  ...        0.282200    1.000000
75%      15.780000     21.800000  ...        0.317900    1.000000
max      28.110000     39.280000  ...        0.663800    1.000000

[8 rows x 31 columns]

From the summary statistics, we can see that the ranges of the features vary widely, with mean radius running from 6.981 to 28.11 and mean texture from 9.71 to 39.28. We can also see that the target variable is roughly balanced, with 62.7% of the tumors being benign (target = 1).
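
If you want the class balance and per-class statistics explicitly rather than reading them off the target mean, a couple of pandas one-liners (a minimal sketch using the df built above) will do it:

# Class balance as proportions of the 569 samples
print(df['target'].value_counts(normalize=True))

# Summary statistics for one feature, split by class (0 = malignant, 1 = benign)
print(df.groupby('target')['mean radius'].describe())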

Pretty nice utility.

Then again, in looking at this data, one would think we could get to first-principles engineering and root causes and make this disease go away. It directly affects motherhood, which i still believe is the hardest job in humanity. Makes you wonder where all the money goes?

Until then,

#iwishyouwater <- Free Diver Steph who is also a mom hunting pelagics on #onebreath

Muzak To Blog By: Peter Gabriel’s “Peter Gabriel 3: Melt” (remastered). He is coming out with a new album. Games Without Frontiers and Intruder are timeless. i applied long ago to work at Real World Studios and received the nicest rejection letter.

Snake_Byte[11] Linear Algebra, Matrices and Products – Oh My!

Algebra is the metaphysics of arithmetic.

~ John Ray
Looks Hard.

First, as always, i hope everyone is safe. Second, as i mentioned in my last Snake_Byte [], let us do something a little more technical and scientific. For context, the catalyst for this was a surprising discussion that came from how current machine learning interviews are being conducted and how the basics of the distance between two vectors have been overlooked. So this is a basic example, and in the following Snake_Byte [] i promise to get into something a little more, say, carnivore.

With that let us move to some linear algebra. For those that don’t know what linear algebra is, i will refer you to the best book on the subject, Professor Gilbert Strang’s Linear Algebra and its Applications.

i am biased here; however, i do believe the two most important areas of machine learning and data science are linear algebra and probability, with optimization techniques coming in a close third.

So dear reader, please bear with me here. We will review a little math; maybe for some, this will be new, and for those that already know this, you can rest your glass-balls.

We denote by x\in\mathbb{R}^N an N-dimensional vector taking real numbers as its entries. For example:

\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}

where \{a_i\} are the entries indexed by i. In this case N = 3.

An M-by-N matrix is denoted as X\in\mathbb{R}^{M\times N}. The transpose of a matrix is denoted as X^T. A matrix X can be viewed according to its columns and its rows:

\begin{bmatrix}  0 & 1 & 2 \\ 3 & 4 & 5\\ 6 & 7 & 8 \\ \end{bmatrix}

where \{a_{ij}\} are indexed by row i and column j.
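
To make the rows-versus-columns view concrete, here is a small numpy sketch (my own toy example) that slices a matrix by row and by column and takes its transpose:

import numpy as np

X = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

print(X[1, :])   # second row    -> [3 4 5]
print(X[:, 2])   # third column  -> [2 5 8]
print(X.T)       # transpose swaps rows and columns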

An array is a data structure in Python programming that holds a fixed number of elements, and these elements should be of the same data type. The main idea behind using an array is storing multiple elements of the same type. Most data structures make use of an array to implement their algorithms. There are two important parts of an array:

  • Element: Each item stored in the array is called an element.
  • Index: Every element in the array has its own numerical value to identify the element.

Think of programming a loop, tuple, list, array, range or matrix:

from math import exp
x, y = 1.0, 2.0              # placeholder values so the snippets below actually run
x1, x2, x3 = 0.1, 0.2, 0.3
v1 = [x, y]                  # list of variables
v2 = (-1, 2)                 # tuple of numbers
v3 = (x1, x2, x3)            # tuple of variables

v4 = [exp(-i*0.1) for i in range(150)]  # ye ole range loop

and check this out for a matrix:

import numpy as np
a = np.matrix('0 1; 2 3')   # rows are separated by semicolons
print(a)
output: [[0 1]
 [2 3]]

which, folks, is why we like the Snake Language. Really, that is about it for vectors and matrices. The theory is where you get into proofs and derivations, which can save you a ton of time on optimizations.

So now let’s double click on some things that will make you sound cool at the parties or meetups.

A vector can be multiplied by a number. This number a is usually called a scalar:

a\cdot (v_1,v_2) = (av_1,av_2)

Now, given this, one of the most fundamental operations in all of machine learning is the inner product, also called the dot product or scalar product, of two vectors, which is a number. Most machine learning algorithms have some form of a dot product somewhere within the depths of all the mathz. Nvidia GPUs are optimized for (you guessed it) dot products.

So how do we set this up? Multiplication of a scalar a and a vector (v_0,\dots,v_{n-1}) yields:

(av_0,\dots,av_{n-1})

Ok good so far.
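
As a quick sanity check in numpy (my own toy numbers), scalar multiplication really does just scale every entry:

import numpy as np

a = 3
v = np.array([1, 2, 3])
print(a * v)   # [3 6 9] -- every entry is scaled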

The inner or dot product of two n-vectors is defined as:

(u_0,\dots,u_{n-1})\cdot(v_0,\dots,v_{n-1}) = u_0v_0 + \dots + u_{n-1}v_{n-1}

which, if you are paying attention yields:

(1)   \begin{equation*} = \sum_{j=0}^{n-1}{u_jv_j}\end{equation*}

Geometrically, the dot product of U and V equals the length of U times the length of V times the cosine of the angle between them:

\textbf{U}\cdot\textbf{V}=|\textbf{U}||\textbf{V}|\cos\theta

ok so big deal huh? yea, but check this out in the Snake_Language:

# dot product: with two scalars np.dot is just multiplication
 
# Importing numpy module
import numpy as np
 
# Taking two scalar values
a = 5
b = 7
 
# Calculating dot product using dot()
print(np.dot(a, b))
output: 35

hey now!

# Importing numpy module
import numpy as np
 
# Taking two 2D array
# For 2-D arrays it is the matrix product
a = [[2, 1], [0, 3]]
b = [[1, 1], [3, 2]]
 
# Calculating dot product using dot()
print(np.dot(a, b))
output:[[5 4]
       [9 6]]

Mathematically speaking, the inner product is a generalization of the dot product. As we said, constructing a vector is done using the command np.array. Inside this command, one needs to enter the array. For a column vector, we write [[1],[2],[3]], with an outer [] and three inner []’s, one for each entry. If the vector is a row vector, then one can omit the inner []’s by just calling np.array([1, 2, 3]).

Given two column vectors x and y, the inner product is computed via np.dot(x.T,y), where np.dot is the command for inner product, and x.T returns the transpose of x. One can also call np.transpose(x), which is the same as x.T.

# Python code to perform an inner product with transposition
import numpy as np
x = np.array([[1], [0], [-1]])
y = np.array([[3], [2], [0]])
z = np.dot(np.transpose(x), y)
print(z)


Yes, dear reader, you can now impress your friends with your linear algebra and Python prowess.

Note: In this case the raw dot product is not scale independent; for actual purposes of real computation you usually normalize by something called the norm of a vector. i won’t go into the mechanics of this unless asked for further explanations on the mechanics of linear algebra. i will gladly go into pythonic examples if so asked and will be happy to write about said subject. Feel free to inquire in the comments below.
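
For the curious, here is a minimal sketch (my own example, using numpy's np.linalg.norm) of that normalization, which turns the dot product into the cosine of the angle between the two vectors:

import numpy as np

u = np.array([1.0, 0.0, -1.0])
v = np.array([3.0, 2.0, 0.0])

# divide the dot product by the product of the norms
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos_theta)   # a scale-free similarity between -1 and 1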

Until then,

#iwishyouwater <- Nathan Florence with Kelly Slater at the Box. Watch.

tctjr.

Muzak to Blog By: INXS. i had forgotten how good of a band they were and how deep the catalog is. Michael Hutchence, the lead singer, hanged himself in a hotel room. Check out the songs “By My Side”, “Don’t Change”, “Never Tear Us Apart” and “To Look At You”. They weren’t afraid to take production chances.

Note[2]: i resurrected some very old content from a previous site i owned and imported the older blogs. Some hilarious. Some sad. Some infuriating. i’m shining them up. Feel free to look back in time.

Snake_Byte[6] Algorithm Complexity

The Lighter Side of Complexity - The Complexity Project
Your software design?

Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius and a lot of courage to move in the opposite direction.

E.F. Schumacher

First, i hope everyone is safe.

Second, i had meant this for reading over Thanksgiving but transparently i was having technical difficulties with \LaTeX rendering, and it appears that both MathJax and native \LaTeX are not working on my site. For those interested, i even injected the MathJax code into my .php header. Hence i had to rewrite a bunch of stuff, alas with no equations. Although for some reason unbeknownst to me my table worked.

Third, hey, it’s time for a Snake_Byte []!

In this installment, i will be discussing Algorithm Complexity and will be using a Python method that i previously wrote about in Snake_Byte[5]: Range.

So what is algorithm complexity?  Well, you may remember “Big Oh” notation from your mathematics or computer science classes.  For those that don’t know, it involves both space and time complexity, not to be confused with Space-Time Continuums.

Let’s hit the LazyWeb and particularly Wikipedia:

“Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others collectively called Bachmann–Landau notation or asymptotic notation.”

— Wikipedia’s definition of Big O notation

Hmmm.   Let’s try to parse that a little better shall we?

So you want to figure out how slow or hopefully how fast your code is using fancy algebraic terms and terminology.  So you want to measure the algorithmic behavior as a function of two variables: time complexity and space complexity.  Time covers both the throughput and how fast the algorithm operates from t(0) to t(n-1).  Then we have space complexity, which is literally how much memory (either in memory or persistent memory) the algorithm requires as a function of the input.  As an added bonus you can throw around the word asymptotic:

From Dictionary.com

/ (ˌæsɪmˈtɒtɪk) / adjective. of or referring to an asymptote. (of a function, series, formula, etc) approaching a given value or condition, as a variable or an expression containing a variable approaches a limit, usually infinity.

Ergo asymptotic analysis means how the algorithm responds “to” or “with” values that approach ∞.

So “Hey what’s the asymptotic response of the algorithm?”

Hence we need a language that will allow us to say that the computing time, as a function of (n), grows ‘on the order of n^3,’ or ‘at most as fast as n^3,’ or ‘at least as fast as n log n,’ etc.

There are five symbols that are used in the language of comparing the rates of growth of functions: ‘o’ (read ‘is little oh of’), O (read ‘is big oh of’), ‘θ’ (read ‘is theta of’), ‘∼’ (read ‘is asymptotically equal to’ or, irreverently, as ‘twiddles’), and Ω (read ‘is omega of’). It is interesting to note there are discrepancies amongst the ranks of computer science and mathematics as to the accuracy and validity of each. We will just keep it simple and say Big-Oh.

So let f(x) and g(x) be two functions of x, where each of the five symbols above is intended to compare the rapidity of growth of f and g. If we say that f(x) = o(g(x)), then informally we are saying that f grows more slowly than g does when x is very large.

Let’s address the time complexity piece. i don’t want to get philosophical on What is Time?, so for now and for this blog i will bound it just like an arrow: t(0) – t(n-1).

That said, the analysis of the algorithm is for an order of magnitude, not the actual running time. There is a Python module called time that we can use to do an exact analysis of the running time.  Remember, this is to save you time upfront to gain an understanding of the time complexity before and while you are designing said algorithm.
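
If you do want wall-clock numbers to go with the Big-Oh intuition, a minimal sketch using the standard library's time.perf_counter (my own example, not from any particular text) looks like this:

import time

def linear_sum(n):
    # O(n): touches each number once
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
linear_sum(10_000_000)
print("elapsed seconds:", time.perf_counter() - start)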

Most arithmetic operations are constant time; multiplication usually takes longer than addition and subtraction, and division takes even longer, but these run times don’t depend on the magnitude of the operands. Very large integers are an exception; in that case, the run time increases with the number of digits.

So indexing operations, whether reading or writing elements in a sequence or dictionary, are also constant time, regardless of the size of the data structure.

A for loop that traverses a sequence or dictionary is usually linear, as long as all of the operations in the body of the loop are constant time.

The built-in function sum is also linear because it does the same thing, but it tends to be faster because it is a more efficient implementation; in the language of algorithmic analysis, it has a smaller leading coefficient.

If you use the same loop to “add” a list of strings, the run time is quadratic because string concatenation is linear.

The string method join is usually faster because it is linear in the total length of the strings.
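
A quick way to feel the difference described above is to build the same string both ways; this is my own toy benchmark, and the exact numbers will vary by machine and interpreter:

import time

words = ["byte"] * 100_000

start = time.perf_counter()
s1 = ""
for w in words:
    s1 = s1 + w          # repeatedly copies the growing string: quadratic in the worst case
print("concatenation:", time.perf_counter() - start)

start = time.perf_counter()
s2 = "".join(words)      # linear in the total length of the strings
print("join:", time.perf_counter() - start)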

So let’s look at an example using the aforementioned range built-in function.

The lowest complexity is O(1), constant time. When we have a loop:


k = 0
for i in range(n):
    for j in range(m):
        print(i)
        k=k+1

In this case, for nested loops, we multiply the time complexities, thus O(n*m). It works the same if inside a loop with time complexity O(n) we call a function with time complexity O(m): the total is O(n*m). When calculating complexity we omit constant factors, regardless of whether something executes 5 or 100 times.

When you are performing an analysis look for worst-case boundary conditions or examples.

Linear O(n):

def contains_zero(t, n):
    # single pass over the sequence: O(n)
    for i in range(n):
        if t[i] == 0:
            return 0
    return 1

Quadratic O(n**2):

def count_pairs(n):
    # nested loop over the same range: O(n**2)
    res = 0
    for i in range(n):
        for j in range(n):
            res += 1
    return res

There are other types of time complexity, like exponential time and factorial time. Exponential time is O(2**n) and factorial time is O(n!).
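
The classic illustration of exponential time is the naive recursive Fibonacci (my own example); each call spawns two more calls, so the work roughly doubles with every increment of n:

def fib(n):
    # O(2**n): recomputes the same subproblems over and over
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))   # 55, instant
# fib(40) already takes a while; fib(80) is hopeless without memoization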

For space complexity, memory has a limit, especially if you have ever chased down a heap allocation or garbage collection bug. Like we said earlier, there is no free lunch: you either trade space for time or time for space. Data-driven architectures respond to the input size of the data, thus the dimensionality of the input space needs to be addressed. If you have a constant number of variables, that is O(1) space. If you need to declare an array, for instance using numpy with (n) elements, then you have linear space complexity O(n). Remember, the constant case is independent of the size of the problem; the linear case is not.

For a great book on Algorithm Design and Analysis i highly recommend:

The Algorithm Design Manual by Steven S. Skiena (click it takes you to amazon)

It goes in-depth into growth rates and dominance relations etc. as they relate to graph algorithms, search and sorting as well as cryptographic functions.

There is also a trilogy of sorts: Algorithms Unlocked by Cormen and the Algorithms Illuminated series by Roughgarden, which are great and less mathematically rigorous if that is not your forte.

Well, i hope this gave you a taste. i had meant this to be a much longer and more in-depth blog; however, i need to fix this LaTeX issue so i can properly address the matters at hand.

Until then,

#iwishyouwater <- Alexey Molchanov new world freedive record. He is a really awesome human.

Muzak To Blog By: Maddalena (Original Motion Picture Soundtrack) by the Maestro Ennio Morricone – Rest in Power Maestro i have spent many hours listening to your works.

Snake_Byte[4]: Random and PseudoRandom Numbers

Expose yourself to as much randomness as possible.


~ Ben Casnocha
A Visualization Of Randomness

First i trust everyone is safe.

Second it is WEDNESDAY and that must mean a Snake_Byte or you are working in a startup because every day is WEDNESDAY in a startup!

i almost didn’t get this one done because well life happens but i want to remain true to the goals herewith to the best of my ability.

So in today’s Snake_Byte we are going to cover Random and PseudoRandom Numbers.  i really liked this one because it was more in line with scientific computing and numerical optimization.

The random module in Python generates what is called pseudorandom numbers.  It is in the vernacular a pseudorandom number generator (PRNG).  This generation includes different types of distributions for said numbers. 

So what is a pseudorandom number:

“A pseudorandom number generator (PRNG), also known as a deterministic random bit generator, is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers.” ~ Wikipedia

The important aspect here is:  the properties approximate sequences of random numbers.  So this means that it is statistically random even though it was generated by a deterministic response.

While i have used the random module and have even implemented various random number algorithms, i learned something new for this blog.  The pseudorandom number generator in Python uses an algorithm called the Mersenne Twister.  The period of said algorithm is 2**19937-1, and there are both 32-bit and 64-bit versions.  The underlying implementation in C is both fast and thread-safe. The Mersenne Twister is one of the most extensively tested random number generators in existence. One issue, though, is that due to the deterministic nature of the algorithm it is not suitable for cryptographic purposes.
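
If you do need randomness for security-sensitive work such as tokens or keys, the standard library's secrets module is the usual recommendation instead of random; a minimal sketch:

import secrets

token = secrets.token_hex(16)      # 32 hex characters from the OS's CSPRNG
roll = secrets.randbelow(6) + 1    # unbiased roll of a die, 1..6
print(token, roll)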

Let us delve down into some code into the various random module offerings, shall we?

i like using %system in Jupyter Lab to create an interactive session. First we import random. Let’s look at random.random(), which returns a uniformly distributed float in [0.0, 1.0); multiplying by an integer scales it into that range:

%system
import random
for i in range (5):
    x = random.random() * 100
    print (x)
63.281889167063035
0.13679757425121286
47.697874648329
96.66882808709684
76.63300711554905

Next let us check out random.choice(seq) which returns a random element from the non-empty sequence seq. If seq is empty, raises IndexError:

mySurfBoardlist = ["longboard", "shortboard", "boogieboard"]
for z in range(5):
    print(random.choice(mySurfBoardlist))
longboard
boogieboard
boogieboard
longboard
shortboard

Next let us look at random.randrange(start, stop[, step]), which returns a randomly selected element from range(start, stop, step). This is equivalent to choice(range(start, stop, step)) but doesn’t actually build a range object.

random.randrange parameters:

  • start: Optional. An integer specifying at which position to start. Default 0.
  • stop: Required. An integer specifying at which position to end.
  • step: Optional. An integer specifying the incrementation. Default 1.
for i in range(5):
    print(random.randrange(10, 100, 1))
84
21
94
91
87

Now let us move on to some calls that you would use in signal processing, statistics or machine learning. The first one is gauss(), which returns a value drawn from a Gaussian distribution defined by the following mathematics:

\Large f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)

Gaussian distribution (also known as normal distribution) is a bell-shaped curve (aka the bell curve), and it is assumed that during any measurement values will follow a normal distribution with an equal number of measurements above and below the mean value.

gauss() parameters:

  • mu: the mean
  • sigma: the standard deviation
  • returns: a random floating point number drawn from the Gaussian distribution
# import the required libraries 
import random 
import matplotlib.pyplot as plt 
#set the inline magic
%matplotlib inline   
# store the random numbers in a list 
nums = [] 
mu = 100
sigma = 50
    
for i in range(100000): 
    temp = random.gauss(mu, sigma) 
    nums.append(temp) 
        
# plot the distribution 
plt.hist(nums, bins = 500, ec="red") 
plt.show()
Gaussian Distribution in Red

There are several more functions in the random module: seed and state setters, sampling and shuffling functions, and other statistical distributions. Hit Stack Overflow and give them a try! Also it doesn’t hurt if you dust off that probability and statistics textbook!
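
To give a flavor of a few of those, here are some quick examples of my own; all of these are standard library calls:

import random

random.seed(42)                         # fix the seed so results reproduce run to run
print(random.uniform(1.5, 6.5))         # float uniformly distributed between the bounds
print(random.sample(range(100), k=5))   # five distinct values, no replacement

deck = list(range(10))
random.shuffle(deck)                    # in-place shuffle
print(deck)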

As a last thought: which came first, the framework of entropy or the framework of randomness? And is everything truly random? i would love to hear your thoughts in the comments!

Until then,

#iwishyouwater <- click here on this one!

tctjr

References:

Python In A Nutshell by Alex Martelli

M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”, ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, pp. 3–30, January 1998

Muzak To Blog By:  Black Sabbath  – The End: Live In Birmingham

Snake_Byte[2]: Comparisons and Equality

Contrariwise, continued Tweedledee, if it was so, it might be, and if it were so, it would be; but as it isn’t, it ain’t. That’s logic!

Tweedledee
It’s all rational isn’t it?

First, i trust everyone is safe.

Second, i am going to be pushing a blog out every Wednesday called Snake_Bytes.  This is the second one hot off the press.  Snake as in Python and Bytes as in well you get it. Yes, it is a bad pun but hey most are bad. 

i will pick one of the myriad Python-based books i have in my library and randomly open it to a page.  No matter how basic or advanced, i will start from there and create a short concise blog on said subject.  For some, possibly many, the content will be rather pedantic; for others i hope you gain a little insight.  As a former professor told me, “to know a subject in many ways is to know it well.”  Just like martial arts or music, performing the basics hopefully makes everything else effortless at some point.

Ok so in today’s installment we have Comparison and Equality.

I suppose more philosophically what is the Truth?

All Python objects at some level respond to some form of comparisons such as a test for equality or a magnitude comparison or even binary TRUE and FALSE.

For all comparisons in Python, the language traverses all parts of compound objects until a result can be ascertained and this includes nested objects and data structures.  The traversal for data structures is applied recursively from left to right.  

So let us jump into some simple snippets there starting with lists objects.  

List objects compare all of their components automatically.

%system #command line majik in Jupyterlab
# same value with unique objects
A1 = [2, ('b', 3)]
A2 = [2, ('b', 3)]

#Are they equivalent?  Same Objects?
A1 == A2, A1 is A2
(True, False)

So what happened here?  A1 and A2 are assigned lists which are in fact equivalent but distinct objects.

So for comparisons how does that work?

  • The == operator tests value equivalence

Python recursively tests nested comparisons until a result is ascertained.

  • The is operator tests object identity

Python tests whether the two are really the same object and live at the same address in memory.

So let’s compare some strings, shall we?

StringThing1 = "water"
StringThing2 = "water"
StringThing1 == StringThing2, StringThing1 is StringThing2
(True, True)

Ok, what just happened?  We need to be very careful here, and i have seen this cause some really ugly bugs when performing long-chained regex stuff with health data.  Python internally caches and reuses some strings as an optimization technique.  Here there is really just a single string ‘water’ in memory, shared by StringThing1 and StringThing2, thus the identity operator evaluates to True.

The workaround is thus:

StringThing1 = "i wish you water"
StringThing2 = "i wish you water"
StringThing1 == StringThing2,StringThing1 is StringThing2
(True, False)

Given this logic, let’s see how conditional comparisons work.

I believe Python 2.5 introduced ternary operators.  Once again interesting word:

Ternary means composed of three parts, or having three as a base.

The operators are the fabled if/else you see in almost all programming languages.

whentrue if condition else whenfalse

The condition is evaluated first.  If condition is true the result is whentrue; otherwise the result is whenfalse.  Only one of the two subexpressions whentrue and whenfalse evaluates depending on the truth value of condition.

Stylistically you want to place parentheses around the whole expression.

Example of the operator; this was taken directly out of the Python docs with a slight change, as i thought it was funny:

is_nice = True
state = "nice" if is_nice else "ain’t nice"
print(state)

Which also shows how Python treats True and False.

In most programming languages an integer 0 is FALSE and an integer 1 is TRUE.

However, Python looks at an empty data structure as False.  True and False as illustrated above are inherent properties of every object in Python.

So in general Python compares types as follows:

  • Numbers are compared by the relative magnitude
  • Non-numeric mixed-type comparisons, where (3 < ‘water’), don’t fly in Python 3.0.  However, they are allowed in Python 2.6, which uses a fixed arbitrary rule.  Same with sorts: non-numeric mixed-type collections cannot be sorted in Python 3.0.
  • Strings are compared lexicographically (ok, cool word, what does it mean?). In mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set. In other words, like a dictionary: character by character, where (“abc” < “ac”).
  • Lists and tuples are compared component by component left to right
  • Dictionaries are compared as equal if their sorted (key, value) lists are equal.  However relative magnitude comparisons are not supported in Python 3.0

With structured objects, as one would think, the comparison happens as though you had written the objects as literals and compared all the components one at a time, left to right.
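
A few throwaway examples of those rules (my own, easily checked in any Python 3 REPL):

# Lists and tuples compare component by component, left to right
print([1, 2, 3] < [1, 2, 4])              # True, decided at the third element
print((2, ('b', 3)) == (2, ('b', 3)))     # True, nested structures compared recursively

# Strings compare lexicographically, character by character
print("abc" < "ac")                       # True

# Dictionaries support equality but not ordering in Python 3
print({'a': 1} == {'a': 1})               # True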

Further, you can chain the comparisons such as:

a < b <= c < d

Which functionally is the same thing as:

a < b and b <= c and c < d

The chain form is more compact and more readable and evaluates each subexpression once at the most.
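
To see that “evaluates each subexpression once at the most” is literal, here is a tiny sketch of my own using a function with a visible side effect in the middle of a chain:

def middle():
    print("middle() evaluated")
    return 5

# The chained form evaluates middle() exactly once
print(1 < middle() <= 10)                   # one "middle() evaluated", then True

# The expanded form evaluates it twice
print(1 < middle() and middle() <= 10)      # two "middle() evaluated", then True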

Being that most people reading this should be using Python 3.0, a couple of words on dictionaries per the last commentary.  In Python 2.6, dictionaries supported magnitude comparisons as though you were comparing (key, value) lists.

In Python 3.0, magnitude comparisons for dictionaries were removed because they incur too much overhead.  Python 3.0, from what i can gather, uses an optimized scheme for equality comparisons only, so for ordering you write loops or compare them manually.  Once again, no free lunch. The documentation can be found here: Ordering Comparisons in Python 3.0.

One last thing.  There is a special object called None.  It’s a special data type in Python in fact i think the only special data type.  None is equivalent to a Null pointer in C.  

This comes in handy if your list size is not known:

MyList = [None] * 50
print(MyList)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

The output makes me think of a Monty Python skit. See what i did there? While the comparison to a NULL pointer is correct, the way in which it allocates memory doesn’t limit what the list can later hold; it presets an initial size to allow for future indexing assignments. In this way, it kind of reminds me of malloc in C.  Purists, please don’t shoot the messenger.

Well, i got a little long in the tooth as they say.  See what i did again?  Teeth, Snakes and Python.

See y’all next week.

Until Then,

#iwishyouwater

@tctjr

Muzak To Blog By: various tunes by : Pink Martini, Pixies, Steve Miller.

Computing The Human Condition – Project Noumena (Part 1)

“I am putting myself to the fullest possible use, which is all I think any conscious entity can ever hope to do.” ~ HAL 9000

“If you want to make the world a better place take a look at yourself and then make a change.” ~ MJ.

First and foremost with this blog i trust everyone is safe.  The world is in an interesting place, space, and time both physically and dare i say collectively – mentally.

A Laundry List

Introduction

This past week we celebrated Earth Day.  i believe i heard it was the 50th year of Earth Day.  While i applaud the efforts and longevity, rather than a single day we should have Earth Day every day.  Further, just “thoughting” about or tweeting about Earth Day, while it may wake up your posterior lobe of the pituitary gland and secrete some oxytocin, creating the warm fuzzies for you, really doesn’t create an action for furthering Earth Day (much like typing /giphy YAY! in Slack).

As such, i decided to embark on a multipart blog about something i have been “thinking” about, which i call an Ecological Computing System.  Then the more i thought about it: why stop at ecology?  We are able to model and connect essentially anything; we now have models for the brain that, while coarse-grained, can account for gross behaviors; we have tons of data on buying habits and advertisement data; and everything is highly mobile and distributed.  Machine learning, which can optimize, classify and predict with extremely high dimensionality, is no longer an academic exercise.

Thus, i suppose, taking it one step further from ecology, what would differentiate it from other efforts is that <IT> would actually attempt to provide a compute framework that would compute The Human Condition.  I am going to call this effort Project Noumena.  Kant, the eminent thinker of 18th-century Germany, defined noumena as things as they are in themselves, as distinct from things as they are knowable by the senses through phenomenal attributes, and proposed that experience was a product of the mind.

My impetus for this are manifold:

  • i love the air, water, trees, and animals,
  • i am an active water person,
  • i want my children’s children’s children to know the wonder of staring at the azure skies, azure oceans and purple mountains,
  • Maybe technology will assist us in saving us from The Human Condition.

Timing

i have waited probably 15+ years to write about this ideation of such a system, mainly because the technological considerations were nowhere near where they needed to be and, to be extremely transparent, no one seemed to really think it was an issue until recently.  The pandemic seems to have been a global wakeup call that, in fact, Humanity is fragile.  There are shortages of resources in the most advanced societies.  Further, pollution levels appear (as reported) to be subsiding as a function of the reduction in humans’ daily involvement within the environment.  To that point, over the past two years there appears to be an uptake of awareness in how plastics are destroying our oceans.  Coupled with the pandemic and other environmental concerns, there could potentially be a food shortage due to these highly nonlinear effects.  This uptake in awareness has mainly been due to the usage of mobile computing and social media, which in and of themselves probably couldn’t have existed without plastics and massive natural resource consumption.  So i trust the irony is not lost there.

From a technical perspective, open source and open source systems have become the way that software is developed.  For those that have not read The Cathedral and The Bazaar and In The Beginning Was The Command Line, i urge you to do so; it will change your perspective.

We are no longer hampered by the concept of scale in computing. We can also create a system that behaves at scale with only but a few human resources.  You can do a lot with few humans now which has been the promise of computing.

Distributed computing methods are now coming to fruition. We no longer think in terms of a monolithic operating system or in place machine learning. Edge computing and fiber networks are accelerating this at an astonishing rate.  Transactions now dictate trust. While we will revisit this during the design chapters of the blog I’ll go out on a limb here and say these three features are cogent to distributed system processing (and possibly the future of computing at scale).

  • Incentive models
  • Consensus models
  • Protocol models

We will definitely be going into the deeper psychological, mathematical, and technical aspects of these items.

Some additional points of interest on timing.  Microsoft recently released press about a Planetary Computer and announced the position of Chief Ecology Officer.  While i do not consider Project Noumena to be of the same system type, there could be similarities on the ecological aspects, which, just like in open source, creates a more resilient base to work from.

The top market cap companies are all information theoretic-based corporations.  Humans that know the science, technology, mathematics and liberal arts are key to their success.  All of these companies are woven and interwoven into the very fabric of our physical and psychological lives.

Thus it is with the confluence of these items i believe the time is now to embark on this design journey.  We must address the Environment, Societal factors and the model of governance.

A mentor once told me one time in a land far away: “Timing is everything as long as you can execute.”  Ergo Timing and Execution Is Everything.

Goals

It is my goal to create a design and hopefully an implementation that utilizes computational means to truly assist in building models and sampling the world, where we can adhere to goals in making small but meaningful changes that can be used within what i am calling the 3R’s: recycle, redact, reuse.  Further, i hope that with the proper dynamic incentive models in place it has a net positive feedback effect.  Just as in complexity theory, a small change, a butterfly’s wings, can create hurricanes; in this case a positive effect.

Here is my overall plan. i’m not big on process or Gantt charts.  I’ll be putting all of this in a README.md as well.  I may ensconce the feature sets etc. into a Trello board or some other tracking mechanism to keep me focused.  Feel free to make recommendations in the comments section:

Action Items:

  • Create Comparative Models
  • Create Coarse-Grained Attributes
  • Identify underlying technical attributes
  • Attempt to coalesce into an architecture
  • Start writing code for the above.

Preamble

Humanity has come to expect growth as a material extension of human behavior.  We equate growth with progress.  In fact, we use the term exponential growth as if it is indefinitely positive.  In most cases, for a fixed time interval, this means a doubling of the relevant system variable or variables.  We speak of growth as a function of gross national product.  In most cases, exponential growth is treacherous where there are no known or perceived limits.  It appears that humanity has only recently become aware that we do not have infinite resources.  Psychologically there is a clash between the exponential growth and the psychological or physical limit.  The only significance is the relevant (usually local) limit: how does it affect me, us, and them?  This can be seen throughout most game theory practices as the dominant choice.  The pattern of growth is not the surprise; the surprise is the collision between the ever-increasing growth function and the awareness of the limit.

One must stop and ask: 

Q: Are progress (and capacity) and the ever-increasing growth function a positive, and how do they relate to the 2nd law of thermodynamics, aka entropy?  Must it always expand?

We are starting to see that our world can exert dormant forces that within our lives can greatly affect our well-being. When we approach the actual or perceived limit, those forces, which are usually negative, begin to gain strength.

So given these aspects of why, i’ll turn now to start the discussion.  If we do not understand history, we cannot predict the future by inventing it, or in most cases re-inventing it, as it were.

I want to start off the history by referencing several books that i have been reading and re-reading on the subjects of modeling the world, complexity, and models of collapse throughout this multipart blog.  We will be addressing issues concerning complex dynamics as they manifest with respect to attributes, model types, economics, equality, and mental concerns.

These core references are located at the end of the blog under references.  They are all hot-linked.  Please go scroll and check them out.  i’ll still be here.  i’ll wait.

Checked them out?  i know a long list. 

As you can see the core is rather extensive due to the nature of the subject matter.  The top three books are the main ones that have been the prime movers and guides of my thinking.  These three books i will refer to as The Core Trilogy:

World Dynamics

The Collapse of Complex Societies 

Six Sources of Collapse 

 As i mentioned i have been deeply thinking about all aspects of this system for quite some time. I will be mentioning several other texts and references along the continuum of creation of this design.

We will start by referencing the first book: World Dynamics by J.W. Forrester.  World Dynamics came out of several meetings of the Club of Rome, a 75-person invite-only club founded by the President of Fiat.  The club set forth the following attributes for a dynamic model that would attempt to predict the future of the world:

  • Population Growth
  • Capital Investment
  • Geographical Space
  • Natural Resources
  • Pollution
  • Food Production

The output of this design was codified in a computer program called World3.  It has been running since the 1970s, what was then termed in many cases a golden age of society.  All of these variables have been growing at an exponential rate. Here we see the model with the various attributes in action. There have been several criticisms of the models and also analyses which i will go into in further blogs. However, in some cases, the variants have been eerily accurate. The following plot is an output of the World3 model:

2060 does not look good

Issues Raised By World3 and World Dynamics

The issues raised by World3 and within the book World Dynamics are the following:

  • There is a strong undercurrent that technology might not be the savior of humankind
  • Industrialism (including medicine and public health) may be a more disturbing force than the population.  
  • We may face extreme psychological stress and pressures from a four-pronged dilemma via suppression of the modern industrial world.
  • We may be living in a “golden age” despite a widely acknowledged feeling of malaise.  
  • Exhortations and programs directed at population control may be self-defeating.  Population control, if it works, would yield excesses thereby allowing further procreation.
  • Pollution and Population seem to oscillate whereas the high standard of living increases the production of food and material goods which outrun the population.  Agriculture as it hits a space limit and as natural resources reach a pollution limit then the quality of life falls in equalizing population.
  • There may be no realistic hope of underdeveloped countries reaching the same standard and quality of life as developed countries.  However, with the decline in developed countries, the underdeveloped countries may be equalized by that decline.
  • A society with a high level of industrialization may be unsustainable.  
  • From a long term 100 years hence it may be unwise for underdeveloped countries to seek the same levels of industrialization.  The present underdeveloped nations may be in better conditions for surviving the forthcoming pressures.  These underdeveloped countries would suffer far less in a world collapse.  

Fuzzy Human – Fuzzy Model

The human mind is amazing at identifying structures of complex situations. However, our experiences train us poorly for estimating the dynamic consequences of said complexities.  Our mind is also not very accurate at estimating ad hoc parts of the complexities and the variational outcomes.  

One of the problems with models is, well, it is just a model.  The subject-observer reference could shift and the context shifts thereof.  This dynamic aspect needs to be built into the models.

Also while we would like to think that our mental model is accurate it is really quite fuzzy and even irrational in most cases.  Also attempting to generalize everything into a singular model parameter is exceedingly difficult.  It is very difficult to transfer one industry model onto another.  

In general parameterization of most of these systems is based on some perceptual model we have rationally or irrationally invented.  

When these models were created there was the consideration of modeling social mechanics of good-evil, greed – altruism, fears, goals, habits, prejudice, homeostasis, and other so-called human characteristics.  We are now at a level of science where we can actually model the synaptic impulse and other aspects that come with these perceptions and emotions.

There is a common cross-cutting construct in most complex models within this text that is mainly concerned with the concept of feedback and how the non-linear relationships of these modeled systems feed back into one another.  System-wide thinking permeates the text itself.  On a related note, from the 1940s Dr. Norbert Wiener and others such as Claude Shannon worked on ballistic tracking systems and coupled feedback in both a cybernetic and an information-theoretic fashion, and Wiener regarded the concept of feedback as one of the most fundamental operations in information theory.  This led to the extremely famous Wiener estimation filters.  Also, side note: Dr. Wiener was a self-styled pacifist, proving you can hold two very opposing views in the same instance whilst being successful at executing both ideals.

Given that basic function of feedback, let’s look at the principal structures.  Essentially the model states there will be levels and rates.  Rates are flows that cause levels to change.  Levels accumulate the net flow, either additions to or subtractions from that level.  The various system levels can in aggregate describe the system state at any given time (t).  Levels exist in all subsystems of existence.  These subsystems, as you will see, include but are not limited to financial, psychological, biological, and economic.  The reason i say not limited to is that i also believe there are some yet-to-be-identified subsystems at the quantum level.  The differential, or rate of flow, is controlled by one or more parts of the system.  All systems that have some spatio-temporal manifestation can be represented using the two variables, levels and rates.  Thus, with respect to the spatial or temporal variables, we can have a dynamic model.

The below picture is the model that grew out of interest from the initial meetings of the Club of Rome.  The inaugural meeting, which was the impetus for the model, was held in Bern, Switzerland on June 29, 1970.  Each of the levels represents a variable in the previously mentioned major structures. System levels appear as right triangles.  Each level is increased or decreased by its respective flow.  As previously mentioned on feedback, any closed path through the diagram is a feedback loop.  Some of the closed loops, given certain information-theoretic attributes, will be positive feedback loops that generate growth, and others that seek equilibrium will be negative feedback loops.  If you notice something about the diagram, it essentially is a birth and death loop: the population loop, if you will.  For the benefit of modeling, there are really only two major variables that affect the population: Birth Rate (BR) and Death Rate (DR).  They represent the total aggregate rates at which the population is being increased or decreased.  The system has coefficients that initialize them to normal rates.  For example, in 1970 BRN is taken as 0.0885 (88.5 per thousand), which is then multiplied by population to determine BR.  DRN by the same measure governs the outflow or reduction; in 1970 it was taken as 0.095.  The difference between the two is the net rate, and these are called normal rates because they correspond to a physically normal world: normal levels of food, material standard of living, crowding, and pollution.  The influencers are then multipliers that increase or decrease the normal rates.

Feedback and isomorphisms abound
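
To make the levels-and-rates idea concrete, here is a deliberately tiny Python sketch of just the population loop described above: a single level (population) fed by a birth flow and drained by a death flow, stepped forward with simple Euler integration. The two coefficients are the normal rates quoted above; everything else (the fixed time step, the starting population, the absence of the food, crowding, and pollution multipliers) is my simplification, not the actual World3 model.

# Minimal level-and-rate (stock-and-flow) sketch of the population loop.
# NOT World3: no food, crowding, or pollution multipliers, just the normal rates.
BRN = 0.0885    # normal birth rate per capita per year
DRN = 0.095     # normal death rate per capita per year

population = 3.6e9   # rough 1970 world population, used only as a starting level
dt = 1.0             # time step in years

for year in range(1970, 1981):
    birth_flow = BRN * population            # inflow to the level
    death_flow = DRN * population            # outflow from the level
    population += (birth_flow - death_flow) * dt
    print(year + 1, round(population / 1e9, 3), "billion")

With these two coefficients alone the net flow is slightly negative, so the level drifts down; in the full model it is the multipliers on food, crowding, and pollution that push the rates around and produce the interesting dynamics.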


As a caveat, there have been some detractors of this model. To be sure, it is very coarse-grained; however, while i haven’t seen the latest runs or outputs, it is my understanding, as i said, that the current outputs are close. The criticisms come in the shape of “Well, it’s just modeling everything as y = x e^{rt}.” I will be using this concept and map, if you will, as the basis for Noumena.  The concepts and values as i evolve the system will vary greatly from the World3 model, but i believe starting with a minimum viable product is essential here; as i said, humans are not very good at predicting all of the various outcomes in high dimensional space. We can assess situations very quickly, but probable outcomes not so much. Next up we will be delving into the loops deeper and getting loopier.

So this is the first draft if you will as everything nowadays can be considered an evolutionary draft.  

Then again isn’t really all of this just The_Infinite_Human_Do_Loop?

until then,

#iwishyouwater

tctjr

References:

(Note: They are all hotlinked)

World Dynamics

The Collapse of Complex Societies 

Six Sources of Collapse 

Beyond The Limits 

The Limits To Growth 

Thinking In Systems Donella Meadows

Designing Distributed Systems Brendan Burns

Introduction to Distributed Algorithms 

A Pragmatic Introduction to Secure Multi-Party Computation 

Reliable Secure Distributed Programming 

Distributed Algorithms 

Dynamic General Equilibrium Modeling 

Advanced Information Systems Engineering 

Introduction to Dynamic Systems Modeling 

Nonlinear Dynamics and Chaos 

Technological Revolutions and Financial Capital 

Marginalism and Discontinuity 

How Nature Works 

Complexity and The Economy 

Complexity a Guided Tour

Future Shock 

Agent_Zero 

Nudge Theory In Action

The Structure of Scientific Revolutions

Agent-Based Modelling In Economics

Cybernetics

Human Use Of Human Beings

The Technological Society

The Origins Of Order

The Lorax

Blog Muzak: Brian Eno and Roger Eno: Mixing Colours