Digital Defiant Studios

Published: 2016-03-20 00:00:00 -0700

An algorithm for sustainability: Ecological clustering

After reading Fritjof Capra's “The Hidden Connections,” I came away with some very interesting insights about systems theory and how it relates to ecology, political governance, and our increasing struggle with global capitalism (and globalism in general).

This post is about a particular idea from that book, outlined briefly towards the end, which I’ve called “ecological clustering”. The background is this: businesses consume resources and materials and produce waste, but they operate in isolation. This creates immense waste and environmental destruction, because no business operates as part of a system, when in fact it should.

The idea behind one solution, outlined in the book, is simple: businesses form symbiotic relationships with each other, forming ‘communities’ based on their resources – from their inputs (materials) to their outputs (waste). This idea hinges on cooperation, and it’s a very forward-thinking methodology. Essentially, companies cooperate and agree to exchange materials, where one material may be trash for one company and treasure for another.

My idea here has been to turn this into a programmatic problem: create an algorithm that finds all useful relationships and maps them, so that you can derive a list where waste OUTPUTS from one company are mapped to material INPUTS for another. This gives you a picture of companies that should be working together.

Examples

Let’s make a few toy examples that may actually have some real parallels.

“Starbocks”

  • Inputs: “Coffee (Beans)”, “Water”, “Milk”
  • Outputs: “Coffee grounds”, “Wastewater”, “Cardboard”, “Food scraps”

“Composting Northwest”

  • Inputs: “Coffee grounds”, “Food scraps”
  • Outputs: “Soil”, “Fertilizer”, “Earthworms”

“Sustainable Farms”

  • Inputs: “Earthworms”, “Chicken feed”, “Cardboard”, “Water”, “Fertilizer”
  • Outputs: “Chicken meat”, “Chicken eggs”, “Wastewater”, “Food scraps”

“Wastewater treatment Northwest (WTNW)”

  • Inputs: “Wastewater”
  • Outputs: “Water”

Now, the algorithm is pretty straightforward. The most important aspect is that we’ll be using an inverted index to create our mapping, which allows single keys (in our case, materials) to map to all of their occurrences (here, the companies that consume or produce them).

for all companies,
  make an inverted index of the input and the company associated with it
  make an inverted index of the output and the company associated with it

Then, make a list of pairs of companies where one company’s output is another company’s input, and vice versa.
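The index-building step above can also be done programmatically rather than by hand. Here’s a minimal sketch, using an abbreviated version of the toy companies above (the `build_indexes` helper and the per-company data layout are my own illustration, not part of the original write-up):

```python
from collections import defaultdict

# Abbreviated per-company material lists, matching the toy examples above.
companies = {
    'starbocks': {
        'in': ['coffee', 'water', 'milk'],
        'out': ['coffee-grounds', 'wastewater', 'cardboard', 'food scraps'],
    },
    'composting northwest': {
        'in': ['coffee-grounds', 'food scraps'],
        'out': ['soil', 'fertilizer', 'earthworms'],
    },
}

def build_indexes(companies):
    """Invert per-company lists into material -> [companies] mappings."""
    index = {'in': defaultdict(list), 'out': defaultdict(list)}
    for name, materials in companies.items():
        for direction in ('in', 'out'):
            for material in materials[direction]:
                index[direction][material].append(name)
    return index

index = build_indexes(companies)
```

After this runs, `index['out']['coffee-grounds']` holds every company that produces coffee grounds, which is exactly the lookup structure the rest of the post relies on.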

Encoding our data

The next important step is to encode the data properly, so we can actually use it in a program. Converting our previous examples into a proper, Pythonic format, we get:

relationships = {
    'in': {
        'coffee': ['starbocks'],
        'coffee-grounds': ['composting northwest'],
        'water': ['sustainable farms', 'starbocks'],
        'wastewater': ['WTNW'],
        'cardboard': ['sustainable farms'],
        'milk': ['starbocks'],
        'chicken-meat': [],
        'chicken-eggs': [],
        'earthworms': ['sustainable farms'],
        'fertilizer': ['sustainable farms'],
        'soil': [],
        'food scraps': ['composting northwest'],
    },
    'out': {
        'coffee': [],
        'coffee-grounds': ['starbocks'],
        'water': ['WTNW'],
        'wastewater': ['starbocks', 'sustainable farms'],
        'cardboard': ['starbocks'],
        'milk': [],
        'chicken-meat': ['sustainable farms'],
        'chicken-eggs': ['sustainable farms'],
        'earthworms': ['composting northwest'],
        'fertilizer': ['composting northwest'],
        'soil': ['composting northwest'],
        'food scraps': ['starbocks', 'sustainable farms'],
    },
}

The algorithm

def generate_relationships(d):
    matches = []
    pairs = []

    # Walk both inverted indexes, collecting every material that appears
    # as both an input and an output.
    for _input, companies in d['in'].items():
        for _output, companies2 in d['out'].items():
            if _input == _output:
                matches.append((companies, companies2, _input, _output))
    for c1, c2, _input, _output in matches:
        # Skip materials with no consumer or no producer.
        if not all([c1, c2]):
            continue
        if c1 != c2:
            pairs.append((c1, c2, _input, _output))
            print('{c2} <=outputs=> "{output}" so it should pair with {c1} '
                  'which =>inputs<= "{input}"'.format(
                      c1=c1, output=_output, c2=c2, input=_input))
    return pairs


if __name__ == '__main__':
    generate_relationships(relationships)

The above function takes a dictionary (key-value pairs) as input, matches each input material against every output material, and pairs up the companies on both sides of each match.

So, here’s an example of the output when run against our previous data set:

['WTNW'] <=outputs=> "water" so it should pair with ['sustainable farms', 'starbocks'] which =>inputs<= "water"
['composting northwest'] <=outputs=> "fertilizer" so it should pair with ['sustainable farms'] which =>inputs<= "fertilizer"
['starbocks'] <=outputs=> "cardboard" so it should pair with ['sustainable farms'] which =>inputs<= "cardboard"
['composting northwest'] <=outputs=> "earthworms" so it should pair with ['sustainable farms'] which =>inputs<= "earthworms"
['starbocks'] <=outputs=> "coffee-grounds" so it should pair with ['composting northwest'] which =>inputs<= "coffee-grounds"
['starbocks', 'sustainable farms'] <=outputs=> "wastewater" so it should pair with ['WTNW'] which =>inputs<= "wastewater"
['starbocks', 'sustainable farms'] <=outputs=> "food scraps" so it should pair with ['composting northwest'] which =>inputs<= "food scraps"

Very useful! This could, of course, be done manually, but the point is to automate it en masse and find unexpected relationships across a swathe of industries.
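Since the two indexes share the same material keys, the same pairing can also be written as a single pass over the intersection of the keys. This is an equivalent reformulation of my own, not the function above:

```python
def pair_companies(d):
    """For each material present in both indexes, join its producers
    (companies that output it) with its consumers (companies that input it)."""
    pairs = []
    # Dict key views support set intersection in Python 3.
    for material in d['in'].keys() & d['out'].keys():
        consumers = d['in'][material]
        producers = d['out'][material]
        # Skip materials with no producer, no consumer, or a self-pairing.
        if consumers and producers and consumers != producers:
            pairs.append((producers, consumers, material))
    return pairs

# A small slice of the data set above, used for illustration:
demo = {
    'in': {'wastewater': ['WTNW'], 'water': ['sustainable farms', 'starbocks']},
    'out': {'wastewater': ['starbocks', 'sustainable farms'], 'water': ['WTNW']},
}
```

Intersecting the key views replaces the nested-loop key comparison with a single linear pass over the shared materials.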

Parallelizable work

This is an example of an algorithm that would be considered “embarrassingly parallel”: because it operates on structured data sets, with no shared state between comparisons, the work can be partitioned into sections and split across any number of machines to run in parallel.

For example, you could take this algorithm and plug it into the MapReduce paradigm, mapping the work across a cluster of computers and then reducing the results down to the input/output relationships.
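As a rough sketch of that idea – using Python’s standard multiprocessing module on one machine rather than an actual MapReduce framework – the shared material keys can be partitioned into shards, each worker maps its shard to (producers, consumers, material) pairs, and the reduce step concatenates the per-shard results:

```python
from multiprocessing import Pool

def map_shard(args):
    """Map step: pair producers with consumers for one shard of materials."""
    shard, inputs, outputs = args
    return [(outputs[m], inputs[m], m)
            for m in shard
            if inputs[m] and outputs[m] and inputs[m] != outputs[m]]

def find_pairs_parallel(d, workers=2):
    """Partition the material keys, map each shard in a worker process,
    then reduce by concatenating the per-shard pairs."""
    materials = sorted(d['in'].keys() & d['out'].keys())
    # Stripe the key space into one shard per worker.
    shards = [materials[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        mapped = pool.map(map_shard, [(s, d['in'], d['out']) for s in shards])
    return [pair for shard_pairs in mapped for pair in shard_pairs]
```

Each shard only needs read access to the two indexes, with no shared mutable state – the property that makes the problem embarrassingly parallel in the first place.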

This would make finding insights amazingly easy, and could help find novel solutions to sustainability!