Additions to README
lpulley committed Mar 1, 2020
1 parent ecc236b commit c9ed5d7
Showing 1 changed file with 51 additions and 15 deletions.
66 changes: 51 additions & 15 deletions README.md
@@ -1,32 +1,68 @@
# goliath
# Goliath

### Authors

Manikandan Swaminathan, Logan Pulley, Deepan Venkatesh, Ilie Vartic, Zachary Oldham

### Abstract
This package enables python coders to build "multi-threaded" programs and optimize their data processing.

This package enables Python to offload sets of function calls to pools of remote worker processes.

### Details

Oftentimes, python coders will need to handle large amounts of data or tasks. Ideally, they would be able to utilize the thread-based model when the data processing could be separated into independent chunks.
However, python's support for concurrency is essentially fake. Python's substitute for the thread model is a turn-based system where different "threads" take turns running at a time. This can be frustrating for programmers trying to implement actual thread-based programs.
When handling large sets of data, the thread-pool model can often do wonders for parallelizing and thus speeding up a program. However, Python's native support for concurrency is more like _polling_ than _threading_; it doesn't properly take advantage of multiple CPU cores. This can be frustrating when working in Python with a task that would be easily threadable in other languages.

Goliath enables Python to distribute function calls over a set of servers. This essentially simulates the thread-pool model as a pool of servers, each maintaining a pool of Python worker processes. Additionally, these servers can be reached over the Internet, enabling a many-to-many relationship between clients requesting work and servers providing workers; one client can have work distributed across multiple servers, and each server can handle work from multiple clients.

Goliath abstracts this entire model and aggregates the results from the servers, finally returning the list of results to the coder.
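One way to picture the distribute-then-aggregate flow is the sketch below. It is only an illustration: the round-robin split, the `(index, value)` tagging, and the names `partition_round_robin` and `aggregate` are assumptions made for this example, not Goliath's actual scheduling or API, and no networking is involved.

```python
def partition_round_robin(calls, servers):
    """Assign each call to a server in turn (assumed strategy, for illustration)."""
    assignments = {server: [] for server in servers}
    for index, call in enumerate(calls):
        server = servers[index % len(servers)]
        # Keep the original index so results can be put back in order later
        assignments[server].append((index, call))
    return assignments


def aggregate(partial_results, total):
    """Merge per-server lists of (index, value) pairs into one ordered list."""
    results = [None] * total
    for chunk in partial_results:
        for index, value in chunk:
            results[index] = value
    return results


# Hypothetical servers and calls, mirroring the (host, port) tuples used in this README
servers = [('lieutenant-hostname', 8080), ('127.0.0.1', 2222)]
calls = [{'bar': 0, 'baz': 0}, {'bar': 0, 'baz': 1}, {'bar': 1, 'baz': 0}]

assignments = partition_round_robin(calls, servers)

# Stand-in for the remote work: each "server" evaluates its share locally
partials = [
    [(index, str(kw['bar']) + str(kw['baz'])) for index, kw in share]
    for share in assignments.values()
]
results = aggregate(partials, len(calls))
```

Tagging each call with its index lets the client rebuild a single result list no matter which server finishes first.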

goliath is a python package which enables programmers to distribute operations over a variable number of remote servers, which are in turn specified by the coder. This essentially simulates the "thread-based model", but instead replaces each thread with an independent process on a server. By using remote servers, goliath enables a many-to-many relationship where multiple clients can have associated processes on the same server, while one client can also have processes distributed across multiple servers.
## Requirements

goliath abstracts the communication with the servers and aggregates the results from each server's processes, finally returning the processed data to the coder.
- Python 3.8

## Installation

Install with `pip`:

### Installation:
Run:
`pip install goliath`

Then, in python script:
`from goliath import commander`
## Usage

### Sending work (Commander)

```py
# foo.py

from goliath.commander import Commander

# Create a commander (doesn't connect yet)
cmdr = Commander([
    # Lieutenants can be hostnames, domains, IPs
    ('lieutenant-hostname', 8080),
    ('lieutenant.com', 3333),
    ('127.0.0.1', 2222)
])

# The function to execute
def foo(bar, baz):
    return str(bar) + str(baz)

# Function to generate list of arguments to try
def foo_args(bar_range, baz_range):
    for bar in bar_range:
        for baz in baz_range:
            yield { 'bar': bar, 'baz': baz }

# Connect to lieutenants, run all the functions, and get results
results = cmdr.run(foo, foo_args(range(100), range(100)), ['foo.py'])
```
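For comparison, the same pattern (one function, an iterable of keyword-argument dicts, a list of results) can be sketched on a single machine with the standard library's `concurrent.futures`. This is a rough local analogy, not how Goliath is implemented: `run_local` and `_apply` are hypothetical names, and whether `cmdr.run` returns results in argument order is an assumption here.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import repeat


def _apply(func, kwargs):
    # Call func with one dict of keyword arguments
    return func(**kwargs)


def run_local(func, args_iter, executor_cls=ProcessPoolExecutor, workers=4):
    """Hypothetical local stand-in for a pool of worker processes."""
    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in the order the inputs were given
        return list(pool.map(_apply, repeat(func), args_iter))


def foo(bar, baz):
    return str(bar) + str(baz)


def foo_args(bar_range, baz_range):
    for bar in bar_range:
        for baz in baz_range:
            yield {'bar': bar, 'baz': baz}


if __name__ == '__main__':
    # The guard is needed on platforms that start worker processes by spawning
    print(run_local(foo, foo_args(range(3), range(2))))
```

Passing `executor_cls=ThreadPoolExecutor` runs the same sketch with threads, which is convenient for quick checks. Note that CPU-bound functions will not run in parallel that way under CPython's global interpreter lock.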

### Performing work (Lieutenant & Worker)

The commander module contains the interface with which the programmer must interact.
To run a lieutenant on this machine on port 3333 with 8 worker processes:

Prerequisites:
+ Python3.8
`python3.8 -m goliath.lieutenant localhost 3333 8`

### Licensing
## Licensing

goliath is open-source software, licensed under GNU's Lesser GPL.
Goliath is open-source software, licensed under GNU's Lesser GPL.
