diff --git a/README.md b/README.md
index 76037f6..15ca0ad 100644
--- a/README.md
+++ b/README.md
@@ -1,32 +1,68 @@
-# goliath
+# Goliath
 
 ### Authors
+
 Manikandan Swaminathan, Logan Pulley, Deepan Venkatesh, Ilie Vartic, Zachary Oldham
 
 ### Abstract
-This package enables python coders to build "multi-threaded" programs and optimize their data processing.
+
+This package enables Python to offload sets of function calls to pools of remote worker processes.
 
 ### Details
-Oftentimes, python coders will need to handle large amounts of data or tasks. Ideally, they would be able to utilize the thread-based model when the data processing could be separated into independent chunks.
 
-However, python's support for concurrency is essentially fake. Python's substitute for the thread model is a turn-based system where different "threads" take turns running at a time. This can be frustrating for programmers trying to implement actual thread-based programs.
+When handling large sets of data, the thread-pool model can often do wonders for parallelizing and thus speeding up a program. However, Python's native support for concurrency is more like _polling_ than _threading_; it doesn't properly take advantage of multiple CPU cores. This can be frustrating when working in Python with a task that would be easily threadable in other languages.
+
+Goliath enables Python to distribute function calls over a set of servers. This essentially simulates the thread-pool model as a pool of servers, each maintaining a pool of Python worker processes. Additionally, these servers can be reached over the Internet, enabling a many-to-many relationship between clients requesting work and servers providing workers; one client can have work distributed across multiple servers, and each server can handle work from multiple clients.
+
+Goliath abstracts this entire model and aggregates the results from the servers, finally returning the list of results to the coder.
 
-goliath is a python package which enables programmers to distribute operations over a variable number of remote servers, which are in turn specified by the coder. This essentially simulates the "thread-based model", but instead replaces each thread with an independent process on a server. By using remote servers, goliath enables a many-to-many relationship where multiple clients can have associated processes on the same server, while one client can also have processes distributed across multiple servers.
+## Requirements
 
-goliath abstracts the communication with the servers and aggregates the results from each server's processes, finally returning the processed data to the coder.
+- Python 3.8
+
+## Installation
+
+Install with `pip`:
 
-### Installation:
-Run: `pip install goliath`
-Then, in python script:
-`from goliath import commander`
+## Usage
+
+### Sending work (Commander)
+
+```py
+# foo.py
+
+from goliath.commander import Commander
+
+# Create a commander (doesn't connect yet)
+cmdr = Commander([
+    # Lieutenants can be hostnames, domains, IPs
+    ('lieutenant-hostname', 8080),
+    ('lieutenant.com', 3333),
+    ('127.0.0.1', 2222)
+])
+
+# The function to execute
+def foo(bar, baz):
+    return str(bar) + str(baz)
+
+# Function to generate list of arguments to try
+def foo_args(bar_range, baz_range):
+    for bar in bar_range:
+        for baz in baz_range:
+            yield { 'bar': bar, 'baz': baz }
+
+# Connect to lieutenants, run all the functions, and get results
+results = cmdr.run(foo, foo_args(range(100), range(100)), ['foo.py'])
+```
+
+### Performing work (Lieutenant & Worker)
 
-The commander module contains the interface with which the programmer must interact.
+To run a lieutenant on this machine on port 3333 with 8 worker processes:
 
-Prerequisites:
-+ Python3.8
+`python3.8 -m goliath.lieutenant localhost 3333 8`
 
-### Licensing
+## Licensing
 
-goliath is open-source software, licensed under GNU's Lesser GPL.
+Goliath is open-source software, licensed under GNU's Lesser GPL.
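
Note on the new Details and Usage text: the model it describes (a pool of worker processes, one function, a stream of argument sets, and an aggregated list of results) can be tried on a single machine with the standard library alone. The sketch below is only a local analogue, not Goliath's implementation or API. `call_with_kwargs` and `run_locally` are hypothetical names introduced here, and it assumes each dict yielded by `foo_args` is unpacked as keyword arguments, which the README example suggests but does not state.

```python
# Local analogue only: standard library, no Goliath, no network.
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat


def foo(bar, baz):
    """Same example function as in the README diff."""
    return str(bar) + str(baz)


def foo_args(bar_range, baz_range):
    """Yield one keyword-argument dict per call, as in the README diff."""
    for bar in bar_range:
        for baz in baz_range:
            yield {'bar': bar, 'baz': baz}


def call_with_kwargs(func, kwargs):
    """Hypothetical helper: unpack one argument dict into a call."""
    return func(**kwargs)


def run_locally(func, args_iter, workers=4):
    """Hypothetical stand-in for Commander.run, limited to one machine.

    Each argument dict becomes one task in a pool of worker processes;
    the results come back as a list in the same order as the inputs.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_with_kwargs, repeat(func), args_iter))


if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block
    # when they import the module.
    results = run_locally(foo, foo_args(range(3), range(3)))
    print(results)
```

Because worker processes receive the function by reference, `foo` has to be importable at module level. That may be why the README example passes `['foo.py']` to `cmdr.run`, though the diff does not say so.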