Showing 1 changed file with 51 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,32 +1,68 @@ | ||
# goliath | ||
# Goliath | ||
|
||
### Authors | ||
|
||
Manikandan Swaminathan, Logan Pulley, Deepan Venkatesh, Ilie Vartic, Zachary Oldham | ||
|
||
### Abstract | ||
This package enables python coders to build "multi-threaded" programs and optimize their data processing. | ||
|
||
This package enables Python to offload sets of function calls to pools of remote worker processes. | ||
|
||
### Details | ||
|
||
Oftentimes, python coders will need to handle large amounts of data or tasks. Ideally, they would be able to utilize the thread-based model when the data processing could be separated into independent chunks. | ||
However, python's support for concurrency is essentially fake. Python's substitute for the thread model is a turn-based system where different "threads" take turns running at a time. This can be frustrating for programmers trying to implement actual thread-based programs. | ||
When handling large sets of data, the thread-pool model can often do wonders for parallelizing and thus speeding up a program. However, in CPython the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so threads interleave rather than run in parallel, and CPU-bound work does not take advantage of multiple CPU cores. This can be frustrating when working in Python with a task that would be easily threadable in other languages. | ||
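
For reference, the thread-pool programming model mentioned here can be sketched with the standard library's `concurrent.futures`. This is only an illustration of the model, not part of Goliath; the `square` function and the pool size are placeholders chosen for the example. In CPython, a pool like this does not speed up CPU-bound work, which is the limitation the paragraph describes.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Placeholder task; any independent function call would do."""
    return n * n


# Submit one call per input to a pool of four threads.
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, range(8)))

print(squares)
```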
|
||
Goliath enables Python to distribute function calls over a set of servers. This essentially simulates the thread-pool model as a pool of servers, each maintaining a pool of Python worker processes. Additionally, these servers can be reached over the Internet, enabling a many-to-many relationship between clients requesting work and servers providing workers; one client can have work distributed across multiple servers, and each server can handle work from multiple clients. | ||
|
||
Goliath abstracts this entire model and aggregates the results from the servers, finally returning the list of results to the coder. | ||
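
The distribute-then-aggregate idea can be sketched locally as follows. This is a conceptual illustration under stated assumptions, not Goliath's implementation: the round-robin assignment, the helper names `distribute`, `run_bucket`, and `aggregate`, and the ordering of results are all hypothetical, and each "server" is simulated by a plain local function call.

```python
def distribute(calls, n_servers):
    """Assign (index, kwargs) pairs to n_servers buckets, round-robin (assumed policy)."""
    buckets = [[] for _ in range(n_servers)]
    for index, kwargs in enumerate(calls):
        buckets[index % n_servers].append((index, kwargs))
    return buckets


def run_bucket(func, bucket):
    """Stand-in for one server: run every assigned call locally."""
    return [(index, func(**kwargs)) for index, kwargs in bucket]


def aggregate(partials):
    """Merge per-server results back into submission order."""
    merged = sorted(
        (pair for part in partials for pair in part),
        key=lambda pair: pair[0],
    )
    return [result for _, result in merged]


def add(a, b):
    return a + b


calls = [{'a': i, 'b': 10} for i in range(7)]
buckets = distribute(calls, n_servers=3)
results = aggregate(run_bucket(add, bucket) for bucket in buckets)
print(results)
```

Tagging each call with its index before splitting is one simple way to let the client reassemble a single list regardless of which server finishes first.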
|
||
goliath is a python package which enables programmers to distribute operations over a variable number of remote servers, which are in turn specified by the coder. This essentially simulates the "thread-based model", but instead replaces each thread with an independent process on a server. By using remote servers, goliath enables a many-to-many relationship where multiple clients can have associated processes on the same server, while one client can also have processes distributed across multiple servers. | ||
## Requirements | ||
|
||
goliath abstracts the communication with the servers and aggregates the results from each server's processes, finally returning the processed data to the coder. | ||
- Python 3.8 | ||
|
||
## Installation | ||
|
||
Install with `pip`: | ||
|
||
### Installation: | ||
Run: | ||
`pip install goliath` | ||
|
||
Then, in python script: | ||
`from goliath import commander` | ||
## Usage | ||
|
||
### Sending work (Commander) | ||
|
||
```py | ||
# foo.py | ||
|
||
from goliath.commander import Commander | ||
|
||
# Create a commander (doesn't connect yet) | ||
cmdr = Commander([ | ||
# Lieutenants can be hostnames, domains, IPs | ||
('lieutenant-hostname', 8080), | ||
('lieutenant.com', 3333), | ||
('127.0.0.1', 2222) | ||
]) | ||
|
||
# The function to execute | ||
def foo(bar, baz): | ||
return str(bar) + str(baz) | ||
|
||
# Function to generate list of arguments to try | ||
def foo_args(bar_range, baz_range): | ||
for bar in bar_range: | ||
for baz in baz_range: | ||
yield { 'bar': bar, 'baz': baz } | ||
|
||
# Connect to lieutenants, run all the functions, and get results | ||
results = cmdr.run(foo, foo_args(range(100), range(100)), ['foo.py']) | ||
``` | ||
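
To make the shape of the arguments concrete, here is a local, single-process stand-in for the call above. It assumes that each dictionary yielded by `foo_args` is passed to `foo` as keyword arguments, which the matching key and parameter names suggest but the README does not state. Whether `cmdr.run` returns results in this order is likewise an assumption.

```python
def foo(bar, baz):
    return str(bar) + str(baz)


def foo_args(bar_range, baz_range):
    for bar in bar_range:
        for baz in baz_range:
            yield {'bar': bar, 'baz': baz}


# Local equivalent (assumed semantics): one foo(**kwargs) call per yielded dict.
local_results = [foo(**kwargs) for kwargs in foo_args(range(3), range(2))]
print(local_results)
```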
|
||
### Performing work (Lieutenant & Worker) | ||
|
||
The commander module contains the interface with which the programmer must interact. | ||
To run a lieutenant on this machine on port 3333 with 8 worker processes: | ||
|
||
Prerequisites: | ||
+ Python3.8 | ||
`python3.8 -m goliath.lieutenant localhost 3333 8` | ||
|
||
### Licensing | ||
## Licensing | ||
|
||
goliath is open-source software, licensed under GNU's Lesser GPL. | ||
Goliath is open-source software, licensed under the GNU Lesser General Public License (LGPL).