Big code refactoring & improvement
nbulaj committed Aug 18, 2017
1 parent 8a51da7 commit d2cd0c6
Showing 20 changed files with 354 additions and 216 deletions.
38 changes: 22 additions & 16 deletions README.md
@@ -2,6 +2,7 @@
[![Gem Version](https://badge.fury.io/rb/proxy_fetcher.svg)](http://badge.fury.io/rb/proxy_fetcher)
[![Build Status](https://travis-ci.org/nbulaj/proxy_fetcher.svg?branch=master)](https://travis-ci.org/nbulaj/proxy_fetcher)
[![Coverage Status](https://coveralls.io/repos/github/nbulaj/proxy_fetcher/badge.svg)](https://coveralls.io/github/nbulaj/proxy_fetcher)
[![Code Climate](https://codeclimate.com/github/nbulaj/proxy_fetcher/badges/gpa.svg)](https://codeclimate.com/github/nbulaj/proxy_fetcher)
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)

This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
@@ -15,7 +16,7 @@ at the documentation below to find all the gem features.
If using bundler, first add 'proxy_fetcher' to your Gemfile:

```ruby
gem 'proxy_fetcher', '~> 0.2'
gem 'proxy_fetcher', '~> 0.3'
```

or if you want to use the latest version (from `master` branch), then:
@@ -33,7 +34,7 @@ bundle install
Otherwise simply install the gem:

```sh
gem install proxy_fetcher -v '0.2'
gem install proxy_fetcher -v '0.3'
```

## Example of usage
@@ -84,7 +85,7 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance va
* `response_time` (5217 for example)
* `speed` (`:slow`, `:medium` or `:fast`. **Note:** depends on the proxy provider and can be `nil`)
* `type` (URI schema, HTTP or HTTPS)
* `anonimity` (Low or High +KA for example)
* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)

Also you can call next instance methods for every Proxy object:

@@ -99,13 +100,15 @@ You can use two methods to get the first proxy from the list:
* `get` or aliased `pop` (will return first proxy and move it to the end of the list)
* `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
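The rotation behaviour described for `get` and `get!` can be sketched roughly as follows. This is a minimal illustration and not the gem's actual code: `RotatingProxy`, `RotatingList` and the `alive` flag are hypothetical stand-ins, and the real `connectable?` check would make a network request.

```ruby
# Hypothetical stand-in for a proxy; `alive` fakes the result of a connectivity check.
RotatingProxy = Struct.new(:addr, :alive) do
  def connectable?
    alive
  end
end

# Hypothetical stand-in for the manager's proxy list.
class RotatingList
  attr_reader :proxies

  def initialize(proxies)
    @proxies = proxies
  end

  # Return the first proxy and move it to the end of the list.
  def get
    return if proxies.empty?

    first = proxies.shift
    proxies << first
    first
  end

  # Remove proxies from the front until a connectable one is first,
  # then return it and move it to the end of the list.
  def get!
    proxies.shift until proxies.empty? || proxies.first.connectable?
    get
  end
end

list = RotatingList.new([
  RotatingProxy.new('192.0.2.1', false),
  RotatingProxy.new('192.0.2.2', true),
  RotatingProxy.new('192.0.2.3', true)
])
working = list.get! # the dead 192.0.2.1 entry is dropped on the way
```

After the call, the working proxy sits at the end of the list, so repeated calls cycle through the remaining entries.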

If you wanna clear current proxy manager list from dead servers, you can just call `cleanup!` method:
Or you can get a random proxy by calling `manager.random_proxy` or its alias `manager.random`.

If you want to clean the current proxy list of dead servers that do not respond to requests, you can call the `cleanup!` method:

```ruby
manager.cleanup! # or manager.validate!
```

You can sort or find any proxy by speed using next 3 instance methods:
You can also sort or find proxies by speed using the following 3 instance methods (if speed is available for the specific provider):

* `fast?`
* `medium?`
@@ -117,26 +120,27 @@ To change open/read timeout for `cleanup!` and `connectable?` methods you need t

```ruby
ProxyFetcher.configure do |config|
config.read_timeout = 1 # default is 3
config.open_timeout = 1 # default is 3
config.connection_timeout = 1 # default is 3
end

manager = ProxyFetcher::Manager.new
manager.cleanup!
```
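How the single `connection_timeout` value reaches `net/http` is not shown in this part of the diff. One plausible reading, sketched below under that assumption, is that the same number of seconds is applied to both the open and the read timeout of a `Net::HTTP` object. The `build_http` helper is hypothetical.

```ruby
require 'net/http'

# Hypothetical helper: build an unopened Net::HTTP object and apply one
# timeout value (in seconds) to both connecting and reading.
def build_http(host, port, timeout)
  http = Net::HTTP.new(host, port) # no connection is made until a request is sent
  http.open_timeout = timeout
  http.read_timeout = timeout
  http
end

http = build_http('example.com', 80, 1)
```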

ProxyFetcher uses simple Ruby solution for dealing with HTTP requests - `net/http` library. If you wanna add, for example, your custom provider that
was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like []selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
ProxyFetcher uses a simple Ruby solution for dealing with HTTP(S) requests: the `net/http` library from the stdlib. If you want to add, for example, a custom provider that
was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by URL and set it up
in the ProxyFetcher config:

```ruby
class MyHTTPClient
class << self
# [IMPORTANT]: self.fetch method is required!
def fetch(url)
# ... some magic to return proper HTML ...
end
# [IMPORTANT]: below methods are required!
def self.fetch(url)
# ... some magic to return proper HTML ...
end

def self.connectable?(url)
# ... some magic to check if url is connectable ...
end
end

@@ -149,6 +153,8 @@ manager.proxies
# @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
```

You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.
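For readers who want something concrete in place of the "some magic" placeholders, here is a minimal sketch of a client that satisfies the two required class methods using only `net/http`. It is an assumption-laden illustration, not the gem's bundled client: the name `SketchHTTPClient`, the use of a HEAD request for `connectable?`, and the omission of timeouts and redirects are all choices made for brevity.

```ruby
require 'net/http'
require 'uri'

# Hypothetical custom client; only `fetch` and `connectable?` are required.
class SketchHTTPClient
  # Return the response body (HTML) for the given URL.
  def self.fetch(url)
    uri = URI.parse(url)
    connection(uri).get(uri.request_uri).body
  end

  # Report whether a HEAD request to the URL succeeds; any error counts as "no".
  def self.connectable?(url)
    uri = URI.parse(url)
    connection(uri).head(uri.request_uri).is_a?(Net::HTTPSuccess)
  rescue StandardError
    false
  end

  # Build an unopened connection; timeouts are left at their defaults here.
  def self.connection(uri)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = uri.scheme == 'https'
    http
  end
  private_class_method :connection
end
```

Such a class would then be assigned inside the `ProxyFetcher.configure` block via `config.http_client = SketchHTTPClient`.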

## Providers

Currently ProxyFetcher can deal with next proxy providers (services):
@@ -176,8 +182,8 @@ Also you can write your own provider. All you need is to create a class, that wo
ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
```

Provider class must implement `self.load_proxy_list` and `#parse!(html_entry)` methods that will load and parse
provider HTML page with proxy list. Take a look at the samples in the `proxy_fetcher/providers` directory.
A provider class must implement the `#load_proxy_list` and `#to_proxy(html_element)` instance methods, which load and parse
the provider's HTML page with the proxy list. Take a look at the existing providers in the [lib/proxy_fetcher/providers](lib/proxy_fetcher/providers) directory.
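The shape of a provider can be sketched without the gem or Nokogiri, based on how `Base#fetch_proxies!` in this commit maps each `load_proxy_list` entry through `to_proxy`. Everything named below (`ParsedProxy`, `SketchProvider`, the canned `ROWS`) is a hypothetical stand-in; a real provider would inherit from `ProxyFetcher::Providers::Base`, parse HTML, and return `ProxyFetcher::Proxy` objects.

```ruby
# Hypothetical stand-in for ProxyFetcher::Proxy.
ParsedProxy = Struct.new(:addr, :port, :type)

class SketchProvider
  # Canned rows instead of a parsed HTML table, so the sketch runs offline.
  ROWS = [
    { addr: '192.0.2.10', port: '8080', https: 'yes' },
    { addr: '192.0.2.11', port: '3128', https: 'no' }
  ].freeze

  # Same template as Base#fetch_proxies!: load raw entries, convert each one.
  def fetch_proxies!
    load_proxy_list.map { |row| to_proxy(row) }
  end

  # A real provider would download and parse the provider page here.
  def load_proxy_list
    ROWS
  end

  # Convert one raw entry into a proxy object.
  def to_proxy(row)
    type = row[:https].casecmp('yes').zero? ? 'HTTPS' : 'HTTP'
    ParsedProxy.new(row[:addr], Integer(row[:port]), type)
  end
end

proxies = SketchProvider.new.fetch_proxies!
```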

## TODO

8 changes: 7 additions & 1 deletion lib/proxy_fetcher.rb
@@ -1,16 +1,22 @@
require 'uri'
require 'net/http'
require 'openssl'
require 'nokogiri'
require 'ostruct'

require 'proxy_fetcher/configuration'
require 'proxy_fetcher/proxy'
require 'proxy_fetcher/manager'
require 'proxy_fetcher/utils/http_fetcher'

require 'proxy_fetcher/utils/http_client'
require 'proxy_fetcher/utils/html'

require 'proxy_fetcher/providers/base'
require 'proxy_fetcher/providers/free_proxy_list'
require 'proxy_fetcher/providers/free_proxy_list_ssl'
require 'proxy_fetcher/providers/hide_my_name'
require 'proxy_fetcher/providers/proxy_docker'
require 'proxy_fetcher/providers/proxy_list'
require 'proxy_fetcher/providers/xroxy'

module ProxyFetcher
24 changes: 18 additions & 6 deletions lib/proxy_fetcher/configuration.rb
@@ -2,25 +2,29 @@ module ProxyFetcher
class Configuration
UnknownProvider = Class.new(StandardError)
RegisteredProvider = Class.new(StandardError)
WrongHttpClient = Class.new(StandardError)

attr_accessor :open_timeout, :read_timeout, :provider
attr_accessor :http_client
attr_accessor :http_client, :connection_timeout
attr_accessor :provider

class << self
def providers
@providers ||= {}
end

def register_provider(name, klass)
raise RegisteredProvider, "#{name} provider already registered!" if providers.key?(name.to_sym)
raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)

providers[name.to_sym] = klass
end
end

def initialize
@open_timeout = 3
@read_timeout = 3
reset!
end

def reset!
@connection_timeout = 3
@http_client = HTTPClient

self.provider = :hide_my_name # currently default one
@@ -29,7 +33,15 @@ def initialize
def provider=(name)
@provider = self.class.providers[name.to_sym]

raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
raise UnknownProvider, "unregistered proxy provider `#{name}`!" if @provider.nil?
end

def http_client=(klass)
unless klass.respond_to?(:fetch) && klass.respond_to?(:connectable?)
raise WrongHttpClient, "#{klass} must respond to #fetch and #connectable? class methods!"
end

@http_client = klass
end
end
end
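A note on the `http_client=` validation: Ruby's `respond_to?` takes one method name plus an optional `include_all` flag, so a second symbol is read as a truthy flag rather than as another method to check. The sketch below shows the difference with a hypothetical `FetchOnlyClient` and a hypothetical `valid_http_client?` helper that checks each name separately.

```ruby
# Hypothetical client that implements only one of the two required methods.
class FetchOnlyClient
  def self.fetch(_url)
    ''
  end
end

# Hypothetical helper: check every required class method on its own.
def valid_http_client?(klass)
  %i[fetch connectable?].all? { |name| klass.respond_to?(name) }
end

# The second argument is treated as include_all = true, so only :fetch is checked.
puts FetchOnlyClient.respond_to?(:fetch, :connectable?) # prints "true"
puts valid_http_client?(FetchOnlyClient)                # prints "false"
```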
7 changes: 4 additions & 3 deletions lib/proxy_fetcher/manager.rb
@@ -14,8 +14,7 @@ def initialize(refresh: true)

# Update current proxy list from the provider
def refresh_list!
rows = ProxyFetcher.config.provider.load_proxy_list
@proxies = rows.map { |row| Proxy.new(row) }
@proxies = ProxyFetcher.config.provider.fetch_proxies!
end

alias fetch! refresh_list!
@@ -56,10 +55,12 @@ def cleanup!
alias validate! cleanup!

# Return random proxy
def random
def random_proxy
proxies.sample
end

alias random random_proxy

# Returns array of proxy URLs (just schema + host + port)
def raw_proxies
proxies.map(&:url)
48 changes: 36 additions & 12 deletions lib/proxy_fetcher/providers/base.rb
@@ -1,25 +1,49 @@
require 'forwardable'

module ProxyFetcher
module Providers
class Base
attr_reader :proxy
extend Forwardable

def initialize(proxy_instance)
@proxy = proxy_instance
end
def_delegators ProxyFetcher::HTML, :clear, :convert_to_int

PROXY_TYPES = [
HTTP = 'HTTP'.freeze,
HTTPS = 'HTTPS'.freeze
].freeze

def set!(name, value)
@proxy.instance_variable_set(:"@#{name}", value)
attr_reader :proxy

def fetch_proxies!
load_proxy_list.map { |html| to_proxy(html) }
end

class << self
def parse_entry(entry, proxy_instance)
new(proxy_instance).parse!(entry)
def fetch_proxies!
new.fetch_proxies!
end
end

# Get HTML from the requested URL
def load_html(url)
ProxyFetcher.config.http_client.fetch(url)
end
protected

# Get HTML from the requested URL
def load_html(url)
ProxyFetcher.config.http_client.fetch(url)
end

# Get HTML elements with proxy info
def load_proxy_list(*)
raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end

# Convert HTML element with proxy info to ProxyFetcher::Proxy instance
def to_proxy(*)
raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end

# Return normalized HTML element content by selector
def parse_element(element, selector, method = :at_xpath)
clear(element.public_send(method, selector).content)
end
end
end
42 changes: 13 additions & 29 deletions lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -3,42 +3,26 @@ module Providers
class FreeProxyList < Base
PROVIDER_URL = 'https://free-proxy-list.net/'.freeze

class << self
def load_proxy_list
doc = Nokogiri::HTML(load_html(PROVIDER_URL))
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end
def load_proxy_list
doc = Nokogiri::HTML(load_html(PROVIDER_URL))
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end

def parse!(html_entry)
html_entry.xpath('td').each_with_index do |td, index|
case index
when 0
set!(:addr, td.content.strip)
when 1 then
set!(:port, Integer(td.content.strip))
when 3 then
set!(:country, td.content.strip)
when 4
set!(:anonymity, td.content.strip)
when 6
set!(:type, parse_type(td))
else
# nothing
end
def to_proxy(html_element)
ProxyFetcher::Proxy.new.tap do |proxy|
proxy.addr = parse_element(html_element, 'td[1]')
proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
proxy.country = parse_element(html_element, 'td[4]')
proxy.anonymity = parse_element(html_element, 'td[5]')
proxy.type = parse_type(html_element)
end
end

private

def parse_type(td)
type = td.content.strip

if type && type.downcase.include?('yes')
'HTTPS'
else
'HTTP'
end
def parse_type(element)
type = parse_element(element, 'td[6]')
type && type.casecmp('yes').zero? ? HTTPS : HTTP
end
end

31 changes: 10 additions & 21 deletions lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
@@ -3,29 +3,18 @@ module Providers
class FreeProxyListSSL < Base
PROVIDER_URL = 'https://www.sslproxies.org/'.freeze

class << self
def load_proxy_list
doc = Nokogiri::HTML(load_html(PROVIDER_URL))
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end
def load_proxy_list
doc = Nokogiri::HTML(load_html(PROVIDER_URL))
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end

def parse!(html_entry)
html_entry.xpath('td').each_with_index do |td, index|
case index
when 0
set!(:addr, td.content.strip)
when 1 then
set!(:port, Integer(td.content.strip))
when 3 then
set!(:country, td.content.strip)
when 4
set!(:anonymity, td.content.strip)
when 6
set!(:type, 'HTTPS')
else
# nothing
end
def to_proxy(html_element)
ProxyFetcher::Proxy.new.tap do |proxy|
proxy.addr = parse_element(html_element, 'td[1]')
proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
proxy.country = parse_element(html_element, 'td[4]')
proxy.anonymity = parse_element(html_element, 'td[5]')
proxy.type = HTTPS
end
end
end