diff --git a/.coveralls.yml b/.coveralls.yml
new file mode 100644
index 00000000..18e6b205
--- /dev/null
+++ b/.coveralls.yml
@@ -0,0 +1,3 @@
+coverage_clover: test/clover.xml
+json_path: test/coveralls-upload.json
+service_name: travis-ci
\ No newline at end of file
diff --git a/.travis.yml b/.travis.yml
index 18105c78..a80e0947 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,6 +1,7 @@
language: php
-install: composer install
+install:
+ - composer install
php:
- "5.6"
@@ -8,4 +9,11 @@ php:
- "7.1"
- "7.2"
+script:
+ - ./vendor/bin/phpunit --coverage-clover ./test/clover.xml
+
+after_script:
+ - composer require php-coveralls/php-coveralls:^2.0
+ - php ./vendor/php-coveralls/php-coveralls/bin/php-coveralls -v
+
sudo: false
\ No newline at end of file
diff --git a/CHANGELOG.md b/CHANGELOG.md
index fa71696d..708342d4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,9 +3,17 @@ All notable changes to this project will be documented in this file.
## Unreleased
+- Merged PR#49 (Missing object when calling `->getContent()`)
+- Imported all changes from Readability.js as of 2 March 2018 ([8525c6a](https://github.com/mozilla/readability/commit/8525c6af36d3badbe27c4672a6f2dd99ddb4097f)):
+ - Check for `<base>` elements before converting URLs to absolute.
+ - Clean `<link>` tags on `prepArticle()`
+ - Attempt to return at least some text if all the algorithm runs fail (Check PR [#423](https://github.com/mozilla/readability/pull/423) on JS version)
+ - Add new test cases for the previous changes
+ - And all other changes reflected [in this diff](https://github.com/mozilla/readability/compare/c3ff1a2d2c94c1db257b2c9aa88a4b8fbeb221c5...8525c6af36d3badbe27c4672a6f2dd99ddb4097f)
+
## [v1.1.1](https://github.com/andreskrey/readability.php/releases/tag/v1.1.1)
-- Switched from assertEquals to assertSame on unit testing to avoid weak comparisons.
+- Switched from assertEquals to assertSame on unit testing to avoid weak comparisons.
- Added a safe check to avoid sending the DOMDocument as a node when scanning for node ancestors.
- Fix issue #45: Small mistake in documentation
- Fix issue #46: Added `data-src` as a image source path
diff --git a/README.md b/README.md
index 34441b90..19b3dc36 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
# Readability.php
-[![Latest Stable Version](https://poser.pugx.org/andreskrey/readability.php/v/stable)](https://packagist.org/packages/andreskrey/readability.php) [![StyleCI](https://styleci.io/repos/71042668/shield?branch=master)](https://styleci.io/repos/71042668) [![Build Status](https://travis-ci.org/andreskrey/readability.php.svg?branch=master)](https://travis-ci.org/andreskrey/readability.php) [![Total Downloads](https://poser.pugx.org/andreskrey/readability.php/downloads)](https://packagist.org/packages/andreskrey/readability.php) [![Monthly Downloads](https://poser.pugx.org/andreskrey/readability.php/d/monthly)](https://packagist.org/packages/andreskrey/readability.php)
+[![Latest Stable Version](https://poser.pugx.org/andreskrey/readability.php/v/stable)](https://packagist.org/packages/andreskrey/readability.php) [![Build Status](https://travis-ci.org/andreskrey/readability.php.svg?branch=master)](https://travis-ci.org/andreskrey/readability.php) [![Coverage Status](https://coveralls.io/repos/github/andreskrey/readability.php/badge.svg?branch=master)](https://coveralls.io/github/andreskrey/readability.php/?branch=master) [![StyleCI](https://styleci.io/repos/71042668/shield?branch=master)](https://styleci.io/repos/71042668) [![Total Downloads](https://poser.pugx.org/andreskrey/readability.php/downloads)](https://packagist.org/packages/andreskrey/readability.php) [![Monthly Downloads](https://poser.pugx.org/andreskrey/readability.php/d/monthly)](https://packagist.org/packages/andreskrey/readability.php)
PHP port of *Mozilla's* **[Readability.js](https://github.com/mozilla/readability)**. Parses html text (usually news and other articles) and returns **title**, **author**, **main image** and **text content** without nav bars, ads, footers, or anything that isn't the main body of the text. Analyzes each node, gives them a score, and determines what's relevant and what can be discarded.
@@ -7,14 +7,14 @@ PHP port of *Mozilla's* **[Readability.js](https://github.com/mozilla/readabilit
The project aim is to be a 1 to 1 port of Mozilla's version and to follow closely all changes introduced there, but there are some major differences on the structure. Most of the code is a 1:1 copy –even the comments were imported– but some functions and structures were adapted to suit better the PHP language.
+**Lead Developer**: Andres Rey
+
## Requirements
PHP 5.6+, ext-dom, ext-xml, and ext-mbstring. To install all this dependencies (in the rare case your system does not have them already), you could try something like this in *nix like environments:
`$ sudo apt-get install php7.1-xml php7.1-mbstring`
-**Lead Developer**: Andres Rey
-
## How to use it
First you have to require the library using composer:
@@ -152,7 +152,7 @@ Self closing tags like ` ` get automatically expanded to ` logger = $logger;
+
+ return $this;
+ }
+
/**
* @return int
*/
diff --git a/src/Nodes/DOM/DOMDocument.php b/src/Nodes/DOM/DOMDocument.php
index a83f5b9c..81e9c7de 100644
--- a/src/Nodes/DOM/DOMDocument.php
+++ b/src/Nodes/DOM/DOMDocument.php
@@ -20,10 +20,11 @@ public function __construct($version, $encoding)
$this->registerNodeClass('DOMDocumentFragment', DOMDocumentFragment::class);
$this->registerNodeClass('DOMDocumentType', DOMDocumentType::class);
$this->registerNodeClass('DOMElement', DOMElement::class);
+ $this->registerNodeClass('DOMEntity', DOMEntity::class);
+ $this->registerNodeClass('DOMEntityReference', DOMEntityReference::class);
$this->registerNodeClass('DOMNode', DOMNode::class);
$this->registerNodeClass('DOMNotation', DOMNotation::class);
$this->registerNodeClass('DOMProcessingInstruction', DOMProcessingInstruction::class);
$this->registerNodeClass('DOMText', DOMText::class);
- $this->registerNodeClass('DOMEntityReference', DOMEntityReference::class);
}
}
diff --git a/src/Nodes/DOM/DOMEntity.php b/src/Nodes/DOM/DOMEntity.php
new file mode 100644
index 00000000..8493e731
--- /dev/null
+++ b/src/Nodes/DOM/DOMEntity.php
@@ -0,0 +1,10 @@
+getElementsByTagName('p') as $p) {
- $length += mb_strlen($p->textContent);
- }
+ $length = mb_strlen(preg_replace(NodeUtility::$regexps['onlyWhitespace'], '', $result->textContent));
$this->logger->info(sprintf('[Parsing] Article parsed. Amount of words: %s. Current threshold is: %s', $length, $this->configuration->getWordThreshold()));
- if ($result && mb_strlen(preg_replace('/\s/', '', $result->textContent)) < $this->configuration->getWordThreshold()) {
+ $parseSuccessful = true;
+
+ if ($result && $length < $this->configuration->getWordThreshold()) {
$this->dom = $this->loadHTML($html);
$root = $this->dom->getElementsByTagName('body')->item(0);
+ $parseSuccessful = false;
if ($this->configuration->getStripUnlikelyCandidates()) {
$this->logger->debug('[Parsing] Threshold not met, trying again setting StripUnlikelyCandidates as false');
$this->configuration->setStripUnlikelyCandidates(false);
+ $this->attempts[] = ['articleContent' => $result, 'textLength' => $length];
} elseif ($this->configuration->getWeightClasses()) {
$this->logger->debug('[Parsing] Threshold not met, trying again setting WeightClasses as false');
$this->configuration->setWeightClasses(false);
+ $this->attempts[] = ['articleContent' => $result, 'textLength' => $length];
} elseif ($this->configuration->getCleanConditionally()) {
$this->logger->debug('[Parsing] Threshold not met, trying again setting CleanConditionally as false');
$this->configuration->setCleanConditionally(false);
+ $this->attempts[] = ['articleContent' => $result, 'textLength' => $length];
} else {
- $this->logger->emergency('[Parsing] Could not parse text, giving up :(');
+ $this->logger->debug('[Parsing] Threshold not met, searching across attempts for some content.');
+ $this->attempts[] = ['articleContent' => $result, 'textLength' => $length];
+
+ // No luck after removing flags, just return the longest text we found during the different loops
+ usort($this->attempts, function ($a, $b) {
+ return $b['textLength'] - $a['textLength'];
+ });
+
+ // But first check if we actually have something
+ if (!$this->attempts[0]['textLength']) {
+ $this->logger->emergency('[Parsing] Could not parse text, giving up :(');
- throw new ParseException('Could not parse text.');
+ throw new ParseException('Could not parse text.');
+ }
+
+ $this->logger->debug('[Parsing] Threshold not met, but found some content in previous attempts.');
+
+ $result = $this->attempts[0]['articleContent'];
+ $parseSuccessful = true;
+ break;
}
} else {
break;
}
}
- $result = $this->postProcessContent($result);
-
- // If we haven't found an excerpt in the article's metadata, use the article's
- // first paragraph as the excerpt. This can be used for displaying a preview of
- // the article's content.
- if (!$this->getExcerpt()) {
- $this->logger->debug('[Parsing] No excerpt text found on metadata, extracting first p node and using it as excerpt.');
- $paragraphs = $result->getElementsByTagName('p');
- if ($paragraphs->length > 0) {
- $this->setExcerpt(trim($paragraphs->item(0)->textContent));
+ if ($parseSuccessful) {
+ $result = $this->postProcessContent($result);
+
+ // If we haven't found an excerpt in the article's metadata, use the article's
+ // first paragraph as the excerpt. This can be used for displaying a preview of
+ // the article's content.
+ if (!$this->getExcerpt()) {
+ $this->logger->debug('[Parsing] No excerpt text found on metadata, extracting first p node and using it as excerpt.');
+ $paragraphs = $result->getElementsByTagName('p');
+ if ($paragraphs->length > 0) {
+ $this->setExcerpt(trim($paragraphs->item(0)->textContent));
+ }
}
- }
- $this->setContent($result);
+ $this->setContent($result);
- $this->logger->info('*** Parse successful :)');
+ $this->logger->info('*** Parse successful :)');
- return true;
+ return true;
+ }
}
/**
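Review note on the hunk above: every failed run is stored in `$this->attempts`, and once all flags are exhausted the longest stored result is returned instead of throwing. A minimal standalone sketch of that selection step follows, assuming attempts are plain arrays with a `textLength` key. The helper name `pickLongestAttempt` is hypothetical and not part of the library. The comparator returns an integer, which is what `usort` expects from its callback.

```php
<?php

/**
 * Hypothetical helper (not part of readability.php): returns the attempt with
 * the largest 'textLength', or null when no attempt holds any text.
 *
 * @param array $attempts list of ['articleContent' => mixed, 'textLength' => int]
 *
 * @return array|null
 */
function pickLongestAttempt(array $attempts)
{
    if (count($attempts) === 0) {
        return null;
    }

    // Sort descending by text length; the callback must return an integer.
    usort($attempts, function ($a, $b) {
        return $b['textLength'] - $a['textLength'];
    });

    // A zero-length best attempt means nothing usable was extracted.
    return $attempts[0]['textLength'] > 0 ? $attempts[0] : null;
}

// Illustrative data only, not taken from the test suite.
$attempts = [
    ['articleContent' => 'short', 'textLength' => 5],
    ['articleContent' => 'a much longer text', 'textLength' => 18],
    ['articleContent' => '', 'textLength' => 0],
];

$best = pickLongestAttempt($attempts);
echo $best['articleContent'], PHP_EOL;
```

Returning null for the all-empty case mirrors the branch in the diff that still throws `ParseException` when the best attempt has no text.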
@@ -468,6 +497,10 @@ private function getArticleTitle()
if (count(preg_split('/\s+/', $curTitle)) < 3) {
$curTitle = substr($originalTitle, strpos($originalTitle, ':') + 1);
$this->logger->info(sprintf('[Metadata] Title too short, using the first part of the title instead: \'%s\'', $curTitle));
+ } elseif (count(preg_split('/\s+/', substr($curTitle, 0, strpos($curTitle, ':')))) > 5) {
+ // But if we have too many words before the colon there's something weird
+ // with the titles and the H tags so let's just use the original title instead
+ $curTitle = $originalTitle;
}
}
} elseif (mb_strlen($curTitle) > 150 || mb_strlen($curTitle) < 15) {
@@ -549,7 +582,19 @@ private function toAbsoluteURI($uri)
*/
public function getPathInfo($url)
{
- $pathBase = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST) . dirname(parse_url($url, PHP_URL_PATH)) . '/';
+ // Check for base URLs
+ if ($this->dom->baseURI !== null) {
+ if (substr($this->dom->baseURI, 0, 1) === '/') {
+ // URLs starting with '/' override completely the URL defined in the link
+ $pathBase = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST) . $this->dom->baseURI;
+ } else {
+ // Otherwise just prepend the base to the actual path
+ $pathBase = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST) . dirname(parse_url($url, PHP_URL_PATH)) . '/' . rtrim($this->dom->baseURI, '/') . '/';
+ }
+ } else {
+ $pathBase = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST) . dirname(parse_url($url, PHP_URL_PATH)) . '/';
+ }
+
$scheme = parse_url($pathBase, PHP_URL_SCHEME);
$prePath = $scheme . '://' . parse_url($pathBase, PHP_URL_HOST);
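Review note on the hunk above: `getPathInfo()` now has three branches depending on the document's base URI. The sketch below restates that branching as a pure function so the three outcomes are easy to compare. The function name `resolvePathBase` and the `$baseHref` parameter are assumptions made for illustration; the library reads the value from `$this->dom->baseURI` instead.

```php
<?php

/**
 * Hypothetical standalone version of the branching in getPathInfo().
 *
 * @param string      $url      URL of the document being parsed
 * @param string|null $baseHref base URI of the document, or null when absent
 *
 * @return string
 */
function resolvePathBase($url, $baseHref = null)
{
    $origin = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST);
    $directory = dirname(parse_url($url, PHP_URL_PATH)) . '/';

    if ($baseHref === null) {
        // No base: relative links resolve against the document's directory.
        return $origin . $directory;
    }

    if (substr($baseHref, 0, 1) === '/') {
        // Root-relative base: replaces the document's path entirely.
        return $origin . $baseHref;
    }

    // Relative base: appended to the document's directory.
    return $origin . $directory . rtrim($baseHref, '/') . '/';
}

echo resolvePathBase('http://fakehost/test/test.html'), PHP_EOL;           // http://fakehost/test/
echo resolvePathBase('http://fakehost/test/test.html', '/base/'), PHP_EOL; // http://fakehost/base/
echo resolvePathBase('http://fakehost/test/test.html', 'base/'), PHP_EOL;  // http://fakehost/test/base/
```

In this sketch a fully qualified base such as `http://other/` would fall into the relative branch, so a caller that expects absolute base values would need an extra check first.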
@@ -1129,6 +1174,7 @@ public function prepArticle(DOMDocument $article)
$this->_clean($article, 'embed');
$this->_clean($article, 'h1');
$this->_clean($article, 'footer');
+ $this->_clean($article, 'link');
// Clean out elements have "share" in their id/class combinations from final top candidates,
// which means we don't remove the top candidates even they have "share".
@@ -1479,6 +1525,28 @@ public function _cleanHeaders(DOMDocument $article)
}
}
+ /**
+ * Removes the class="" attribute from every element in the given
+ * subtree.
+ *
+ * Readability.js has a special filter to avoid cleaning the classes that the algorithm adds. We don't add classes
+ * here so no need to filter those.
+ *
+ * @param DOMDocument|DOMNode $node
+ *
+ * @return void
+ **/
+ public function _cleanClasses($node)
+ {
+ if ($node->getAttribute('class') !== '') {
+ $node->removeAttribute('class');
+ }
+
+ for ($node = $node->firstChild; $node !== null; $node = $node->nextSibling) {
+ $this->_cleanClasses($node);
+ }
+ }
+
/**
* @param DOMDocument $article
*
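Review note on `_cleanClasses()` above: the method walks the subtree recursively and drops every `class` attribute. The following is a minimal sketch of the same traversal written against PHP's built-in DOM classes rather than the library's extended node classes. The function name `stripClassAttributes` is hypothetical. The `instanceof DOMElement` guard is an addition for this sketch, needed because native text and document nodes have no attribute methods.

```php
<?php

/**
 * Hypothetical helper: removes the class attribute from every element in the
 * subtree rooted at $node, using only the native DOM extension.
 *
 * @param DOMNode $node
 *
 * @return void
 */
function stripClassAttributes(DOMNode $node)
{
    // Only element nodes carry attributes; skip text, comment and document nodes.
    if ($node instanceof DOMElement && $node->hasAttribute('class')) {
        $node->removeAttribute('class');
    }

    // Depth-first walk over the children.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        stripClassAttributes($child);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="a"><p class="b">Hello <span class="c">world</span></p></div>');

stripClassAttributes($dom);

echo $dom->saveHTML($dom->getElementsByTagName('div')->item(0)), PHP_EOL;
```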
@@ -1532,6 +1600,8 @@ public function postProcessContent(DOMDocument $article)
}
}
+ $this->_cleanClasses($article);
+
return $article;
}
@@ -1564,7 +1634,7 @@ protected function setTitle($title)
*/
public function getContent()
{
- return $this->content->C14N();
+ return ($this->content instanceof DOMDocument) ? $this->content->C14N() : null;
}
/**
diff --git a/test/ConfigurationTest.php b/test/ConfigurationTest.php
index fda9227c..19db2f14 100644
--- a/test/ConfigurationTest.php
+++ b/test/ConfigurationTest.php
@@ -3,6 +3,8 @@
namespace andreskrey\Readability\Test;
use andreskrey\Readability\Configuration;
+use Monolog\Handler\NullHandler;
+use Monolog\Logger;
/**
* Class ConfigurationTest.
@@ -37,13 +39,17 @@ public function testInvalidParameterIsNotInConfig(array $params)
*/
private function doEqualsAsserts(Configuration $config, array $options)
{
- // just part of params, it's enough
- $this->assertEquals($options['originalURL'], $config->getOriginalURL());
- $this->assertEquals($options['fixRelativeURLs'], $config->getFixRelativeURLs());
- $this->assertEquals($options['articleByLine'], $config->getArticleByLine());
$this->assertEquals($options['maxTopCandidates'], $config->getMaxTopCandidates());
+ $this->assertEquals($options['wordThreshold'], $config->getWordThreshold());
+ $this->assertEquals($options['articleByLine'], $config->getArticleByLine());
$this->assertEquals($options['stripUnlikelyCandidates'], $config->getStripUnlikelyCandidates());
+ $this->assertEquals($options['cleanConditionally'], $config->getCleanConditionally());
+ $this->assertEquals($options['weightClasses'], $config->getWeightClasses());
+ $this->assertEquals($options['fixRelativeURLs'], $config->getFixRelativeURLs());
$this->assertEquals($options['substituteEntities'], $config->getSubstituteEntities());
+ $this->assertEquals($options['normalizeEntities'], $config->getNormalizeEntities());
+ $this->assertEquals($options['originalURL'], $config->getOriginalURL());
+ $this->assertEquals($options['summonCthulhu'], $config->getOriginalURL());
}
/**
@@ -51,18 +57,35 @@ private function doEqualsAsserts(Configuration $config, array $options)
*/
public function getParams()
{
- return [
- [
- [
- 'originalURL' => 'my.original.url',
- 'fixRelativeURLs' => true,
- 'articleByLine' => true,
- 'maxTopCandidates' => 3,
- 'stripUnlikelyCandidates' => false,
- 'substituteEntities' => true,
- 'invalidParameter' => 'invalidParameterValue',
- ],
- ],
- ];
+ return [[
+ 'All current parameters' => [
+ 'maxTopCandidates' => 3,
+ 'wordThreshold' => 500,
+ 'articleByLine' => true,
+ 'stripUnlikelyCandidates' => false,
+ 'cleanConditionally' => false,
+ 'weightClasses' => false,
+ 'fixRelativeURLs' => true,
+ 'substituteEntities' => true,
+ 'normalizeEntities' => true,
+ 'originalURL' => 'my.original.url',
+ 'summonCthulhu' => 'my.original.url',
+ 'invalidParameter' => 'invalidParameterValue'
+ ]
+ ]];
+ }
+
+ /**
+ * Test if a logger interface can be injected and retrieved from the Configuration object.
+ */
+ public function testLoggerCanBeInjected()
+ {
+ $configuration = new Configuration();
+ $log = new Logger('Readability');
+ $log->pushHandler(new NullHandler());
+
+ $configuration->setLogger($log);
+
+ $this->assertSame($log, $configuration->getLogger());
}
}
diff --git a/test/ReadabilityTest.php b/test/ReadabilityTest.php
index fe17b234..c20574ee 100644
--- a/test/ReadabilityTest.php
+++ b/test/ReadabilityTest.php
@@ -13,7 +13,7 @@ class ReadabilityTest extends \PHPUnit_Framework_TestCase
*/
public function testReadabilityParsesHTML($html, $expectedResult, $expectedMetadata, $config, $expectedImages)
{
- $options = ['originalURL' => 'http://fakehost/test/test.html',
+ $options = ['OriginalURL' => 'http://fakehost/test/test.html',
'FixRelativeURLs' => true,
'SubstituteEntities' => true,
'ArticleByLine' => true
@@ -27,12 +27,7 @@ public function testReadabilityParsesHTML($html, $expectedResult, $expectedMetad
$options = array_merge($config, $options);
}
- $configuration = new Configuration();
-
- foreach ($options as $key => $value) {
- $name = 'set' . $key;
- $configuration->$name($value);
- }
+ $configuration = new Configuration($options);
$readability = new Readability($configuration);
$readability->parse($html);
@@ -50,7 +45,7 @@ public function testReadabilityParsesHTML($html, $expectedResult, $expectedMetad
*/
public function testHTMLParserParsesImages($html, $expectedResult, $expectedMetadata, $config, $expectedImages)
{
- $options = ['originalURL' => 'http://fakehost/test/test.html',
+ $options = ['OriginalURL' => 'http://fakehost/test/test.html',
'fixRelativeURLs' => true,
'substituteEntities' => true,
];
@@ -58,12 +53,8 @@ public function testHTMLParserParsesImages($html, $expectedResult, $expectedMeta
if ($config) {
$options = array_merge($options, $config);
}
- $configuration = new Configuration();
- foreach ($options as $key => $value) {
- $name = 'set' . $key;
- $configuration->$name($value);
- }
+ $configuration = new Configuration($options);
$readability = new Readability($configuration);
$readability->parse($html);
@@ -115,6 +106,12 @@ public function testReadabilityThrowsExceptionWithUnparseableHTML()
$parser = new Readability(new Configuration());
$this->expectException(ParseException::class);
$this->expectExceptionMessage('Could not parse text.');
- $parser->parse('
hello
');
+ $parser->parse('');
+ }
+
+ public function testReadabilityCallGetContentWithNoContent()
+ {
+ $parser = new Readability(new Configuration());
+ $this->assertNull($parser->getContent());
}
}
diff --git a/test/test-pages/001/expected.html b/test/test-pages/001/expected.html
index c101aecd..e05810ff 100644
--- a/test/test-pages/001/expected.html
+++ b/test/test-pages/001/expected.html
@@ -13,7 +13,7 @@
I guess.
Actually I've only found one which provides an adapter for Mocha and
actually works…
-
+
Drinking game for web devs:
(1) Think of a noun
(2) Google "<noun>.js"
diff --git a/test/test-pages/002/expected.html b/test/test-pages/002/expected.html
index d836b603..16dca2a0 100644
--- a/test/test-pages/002/expected.html
+++ b/test/test-pages/002/expected.html
@@ -1,4 +1,4 @@
-
For more than a decade the Web has used XMLHttpRequest (XHR) to achieve
+
For more than a decade the Web has used XMLHttpRequest (XHR) to achieve
asynchronous requests in JavaScript. While very useful, XHR is not a very
nice API. It suffers from lack of separation of concerns. The input, output
and state are all managed by interacting with one object, and state is
@@ -29,8 +29,8 @@
Simple fetching
The most useful, high-level part of the Fetch API is the fetch() function.
In its simplest form it takes a URL and returns a promise that resolves
to the response. The response is captured as a Response object.
All of the Headers methods throw TypeError if name is not a
valid HTTP Header name. The mutation operations will throw TypeError
if there is an immutable guard. Otherwise they fail silently. For example:
-
-
var res = Response.error();
+
+
var res = Response.error();try{
res.headers.set("Origin","http://mybank.com");}catch(e){
@@ -152,8 +152,8 @@
Request
a body, a request mode, credentials and cache hints.
The simplest Request is of course, just a URL, as you may do to GET a
resource.
-
-
var req =new Request("/index.html");
+
+
var req =new Request("/index.html");
console.log(req.method);// "GET"
console.log(req.url);// "http://example.com/index.html"
@@ -163,8 +163,8 @@
Request
(This is not the same as calling the clone() method, which
is covered in
the “Reading bodies” section.).
-
-
var copy =new Request(req);
+
+
var copy =new Request(req);
console.log(copy.method);// "GET"
console.log(copy.url);// "http://example.com/index.html"
@@ -173,8 +173,8 @@
Request
The non-URL attributes of the Request can only be set by passing
initial
values as a second argument to the constructor. This argument is a dictionary.
-
-
var uploadReq =new Request("/uploadImage",{
+
+
var uploadReq =new Request("/uploadImage",{
method:"POST",
headers:{"Content-Type":"image/png",
@@ -191,8 +191,8 @@
Request
origin with this mode set, the result is simply an error. You could use
this to ensure that
a request is always being made to your origin.
-
-
var arbitraryUrl = document.getElementById("url-input").value;
+
headers is exposed in the Response, but the body is readable. For example,
you could get a list of Flickr’s most interesting photos
today like this:
-
-
var u =new URLSearchParams();
+
+
var u =new URLSearchParams();
u.append('method','flickr.interestingness.getList');
u.append('api_key','<insert api key here>');
u.append('format','json');
@@ -241,8 +241,8 @@
Request
You may not read out the “Date” header since Flickr does not allow it
via
Access-Control-Expose-Headers.
-
-
response.headers.get("Date");// null
+
+
response.headers.get("Date");// null
The credentials enumeration determines if cookies for the other
@@ -296,8 +296,8 @@
Response
The
idiomatic way to return a Response to an intercepted request in ServiceWorkers
is:
-
This is a significant improvement over XHR in terms of ease of use of
non-text data!
Request bodies can be set by passing body parameters:
-
-
var form =new FormData(document.getElementById('login-form'));
+
+
var form =new FormData(document.getElementById('login-form'));
fetch("/login",{
method:"POST",
body: form
@@ -356,8 +356,8 @@
Dealing with bodies
Responses take the first argument as the body.
-
-
var res =new Response(new File(["chunk","chunk"],"archive.zip",
+
+
var res =new Response(new File(["chunk","chunk"],"archive.zip",{ type:"application/zip"}));
@@ -371,8 +371,8 @@
Streams and cloning
It is important to realise that Request and Response bodies can only be
read once! Both interfaces have a boolean attribute bodyUsed to
determine if it is safe to read or not.
-
-
var res =new Response("one time use");
+
+
var res =new Response("one time use");
console.log(res.bodyUsed);// false
res.text().then(function(v){
console.log(res.bodyUsed);// true
@@ -397,8 +397,8 @@
Streams and cloning
will return a clone of the object, with a ‘new’ body. clone() MUST
be called before the body of the corresponding object has been used. That
is, clone() first, read later.
-
A flaw in the wildly popular online game Minecraft makes it easy for just about anyone to crash the server hosting the game, according to a computer programmer who has released proof-of-concept code that exploits the vulnerability.
"I thought a lot before writing this post," Pakistan-based developer Ammar Askar wrote in a blog post published Thursday, 21 months, he said, after privately reporting the bug to Minecraft developer Mojang. "On the one hand I don't want to expose thousands of servers to a major vulnerability, yet on the other hand Mojang has failed to act on it."
The bug resides in the networking internals of the Minecraft protocol. It allows the contents of inventory slots to be exchanged, so that, among other things, items in players' hotbars are displayed automatically after logging in. Minecraft items can also store arbitrary metadata in a file format known as Named Binary Tag (NBT), which allows complex data structures to be kept in hierarchical nests. Askar has released proof-of-concept attack code he said exploits the vulnerability to crash any server hosting the game. Here's how it works.
The vulnerability stems from the fact that the client is allowed to send the server information about certain slots. This, coupled with the NBT format’s nesting allows us to craft a packet that is incredibly complex for the server to deserialize but trivial for us to generate.
In my case, I chose to create lists within lists, down to five levels. This is a json representation of what it looks like.
The root of the object, rekt, contains 300 lists. Each list has a list with 10 sublists, and each of those sublists has 10 of their own, up until 5 levels of recursion. That’s a total of 10^5 * 300 = 30,000,000 lists.
And this isn’t even the theoretical maximum for this attack. Just the nbt data for this payload is 26.6 megabytes. But luckily Minecraft implements a way to compress large packets, lucky us! zlib shrinks down our evil data to a mere 39 kilobytes.
Note: in previous versions of Minecraft, there was no protocol wide compression for big packets. Previously, NBT was sent compressed with gzip and prefixed with a signed short of its length, which reduced our maximum payload size to 2^15 - 1. Now that the length is a varint capable of storing integers up to 2^28, our potential for attack has increased significantly.
diff --git a/test/test-pages/base-url-base-element-relative/expected-images.json b/test/test-pages/base-url-base-element-relative/expected-images.json
new file mode 100644
index 00000000..efa3a530
--- /dev/null
+++ b/test/test-pages/base-url-base-element-relative/expected-images.json
@@ -0,0 +1 @@
+{"0":"http:\/\/fakehost\/test\/base\/foo\/bar\/baz.png","2":"http:\/\/fakehost\/foo\/bar\/baz.png","3":"http:\/\/test\/foo\/bar\/baz.png","4":"https:\/\/test\/foo\/bar\/baz.png"}
\ No newline at end of file
diff --git a/test/test-pages/base-url-base-element-relative/expected-metadata.json b/test/test-pages/base-url-base-element-relative/expected-metadata.json
new file mode 100644
index 00000000..eb78f287
--- /dev/null
+++ b/test/test-pages/base-url-base-element-relative/expected-metadata.json
@@ -0,0 +1,4 @@
+{
+ "Title": "Base URL with base relative test",
+ "Excerpt": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\n quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\n consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\n cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\n proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
+}
diff --git a/test/test-pages/base-url-base-element-relative/expected.html b/test/test-pages/base-url-base-element-relative/expected.html
new file mode 100644
index 00000000..14d23f67
--- /dev/null
+++ b/test/test-pages/base-url-base-element-relative/expected.html
@@ -0,0 +1,33 @@
+
+
+ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
+ tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+ Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+
\ No newline at end of file
diff --git a/test/test-pages/base-url-base-element-relative/source.html b/test/test-pages/base-url-base-element-relative/source.html
new file mode 100644
index 00000000..bb0f7df0
--- /dev/null
+++ b/test/test-pages/base-url-base-element-relative/source.html
@@ -0,0 +1,44 @@
+
+
+
+
+
+ Base URL with base relative test
+
+
+
+
Lorem
+
+ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
+ tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+ Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+
+
+
diff --git a/test/test-pages/base-url-base-element/expected-images.json b/test/test-pages/base-url-base-element/expected-images.json
new file mode 100644
index 00000000..1f0eaea2
--- /dev/null
+++ b/test/test-pages/base-url-base-element/expected-images.json
@@ -0,0 +1 @@
+{"0":"http:\/\/fakehost\/foo\/bar\/baz.png","3":"http:\/\/test\/foo\/bar\/baz.png","4":"https:\/\/test\/foo\/bar\/baz.png"}
\ No newline at end of file
diff --git a/test/test-pages/base-url-base-element/expected-metadata.json b/test/test-pages/base-url-base-element/expected-metadata.json
new file mode 100644
index 00000000..4a22f750
--- /dev/null
+++ b/test/test-pages/base-url-base-element/expected-metadata.json
@@ -0,0 +1,4 @@
+{
+ "Title": "Base URL with base test",
+ "Excerpt": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\n quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\n consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\n cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\n proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
+}
diff --git a/test/test-pages/base-url-base-element/expected.html b/test/test-pages/base-url-base-element/expected.html
new file mode 100644
index 00000000..5037eb26
--- /dev/null
+++ b/test/test-pages/base-url-base-element/expected.html
@@ -0,0 +1,33 @@
+
+
+ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
+ tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+ Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+
\ No newline at end of file
diff --git a/test/test-pages/base-url-base-element/source.html b/test/test-pages/base-url-base-element/source.html
new file mode 100644
index 00000000..4b3c63c8
--- /dev/null
+++ b/test/test-pages/base-url-base-element/source.html
@@ -0,0 +1,44 @@
+
+
+
+
+
+ Base URL with base test
+
+
+
+
Lorem
+
+ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
+ tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+ Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
+ consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
+ cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
+ proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
President Barack Obama has admitted that his failure to pass "common sense gun safety laws" in the US is the greatest frustration of his presidency.
In an interview with the BBC, Mr Obama said it was "distressing" not to have made progress on the issue "even in the face of repeated mass killings".
He vowed to keep trying, but the BBC's North America editor Jon Sopel said the president did not sound very confident.
However, Mr Obama said race relations had improved during his presidency.
Hours after the interview, a gunman opened fire at a cinema in the US state of Louisiana, killing two people and injuring several others before shooting himself.
In a wide-ranging interview, President Obama also said:
President Barack Obama has admitted that his failure to pass "common sense gun safety laws" in the US is the greatest frustration of his presidency.
In an interview with the BBC, Mr Obama said it was "distressing" not to have made progress on the issue "even in the face of repeated mass killings".
He vowed to keep trying, but the BBC's North America editor Jon Sopel said the president did not sound very confident.
However, Mr Obama said race relations had improved during his presidency.
Hours after the interview, a gunman opened fire at a cinema in the US state of Louisiana, killing two people and injuring several others before shooting himself.
In a wide-ranging interview, President Obama also said:
Mr Obama lands in Kenya later on Friday for his first visit since becoming president.
But with just 18 months left in power, he said gun control was the area where he has been "most frustrated and most stymied" since coming to power in 2009.
"If you look at the number of Americans killed since 9/11 by terrorism, it's less than 100. If you look at the number that have been killed by gun violence, it's in the tens of thousands," Mr Obama said.
Kenya trip
Mr Obama was speaking to the BBC at the White House before departing for Kenya.
His father was Kenyan and the president is expected to meet relatives in Nairobi.
Mr Obama has faced criticism in the country after the US legalised gay marriage. However, in his interview, the president said he would not fall silent on the issue.
"I am not a fan of discrimination and bullying of anybody on the basis of race, on the basis of religion, on the basis of sexual orientation or gender," he said.
The president also admitted that some African governments, including Kenya's, needed to improve their records on human rights and democracy. However, he defended his decision to engage with and visit those governments.
"Well, they're not ideal institutions. But what we found is, is that when we combined blunt talk with engagement, that gives us the best opportunity to influence and open up space for civil society."
Mr Obama will become the first US president to address the African Union when he travels on to Ethiopia on Sunday.
\ No newline at end of file
diff --git a/test/test-pages/blogger/expected.html b/test/test-pages/blogger/expected.html
index e7868b9e..10b48746 100644
--- a/test/test-pages/blogger/expected.html
+++ b/test/test-pages/blogger/expected.html
@@ -1,22 +1,22 @@
-
+
I've written a couple of posts in the past few months but they were all for
so I figured I'm long overdue for one on Silicon Exposed.
So what's a GreenPak?
-
Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their
GreenPak devices are kind of like itty bitty PSoCs - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).
It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.
Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:
+
Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their
GreenPak devices are kind of like itty bitty PSoCs - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).
It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.
Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:
-
SLG46620V block diagram (from device datasheet)
+
SLG46620V block diagram (from device datasheet)
- They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.
The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.
To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.
The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.
While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.
+ They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.
The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.
To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.
The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.
While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.
-
Schematic from hell!
+
Schematic from hell!
As if this wasn't enough of a problem, the largest GreenPak4 device (the SLG46620V) is split into two halves with limited routing between them, and the GUI doesn't help the user manage this complexity at all - you have to draw your schematic in two halves and add "cross connections" between them.
The icing on the cake is that schematics are a pain to diff and collaborate on. Although GreenPak schematics are XML based, which is a touch better than binary, who wants to read a giant XML diff and try to figure out what's going on in the circuit?
This isn't going to be a post on the quirks of Silego's software, though - that would be boring. As it turns out, there's one more exciting feature of these chips that I didn't mention earlier: the configuration bitstream is 100% documented in the device datasheet! This is unheard of in the programmable logic world. As Nick of Arachnid Labs says, the chip is "just dying for someone to write a VHDL or Verilog compiler for it". As you can probably guess by from the title of this post, I've been busy doing exactly that.
Great! How does it work?
-
Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool,
, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.
The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.
Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)
After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.
+
Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool,
, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.
The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.
Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)
After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.
-
Example labeling for a subset of the netlist and device graphs
+
Example labeling for a subset of the netlist and device graphs
The labeled nodes now need to be placed. The initial placement uses a simple greedy algorithm to create a valid (although not necessarily optimal or even routable) placement:
Loop over the cells in the netlist. If any cell has a LOC constraint, which locks the cell to a specific physical site, attempt to assign the node to the specified site. If the specified node is the wrong type, doesn't exist, or is already used by another constrained node, the constraint is invalid so fail with an error.
Loop over all of the unconstrained cells in the netlist and assign them to the first unused site with the right label. If none are available, the design is too big for the device so fail with an error.
Snopes fact checker and staff writer David Emery posted to Twitter asking if there were “any un-angry Trump supporters?”
Emery, a writer for partisan “fact-checking” website Snopes.com which soon will be in charge of labelling “fake news” alongside ABC News and Politifact, retweeted an article by Vulture magazine relating to the protests of the Hamilton musical following the decision by the cast of the show to make a public announcement to Vice-president elect Mike Pence while he watched the performance with his family.
-
-
SIGN UP FOR OUR NEWSLETTER
+
+
SIGN UP FOR OUR NEWSLETTER
-
The tweet from Vulture magazine reads, “#Hamilton Chicago show interrupted by angry Trump supporter.” Emery retweeted the story, saying, “Are there un-angry Trump supporters?”
+
The tweet from Vulture magazine reads, “#Hamilton Chicago show interrupted by angry Trump supporter.” Emery retweeted the story, saying, “Are there un-angry Trump supporters?”
@@ -33,7 +33,7 @@
Snopes fact checker and staff writer David Emery posted to Twitter ask
Facebook believe that Emery, along with other Snopes writers, ABC News, and Politifact are impartial enough to label and silence what they believe to be “fake news” on social media.
-
Lucas Nolan is a reporter for Breitbart Tech covering issues of free speech and online censorship. Follow him on Twitter @LucasNolan_ or email him at lnolan@breitbart.com
+
Lucas Nolan is a reporter for Breitbart Tech covering issues of free speech and online censorship. Follow him on Twitter @LucasNolan_ or email him at lnolan@breitbart.com
Most people go to hotels for the pleasure of sleeping in a giant bed with clean white sheets and waking up to fresh towels in the morning.
But those towels and sheets might not be as clean as they look, according to the hotel bosses that responded to an online thread about the things hotel owners don’t want you to know.
@@ -11,10 +11,10 @@
-
-
+
+
-
+
@@ -27,17 +27,17 @@
Forrest Jones said that anything that comes into contact with any of the previous guest’s skin should be taken out and washed every time the room is made, but that even the fanciest hotels don’t always do so. "Hotels are getting away from comforters. Blankets are here to stay, however. But some hotels are still hesitant about washing them every day if they think they can get out of it," he said.
-
+
Video shows bed bug infestation at New York hotel
-
-
+
+
-
+
@@ -52,10 +52,10 @@
-
-
+
+
-
+
@@ -70,10 +70,10 @@
-
-
+
+
-
+
@@ -94,12 +94,12 @@
5. Beware the wall-mounted hairdryer
-
-
+
+
-
+
-
Business news in pictures
+
Business news in pictures
@@ -114,10 +114,10 @@
6. Mini bars almost always lose money
-
-
+
+
-
+
@@ -129,11 +129,11 @@
6. Mini bars almost always lose money
7. Always made sure the hand towels are clean when you arrive
Forrest Jones made a discovery when he was helping out with the housekeepers. “You know where you almost always find a hand towel in any recently-vacated hotel room that was occupied by a guy? On the floor, next to the bed, about halfway down, maybe a little toward the foot of the bed. Same spot in the floor, next to almost every bed occupied by a man, in every room. I'll leave the rest to your imagination,” he said.
\ No newline at end of file
diff --git a/test/test-pages/buzzfeed-1/expected.html b/test/test-pages/buzzfeed-1/expected.html
index 3e477ae3..82dc3a10 100644
--- a/test/test-pages/buzzfeed-1/expected.html
+++ b/test/test-pages/buzzfeed-1/expected.html
@@ -1,42 +1,42 @@
-
-
+
+
The mother of a woman who took suspected diet pills bought online has described how her daughter was “literally burning up from within” moments before her death.
-
West Merica Police
+
West Merica Police
-
-
Eloise Parry, 21, was taken to Royal Shrewsbury hospital on 12 April after taking a lethal dose of highly toxic “slimming tablets”.
+
+
Eloise Parry, 21, was taken to Royal Shrewsbury hospital on 12 April after taking a lethal dose of highly toxic “slimming tablets”.
“The drug was in her system, there was no anti-dote, two tablets was a lethal dose – and she had taken eight,” her mother, Fiona, said in a statement yesterday.
“As Eloise deteriorated, the staff in A&E did all they could to stabilise her. As the drug kicked in and started to make her metabolism soar, they attempted to cool her down, but they were fighting an uphill battle.
“She was literally burning up from within.”
She added: “They never stood a chance of saving her. She burned and crashed.”
“We are undoubtedly concerned over the origin and sale of these pills and are working with partner agencies to establish where they were bought from and how they were advertised,” said chief inspector Jennifer Mattinson from the West Mercia police.
The Food Standards Agency warned people to stay away from slimming products that contained DNP.
“We advise the public not to take any tablets or powders containing DNP, as it is an industrial chemical and not fit for human consumption,” it said in a statement.
-
+
Fiona Parry issued a plea for people to stay away from pills containing the chemical.
-
“[Eloise] just never really understood how dangerous the tablets that she took were,” she said. “Most of us don’t believe that a slimming tablet could possibly kill us.
+
“[Eloise] just never really understood how dangerous the tablets that she took were,” she said. “Most of us don’t believe that a slimming tablet could possibly kill us.
“DNP is not a miracle slimming pill. It is a deadly toxin.”
\ No newline at end of file
diff --git a/test/test-pages/challenges/expected.html b/test/test-pages/challenges/expected.html
index b204943c..195bdd27 100644
--- a/test/test-pages/challenges/expected.html
+++ b/test/test-pages/challenges/expected.html
@@ -1,4 +1,4 @@
-
+
par Alexandria Sage et Lisa Girion
LAS VEGAS, Nevada (Reuters) - La police américaine peinait mardi à établir les motivations qui ont poussé un retraité de 64 ans à tirer sur la foule depuis une chambre d'hôtel de Las Vegas, faisant 59 morts et 527 blessés dans la plus meurtrière fusillade de l'histoire des Etats-Unis.
Le tireur, qui s'est donné la mort peu avant l'arrivée de police, a été identifié comme Stephen Paddock, un individu apparemment sans histoire, inconnu des services de police et vivant dans un lotissement en périphérie de la ville. Le seul fait notable le concernant était une infraction au code de la route.
diff --git a/test/test-pages/cnet-svg-classes/expected-images.json b/test/test-pages/cnet-svg-classes/expected-images.json
new file mode 100644
index 00000000..3402f790
--- /dev/null
+++ b/test/test-pages/cnet-svg-classes/expected-images.json
@@ -0,0 +1 @@
+["https:\/\/cdn1.cnet.com\/img\/JumVcu1ZSLtPP8ui0UWaSlgi5RU=\/670x503\/2017\/12\/01\/b36ce794-e0b8-495c-a198-184923a8f4e9\/twitter-lite.jpg","https:\/\/cdn2.cnet.com\/img\/LI8y19stcvIQUdzbYdH4-DAigtc=\/fit-in\/570x0\/2017\/12\/01\/b36ce794-e0b8-495c-a198-184923a8f4e9\/twitter-lite.jpg","https:\/\/cdn1.cnet.com\/img\/mWcZaiA8Ngv61OQcpdoh6Ra9nY0=\/170x96\/2017\/11\/13\/70540d7a-cbc5-4563-ab86-b5549ef68168\/oneplus-5t-product-21.jpg"]
\ No newline at end of file
diff --git a/test/test-pages/cnet-svg-classes/expected-metadata.json b/test/test-pages/cnet-svg-classes/expected-metadata.json
new file mode 100644
index 00000000..ebebc0fd
--- /dev/null
+++ b/test/test-pages/cnet-svg-classes/expected-metadata.json
@@ -0,0 +1,4 @@
+{
+ "Title": "Twitter Lite se estrena en México, Venezuela y otros nueve países",
+ "Excerpt": "Twitter Lite llega a 11 países de América Latina, para ayudar a los usuarios con mala señal de sus redes móviles."
+}
diff --git a/test/test-pages/cnet-svg-classes/expected.html b/test/test-pages/cnet-svg-classes/expected.html
new file mode 100644
index 00000000..26b7440a
--- /dev/null
+++ b/test/test-pages/cnet-svg-classes/expected.html
@@ -0,0 +1,23 @@
+
+
+
Twitter Lite estará disponible en Google Play Store en 11 países de América Latina.
+ Twitter
+
Twitter ha dado a conocer que Twitter Lite llegará a un total de 24 nuevos países a partir de hoy, 11 de ellos de América Latina.
+
Según explicó en un comunicadoTwitter Lite ahora estará disponible en Bolivia, Brasil, Chile, Colombia, Costa Rica, Ecuador, México, Panamá, Perú, El Salvador y Venezuela.
+
Twitter Lite es la versión ligera de la aplicación de la red social para Android, disponible en la Google Play Store. Con este app los usuarios que experimentan fallos de red o que viven en países con redes con poca velocidad de conexión como Venezuela podrán descargar los tuits de forma más rápida.
+
+
Entre sus novedades, Twitter Lite permite la carga rápida de tuits en redes 2G y 3G, y ofrece ayuda offline en caso de que pierdas tu conexión; a eso debemos sumar que minimiza el uso de datos y ofrece un modo de ahorro, en el que únicamente se descargan las fotos o videos de los tuits que quieres ver.
+
+
+
Además, el app ocupa menos espacio en tu teléfono móvil, al reducir a 3MB su peso.
+
Twitter dio a conocer Twitter Lite en abril en India, y desde entonces ha estado trabajando para llevarlo a más países. La empresa en los últimos meses también se ha involucrado de forma definitiva en la eliminación de los abusos en la red social, tomando medidas incluso en la verificación de cuentas.
+
+
+
+ Reproduciendo:Mira esto: Google Assistant mejora, hay más cambios en Twitter y...
+
+ 8:09
+
+
+
+
\ No newline at end of file
diff --git a/test/test-pages/cnet-svg-classes/source.html b/test/test-pages/cnet-svg-classes/source.html
new file mode 100644
index 00000000..c71419fe
--- /dev/null
+++ b/test/test-pages/cnet-svg-classes/source.html
@@ -0,0 +1,662 @@
+
+
+
+
+
+
+
+
+
+
+
+ Twitter Lite se estrena en México, Venezuela y otros nueve países - CNET en Español
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Twitter Lite estará disponible en Google Play Store en 11 países de América Latina.
+ Twitter
+
+
+
Twitter ha dado a conocer que Twitter Lite llegará a un total de 24 nuevos países a partir de hoy, 11 de ellos de América Latina.
+
Según explicó en un comunicadoTwitter Lite ahora estará disponible en Bolivia, Brasil, Chile, Colombia, Costa Rica, Ecuador, México, Panamá, Perú, El Salvador y Venezuela.
+
Twitter Lite es la versión ligera de la aplicación de la red social para Android, disponible en la Google Play Store. Con este app los usuarios que experimentan fallos de red o que viven en países con redes con poca velocidad de conexión como Venezuela podrán descargar los tuits de forma más rápida.
Entre sus novedades, Twitter Lite permite la carga rápida de tuits en redes 2G y 3G, y ofrece ayuda offline en caso de que pierdas tu conexión; a eso debemos sumar que minimiza el uso de datos y ofrece un modo de ahorro, en el que únicamente se descargan las fotos o videos de los tuits que quieres ver.
+
+
+
+
+
+
+
Además, el app ocupa menos espacio en tu teléfono móvil, al reducir a 3MB su peso.
+
Twitter dio a conocer Twitter Lite en abril en India, y desde entonces ha estado trabajando para llevarlo a más países. La empresa en los últimos meses también se ha involucrado de forma definitiva en la eliminación de los abusos en la red social, tomando medidas incluso en la verificación de cuentas.
+
+
+
+
+
+
+ Reproduciendo:Mira esto: Google Assistant mejora, hay más cambios en Twitter y...