42 changes: 42 additions & 0 deletions Readability.js
@@ -1907,6 +1907,47 @@ Readability.prototype = {
return false;
},

  _removeDeeplyNestedImageDivs() {
    var doc = this._doc;
    var nodes = Array.from(this._getAllNodesWithTag(doc, ["img"]));
    for (var i = 0; i < nodes.length; i++) {
      var node = nodes[i];
      var parent = node.parentNode;
      while (parent.tagName === "DIV" && !node.previousElementSibling) {
        // If we've only got an image and potentially a noscript after it, with
        // no other non-whitespace text content, we can unwrap the div.

        // First check sibling elements. If there's a non-noscript el, or
        // more stuff after that, we can't unwrap.
        if (
          node.nextElementSibling &&
          (node.nextElementSibling.tagName !== "NOSCRIPT" ||
            node.nextElementSibling.nextElementSibling)
        ) {
          break;
        }
        // Next, check for non-whitespace text content siblings.
        let hasRealTextContent = this._someNode(
          parent.childNodes,
          function (child) {
            return (
              child.nodeType === this.TEXT_NODE &&
              this.REGEXPS.hasContent.test(child.textContent)
            );
          }
        );
        if (hasRealTextContent) {
          break;
        }
        while (parent.firstElementChild) {
          parent.parentNode.insertBefore(parent.firstElementChild, parent);
        }
        parent.remove();
        parent = node.parentNode;
      }
    }
  },

/**
* Find all <noscript> that are located after <img> nodes, and which contain only one
* <img> element. Replace the first image with the image from inside the <noscript> tag,
@@ -2756,6 +2797,7 @@ Readability.prototype = {
}

// Unwrap image from noscript
this._removeDeeplyNestedImageDivs();
this._unwrapNoscriptImages(this._doc);

// Extract JSON-LD metadata before removing scripts
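
The unwrapping rule added in `_removeDeeplyNestedImageDivs` can be tried in isolation. The following is a minimal sketch, not the code from this change: it assumes a hypothetical miniature tree model (`el`, `isElement`, `isImageOnlyDiv`, `unwrapImageDivs`, and `serialize` are illustrative names, not part of Readability or the DOM), and it approximates `REGEXPS.hasContent` with a plain non-whitespace test. It walks the tree bottom-up rather than starting from each `<img>`, which should give the same result for the cases shown.

```javascript
// Hypothetical miniature tree model, NOT the DOM and not Readability's API.
// Element nodes are { tag, children }; text nodes are { text }.
function el(tag, ...children) {
  return {
    tag,
    children: children.map(c => (typeof c === "string" ? { text: c } : c)),
  };
}

const isElement = node => typeof node.tag === "string";

// True when `div` holds an image as its first element, optionally followed
// by exactly one <noscript>, and no text other than whitespace.
function isImageOnlyDiv(div) {
  if (div.tag !== "div") return false;
  const elements = div.children.filter(isElement);
  const hasText = div.children.some(n => !isElement(n) && /\S/.test(n.text));
  if (hasText || elements.length === 0 || elements[0].tag !== "img") {
    return false;
  }
  if (elements.length === 1) return true;
  return elements.length === 2 && elements[1].tag === "noscript";
}

// Replace every image-only div with its element children, innermost first,
// so chains such as div > div > img collapse completely.
function unwrapImageDivs(node) {
  if (!isElement(node)) return;
  const result = [];
  for (const child of node.children) {
    unwrapImageDivs(child);
    if (isElement(child) && isImageOnlyDiv(child)) {
      // Only element children survive; whitespace text goes with the div.
      result.push(...child.children.filter(isElement));
    } else {
      result.push(child);
    }
  }
  node.children = result;
}

// Small serializer so the result can be inspected as a string.
function serialize(node) {
  if (!isElement(node)) return node.text;
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

// Usage: the nested wrapper collapses, the captioned div is left alone.
const page = el(
  "article",
  el("div", el("div", " ", el("img"), el("noscript"))),
  el("div", el("img"), "caption")
);
unwrapImageDivs(page);
console.log(serialize(page));
```

The second `div` is kept because it carries real text next to the image, which mirrors the early `break` in the method above.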
10 changes: 10 additions & 0 deletions test/test-pages/allrecipes-1/expected-metadata.json
@@ -0,0 +1,10 @@
{
  "title": "Hot Honey Brussels Sprouts",
  "byline": "Nicole Russell",
  "dir": null,
  "lang": "en",
  "excerpt": "These hot honey Brussels sprouts are a simple side dish with all the elements you'll ever need in a side. They're sweet, spicy, crispy, and melt in your mouth.",
  "siteName": "Allrecipes",
  "publishedTime": null,
  "readerable": true
}
279 changes: 279 additions & 0 deletions test/test-pages/allrecipes-1/expected.html

Large diffs are not rendered by default.

3,349 changes: 3,349 additions & 0 deletions test/test-pages/allrecipes-1/source.html

Large diffs are not rendered by default.

29 changes: 8 additions & 21 deletions test/test-pages/bug-1255978/expected.html
@@ -6,9 +6,7 @@
<p>But even luxury hotels aren’t always cleaned as often as they should be.</p>
<p>Here are some of the secrets that the receptionist will never tell you when you check in, according to answers posted on <a href="https://www.quora.com/What-are-the-things-we-dont-know-about-hotel-rooms" target="_blank">Quora</a>.</p>
<div>
- <div>
- <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/18/10/bandb2.jpg" alt="bandb2.jpg" title="bandb2.jpg" width="564" height="423" /></p>
- </div>
+ <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/18/10/bandb2.jpg" alt="bandb2.jpg" title="bandb2.jpg" width="564" height="423" /></p>
<p>Even posh hotels might not wash a blanket in between stays </p>
</div>
<p>1. Take any blankets or duvets off the bed</p>
@@ -28,10 +26,8 @@
<p><span>Duration Time</span> 0:00</p>
</div>
<div tabindex="0" role="slider" aria-valuenow="NaN" aria-valuemin="0" aria-valuemax="100" aria-label="progress bar" aria-valuetext="0:00">
- <p><span><span>Loaded</span>: 0%</span>
- </p>
- <p><span><span>Progress</span>: 0%</span>
- </p>
+ <p><span><span>Loaded</span>: 0%</span></p>
+ <p><span><span>Progress</span>: 0%</span></p>
</div>
<div>
<p><span>Remaining Time</span> -0:00</p>
@@ -43,25 +39,19 @@
<p>Video shows bed bug infestation at New York hotel</p>
</div>
<div>
- <div>
- <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/05/26/11/hotel-door-getty.jpg" alt="hotel-door-getty.jpg" title="hotel-door-getty.jpg" width="564" height="423" /></p>
- </div>
+ <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/05/26/11/hotel-door-getty.jpg" alt="hotel-door-getty.jpg" title="hotel-door-getty.jpg" width="564" height="423" /></p>
<p>Forrest Jones advised stuffing the peep hole with a strip of rolled up notepaper when not in use. </p>
</div>
<p>2. Check the peep hole has not been tampered with</p>
<p>This is not common, but can happen, Forrest Jones said. He advised stuffing the peep hole with a strip of rolled up notepaper when not in use. When someone knocks on the door, the paper can be removed to check who is there. If no one is visible, he recommends calling the front desk immediately. “I look forward to the day when I can tell you to choose only hotels where every employee who has access to guestroom keys is subjected to a complete public records background check, prior to hire, and every year or two thereafter. But for now, I can't,” he said.</p>
<div>
- <div>
- <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2013/07/31/15/luggage-3.jpg" alt="luggage-3.jpg" title="luggage-3.jpg" width="564" height="423" /></p>
- </div>
+ <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2013/07/31/15/luggage-3.jpg" alt="luggage-3.jpg" title="luggage-3.jpg" width="564" height="423" /></p>
<p>Put luggage on the floor </p>
</div>
<p>3. Don’t use a wooden luggage rack</p>
<p>Bedbugs love wood. Even though a wooden luggage rack might look nicer and more expensive than a metal one, it’s a breeding ground for bugs. Forrest Jones says guests should put the items they plan to take from bags on other pieces of furniture and leave the bag on the floor.</p>
<div>
- <div>
- <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/04/13/11/Lifestyle-hotels.jpg" alt="Lifestyle-hotels.jpg" title="Lifestyle-hotels.jpg" width="564" height="423" /></p>
- </div>
+ <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/04/13/11/Lifestyle-hotels.jpg" alt="Lifestyle-hotels.jpg" title="Lifestyle-hotels.jpg" width="564" height="423" /></p>
<p>The old rule of thumb is that for every $1000 invested in a room, the hotel should charge $1 in average daily rate </p>
</div>
<p>4. Hotel rooms are priced according to how expensive they were to build</p>
@@ -71,9 +61,7 @@ <h3>5. Beware the wall-mounted hairdryer</h3>
<h3>6. Mini bars almost always lose money</h3>
<p>Despite the snacks in the minibar seeming like the most overpriced food you have ever seen, hotel owners are still struggling to make a profit from those snacks. "Minibars almost always lose money, even when they charge $10 for a Diet Coke,” Sharon said.</p>
<div>
- <div>
- <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/13/16/agenda7.jpg" alt="agenda7.jpg" title="agenda7.jpg" width="564" height="423" /></p>
- </div>
+ <p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/13/16/agenda7.jpg" alt="agenda7.jpg" title="agenda7.jpg" width="564" height="423" /></p>
<p>Towels should always be cleaned between stays </p>
</div>
<p>7. Always made sure the hand towels are clean when you arrive</p>
@@ -84,7 +72,6 @@ <h3>6. Mini bars almost always lose money</h3>
<li><a itemprop="keywords" href="http://fakehost/topic/Hotels">Hotels</a></li>
<li><a itemprop="keywords" href="http://fakehost/topic/Hygiene">Hygiene</a></li>
</ul>
- <p><a href="http://fakehost/syndication/reuse-permision-form?url=http://www.independent.co.uk/news/business/news/seven-secrets-that-hotel-owners-dont-want-you-to-know-10506160.html" target="_blank"><img src="http://fakehost/sites/all/themes/ines_themes/independent_theme/img/reuse.png" width="25" />Reuse content</a>
- </p>
+ <p><a href="http://fakehost/syndication/reuse-permision-form?url=http://www.independent.co.uk/news/business/news/seven-secrets-that-hotel-owners-dont-want-you-to-know-10506160.html" target="_blank"><img src="http://fakehost/sites/all/themes/ines_themes/independent_theme/img/reuse.png" width="25" />Reuse content</a></p>
</div>
</div>
16 changes: 6 additions & 10 deletions test/test-pages/mozilla-1/expected.html
@@ -10,8 +10,7 @@
<div id="designed-copy">
<h2>Designed to <br />be redesigned</h2>
<p>Get fast and easy access to the features you use most in the new menu. Open the “Customize” panel to add, move or remove any button you want. Keep your favorite features — add-ons, private browsing, Sync and more — one quick click away.</p>
- <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" alt="" id="designed-mobile" data-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" data-high-res-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" />
- </p>
+ <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" alt="" id="designed-mobile" data-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" data-high-res-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" /></p>
</div>
<div id="flexible-bottom-animation">
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/animations/flexible-bottom-fallback.cafd48a3d0a4.png" alt="" /></p>
@@ -37,8 +36,8 @@ <h3>Themes</h3>
<br /> <a rel="external" href="https://support.mozilla.org/kb/use-themes-change-look-of-firefox">Learn more</a>
</p>
</div>
- <p><a href="#add-ons" role="button">Next</a></p>
- <p><img id="theme-demo" src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/theme-red.61611c5734ab.png" alt="Preview of the currently selected theme" />
+ <p><a href="#add-ons" role="button">Next</a>
+ <img id="theme-demo" src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/theme-red.61611c5734ab.png" alt="Preview of the currently selected theme" />
</p>
</div>
<div id="add-ons" role="tabpanel" aria-labelledby="customize-addons">
@@ -55,19 +54,16 @@ <h3>Add-ons</h3>
<br /> <a rel="external" href="https://support.mozilla.org/kb/find-and-install-add-ons-add-features-to-firefox">Learn more</a>
</p>
</div>
- <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/add-ons.63a4b761f822.png" alt="" />
- </p>
+ <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/add-ons.63a4b761f822.png" alt="" /></p>
</div>
<div id="awesome-bar" role="tabpanel" aria-labelledby="customize-awesomebar">
<div>
<h3>Awesome Bar</h3>
<p><a href="#themes" role="button">Next</a></p>
<p>The Awesome Bar learns as you browse to make your version of Firefox unique. Find and return to your favorite sites without having to remember a URL.</p>
- <p><a rel="external" href="https://support.mozilla.org/kb/awesome-bar-find-your-bookmarks-history-and-tabs">See what it can do for you</a>
- </p>
+ <p><a rel="external" href="https://support.mozilla.org/kb/awesome-bar-find-your-bookmarks-history-and-tabs">See what it can do for you</a></p>
</div>
- <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/awesome-bar.437df162126c.png" alt="Firefox Awesome Bar" />
- </p>
+ <p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/awesome-bar.437df162126c.png" alt="Firefox Awesome Bar" /></p>
</div>
</div>
</div>
2 changes: 2 additions & 0 deletions test/test-pages/simplyfound-1/expected.html
@@ -1,8 +1,10 @@
<div id="readability-page-1" class="page">
<div>
<p>The Raspberry Pi Foundation started by a handful of volunteers in 2012 when they released the original Raspberry Pi 256MB Model B without knowing what to expect. &nbsp;In a short four-year period they have grown to over sixty full-time&nbsp;employees&nbsp;and have shipped over <b>eight million</b> units to-date. &nbsp;Raspberry Pi has achieved new heights by being shipped to the&nbsp;International&nbsp;Space Station for research and by being an affordable computing platforms used by teachers throughout the world. &nbsp;"It has become the all-time best-selling computer in the UK".</p>
+ <p><img src="https://d34hb2g9mvfppu.cloudfront.net/m/images/cache/images/2016/02/29/apcnews2012raspberry_pi_logo_mainimage8_jpg8_322_27630a8388eb_lg.jpg" /></p>
<p>Raspberry Pi 3 - A credit card sized PC that only costs $35 - Image: Raspberry Pi Foundation</p>
<p>Raspberry Pi Foundation is charity organization that pushes for a digital revolution with a mission to inspire kids to learn by&nbsp;creating computer-powered objects. &nbsp;The foundation also helps teachers learn computing &nbsp;skills through free training and readily available tutorials &amp; example code for creating cool things such as music.</p>
+ <p><img src="https://d34hb2g9mvfppu.cloudfront.net/m/images/cache/images/2016/02/29/teachers_classroom_guide_324_a221bf31d64c_lg.png" /></p>
<p>Raspberry Pi in educations - Image: Raspberry Pi Foundation</p>
<p>In celebration of their 4th year&nbsp;anniversary, the foundation has released&nbsp;<b>Raspberry Pi 3</b> with the same price tag of<b>&nbsp;</b>$35 USD. &nbsp;The 3rd revision features a <b>1.2GHz 64-bit quad-core</b>&nbsp;ARM CPU with integrated Bluetooth 4.1 and 802.11n wireless LAN chipsets. &nbsp;The ARM Cortex-A53 CPU along with other architectural enhancements making it the fastest Raspberry Pi to-date. &nbsp;The 3rd revision is reportedly about 50-60% times faster than its predecessor Raspberry Pi 2 and about 10 times faster then the original Raspberry PI.</p>
<p>Raspberry Pi - Various Usage</p>
2 changes: 1 addition & 1 deletion test/test-pages/yahoo-1/expected.html
@@ -2,7 +2,7 @@
<div id="Col1-0-ContentCanvas-Proxy" data-reactid="406">
<article data-uuid="80b35014-fba3-377e-adc5-47fb44f61fa7" data-type="story" data-reactid="408">
<figure data-type="image" data-reactid="409">
- <p><img alt="The PlayStation VR" src="http://l1.yimg.com/ny/api/res/1.2/589noY9BZNdmsUUQf6L1AQ--/YXBwaWQ9aGlnaGxhbmRlcjtzbT0xO3c9NzQ0O2g9NjY5/http://media.zenfs.com/en/homerun/feed_manager_auto_publish_494/4406ef57dcb40376c513903b03bef048" /></p>
+ <img alt="The PlayStation VR" src="http://l1.yimg.com/ny/api/res/1.2/589noY9BZNdmsUUQf6L1AQ--/YXBwaWQ9aGlnaGxhbmRlcjtzbT0xO3c9NzQ0O2g9NjY5/http://media.zenfs.com/en/homerun/feed_manager_auto_publish_494/4406ef57dcb40376c513903b03bef048" />
<div data-reactid="413">
<figcaption title="Sony’s PlayStation VR." data-reactid="414">
<p>Sony’s PlayStation VR.</p>