Showing posts with label YARIP. Show all posts

Saturday, September 12, 2015

Remove unwanted garbage from YouTube sidebar

Nothing much to say about this one, really.  Eliminate useless clickbait bullshit from the related videos sidebar in YouTube.  Why are there 20 videos for preschool kids in the sidebar when I'm watching a video about running a lathe?  I don't want to see that shit ever again.

The way I look at it, there are at least three classes of video links that can be identified as disposable:
  • Videos from certain channels
  • Videos with an astronomical number of views
  • Videos that are "Recommended"
I eliminate these targets with YARIP.  There is probably more worth blacklisting, but I got tired of stabbing blindly at other rules while exploring the limited functions of XPATH 1.0.

//li[child::div/a/span[@class="stat attribution"]/span[contains(.,'Busy Beavers')]]
//li[child::div/a/span[@class="stat attribution"]/span[contains(.,'BuzzFeed')]]
//li[child::div/a/span[@class="stat attribution"]/span[contains(.,'Danger Dolan')]]
//li[child::div/a/span[@class="stat view-count" and number(translate(substring-before(.," "),',','')) > 3000000]]
//li[child::div/a/span[@class="stat view-count" and contains(., 'Recommended')]]

The channel names and the view count limit can be tailored to your own experience and taste.
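For anyone tuning the view count cutoff, here is a rough JavaScript sketch of what the view-count rule above computes: take the text before the first space, strip the commas, and compare the number against the limit. The function names and the sample strings in the usage note are assumptions made for illustration; none of this is part of YARIP.

```javascript
// Hypothetical helper mirroring number(translate(substring-before(., " "), ',', ''))
function parseViewCount(text) {
  const spaceAt = text.indexOf(' ');
  // XPath substring-before() yields an empty string when there is no space
  const head = spaceAt === -1 ? '' : text.slice(0, spaceAt);
  // Same job as translate(..., ',', ''): drop every comma
  const digits = head.split(',').join('');
  // XPath number('') is NaN, while Number('') would be 0, so handle it explicitly
  return digits.trim() === '' ? NaN : Number(digits);
}

// True when the entry would be caught by the view-count rule
function exceedsViewLimit(text, limit = 3000000) {
  // Any comparison with NaN is false, so unparseable entries are left alone
  return parseViewCount(text) > limit;
}
```

Under these assumptions, exceedsViewLimit('3,456,789 views') is true, while a string such as 'Recommended for you' parses to NaN and is left for the separate "Recommended" rule to catch.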

Sunday, February 22, 2015

Streamline a hopeless task

I mentioned previously how to use YARIP to block search results on job boards.  Given the utter futility of trying to use the search tools on these sites to sort the unending stream of useless garbage results, I have gone on a bit of a crusade to reduce the amount of time it takes to sweep the boards I frequent.  Here I will include the YARIP blacklist templates I've compiled for these sites, though the keyword blacklisting technique for enforcing search exclusion is a bit inelegant.  Since the same keyword list would ideally be used on every site, it would make more sense to do this sort of multiple-site element blacklisting with a userscript rather than YARIP.  Since I suck at writing Javascript, I just wrote a bash script to generate the XML files for YARIP.

For jobs.ieee.org:
//div[@id='ad_module_120x90']
//div[@id='ad_module_300x250']
//div[child::div/div/div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
For LinkedIn:
//div[@id='header']
//div[@id='responsive-nav-scrollable']
//div[@id='top-header']
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
For Indeed.com:
//div[@id='bjobalerts']
//div[@id='femp_list']/div[2][@class='femp_item']
//div[@id='serpRecommendations']
//div[@id='tjobalerts']
//div[child::h2/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
For Ziprecruiter.com:
//div[@id='email_alert_form_wrapper']
//li[child::a/h4/span[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
For Beyond.com:
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[2][@class='col-sm-8 col-md-6']/ul[@class='list-group list-group-lined margin-bottom-sm']/li[28][@class='list-group-item list-group-item-registration-panel']
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[3][@class='col-sm-4 col-md-3']
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[3][@class='col-sm-4 col-md-3']/div[3][@class='row ng-scope']/div[@class='col-xs-12']/ul[@class='list-group list-group-lined no-margin-bottom-sm']/li[2][@class='list-group-item']
//body[@id='body']/header/nav[1][@class='navbar navbar-default navbar-fixed-top']
//div[@class='job-companylogo hidden-xs']
//div[@class='job-footer']
//div[@id='Display_Header_Top_Text_Position1']
//div[@id='TopNav']/div[@class='navbar-left navbar-center ng-scope']
//img[@class='Areas/Jobs/Search-Index/FeaturedJobStar.gif job-featured-icon']
//li[@class='list-group-item google-ad-zone-bg']
//li[child::div/a/h4[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
Beyond.com search results with cleanup and keyword exclusion blacklists

Indeed.com search results with cleanup and keyword exclusion blacklists
As the images hopefully convey, a bit of work with YARIP blacklists makes it easy to see at a glance if there are any interesting results on a given page.  Removing the clutter and known irrelevant chaff speeds up the process and makes the experience much less frustrating ... though it does nothing to make filling out broken applications any less of a nightmare.  Of course, it takes a while to build a suitable blacklist of keywords based on what you are and are not looking for.
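Since every site above uses the same keyword rule with a different container and title path, the per-keyword lines can be generated from a single list. The following is a minimal JavaScript sketch of that idea, not the bash script mentioned earlier. The SITES table is copied from the templates above, the function names are made up for illustration, and it only emits the XPATH strings; it makes no attempt to write YARIP's XML format, which is not described here.

```javascript
// Lowercasing idiom shared by all of the keyword rules
const LOWER = "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";

// Container element and title path per site, taken from the templates above
const SITES = {
  'jobs.ieee.org': { container: 'div', titlePath: 'div/div/div/h3/a' },
  'LinkedIn': { container: 'li', titlePath: 'div/h3/a' },
  'Indeed.com': { container: 'div', titlePath: 'h2/a' },
  'Ziprecruiter.com': { container: 'li', titlePath: 'a/h4/span' },
  'Beyond.com': { container: 'li', titlePath: 'div/a/h4' },
};

// Build one keyword exclusion rule for one site (hypothetical helper)
function keywordRule(site, keyword) {
  const entry = SITES[site];
  if (!entry) throw new Error('unknown site: ' + site);
  // A single-quoted XPath 1.0 literal cannot contain a single quote
  if (keyword.includes("'")) throw new Error('keyword contains a quote: ' + keyword);
  // The title is lowercased by translate(), so the keyword must be lowercase too
  const needle = keyword.toLowerCase();
  return `//${entry.container}[child::${entry.titlePath}[contains(${LOWER}, '${needle}')]]`;
}

// Build the full rule list for one site from a shared keyword list
function keywordRules(site, keywords) {
  return keywords.map((keyword) => keywordRule(site, keyword));
}
```

Calling keywordRules('Indeed.com', ['senior', 'executive']) would then give two lines ready to paste into the blacklist for that site, and the same keyword list can be reused for every entry in SITES.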

The hazard that comes with forcing search relevance is that you may find that very few jobs are posted that will fit even an unreasonably broad set of criteria.  With my short list of keywords to essentially enforce "entry level electrical engineer", I eliminated roughly 60% to 80% of all search results.  If something similar happens to you, do not be alarmed, but know that the people who speak of booming demand for tech jobs and a robust economic recovery are not merely mistaken, but lying.

Friday, November 21, 2014

Remove unwanted results and other garbage from google searches

Do I even have to explain why this is useful?  I would have assumed that people don't naturally enjoy having their daily tasks obstructed by the injection of useless garbage into the information they're trying to parse, but the fact that part of this solution addresses social media links tells me this is probably not the case.

As usual, YARIP comes to the rescue.  To get rid of the top, bottom, and side ads:
//div[@id='tvcap']
//div[@id='bottomads']
//div[@id='rhs_block']
To get rid of extended results boxes for local business locations:
//div/li[@id='lclbox']
To remove all listings from a particular website from the search results:
//li[child::div/div/div/div/cite[contains(., 'pinterest.com')]]
//li[child::div/div/div/div/cite[contains(., 'facebook.com')]]
//li[child::div/div/div/div/cite[contains(., 'twitter.com')]]
//li[child::div/div/div/div/cite[contains(., 'huffingtonpost.com')]]
//li[child::div/div/div/div/cite[contains(., 'pitchfork.com')]]
While it's certainly nice to be able to slap that useless trash off the visible web, there are other more useful strategies for single-site blocking.  How about getting rid of sites that just dilute the results with repeated similar pages?
//li[child::div/div/div/div/cite[contains(., 'alibaba.com')]]
Google does provide several different forms of "extended results", such as the local business listings, maps, conversion utilities, image search previews, and YouTube items.  Each of these can also be blocked depending on what irks you and what your browsing habits cause you to encounter.

Exclusion through YARIP is also a quick cure for frustration on any other sites that tend to return volumes of irrelevant or unwanted results for specific searches or where available search tools don't allow exclusions.  One of my current favorite applications is dealing with poor search refinement tools on internet job boards.  For example, perform case-insensitive exclusion within the job title field on LinkedIn:
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'sales engineer')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'senior')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'federal government')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'information systems')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'information technology')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'executive')]]
Of course, each site and set of search tools has its own limitations and it's the conflict between these limitations and your usage patterns that will dictate how useful any of this might be. 
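The same exclusion could in principle be done from a userscript instead of YARIP. The sketch below is only that, a sketch: it assumes the LinkedIn structure implied by the rules above (an li containing div/h3/a), the KEYWORDS list and function names are hypothetical, and it hides matching results rather than removing them the way YARIP does. Note also that toLowerCase() lowercases more than the plain A-Z that translate() handles, so the two are only roughly equivalent.

```javascript
// Hypothetical keyword list; adjust to taste
const KEYWORDS = ['sales engineer', 'senior', 'executive'];

// Case-insensitive substring test, roughly what contains(translate(...)) does
function titleIsBlocked(title, keywords) {
  const lowered = title.toLowerCase();
  return keywords.some((keyword) => lowered.includes(keyword.toLowerCase()));
}

// Hide every <li> whose div/h3/a title matches a keyword; returns the number hidden
function hideBlockedResults(doc, keywords) {
  const titles = doc.evaluate('//li/div/h3/a', doc, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  let hidden = 0;
  for (let i = 0; i < titles.snapshotLength; i++) {
    const link = titles.snapshotItem(i);
    if (titleIsBlocked(link.textContent, keywords)) {
      // The nearest enclosing <li> is the search result entry itself
      link.closest('li').style.display = 'none';
      hidden++;
    }
  }
  return hidden;
}

// Only touch the page when actually running in a browser
if (typeof document !== 'undefined' && typeof XPathResult !== 'undefined') {
  hideBlockedResults(document, KEYWORDS);
}
```

Keeping the match test separate from the DOM walk means the keyword logic can be checked on plain strings, and only the XPATH and the closest('li') call need changing for a different site.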

Monday, October 20, 2014

Kill internet trolls with fire

Well maybe it's not actual fire, but it certainly gives me a warm feeling inside.

Do you frequent any sites where the user comment section is overrun with trolls, plants, and paid shills? Are you sick of their comments reminding you that they remain capable of breathing? Use YARIP to eliminate all their comments from view. Just use Firefox's element inspector (use the right-click menu) to take a peek at the local structure and put together an appropriate XPATH to use in YARIP.

For example, delete entire posts based on a partial name match on reason.com:
//li[child::p/strong[contains(., 'Viagra')]]
Or delete just the post content based on a full name match on zerohedge.com:
//div[child::div/b/a[@href="/users/bangalore-equity-trader"]]/div[@class='comment-content']
Adapt to suit your browsing habits.
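One snag when adapting the name match: XPATH 1.0 string literals have no escape character, so a name containing an apostrophe cannot simply be dropped between single quotes. A common workaround is to switch quote styles, or to fall back on concat() when the name contains both kinds of quote. Here is a minimal JavaScript sketch of that, with hypothetical function names, which builds the reason.com style rule shown above:

```javascript
// Turn an arbitrary string into a valid XPath 1.0 string literal
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Both quote types present: split on single quotes and rejoin with concat()
  const parts = text.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Rule that deletes a whole post on a partial name match (hypothetical helper)
function commentRule(name) {
  return `//li[child::p/strong[contains(., ${xpathLiteral(name)})]]`;
}
```

For a plain name this reproduces the rule as written above; a name with an apostrophe comes out wrapped in double quotes instead.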

Always expand YouTube video descriptions

I never understood the point of creating website content and then persistently hiding it from users. Do stores put their merchandise on top of tall greased poles to improve sales?  With Google continually finding ways to obfuscate interfaces for no clear reason, I'm reluctantly conditioned to lose hope in the suspicion that the defining characteristic of the modern web is waste without reason -- but masturbatory social media should have made that conclusion obvious long ago.  It's a world of greased poles either way. 

Today's drop in the sea, the pinprick distracting from a thousand knife wounds, is the YouTube video watch page.  Why hide the video description? Fix that fucking shit with YARIP.  The following ugly attribute deletion expands the description, and the second XPATH removes the more/less buttons.
//div[@id='action-panel-details']/attribute::class
//div[@id='action-panel-details']/button
As an alternative and more appropriate approach, instead of using the first line to delete the class attribute for 'action-panel-details', YARIP can be configured to perform an attribute substitution.  From YARIP's page manager, under the attribute tab, the class attribute for the 'action-panel-details' element can simply be set to
'action-panel-content yt-card yt-card-has-padding yt-uix-expander'
As a bonus, try this in YARIP to get rid of a fraction of the unrelated garbage in the sidebar. 
//li[child::a/span[text()="Recommended for you"]]
I'd like to find a way to fix the comment sorting, but I haven't had much luck so far.  It's not exactly as if I know what I'm doing.

Alternatively, the expansion itself can be accomplished using a userscript.   I use Scriptish, but Greasemonkey may also work. 
// ==UserScript==
// @name          YouTube Show Description
// @description   show the full description for fucks sake
// @namespace     http://www.stopbreakingyoutubeyouassholes.com
// @include       https://www.youtube.com/watch?*
// ==/UserScript==

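// Skip quietly if the panel is missing rather than throwing a TypeError
var details = document.getElementById('action-panel-details');
if (details) {
    // Replace the class list so the description renders expanded
    details.className = 'action-panel-content yt-card yt-card-has-padding yt-uix-expander';
}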