Friday, November 21, 2014

Remove unwanted results and other garbage from Google searches

Do I even have to explain why this is useful?  I would assume that nobody naturally enjoys having their daily tasks obstructed by useless garbage injected into the information they're trying to parse, but the fact that part of this solution addresses social media links tells me this is probably not the case.

As usual, YARIP comes to the rescue.  Each rule below is an XPath expression that selects the element to remove.  To get rid of the top, bottom, and side ads:
//div[@id='tvcap']
//div[@id='bottomads']
//div[@id='rhs_block']
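
If you want to see what rules like these select before trusting them, here's a minimal sketch in Python.  The page markup is a made-up, heavily simplified stand-in for a results page and strip_matches is a hypothetical helper.  It says nothing about how YARIP works internally; it only shows the same expressions picking elements out by id.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a results page.
PAGE = """
<html><body>
  <div id="tvcap">top ads</div>
  <div id="res">organic results</div>
  <div id="rhs_block">side ads</div>
  <div id="bottomads">bottom ads</div>
</body></html>
""".strip()

RULES = ["//div[@id='tvcap']", "//div[@id='bottomads']", "//div[@id='rhs_block']"]


def strip_matches(root, rules):
    """Remove every element matched by any rule; return the ids removed."""
    removed = []
    for rule in rules:
        # ElementTree only accepts relative paths, so '//div' becomes './/div'.
        for element in root.findall("." + rule):
            # ElementTree has no parent pointer, so search for the parent.
            parent = next(p for p in root.iter() if element in list(p))
            parent.remove(element)
            removed.append(element.get("id"))
    return removed


root = ET.fromstring(PAGE)
removed_ids = strip_matches(root, RULES)
remaining_ids = [div.get("id") for div in root.iter("div")]
print("removed:", removed_ids)
print("remaining:", remaining_ids)
```

Only the div with the organic results survives in this toy page.  The real markup is far messier and changes whenever Google feels like it, so treat the ids as things to re-check in the page source.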
To get rid of extended results boxes for local business locations:
//div/li[@id='lclbox']
To remove all listings from a particular website from the search results:
//li[child::div/div/div/div/cite[contains(., 'pinterest.com')]]
//li[child::div/div/div/div/cite[contains(., 'facebook.com')]]
//li[child::div/div/div/div/cite[contains(., 'twitter.com')]]
//li[child::div/div/div/div/cite[contains(., 'huffingtonpost.com')]]
//li[child::div/div/div/div/cite[contains(., 'pitchfork.com')]]
While it's certainly nice to be able to slap that useless trash off the visible web, there are other, more useful strategies for single-site blocking.  How about getting rid of sites that just dilute the results with repeated similar pages?
//li[child::div/div/div/div/cite[contains(., 'alibaba.com')]]
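
Since every one of the site rules above is the same template with a different domain, the list can be generated instead of typed.  A rough sketch, assuming the cite nesting shown above still matches the page; site_rule and BLOCKED are names I made up for illustration:

```python
def site_rule(domain):
    """Return a rule matching any result whose <cite> text mentions domain."""
    if "'" in domain:
        # A single quote would terminate the XPath string literal early.
        raise ValueError("domain must not contain a single quote")
    return "//li[child::div/div/div/div/cite[contains(., '%s')]]" % domain


# Hypothetical block list; edit to taste.
BLOCKED = ["pinterest.com", "facebook.com", "twitter.com", "alibaba.com"]

rules = [site_rule(domain) for domain in BLOCKED]
print("\n".join(rules))
```

Paste the printed lines in as rules.  If the nesting depth of the result markup changes, only the template string needs updating.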
Google does provide several different forms of "extended results", such as the local business listings, maps, conversion utilities, image search previews, and YouTube items.  Each of these can also be blocked depending on what irks you and what your browsing habits cause you to encounter.

Exclusion through YARIP is also a quick cure for frustration on any other sites that tend to return volumes of irrelevant or unwanted results for specific searches, or where the available search tools don't allow exclusions.  One of my current favorite applications is dealing with poor search refinement tools on internet job boards.  XPath 1.0 has no lowercase function, so translate() is used to map each uppercase letter to its lowercase counterpart before contains() does the comparison.  For example, perform case-insensitive exclusion within the job title field on LinkedIn:
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'sales engineer')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'senior')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'federal government')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'information systems')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'information technology')]]
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'executive')]]
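
The title rules above can be generated the same way.  The sketch below also emulates the translate()/contains() test on plain strings so you can check a phrase before adding it.  title_rule and would_match are hypothetical helpers, and the h3/a nesting is simply copied from the rules above.  One thing it makes obvious: only the title gets lowercased, so the phrase in the rule has to be lowercase already or it will never match.

```python
import string

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase


def title_rule(phrase):
    """Build a case-insensitive exclusion rule for a job-title phrase."""
    if "'" in phrase:
        # A single quote would terminate the XPath string literal early.
        raise ValueError("phrase must not contain a single quote")
    # Lowercase the phrase here because translate() only touches the title.
    return "//li[child::div/h3/a[contains(translate(., '%s', '%s'), '%s')]]" % (
        UPPER, LOWER, phrase.lower())


def would_match(title, phrase):
    """Emulate contains(translate(title, UPPER, LOWER), phrase) on strings."""
    lowered = title.translate(str.maketrans(UPPER, LOWER))
    return phrase.lower() in lowered


for phrase in ["sales engineer", "senior", "executive"]:
    print(title_rule(phrase))

print(would_match("Senior Sales Engineer", "senior"))
print(would_match("Junior Developer", "senior"))
```

Note that the alphabet strings only cover A through Z, so accented capitals in a title are left as they are.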
Of course, each site and set of search tools has its own limitations, and it's the conflict between these limitations and your usage patterns that will dictate how useful any of this might be.
