Sunday, February 22, 2015

Streamline a hopeless task

I previously mentioned how to use YARIP to block search results on job boards.  Given the utter futility of trying to use the search tools on these sites to sort the unending stream of useless garbage results, I have gone on a bit of a crusade to reduce the amount of time it takes to sweep the boards I frequent.  Here I will include the YARIP blacklist templates I've compiled for these sites, though the keyword blacklisting technique for enforcing search exclusion is a bit inelegant.  Since the same keyword list would ideally be used for all sites, this sort of multiple-site element blacklisting would make more sense as a userscript than in YARIP.  Since I suck at writing JavaScript, I just wrote a bash script to generate the XML files for YARIP instead.
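For anyone who wants to do the same, here is a minimal sketch of the keyword-expansion half of such a script. It is not my actual script: the function name `make_xpath` and the example keywords are made up for illustration, and it only prints one XPath expression per keyword in the same shape as the templates in this post. Wrapping the output in YARIP's XML is left out, since that depends on the export format of your YARIP version.

```shell
#!/usr/bin/env bash
# Sketch only: expand a keyword list into case-insensitive XPath
# expressions. make_xpath is a hypothetical helper, not part of YARIP.

UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER='abcdefghijklmnopqrstuvwxyz'

# make_xpath CONTAINER PATH KEYWORD
#   CONTAINER: element to hide (e.g. li)
#   PATH:      child path from the container to the title node (e.g. div/h3/a)
#   KEYWORD:   text to match; lowercased here because translate()
#              lowercases the title text before contains() compares it
make_xpath() {
    local container="$1" path="$2" keyword="$3"
    keyword=$(printf '%s' "$keyword" | tr '[:upper:]' '[:lower:]')
    printf "//%s[child::%s[contains(translate(., '%s', '%s'), '%s')]]\n" \
        "$container" "$path" "$UPPER" "$LOWER" "$keyword"
}

# Example keywords (placeholders; substitute your own list).
keywords=("senior" "manager")

for kw in "${keywords[@]}"; do
    make_xpath li div/h3/a "$kw"
done
```

Keywords containing a single quote would break the generated expression, so this sketch assumes plain words. The container and path arguments are the only parts that differ from site to site, which is why one keyword list can be reused across all of the templates.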

//div[child::div/div/div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
For LinkedIn:
//li[child::div/h3/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
//div[child::h2/a[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
//li[child::a/h4/span[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[2][@class='col-sm-8 col-md-6']/ul[@class='list-group list-group-lined margin-bottom-sm']/li[28][@class='list-group-item list-group-item-registration-panel']
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[3][@class='col-sm-4 col-md-3']
//body[@id='body']/div[4][@class='container']/div[@class='row']/div[@class='col-md-12 ']/div[@class='panel panel-default']/div[@class='panel-body no-padding']/div[@class='row']/div[3][@class='col-sm-4 col-md-3']/div[3][@class='row ng-scope']/div[@class='col-xs-12']/ul[@class='list-group list-group-lined no-margin-bottom-sm']/li[2][@class='list-group-item']
//body[@id='body']/header/nav[1][@class='navbar navbar-default navbar-fixed-top']
//div[@class='job-companylogo hidden-xs']
//div[@id='TopNav']/div[@class='navbar-left navbar-center ng-scope']
//img[@class='Areas/Jobs/Search-Index/FeaturedJobStar.gif job-featured-icon']
//li[@class='list-group-item google-ad-zone-bg']
//li[child::div/a/h4[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),  'keyword')]]

[Images: search results with cleanup and keyword exclusion blacklists]

As the images hopefully convey, a bit of work with YARIP blacklists makes it easy to see at a glance if there are any interesting results on a given page.  Removing the clutter and known irrelevant chaff speeds up the process and makes the experience much less frustrating ... though it does nothing to make filling out broken applications any less of a nightmare.  Of course, it takes a while to build a suitable blacklist of keywords based on what you are and are not looking for.

The hazard that comes with forcing search relevance is that you may find that very few posted jobs fit even an unreasonably broad set of criteria.  With my short list of keywords to essentially enforce "entry level electrical engineer", I eliminated roughly 60% to 80% of all search results.  If something similar happens to you, do not be alarmed, but know that the people who speak of booming demand for tech jobs and a robust economic recovery are not merely mistaken, but lying.