Tuesday, June 16, 2015

Digitize all those binders full of notes

Ever find yourself referring to your old notes and school work?  Why not cram that giant stack of sketchy, error-riddled paper into a pdf or nine?  Just think of all the advantages!
  • they would be more portable - you could put them on a usb stick or a phone
  • they would be easier to reference while doing design work on the computer
  • they would take up less space and collect fewer dead bugs 
  • they might not get damaged by the leaky roof or eaten by termites
  • they could be shared with people who want to learn that they can't read your shitty handwriting
Maybe you could even have the prescience to approach this task before you're two years past graduation!

Just some books and binders

Over my many years wasting my life on a bad drawing habit, I've learned one thing about flatbed scanners: graphite pencil marks on paper are reflective.  At particular illumination angles, scanned or photographed pencil handwriting and drawings can be washed out very effectively.  Long ago, I abandoned scanners for drawing purposes because of this.

Instead, I decided to use my second-hand smartphone as a camera.  I grabbed some scrap metal from the pile next to the bandsaw, and after some sawing and hammering and hot glue, I had a tripod mount for the phone.  I set up a light tent and made a staging easel out of a pizza box.  After ten minutes of page flipping, I had a hundred pictures of upside-down pages that I could hardly read with my D-grade eyeballs.

Rectification and contrast enhancement

Now, how to process the images?  For most purposes, processing of scanned books or handwriting can be done with a simple procedure:
  • desaturate original image
  • make a copy
  • blur the copy with a large kernel
  • blend the blurred mask with the original in a Divide mode
  • re-adjust levels as appropriate
This works well and produces a high-contrast image.  Keep in mind, though, why it works: it's essentially a high-pass filter.  Only the low-spatial-frequency content survives the blurring operation, and the Divide blend removes that content from the original.  This strips out the effects of uneven lighting and slight paper contours, but if the page content isn't restricted to narrow lines, we'll run into problems.
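Those five steps collapse into a single ImageMagick command.  Here's a minimal, self-contained sketch of the idea; the synthetic "page", the blur sigma, and the level endpoints are placeholders I made up for illustration, not the values the real script uses:

```shell
# Synthesize a stand-in page: unevenly lit background plus a dark pencil-ish line.
convert -size 300x300 gradient:gray50-gray95 \
 -stroke gray20 -strokewidth 3 -draw "line 30,150 270,150" page.jpg

# Desaturate, divide the original by a heavily blurred copy of itself, re-level.
convert page.jpg -modulate 100,0 \
 \( +clone -blur 0x30 \) \
 -compose Divide_Src -composite \
 -level 60%,100% page-clean.jpg
```

The blurred clone carries only the low-frequency background (lighting, paper tone), so dividing by it leaves the high-frequency pencil strokes sitting on a near-white field.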

Excess filtering on pages with graphics

Let's say some of the pages have printed figures or tables; the removal of low-frequency content will tend to leave only the edges of any solid dark regions.  In the binders that contained occasional printed graphics, I used a different method for processing the printed pages.  Since most of my handwritten notes are on yellow paper, I simply processed non-yellow pages differently.  If I know a binder contains no graphics, I can skip the color-testing routine entirely.

The color-testing routine finds the average color of an annulus of the page so as to ignore content and page placement inaccuracy.  One convenience of this is that images that are processed with the high-pass filter method can be re-colorized if desired.  I personally don't find this to help with visual contrast, so I didn't use it.
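Once the average R, G, B values are in hand, the yellow test itself is just an arithmetic comparison: is the red/green average more than about 1.3 times the blue?  A standalone sketch of that comparison in plain bash integer math — the sample RGB triples below are made up, and this sidesteps the `bc` dependency the actual script uses:

```shell
# Decide whether an averaged page color "looks yellow":
# (R+G)/2 > 1.3*B, rescaled by 10 to stay in integer arithmetic.
is_yellow () {
 local R=$1 G=$2 B=$3
 if [ $(( (R + G) * 10 )) -gt $(( B * 26 )) ]; then
  echo 1
 else
  echo 0
 fi
}

is_yellow 210 190 120   # legal-pad yellow: prints 1
is_yellow 180 180 175   # plain white paper: prints 0
```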

#!/bin/bash
# process photos of coursework pages from binders
# uses slower method of contrast mask generation and overlay 

# crop geometry and rotation per binder, kept here for reference:
#581 1773x2283+45+543 180
#487 1758x2286+54+546 180
#488 2220x1716+321+36 270
#221 1785x2325+51+531 180
#GDn 1755x2394+24+657 180
#471 1803x2319+33+540 180
#537 1779x2286+45+552 180

pdfname="ECE537_Integrated_Photonics"
cropsize="1779x2286+45+552" #the rect parameters for cropping
rotateamt="180"    #how much to rotate after cropping

indir="originals" #this is the local directory name where the original images are
outdir="output"   #this is the local directory name where the script will shit out the processed pages
outqual=85   #percent jpeg quality
hconly=1   #always assume pages are handwritten (used when there aren't any printed graphics pages)
retint=0   #percent retint for yellow pages
retintmode="multiply"

# ###########################################################################
if [ "$hconly" -eq 1 ]; then 
 echo "high contrast mode"
else
 echo "auto contrast mode"
fi

page=1
for infile in "$indir"/*.jpg; do
 outfile="$outdir/output-$(printf "%04d" $page).jpg"
 jpegtran -crop $cropsize -trim -copy none "$infile" | \
 jpegtran -rotate $rotateamt -trim -outfile temp.jpg
 
 if [ "$hconly" -eq 0 ]; then 
  # get average page color excluding border and content
  imgstring=$(convert \( temp.jpg -threshold -1 -scale 95% \) \
    \( temp.jpg -threshold 100% -scale 80% \) \
   -gravity center -compose multiply -composite - | \
   convert temp.jpg - -alpha off -gravity center -compose copy_opacity -composite -resize 1x1 txt:)
  RGB=$(echo $imgstring | sed 's/ //g' | sed 's/(/ /g' | sed 's/)/ /g' | sed 's/,/ /g' | cut -d ' ' -f 6-8)
  R=$(echo $RGB | cut -d ' ' -f 1)
  G=$(echo $RGB | cut -d ' ' -f 2)
  B=$(echo $RGB | cut -d ' ' -f 3)
  isyel=$(echo "($R+$G)/2 > $B*1.3" | bc)
  #echo $imgstring
  echo $R $G $B ">> $page is yellow? >>" $isyel
 fi

 if [ "$hconly" -eq 1 ] || [ "$isyel" -eq 1 ]; then
  # if page is yellow, do 100% contrast enhancement and partial page re-tinting 
  if [ $retint != 0 ]; then 
   convert -modulate 100,0 temp.jpg - | \
   convert - \( +clone -filter Gaussian -resize 25% -define filter:sigma=25 -resize 400% \) -compose Divide_Src -composite - | \
   convert -level 70%,100% -quality 100 - temp.jpg
   convert temp.jpg \( +clone -background "rgb($R,$G,$B)" -compose Dst -flatten \) -compose $retintmode -composite - | \
   convert temp.jpg - -compose blend -define compose:args=$retint -composite - | \
   convert -quality $outqual - $outfile
  else
   convert -modulate 100,0 temp.jpg - | \
   convert - \( +clone -filter Gaussian -resize 25% -define filter:sigma=25 -resize 400% \) -compose Divide_Src -composite - | \
   convert -level 70%,100% -quality $outqual - $outfile  
  fi
 else
  # if page is not yellow, retain most color and do a 50% contrast enhancement
  convert -modulate 100,80 temp.jpg - | \
  convert - \( +clone -filter Gaussian -resize 25% -define filter:sigma=25 -resize 400% \) -compose Divide_Src -composite - | \
  convert - temp.jpg -compose blend -define compose:args=50 -composite - | \
  convert -level 25%,100% -quality $outqual - $outfile
 fi

 #echo $infile
 page=$((page+1))
done

rm temp.jpg
convert $outdir/*.jpg $pdfname.pdf


The blurring method entails a size reduction and expansion.  This has two purposes: First, it speeds up blurs with a large kernel (in this case, by about a factor of 3); second, it helps reduce vignetting effects that would otherwise be caused by a simple "-blur 0,100" operation.  If a simple blur is used, it would help to crop the page oversize, then trim it down after contrast enhancement or after the blur itself.

-blur 0,100   vs.   -filter Gaussian -resize 25% -define filter:sigma=25 -resize 400%
Difference between simple blur and resize+blur methods
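If you want to see the speedup yourself, the two approaches can be timed side by side.  A quick sketch, assuming ImageMagick's `convert` is available; the image size and sigmas are scaled-down stand-ins for the script's values (at 25% scale, a sigma of 10 corresponds to roughly sigma 40 at full resolution):

```shell
# Generate a throwaway test image; any large photo would do.
convert -size 1000x1000 gradient:white-gray40 testpage.jpg

# Simple blur at full resolution.
time convert testpage.jpg -blur 0x40 blur-simple.jpg

# Shrink, blur during the upscale with a scaled-down sigma, expand.
time convert testpage.jpg -filter Gaussian -resize 25% \
 -define filter:sigma=10 -resize 400% blur-resize.jpg
```

The two outputs won't be pixel-identical, but for the purpose of building a low-frequency contrast mask, the difference doesn't matter.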

Of course you can guess that I'd do this in bash.  This is about as ad-hoc as they come.  It's not even externally parameterized.  I was thinking about doing this with Matlab, but I decided that I'd rather pull my hair out trying to do image stacks in ImageMagick.  Tell it where the files are, how they should be checked and compressed, and the script will grind them into a pdf for you.  I highly doubt anyone would ever actually use this ugly code, but I'm posting it anyway because I have nothing else to do. Still, I'm not dumb enough to post my horrible notes in whole.
