Shell Scripting for Fun and Profit: Finding, Downloading, and Merging Course Slides into a Single PDF

    Motivation

    Professors often provide a page with links to lecture slides in PDF format. The main benefit of joining the slides into a single document is that a PDF reader’s search function can then find topics and keywords across all lectures at once, instead of one file at a time.

    The Script

    Set the URL variable to the page that links to the lecture slides, and edit the grep and pdfjoin lines so that their patterns match the file names of the provided slides.

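    #!/bin/sh
    # Download all lecture slides linked from a course page and join them
    # into a single PDF. Needs wget, lynx and pdfjoin (from pdfjam).
    set -e

    TARGET_FILE=all-lectures.pdf
    URL="http://www.ida.liu.se/~TDDC88/theory/lectures.shtml"

    # Check that every required tool is installed before doing any work.
    for prog in wget lynx pdfjoin; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog needed but not found." >&2
            exit 1
        fi
    done

    # List all links on the page, skip the header lines, keep the URL column,
    # keep only the slide files and drop links that appear more than once.
    PDF_URLS=$(
        lynx -listonly -dump -hiddenlinks=merge "$URL" \
        | tail -n +4 \
        | awk '{print $2}' \
        | grep 'lecture-.*-pps6\.pdf' \
        | sort -u
        )
    if [ -z "$PDF_URLS" ]; then
        echo "No lecture slides found at $URL." >&2
        exit 1
    fi
    # Number of slide files (used unquoted below, which drops any padding from wc).
    COUNT=$(printf '%s\n' "$PDF_URLS" | wc -l)

    START_DIR=$(pwd)
    TEMPDIR=$(mktemp -d)
    # Remove the temporary directory however the script exits.
    trap 'rm -rf "$TEMPDIR"' EXIT
    cd "$TEMPDIR"

    echo "Fetching PDFs..."
    # Unquoted on purpose: each URL becomes a separate argument.
    wget $PDF_URLS

    echo "Joining documents..."
    # Expand lecture-1-*, lecture-2-*, ... in numeric order; a plain glob
    # would put lecture-10 before lecture-2. The joined file ends in -joined.pdf.
    pdfjoin $(seq -f 'lecture-%g-*-pps6.pdf' 1 $COUNT)

    cd "$START_DIR"
    mv "$TEMPDIR"/*-joined.pdf "$TARGET_FILE"
    echo "PDF $TARGET_FILE was generated."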
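
    The grep and pdfjoin patterns can be checked offline before pointing the script at a real page. The sketch below is not part of the original script: its sample link list only imitates the numbered output of lynx -listonly -dump (the script skips the first three lines with tail), and every URL and file name in it is made up. The second half shows why the pdfjoin line builds one pattern per lecture number with seq: a plain glob sorts names character by character, so lecture-10 would be joined right after lecture-1.

```shell
#!/bin/sh
# Offline check of the grep filter and the seq/pdfjoin ordering.
# The sample data and file names are made up for illustration.
export LC_ALL=C   # byte-order sorting, so the glob result below is predictable

# 1) The link filter. SAMPLE imitates lynx's numbered link list:
#    three header lines, then "N. URL" lines.
SAMPLE='
References

   1. http://example.org/course/lecture-1-intro-pps6.pdf
   2. http://example.org/course/lecture-1-intro.pdf
   3. http://example.org/course/lecture-2-requirements-pps6.pdf
   4. http://example.org/course/exam-2012.pdf'

MATCHES=$(printf '%s\n' "$SAMPLE" \
    | tail -n +4 \
    | awk '{print $2}' \
    | grep 'lecture-.*-pps6\.pdf')
printf '%s\n' "$MATCHES"
# http://example.org/course/lecture-1-intro-pps6.pdf
# http://example.org/course/lecture-2-requirements-pps6.pdf

# 2) The join order, using ten empty stand-in files in a scratch directory.
DEMO_DIR=$(mktemp -d)
cd "$DEMO_DIR" || exit 1
for i in $(seq 1 10); do
    : > "lecture-$i-slides-pps6.pdf"
done

# A plain glob sorts by character, so lecture-10 lands right after lecture-1.
set -- lecture-*-pps6.pdf
GLOB_SECOND=$2

# One pattern per lecture number keeps the numeric order.
set -- $(seq -f 'lecture-%g-*-pps6.pdf' 1 10)
SEQ_SECOND=$2
SEQ_LAST=${10}

echo "glob: $GLOB_SECOND"   # glob: lecture-10-slides-pps6.pdf
echo "seq:  $SEQ_SECOND"    # seq:  lecture-2-slides-pps6.pdf

cd / && rm -rf "$DEMO_DIR"
```

    Note that the seq approach assumes the lectures are numbered 1 to N without gaps; a missing number leaves an unmatched pattern that pdfjoin will reject.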
    

    Future Improvements

    • Get rid of the lynx dependency
    • Download all found documents simultaneously (in background jobs, perhaps)
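
    A rough sketch of both improvements follows, under a few assumptions: the course page uses plain href="..." attributes, grep supports -o (a GNU/BSD extension, not POSIX), and extract_pdf_links and fetch_all are hypothetical helper names, not existing tools. Only absolute links and plain relative links are handled; root-relative (/...) and ../ links would need real URL resolution, which is exactly what lynx currently does for free.

```shell
#!/bin/sh
# Sketch of both future improvements; helper names and sample page are hypothetical.

# Print the .pdf links found in the HTML on stdin, one per line.
# Absolute http(s) links are printed unchanged; any other link is assumed
# to be relative to the directory of the page URL given as $1.
# Root-relative ("/...") and "../" links are NOT resolved by this sketch.
extract_pdf_links() {
    base_dir=${1%/*}/
    grep -o 'href="[^"]*\.pdf"' \
    | sed -e 's/^href="//' -e 's/"$//' \
    | while IFS= read -r link; do
        case $link in
            http://*|https://*) printf '%s\n' "$link" ;;
            *) printf '%s%s\n' "$base_dir" "$link" ;;
        esac
    done
}

# Start one background wget per URL argument, then wait for each job.
# Returns non-zero if any download failed.
fetch_all() {
    pids=
    for url in "$@"; do
        wget -q "$url" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Offline demo of the link extraction on a made-up page.
PAGE_URL="http://example.org/course/lectures.html"
SAMPLE_PAGE='<ul>
  <li><a href="slides/lecture-1-intro-pps6.pdf">Lecture 1</a></li>
  <li><a href="http://files.example.org/lecture-2-design-pps6.pdf">Lecture 2</a></li>
  <li><a href="notes.html">Notes</a></li>
</ul>'
LINKS=$(printf '%s\n' "$SAMPLE_PAGE" | extract_pdf_links "$PAGE_URL")
printf '%s\n' "$LINKS"
```

    In the script, the lynx pipeline could then become wget -q -O - "$URL" | extract_pdf_links "$URL" | grep 'lecture-.*-pps6\.pdf', and wget $PDF_URLS could become fetch_all $PDF_URLS. Starting every download at once opens one connection per file, which may be hard on the course server for long lecture lists.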