Daily Archive for July 27th, 2009

Scanners, Again

I’ve already ranted about my document scanner suite. I have recently updated it to add new features.

The basic workflow goes like this:

  • You run the “scan” command. This usually happens by clicking the desktop icon for the launcher, but you can also run it on a command line.
  • The program prompts you for a document name. Aside from being different from any existing document name (to avoid accidental overwriting) you are free to choose any valid file name.
  • The program starts scanning pages. After each scan, a preview is shown and you can accept the page or scan it again. After each accepted page, you can scan another page or stop scanning.
  • Every scanned page is saved to TIFF on the fly. Once all pages have been retrieved, they are converted to PNM, then to DJVU. This conversion step takes around two minutes per page on my computer. Then, all DJVU files are bundled together as a single file.
  • The bundled DJVU is stored both locally and on a backup server through FTP.

Once the manual scan-preview-confirm process has ended, the lengthy compression and upload stage starts. That stage needs no input unless a command fails, so it is possible to start scanning another document (or do something else) while it finishes.

I have also reduced the resolution from 300 dpi to 150 dpi, as it remains quite readable. This has resulted in a reduction in file size from around 8MiB PNG files to 2MiB TIFF files, which are in turn compressed to 1MiB DJVU files. My current library of scanned pages (mostly administrative documents, reports and contracts) weighs in at around 150MiB instead of the previous 1.1GiB.
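As a rough sanity check on these figures, the uncompressed size of an 8-bit grayscale scan can be estimated from the scan area and the resolution. The sketch below assumes the 215 mm by 297 mm area and Gray mode that the program passes to scanimage; the function names are made up for illustration, and the scanner driver may round pixel counts differently.

```ocaml
(* Estimate the uncompressed size of an 8-bit grayscale scan.
   Assumption: one byte per pixel, no header or compression. *)
let mm_per_inch = 25.4

(* Number of pixels along one side, rounded to the nearest integer. *)
let pixels_along mm dpi =
  int_of_float (mm /. mm_per_inch *. float_of_int dpi +. 0.5)

(* Total bytes for a page of the given size at the given resolution. *)
let gray_scan_bytes ~width_mm ~height_mm ~dpi =
  pixels_along width_mm dpi * pixels_along height_mm dpi

let mib bytes = float_of_int bytes /. (1024. *. 1024.)

let () =
  List.iter
    (fun dpi ->
       let bytes = gray_scan_bytes ~width_mm:215. ~height_mm:297. ~dpi in
       Printf.printf "%d dpi: about %.1f MiB\n" dpi (mib bytes))
    [300; 150]
```

Halving the resolution in both directions quarters the pixel count, which is in line with the drop from around 8MiB to around 2MiB per page. The old figure was for PNG files, so the comparison is only approximate.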

Below is a scan of Papier d’Arménie made by my delightful assistant:

The Objective Caml source code for running this little baby follows below:

exception CommandFailed of int

let run command =
  print_endline command ;
  let result = Sys.command command in
    if result <> 0 then raise (CommandFailed result)

let ask request =
  print_endline ( "# " ^ request ) ;
  read_line ()

let tmp ext =
  Filename.temp_file "" ext

let say format =
  Printf.printf ("# " ^^ format)

(* Scan a page, display the result, ask if the user wants to keep it
   (tries again until it gets the scan right) and returns the filename
   where the successful scan was saved. *)
let rec scan_to_tiff () =
  let file = tmp ".tiff" in
    run ("scanimage -l 0 -t 0 -x 215 -y 297 --brightness -22 "
         ^ "--contrast 22 --resolution 150 --progress --mode Gray "
         ^ "--format=tiff > " ^ file) ;
    run ("display " ^ file) ;
    if ask "keep this page? [Yn]" <> "n" then
      file
    else
      scan_to_tiff ()

(* Scan individual pages (using scan_to_tiff) until the user decides to
   stop. If an individual scan fails due to system errors, allows retrying.
   Returns the list of all filenames the user agreed with. *)
let rec scan_list_to_tiff () =
  try
    let file = scan_to_tiff () in
      if ask "scan another page? [Yn]" <> "n" then
        file :: scan_list_to_tiff ()
      else
        [file]
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then
      scan_list_to_tiff ()
    else
      []

(* Turn individual image into djvu image. Returns djvu filename
   if successful. *)
let tiff_to_djvu file =
  let pnm = tmp ".ppm" in
  let djvu = tmp ".djvu" in
    run ( "convert " ^ file ^ " " ^ pnm ) ;
    run ( "cpaldjvu " ^ pnm ^ " " ^ djvu ) ;
    djvu

(* Turn a set of images into individual djvu pages. Allow skipping
   or retrying on error during the conversion process. *)
let rec tiff_list_to_djvu_list = function
  | [] -> []
  | file :: list ->
    try
      tiff_to_djvu file :: tiff_list_to_djvu_list list
    with CommandFailed i ->
      say "command failed with exit code %d\n" i ;
      if ask "try again? [Yn]" <> "n" then
        tiff_list_to_djvu_list (file :: list)
      else
        tiff_list_to_djvu_list list

(* Turn a list of individual djvu files into a bundled djvu file. *)
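let rec make_djvu_bundle file list =
  try
    match list with
    | [] -> false
    | [single] ->
      (* A single page needs no bundling: copy it to its final name. *)
      ( run ( "cp " ^ single ^ " " ^ Filename.quote file ) ; true )
    | _ ->
      (* djvm needs -c to create a new bundled document. The target name
         comes from the user, so it is quoted for the shell. *)
      ( run ( "djvm -c " ^ Filename.quote file ^ " "
              ^ String.concat " " list ) ; true )
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then
      make_djvu_bundle file list
    else
     ( say "scan aborted\n" ; false )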

(* Choose a name for the output djvu file *)
let rec choose_djvu_filename () =
  let path = "/home/arkadir/docs/" in
  let name = ask "document name (extension will be added automatically) ?" in
    if name = "" || name <> Filename.basename name then
      ( say "incorrect filename\n" ; choose_djvu_filename () )
    else if Sys.file_exists (Filename.concat path (name ^ ".djvu")) then
      ( say "file already exists\n" ; choose_djvu_filename () )
    else
      Filename.concat path (name ^ ".djvu")

(* Upload a file to an ftp server. *)
let rec upload_file file =
  try
    run ( "ncftpput -f /home/arkadir/docs/ftp.cfg /home/www/blog/docs "
          ^ Filename.quote file )
  with CommandFailed i ->
    say "command failed with exit code %d\n" i  ;
    if ask "try again? [Yn]" <> "n" then
      upload_file file
    else
      say "upload aborted\n"

(* Complete process *)
let _ =
  let name = choose_djvu_filename () in
  let files = tiff_list_to_djvu_list (scan_list_to_tiff ()) in
    if make_djvu_bundle name files then
      upload_file name

The program requires the classic DjVuLibre utilities (cpaldjvu and djvm), as well as ImageMagick (convert and display) and NcFTP (ncftpput). Scanning is done with SANE (scanimage). Some files are also uploaded to my web server, where I use “convert -thumbnail” to create thumbnails from the DJVU files.
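
For illustration, the thumbnail step could be driven from OCaml in the same style as the rest of the suite. This is a guess at the shape of such a command, not the one used on the server: the 200x200 bounding box, the PNG output, the "[0]" first-page selector and the function name are all assumptions, and ImageMagick must have been built with DjVu support for it to read the input.

```ocaml
(* Build an ImageMagick command that renders the first page of a DjVu
   file as a thumbnail. [size] is a hypothetical default geometry; both
   file names are quoted for the shell. *)
let thumbnail_command ?(size = "200x200") djvu png =
  Printf.sprintf "convert %s -thumbnail %s %s"
    (Filename.quote (djvu ^ "[0]")) size (Filename.quote png)

(* Print the command instead of running it, so the sketch has no side
   effects. Passing the string to Sys.command would execute it. *)
let () =
  print_endline (thumbnail_command "report.djvu" "report.png")
```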


