I’ve already ranted about my document scanner suite, and I have recently updated it with new features.
The basic workflow goes like this:
- You run the “scan” command. Usually this means clicking the launcher's desktop icon, but you can also run it from a command line.
- The program prompts you for a document name. It must differ from every existing document name (to avoid accidental overwriting), but can otherwise be any valid file name.
- The program starts scanning pages. After each scan, a preview is shown and you can accept the page or try again. After each accepted page, you can scan another page or stop scanning.
- Every scanned page is saved to TIFF on the fly. Once all pages have been retrieved, each one is converted to PNM, then to DJVU. This conversion takes around two minutes per page on my computer. Finally, all DJVU files are bundled together into a single file.
- The bundled DJVU is stored both locally and on a backup server through FTP.
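The non-interactive part of this workflow boils down to a fixed sequence of shell commands per document. The following is a minimal sketch, separate from the actual program below, that only builds that sequence as a list of strings without running anything. The helper names (`replace_ext`, `djvu_plan`) and the choice to derive the intermediate file names from the page names are assumptions made for illustration.

```ocaml
(* Hypothetical helper: swap a file's extension, e.g. "p1.tiff" -> "p1.ppm". *)
let replace_ext file ext = Filename.remove_extension file ^ ext

(* Hypothetical sketch: the shell commands that turn scanned TIFF pages into
   one bundled DjVu document, in execution order. Nothing is executed here. *)
let djvu_plan pages output =
  let q = Filename.quote in
  (* Per page: TIFF -> PPM with ImageMagick, then PPM -> DjVu with cpaldjvu. *)
  let per_page page =
    [ "convert " ^ q page ^ " " ^ q (replace_ext page ".ppm");
      "cpaldjvu " ^ q (replace_ext page ".ppm") ^ " "
      ^ q (replace_ext page ".djvu") ]
  in
  let djvus = List.map (fun page -> q (replace_ext page ".djvu")) pages in
  (* Last step: bundle the pages with djvm, or simply copy a single page. *)
  let bundle =
    match djvus with
    | [] -> []
    | [single] -> [ "cp " ^ single ^ " " ^ q output ]
    | _ -> [ "djvm -c " ^ q output ^ " " ^ String.concat " " djvus ]
  in
  List.concat (List.map per_page pages) @ bundle

(* Usage: print the plan for a two-page document. *)
let () =
  List.iter print_endline (djvu_plan [ "page1.tiff"; "page2.tiff" ] "receipt.djvu")
```

Running each string through `Sys.command` in order, stopping at the first non-zero exit status, would reproduce the batch stage; keeping the plan separate from its execution makes it easy to inspect or test.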
Once the manual scan-preview-confirm process has ended, the lengthy compression and upload stage starts. It requires no interaction unless a command fails, so it is possible to start scanning another document (or do something else) while it finishes.
I have also reduced the resolution from 300 dpi to 150 dpi, since pages remain quite readable at that resolution. As a result, each page went from an 8MiB PNG file to a 2MiB TIFF file (roughly), which is in turn compressed to a 1MiB DJVU file. My current library of scanned pages (mostly administrative documents, reports and contracts) weighs in at around 150MiB instead of the previous 1.1GiB.
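The size reduction follows from the resolution change: halving the dpi halves both the width and the height in pixels, so each page holds about a quarter as many pixels. Here is a minimal sketch of that arithmetic for the 215 by 297 mm area passed to scanimage. It assumes 8 bits per pixel (as with `--mode Gray`) and no compression, and assumes the old 300 dpi scans were grayscale too; the helper names are made up for illustration.

```ocaml
(* Pixel count along one side of the scan area, given in millimetres. *)
let pixels ~dpi mm =
  int_of_float (floor ((float_of_int mm *. float_of_int dpi /. 25.4) +. 0.5))

(* Assumption: 8-bit grayscale, no compression, so one byte per pixel. *)
let raw_gray_bytes ~dpi (width_mm, height_mm) =
  pixels ~dpi width_mm * pixels ~dpi height_mm

let mib bytes = float_of_int bytes /. (1024. *. 1024.)

let () =
  List.iter
    (fun dpi ->
       Printf.printf "%d dpi: %dx%d pixels, %.1f MiB\n" dpi
         (pixels ~dpi 215) (pixels ~dpi 297)
         (mib (raw_gray_bytes ~dpi (215, 297))))
    [ 300; 150 ]
(* prints:
   300 dpi: 2539x3508 pixels, 8.5 MiB
   150 dpi: 1270x1754 pixels, 2.1 MiB *)
```

These uncompressed estimates are in the same range as the file sizes quoted above; the actual sizes depend on each format's compression.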
Below is a scan of Papier d’Arménie made by my delightful assistant:
The Objective Caml source code that runs this little baby follows:
```ocaml
(* Raised when a shell command exits with a non-zero status. *)
exception CommandFailed of int

(* Echo a shell command, run it, and raise CommandFailed on failure. *)
let run command =
  print_endline command ;
  let result = Sys.command command in
  if result <> 0 then raise (CommandFailed result)

(* Print a prompt and read the user's answer. *)
let ask request =
  print_endline ("# " ^ request) ;
  read_line ()

(* Create a fresh temporary file with the given extension. *)
let tmp ext = Filename.temp_file "" ext

(* printf-style message, prefixed with "# ". *)
let say format = Printf.printf ("# " ^^ format)

(* Scan a page, display the result and ask whether to keep it (tries again
   until it gets the scan right). Returns the file where the accepted scan
   was saved. *)
let rec scan_to_tiff () =
  let file = tmp ".tiff" in
  run ("scanimage -l 0 -t 0 -x 215 -y 297 --brightness -22 "
       ^ "--contrast 22 --resolution 150 --progress --mode Gray "
       ^ "--format=tiff > " ^ Filename.quote file) ;
  run ("display " ^ Filename.quote file) ;
  if ask "keep this page? [Yn]" <> "n" then file else scan_to_tiff ()

(* Scan pages (using scan_to_tiff) until the user decides to stop. If a scan
   fails because of a system error, offer to retry. Returns the list of all
   files the user accepted. *)
let rec scan_list_to_tiff () =
  try
    let file = scan_to_tiff () in
    if ask "scan another page? [Yn]" <> "n"
    then file :: scan_list_to_tiff ()
    else [file]
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then scan_list_to_tiff () else []

(* Convert a single image into a DjVu page (TIFF -> PPM -> DjVu).
   Returns the DjVu filename. *)
let tiff_to_djvu file =
  let pnm = tmp ".ppm" in
  let djvu = tmp ".djvu" in
  run ("convert " ^ Filename.quote file ^ " " ^ Filename.quote pnm) ;
  run ("cpaldjvu " ^ Filename.quote pnm ^ " " ^ Filename.quote djvu) ;
  djvu

(* Convert a list of images into individual DjVu pages. Each page is
   converted before moving on, so that on error only the failing page is
   retried or skipped. *)
let rec tiff_list_to_djvu_list = function
  | [] -> []
  | file :: rest ->
    let converted =
      try Some (tiff_to_djvu file)
      with CommandFailed i ->
        say "command failed with exit code %d\n" i ;
        None
    in
    match converted with
    | Some djvu -> djvu :: tiff_list_to_djvu_list rest
    | None ->
      if ask "try again? [Yn]" <> "n"
      then tiff_list_to_djvu_list (file :: rest)
      else tiff_list_to_djvu_list rest

(* Bundle a list of individual DjVu files into a single DjVu file.
   Returns true on success. *)
let rec make_djvu_bundle file list =
  try
    (match list with
     | [] -> false
     | [single] ->
       run ("cp " ^ Filename.quote single ^ " " ^ Filename.quote file) ;
       true
     | pages ->
       (* djvm needs -c to create a new bundled document *)
       run ("djvm -c " ^ Filename.quote file ^ " "
            ^ String.concat " " (List.map Filename.quote pages)) ;
       true)
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then make_djvu_bundle file list
    else ( say "scan aborted\n" ; false )

(* Choose a name for the output DjVu file: non-empty, without directory
   components, and not already taken. *)
let rec choose_djvu_filename () =
  let path = "/home/arkadir/docs/" in
  let name = ask "document name (extension will be added automatically) ?" in
  let file = Filename.concat path (name ^ ".djvu") in
  if name = "" || name <> Filename.basename name then
    ( say "incorrect filename\n" ; choose_djvu_filename () )
  else if Sys.file_exists file then
    ( say "file already exists\n" ; choose_djvu_filename () )
  else file

(* Upload a file to an FTP server, offering to retry on failure. *)
let rec upload_file file =
  try
    run ("ncftpput -f /home/arkadir/docs/ftp.cfg /home/www/blog/docs "
         ^ Filename.quote file)
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then upload_file file
    else say "upload aborted\n"

(* Complete process: name, scan, convert, bundle, upload. *)
let () =
  let name = choose_djvu_filename () in
  let files = tiff_list_to_djvu_list (scan_list_to_tiff ()) in
  if make_djvu_bundle name files then upload_file name
```
This requires the classic DjVuLibre utilities to be installed (cpaldjvu and djvm), as well as ImageMagick (convert, and display for the previews) and ncftp (ncftpput). Scanning happens with SANE (scanimage). Some files are also uploaded to my web server, where I use “convert -thumbnail” to create thumbnails from DJVU files.
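As an illustration of the thumbnail step, here is a minimal sketch that builds such a command from OCaml. The post only says that “convert -thumbnail” is used; the 200 pixel bounding box, rendering only the first page (the `[0]` suffix), the PNG output and the `thumbnail_command` name are all assumptions, and it presumes an ImageMagick build that can read DjVu.

```ocaml
(* Hypothetical helper: an ImageMagick command that renders the first page of
   a DjVu file as a thumbnail fitting inside a size x size box (the aspect
   ratio is preserved). Quoting "[0]" keeps the shell from globbing it. *)
let thumbnail_command ?(size = 200) djvu png =
  Printf.sprintf "convert -thumbnail %dx%d %s %s" size size
    (Filename.quote (djvu ^ "[0]"))
    (Filename.quote png)

(* Usage: only prints the command; passing it to Sys.command would run it. *)
let () = print_endline (thumbnail_command "document.djvu" "document.png")
```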

Hi. I'm Victor Nicollet,