Numerical digressions

Friday, 21 February 2014

A very long time indeed

As you've probably realized, I haven't posted much here.
Not when I was actively working as a researcher, and not since then. But I am starting to realize how important it will be for me to be able to write about what I am doing.
For a few months now (almost a year), I have been doing something new: I work as a consultant for young entrepreneurs in the Paris area, helping them put together grant and subsidy proposals.
It is not an easy job but it is very rewarding.
But I have bigger plans for the future.
I will try to post here regularly. I promised myself to (over)think less and to produce more.
I need to put that into action. The future is now.

Tuesday, 21 May 2013

Powerful image manipulation with ImageMagick, Ghostscript, and others.

I kept this post in my "draft" folder for a long time. Originally, I put it together because I had found a scanned copy of a very old, out-of-print scientific monograph that was quite valuable to my research at the time (and also interesting from a historical perspective). Of course, no official electronic version of this book existed, and the second-hand copies sold on eBay were grossly overpriced.
Copyright issues aside (the author had unfortunately just passed away), reading from a set of separate scanned files is really not convenient, and not really compatible with the e-book management software Calibre that I currently use in combination with an "old" Kindle 4.

Thus, I had to find a way to generate a sufficiently small electronic version of the set, say a .pdf. The open-source software ImageMagick is well suited to the task, more specifically through its command-line tool convert. Note that to read PDF files (for example to split one into pictures), convert relies on Ghostscript.

After a few years of using this routinely - and routinely searching my "draft" folder for the information contained here - I just decided to release it as it is. Hope this helps.

---

So, you have this set of black-and-white scanned pictures in .png format, numbered from 001.png to, say, 300.png, and you want to convert them into a PDF. You can actually do this with the following command (in bash):

convert *.png <name>.pdf

The result is a 300-page document <name>.pdf. Yeah, it is that simple.
However, BEWARE: this operation has a huge memory requirement; this is most probably because every picture is decoded and held uncompressed in memory before the PDF is written. As an example, in my case, it requires about 10 GB of RAM. On most systems, this command will probably freeze your computer [1].

Thus, you will have to optimize the process. For that, convert has a huge number of options. The first one you want to use systematically when converting from/to PDF is -verbose, to see how it is going. It prints the size of each file during processing; if things slow down dramatically, it is probably time to hit Ctrl+C and try another way.
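If you still want the one-shot command, ImageMagick's -limit option can cap how much RAM it takes before it starts caching pixel data on disk. What follows is only a minimal sketch: the function name convert_limited and the 1GiB/2GiB values are arbitrary assumptions to adapt to your machine, and going to disk trades memory for speed.

```shell
# Sketch: build the PDF in one go, but cap ImageMagick's memory use.
# The limits (1GiB of RAM, 2GiB of memory-mapped files) are illustrative
# assumptions; beyond them ImageMagick caches pixels on disk, which is slower.
convert_limited() {
    local out="$1"
    shift
    # Settings such as -limit and -verbose go before the input files.
    convert -limit memory 1GiB -limit map 2GiB -verbose "$@" "$out"
}

# Usage: convert_limited name.pdf *.png
```

It will not make the job fast, but it should keep the machine responsive while it runs.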

First, convert each picture to pdf format; in bash, you can use the one-liner

for i in *.png ; do f="${i%.png}.pdf" ; convert "$i" "$f" ; echo "$f" ; done

Then, use Ghostscript directly to merge all the files together

gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=<name>.pdf -dBATCH *pdf

Et voilà. Note that the final pdf is a bit bigger than the one obtained directly with convert.
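The two steps above can be wrapped into a single function. This is a sketch under a few assumptions: the name pngs_to_pdf is made up, the page images must be named so that the shell lists them in reading order, and the intermediate PDFs go to a temporary directory so that the final glob cannot pick up the output file or any unrelated PDF lying around.

```shell
# Sketch: one PDF per page, then merge everything with Ghostscript.
pngs_to_pdf() {
    local out="$1" tmpdir status png
    tmpdir=$(mktemp -d) || return 1
    for png in *.png; do
        [ -e "$png" ] || continue        # no .png here: the glob stays literal
        if ! convert "$png" "$tmpdir/${png%.png}.pdf"; then
            rm -rf "$tmpdir"
            return 1
        fi
    done
    # The shell expands the glob in lexical order, which is the page order.
    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$tmpdir"/*.pdf
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Usage: pngs_to_pdf name.pdf
```

Only one picture is held in memory at a time, which is the whole point of going through per-page files.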

Of course, the resulting file size and quality depend on the initial set of pictures. In most cases, with black-and-white pictures, you will want to use the .png format instead of .jpg.

if you want to convert a pdf into a set of pictures:
- convert name.pdf name.png
It will split the pdf into as many pictures as there are pages, with numbers like name-0.png, name-1.png...
if you want jpg instead of png:
- convert name.pdf name.jpg
The resolution of the pictures is 72 dpi by default; you can change that to any value with -density:
- convert -density 300 name.pdf name.jpg
BEWARE: it will create huge raw files in memory so it can be very slow.
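To get a feeling for how huge "huge" is, here is a back-of-the-envelope sketch. The function name is made up, and the 8 bytes per pixel (four 16-bit channels) is an assumption that depends on how your ImageMagick was built, so take the numbers as orders of magnitude only.

```shell
# Sketch: rough in-memory size, in MiB, of one rasterised page.
# Arguments: page width and height in points (1/72 inch), density in dpi,
# and optionally bytes per pixel (assumed to be 8 if omitted).
page_raster_mib() {
    local w_pt="$1" h_pt="$2" dpi="$3" bpp="${4:-8}"
    local w_px=$(( w_pt * dpi / 72 ))
    local h_px=$(( h_pt * dpi / 72 ))
    echo $(( w_px * h_px * bpp / 1048576 ))
}

# An A4 page is about 595 x 842 points:
page_raster_mib 595 842 300    # prints 66
page_raster_mib 595 842 72     # prints 3
```

Under these assumptions, 300 pages at 300 dpi come to roughly 19 GiB if everything is held in memory at once.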

if you want to convert a set of pictures to a pdf:
1. name the pictures in the order you want them to be packaged (00.png, 01.png, 02.png, etc)
2. run:
- convert *.png name.pdf
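One catch with the naming: the shell expands *.png in lexical order, so name-10.png comes before name-2.png. Below is a minimal sketch of a helper that zero-pads the page number; the function name and the default width of 3 are my own choices, and it assumes names of the form prefix-number.ext.

```shell
# Sketch: print the zero-padded version of a name like "name-7.png".
pad_page_name() {
    local file="$1" width="${2:-3}"
    local stem="${file%.*}" ext="${file##*.}"
    local prefix="${stem%-*}" number="${stem##*-}"
    # Strip leading zeros so printf does not read e.g. 08 as invalid octal.
    while [ "${number#0}" != "$number" ] && [ -n "${number#0}" ]; do
        number="${number#0}"
    done
    printf "%s-%0${width}d.%s\n" "$prefix" "$number" "$ext"
}

pad_page_name name-7.png       # prints name-007.png

# To rename every split page in the current directory:
# for f in name-*.png ; do mv -- "$f" "$(pad_page_name "$f")" ; done
```

As far as I know, convert also accepts a printf-style pattern in the output name (something like name-%03d.png), which avoids the renaming step altogether.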

if you want to convert a set of pictures from one format to another
- for i in *.png ; do f="${i%.png}.jpg" ; convert "$i" "$f" ; done

if you want to change their size, for example to give them all a height of 1200 pixels:
- for i in *.png ; do f="${i%.png}_resized.png" ; convert -resize x1200 "$i" "$f" ; done
BEWARE: resizing tends to degrade the pictures, as the resampling algorithms are not perfect...
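When the resized copies land next to the originals, a second run of *.png picks them up as well. A possible variant, sketched here with a made-up function name and an arbitrary default directory called "resized", writes the output somewhere else and leaves the originals untouched.

```shell
# Sketch: resize every PNG to a given height, into a separate directory.
resize_all() {
    local height="$1" outdir="${2:-resized}" png
    mkdir -p "$outdir" || return 1
    for png in *.png; do
        [ -e "$png" ] || continue        # no .png here: the glob stays literal
        convert "$png" -resize "x$height" "$outdir/$png" || return 1
    done
}

# Usage: resize_all 1200
```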

Finally, if you prefer Djvu to PDF:
(NB: for black & white files this is probably a bad idea, as the PDF will be significantly smaller if you start from .png pictures)
1. convert all your pictures to djvu using c44:
- for i in *.png ; do f="${i%.png}.djvu" ; c44 "$i" "$f" ; done
2. create the djvu file using djvm:
- djvm -c name.djvu *.djvu
3. you can then add an outline using a text file (e.g. outline.txt) with the following format:

(bookmarks
 ("Chapter1" "#page_number"
  ("Subchapter1" "#page_number")
  ("Subchapter2" "#page_number")
 )
 ("Conclusion" "#page_number")
)

And then set it in your file with:
- djvused name.djvu -e 'set-outline outline.txt' -s
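Typing the parentheses by hand gets tedious for a long book. Here is a sketch of a small generator for a flat (non-nested) outline, reading "page<TAB>title" lines on standard input. The function name is hypothetical, the chapter list and its page numbers are made up, and titles are assumed to contain neither double quotes nor backslashes; djvused expects the top-level keyword to be bookmarks.

```shell
# Sketch: turn "page<TAB>title" lines into a flat djvused outline.
make_outline() {
    local page title
    echo "(bookmarks"
    while IFS="$(printf '\t')" read -r page title; do
        [ -n "$page" ] || continue       # skip empty lines
        printf ' ("%s" "#%s")\n' "$title" "$page"
    done
    echo ")"
}

# Made-up example; redirect the output to outline.txt for real use.
printf '1\tChapter 1\n42\tConclusion\n' | make_outline
```

The result can then be loaded with djvused's set-outline command.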

[1] But for the lucky few who use supercomputers, this should not really be a problem. Yes, I tried.

Monday, 26 December 2011

Tutorial - installing the Cowan Fortran program on Mac OS X

I recently became interested in computing X-ray spectra from first principles. A huge challenge.
There are a whole lot of methods to do that, but one of the most popular is based on Cowan's classic book "The Theory of Atomic Structure and Spectra".
From the first edition onwards, back in '81, he developed a FORTRAN 77 code able to compute, among other things, atomic multiplets. The code has grown enormously since then, and the last version by the author is fortunately still available in full on Cormac McGuiness' website (here).
I am just starting with it. The first challenge was to get it installed on a Mac, and this is what I will try to explain here.

Most of the problems come from the fact that this bundle of programs is very old, so you will run into some dependency issues - luckily, nothing too tricky. As a foreword: you can use MacPorts to grab most of your software packages online, but the ones needed here are not included.

First, you need a FORTRAN 77 compiler. As mentioned by McGuiness, g77 is no longer available on modern Linux distributions, and gfortran (included in gcc) will not compile the code. However, Gaurav Khanna from UMass Dartmouth is maintaining a set of g77 binaries (here) which are working perfectly so far.

Second, you will need a working LaTeX distribution, as the documentation has to be compiled from TeX sources. You can install TeX Live through MacPorts, but the (huge) MacTeX bundle is user-friendly and well maintained. However, you will also need the latex2html package, which is not part of MacTeX. Mild Opinion wrote a mini-tutorial for that, so just follow its instructions - and don't forget to deal with the required dependencies!

Once you've done that, download the full package of Cowan's program on Cormac's website (here). Then, in a terminal:
- "tar zxvf CowanCode.tgz" and go to the extracted directory
- edit "Makefile": change DISTDIR to where you want the program to be installed.
- go to the Code directory, edit "Makefile": erase the line "FC=ifort" and change DISTDIR
- go to the main directory, "make" and "make install"
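If you would rather not edit the Makefiles by hand, the two changes can be sketched with sed. The function names are my own, the install path is a placeholder, and the sketch assumes that DISTDIR and FC are set by plain "NAME=value" lines starting in the first column, which I have not checked against every version of the package; if nothing changes, look at the Makefiles by eye.

```shell
# Sketch: point DISTDIR at a new install directory.
# (The directory must not contain the "|" character used as sed delimiter.)
set_distdir() {
    local makefile="$1" distdir="$2"
    sed "s|^DISTDIR[[:space:]]*=.*|DISTDIR=$distdir|" "$makefile" \
        > "$makefile.new" && mv "$makefile.new" "$makefile"
}

# Sketch: remove the line that forces the Intel compiler.
drop_ifort() {
    local makefile="$1"
    sed '/^FC[[:space:]]*=[[:space:]]*ifort/d' "$makefile" \
        > "$makefile.new" && mv "$makefile.new" "$makefile"
}

# Usage, from the extracted directory:
# set_distdir Makefile "$HOME/cowan"
# set_distdir Code/Makefile "$HOME/cowan"
# drop_ifort Code/Makefile
# make && make install
```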

The compilation runs fine, including the documentation (I should store the html file online...), but I have not tested the binaries yet.