El blog de Juan Palómez

8 April 2011

Parallel grep on a single file

Filed under: Uncategorized. Posted by thisisoneball @ 12:24

I have a very large text file that I need to grep many times, and each run takes 4 minutes. grep only uses one core, so the search could take much less time if I run it in parallel.

If you have GNU Parallel this will do the trick (tested on Linux and Cygwin):

$ cat dump.sql | parallel -k --pipe grep -i pattern

A different method uses xargs, which is present on most Unix systems, but it needs temporary files (the pieces of the split original).

This command splits the big file into 6 smaller ones, because I want to use 6 cores in parallel. The -C switch caps the size of each piece in bytes and never breaks a line across two files: every split falls at the end of a line. 320000000 bytes is slightly more than one sixth of the file, so the result is exactly 6 pieces. With no prefix given, split names them xaa, xab, and so on:

$ split -C 320000000 dump.sql
$ ls -l xa*
 -rw-r--r--. 1 bd users 319999897 Apr 7 16:55 xaa
 -rw-r--r--. 1 bd users 319999939 Apr 7 16:55 xab
 -rw-r--r--. 1 bd users 319999801 Apr 7 16:55 xac
 -rw-r--r--. 1 bd users 319999988 Apr 7 16:55 xad
 -rw-r--r--. 1 bd users 319999595 Apr 7 16:55 xae
 -rw-r--r--. 1 bd users 308677345 Apr 7 16:55 xaf
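
The value passed to -C can also be computed from the file size and the number of cores instead of being typed by hand. This is only a rough sketch: the chunk_size function, the part. prefix and the seq-generated stand-in for dump.sql are assumptions made for illustration, and -C assumes GNU split.

```shell
#!/bin/sh
# chunk_size FILE N (hypothetical helper): print a byte count for `split -C`
# so that FILE is cut into roughly N pieces.
chunk_size() {
    size=$(wc -c < "$1")
    # Ceiling division of the file size by N.
    echo $(( (size + $2 - 1) / $2 ))
}

# Demo on a small generated stand-in for dump.sql.
demo=$(mktemp -d)
seq 1 100000 > "$demo/dump.sql"
bytes=$(chunk_size "$demo/dump.sql" 6)
split -C "$bytes" "$demo/dump.sql" "$demo/part."
ls -l "$demo"/part.*
rm -rf "$demo"
```

Because -C ends every piece at a line boundary, pieces usually come out a little smaller than the limit, so one extra small file can appear. Rounding the value up a bit, as with the 320000000 above, avoids that.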

And this command runs one grep process for each of the split files, up to 6 at a time (-P 6), passing one file name to each grep (-n 1):

$ ls xa* | xargs -P 6 -n 1 grep -i 'pattern'
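
The two steps can be wrapped into one function that also removes the temporary files afterwards. This is a minimal sketch, not a polished tool: the name parallel_grep, its argument order and the generated demo file are assumptions for illustration, and it relies on GNU split (-C) and an xargs that supports -P.

```shell
#!/bin/sh
# parallel_grep PATTERN FILE JOBS (hypothetical wrapper, not from the post):
# split FILE into a temporary directory, grep the pieces in parallel, clean up.
parallel_grep() {
    tmp=$(mktemp -d) || return 1
    size=$(wc -c < "$2")
    # Slightly more than size/JOBS bytes per piece; -C keeps lines whole.
    split -C $(( size / $3 + 1 )) "$2" "$tmp/part."
    # One grep per piece, at most JOBS at once. Results from different
    # pieces may arrive in any order.
    ls "$tmp"/part.* | xargs -P "$3" -n 1 grep -i -- "$1"
    rm -rf "$tmp"
}

# Demo on a small generated stand-in for dump.sql.
demo=$(mktemp -d)
seq 1 100000 > "$demo/dump.sql"
parallel_grep 9999 "$demo/dump.sql" 6 | sort -n
rm -rf "$demo"
```

Unlike parallel -k, this approach does not keep the matches in file order, which is why the demo pipes the output through sort.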

For the differences between tools of this kind, see the GNU Parallel man page.

1 comment »

  1. GNU Parallel newer than 20110205 can do it without tempfiles:

    cat dump.sql | parallel --pipe grep -i pattern

    See http://www.youtube.com/watch?v=1ntxT-47VPA

    Comment by Ole Tange, 2 May 2011 @ 14:30
