El blog de Juan Palómez

8 April 2011

Parallel grep on a single file

Filed under: Uncategorized. Posted by thisisoneball @ 12:24

I have a very large text file that I need to grep many times, and each run takes 4 minutes. grep only uses one core, so it could take much less time if the work were spread across several cores.

If you have GNU Parallel, this will do the trick (tested on Linux and Cygwin):

$ cat dump.sql | parallel -k --pipe grep -i pattern
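A quick way to convince yourself that the parallel run finds the same lines as a plain grep is to compare the two on a small generated file. The sketch below is only an illustration: `sample.sql`, its contents and the search term are made up, `--block 1M` simply spells out GNU Parallel's default chunk size, and the parallel step is skipped if GNU Parallel is not installed.

```shell
#!/bin/sh
# Sketch: compare GNU Parallel's --pipe output with a plain serial grep.
# sample.sql and 'pattern' are stand-ins for the real dump and search term.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 200000; i++)
               printf "INSERT INTO t VALUES (%d, \"%s\");\n", i, (i % 3 ? "other" : "Pattern") }' \
    > "$workdir/sample.sql"

# Number of matching lines found by a single grep process.
serial_count=$(grep -ic 'pattern' "$workdir/sample.sql")

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --pipe feeds chunks of stdin to several grep processes;
    # -k keeps the output in input order; --block sets the chunk size.
    parallel -k --pipe --block 1M grep -i 'pattern' < "$workdir/sample.sql" \
        > "$workdir/parallel.out"
    parallel_count=$(wc -l < "$workdir/parallel.out" | tr -d ' ')
else
    echo "GNU Parallel not found, skipping the parallel run" >&2
    parallel_count=skipped
fi

echo "serial: $serial_count, parallel: $parallel_count"
rm -rf "$workdir"
```

Without `-k` the same lines should still be found, but chunks may be printed in whatever order the grep processes finish.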

Here is a different method using xargs, which is present in most Unices, but it needs temporary files (the pieces of the original file).

This command splits the big file into 6 smaller ones, as I want to use 6 cores in parallel. The -C switch caps the size of each piece in bytes and ensures that no line is cut between one file and the next: the split always occurs at the end of a line.

$ split -C 320000000 dump.sql
$ ls -l xa*
 -rw-r--r--. 1 bd users 319999897 Apr 7 16:55 xaa
 -rw-r--r--. 1 bd users 319999939 Apr 7 16:55 xab
 -rw-r--r--. 1 bd users 319999801 Apr 7 16:55 xac
 -rw-r--r--. 1 bd users 319999988 Apr 7 16:55 xad
 -rw-r--r--. 1 bd users 319999595 Apr 7 16:55 xae
 -rw-r--r--. 1 bd users 308677345 Apr 7 16:55 xaf
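The line-boundary behaviour of -C is easy to check on a small file. This is a minimal sketch assuming GNU split; the sample file, the `part.` prefix and the 4000-byte limit are made up for the example.

```shell
#!/bin/sh
# Sketch: split a small made-up file with -C and check that no line is cut.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "line number %d of the sample\n", i }' \
    > "$workdir/sample.txt"

# At most 4000 bytes of whole lines per piece; pieces are named part.aa, part.ab, ...
split -C 4000 "$workdir/sample.txt" "$workdir/part."

# Every piece should end with a newline, that is, on a line boundary.
# $(...) strips a trailing newline, so the result is empty when the last byte is one.
broken=0
for piece in "$workdir"/part.*; do
    [ -z "$(tail -c 1 "$piece")" ] || broken=$((broken + 1))
done

# Concatenating the pieces in name order should reproduce the original file.
if cat "$workdir"/part.* | cmp -s - "$workdir/sample.txt"; then
    roundtrip=ok
else
    roundtrip=differs
fi

echo "pieces with a cut line: $broken, round trip: $roundtrip"
rm -rf "$workdir"
```

Because each piece only holds whole lines, most pieces come out slightly smaller than the limit, which matches the listing above. It is worth leaving a little headroom in the size so you don't end up with one piece more than you have cores.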

And this command runs one grep process for each of the split files, up to 6 at a time:

$ ls xa* | xargs -P 6 -n 1 grep -i 'pattern'
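One thing to keep in mind is that xargs does not group the output of the processes it starts, so matches from different greps can arrive in any order and may even run together. A possible workaround, sketched below, is to have each grep write to its own result file and merge them afterwards. The sample file, the `piece.` prefix, the `.out` suffix and the sizes are all made up for the example, and the small `sh -c` wrapper is my own addition, not part of the command above.

```shell
#!/bin/sh
# Sketch: split, grep each piece in parallel into its own result file, then merge.
# sample.sql, the 'pattern' term and the piece size are made up for illustration.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 60000; i++)
               printf "INSERT INTO t VALUES (%d, \"%s\");\n", i, (i % 3 ? "other" : "Pattern") }' \
    > "$workdir/sample.sql"

split -C 400000 "$workdir/sample.sql" "$workdir/piece."

# One grep per piece, up to 6 at a time; each writes to <piece>.out so lines
# from different processes cannot run together. grep exits 1 when a piece has
# no match, hence the "|| true" to keep xargs from reporting a failure.
ls "$workdir"/piece.* |
    xargs -P 6 -n 1 sh -c 'grep -i "pattern" "$1" > "$1.out" || true' sh

# The shell expands the glob in sorted order, which is the original file order.
cat "$workdir"/piece.*.out > "$workdir/parallel.out"
grep -i 'pattern' "$workdir/sample.sql" > "$workdir/serial.out"

if cmp -s "$workdir/parallel.out" "$workdir/serial.out"; then
    result=identical
else
    result=different
fi

echo "parallel vs serial grep: $result"
rm -rf "$workdir"
```

The last line also removes the temporary pieces, which is easy to forget when the dump is several gigabytes.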

For the differences between these kinds of programs, see the GNU Parallel man page.


1 comment »

  1. GNU Parallel newer than 20110205 can do it without tempfiles:

    cat dump.sql | parallel --pipe grep -i pattern

    See http://www.youtube.com/watch?v=1ntxT-47VPA

    Comment by Ole Tange, 2 May 2011 @ 14:30

