I/O performance is almost unlimited today: you can get any bandwidth from your hard disks or your network that you're willing to pay for. You can even get previously unseen low latencies from SSDs. With RAM prices declining we get more of it, mostly to the benefit of caching more and more data for extremely fast cached I/O.
But you can't get more CPU power - except by going multicore.
Many traditional tools have not been brought up to date for multicore, including the GNU core utilities. Often it is possible to obtain parallelism at a high level by processing several files in parallel, for example with "make -j" or "split, sort, merge", but often at the cost of making I/O more scattered.
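The "split, sort, merge" pattern can be illustrated with a minimal in-memory sketch. This is not the implementation presented in the talk: the names `split_evenly` and `parallel_sort` and the `executor_cls` parameter are hypothetical, and a real tool would work on files rather than Python lists.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor


def split_evenly(items, parts):
    """Split a list into at most `parts` contiguous chunks of similar size."""
    step = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + step] for i in range(0, len(items), step)]


def parallel_sort(items, workers=4, executor_cls=ProcessPoolExecutor):
    """Sort chunks in separate workers, then merge the sorted runs.

    `executor_cls` is a hypothetical knob for this sketch; any
    concurrent.futures executor class can be passed in.
    """
    chunks = split_evenly(items, workers)
    with executor_cls(max_workers=workers) as pool:
        # Each worker sorts one chunk independently (the "sort" step).
        sorted_chunks = list(pool.map(sorted, chunks))
    # A single k-way merge of the sorted runs (the "merge" step).
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    print(parallel_sort(["pear", "apple", "fig", "kiwi", "plum"], workers=2))
```

Only the chunk sorting runs in parallel here; the final merge is sequential, which limits the achievable speedup.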
Examples of performance gains with parallelized versions of wc, sort and grep running on a current standard workstation PC are given, and their implementation is discussed. The feasibility of parallelizing other tools in the standard Unix toolchain is evaluated. Furthermore, it is discussed how to integrate with or into the GNU core utilities.
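As a rough idea of what parallelizing a tool like wc inside a single file might look like, the sketch below counts newline bytes (the quantity `wc -l` reports) over disjoint byte ranges of one file. It is an assumption-laden illustration, not the implementation discussed in the talk; `chunk_ranges`, `count_newlines`, `parallel_line_count` and the `executor_cls` parameter are hypothetical names.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(size, parts):
    """Split [0, size) into at most `parts` contiguous (start, end) byte ranges."""
    step = max(1, -(-size // parts))  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def count_newlines(path, start, end, bufsize=1 << 20):
    """Count newline bytes in the byte range [start, end) of a file."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            buf = f.read(min(bufsize, remaining))
            if not buf:  # file shrank while reading
                break
            count += buf.count(b"\n")
            remaining -= len(buf)
    return count


def parallel_line_count(path, workers=4, executor_cls=ProcessPoolExecutor):
    """Sum per-range newline counts computed by separate workers.

    `executor_cls` is a hypothetical knob for this sketch.
    """
    ranges = chunk_ranges(os.path.getsize(path), workers)
    if not ranges:  # empty file
        return 0
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(count_newlines, path, s, e) for s, e in ranges]
        return sum(f.result() for f in futures)
```

Newline counting splits cleanly because each byte can be judged on its own. Counting words, or matching lines as grep does, needs extra care where a word or line straddles a range boundary.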
Lars Ole Belhage is an MScEE (civ. ing.) and has been working with Unix since the early '80s and with Linux since 'Yggdrasil'. He has been upgrading his knowledge at CBS and most recently at DIKU with courses in deep multiprocessing and cluster computing. Work assignments include algorithm design for E-CAD systems and processing of 'large' datasets, including the total set of Danish phone subscribers, global databases of financial data, and web and router/firewall logs. Main tools: Unix coreutils, awk/Perl, C/C++.
| Attachment | Size |
|---|---|
| OSD20100306.odp | 57.98 KB |
| OSD20100306.pdf | 49.25 KB |