One of my hobbies during this recent World Cup was collecting stickers. Actually, I built the sticker album because my son wanted it, but I had fun too, I guess.
An important part of collecting stickers is exchanging the repeated ones. Through messages in WhatsApp groups, we report which repeated stickers we have and which ones we still need. As a programmer, I refused to compare the lists myself, so I wrote a little program in Python (with doctests and all) to find intersections.
The missing laptop
Last week, a person came to my home to exchange stickers. I had the lists of repeated and needed stickers, both mine and hers, but my script was on another laptop. I did not even know where that machine was, and my guest was in a hurry.
There was no time to find the computer or rewrite the program, or even to compare the lists manually.
It’s Unix time!
The list format
In general, the lists had this format:
15, 18, 26, 31, 40, 45 (2), 49, 51, 110, 115, 128, 131 (2), 143, 151, 161, 162, 183 (2), 216 (2), 221, 223, 253, 267 (3), 269, 280, 287, 296, 313, 325, 329, 333 (2), 353 (3), 355, 357, 359, 362, 365, 366, 371, 373, 384, 399, 400, 421 (2), 445, 457, 469, 470, 498 (2), 526, 536, 553, 560, 568, 570, 585, 591 (2), 604 (2), 639 (2), 660.
Basically, I needed to remove the numbers in parentheses, along with everything else that was not a digit, and then compare both lists. Easy, indeed.
Pre-processing with sed
First, I had to remove the counters in parentheses:
$ cat list.txt | sed 's/([^)]*)//g'
15, 18, 26, 31, [...] 591 , 604 , 639 , 660.
(I know, UUOC. Whatever.)
Then, I put each number on its own line:
$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g'
Next, I clean up every line by removing any character that is not a digit:
$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g' | sed 's/[^0-9]*\([0-9]*\)[^0-9]*/\1/g'
(In practice, I call sed only once, passing all the expressions to it. Here, I believe it is clearer to invoke sed several times.)
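The single-call variant mentioned in the note could look roughly like the sketch below. It is an illustration under a few assumptions, not necessarily the exact command used at the time: the file names are made up, and, like the commands above, it relies on sed accepting \n in the replacement, which GNU sed does. The expressions are also reordered so that the cleanup runs before the split. Inside a single sed call, an expression that runs after the split sees the newlines too, and a pattern such as [^0-9]* would swallow them.

```shell
# Made-up sample in the same format as the lists above.
printf '15, 18, 45 (2), 110, 131 (2), 660.\n' > sample-list.txt

# One sed process, three expressions applied in order:
#   1. drop the "(n)" counters
#   2. delete everything that is neither a digit nor a comma
#   3. turn each comma into a newline (GNU sed understands \n here)
sed -e 's/([^)]*)//g' \
    -e 's/[^0-9,]//g' \
    -e 's/,/\n/g' sample-list.txt | sort -n > sample-cleaned.txt

cat sample-cleaned.txt
```

With the sample line above, sample-cleaned.txt ends up with one number per line: 15, 18, 45, 110, 131 and 660.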
Finally, I sort the values:
$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g' | sed 's/[^0-9]*\([0-9]*\)[^0-9]*/\1/g' | sort -n > mine-needed.txt
I do it with the list of needed stickers, and also with the list of repeated stickers, getting two files.
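Since the same pipeline runs once per list, it can be wrapped in a small shell function. This is only a sketch: clean_list and the sample-*.txt file names are hypothetical, and the sample numbers are invented. The body is the pipeline from above, minus the cat.

```shell
# clean_list is a hypothetical helper: it prints the numbers found in the
# raw list file given as "$1", one per line, in numeric order.
clean_list() {
    sed 's/([^)]*)//g' "$1" |
        sed 's/, */\n/g' |
        sed 's/[^0-9]*\([0-9]*\)[^0-9]*/\1/g' |
        sort -n
}

# Made-up raw lists standing in for the real WhatsApp messages.
printf '26, 253 (2), 470, 15.\n' > sample-needed-raw.txt
printf '639, 31 (3), 128.\n' > sample-repeated-raw.txt

clean_list sample-needed-raw.txt > sample-needed.txt
clean_list sample-repeated-raw.txt > sample-repeated.txt
```

After this, sample-needed.txt holds 15, 26, 253 and 470, and sample-repeated.txt holds 31, 128 and 639, each number on its own line.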
Finding intersections with grep
Now, I need to compare them. There are many options, and I chose to use grep.
In this case, I called grep with one of the files as input and the other file as a list of patterns to match, through the -f option. Also, only complete matches matter here, so we are going to use the -x flag. Finally, I asked grep to compare strings directly (instead of treating them as regular expressions) with the -F flag.
$ grep -Fxf mine-needed.txt theirs-repeated.txt
253
269
333
470
639
Done! In a minute, I already know which stickers I want. I just need to do the same with my repeated ones.
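To see why the -x flag matters, here is a small experiment with made-up lists (the sample-*.txt names and the numbers are invented for illustration). Without -x, a pattern matches anywhere in a line, so a short number also "finds" every longer number that contains it.

```shell
# Made-up, already-cleaned lists: one number per line.
printf '26\n53\n470\n' > sample-mine-needed.txt
printf '126\n253\n470\n536\n' > sample-theirs-repeated.txt

# Without -x: "26" matches inside "126", "53" inside "253" and "536".
grep -Ff sample-mine-needed.txt sample-theirs-repeated.txt > sample-loose.txt

# With -x: a pattern has to match the whole line.
grep -Fxf sample-mine-needed.txt sample-theirs-repeated.txt > sample-exact.txt

echo "without -x:"; cat sample-loose.txt
echo "with -x:"; cat sample-exact.txt
```

Here the loose run reports all four numbers from the other list, while the run with -x reports only 470, the one sticker that is really on both lists.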
Why is this interesting?
These one-liners are not really a big deal to me today. The interesting thing is that, when I started to use the terminal, they would have seemed incredible. Really, look how many pipes we use to pre-process the files! And this grep trick? Back then, I struggled just to write a regex that worked! Actually, until I solved this problem, I did not even know the -x option.
I once helped a friend process a good number of files. He had already spent more than two hours trying to do it in Java, and we solved it together in ten minutes with a shell script. He then told me how much he wished he knew shell scripting and asked me how to learn it.
Well, little examples (like this one), as simple as they seem, taught me a lot. This is how I learned to script: by trying to solve problems and learning new commands and options in small batches. In the end, this is a valuable skill.
So, I hope this little exercise enriches your day, too. It certainly enriched mine. I only wish I had thought of it before spending three times as long on my Python script!
This post is a translation of Trocando figurinhas sobre o terminal.