July 2018 – Suspension of Disbelief

One of my hobbies during the recent World Cup was collecting stickers. Actually, I started the sticker album because my son wanted it, but I had fun too, I guess.

[Image: 2018 sticker album showing the France team missing three pictures.] Sadly, not completed yet.

An important part of collecting stickers is exchanging the repeated ones. Through messages in WhatsApp groups, we report which repeated stickers we have and which ones we still need. As a programmer, I refused to compare the lists myself, so I wrote a little program in Python (with doctests and all) to find intersections.

The missing laptop

Last week, a person came to my home to exchange stickers. I had the lists of repeated and needed stickers, both mine and hers, but my script was on another laptop. I did not even know where that machine was, and my guest was in a hurry.

There was no time to find the computer or rewrite the program, or even to compare the lists manually.

It’s Unix time!

The list format

In general, the lists had this format:

15, 18, 26, 31, 40, 45 (2), 49, 51, 110, 115, 128, 131 (2), 143, 151, 161, 162, 183 (2), 216 (2), 221, 223, 253, 267 (3), 269, 280, 287, 296, 313, 325, 329, 333 (2), 353 (3), 355, 357, 359, 362, 365, 366, 371, 373, 384, 399, 400, 421 (2), 445, 457, 469, 470, 498 (2), 526, 536, 553, 560, 568, 570, 585, 591 (2), 604 (2), 639 (2), 660.

Basically, I needed to remove everything that was not a digit, along with the numbers in parentheses, and then compare both lists. Easy, indeed.

Pre-processing with sed

First, I had to remove the counters between parentheses:

$ cat list.txt | sed 's/([^)]*)//g'
15, 18, 26, 31, [...] 591 , 604 , 639 , 660.

(I know, UUOC. Whatever.)

Then, I put each number on its own line:

$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g'

Next, I clean up every line by removing any character that is not a digit:

$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g' | sed 's/[^0-9]*\([0-9]*\)[^0-9]*/\1/g'

(In practice, I call sed only once, passing it all the expressions. Here, I believe it is clearer to invoke sed several times.)
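A single-call version could look roughly like the sketch below. It is not the exact command I typed that day: the function name `clean_list` is made up for illustration, and the expressions are reordered so that the non-digit cleanup runs before the split and never sees a newline. The split uses the `y` command with `\n`, which POSIX sed accepts.

```shell
# Hypothetical helper (the name clean_list is an assumption for this sketch):
# reads a sticker list from stdin and prints one number per line, sorted.
clean_list() {
    # 1st expression: drop the "(2)"-style counters
    # 2nd expression: drop everything that is neither a digit nor a comma
    # 3rd expression: turn each comma into a newline
    sed -e 's/([^)]*)//g' \
        -e 's/[^0-9,]//g' \
        -e 'y/,/\n/' |
    sort -n
}

# Example with a shortened, made-up list
printf '%s\n' '15, 45 (2), 110, 26, 660.' | clean_list
```

For the shortened list above, this should print 15, 26, 45, 110 and 660, one per line.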

Finally, I sort the values:

$ cat list.txt | sed 's/([^)]*)//g' | sed 's/, */\n/g' | sed 's/[^0-9]*\([0-9]*\)[^0-9]*/\1/g' | sort -n > mine-needed.txt

I do this for the list of needed stickers and for the list of repeated stickers, getting two files.

Finding intersections with grep

Now, I need to compare them. There are many options, and I chose grep.

In this case, I called grep with one of the files as input and the other as a list of patterns to match, through the -f option. Only complete-line matches matter here, so I used the -x flag. Finally, I asked grep to compare the strings literally (instead of treating them as regular expressions) with the -F flag.

$ grep -Fxf mine-needed.txt theirs-repeated.txt
253
269
333
470
639

Done! In about a minute, I knew which stickers I wanted. I just needed to do the same with my repeated ones.
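To see why -x matters, here is a small sketch with made-up numbers (the file names and values are for illustration only, and it assumes mktemp is available). Without -x, a fixed-string pattern such as 26 also matches the line 126, because grep then looks for the pattern anywhere in the line.

```shell
# Made-up sample data, written to a temporary directory
dir=$(mktemp -d)
printf '%s\n' 26 253 470 > "$dir/needed.txt"
printf '%s\n' 126 253 400 470 > "$dir/repeated.txt"

# Substring matching: 126 slips through because it contains 26
loose=$(grep -Ff "$dir/needed.txt" "$dir/repeated.txt")

# Whole-line matching: only the true intersection remains
exact=$(grep -Fxf "$dir/needed.txt" "$dir/repeated.txt")

echo "without -x:"
echo "$loose"    # 126, 253, 470
echo "with -x:"
echo "$exact"    # 253, 470

rm -r "$dir"
```

Sticker 126 is not on the needed list, so only the -x result is a real intersection.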

Why is this interesting?

These one-liners are no big deal to me today. The interesting thing is that, when I started using the terminal, they would have seemed incredible. Really, look how many pipes we use to pre-process the files! And this grep trick? Back then, I struggled just to write a regex that worked! Actually, until I solved this problem, I did not even know about the -x option.

I once helped a friend process a good number of files. He had already spent more than two hours trying to do it in Java, and we solved it together in ten minutes with a shell script. He then told me how much he wished he knew shell scripting and asked me how to learn it.

Well, little examples (like this one), as simple as they seem, taught me a lot. This is how I learned to script: trying to solve problems and getting to know new commands and options in small batches. In the end, this is a valuable skill.

So, I hope this little exercise enriches your day, too. It certainly enriched mine: I wish I had thought of it before spending three times as long on my Python script!

This post is a translation of Trocando figurinhas sobre o terminal.


Exchanging World Cup’s sticker figures with the terminal
