Awk amazingness

So, you may or may not be aware, I am really excited about awk at the moment. Also find.

I learned a particular bit of magic on Sunday. I used my most complicated awk script yet: it uses an END block!

root@host [/home]# find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'

Here is the context: on a cPanel server, a user was looking at their disk usage and saw a lot of space counted under “Other Usage”. They wanted to know what was taking up that space. Here is the section of the cPanel documentation about that page: cPanel Docs: Disk Space Usage

This feature also displays disk space usage summaries for:

  • Files contained within your home directory
  • Files in hidden subdirectories
  • Mailing lists in Mailman
  • Files not contained within your home directory (see Other Usage bar)

So, now let’s break down this command I wrote and analyze what it does.

find

find is a utility used to look in the filesystem for files for which certain conditions are true. I find the manpage for this utility very useful because it has informative examples. I usually use (and currently have bookmarked) the one here, so I can see it in my browser: man find

find /home/back/

The path argument tells find where it is looking. I had noticed that there was a folder in /home owned by root. This seemed a likely place to look in this case.

find /home/back/ -user <user>

I only wanted to find the things in that directory that were owned by that particular user. There were a lot of things in the folder, so I didn’t wanna check myself if all of them were owned by that user. So I made find do it for me! 🙂

find /home/back/ -user <user> -exec ls -ld {} \;

Once I’d found the files I wanted, I needed to find out how much space they were taking. I figured a good way to do this was to pass each file through ls -l so I could grab the number of bytes from that listing. Without the d flag, ls would list the contents of each directory it was handed instead of the directory itself, which was not the desired behavior. Another thing I could have tried, instead of using the -exec action in find, was to pipe the results through xargs ls -ld like this:

find /home/back -user <user> | xargs ls -ld

However, this would confuse ls if any file or folder name contained a space. Without the -0 flag, xargs splits its input on whitespace (spaces and newlines alike), so one such name turns into several arguments. I could fix it by using the -0 flag in xargs and the -print0 action in find, like this:

find /home/back -user <user> -print0 | xargs -0 ls -ld
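To see the splitting problem without touching real files, here is a small sketch using a made-up file name (my file). It counts how many arguments xargs hands to its command in each mode. The -0 flag is not part of POSIX, but GNU and BSD xargs both have it.

```shell
# A hypothetical file name containing a space, fed to xargs on one line.
# Plain xargs splits on whitespace, so "my file" becomes two arguments.
plain_count=$(printf 'my file\n' | xargs -n 1 echo | awk 'END {print NR}')

# With NUL-terminated input and -0, the name stays in one piece.
nul_count=$(printf 'my file\0' | xargs -0 -n 1 echo | awk 'END {print NR}')

echo "plain xargs: $plain_count argument(s)"
echo "xargs -0:    $nul_count argument(s)"
```

The first count should come out as 2 and the second as 1, which is exactly the difference that would have tripped up ls.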

However, if I’m adding a command to find anyway, why not save the pipe and xargs by just using -exec? So that’s why I did that.
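A related variation, not the command used in this post: find also accepts + instead of \; to end -exec, which passes many file names to one ls call rather than starting ls once per file. It still keeps names with spaces intact. The sketch below builds a throwaway directory (the file names are made up) and checks that both forms list the same number of entries.

```shell
# Build a scratch directory with hypothetical file names, one containing a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# One ls process per file (the form used in this post).
per_file=$(find "$demo_dir" -type f -exec ls -ld {} \; | awk 'END {print NR}')

# One ls process for a whole batch of files.
batched=$(find "$demo_dir" -type f -exec ls -ld {} + | awk 'END {print NR}')

echo "per-file: $per_file lines, batched: $batched lines"
rm -rf "$demo_dir"
```

On a folder with thousands of files the batched form should finish noticeably faster, since far fewer processes get started.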

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0}'

This is a test version. I wanted each command to differ as little as possible from the one before, so I knew exactly what I was changing each time. {print $0} is already awk’s default action, but a rule needs either a pattern or an action, so if you don’t give a condition for which lines to match, you have to write some action. And I want it to work on all the lines.
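Here is a quick check of that pattern/action rule on two lines of made-up input. A bare action runs on every line, and a bare pattern prints every line it matches, so the two commands below should produce identical output.

```shell
# Sample input invented for this example.
input=$(printf 'first line\nsecond line\n')

# Action with no pattern: runs on every line.
with_action=$(printf '%s\n' "$input" | awk '{print $0}')

# Pattern with no action: the always-true pattern 1 triggers the default print.
with_pattern=$(printf '%s\n' "$input" | awk '1')

[ "$with_action" = "$with_pattern" ] && echo "same output"
```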

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5}'

This one is a sanity-check to make sure field #5 is the one with the file sizes in it.

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5; total=total+$5; print total}'

This tests that my variable total is working right, that it is adding up the file sizes as it goes.

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5; total=total+$5; print total}; END {print "Total: ",total}'

This tests that the END block works correctly and that the total will be printed correctly at the end of the script. Since it works and is giving me the information I want, I can now modify it to remove the pieces I don’t want. Since I’m being extra careful, I take one piece out at a time.

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $5; total=total+$5; print total}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5; print total}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'
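To see what the finished awk program does without needing the server, here is a sketch that runs it on three lines shaped like ls -ld output. The owner, sizes, dates and paths are all made up for illustration.

```shell
# Three lines shaped like ls -ld output; every value here is hypothetical.
listing='-rw-r--r-- 1 alice alice  100 Jan  1 12:00 /home/back/a.txt
-rw-r--r-- 1 alice alice  250 Jan  1 12:00 /home/back/b.txt
drwxr-xr-x 2 alice alice 4096 Jan  1 12:00 /home/back/dir'

# Add field 5 of every line to total, then print it once after the last line.
total=$(printf '%s\n' "$listing" | awk '{total = total + $5} END {print "Total:", total}')
echo "$total"
```

This prints Total: 4446, which is 100 + 250 + 4096. One thing to watch: if find matches nothing, total is never set and awk prints an empty line; printing total + 0 instead forces a 0.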

And now we have it. We found everything in that folder owned by that user and added up the sizes, in bytes, that ls reports. To change to a more useful unit like megabytes, you can use this nifty trick in the shell (the result is rounded down to a whole number):

echo $((<total>/1024/1024))
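If the fraction matters, awk can do the division instead, since it works in floating point. A minimal sketch with a made-up byte count (the variable names are mine, not part of the original command):

```shell
# A hypothetical byte count: exactly 1.5 times 1024 * 1024.
total=1572864

# Shell arithmetic is integer-only, so the fraction is dropped.
whole=$((total / 1024 / 1024))

# awk does floating-point math, so it can keep two decimal places.
exact=$(awk -v bytes="$total" 'BEGIN {printf "%.2f", bytes / 1024 / 1024}')

echo "shell: $whole MB, awk: $exact MB"
```

This prints shell: 1 MB, awk: 1.50 MB.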

In retrospect, looking at the documentation, it appears the sizes of the folders themselves are not counted.

The disk space usage information contained in this feature does not indicate how much space the directory itself uses. It only displays disk usage information about the directory’s contents. Typically, directories themselves occupy a negligible amount of disk space.

So for that, we would want to make our script count only the files.

find /home/back/ -user <user> -type f -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'
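Another way to get the same number, assuming GNU find (an assumption on my part, though it is what Linux servers generally ship): its -printf action can print each file’s size directly with %s, so there is no ls output to pick field 5 out of. This is a sketch on a scratch directory with made-up file names, not the command run on the server.

```shell
# Scratch directory with two files of known size (10 and 20 bytes).
demo_dir=$(mktemp -d)
printf '0123456789' > "$demo_dir/ten.bin"
printf '01234567890123456789' > "$demo_dir/twenty file.bin"

# GNU find only: %s is the file size in bytes, printed one per line.
total=$(find "$demo_dir" -type f -printf '%s\n' | awk '{total += $1} END {print total + 0}')

echo "$total"
rm -rf "$demo_dir"
```

This prints 30. Adding -user <user> to the find command would narrow it to one owner, just as in the original.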

Then what remains is finding where else needs looking. 🙂

Some Explanations – Introduction

So I was going to make a post about this really awesome TV show I’ve been watching recently, and how I wanted to give it the Triple Crown award for fitting All Three Categories. But that wouldn’t make much sense without me having actually explained my categories. As is probably obvious from the title of my blog, I intend it to be primarily about Mathematics, Morality, and Magic. However, what might not be so clear is what I mean by that, and how that actually encompasses just about everything there is. My current intent is to begin explaining this in a three-part series, each part explaining one of the categories, what it means to me, and why it’s important enough to be a category. I plan to go in order. Let’s see whether this actually works, shall we?

New Project!

So, I’ve been meaning to learn a new programming language, but hadn’t decided which one I wanted to learn. So, as with many new projects, I started on Wikipedia. I had been thinking of learning Haskell, due to the Curry-Howard Correspondence. Part of my reasoning, though, was also that it had an emacs-like editor that used Haskell instead of Lisp. But this weekend, while I was trying to look that up again, I couldn’t find the editor. It probably still exists somewhere, but it was harder to find. While I was reading about Lisp, I remembered that one of the things I like about it is that it is homoiconic. So then I was reading about homoiconicity and found out there’s another language like that, called Prolog, which is also based on logic. So it’s really just a stone’s throw from Lojban. And the SWI dialect of it comes with an emacs clone called PceEmacs. I don’t know whether it has as many extensions as Lisp Emacs, so I’m not sure if I can use it as a mail client, and a feed reader, and stuff like that. Also, it depends on X to run. But if I want a more unified system where, other than the OS itself, I only run a few other things, it might still work.

Also, Prolog is a lot of fun to learn so far. The syntax is very intuitive (once you know Lojban or have taken a class in Mathematical Logic) and I think I’m getting the hang of it. I don’t know how to do very many things yet, but I’m pretty early in the tutorial.

coi rodo (Hello, everyone.)

Hello,

This is my first post on what I hope will be a lovely and fun blog. I plan to write about whatever I think about and decide I want to talk about. As you can guess from the title, this is likely to be about Math, Morality, or Magic, or some combination of them. However, as you are about to find out, I don’t always define words in the same way as is typical. I haven’t yet decided on a schedule, but I hope to post something every day I’m not at work. My job has an odd schedule, though, so it might be tricky for you to get used to. And maybe I can at least have something short to say on work days too. I really think writing will be good for me, and maybe the things I say will be useful to someone else too. And putting them publicly on the internet seems to me the best way for them to reach whoever would find them useful.