Updates

Day 4, I think?

There was a brief downtime earlier today while my server was being upgraded. Now it has 64-bit CentOS 7, which means cPanel can finally resume updating! It looks like version 58 really did fix PECL for PHP 7. I wasn’t fully sure they would, since they’ve misestimated arrival times of features a few times recently, but I was able to install APCu successfully. So currently this blog is using PHP 7 with DSO (with mod_ruid2 installed) and Zend OPcache, and I have the WordPress plugin WP-FFPC installed, which uses APCu to cache even more things.

While configuring things I did a few GTmetrix tests. With PHP 7 and DSO/mod_ruid2 (without OPcache or APCu) the page loaded in 1.2 seconds. With OPcache enabled it was 1.0s. With WP-FFPC enabled with nearly-default settings and APCu installed, it was down to 0.8s. I temporarily enabled Cloudflare to test, but with the settings it had stored, that brought the load time up to 1.1s. 0.3s is a lot when it was otherwise down to 0.8. I bet if I looked at the settings more carefully I could get the Cloudflare value down a bit more, but that’ll be something to do another day.

I have a bit of other tweaking to check on and redo (or do differently) in the server, and it’s probably also getting to be time to recheck how I want my plugins configured. I also need to remake my “child theme” so I can have my Project Honeypot links again and continue the crusade against spammers.

One thing about EasyApache 4 though is that mod_security is flagged as conflicting with mod_ruid2. So you can’t have both installed. But, even with EA3, certain types of modsec rules don’t work when mod_ruid2 is enabled (and the log rotation gets… “wonky”).

At some point I might look at nginxcp more closely, in case that can squeeze out a little more speed yet, but last I looked it had a lot of problems with CentOS 7. More recently I’ve been interested in Engintron, but it can be hard to configure “correctly” for not-terribly-uncommon setups (like if there are any domains on Cloudflare, or if any sites have a Dedicated IP). I feel like Engintron has a lot of potential, but might not quite be “ready” yet. I used to be a huge fan of nginxcp, but I don’t know if its developers actually fix problems. I haven’t checked recently, so for all I know it might be a lot better now, but last I was following it, it seemed to be progressing very slowly, which wasn’t encouraging with all the changes needed to work with CentOS 7 using systemd.

I’m slightly disappointed that the Let’s Encrypt plugin is only for domain certificates rather than also including the hostname certificate. At some point I need to retry putting a certificate here on my site, but I’ll want to be extra careful after what happened last time. At least this time I’m not trying to use the apache jails, which somehow is still in “EXPERIMENTAL” status.

The new Path of Exile league is starting soon! It sounds like it will be fun, and I’m glad they’re keeping the Prophecy mechanic. I’m thinking of playing a flame totem hierophant for my first character in the new league. It sounds like a good basic idea for a build, that doesn’t rely too heavily on having specific unique items.

Hi again

This is, if I’m still counting correctly, the third day in a row I have posted.

I spent most of the day playing Shadow of Mordor. My cat used to sit on my lap when I played it when she was a kitten, and today she sat in my lap part of the day when that’s where the sunshine was. But, I think trying to play it too close to bedtime makes it harder to sleep. I think it’s a bit too adrenaliney for right before bed.

I still like Habitica so far. I added a couple tasks to it today. I didn’t make my bed though, so I’m not getting both my dailies today, but I am writing here so that’s one of them.

I’ve been asked what PHB stands for. It stands for Pointy-Haired Boss. The name is from Dilbert. Among a PHB’s most notable characteristics, after not knowing what they’re doing, is susceptibility to buzzwords. A lot of buzzwords do start out as words with a real meaning, and the ideas behind them can actually be useful, but the buzzword form becomes uselessly meaningless. Take SEO (Search Engine Optimization). There are some things to do or not do, but most people who loudly claim to be experts in it are probably either making things up, or have read things written by someone making stuff up.

And, I have been asked how I made the last paragraph cut off on the main page. Inside WordPress, on the page where the post is being edited, if you’re in “Text” mode, one of the buttons across the top says “more”. If you click on that, it adds a “more tag” (it shows up in the text as <!--more-->). Everything after that tag does not show up in the version of the post on the main page, but does show up on the page for the post itself.

Didn’t get as much done today as I wanted, but I at least made this post. And, the game I was playing is fun, and I’m making some progress. Tomorrow is a work day.

UFYH and Habitica

I don’t remember how exactly I found the page, but I am really liking UFYH so far. It’s on tumblr. For those of you who don’t like certain types of words, let’s pretend for now that it stands for “Un-Fail Your Habitat”. That, hopefully, preserves the internetly-informality of the original. Its purpose is to help motivate you to do things that need to be done. It primarily focuses on cleaning, but can apply to other areas as well. The most basic premise is that no one and no environment is beyond help. Everything can be improved. You don’t have to fix everything all at once, but if you can do a little bit of fixing every day, it will continue to get better. Even if you mess up sometimes, just keep working on it and it will get better. The site does make somewhat-frequent use of profanities, “as the kids do these days”, so those of you who don’t like reading certain words probably don’t want to use this specific site. But if that’s not a problem for you, and you have trouble getting started on things, or maintaining focus on things, this site might help you. For those who want it, the full link to the site will be under the “more” tag, so that those who don’t like seeing certain words can more easily avoid it. But I haven’t figured out how to hide it when on the post view instead of the main page. It looks like I would need a plugin or a different theme, or to learn some CSS, to do so. I’m adding that to my list now, but that doesn’t fix this post (yet). So if you’re already on the post page, skip the last paragraph if you don’t want to see it.

I notice an interesting similarity between how it says to break up the work into small pieces with frequent breaks, and the “little and often” idea prevalent in all of the Mark Forster systems. What this site adds to that, is its insistence that no one is beyond help. That just because yesterday didn’t go as I wanted it to, that doesn’t make today hopeless. At least in certain moods, that’s a very important thing to remember. The word choice in the site (at least with the tone in which the words are used), to me adds a level of informality and relaxedness that also helps.

UFYH also has an Android app, which I mostly like so far. It does cost about a dollar, so if you only use free apps, you probably don’t want it, but I had some credits saved up from the google survey thingy. The app has, among other things, a timer preset for the “20/10”s (twenty minutes working, followed by a ten minute break), a “Random Motivation” function, and a to-do list. I thought at first I’d get a lot of mileage out of the to-do list, having it so close to the motivator and timer, but recurring things don’t show any indication of having been done when you check them off. This is mentioned in one of the first few reviews on its page in Google Play, but I didn’t realize how big of a deal that would be until I tried it myself.

I still very much like the website. I followed it on Tumblr. And the app itself isn’t a whole waste of a dollar either; the motivator and the timer and the random challenge so far seem like they will still be useful to me. But it left me wondering what to do for a to-do list. There are so very many to-do list apps everywhere and even picking one sounds itself like a project.

But I noticed in Google Play some recommended apps, that seemed to follow this “gamification” idea that seems to have become popular when I wasn’t looking. It sounds like such a buzzword, worthy of the pointiest of PHBs, but the idea itself seems to have a lot of merit (as long as the buzzwordiness doesn’t get out of hand). So, when I did a basic google search for to-do apps, and one of the earlier articles mentioned Habitica, its name had already some small glimmer of familiarity, from having seen it in the suggestion list next to the other app I had.

Basically the idea is, one of the reasons people like games is getting rewards for doing stuff. So this application takes real life, and makes it into an RPG. You add tasks to your list, either as one-off things, recurring things, or things you just want to do generally more or less often, and then when you do them (or don’t do them, for “bad” things) you get XP (experience points) and gold. When you don’t do them (or do them, for “bad” things), you lose HP (hit points). If you lose all your HP you lose a level. If you get enough XP, you get a new level. Gold is used to buy equipment. There also exist gems, which can be purchased for real money, but the things bought with gems are strictly cosmetic, and there are in-game ways to earn them too. At a certain level, you get to choose a class (Warrior, Mage, Healer, or Rogue) which will give you access to skills. Skills are even more useful in quests. In order to do quests, you have to join a party (but if you really don’t want to interact with other people, you can have a party with only one person in it). Basically, the whole group progresses in the quest when anyone in the group does things they set out to do, but also the whole group has penalties any time anyone in the group fails to meet the expectations they created for themselves. In this way, you and your friends can help keep each other motivated and accountable to work towards goals. And, if you get sick or go on vacation or for some other reason have a good reason for not doing your usual things, you can Rest at the Inn. While resting in the inn, you don’t take any damage for missing your tasks, and if you are in a quest your party members don’t take damage for your missed tasks. (But you do still take damage for your party members’ tasks if they are not also resting, so you will at least want to check on your character in case you need a health potion.)

I like Habitica so far, but it’s definitely far too early to tell for sure how much of an effect it will have on me working towards my goals. But it sounds like it fits in a gap I feel like I have, to help me remember and motivate myself to actually do the things I’m wanting to do.

I didn’t accomplish very much today, but I feel like part of it was vaguely useful. I plan for the next bit of the evening to play some Shadow of Mordor. Though probably (hopefully) not very long, since I don’t want to get my sleep schedule off too far.

Here comes the more tag, before I go, but if you’re already on the page for the post itself it doesn’t make you click to see after it. Remember, if you don’t like to read profanities, you probably want to stop reading before the next paragraph starts.


Trying Again

It has been quite some time since I posted. Over a year and a half it looks like.

I’m currently planning to try getting into an actual habit of writing. So, my new “rule” is, at least one post per day. Every day. Doesn’t have to be much, doesn’t have to be profound, just something. Sometime before I go to bed at night, I have to have logged into the site, typed some words, and posted them.

They say it takes about a month for a new thing to become a habit. Since my schedule is a two-week pattern, I will round to two fortnights. An extra fencepost for good measure, that makes 29 posts, of which this is the first. Ideally, I will continue writing daily after that. But twenty-nine days is how long I will be keeping count.

I was thinking of saying that if I write in my tumblr, for example if I get started with roleplaying there, that I would let that count as a daily post. But now I think instead, even if I do write there, I can at least log in here to say that I did it. I’m not requiring a lot in these daily posts, just that they exist, so if I just say “Hey, everyone, I did a tumblr post,” that’s already as long as the post here would have needed to be anyway.

Of course, knowing me, the posts will inevitably be much longer than that. Brevity is often not my strong suit, and I often write huge walls of text. I think my previous posts here demonstrate that well enough. But I’m not requiring any specific length, for this purpose.

I also plan once I get going, to use this as an anchor-habit for other habits. Once I’m already writing here regularly, I can use this to talk about other habits I’ve started working on and how well those are going. Especially if I think people are reading it, maybe I’ll have more motivation to do those things.

I have probably-too-many anti-spam plugins (I really dislike spam a lot), so if you try to comment and it doesn’t work, or if anything else in the site doesn’t work, you can email me at the probably-obvious admin email address. I’m also on gmail and tumblr and twitter and facebook and my name is always “skaryzgik”. I don’t check all of these things with equal frequencies (some of them quite rarely indeed). If you need to get in touch with me and one way doesn’t work, try another.

To-Dos and Similar

There’s a new WordPress theme I want to try. I started reading some of the new features, and it looks like I should like TwentyFifteen. I have a few customizations I’ve made in a child theme for TwentyFourteen that I’ll want to convert before I make the switch. Once I get started on it, it’ll probably only take a few minutes, but I’m in the middle of hardware upgrades on my home computer and will need to finish reinstalling and configuring the OS before I do that.

In the meantime, I think I’ll get started on getting “caught up” on my social networking stuff some. I’ll probably start with Facebook since I’ve got so much family there. After that, I’ll probably do g+ or twitter, then the other one. Maybe LinkedIn after that? If anyone’s reading this thing yet and has suggestions on what order to go in catching up on this stuff, you can email me or leave a comment. If you have trouble commenting, let me know (email is most likely to work) with as much detail as you can, so I can try to fix it.

So I don’t confuse everyone like this, here’s my current approximation of how to contact me. My first preference is email, either at my gmail or my domain email. If it’s about administrative stuff for my server or site, I have an admin email address you can use for that. Personal stuff would more effectively be sent to one of the other ones. I’ll also check comments here, and soon Facebook messages too. Once I get caught up on more social networking sites I have accounts in, I’ll probably check those too. But in any given sitting I’ll probably read email first, comments second, and the other stuff after that, so if my sitting isn’t long enough to do that many things, I might only see email at that time. For this reason, I might not see messages via other channels for a few days after I might have seen it if it was email, but I’ll probably still see them at some point, so if it’s not some huge rush it should still be fine.

I plan soon to update my about page with this stuff too, so that it will be easier to find.

Oops!

So I don’t think I have enough traffic yet that anyone is likely to have noticed, but, I had some downtime last night. Completely self-inflicted, of course. I’m not sure exactly what specific things were going wrong, but that’s exactly what the problem is. I was less careful than I should have been. Here’s the story:

So I’ve been working on my site lately trying to add new stuff to it. A few plugins that look like they might be more useful than problematic. Then I realized my site didn’t have https yet. Not that I previously thought I did, I just hadn’t fully realized the implications of this until a few days ago.

It turns out, trying to make a wordpress site use https correctly is a pretty common stumbling block. There are a couple of plugins that are supposed to help, but even then, lots of people still wind up with redirect problems, or with mixed content being served (i.e., things referenced in the page still using http even when the page is requested with https, causing browser warnings even if the certificate is signed).
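
To make the mixed-content part concrete, here is a minimal sketch of how one might spot it in a saved copy of a page. The function name and the file name in the note below are made up for illustration, and it only looks at src attributes, so it would miss stylesheets and anything loaded by scripts.

```shell
# Hypothetical helper: list resources that a saved page loads over plain http.
# Only src="..." attributes are checked; this is a rough first pass, not a full audit.
find_mixed_content() {
    # -E: extended regex, -o: print only the matching part of each line
    grep -Eo 'src="http://[^"]*"' "$1"
}
```

Running something like find_mixed_content saved-page.html would print one src="http://..." entry per line. No output at all would suggest the problem is elsewhere, such as in a stylesheet.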

I made myself a self-signed certificate to use for testing. I didn’t wanna use yet another plugin when I already feel like I might be pushing too-many, so I thought I’d see if I could do it just with wordpress settings. But once I changed the url settings, boom! redirect loops. Firefox detected that the site was redirecting in a way that would never complete.

After trying to find where in the files or databases those are stored and failing (one of these days I really need to learn how a wordpress database is actually organized), and checking all the .htaccess files for redirects, I decided to try renaming all the .htaccess files (so that they’re not named .htaccess anymore). I even removed the certificate too. Then it was almost, sorta-working. It looked like a webpage that was missing its stylesheets, or rather, what I think of a webpage as looking like when missing stylesheets. The links were all there, but everything was in one column and there were no pictures. There was a button though which seemed a little odd, so I’m probably not understanding exactly what it is it was doing, but here’s the tricky part:

The login page was still using the https url. So even though the page was sort-of loading, I couldn’t log in to try disabling plugins and stuff. My best guess at the time was that it was somehow in the database. I also thought it might be something caching something, since I’ve recently enabled a lot of caching settings, but I’d already restarted everything that should be storing the caches, and removed all files from the nginx cache. (I’m not using disk caching, but more on that topic in another post. Technically I think nginx’s cache is currently on a disk. I intend soon to make a tmpfs for it so that it would be in RAM too, but I want to first make sure I understand how the virtualization affects mounting and whether /tmp is already special; if I decide it probably won’t hurt anything I’ll probably mount all of /tmp in RAM.)

I was able to grep the table files for https to determine it was in one of three tables. And I know how to log into mysql and use a database and describe a table and select from a table where things are like things, which sounds like it should be enough. But I didn’t know quite what exactly it was I was even looking for, so I’d either find no results, or way too many results to see what was going on. This is part of why I mentioned earlier wanting to learn more about wordpress databases and their structure and how it organizes things so that it’s easier to find things. It’s like using awk on exim logs: it’s very powerful, but most useful when you know all three of the command syntax, the structure of what you’re looking in, and what you’re actually looking for.
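
For anyone stuck at the same spot: as far as I can tell, a stock WordPress install keeps its own address in two rows of the options table, named siteurl and home. The sketch below only builds the query and does not run it. The wp_ prefix is the installer's default and the database name in the note below is a placeholder, so both are assumptions about any particular site.

```shell
# Build a query for the two options where WordPress stores its own URL.
# Assumption: the table prefix is the default "wp_" unless another is passed in.
wp_url_query() {
    prefix="${1:-wp_}"
    printf "SELECT option_name, option_value FROM %soptions WHERE option_name IN ('siteurl', 'home');\n" "$prefix"
}
```

Piping it into the client, as in wp_url_query | mysql name_of_database, should show whether either value has picked up an https address.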

I think I might have had a little more success if it wasn’t already past my usual bedtime, and/or if I didn’t have a headache at the time (I had less caffeine than I’ve gotten used to lately). But I was tired and annoyed. Annoyed at wordpress for not working how it seemed like it should, annoyed at myself for not having figured out how to fix it yet, and embarrassed because I was so excited about building my site and had just made a post that I was telling people about, so they might see that I’d broken it and not yet fixed it. I really didn’t wanna leave it visibly broken overnight. So I tried restoring from a backup.

First, I used the backup restore functionality from within WHM. I have my automatic backups configured to run daily and I keep a lot of them, so I did have a backup, and it wasn’t all that old either. The behavior was unchanged though. My main site page was still only sort-of loading, and the login link still had https and was getting 404 (and when I tried to use plain http it redirected to https and thus still got the 404). I looked in the folder and saw both .htaccess and .htaccess-bak existing. The backup restoring didn’t remove files that weren’t in the backup so it wasn’t everything exactly as it was at two in the morning.
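
A rename like the one that left .htaccess-bak behind can be sketched with find. This is only an illustration: the function name is made up, and -bak is simply the suffix seen above.

```shell
# Hypothetical helper: rename every .htaccess under a directory to .htaccess-bak
# so Apache stops reading them, without deleting anything.
disable_htaccess() {
    # sh -c is used so each found path can be reused as both source and target
    find "$1" -type f -name .htaccess -exec sh -c 'mv "$1" "$1-bak"' _ {} \;
}
```

A restore that only puts files back would then leave the -bak copy sitting next to the restored .htaccess.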

I almost decided to go to sleep then, but then I remembered that if I remove the cPanel account and then restore it from the backup, then it should be exactly how it was. So I tried that.

But then everything was getting 404. And I couldn’t figure out why it would be doing that if all the files and databases were the same as they were at 2am when it was working fine. But I also knew whatever was happening was probably more complicated than I was going to figure out that late at night. So I went to go sleep.

This morning, I tried making a mysqldump of the database, and seeing what happened if I made a new wordpress install in a different account, then imported the database dump. The main page loaded fine, but all the links were still pointing at the other site. Apparently the links are all in the database. (There is a lot of stuff kept in the database. I really should learn how it’s organized). So then, after saving another copy of the files I already had, I tried to do a new wordpress install in the original account, with the intention of later trying to find just the parts of the database that pertain to the posts to import just that. But even the wordpress install page was giving a 404.

I almost thought it was going to end up easier to start the whole thing over, but wanted a break first. After Brad got home from class, I told him about it and he suggested taking one more look at the apache error logs. And it is a good thing he did. There were mod_ruid2 errors about it not being able to change directory correctly. It turned out something weird was happening with the apache jailshell. Brad pointed out that it is still listed as experimental, so I should check if it’s any better if I disable it. At first it didn’t make a difference, but then he also pointed out that apache isn’t always restarted fully enough after making some types of change. Once I restarted apache, then the wordpress install page was loading. Oh yeah, I deleted my files (after copying them to somewhere else).

Then I did another backup restore, from yesterday morning’s backup, and then my page was loading just fine, and I was very happy. I logged in and started writing this post, to explain what was wrong with my site, but also to share things I have learned.

I think now that the apache problem might have been that the jailshell was confused about the mountpoints. One of them wasn’t unmounted correctly when I terminated the account, so I unmounted it manually so that the repquota cron would work again. I think some of the other jailshell settings might have not gotten cleaned correctly when the account was deleted and recreated.

Now that it’s working again, I might try briefly to see if apache jailshell will work right if I re-enable it. If it doesn’t work I will re-disable it. It is still experimental after all, so some bugs are to be expected. After that, my plan is to go over some settings changes I made yesterday that got lost when I did the backup restore, and then research how to use https with wordpress. I do still want to do that, but after what happened yesterday, I’m going to try to research it better before jumping into things I don’t actually know how to do.

Time and Stuff to Do in it

A perennial trickiness seems to be how to actually accomplish those things which have been chosen. There seems to be an entire industry built up around telling everyone else how to do the things they already want to do anyway. Naturally, I’m a bit skeptical of any how-to whose website is nothing but an advertisement and a store, where you purchase the books and training courses to use the system. Yes, GTD, I’m looking at you. But Lifehacker, that repository of all tips nerdy and geeky, seemed to stand by it pretty hard. So I gleaned what I could of the system without having to shell out the bucks. (I was in college. Who wants to spend money on things that feel like they should be free?) I was able to piece together the basics, and it seemed pretty useful. But as with every other system I tried, it felt good in those first few days where the boundless enthusiasm I have for any project was carrying me through, but as soon as that wore off, it all fell apart. All those lists are all very well and good, but only so long as I look at them.

Which, I want to make very clear, is not a failing in the system itself. It’s just that the very thing I was looking for isn’t solved by it. I did learn a lot of useful things from it. It takes a lot of seemingly-obvious facts and combines them into a model, which if you’re able to actually get off your butt and follow it, can be extremely powerful. But as with most caches and channels of power, it must be approached carefully. Additionally, sometimes the seemingly-obvious isn’t noticed nearly as early as you’d think it would be, especially if the person who needs to notice is as oblivious as I am.

Also, it feels really unbalanced to have separate context lists, when the only one that ever has more than one thing on it is @home/desk/computer.

There is very much a difference between the facets of “keeping track of what to do and choosing the order to do it in” and “actually getting off my butt and doing the stuff I just chose”. Currently, I’m trying Mark Forster’s (you may have heard of AutoFocus or SuperFocus or one of his many revisions of them) “Final Version (FV)” for listkeeping and a modified version of “Pomodoro” for actually-doing-stuff. Maybe I’ll call it “The Last Tomato”. My main change to Pomodoro comes in two main parts, both stemming from the fact that the task is always the same: “follow the list”.

Since everything goes on the list (including games and internet reading) that leaves the tricky question of “what do I do in my 5-minute breaks? or my half-hour/hour breaks?”. The chain-building from FV already has built-in a rewards system and since the things I usually consider breaktime are already in the list, my current strategy is to “skip” the breaks and use the timer more to delineate my time so I am aware of its passing. Also, if I have something that takes more than one tomato of time, one tomato is probably more than long enough for it to be “worked on” and if I spend more than that in a row I’m probably not gonna actually get around to the other stuff. So when the tomato alarm goes off, I do something else. Also, if I get distracted reading too many articles on wikipedia or lifehacker or cracked or any of several other notorious time-sinks, the alarm will let me know time has passed at some point before all of the time has passed.

This being my first day of “Last Tomato” it’s really too early to say if it’s going to be “The System” or not, but I’ve played with FV a bit before and think with the added focus of remembering that time is passing it has a lot of potential. Last time I tried FV the thing that broke me from it was that my list, having everything on it, got really long and unwieldy and I was asking myself why I was reading the name of each book I’ve heard of and every game I’ve remembered I like in the last week, every time I go through the preselection to build the next chain. But I think if I can remember to relax a bit and not worry so much and let the system work itself out, that it at least has a chance of lasting longer than others tend to.

And even if it doesn’t, at least I’m not as stressed as I was in college. Seriously, that stuff was hard.

Sender Verification and its Callouts

In the constant fight against spam, one of the many useful tools is Sender Verification. You may have seen references to this when poking around in the exim configuration manager. But what is it? What is it for? What problems can it “cause”?

One of the occasional tell-tale signs of spam is that it is coming from a sender address or domain that does not exist. Why would you send an email without having your own email address? There are several possible reasons, but the most common answer to that question is that either you are a spammer or you have something misconfigured. And if you are a spammer, I probably don’t want your filthy spammy email anyway. So one way I can filter you out right away so I don’t have to see your spammy email messages is to tell my mailserver daemon not to accept any mail from a domain or address that does not exist. This is called Sender Verification.

There are two main parts to Sender Verification. In exim, these are referred to as Sender Verification, and Sender Verification Callouts.

The more basic of the two is Sender Verification. This is why Sender Verification must be turned on before you can enable the Callouts. Sender Verification just checks if the domain exists. So, for example, if I get an email from an address called “iamlegitiswear@totallynotaspammer.com”, the mailserver will check if the domain “totallynotaspammer.com” exists. Currently, at the time of this writing, that domain does not resolve:

===
[skaryzgik@localhost ~]$ dig any totallynotaspammer.com +short @8.8.8.8 | wc -l
0
===

So my mailserver sees the domain not resolving and figures “uh-oh, this looks bad!” and rejects the mail. And I don’t have to see the spammy spammy message, and exim tells off the sending server. By which I mean, it returns an informative error so that the server admin can see that something weird is going on and they should check their security settings.
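
The domain half of that check can be sketched in shell. This is not how exim does it internally, just the same idea by hand, and the function name is invented for the example.

```shell
# Hypothetical helper: print the domain part of an email address.
sender_domain() {
    # ${1##*@} strips everything up to and including the last "@"
    printf '%s\n' "${1##*@}"
}
```

The result can be fed to the same dig command as above, for example dig any "$(sender_domain iamlegitiswear@totallynotaspammer.com)" +short @8.8.8.8 | wc -l, to see how many records come back.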

Sender Verification Callouts can help stop the receiving of spam in other cases, where the domain itself exists and resolves just fine, but the actual address does not. The way Sender Verification Callouts accomplish this is that they start a test delivery to the sender’s address: exim connects to the sender’s mailserver and asks whether it would accept mail for that address, without actually sending a message. If the test works, then the sender address must work, and this particular check will not block the mail. If the test does not work, the callout fails, and exim decides the sender address does not exist and rejects the message.
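
In SMTP terms, a callout looks roughly like the start of a delivery that stops before any message is handed over. The sketch below just prints that conversation for a given address and does not connect anywhere. The function name and the HELO name are placeholders, and the exact commands a real server sends depend on its configuration.

```shell
# Print the commands a callout-style check would send, in order.
# Assumption: an empty envelope sender (MAIL FROM:<>), as bounce messages use.
callout_dialogue() {
    # $1 = address being verified, $2 = optional name to announce in HELO
    printf 'HELO %s\n' "${2:-mail.example.com}"
    printf 'MAIL FROM:<>\n'
    printf 'RCPT TO:<%s>\n' "$1"
    printf 'QUIT\n'
}
```

The reply to the RCPT TO line is the part that matters: an acceptance means the address looks deliverable, and a rejection is treated as the address not existing.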

Yay! I have less spam running through my server! But wait, there’s more!

Sometimes, Sender Verification or Sender Verification Callouts can appear to cause legitimate mail to not work. For example, here is a problem I see occasionally. The complaint is usually along the lines of “Halp! My php script can’t send mail!”. There are many cases in which a php script would want to send mail, for example if it is a well-protected forum registration page which is resistant to forum spammers. You may want your forum to send each new registrant an introductory email with useful informative links and a few of the basic rules and terms. But the mails don’t get sent and you see errors about addresses not existing but you know very well the recipient exists because you just sent an email there.

Yeah, customers like to panic. Anyway, when sender verification can’t find the sender, especially when it’s doing the callouts, the error will refer to the sender as a recipient, because they are a recipient: of the test mail.

So then you might be thinking, “okay, so the server thinks cpaneluser@host.domain.tld doesn’t exist. How is that better? It obviously exists! It’s my main cpanel email user!”

The weird thing about hostnames is lots of people don’t bother to make sure they resolve. They don’t realize they’re used for anything. Similarly with nameservers, but that causes other, sooner-noticed, problems. Make sure the hostname has an A record. WHM has a special page for it so you don’t even have to edit the dns zone yourself.
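
A quick way to check this from the command line can be sketched as a filter over dig’s short output. The function name is made up, and the pattern only looks for something shaped like an IPv4 address, so treat it as a rough check.

```shell
# Hypothetical helper: read `dig +short A <name>` output on stdin and
# succeed only if at least one line looks like an IPv4 address.
has_a_record() {
    grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}
```

On the server itself, something like dig +short A "$(hostname -f)" | has_a_record && echo "hostname resolves" would print the message only when an address came back.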

Of course, you still have to make sure the domain the hostname is under hasn’t expired.

Happy spam hunting!

EDIT: It has been pointed out to me that with these recommendations, you still wouldn’t be able to receive mail from many noreply@ addresses, since they do not usually accept mail. I have added finding a suitable solution to my shiny-new to-do list.

Awk amazingness

So, you may or may not be aware, I am really excited about awk at the moment. Also find.

I learned a particular bit of magic on Sunday. I used my most complicated awk script yet: it uses an END block!

root@host [/home]# find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'

Here is the context: In a cPanel server, a user was looking at their disk usage, and saw lots of stuff under “Other Usage”. This user had lots of stuff in that category. They wanted to know what was taking up that space. Here is the documentation section about that cpanel page: Cpanel Docs: Disk Space Usage

This feature also displays disk space usage summaries for:

  • Files contained within your home directory
  • Files in hidden subdirectories
  • Mailing lists in Mailman
  • Files not contained within your home directory (see Other Usage bar)

So, now let’s break down this command I wrote and analyze what it does.

find

find is a utility used to look in the filesystem for files for which certain conditions are true. I find the manpage for this utility very useful because it has informative examples. I usually use (and currently have bookmarked) the one here, so I can see it in my browser: man find

find /home/back/

The path argument tells find where it is looking. I had noticed that there was a folder in /home owned by root. This seemed a likely place to look in this case.

find /home/back/ -user <user>

I only wanted to find the things in that directory that were owned by that particular user. There were a lot of things in the folder, so I didn’t wanna check myself if all of them were owned by that user. So I made find do it for me! 🙂
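If you want to see what -user does without poking at a real account, here is a small sketch. The scratch directory and file names are made up for the demo, and it uses the numeric ID from id -u in place of a user name, which find also accepts.

```shell
# Scratch directory so nothing real is touched (names are made up).
demo=$(mktemp -d)
touch "$demo/one" "$demo/two"

# Everything in the scratch dir belongs to whoever runs this, so -user
# with our own numeric ID matches the directory itself plus both files.
mine=$(find "$demo" -user "$(id -u)" | awk 'END {print NR}')

# ! negates the test: anything NOT owned by us (nothing, in this demo).
not_mine=$(find "$demo" ! -user "$(id -u)" | awk 'END {print NR}')

echo "owned by me: $mine, owned by someone else: $not_mine"
rm -rf "$demo"
```

The negated form is handy for the opposite question, like spotting the stray files in a folder that are not owned by the user you expected.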

find /home/back/ -user <user> -exec ls -ld {} \;

Once I’d found the files I wanted, I needed to find how much space they were taking. I figured a good way to do this was to pass each file through ls -l so I could grab the number of bytes from that listing. For the directories, if I didn’t have the d flag in there, ls would also list all the files in each folder when it got to it, which was not the desired behavior. Another thing I could have tried, instead of using the -exec action in find, was to pipe the results through xargs ls -ld like this:

find /home/back -user <user> | xargs ls -ld

However, this would confuse ls if any file or folder name had a space in it. When using xargs without the -0 flag, whitespace (spaces, tabs, and newlines) separates the input items, so a name like “my file” gets split into two arguments. I could fix it by using the -0 flag in xargs and the -print0 action in find, like this:

find /home/back -user <user> -print0 | xargs -0 ls -ld

However, if I’m adding a command to find anyway, why not save the pipe and xargs by just using -exec? So that’s why I did that.
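Here is a rough sketch of the difference, using a throwaway directory with made-up file names rather than the real /home/back. It also shows the + form of -exec, which hands many names to a single ls call (much like xargs does) instead of running ls once per file.

```shell
# Scratch directory so nothing real is touched (names are made up).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/has space.txt"

# Plain pipe: xargs splits "has space.txt" into two arguments, so ls
# complains about names that do not exist. "|| true" keeps going anyway.
find "$demo" -type f | xargs ls -ld 2>&1 || true

# NUL-delimited: each name arrives at ls in one piece.
safe_count=$(find "$demo" -type f -print0 | xargs -0 ls -ld | awk 'END {print NR}')

# -exec with + instead of \; batches the names into one ls call.
batched_count=$(find "$demo" -type f -exec ls -ld {} + | awk 'END {print NR}')

echo "xargs -0 listed $safe_count files, -exec + listed $batched_count"
rm -rf "$demo"
```

On a folder with a lot of files, the + form should be noticeably faster than \; since it isn't starting a new ls for every single file.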

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0}'

This is a testing version I used so I could make each command with as small a difference as I could from the one before, so I knew exactly what changes I was making each time. {print $0} is already the default action of awk, but an awk rule needs either a condition (for which lines to match) or an action, and since I’m not listing a condition, I have to have some action. And I want it to work on all the lines.
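A tiny sketch of that pattern-or-action rule, with made-up input lines standing in for the ls output:

```shell
# Two made-up lines standing in for real ls output.
sample=$(printf 'alpha 10\nbeta 20\n')

# Action with no condition: runs on every line.
all_lines=$(printf '%s\n' "$sample" | awk '{print $0}')

# Condition with no action: matching lines get printed by default.
matched=$(printf '%s\n' "$sample" | awk '/beta/')

printf '%s\n' "$all_lines"
printf '%s\n' "$matched"
```

The first awk prints both lines back unchanged, and the second prints only the beta line.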

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5}'

This one is a sanity-check to make sure field #5 is the one with the file sizes in it.

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5; total=total+$5; print total}'

This tests that my variable total is working right, that it is adding up the file sizes as it goes.
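One thing that makes this work without any setup: awk variables start out empty, and empty counts as 0 in arithmetic, so total doesn’t need to be initialized. A small sketch with a made-up listing in ls -l layout (the owner and file names are invented):

```shell
# Three made-up lines in ls -l layout; field 5 is the size in bytes.
listing='-rw-r--r-- 1 bob bob 100 Jan  1 00:00 a.txt
-rw-r--r-- 1 bob bob 250 Jan  1 00:00 b.txt
drwxr-xr-x 2 bob bob 4096 Jan  1 00:00 dir'

# total is never initialized; awk treats it as 0 the first time through.
running=$(printf '%s\n' "$listing" | awk '{total=total+$5; print total}')
printf '%s\n' "$running"
```

This prints 100, then 350, then 4446, one running total per input line.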

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $0; print $5; total=total+$5; print total}; END {print "Total: ",total}'

This tests that the END block works correctly and that the total will be printed correctly at the end of the script. Since it works and is giving me the information I want, I can now modify it to remove the pieces I don’t want. Since I’m being extra careful, I take one piece out at a time.

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{print $5; total=total+$5; print total}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5; print total}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print "Total: ",total}'

find /home/back/ -user <user> -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'

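And now we have it. We found everything in that folder owned by that user and added up the sizes in bytes. (Strictly speaking, ls reports the length of each file, which can differ a bit from the space it actually occupies on disk.) To change to a more useful unit like megabytes, you can use this nifty trick: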

echo $((<total>/1024/1024))
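One catch is that shell arithmetic only does whole numbers, so anything after the decimal point gets chopped off. If you want the decimals, awk can do the division instead. A sketch with a made-up byte count standing in for <total>:

```shell
total=5767168   # made-up byte count, exactly 5.5 MB

# Shell arithmetic is integer-only, so this truncates to 5.
mb_int=$((total/1024/1024))

# awk does floating point, so it can keep two decimals.
# LC_ALL=C just pins the decimal point to "." for this demo.
mb_dec=$(LC_ALL=C awk -v bytes="$total" 'BEGIN {printf "%.2f", bytes/1024/1024}')

echo "$mb_int MB (shell) vs $mb_dec MB (awk)"
```

The same division could also go straight into the END block of the main command, so the total comes out in megabytes in one step.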

In retrospect, looking at the documentation, it appears the sizes of the folders themselves are not counted.

The disk space usage information contained in this feature does not indicate how much space the directory itself uses. It only displays disk usage information about the directory’s contents. Typically, directories themselves occupy a negligible amount of disk space.

So for that, we would want to make our script count only the files.

find /home/back/ -user <user> -type f -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}'
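To see how much difference -type f makes, here is a sketch on a scratch directory with two small files of known size. The names are made up, and the size a directory reports will vary by filesystem, so only the files-only number is predictable.

```shell
# Scratch directory so nothing real is touched (names are made up).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'hello' > "$demo/a.txt"            # 5 bytes
printf 'hello world' > "$demo/sub/b.txt"  # 11 bytes

# Without -type f the directories' own sizes get added in too.
with_dirs=$(find "$demo" -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}')

# With -type f only regular files are counted: 5 + 11 bytes.
files_only=$(find "$demo" -type f -exec ls -ld {} \; | awk '{total=total+$5}; END {print total}')

echo "with directories: $with_dirs"
echo "files only:       $files_only"
rm -rf "$demo"
```

The files-only total comes out to 16 here, and the other number is that plus whatever the two directories report for themselves.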

Then what remains is finding where else needs looking. 🙂