If you’re like me, you find sifting through tons of data by hand one of the most tedious processes known to man. Thankfully, we have Bash (the Unix shell command language) to streamline the search. I’ll walk you through the fundamentals and show you how to put them together to get the answers you need.
Note: If you’ve been following our other content, you’ll know that we suggest you have access to Kali Linux in some form, whether that be from a bootable USB, a VirtualBox image, or some other format. All of the following content runs in the Unix command line. If you aren’t ready to work with a Command Line Interface (CLI), you can try CryptoKait’s initial trick here!
Background
For those of you who are not already familiar with them, “a log, in a computing context, is the automatically produced and time-stamped documentation of events relevant to a particular system. Virtually all software applications and systems produce log files.” [1] Out in the wild, log analysis is often under-appreciated, but it becomes very important when you’re trying to identify the source of a breach.
You want to know which IP address connected to your server and downloaded files? Logs can tell you that. You want to know how many different users are trying to log into your system without proper credentials? You can find that too.
Commands
| Command | Use | Examples | Common Flags |
|---------|-----|----------|--------------|
| `cat` | Outputting text | `cat output2.txt` (takes the text of file `output2.txt` and prints it to the terminal window) | |
| `\|` | Piping | `cat output2.txt \| [2nd command]` (takes the text of file `output2.txt` and uses it as the input for the next command) | N/A |
| `grep` | Pattern matching | `grep -i 'example'` prints all lines with `example` | `-i`, `-v` |
| `awk` | Everything | `awk '{print $3}'` (prints the 3rd column of text) | |
| `wc` | Word count | `cat file.txt \| wc -l` (counts the lines in `file.txt`) | `-l`, `-c`, `-w` |
| `sort` | Sorts output | `cat file.txt \| awk '{print $7}' \| sort -d` (takes the output from awk and sorts it in dictionary order) | `-d`, `-n` |
| `uniq` | Removes duplicates | `cat file.txt \| sort \| uniq` | `-c` |
cat
The most fundamental of commands. cat is not just a cute animal that people make memes about; it also prints the full text of a file to your terminal. “Now, WebWitch,” you might say, “I can read the file in my GUI, so why do I need to print it out to the terminal?” Though cat might be rather useless on its own, if you take its output and feed it into the input of another command, you can sift through for exactly the information you need.
```
TheBog:~ WebWitch$ cat example.log
Sun Mar 19 03:38:38 2017 [pid 24540] CONNECT: Client "59.188.221.110"
Sun Mar 19 03:38:42 2017 [pid 24539] [anonymous] FAIL LOGIN: Client "59.188.221.110"
Sun Mar 19 03:43:30 2017 [pid 26902] CONNECT: Client "121.206.121.31"
Sun Mar 19 03:43:42 2017 [pid 26901] [anonymous] FAIL LOGIN: Client "121.206.121.31"
[Truncated]
```

|
This vertical bar (generally found on the right side of your keyboard above the enter key) is called a pipe. Like its namesake, it will direct the output of one command into the input of another like so:
```
TheBog:~ WebWitch$ cat example.log | [next command]
```
grep
This is a basic pattern matching tool mostly used to find specific words in a piece of text. If I wanted to find all the lines in the log that say CONNECT, I would type:
```
TheBog:~ WebWitch$ cat example.log | grep CONNECT
Sun Mar 19 03:38:38 2017 [pid 24540] CONNECT: Client "59.188.221.110"
Sun Mar 19 03:43:30 2017 [pid 26902] CONNECT: Client "121.206.121.31"
Sun Mar 19 03:55:19 2017 [pid 29983] CONNECT: Client "222.223.143.107"
Sun Mar 19 03:55:23 2017 [pid 30001] CONNECT: Client "222.223.143.107"
[Truncated]
```
Notice how we use cat to feed the contents of the file into the input of our grep command.
In its natural state, grep searches for text that matches the case of whatever you put in. If I typed grep connect, it wouldn’t return any line with CONNECT, Connect, etc. If you want the search to ignore case, pass in the -i flag:
```
TheBog:~ WebWitch$ cat example.log | grep -i 'connect'
Sun Mar 19 03:38:38 2017 [pid 24540] CONNECT: Client "59.188.221.110"
Sun Mar 19 03:43:30 2017 [pid 26902] CONNECT: Client "121.206.121.31"
Sun Mar 19 03:55:19 2017 [pid 29983] CONNECT: Client "222.223.143.107"
Sun Mar 19 03:55:23 2017 [pid 30001] CONNECT: Client "222.223.143.107"
[Truncated]
```
If you need grep to search for a short phrase, encapsulate the phrase in quotes:
```
TheBog:~ WebWitch$ cat example.log | grep 'FAIL LOGIN'
Sun Mar 19 03:38:42 2017 [pid 24539] [anonymous] FAIL LOGIN: Client "59.188.221.110"
Sun Mar 19 03:43:42 2017 [pid 26901] [anonymous] FAIL LOGIN: Client "121.206.121.31"
Sun Mar 19 03:55:21 2017 [pid 29982] [anonymous] FAIL LOGIN: Client "222.223.143.107"
Sun Mar 19 03:55:26 2017 [pid 30000] [budclub] FAIL LOGIN: Client "222.223.143.107"
[Truncated]
```
If you’re in a situation where you’re not quite sure what you need out of the log file, but you’re very certain of what you don’t need, you can use the -v flag to match everything but what you put in the quotes:
```
TheBog:~ WebWitch$ cat example.log | grep -v 'FAIL'
Tue Mar 21 20:56:07 2017 [pid 4143] CONNECT: Client "119.184.121.232"
Tue Mar 21 21:20:55 2017 [pid 7792] CONNECT: Client "93.174.94.203"
Tue Mar 21 21:21:31 2017 [pid 7817] CONNECT: Client "5.138.42.134"
Tue Mar 21 21:21:31 2017 [pid 7816] [polimer] OK LOGIN: Client "5.138.42.134"
Tue Mar 21 21:23:27 2017 [pid 7818] [polimer] OK DOWNLOAD: Client "5.138.42.134", "/santa3.15.zip", 6583053 bytes, 993.96Kbyte/sec
Tue Mar 21 21:27:45 2017 [pid 7818] [polimer] OK UPLOAD: Client "5.138.42.134", "/design/template1.html", 2270 bytes, 13.15Kbyte/sec
[Truncated]
```
Note: Like a regular search, -v matches case exactly, whether you give it a word or a phrase. Combine it with -i if you need a case-insensitive exclusion.
This is not all grep can do! grep has even more flags you can pass in for different functionality; check the man page (man grep) for details.
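Two of those flags are worth knowing right away: -c counts matching lines without printing them, and -E turns on extended regular expressions. A minimal sketch, using a few entries invented in the same format as example.log:

```shell
# Build a tiny sample log in the same style as example.log
# (these entries are invented for illustration).
cat > sample.log <<'EOF'
Sun Mar 19 03:38:38 2017 [pid 24540] CONNECT: Client "59.188.221.110"
Sun Mar 19 03:38:42 2017 [pid 24539] [anonymous] FAIL LOGIN: Client "59.188.221.110"
Sun Mar 19 03:43:30 2017 [pid 26902] CONNECT: Client "121.206.121.31"
EOF

# -c counts matching lines without printing them
# (a shortcut for grep ... | wc -l):
grep -c 'CONNECT' sample.log          # -> 2

# -E enables extended regular expressions,
# e.g. matching either of two words at once:
grep -E 'CONNECT|LOGIN' sample.log    # prints all three lines
```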
wc
Now that we have all of the lines where a user is trying to CONNECT to the server, how do we determine how many connections there are? wc can help! This command counts the number of words, lines, or characters of the file that you give it. In the NCL, it’s most commonly used with the -l flag after a grep command to count the number of lines that match the conditions you’ve searched for. If I needed to find how many failed logins there were by all users, I would enter the following:
```
TheBog:~ WebWitch$ cat example.log | grep 'FAIL LOGIN' | wc -l
5197
```
Note: -l gives the number of lines, -w the number of words, -c the number of bytes, and -m the number of characters.
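To see the difference between those flags side by side, here is a quick sketch on a made-up two-line sample:

```shell
# A two-line sample piped straight into wc:
printf 'FAIL LOGIN\nOK LOGIN\n' | wc -l   # -> 2 (lines)
printf 'FAIL LOGIN\nOK LOGIN\n' | wc -w   # -> 4 (words)
```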
awk
This is an extremely powerful command, and we will only scratch the surface of it here. The feature I use most often is its ability to print individual columns of data based on a separator you define; by default, it splits on whitespace. Let’s say you’ve identified all the lines where users are connecting to the server in our example log. If you only want their IP addresses, you would craft the command like this:
```
TheBog:~ WebWitch$ cat example.log | grep 'CONNECT' | awk '{print $10}'
"59.188.221.110"
"121.206.121.31"
"222.223.143.107"
"222.223.143.107"
[Truncated]
```
$0 would print out the whole line, and $NF would give you the last column of each row. There are thousands of uses for awk, but going over all of them would take a blog *series*. I highly suggest looking up some of its other uses on your own; you might find them very helpful.
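Here is a small sketch of $NF and the -F (field separator) flag, run on a single invented line in the same format as example.log:

```shell
# One invented line in the article's log format:
line='Sun Mar 19 03:38:38 2017 [pid 24540] CONNECT: Client "59.188.221.110"'

# $NF grabs the last whitespace-separated column, wherever it falls:
echo "$line" | awk '{print $NF}'       # -> "59.188.221.110" (quotes included)

# -F sets a custom field separator; splitting on the double quote
# pulls out the bare IP without the surrounding quotes:
echo "$line" | awk -F'"' '{print $2}'  # -> 59.188.221.110
```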
sort and uniq
These two commands are the cherry on top of your filtered data. The NCL will often ask questions along the lines of “What is the name of the user who failed to connect to the server the 3rd most times?” or “How many different IP addresses tried to connect to the server?” When you have a long list of potential answers, these two commands are your friends. sort, as you may assume, sorts all input data alphabetically. uniq removes duplicate lines, but only when they are next to each other, which is why you almost always sort first. Passing the -c flag to uniq counts how many duplicates existed in the data. You can combine these two commands in different orders to suit the needs of the question at hand.
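As a sketch of how they fit together (the IP list below is invented): sort first so duplicates become adjacent, then let uniq collapse or count them.

```shell
# An invented list of connecting IPs, one per line:
printf '1.2.3.4\n5.6.7.8\n1.2.3.4\n1.2.3.4\n' > ips.txt

# "How many different IPs?" -- sort, dedupe, count:
sort ips.txt | uniq | wc -l      # -> 2

# "Which IP connected the most?" -- count duplicates,
# then sort numerically so the biggest count lands last:
sort ips.txt | uniq -c | sort -n
#    1 5.6.7.8
#    3 1.2.3.4
```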
Want to try your luck at some practice questions? Take this document and test your understanding. You can try your answers here.
Editor Note: This file will NOT preview because it’s not a compatible file with Google Drive. You will have to download the file to access it. The file is configured to allow ANYONE to download it. If you request access to the file in surplus of 3 times, you will be blocked from accessing it for failure to follow directions (and for rudely spamming my notifications). Please CLICK DOWNLOAD (NOT REQUEST ACCESS as you DO have access to download it.) Thanks, CryptoKait
- What is the username of the person who uploaded documents to the server?
- How many bytes were downloaded off the server?
- How many unique IP addresses tried to connect to the server?
- What is the name of the user who failed to connect to the server the 3rd most times?
Pro Tips:
- Not all log files look alike. One of the biggest hurdles you’ll face is understanding the structure of each entry. Spend a bit of time getting to know the data you’re looking at before diving in; it will save you a lot of accuracy loss down the road.
- Make sure you understand the output you’re getting from each command you write. You might think you’re only grabbing lines that say FAIL LOGIN when you search for FAIL, but some lines with FAIL DOWNLOAD might have sneaked in without you noticing, throwing off the final count of whatever you were originally looking for.
- If you’re having difficulty deciphering exactly what information is in the log file, the title of the module may provide some insight. You may be able to find other examples online with that same log structure that are explained in more detail.
- If the question asks how many bytes were uploaded onto the server, it is probably smart to grep for the lines with “upload” and see what information the log gives you right off the bat. You can always whittle it down or expand your search from there.
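As a sketch of that whittling-down step, here is one way to total uploaded bytes with awk. The two log entries below are invented in the same format as example.log, where the byte count happens to land in column 14; always check where it falls in your own log first.

```shell
# Two invented upload entries in the article's log format:
cat > uploads.log <<'EOF'
Tue Mar 21 21:27:45 2017 [pid 7818] [polimer] OK UPLOAD: Client "5.138.42.134", "/design/template1.html", 2270 bytes, 13.15Kbyte/sec
Tue Mar 21 21:29:01 2017 [pid 7901] [polimer] OK UPLOAD: Client "5.138.42.134", "/design/style.css", 730 bytes, 10.02Kbyte/sec
EOF

# Grab the upload lines, then let awk add up the byte column.
# In THIS made-up format the byte count is field 14; verify against your log!
grep 'OK UPLOAD' uploads.log | awk '{sum += $14} END {print sum}'   # -> 3000
```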
With all of that, I bid you bon voyage on your journey through the sea of logs!
— WebWitch