Open source intelligence, often abbreviated OSINT (and not OSI), is the practice of gathering and analyzing information from public sources. It is unrelated to open source software.
It is also not to be confused with OSI, the Open Systems Interconnection model for networking. Not that many people confuse the two. It's mostly just Kait who mixes them up. But if she's confused, you might be, too.
In the real world, open source intelligence is a broad category that can encompass a wide variety of techniques, and goals are often open-ended. For the sake of keeping challenges within a reasonable scope for a timed competition, the National Cyber League (NCL) typically presents focused challenges that fall into three categories:
- Trivia-like questions. Your goal is to use a generic search engine, like Google, to find the answers. You are not expected to know the answers in advance, though it’s possible that you might know some of them, especially for easier challenges. Often there is jargon included in the questions that you will need to understand in order to find the correct answer.
- Very specific questions about a specific entity. While a generic search engine might help with some of the easier questions, you’ll likely need to use more specialized resources to answer most of them. For example, a question might ask who owns a particular piece of land at a set of coordinates. While you might be able to find a business name located at those coordinates on Google Maps, that wouldn’t tell you the legal owner.
- A file is presented to you. Your goal is to extract information from that file, typically via metadata. Metadata is often defined as “data about data”; in practice, it means information stored alongside a file's visible content, such as the time the file was initially created or last updated. You might have to know a little bit about the file format or be able to research it in order to extract the information you need.
Enhancing Your Search Skills
Searching for information on the internet might seem like a simple task, but not every question can be answered with a naïve Google search. If you’re having trouble answering even the easier OSINT trivia questions, check out Kait’s blog post for the basics.
Most major search engines support a number of operators. These operators alter how a search query is processed. For example, many search engines support the following operators:
- Quotes (""): Enclosing a series of search terms in quotes will search for that phrase verbatim; the words have to appear as-is, in the order they’re provided.
- site:example.com: Prefixing a search query with site:example.com will only search for pages on example.com. This can be useful for searching Wikipedia articles, for example.
- Minus (-): Prefixing a term with a minus sign will exclude results containing that term. For example, dogs -cats would search for websites that contain the term dogs but exclude any websites that mention cats.
Each search engine has its own set of custom operators. These operators change over time, and their effectiveness varies depending on the search engine you’re using. For example, as of writing, Google has a tendency to ignore quoted search terms and exclusion operators. Get to know which search engines support which operators; you may need to attempt your search on multiple engines before you get the results you need.
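As a rough illustration of how the three operators above combine, here is a minimal Python sketch that assembles a query string. The function name build_query and its parameters are made up for this example, and the output is just text to paste into a search box; whether a given engine honors each operator is something you still have to check yourself.

```python
def build_query(terms, phrase=None, site=None, exclude=()):
    """Assemble a search query string from common operators.

    terms   -- plain search words
    phrase  -- words that must appear verbatim, in order (quoted)
    site    -- restrict results to one domain (site: operator)
    exclude -- words to exclude (minus operator)
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


# The example from the list above: pages about dogs that do not mention cats.
print(build_query(["dogs"], exclude=["cats"]))

# Restrict a verbatim phrase to a single site.
print(build_query([], phrase="open source intelligence", site="example.com"))
```

Keeping the operators as separate arguments makes it easy to drop one and retry when an engine appears to ignore it.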
If you’re looking for information about a specific entity, you’re not likely to have much luck with a generic search engine. There is a wide variety of specialized resources you can use instead. Below is a list of some of my favorites, but there are many others, and you may need to find your own depending on the task.
- Information about internet-connected devices
  - Shodan – Mass port- and service-scans
  - Censys – Mass port- and service-scans
  - RiskIQ – Variety of information about websites
  - crt.sh – Certificate transparency logs
  - ViewDNS.info – Various lookup and reverse-lookup tools
  - IPinfo – Basic information about IP addresses
  - bgp.he.net – Routing information
- Public records
Files often contain metadata. The presence of this data might not be immediately obvious to the person who created the file, and it may provide clues as to the author. This can be important when analyzing a document created by an unidentified adversary, for example.
JPEG images are particularly notorious for the amount of metadata they can contain. Using a format called Exif, JPEGs typically encode information about when the picture was taken and what sort of device was used to take it, often without the photographer’s knowledge. Pictures taken with modern smartphones may even include the exact location. This can be crucial for an investigation.
There are a variety of web-based Exif viewers, with Jeffrey Friedl’s being particularly popular. However, if you’re looking to up your game, you should familiarize yourself with offline tools, especially those commonly used from the command line. ImageMagick includes a handy identify utility; by running a command such as identify -verbose photo.jpg, you can view all the Exif data encoded in photo.jpg. Additionally, depending on your operating system, you may be able to view metadata using a file manager. For example, on macOS, right-clicking an image and choosing Get Info will show a dialog with most of the metadata.
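If you're curious where that Exif data actually lives, the sketch below walks the segments at the start of a JPEG and returns the raw contents of the Exif block, which is stored in an APP1 segment. It uses only the Python standard library and does not decode individual tags; for real work, a dedicated tool such as identify is the better choice. The function name find_exif_payload is made up for this example, and the code assumes a well-formed file with no padding bytes between segments.

```python
import struct


def find_exif_payload(data: bytes):
    """Return the raw Exif (TIFF) bytes from JPEG data, or None if absent."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG: missing start-of-image marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not a segment marker; give up on malformed input
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # start of scan / end of image: no more metadata segments
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # skip the "Exif\0\0" identifier
        pos += 2 + length
    return None


def exif_from_file(path: str):
    """Convenience wrapper: read a file and look for its Exif block."""
    with open(path, "rb") as handle:
        return find_exif_payload(handle.read())
```

Calling exif_from_file("photo.jpg") returns the bytes that a full Exif parser would then decode into timestamps, camera model, and GPS coordinates.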
Other file formats can contain metadata as well, and if you know that your target is metadata, it’s usually easy enough to find a tool to help. A quick web search will often reveal a variety of appropriate tools for any format, though some formats are more likely to contain useful metadata than others.
It’s not always clear what format a file uses, especially if the file lacks an extension. You can use a tool such as Linux’s file command to help identify the file, or you can manually open the file in a hex editor and match the first handful of bytes against known file signatures. Most mainstream formats contain a few identifying bytes at the beginning of each file.
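The signature-matching approach can be sketched in a few lines of Python. The table below covers only a handful of well-known signatures, and the names SIGNATURES, identify_bytes, and identify_file are made up for this example; the file command knows far more formats and should be your first stop.

```python
# A few well-known file signatures ("magic numbers") and the formats they mark.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),  # also the container for DOCX, XLSX, JAR
]


def identify_bytes(header: bytes) -> str:
    """Match the leading bytes of a file against the signature table."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

A result of "ZIP archive" is a good reminder that a signature only identifies the container; many office document formats are ZIP files underneath, so you may need to look inside to learn more.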
A Note on Ethics
Knowledge can be powerful. As your skills improve, you’ll likely be tempted to use them for less-than-ethical purposes. Sometimes the line between right and wrong gets blurred.
Don’t use these skills to find information about people without their permission. No matter who they are or what they may have done, using these skills to target people is harmful, regardless of your intentions, and it may very well be illegal.