Not a day goes by in my life where I don’t use Google search. Nothing is easier than loading up the page, typing in a phrase, and seeing 650,000 related articles come back to you -- but therein lies the problem. Google is an amazing tool but tends to over-deliver unless you’re very specific and know a few advanced search operators. In this blog post, I’m going to attempt to show some relatively simple Google hacks that will make recon a breeze, and hopefully translate over into the rest of your searches as well.
Google search operates in a few ways. Typing a phrase will look for the exact phrase, as well as variations of the phrase (adding or subtracting words), to try to come close to what you want. It also supports Boolean operators such as “OR” and “AND.” Finally, there are some advanced filters you can write inline with your search to help narrow things down. For the purposes of this post, we’re going to focus on the last two methods. I've also included a cheat sheet to help with more advanced Google hacking during your penetration testing.
The Basics of Search
There are two basic searches I use all the time: quotes around phrases, and the + operator. These two functions alone can be immensely helpful when gathering information or filtering through junk.
Quotes around phrases require Google to search the whole phrase, not just parts of it. Take a look at the results for
-------Begin RSA Private Key--------
with and without quotes; they will be significantly different. You can also place quotes around a single word. Without them, Google will try variations of the word (for example, the phrase ‘Malware Hunting’ without quotes may return results for Malware Hunters, Malware Hunt, Virus Hunting, etc.).
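The + operator before a word will only return results that specifically include that word. (Google retired the + operator in 2011; on current Google, putting the single word in quotes has the same effect.) Building upon the last example, a search for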
"-------Begin RSA Private Key--------" +openssl
will only return results where OpenSSL is being discussed along with that phrase.
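If you want to keep queries like this in a script or a notes file, it can help to build them programmatically. The following is a minimal Python sketch, not an official Google API: the helper names `exact` and `search_url` are made up for illustration, and the only assumption about Google is that a search can be opened with the familiar `https://www.google.com/search?q=` URL. It shows how to quote a phrase and why the + has to be percent-encoded (a raw + in a URL means a space).

```python
from urllib.parse import quote_plus


def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes so it is searched as a whole.

    Any double quotes already inside the phrase are dropped so the
    result stays a single quoted unit.
    """
    return '"' + phrase.replace('"', "") + '"'


def search_url(query: str) -> str:
    """Percent-encode a query for the q= parameter of a search URL.

    quote_plus turns spaces into '+' and a literal '+' into '%2B',
    so the +openssl operator survives the trip through the URL.
    """
    return "https://www.google.com/search?q=" + quote_plus(query)


query = exact("-------Begin RSA Private Key--------") + " +openssl"
print(query)
print(search_url(query))
```

Pasting the printed URL into a browser should run the same search as typing the query by hand.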
Boolean Operators in Google: No really, just give me what I want.
Google search results vastly improve with the AND and OR operators. AND is similar to the + operator above, and both can be combined with parentheses to build queries. AND binds two terms together so that only results containing both are returned.
In this example, we’re trying to find a previous AlienVault blog post on building a Malware Hunter’s home lab. Searching Malware Hunter Home Lab by itself does return what we want, but also returns over 45,000 other pages -- and after the first few pages, the relevance of each result drops dramatically. By modifying this to contain some Boolean operators and quotation marks, we can search for
malware AND "hunter" "home lab"
and narrow this down to 3,500. The results are also much more geared toward what we want. Using the example above, maybe we don’t quite recall what the phrasing of the article was. We remember that it was either malware or virus hunting, and it was a home lab setup. Using the query
(malware OR virus) hunting AND "home lab"
we can narrow the results down to 4,200, compared with the whopping half million returned by the unstructured query malware virus hunter home lab. Another operator you should know is NOT. Google uses a minus sign in front of a word to exclude it from the search results. Building on the query above, say we want to exclude any results that mention Twitter. We simply modify the query to
(malware OR virus) hunting AND "home lab" -twitter
and no results including the word Twitter will be returned. This can be risky, however, especially when excluding social media: many legitimate sites link to their social media presence on every page. That said, this leads us into our next section…
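The Boolean pieces above compose mechanically, which makes them easy to sketch in code. The helpers below (`any_of`, `phrase`, `exclude`) are hypothetical names chosen for this example; they only assemble the query string and do not contact Google.

```python
def any_of(*terms: str) -> str:
    """Group alternatives in parentheses: any_of('a', 'b') -> '(a OR b)'."""
    return "(" + " OR ".join(terms) + ")"


def phrase(text: str) -> str:
    """Quote a phrase so it must appear as written."""
    return f'"{text}"'


def exclude(term: str) -> str:
    """Prefix a term with a minus sign so results containing it are dropped."""
    return "-" + term


query = " ".join([
    any_of("malware", "virus"),  # either word is acceptable
    "hunting",
    "AND",
    phrase("home lab"),          # exact phrase
    exclude("twitter"),          # the NOT operator
])
print(query)
# (malware OR virus) hunting AND "home lab" -twitter
```

Keeping each operator in its own small function makes it easy to swap a term in or out while you experiment with how much each one narrows the results.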
Custom Inline Filters: Seriously Powerful.
There are a ton of very powerful, restrictive filters that are Google-specific. I’ll go over a few of my favorites, and have the supplemental cheat sheet available if you want to try out some more. Be advised that once you start stringing these together, Google will get suspicious. After a few queries, expect to get a CAPTCHA challenge page that displays your IP address and search query; this is normal. Solve the CAPTCHA and you’ll get a few more advanced searches in before it reappears. The challenge is intended mainly to stop automated scrapers and scripts, and the only reliable way I’ve found to avoid it is to run a normal, uncomplicated search in between your advanced searches.
A sample of Google’s search CAPTCHA
Remember that you can also combine the operators above with the inline filters below; for example, -site:microsoft.com excludes results from Microsoft's site, and +site:microsoft.com requires them.
- site: This filter will restrict the rest of your search to one website. Rather than returning Wikipedia links, Reddit links, Twitter links, etc., you can focus on one particular site. Let's use it to search for password dumps on Pastebin with the following query:
"password dump" +@gmail.com site:pastebin.com
Unraveling this query, we’re requiring the phrase “password dump,” only returning results that include @gmail.com addresses, and limiting our results to the website “pastebin.com” -- this returns 73 results, all very relevant to what we want.
- inurl: Maybe the exact site doesn’t matter much, but you only want results with a particular string in the URL. In this example we’ll search for phpMyAdmin installs indexed by Google:
inurl:/phpMyAdmin/index.php db=
This returns lots of poorly configured phpMyAdmin installs. Combined with the site: filter above, it lets a penetration tester quickly search for misconfigured services during the recon portion of a test.
- filetype: Filetype is a great way to search for files that were exposed to the internet erroneously, or shared and long forgotten. We can use standard file extensions without the dot (think pdf, not .pdf) to see what’s been left out for public consumption. In the following query, we’ll pull all PDF files from nist.gov that have the phrase “best practice” somewhere in them.
inurl:"nist.gov" filetype:PDF "best practice"
From a penetration testing reconnaissance standpoint, this could help you collect authentic corporate documents and inside contacts for a social engineering campaign, pull out potentially sensitive data, scan for leaks in their regulatory compliance requirements, and countless other things.
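The three inline filters above can be captured in a small data structure so that a query is assembled the same way every time. This is a minimal sketch under stated assumptions: the `Dork` class and its field names are invented for this example, and the second query uses site: instead of the inurl: form shown earlier purely as a variation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dork:
    """A search query made of free-text terms plus optional inline filters."""
    terms: List[str] = field(default_factory=list)
    site: Optional[str] = None      # restrict results to one website
    inurl: Optional[str] = None     # require a string in the URL
    filetype: Optional[str] = None  # file extension, with or without a dot

    def render(self) -> str:
        """Join the terms and whichever filters are set into one query string."""
        parts = list(self.terms)
        if self.inurl:
            parts.append(f"inurl:{self.inurl}")
        if self.filetype:
            # The filter wants 'pdf', not '.pdf', so drop any leading dot.
            parts.append("filetype:" + self.filetype.lstrip("."))
        if self.site:
            parts.append(f"site:{self.site}")
        return " ".join(parts)


pastebin = Dork(terms=['"password dump"', "+@gmail.com"], site="pastebin.com")
nist = Dork(terms=['"best practice"'], site="nist.gov", filetype=".pdf")

print(pastebin.render())  # "password dump" +@gmail.com site:pastebin.com
print(nist.render())      # "best practice" filetype:pdf site:nist.gov
```

Stripping the leading dot in `render` guards against the easy mistake of typing .pdf, which the filetype: description above warns about.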
Beyond narrowing down your search results, security practitioners and penetration testers can leverage their knowledge of systems and vulnerabilities along with Google hacking to perform audits. With these tools you can see what’s “available on the outside,” and where your potential exposure is. During penetration tests, you can use these tools to build out social engineering campaigns or locate sensitive information the target may have forgotten was even public. Here are a few queries you can modify to find some very interesting data.
Find open web directories containing the word TERM on Apache or IIS servers:
("Index Of" | "[To Parent Directory]") intext:TERM site:somecompany.com
Find password dumps on pastebin:
"password dump" +@gmail.com site:pastebin.com
Find Excel files on nist.gov (hint: Click on “view omitted search results” for more):
site:nist.gov (filetype:XLS | filetype:XLSX)
Look for Apache Tomcat installs on a specific website:
intitle:"Tomcat Status" inurl:/status site:somecompany.com
Search for Windows XP exploits, but do not include anything on Exploit-DB.com:
"Windows XP" +exploit -site:exploit-db.com
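Since several of these queries only differ by the target domain, one way to reuse them during an engagement is to keep them as templates. The sketch below is illustrative only: the `TEMPLATES` dictionary, its key names, and `build_queries` are assumptions made for this example, and the open-directory template uses intext:, Google's operator for matching a term in the page body.

```python
# Query templates with {domain} and {term} placeholders (hypothetical names).
TEMPLATES = {
    "open_directories": '("Index Of" | "[To Parent Directory]") intext:{term} site:{domain}',
    "excel_files": "site:{domain} (filetype:XLS | filetype:XLSX)",
    "tomcat_status": 'intitle:"Tomcat Status" inurl:/status site:{domain}',
}


def build_queries(domain: str, term: str) -> dict:
    """Fill every template with the target domain and search term."""
    return {
        name: template.format(domain=domain, term=term)
        for name, template in TEMPLATES.items()
    }


queries = build_queries("somecompany.com", "backup")
for name, q in queries.items():
    print(f"{name}: {q}")
```

The output is a set of ready-to-paste queries for one target; running them by hand, rather than in a loop against Google, also keeps you clear of the CAPTCHA behavior described earlier.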
Conclusion
Hopefully the techniques outlined above will help make your life a little easier. While I initially learned these search operators to help myself professionally, hardly a day goes by when I don't use them for personal searching as well. Google is a wealth of information, but oftentimes you need to separate the wheat from the chaff to find what you're really looking for.
About the Author
Jayme is a systems administrator by trade, IT Manager by role, penetration tester by passion, and is always on the hunt for interesting infosec information. https://www.twitter.com/highmeh