I gave a presentation about Google at Warpstock Europe 2008. What I showed there is a short version of what I teach during a security course at my university.
The first part was about enhanced Google hacking, which is mainly about advanced search operators and some creative googling; the second part was about security awareness. Due to the limited time available I could not show many details, but I promised to put all the links online. If you could not attend the presentation, you might check out the links here.
Advanced search operators
You can search much more effectively on Google using so-called advanced search operators. Google has a nice page about them explaining what they do and how to use them. You might check that page first to get an idea of what I am talking about.
- site: – restricts the search to the given domain. Example: odin site:netlabs.org
- inurl: – searches for the keyword in the URL. Example: odin inurl:netlabs
- intext: – searches for the term in the page text only, not in the whole page (so not in the title or URL). Example: netlabs intext:voyager
- filetype: – searches for specific file types only. Very handy if you are looking for specific files. Example: netlabs voyager filetype:pdf
If you want to know more about this, check out GoogleGuide, which is a great resource to learn more about searching with Google. They also provide a page about advanced search operators. If you want to do so-called Google hacking, have a look at the Google Hacking Database. You will find good tips on how to search for specific things by combining various tricks in one search. It’s all pretty easy in the end :) Don’t believe it? Try this: site:ibm.com filetype:ppt intext:confidential
Note that by default Google searches for all keywords you provide (an implicit AND operator). If you want to exclude some keywords, you can do that with the - operator. Example: netlabs -OS/2 searches for everything related to netlabs but not related to OS/2.
As a last tip here, you might want to have a look at Goolag, which is a (so far Windows-only) application that automates this kind of search. Unfortunately it looks like they lost their domain (goolag.org), it is at a domain parker now… I will update this in case it gets back online again.
Person search engines
If you want to know more about a person, you can use various person search engines. They are actually pretty scary, and I recommend searching for your very own name as well. If you find stuff you don’t like, you might have to react somehow…
Some search engines I know:
The second part of the presentation was about privacy. I could spend hours explaining my concerns with that, so I will just give you a very short overview of my thoughts. If you can read German, I recommend having a look at the Google special from the Swiss weekly newspaper Wochenzeitung (WOZ). It is a great introduction to this topic and shows the issues I have with Google very well.
As a start, I quickly list some companies or projects owned by Google:
Google Docs, Google Mail, Google Calendar, Google Maps, Google Desktop Search, iGoogle (start page mashup), Google Analytics, Google AdSense, Google Video/YouTube, Orkut, DoubleClick.com, Google Checkout, Blogger.com, Chrome web browser, Google Talk, Google News, Google Groups
So that’s in the direction of 20… now imagine what kind of tracking you get when you combine all those databases/logfiles. Never forget that Google earns about 99% of its revenue with advertisements. The more they know about you, the better they can serve you exactly the kind of advertisement that fits you.
If a website is using Google Analytics or DoubleClick.com, you are tracked by Google even when you visit a completely different page, as your browser will resolve links to those domains when loading banners or even 1×1 pixel graphics. Don’t believe me? Simply open your favourite news page in Firefox, press Ctrl-I (Tools->Page Info) and check the Media tab in there. Now go through the list and search for all links to domains not related to the news page you are visiting. There is no reason for a 0×0 or 1×1 pixel graphic in there, except for tracking you…
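The manual Page Info check above can also be scripted. As a minimal sketch (the function and class names are my own, and the HTML snippet is made up for illustration), here is how you could list all images a page loads from a host other than its own:

```python
# Sketch: find third-party image URLs embedded in a page's HTML,
# assuming the HTML source has already been downloaded.
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def external_images(html, site_host):
    """Return img URLs whose host differs from the page's own host."""
    parser = ImgCollector()
    parser.feed(html)
    return [s for s in parser.srcs
            if urlparse(s).hostname and urlparse(s).hostname != site_host]


html = '''<html><body>
<img src="http://example.com/logo.png">
<img src="http://tracker.invalid/pixel.gif" width="1" height="1">
</body></html>'''

# Only the tracking pixel from the foreign domain remains.
print(external_images(html, "example.com"))
```

Anything this prints that you do not recognize, especially tiny graphics, is a candidate for exactly the kind of tracking described above.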
Another thing is all those great Web 2.0 services like Facebook, StudiVZ, LinkedIn etc. Everything you put in there should be considered a tattoo: you will not get rid of it, so be very careful. It’s probably not a good idea to put up pictures of your last party at the university where half of your friends, including yourself, were drunk. Companies do check such things when you apply for a job; they are not stupid. It’s not even necessary that you upload the pictures yourself: just search for my name on some person search engines and you will find loads of pictures taken by other people and tagged with my name. That’s the drawback of giving presentations in public ;)
Anyway, there are a few tools I use, and I would like to promote them a bit here:
- AdBlock Plus – the best privacy tool in my humble opinion. It does not just block annoying advertisements, it also makes sure you do not get tracked like that. Don’t forget to add a *google-analytics.com/* URL filter as well; this should at least partially block another kind of tracking by Google. You can subscribe to filter lists which are updated on a regular basis, and you can add your own URLs as you need.
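For illustration, AdBlock Plus filters use simple wildcard patterns; the two entries below (my own examples, not taken from any official subscription list) would match requests to the tracking domains mentioned above:

```
*google-analytics.com/*
*doubleclick.net/*
```

You can add such patterns by hand via the AdBlock Plus preferences, on top of whatever subscription lists you use.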
- TrackMeNot – this plugin simply sends random queries to search engines. Sounds weird, but it is actually a pretty smart way to pollute their databases. In my opinion this is one of the few possibilities we have to undermine their business in the future: noise and obfuscation.
- Tor and Privoxy – this combination is probably the best you can get in terms of privacy. Tor will give you a completely randomized IP address, and Privoxy will make sure your browser does not betray you with cookies or other things you might have forgotten. Please check both projects to learn how they work. Unfortunately the Tor network is far from fast, so right now I only use it when I have a good reason to.
That’s about it with tools; if I forgot something, please add a comment to this article. If you run your own web pages and/or blogs, you might consider providing a robots.txt file which forbids search engines to cache your site. In case you have to take some content offline, there is a better chance that it really is offline afterwards. Most search engines do pay attention to the robots.txt file, as they risk getting sued if they don’t. Ask Google if you want to know more about that file.
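As a minimal sketch (the path is just a placeholder), a robots.txt that asks all crawlers to stay away from a directory looks like this:

```
User-agent: *
Disallow: /private/
```

Place the file at the root of your site (e.g. example.com/robots.txt); `User-agent: *` addresses all crawlers, and each `Disallow` line names a path they should not fetch. Note that this is a convention, not an enforcement mechanism: well-behaved search engines honour it, but nothing technically stops a crawler from ignoring it.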
As a last tip: if you want your information to be the first hit, provide that information yourself. That works pretty well for me, and most people are too lazy to check the links which do not show up on top.