I’m currently building a web spider in Java with Apache Commons. I’m crawling basic Google search queries like https://google.com/search?q=word&hl=en
After about 60 queries I get blocked: it seems Google recognizes me as a bot, and I get a 503 Service Unavailable response.
Now the important part:
If I visit the same URL with Firefox or Chrome, I still get the desired result.
If I make a GET request from my application using the same HTTP headers (User-Agent, cookies, caching headers, etc.), I am still blocked.
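To illustrate what I mean by "the same HTTP headers", here is a minimal, self-contained sketch of the request setup, using the JDK's built-in java.net.http client instead of my actual Apache Commons code, and with illustrative header values rather than my exact ones:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class HeaderSketch {
    // Build a GET request that mirrors typical browser headers.
    // The header values below are examples copied from a browser capture,
    // not necessarily the exact ones a real Chrome would send.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .GET()
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
                .header("Accept", "text/html,application/xhtml+xml")
                .header("Accept-Language", "en-US,en;q=0.9")
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("https://google.com/search?q=word&hl=en");
        System.out.println(req.method());
        System.out.println(req.headers().map());
    }
}
```

Even with the request configured like this, so that every header I can see matches the browser's, I still get the 503.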
HOW does Google know whether I’m connecting via my application or via the Chrome browser, when the only information it has is my IP address and the HTTP headers? (Or am I wrong about that?)
Are there more parameters it can use to recognize my app? Something that Google sees and I don’t?
(Maybe important: I’m using the Chrome Developer Tools and httpbin.org to compare the headers sent by the browser and by my application.)
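To make that comparison mechanical instead of eyeballing the two httpbin.org dumps, a small diff helper like this sketch could flag any header that differs between the two captures (the sample values below are made up):

```java
import java.util.Map;
import java.util.TreeMap;

public class HeaderDiff {
    // Compare the headers captured from the browser against the ones the
    // application sends, and report every key whose value differs
    // (a missing key shows up as app=null).
    static Map<String, String> diff(Map<String, String> browser, Map<String, String> app) {
        Map<String, String> mismatches = new TreeMap<>();
        for (Map.Entry<String, String> e : browser.entrySet()) {
            String appValue = app.get(e.getKey());
            if (!e.getValue().equals(appValue)) {
                mismatches.put(e.getKey(), "browser=" + e.getValue() + " app=" + appValue);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Map<String, String> browser = Map.of(
                "User-Agent", "Mozilla/5.0 ... Chrome/120.0",
                "Accept-Language", "en-US,en;q=0.9");
        Map<String, String> app = Map.of(
                "User-Agent", "Mozilla/5.0 ... Chrome/120.0",
                "Accept-Language", "en");
        System.out.println(diff(browser, app)); // only Accept-Language differs here
    }
}
```

With a check like this the two header sets look identical to me, which is exactly why I don’t understand how I’m being detected.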
Thanks a lot