Statement: How to keep AEM instances out of Google search results.
Recommendation
Here is an example search that lists servers that have not removed the Geometrixx sample content. Enter this query in a search engine:
inurl:/content/geometrixx
- First and foremost, as a best practice, we recommend that all CQ5 author and publish servers be placed behind a firewall and not be publicly accessible.
- Only your web server (dispatcher) should be in front of the firewall. If your author and publish servers are behind a firewall, Google has no way to index them.
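One way to confirm the firewall recommendation above is to test, from a machine outside your network, whether the author and publish ports accept connections. The following is a minimal sketch, not an official AEM tool. The function name `is_port_open` is hypothetical, and the ports you probe are assumptions: 4502 appears later in this article as the author port, and 4503 is the usual default for publish, but your installation may differ.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the port cannot be reached from this machine.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from outside the firewall, a call such as `is_port_open("your-aem-host", 4502)` should return False if the server is properly shielded; only the dispatcher's port should answer.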
Solution:
robots.txt
If it is absolutely necessary for an author or publish server to be in front of the firewall, add a robots.txt file to the root directory /.
This file asks search engines not to crawl your server, which keeps it out of most search results.
Here are the steps for doing this:
- Navigate to CRXDE Lite at {server}/crx/de/ (make sure you’re logged in as admin).
- Right-click on your root node, and go to Create … > Create File …
1. Name the file robots.txt
2. Place the following code in the file, and save it:
   User-agent: *
   Disallow: /
3. Now grant the anonymous user read access to the file. To do this, navigate to the user admin section at {server}/useradmin (for example, localhost:4502/useradmin).
4. Open the anonymous user, click on the Permissions tab, tick the read checkbox for robots.txt, and save.
- Verify that the robots.txt file exists and is accessible by first logging out, then navigating to {server}/robots.txt (for example, localhost:4502/robots.txt).
- If it’s there, search engines that honor robots.txt should no longer crawl your server.
- Repeat these actions for all author/publish servers that are publicly accessible.
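The verification step can also be scripted. The sketch below uses Python's standard `urllib.robotparser` to check whether a robots.txt body actually blocks all crawlers, and `urllib.request` to fetch the file without credentials (that is, as the anonymous user). The function names `blocks_all_crawlers` and `fetch_robots_txt` are hypothetical, and the default probe path is simply the Geometrixx path from the search example. Note that a file containing only a `User-agent: *` line, with no `Disallow` rule, blocks nothing.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt: str, probe_path: str = "/content/geometrixx") -> bool:
    """Return True if the given robots.txt text forbids every crawler from probe_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "*" stands for any crawler that has no more specific rule group.
    return not parser.can_fetch("*", probe_path)


def fetch_robots_txt(server: str, timeout: float = 10.0) -> str:
    """Fetch {server}/robots.txt anonymously; raises an error if it is missing or protected."""
    with urlopen(f"{server}/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `blocks_all_crawlers(fetch_robots_txt("http://localhost:4502"))` should return True once the file is in place and readable by the anonymous user. If the fetch raises an HTTP error, the file is either missing or the read permission was not granted.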
Robots.txt related findings
Finding ID | Name | Total risk | Effort to Fix
RB1 | Enable robots.txt in prod author and publishers | HIGH | Medium