Recommendation
Here is an example search that lists servers that have not removed Geometrixx. Enter the following query in a search engine:
inurl:/content/geometrixx
- First and foremost, as a best practice, we recommend that all CQ5 author and publish servers be put behind a firewall and not be publicly accessible.
- Only your web server (dispatcher) should be in front of the firewall. If your author and publish servers are behind a firewall, Google cannot reach them and so cannot index them.
Solution:
ROBOTS.txt
If it is absolutely necessary for an author or publish server to be in front of the firewall, add a robots.txt file to the root directory /.
- This file will prevent most search engines from displaying your server in search results.
Here are the steps for doing this:
1. Navigate to CRXDE Lite at {server}/crx/de/ (make sure you’re logged in as admin).
2. Right-click the root node, and go to Create … > Create File …
3. Name the file robots.txt.
4. Place the following code in the file, and save it:
    User-agent: *
    Disallow: /
5. Grant the anonymous user read access to the file. To do this, navigate to the user admin section at {server}/useradmin (for example, http://localhost:4502/useradmin).
6. Open the anonymous user, and click the Permissions tab.
7. Grant read access to the robots.txt file, then click Save.
- Verify that the robots.txt file exists and is accessible by first logging out, then navigating to {server}/robots.txt (for example, localhost:4502/robots.txt).
- If the file is served, search engines that honor robots.txt should no longer index your server.
- Repeat these actions for all author/publish servers that are publicly accessible.
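The verification step can also be scripted, which helps when several servers need checking. The following is a minimal sketch in Python using only the standard library, not an official CQ5 tool. The function names and the "ExampleBot" user agent are hypothetical, chosen for illustration; the probe paths are taken from the examples in this post.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_text: str) -> bool:
    """Return True if the robots.txt content disallows the probe paths for a generic crawler."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # Probe paths are illustrative: the root, the Geometrixx sample content, and CRXDE Lite.
    probe_paths = ["/", "/content/geometrixx", "/crx/de/"]
    # "ExampleBot" is a made-up crawler name; it falls under the "User-agent: *" rules.
    return not any(parser.can_fetch("ExampleBot", path) for path in probe_paths)


def fetch_robots(server: str, timeout: float = 10.0) -> str:
    """Fetch {server}/robots.txt without credentials, as an anonymous visitor would."""
    url = server.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `blocks_all_crawlers(fetch_robots("http://localhost:4502"))` should return True once the file is in place and readable by the anonymous user. If the request fails with an HTTP error, the anonymous read permission is the first thing to recheck.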
Robots.txt related findings
Finding ID | Name | Total risk | Effort to Fix
RB1 | Enable robots.txt in prod author and Publishers | HIGH | Medium