JSOUP is a very useful DOM parser library. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
In the real world, there will be plenty of times when your cool manager asks you to parse some webpage's DOM (Document Object Model). It may be to extract images, links, or content, or you might be asked to develop a content migration tool.
Our component in AEM will accept a URL. Using the available checkboxes, you choose what to extract from the given URL.
Example: I have the URL "http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manager.html". If I want to extract images from this URL, I should check Images.
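Since the URL is typed in by an author, the stored value may carry stray whitespace or line breaks, or may not be a web URL at all. Here is a minimal sketch of a pre-check that could run before the URL is handed to JSOUP. The class name UrlInput and its normalize method are hypothetical helpers for illustration and are not part of the original component:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper (an assumption, not part of the original component).
final class UrlInput {

    // Returns the trimmed URL if it is an absolute http/https URL with a host,
    // otherwise an empty Optional.
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        try {
            URI uri = new URI(trimmed);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return (web && uri.getHost() != null) ? Optional.of(trimmed) : Optional.empty();
        } catch (URISyntaxException e) {
            // Inner whitespace or other illegal characters end up here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("  http://jsoup.org/cookbook/input/load-document-from-url \n"));
        System.out.println(normalize("ftp://example.com/file"));
    }
}
```

A component could skip the extraction, or show the "Please specify the url" hint, whenever normalize returns an empty Optional.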
Here are the steps to get this awesome stuff done:
1. Create a component, which should have:
A textarea for the URL. URLs can be pretty long, so a textarea is a better fit than a textfield.
Three checkboxes, one each for Images, Hyperlinks, and Imports (CSS files imported in the webpage).
2. As we are using Sightly here, the .html file of the component looks like this:
<div data-sly-test="${wcmmode.edit && !properties.address}">
Please specify the url
</div>
Results:
<div data-sly-test="${properties.link=='true'}" data-sly-use.v="com.mycompany.myproject.components.Parser">
<b>Here are the Links</b>
<ul data-sly-list="${v.links}">
<li>${item}</li>
</ul>
</div>
<div data-sly-test="${properties.imp=='true'}" data-sly-use.imp="com.mycompany.myproject.components.Parser">
<b>Here are the Imports</b>
<ul data-sly-list="${imp.imports}">
<li>${item}</li>
</ul>
</div>
<div data-sly-test="${properties.img=='true'}" data-sly-use.img="com.mycompany.myproject.components.Parser">
<b>Here are the Images</b>
<ul data-sly-list="${img.images}">
<li><img src="${item}"></li>
</ul>
</div>
3. com.mycompany.myproject.components.Parser is my Sling Model class, which is backed by com.mycompany.myproject.components.DataParser, a JSOUP parser class.
4. Sling Model
package com.mycompany.myproject.components;

import java.util.ArrayList;
import java.util.List;
import javax.annotation.PostConstruct;
import javax.inject.Inject;
import javax.inject.Named;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.models.annotations.Default;
import org.apache.sling.models.annotations.Model;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sling Model adapted from the component resource. It reads the dialog
// properties and delegates the actual extraction to DataParser.
@Model(adaptables = Resource.class)
public class Parser {
    private static final Logger logger = LoggerFactory.getLogger(Parser.class);
    public static final String DEFAULT = "http://jsoup.org/cookbook/input/load-document-from-url";
    public static final String TRUE = "true";
    public static final String FALSE = "";

    // URL to parse, stored in the "address" property.
    @Inject
    @Named("address")
    @Default(values = DEFAULT)
    protected String addressDescription;

    // Checkbox values: "true" when checked, empty otherwise.
    @Inject
    @Named("imp")
    @Default(values = FALSE)
    protected String imp;

    @Inject
    @Named("link")
    @Default(values = FALSE)
    protected String link;

    @Inject
    @Named("img")
    @Default(values = FALSE)
    protected String img;

    private List<String> file = new ArrayList<>();
    private List<String> imports = new ArrayList<>();
    private List<String> images = new ArrayList<>();

    // Runs after injection; extracts only what the author has checked.
    @PostConstruct
    public void activate() {
        DataParser parser = new DataParser();
        logger.info("URL is {}", addressDescription);
        if (TRUE.equals(link)) {
            file = parser.parseLinks(addressDescription);
            logger.info("Links size {}", file.size());
        }
        if (TRUE.equals(img)) {
            images = parser.parseImages(addressDescription);
            logger.info("Images size {}", images.size());
        }
        if (TRUE.equals(imp)) {
            imports = parser.parseImports(addressDescription);
            logger.info("Imports size {}", imports.size());
        }
    }

    public List<String> getLinks() {
        return file;
    }

    public List<String> getImages() {
        return images;
    }

    public List<String> getImports() {
        return imports;
    }
}
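The lists returned by the getters are rendered as-is, so a URL that appears several times on the page shows up several times in the output. JSOUP also returns an empty string for an abs: attribute when no absolute URL can be built. Here is a minimal sketch of a clean-up step that could be applied before the lists are returned. UrlListCleaner is a hypothetical helper and is not part of the original code:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (an assumption, not part of the original component).
final class UrlListCleaner {

    // Drops null and blank entries and removes duplicates,
    // keeping the order in which URLs first appeared.
    static List<String> clean(List<String> urls) {
        Set<String> seen = new LinkedHashSet<>();
        for (String url : urls) {
            if (url != null && !url.trim().isEmpty()) {
                seen.add(url.trim());
            }
        }
        return new ArrayList<>(seen);
    }
}
```

A getter could then return UrlListCleaner.clean(images) instead of the raw list. A LinkedHashSet is used so the rendered order still follows the order on the page.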
5. Parser
package com.mycompany.myproject.components;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataParser {
    private static final Logger logger = LoggerFactory.getLogger(DataParser.class);

    // Fetches the page at the given URL and parses it into a JSOUP Document.
    public Document docParse(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    // Collects the absolute URL of every hyperlink (<a href>) on the page.
    public List<String> parseLinks(String url) {
        List<String> hyperLinks = new ArrayList<>();
        try {
            Elements links = docParse(url).select("a[href]");
            for (Element link : links) {
                hyperLinks.add(link.attr("abs:href"));
                logger.info("Link: {}", link.attr("abs:href"));
            }
        } catch (Exception e) {
            logger.error("Something went wrong parsing links", e);
        }
        return hyperLinks;
    }

    // Collects the absolute URL of every <link href> (for example, CSS files).
    public List<String> parseImports(String url) {
        List<String> imports = new ArrayList<>();
        try {
            Elements imp = docParse(url).select("link[href]");
            for (Element i : imp) {
                imports.add(i.attr("abs:href"));
            }
        } catch (Exception e) {
            logger.error("Something went wrong parsing imports", e);
        }
        return imports;
    }

    // Collects the absolute URL of every image (<img src>) on the page.
    public List<String> parseImages(String url) {
        List<String> images = new ArrayList<>();
        try {
            Elements img = docParse(url).select("img[src]");
            for (Element i : img) {
                images.add(i.attr("abs:src"));
                logger.info("Image: {}", i.attr("abs:src"));
            }
        } catch (Exception e) {
            logger.error("Something went wrong parsing images", e);
        }
        return images;
    }
}
com.mycompany.myproject.components.DataParser: This class has three functions, one each for extracting hyperlinks, images, and imports. Their results are returned to the Sling Model class, which in turn exposes them to the Sightly file in the component.
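Each of the three functions fetches the page again, so checking all three boxes means three HTTP requests for the same URL. Here is a minimal sketch of an alternative that works on one already-parsed Document and reuses it for all three selectors. SingleFetchSketch and absoluteUrls are hypothetical names, and the HTML string and example.com base URL are made up so the sketch runs without network access. In the component, the Document would come from Jsoup.connect(url).get() instead:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical sketch (an assumption, not the original DataParser).
final class SingleFetchSketch {

    // Returns the absolute value of the given attribute for every element
    // matching the CSS query, using JSOUP's "abs:" attribute prefix.
    static List<String> absoluteUrls(Document doc, String cssQuery, String attribute) {
        List<String> urls = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            urls.add(element.attr("abs:" + attribute));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up page; the second argument is the base URI for relative links.
        String html = "<html><head><link rel='stylesheet' href='css/site.css'></head>"
                + "<body><a href='/about.html'>About</a><img src='logo.png'></body></html>";
        Document doc = Jsoup.parse(html, "http://example.com/docs/");

        System.out.println(absoluteUrls(doc, "a[href]", "href"));    // hyperlinks
        System.out.println(absoluteUrls(doc, "link[href]", "href")); // imports
        System.out.println(absoluteUrls(doc, "img[src]", "src"));    // images
    }
}
```

The same three selectors are used as in DataParser. Only the fetching is moved out, so the page is downloaded once.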
Try to get this working on your AEM instance. We have shared the complete article on the Adobe AEM Community.