CSSelly
CSSelly is a Java implementation of the W3C Selectors Level 3 specification.
It's small, fast and extendable. CSSelly parses a string containing CSS selectors; the parsed result may be used by any HTML parser. It works best, however, with the Lagarto DOM tree and with Jerry, the HTML parser with a jQuery-friendly API.
Usage
Parsing
CSSelly csselly = new CSSelly("div:nth-child(2n) span#jodd");
List<CssSelector> selectors = csselly.parse();
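The selector in this example uses :nth-child(2n). As a reminder of what an an+b argument means under Selectors Level 3, here is a minimal, self-contained sketch. It does not use CSSelly, and the class and method names (NthSketch, matches) are hypothetical, chosen only for illustration.

```java
public class NthSketch {

    /**
     * Returns true if the 1-based position p equals a*n + b
     * for some integer n >= 0 (the an+b rule of Selectors Level 3).
     */
    static boolean matches(int a, int b, int p) {
        if (a == 0) {
            // No step: only the single position b can match.
            return p == b;
        }
        int diff = p - b;
        // n = diff / a must be a whole, non-negative number.
        return diff % a == 0 && diff / a >= 0;
    }

    public static void main(String[] args) {
        // :nth-child(2n) corresponds to a = 2, b = 0.
        for (int p = 1; p <= 6; p++) {
            System.out.println("position " + p + " matches 2n: " + matches(2, 0, p));
        }
    }
}
```

With a = 2 and b = 0, only the even positions qualify, so div:nth-child(2n) selects every second div child of a parent.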
Selecting
Here is how parsed CSS selector information may be used on a Document, the root node of some HTML content parsed by Lagarto:
NodeSelector nodeSelector = new NodeSelector(document);
LinkedList<Node> selectedNodes = nodeSelector.select("div#jodd li");
The resulting list contains the nodes that match the selector.
See Jerry for more convenient HTML manipulation using a jQuery-friendly API.
Supported selectors
Default
*                       any element
E                       an element of type E
E[foo]                  an E element with a "foo" attribute
E[foo="bar"]            an E element whose "foo" attribute value is exactly equal to "bar"
E[foo~="bar"]           an E element whose "foo" attribute value is a list of whitespace-separated values, one of which is exactly equal to "bar"
E[foo^="bar"]           an E element whose "foo" attribute value begins exactly with the string "bar"
E[foo$="bar"]           an E element whose "foo" attribute value ends exactly with the string "bar"
E[foo*="bar"]           an E element whose "foo" attribute value contains the substring "bar"
E[foo|="en"]            an E element whose "foo" attribute has a hyphen-separated list of values beginning (from the left) with "en"
E:root                  an E element, root of the document
E:nth-child(n)          an E element, the n-th child of its parent
E:nth-last-child(n)     an E element, the n-th child of its parent, counting from the last one
E:nth-of-type(n)        an E element, the n-th sibling of its type
E:nth-last-of-type(n)   an E element, the n-th sibling of its type, counting from the last one
E:first-child           an E element, first child of its parent
E:last-child            an E element, last child of its parent
E:first-of-type         an E element, first sibling of its type
E:last-of-type          an E element, last sibling of its type
E:only-child            an E element, only child of its parent
E:only-of-type          an E element, only sibling of its type
E:empty                 an E element that has no children (including text nodes)
E#myid                  an E element with ID equal to "myid"
E F                     an F element descendant of an E element
E > F                   an F element child of an E element
E + F                   an F element immediately preceded by an E element
E ~ F                   an F element preceded by an E element
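The attribute operators listed above differ only in how the attribute value is compared. The following sketch spells out those comparisons in plain Java, following the Selectors Level 3 definitions. It is an illustration, not CSSelly's own code; the class AttributeMatchSketch and its matches method are hypothetical names.

```java
import java.util.Arrays;

public class AttributeMatchSketch {

    /**
     * Compares an attribute value against an operator and an expected string.
     * value is null when the element does not carry the attribute;
     * op is "" for the bare [foo] form.
     */
    static boolean matches(String value, String op, String expected) {
        if (value == null) {
            return false;                       // attribute is absent
        }
        switch (op) {
            case "":                            // [foo]
                return true;
            case "=":                           // [foo="bar"]
                return value.equals(expected);
            case "~=":                          // one of the whitespace-separated words
                return !expected.isEmpty()
                    && Arrays.asList(value.trim().split("\\s+")).contains(expected);
            case "^=":                          // prefix
                return !expected.isEmpty() && value.startsWith(expected);
            case "$=":                          // suffix
                return !expected.isEmpty() && value.endsWith(expected);
            case "*=":                          // substring
                return !expected.isEmpty() && value.contains(expected);
            case "|=":                          // exact value, or value followed by '-'
                return value.equals(expected) || value.startsWith(expected + "-");
            default:
                throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("en-US", "|=", "en"));
        System.out.println(matches("english", "|=", "en"));
        System.out.println(matches("foo bar baz", "~=", "bar"));
    }
}
```

Note that under the specification an empty expected string never matches for ~=, ^=, $= and *=, which is why those cases are guarded.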
Extension
Here is the list of additional pseudo classes and pseudo functions supported by CSSelly:
:first
:last
:button
:checkbox
:file
:header
:image
:input
:parent
:password
:radio
:reset
:selected
:checked
:submit
:text
:even
:odd
:eq(n)
:gt(n)
:lt(n)
:contains(text)
Custom user classes and functions
CSSelly lets users define custom pseudo classes and functions.
Custom pseudo class
For a custom pseudo class, extend PseudoClass and implement the method match(Node node). This method should return true if a node is matched. You may also override the method getPseudoClassName() if you don't want the pseudo class name to be generated automatically from the class name. For example:
public static class MyPseudoClass extends PseudoClass {

    @Override
    public boolean match(Node node) {
        return node.hasAttribute("jodd-attr");
    }

    @Override
    public String getPseudoClassName() {
        return "some-cool-name";
    }
}
Then register your pseudo class with:
PseudoClassSelector.registerPseudoClass(MyPseudoClass.class);
From that moment on, you can find all nodes with the attribute jodd-attr using the :some-cool-name pseudo class.
Custom pseudo function
Similarly, for a custom pseudo function, extend the PseudoFunction class. This time, however, you also need to implement a method that parses the input expression. The parsed expression is later passed to the matching method. Here is an example; let's make a function that matches all nodes whose name has a certain length:
public static class MyPseudoFunction extends PseudoFunction {

    @Override
    public Object parseExpression(String expression) {
        return Integer.valueOf(expression);
    }

    @Override
    public boolean match(Node node, Object expression) {
        Integer size = (Integer) expression;
        return node.getNodeName().length() == size.intValue();
    }

    @Override
    public String getPseudoFunctionName() {
        return "super-fn";
    }
}
Register this function with:
PseudoFunctionSelector.registerPseudoFunction(MyPseudoFunction.class);
You can use it like this: :super-fn(3) matches all nodes whose name is 3 characters long.
Escaping
CSSelly supports escaping characters with a backslash. For example, "nspace\:name" refers to the tag name "nspace:name" (one that uses a namespace) and not to the pseudo class "name".
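To make the effect of the escape concrete, here is a small, self-contained sketch of how a scanner might tell an escaped colon from one that starts a pseudo class. It is not CSSelly's lexer: the class EscapeSketch and both methods are hypothetical, and only the simple backslash-plus-character form is handled (no hexadecimal escapes).

```java
public class EscapeSketch {

    /** Returns the index of the first ':' that is not escaped by a backslash, or -1. */
    static int indexOfUnescapedColon(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++;                    // skip the character that follows the backslash
            } else if (c == ':') {
                return i;
            }
        }
        return -1;
    }

    /** Drops each escaping backslash and keeps the escaped character literally. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i);      // take the escaped character as-is
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source the backslash itself must be doubled.
        String selector = "nspace\\:name";
        System.out.println(indexOfUnescapedColon(selector));
        System.out.println(unescape(selector));
    }
}
```

Keep in mind that a selector written as a Java string literal needs the backslash doubled, as in "nspace\\:name".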