The weight of a property name in AEM

Jackrabbit 2.x / CQ5

Back in Jackrabbit 2.x, and therefore in CQ/AEM 5.x, everything was indexed by default unless you stated otherwise.

This meant that every time you ran a query, Lucene was there to serve you an indexed answer.

In this scenario it didn’t really matter what property names you used for your application or whether you defined additional node types.

This had the advantage that everything was indexed and therefore an index was almost always there serving your query and you didn’t have to think about it.

On the other hand we all know that the bigger the index is, the slower it will be in serving you the result set, as it will simply have to analyse more data.

Jackrabbit Oak / AEM6

Nowadays Apache Jackrabbit Oak, aka Jackrabbit 3.x, is the foundation of AEM6.

As opposed to JR2, in Oak almost nothing is indexed by default. This means that if you take a vanilla Oak and run a query, chances are very good that you’re going to traverse the repository (depending on your query).

This has the advantage that you can create very dedicated indexes that will overall perform better as they will be as tailored as possible to your query.

The disadvantages are that you’ll have to define each index and that you’ll have to know how to fine-tune your queries to get the most out of this approach.

Without going deeply into the configuration of each available index type, I think the two main properties you’ll end up tuning for better performance are

  • propertyNames
  • declaringNodeTypes

The first one defines which properties your index is going to cover, while the second restricts the index to specific node types. In other words, the condition for a node to be included in an index is

$nodetype in ($declaringNodeTypes) AND $property = $propertyNames
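The condition above can be read as a simple predicate. What follows is only a toy model in plain Java, not Oak code: the class and method names are made up for illustration, the node type and property name are borrowed from the example later in this post, and it matches node types by exact name only, whereas a real repository would presumably also take inheritance and mixins into account.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the inclusion rule; nothing here is part of the Oak API.
public class IndexInclusionSketch {

    // True when the node type is listed in declaringNodeTypes AND the node
    // carries at least one of the properties listed in propertyNames.
    static boolean isIndexed(String nodeType, Set<String> nodePropertyNames,
                             Set<String> declaringNodeTypes, Set<String> propertyNames) {
        boolean typeMatches = declaringNodeTypes.contains(nodeType);
        boolean hasIndexedProperty = !Collections.disjoint(nodePropertyNames, propertyNames);
        return typeMatches && hasIndexedProperty;
    }

    public static void main(String[] args) {
        Set<String> declaringNodeTypes = new HashSet<>(Arrays.asList("my:comments"));
        Set<String> propertyNames = new HashSet<>(Arrays.asList("myLastModified"));
        Set<String> nodeProperties = new HashSet<>(Arrays.asList("jcr:primaryType", "myLastModified"));

        // Same properties, different node types: only the first node lands in the index.
        System.out.println(isIndexed("my:comments", nodeProperties, declaringNodeTypes, propertyNames));
        System.out.println(isIndexed("nt:unstructured", nodeProperties, declaringNodeTypes, propertyNames));
    }
}
```

The point of the model is that both restrictions shrink the index: a node that has the property but the wrong type, or the right type but not the property, is simply left out.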

caveats

  • indexes on more than one property are not supported (yet)
  • an index cannot serve conditions where you ask something like WHERE property IS NULL.

This takes us to the very topic of this post: be careful about how you name your properties and how you structure your queries.

Remember the rule: the smaller the index the more efficient the query.

Let’s see with an example how much a property name and a node type matter.

If you have a custom application in which you want to extract nodes after a specific date, a way of doing so would be

SELECT * FROM [nt:base]
WHERE [jcr:lastModified] >= CAST('...' AS DATE)

This query is very bad. It can’t really make use of any index.

Let’s say you create an index on jcr:lastModified. The index itself will be almost as big as the repository, because by default in AEM (almost?) every node has mix:lastModified.

A better way would be

SELECT * FROM [nt:base]
WHERE [myLastModified] >= CAST('...' AS DATE)

This will allow you to define an index on the property myLastModified, which you know will contain only your application data. But we can do even better.

Let’s assume you have a very sparse and large content structure, so you can’t apply path filters, and on the other hand you don’t want to create tons of myLastModified-like properties to address different aspects of your information.

Let’s assume then, for sake of example, that you categorise your data into:

  • comments
  • news
  • articles.

What you could do is create three different node types:

  • my:comments
  • my:news
  • my:articles

now you can define three different, very dedicated indexes

  • declaringNodeTypes = my:comments AND propertyNames = myLastModified
  • declaringNodeTypes = my:news AND propertyNames = myLastModified
  • declaringNodeTypes = my:articles AND propertyNames = myLastModified

A resulting query would look like

SELECT * FROM [my:comments]
WHERE [myLastModified] >= CAST('...' AS DATE)
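The '...' in the queries above stands for a date literal, which JCR-SQL2 expects in ISO 8601 form. As a minimal sketch, assuming the node type and property names from this example are trusted constants (they are concatenated into the statement unescaped), such a query string could be assembled in Java like this. The class and method names are hypothetical and the date is an arbitrary sample value.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; it only builds the statement text and does not run it.
public class LastModifiedQuerySketch {

    // ISO 8601 with milliseconds and a time-zone designator.
    private static final DateTimeFormatter JCR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Builds the "modified since" query for one node type and one date property.
    static String modifiedSince(String nodeType, String property, OffsetDateTime since) {
        return "SELECT * FROM [" + nodeType + "] "
                + "WHERE [" + property + "] >= CAST('" + JCR_DATE.format(since) + "' AS DATE)";
    }

    public static void main(String[] args) {
        // Arbitrary sample date, chosen only for illustration.
        OffsetDateTime since = OffsetDateTime.of(2014, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(modifiedSince("my:comments", "myLastModified", since));
    }
}
```

Keeping the node type as a parameter makes it obvious which of the three dedicated indexes a given query is meant to hit.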

Actually, in the example above, assuming your nodes come with mix:lastModified, as soon as you create a custom node type you could simply have used the jcr:lastModified property, as the resulting indexes will be (I expect) the same size. You can repeat the exercise above with any property name, such as colours, size, tags, etc.


CQ5 author instance and Apache

Here at a customer site, we have a CQ application deployed into the “/” context root (as usual) of WebLogic 9.2 alongside other functional web apps. All of these sit behind Apache httpd, which serves some HTML files and also reverse-proxies to WebLogic based on certain URLs.

Generally this is no problem, except that in this case Apache serves its own htdocs directory for “/”, so we are not able to log in to CQ. When logging in, the CQ login form POSTs to http://admin:admin@<server>/?sling:authRequestLogin=1, which needs to be proxied to CQ’s / instead of Apache’s htdocs. As Apache is serving its own content, no CQ filter is fired.

If we make the weblogic-handler module handle “/”, we lose all the Apache content. We also cannot move all the Apache content into CQ.

So the problem is: how do we make Apache use the weblogic-handler only for a particular URL and query string? We came to a solution that combines Apache configuration with the creation of a CRX node.

First, the apache configs:

<LocationMatch "^/(content|libs|siteadmin|apps|bin|home|etc|welcome|var|tmp|cf|useradmin|damadmin|miscadmin|workflow|tagging|inbox|cqauthurl)">
SetHandler weblogic-handler
</LocationMatch>

This sets all the urls in the LocationMatch to be handled by weblogic-handler to proxy to CQ.

As you may have noticed we are dealing with a /cqauthurl that is not a CQ url. We need to use Content Explorer to create the node /cqauthurl (nt:unstructured). This will avoid a 404.

Then add the following rewrite rules to the Apache configuration; whether they go at the general level or within the location should make no difference:

RewriteRule ^/cqlogin$ /libs/cq/core/content/login.html?resource=/siteadmin [R]
RewriteCond %{QUERY_STRING}  ^sling:authRequestLogin=1$
RewriteRule (.*) /cqauthurl [PT]

The first line allows the user to bookmark a simple URL like http://<server>/cqlogin. Accessing this URL triggers a redirect to the CQ login page.

The second and third lines tell Apache: when the query string is sling:authRequestLogin=1, rewrite the request to /cqauthurl.

This lets the user’s login go through to CQ and caches the basic auth credentials for all paths under http://author-server:port/.

CQ Site admin needs to be accessed via http://author-server:port/siteadmin instead of the root context.

weekly links 2010-49

Captain Crunch needs your help

When John Draper aka Captain Crunch is on form, great things happen. A legendary hacker, he created the infamous Blue Box. He went on to invent EasyWriter, the first ever word processor for the Apple II.

Getting Resources and Properties in Sling

The Five Competencies of User Experience Design

The ASF Resigns From the JCP Executive Committee

Weekly links 2010-29

The Passage of Time by ToniVC

Commons Configuration

Commons Configuration provides a generic configuration interface which enables a Java application to read configuration data from a variety of sources.

Lambdas in Java Preview – Part 1: The Basics | Javalobby

…closures (or better lambda expressions) will (probably) be added to JDK7..

Lambdas in Java Preview – Part 2: Functional Java | Javalobby

…giving some practical examples of lambdas, how functional programming in Java could look like and how lambdas could affect some of the well known libraries in Java land…

Lambdas in Java Preview – Part 3: Collections API

In this part I’ll focus on how the addition of lambdas could affect one of the most used standard APIs – the Collections API.

First Symbian^4 Screenshots Surface

For those interested, there are some previews of the fourth version of Symbian OS.

Life is Too Short to Parse Command Line Parameters

Recently, DZone MVB Cedric Beust unveiled JCommander, a tool he developed that takes away the manual labor of command line parsing.  Just six days after posting his announcement for JCommander 1.0, he’s already got an expanded 1.1 version out.  New features include simple internationalization, type converters, and password parameters.

OutOfMemoryError in Eclipse with JDK 1.6.0_21, on Windows

This past weekend I spent a good amount of time trying to solve an OutOfMemoryError that made Eclipse crash every 5 minutes.

Sun Java 6 on Ubuntu 10.04 10.10 and later

Apparently the Ubuntu folks have started putting some weight behind their recommendations for switching to the “OpenJDK.” Fortunately, the official, “proprietary” Java is still available through another Ubuntu repository.

Effective JSON with Google Web Toolkit

Hopefully every developer knows JSON protocol or has at least heard about it. In Google Web Toolkit technology, JSON is a very common protocol

Specify proxy in Maven

I’ve just started using Maven, and I immediately stumbled on a problem that probably everybody in a big corporation has stumbled on: connecting through a proxy.

Maven can connect to the internet through an HTTP proxy that supports basic authentication. To specify it, just create (or modify, if it exists) the file ~/.m2/settings.xml in the following way:

<settings>
<proxies>
<proxy>
<active>true</active>
<protocol>http</protocol>
<host>put.here.the.proxy.address</host>
<port>3128</port>
<username>yourUserName</username>
<password>yourPassword</password>
<nonProxyHosts /> <!-- useful for listing hosts that should bypass the proxy -->
</proxy>
</proxies>
</settings>

That’s it! 🙂

SSL and Java (ciak 2)

credits: Vagamundos (from flickr)


Last time we spoke about accessing a site via HTTPS using the pure Sun way.

Today we look at the same problem, solved with the Apache HttpClient library. This library requires a keystore specified in the code. To generate a keystore, you will have to download a certificate file as described in the previous post and then create a keystore using the keytool program. I don’t remember exactly how to create a new keystore with the required certificate via keytool (I didn’t write it down), but reading the program’s help it should be something like:

keytool -importcert -alias <my_alias_certificate> -file <path_to_the_cer_file> -keystore myKeystore.ks

where myKeystore.ks is the name of the file containing the keystore. The advantage of this approach is that you can ship a .ks file along with the program, in whatever location you like, and the program will use it, avoiding post-installation procedures to register the certificate on each JVM.

So, assuming our keystore is in a sub-directory named certs, the code for accessing the site is in the PDF, as usual.
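For readers without the PDF at hand, here is a minimal sketch of the keystore-loading step using only JDK classes. It is not the code from the PDF and it leaves out the HttpClient wiring entirely: the class and method names are hypothetical, and it assumes the keystore was created with the JDK's keytool in a format the running JVM can read with its default keystore type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper class; names are illustrative only.
public class KeystoreSketch {

    // Reads a keystore file (for example certs/myKeystore.ks) protected by the given password.
    static KeyStore loadKeyStore(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            keyStore.load(in, password);
        }
        return keyStore;
    }

    // Builds an SSLContext that trusts only the certificates found in the keystore.
    static SSLContext trustOnly(KeyStore trusted) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trusted);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: KeystoreSketch <keystore-file> <password>");
            return;
        }
        KeyStore keyStore = loadKeyStore(Paths.get(args[0]), args[1].toCharArray());
        SSLContext context = trustOnly(keyStore);
        System.out.println("Loaded " + keyStore.size() + " entries, protocol " + context.getProtocol());
    }
}
```

Running it with certs/myKeystore.ks and the password chosen during the keytool import is a quick way to check that the file is readable before handing it to the HTTP client.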