Saturday, February 21, 2015

Registry resource indexing in WSO2 server.

The registry resources are going to be stored in underline database as blob content. If we want to search some resource by value of content/attribute(s) we have to read the entire content and pass through the content to find the matching resource(s). When the resource volume is high,  that will affect to the performance of the search operations. To get rid of that issue we have introduced the Apache solr based content indexing/searching.

Set of  WSO2 products (WSO2 API Manager, WSO2 Governance Registry, WSO2 User Engagement Server ..etc) are using this feature to list , search the registry artifacts.

The embedded solr server has configured and it is scheduled to start after the server start-up. The solr  configuration can be found in registry.xml file.



Eg: registry.xml file for WSO2 API Manager 1.8.0

<indexingConfiguration>
        <startingDelayInSeconds>60</startingDelayInSeconds>
        <indexingFrequencyInSeconds>5</indexingFrequencyInSeconds>
        <!--number of resources submit for given indexing thread -->
        <batchSize>50</batchSize>
         <!--number of worker threads for indexing -->
        <indexerPoolSize>50</indexerPoolSize>
        <!-- location storing the time the indexing took place-->
        <lastAccessTimeLocation>/_system/local/repository/components/org.wso2.carbon.registry/indexing/lastaccesstime</lastAccessTimeLocation>
        <!-- the indexers that implement the indexer interface for a relevant media type/(s) -->
        <indexers>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.MSExcelIndexer" mediaTypeRegEx="application/vnd.ms-excel"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.MSPowerpointIndexer" mediaTypeRegEx="application/vnd.ms-powerpoint"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.MSWordIndexer" mediaTypeRegEx="application/msword"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.PDFIndexer" mediaTypeRegEx="application/pdf"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.XMLIndexer" mediaTypeRegEx="application/xml"/>
            <indexer class="org.wso2.carbon.governance.registry.extensions.indexers.RXTIndexer" mediaTypeRegEx="application/wsdl\+xml" profiles ="default,uddi-registry"/>
            <indexer class="org.wso2.carbon.governance.registry.extensions.indexers.RXTIndexer" mediaTypeRegEx="application/x-xsd\+xml " profiles ="default,uddi-registry"/>
            <indexer class="org.wso2.carbon.governance.registry.extensions.indexers.RXTIndexer" mediaTypeRegEx="application/policy\+xml" profiles ="default,uddi-registry"/>
            <indexer class="org.wso2.carbon.governance.registry.extensions.indexers.RXTIndexer" mediaTypeRegEx="application/vnd.(.)+\+xml" profiles ="default,uddi-registry"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.XMLIndexer" mediaTypeRegEx="application/(.)+\+xml"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.PlainTextIndexer" mediaTypeRegEx="text/(.)+"/>
            <indexer class="org.wso2.carbon.registry.indexing.indexer.PlainTextIndexer" mediaTypeRegEx="application/x-javascript"/>
        </indexers>
        <exclusions>
            <exclusion pathRegEx="/_system/config/repository/dashboards/gadgets/swfobject1-5/.*[.]html"/>
            <exclusion pathRegEx="/_system/local/repository/components/org[.]wso2[.]carbon[.]registry/mount/.*"/>
        </exclusions>
    </indexingConfiguration>  

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<startingDelayInSeconds>60</startingDelayInSeconds>
When the server is starting (at the time registry indexing service register in OSGI ) indexing task is scheduling with 60s delay.
<indexingFrequencyInSeconds>5</indexingFrequencyInSeconds>
Indexing task is running every 5s second to index the newly added, updated and deleted resources.
<batchSize>50</batchSize>
Max resource count can handle by one indexing task.
<indexerPoolSize>50</indexerPoolSize>
Indexing task thread pool size.
<lastAccessTimeLocation>/_system/local/repository/components/org.wso2.carbon.registry/indexing/lastaccesstime</lastAccessTimeLocation>
The Last indexing task executed time is stored in that location. (Then next indexing task will fetch resources from registry which are updated/added after the last indexing time. This prevents re-indexing resource, which is already indexed but not updated/deleted.)

<indexer class="org.wso2.carbon.registry.indexing.indexer.MSExcelIndexer" mediaTypeRegEx="application/vnd.ms-excel"/>
There are set of indexers to index different type of resources/artifacts based on media type.
<exclusion pathRegEx="/_system/config/repository/dashboards/gadgets/swfobject1-5/.*[.]html"/>
You can exclude the resource indexing which are stored in given path(s).


How to select resource(s) to index.

1. Registry indexing task read the activity logs from the REG_LOG table and filter the logs which are added/updated after the timestamps stored in lastAccessTimeLocation.



2. Then it filter the relevant indexer(configured in registry.xml) matching with the media type. if the matching media type found , it creates the  indexable resource file and send to solr server to index.

3. The Governance API (GenericArtifactManager) and Registry API (ContentBasedSearchService) provides the APIs to search the indexed resources through the indexing client.


Example:

WSO2 API Manager store the API meta data as configurable governance artifact in registry. Those API meta data going to be indexed using the RXTIndexer. The Governance API provides an API to search the indexed api artifacts. Following client code search the api which contain the  state as "PUBLISHED" and visibility as "public".

/*
*  Copyright (c) 2005-2010, WSO2 Inc. (http://www.wso2.org) All Rights Reserved.
*
*  WSO2 Inc. licenses this file to you under the Apache License,
*  Version 2.0 (the "License"); you may not use this file except
*  in compliance with the License.
*  You may obtain a copy of the License at
*
*    http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied.  See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.wso2.carbon.registry.ws.client.sample;

import org.apache.axis2.context.ConfigurationContext;
import org.apache.axis2.context.ConfigurationContextFactory;
import org.wso2.carbon.base.ServerConfiguration;
import org.wso2.carbon.governance.api.generic.GenericArtifactManager;
import org.wso2.carbon.governance.api.generic.dataobjects.GenericArtifact;
import org.wso2.carbon.governance.api.util.GovernanceUtils;
import org.wso2.carbon.governance.client.WSRegistrySearchClient;
import org.wso2.carbon.registry.core.Registry;
import org.wso2.carbon.registry.core.pagination.PaginationContext;
import org.wso2.carbon.registry.core.session.UserRegistry;
import org.wso2.carbon.registry.ws.client.registry.WSRegistryServiceClient;

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SampleWSRegistrySearchClient {

    private static ConfigurationContext configContext = null;

    private static final String CARBON_HOME = "/home/ajith/wso2/packs/wso2am-1.8.0/";
    private static final String axis2Repo = CARBON_HOME + File.separator + "repository" +
            File.separator + "deployment" + File.separator + "client";
    private static final String axis2Conf = ServerConfiguration.getInstance().getFirstProperty("Axis2Config.clientAxis2XmlLocation");
    private static final String username = "admin";
    private static final String password = "admin";
    private static final String serverURL = "https://localhost:9443/services/";
    private static Registry registry;

    private static WSRegistryServiceClient initialize() throws Exception {

        System.setProperty("javax.net.ssl.trustStore", CARBON_HOME + File.separator + "repository" +
                File.separator + "resources" + File.separator + "security" + File.separator +
                "wso2carbon.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "wso2carbon");
        System.setProperty("javax.net.ssl.trustStoreType", "JKS");
        System.setProperty("carbon.repo.write.mode", "true");
        configContext = ConfigurationContextFactory.createConfigurationContextFromFileSystem(
                axis2Repo, axis2Conf);
        return new WSRegistryServiceClient(serverURL, username, password, configContext);
    }

    public static void main(String[] args) throws Exception {
        try {
            registry = initialize();
            Registry gov = GovernanceUtils.getGovernanceUserRegistry(registry, "admin");
            // Should be load the governance artifact.
            GovernanceUtils.loadGovernanceArtifacts((UserRegistry) gov);
            //Initialize the pagination context.
            PaginationContext.init(0, 20, "", "", 10);
            WSRegistrySearchClient wsRegistrySearchClient = new WSRegistrySearchClient(serverURL, username, password, configContext);
            //This should be execute to initialize the AttributeSearchService.
            wsRegistrySearchClient.init();
            //Initialize the GenericArtifactManager
            GenericArtifactManager artifactManager = new GenericArtifactManager(gov, "api");
            Map<String, List<String>> listMap = new HashMap<String, List<String>>();
            //Create the search attribute map
            listMap.put("overview_status", new ArrayList<String>() {{
                add("PUBLISHED");
            }});
            listMap.put("overview_visibility", new ArrayList<String>() {{
                add("public");
            }});
            //Find the results.
            GenericArtifact[] genericArtifacts = artifactManager.findGenericArtifacts(listMap);

            for(GenericArtifact artifact : genericArtifacts){
                System.out.println(artifact.getQName().getLocalPart());
            }

        } finally {
            PaginationContext.destroy();
            ((WSRegistryServiceClient)registry).logut();
            System.exit(0);
        }

    }
}