Understand general capabilities and requirements of each product. The RapidMiner Platform includes several products. Although an Administrator may be primarily interested in server capabilities, it still makes sense to start with RapidMiner Studio because that is where processes can be designed and built.
Know what RapidMiner Studio is and what it can do. It is a Java application that is available for many operating systems, and it can be used by anyone who designs or runs RapidMiner Processes.
Be able to install RapidMiner Studio. Studio is installed with an installer on Windows or Mac, or with a script on Linux.
Know the licensing options and limitations. License levels start at Free and increase in price. Within one organization, there may be more than one license level, so it can be important for an administrator to understand and track the levels.
Be able to use RapidMiner Server and Studio together. Connecting to a RapidMiner Server is simple from RapidMiner Studio. It should be one of the first tasks after RapidMiner Server installation.
Be able to install and use RapidMiner extensions. Extensions are installed from the Marketplace, which can be accessed on the web or through Studio.
Understand installation and use of the Radoop extension. Connecting to a Hadoop cluster can be very simple from RapidMiner Studio with Radoop. The simplest way is to connect to a cluster manager and allow the cluster manager to provide the connection details. Other methods are available as well.
Know the locations of configurations and logs. When RapidMiner Studio is installed there is a hidden folder, .RapidMiner, created in the user home. This folder contains information relevant to the installation and the user. It’s the primary location for configurations and logs.
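As a quick way to see what that folder holds on a given machine, the following is a minimal Python sketch. It assumes only what is stated above: a hidden folder named .RapidMiner in the user home. The helper names are illustrative, and the script makes no assumption about which files or subfolders exist; it simply lists whatever is there.

```python
from pathlib import Path


def rapidminer_user_dir(home=None):
    """Return the expected path of the hidden .RapidMiner folder.

    `home` can be passed explicitly (for example, to inspect another
    user's home); by default the current user's home is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".RapidMiner"


def list_files(folder):
    """Return sorted relative paths of all files under `folder`.

    Returns an empty list if the folder does not exist, e.g. when
    Studio has not been installed or started for this user.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    folder = rapidminer_user_dir()
    print(f"Looking in {folder}")
    for name in list_files(folder):
        print("  " + name)
```

Running the script as the user who runs Studio shows the configuration and log files present for that user.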
Be able to use a variety of connections. There are many types of connections including database connections that the administrator may need to manage.
Know what RapidMiner Server is and what it can do. RapidMiner Server is designed to make it easy to share processes and models, deploy repeatable processes, and handle large jobs.
Know the Deployment Options for RapidMiner Server
Be able to install RapidMiner Server. This is commonly done for on-premises installations.
Be able to use cloud images. Both Bring Your Own License and Pay As You Go options are available on AWS and Azure.
Know the installation prerequisites. RapidMiner Server stores configuration and other information in a database, so Database Setup is a critical step.
Know the installation options. The installation may be completed with a wizard, or headless. It will require information about the host, location, Java, ports, if it will be registered as a Windows service, and database information. After installation, there are many configuration tasks that can be done through the web interface.
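Before starting a wizard or headless installation, it can help to confirm that every piece of information listed above has been collected. The sketch below is a simple checklist in Python. The field names and example values are hypothetical and chosen for illustration only; they are not the installer's actual setting names.

```python
# Hypothetical field names, one per item the installation asks about.
REQUIRED_FIELDS = (
    "host",
    "install_location",
    "java_home",
    "ports",
    "windows_service",
    "database",
)


def is_blank(value):
    """True for None and for empty strings or containers.

    False is NOT blank: 'do not register as a Windows service'
    is a valid, deliberate answer.
    """
    if value is None:
        return True
    return isinstance(value, (str, list, tuple, dict)) and len(value) == 0


def missing_settings(settings):
    """Return the required fields that are absent or blank."""
    return [name for name in REQUIRED_FIELDS if is_blank(settings.get(name))]


if __name__ == "__main__":
    # Example values are placeholders, not recommended settings.
    collected = {
        "host": "rm-server.example.com",
        "install_location": "/opt/rapidminer-server",
        "java_home": "/usr/lib/jvm/java",
        "ports": {"http": 8080},
        "windows_service": False,
        "database": {},  # still to be gathered
    }
    print("Still missing:", missing_settings(collected))
```

An empty result means all of the listed items are available and the installation can proceed without stopping to look something up.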
Be able to perform user management. Many organizations have complex requirements regarding different types of access to different data or processes. Care needs to be taken to set up authentication, group membership, and server repository access rights.
Know the different levels of licenses for RapidMiner Server starting at Free and increasing in price. The administrator can Manage Licenses, and set different types of limits on server processes.
Understand the architecture and know how to plan a scalable architecture. When processes (jobs) are submitted to the server, they will belong to a specified Queue. Job Agents that subscribe to the queue will run the job in a Job Container. The server will be able to accept and run jobs immediately with the default queue and Job Agent, but this may not be ideal for scaling in the future.
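The relationship between queues, Job Agents, and Job Containers can be pictured with a small simulation. The Python sketch below is a conceptual toy model only; it does not use or represent any RapidMiner API. The queue name "DEFAULT", the agent names, and the idea of a fixed number of containers per agent are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobAgent:
    name: str
    queue: str           # the queue this agent subscribes to
    containers: int = 1  # assumed: jobs the agent can run at once


@dataclass
class Server:
    queues: dict = field(default_factory=dict)
    agents: list = field(default_factory=list)

    def add_agent(self, agent):
        """Register an agent and make sure its queue exists."""
        self.agents.append(agent)
        self.queues.setdefault(agent.queue, deque())

    def submit(self, job, queue="DEFAULT"):
        """Place a job at the end of the named queue."""
        self.queues.setdefault(queue, deque()).append(job)

    def dispatch(self):
        """Run one scheduling round.

        Each agent takes up to `containers` jobs from the queue it
        subscribes to. Jobs in a queue with no subscribed agent wait.
        """
        assignments = []
        for agent in self.agents:
            pending = self.queues.get(agent.queue, deque())
            for _ in range(agent.containers):
                if not pending:
                    break
                assignments.append((pending.popleft(), agent.name))
        return assignments


if __name__ == "__main__":
    server = Server()
    server.add_agent(JobAgent("agent-1", "DEFAULT", containers=1))
    for job in ("job-a", "job-b", "job-c"):
        server.submit(job)
    print(server.dispatch())  # one agent, one container: one job per round
    server.add_agent(JobAgent("agent-2", "DEFAULT", containers=2))
    print(server.dispatch())  # added capacity drains the queue faster
```

In this model, a single default queue with one agent works immediately but processes jobs one at a time. Adding agents, or separate queues with their own agents, is how capacity grows, which is the planning concern described above.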
Know the Security practices. These are highlighted in the server documentation to protect the server and its connections.
Be able to use a variety of connections. This includes database connections that the administrator may need to manage.
Understand what RapidMiner Server’s High Availability architecture is and how it is implemented. It is designed for mission-critical projects.
Know what RapidMiner Radoop is and what it can do. It provides an easy-to-use graphical interface for analyzing data on a Hadoop cluster with a running Hive server. It can be installed as an extension in RapidMiner Server and in RapidMiner Studio. RapidMiner Radoop Processes can be run on the Hadoop cluster so that only metadata and results are provided to RapidMiner.
Be able to install RapidMiner Radoop. It requires RapidMiner Studio with the Radoop extension and a compatible Hadoop cluster. The connection can then be specified manually, with a file, or with a cluster manager. After the connection is made and tested in Studio, it can be added to a RapidMiner Server that has the Radoop extension installed.
Be able to use RapidMiner Radoop. Processes all have a Radoop Nest Operator that specifies a connection. That Nest operator contains other Radoop operators. Those operators can either directly perform operations on the specified cluster, or they can be Process Pushdown operators that call other RapidMiner operators and push those operations into the cluster.
Know what RapidMiner Real-Time Scoring is and what it can do. It is an add-on product to RapidMiner Server designed for fast scoring use cases via web services.
Be able to set up a Scoring Agent. Once RapidMiner Server is installed and you have a process ready for high-volume production, you can set up a Scoring Agent for that process. You can then extract a deployment file, which defines web services that are exposed by the Scoring Agent to score data.
Know how to use Real-Time Scoring. After installing a deployment on the Scoring Agent, you can use the exposed web services to score data.
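Calling such a web service from a client might look like the Python sketch below, which uses only the standard library. The base URL, the service path, and the JSON payload shape are all assumptions for illustration; the actual endpoint and input format are defined by the deployment installed on the Scoring Agent.

```python
import json
import urllib.request


def build_scoring_request(base_url, service_path, record):
    """Build (but do not send) an HTTP POST request for one record.

    The {"data": [record]} wrapper is an assumed payload shape.
    """
    url = base_url.rstrip("/") + "/" + service_path.lstrip("/")
    body = json.dumps({"data": [record]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host, port, and path; replace with the values
    # exposed by your own deployment before calling score().
    request = build_scoring_request(
        "http://scoring-agent.example.com:8090",
        "/services/my-deployment/score",
        {"age": 42, "plan": "basic"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before pointing the client at a production Scoring Agent.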
Know how to use the Marketplace and Manage Extensions. There are settings for the location of extensions as well as the update behavior for Studio. Additionally, many extensions have important settings.