File shares are scraped using Xillio PIPE. This article explains how to set up Xillio PIPE for Insights and how to configure the PIPE plugin as a repository. The download (available when you are logged in) and the accompanying manual can be found in this article.

PIPE configuration

To start, remove the default pipelines that come with Xillio PIPE. We are going to set up new pipelines, one for each root to scrape.

Create a new pipeline by copying the example configuration below into a new file. The name of the file is used as the name of the pipeline. Note that the supplied example configuration has hashing enabled by default. You can disable it by changing the value of filesTo in the getFiles module to ["files"].

{
  "modules": {
    "getFiles": {
      "type": "filescraper",
      "rootDir": "c:/",
      "zoneId" : "UTC",
      "dateTimePattern": "yyyy-MM-dd'T'HH:mm:ss.SSSX",
      "filesTo": [
        "fileHasher"
      ],
      "foldersTo": [
        "folders"
      ],
      "errorsTo" : [
        "errors" 
      ]
    },
    "fileHasher" : {
      "type": "hasher",
      "perCore": true,
      "concurrency": 1,
      "hashes": [
        "MD5"
      ],
      "outputTo": [
        "files"
      ],
      "errorsTo" : [
        "errors" 
      ]
    },
    "files": {
      "type": "jsonlineWriter",
      "preClean" : "true",
      "filePrefix": "files",
      "outputDir" : "c:/insights/pipeOutput/scraper" 
    },
    "folders": {
      "type": "jsonlineWriter",
      "filePrefix": "folders",
      "preClean" : "true",
      "outputDir" : "c:/insights/pipeOutput/scraper" 
    },
    "errors": {
      "type": "jsonlineWriter",
      "filePrefix": "errors",
      "preClean" : "true",
      "outputDir" : "c:/insights/pipeOutput/scraper" 
    }
  }
}
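
The hashing switch described above can also be applied programmatically. The following is a minimal Python sketch, not part of Xillio PIPE itself: the function name disable_hashing is hypothetical, and the configuration it operates on is a trimmed-down copy of the example. It only changes filesTo, as the article describes, and leaves the fileHasher module in place.

```python
import copy


def disable_hashing(pipeline_config):
    """Return a copy of a pipeline configuration with hashing bypassed.

    Hypothetical helper for illustration; it only rewires "filesTo".
    """
    config = copy.deepcopy(pipeline_config)
    # Send scraped files straight to the "files" writer instead of "fileHasher".
    config["modules"]["getFiles"]["filesTo"] = ["files"]
    return config


# Trimmed-down version of the example configuration, for illustration only.
example = {
    "modules": {
        "getFiles": {
            "type": "filescraper",
            "rootDir": "c:/",
            "filesTo": ["fileHasher"],
        },
        "fileHasher": {
            "type": "hasher",
            "hashes": ["MD5"],
            "outputTo": ["files"],
        },
        "files": {"type": "jsonlineWriter", "filePrefix": "files"},
    }
}

without_hashing = disable_hashing(example)
print(without_hashing["modules"]["getFiles"]["filesTo"])
```

Working on a deep copy keeps the original configuration untouched, so both variants can be written to separate pipeline files if needed.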


Create additional pipelines if you need to scrape multiple roots. Make sure that you store the results of each pipeline in a separate directory.
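
When several roots need to be scraped, the pipeline files can be generated instead of copied by hand. The sketch below mirrors the example configuration above and gives every pipeline its own results directory. It rests on several assumptions: the function names, the root names and paths, and the .json file extension are all hypothetical, and the demo writes to a temporary directory because this article does not state where Xillio PIPE expects its pipeline files.

```python
import json
import tempfile
from pathlib import Path


def build_pipeline(root_dir, output_dir):
    """Build a pipeline configuration (hashing enabled) for one root."""
    def writer(prefix):
        # All three writers share the same per-pipeline results directory.
        return {
            "type": "jsonlineWriter",
            "filePrefix": prefix,
            "preClean": "true",
            "outputDir": output_dir,
        }

    return {
        "modules": {
            "getFiles": {
                "type": "filescraper",
                "rootDir": root_dir,
                "zoneId": "UTC",
                "dateTimePattern": "yyyy-MM-dd'T'HH:mm:ss.SSSX",
                "filesTo": ["fileHasher"],
                "foldersTo": ["folders"],
                "errorsTo": ["errors"],
            },
            "fileHasher": {
                "type": "hasher",
                "perCore": True,
                "concurrency": 1,
                "hashes": ["MD5"],
                "outputTo": ["files"],
                "errorsTo": ["errors"],
            },
            "files": writer("files"),
            "folders": writer("folders"),
            "errors": writer("errors"),
        }
    }


def write_pipelines(roots, config_dir, output_base):
    """Write one file per root; the file name doubles as the pipeline name."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, root_dir in roots.items():
        # Separate results directory per pipeline, as the article requires.
        output_dir = f"{output_base}/{name}"
        target = config_dir / f"{name}.json"  # extension is an assumption
        config = build_pipeline(root_dir, output_dir)
        target.write_text(json.dumps(config, indent=2), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical roots; replace with the shares you actually want to scrape.
roots = {
    "share-finance": "//fileserver/finance",
    "share-hr": "//fileserver/hr",
}
written = write_pipelines(roots, tempfile.mkdtemp(), "c:/insights/pipeOutput")
print([path.name for path in written])
```

Deriving the results directory from the pipeline name makes it hard to accidentally point two pipelines at the same directory.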

Repository configuration

To set up a file share as a repository, add the example configuration found at the end of this page to repositories.json. Change the settings according to your needs. Repeat the structure for each configured pipeline.

* = required

xillioPipe

Object containing the connection details for Xillio PIPE. Defaults to:

{
   "host" : "http://localhost:5050"
}


pipeline

The name of the pipeline that is configured in Xillio PIPE. Defaults to scraper.

path

The path to the directory where the configured pipeline stores its results. Defaults to c:/insights/pipeOutput/scraper.

startPipeline

Specifies whether the configured pipeline needs to run. If set to false, the existing results directory is read and the results are not updated. Defaults to true.
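
Before setting startPipeline to false, it can be useful to check that the results directory actually contains results. The sketch below counts records per writer. It assumes that each jsonlineWriter module produces files whose names start with its filePrefix and that each line holds one JSON object; neither detail is confirmed by this article. The function name count_records, the sample file names, and the record fields in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def count_records(results_dir, prefixes=("files", "folders", "errors")):
    """Count JSON lines per file prefix in a pipeline results directory."""
    counts = {prefix: 0 for prefix in prefixes}
    for prefix in prefixes:
        for path in sorted(Path(results_dir).glob(f"{prefix}*")):
            if not path.is_file():
                continue
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        json.loads(line)  # raises ValueError on a malformed line
                        counts[prefix] += 1
    return counts


# Demo on a temporary directory with made-up sample data.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "files-sample.jsonl").write_text(
    '{"name": "a.txt"}\n{"name": "b.txt"}\n', encoding="utf-8"
)
(demo_dir / "folders-sample.jsonl").write_text(
    '{"name": "docs"}\n', encoding="utf-8"
)
demo_counts = count_records(demo_dir)
print(demo_counts)
```

For a real check, pass the directory configured in path, for example c:/insights/pipeOutput/scraper.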

Example configuration

"repositoryName" : {
    "type" : "pipe",
    "config" : {
        "xillioPipe" : {
            "host" : "http://localhost:5050"
        },
        "pipeline" : "scraper",
        "path" : "c:/insights/pipeOutput/scraper",
        "startPipeline" : true
    }
}
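
Because the structure has to be repeated for each configured pipeline, the entries can be generated rather than typed. The following Python sketch builds one entry per pipeline using the defaults documented above. The helper name pipe_repository, the pipeline names, and the result paths are hypothetical, and reusing the pipeline name as the repository name is only one possible choice. The output is meant to be merged into repositories.json by hand, since that file may contain other repositories.

```python
import json

DEFAULT_HOST = "http://localhost:5050"


def pipe_repository(pipeline="scraper",
                    path="c:/insights/pipeOutput/scraper",
                    host=DEFAULT_HOST,
                    start_pipeline=True):
    """Build one repository entry of type "pipe" with the documented defaults."""
    return {
        "type": "pipe",
        "config": {
            "xillioPipe": {"host": host},
            "pipeline": pipeline,
            "path": path,
            "startPipeline": start_pipeline,
        },
    }


# Hypothetical pipelines mapped to their (separate) results directories.
pipelines = {
    "share-finance": "c:/insights/pipeOutput/share-finance",
    "share-hr": "c:/insights/pipeOutput/share-hr",
}

# Here the pipeline name is reused as the repository name.
repositories = {
    name: pipe_repository(pipeline=name, path=path)
    for name, path in pipelines.items()
}
print(json.dumps(repositories, indent=4))
```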