Reading/decoding BSON files from a MongoDB mongodump backup

MongoDB is an extremely useful database, especially for projects where the scope is still developing and changing. The 'documents' that represent items of data don't need to have identical fields, despite belonging to the same collection.  This makes evolving requirements much easier to implement.

In a production environment, the live database can be backed up with a simple mongodump command, and some service providers implement regular backups automatically. So if you need to delve back in time you can expand the backup and take a look inside.

The actual data documents are saved in BSON format, one file per collection. Opening these files as they are reveals some of the keys and string values, but much of the useful data and structure is binary-encoded and therefore unreadable.
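To give a feel for that encoding, here is a minimal sketch that counts the documents in a dump file without decoding them. It relies on two facts about the format: every BSON document begins with its own total size as a little-endian 32-bit integer, and mongodump writes the documents of a collection back to back. The function name count_bson_documents is made up for this example.

```php
<?php
// Hypothetical helper: count the documents in a raw BSON dump by hopping
// from one length prefix to the next, without decoding any fields.
function count_bson_documents(string $bytes): int
{
    $count = 0;
    $offset = 0;
    $total = strlen($bytes);
    while ($offset + 4 <= $total) {
        // Each document starts with its total size as a little-endian int32.
        $size = unpack('V', substr($bytes, $offset, 4))[1];
        if ($size < 5 || $offset + $size > $total) {
            throw new RuntimeException("Corrupt or truncated document at byte $offset");
        }
        $offset += $size;
        $count++;
    }
    return $count;
}

// Hand-built BSON for {"a": 1}: size, int32 type tag, key, value, terminator.
$doc = pack('V', 12) . "\x10" . "a\x00" . pack('V', 1) . "\x00";
print count_bson_documents($doc . $doc) . "\n"; // prints 2
```

On a real backup you would pass in something like file_get_contents('users.bson'), where users.bson stands for whichever collection file you are inspecting.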

To transform this data into a more useful format, the best tool is bsondump, which is provided as part of the command line tools available from MongoDB. Either get it from MongoDB's Server download page, or on macOS you can use Homebrew.

brew tap mongodb/brew

brew install mongodb-community@4.2

This will include the bsondump tool. I always like to make web tools to provide a nice UI for actions like this, so you can leverage PHP's exec function to run the command for you on your localhost server, using something like:

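if (!empty($_FILES['file']['tmp_name'])) {
  // basename() strips any path components from the client-supplied name.
  $original = basename($_FILES['file']['name']);
  $output = 'output.json';
  if (substr($original, -5) == '.bson') {
    $output = substr($original, 0, -5) . '.json';
  }
  // escapeshellarg() keeps file names from being interpreted by the shell.
  $command = '/usr/local/bin/bsondump --outFile=' . escapeshellarg($output)
           . ' ' . escapeshellarg($_FILES['file']['tmp_name']);
  exec($command, $retArr, $retVal);
  // bsondump exits with 0 on success; only read the output file in that case.
  $json = ($retVal === 0) ? file_get_contents($output) : '';
}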

Now you can use regex tools to make custom searches for a string stored in $seek, like:

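// Take the search string from the submitted form.
$seek = isset($_POST['seek']) ? $_POST['seek'] : '';
$results = array();
if (!empty($seek)) {
    // preg_quote() makes $seek match literally, even if it contains regex characters.
    preg_match_all('/^.*?' . preg_quote($seek, '/') . '.*?$/mi', $json, $results);
}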
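To see what that search returns, here is a small self-contained sketch of the same line-matching idea, run against a made-up two-line sample in the one-document-per-line shape that bsondump writes. The seek_lines function and the sample data are hypothetical, and the search string is escaped so it is treated as plain text rather than as a pattern.

```php
<?php
// Hypothetical helper: return every line of $json that contains $seek,
// ignoring case. An empty search string returns no lines.
function seek_lines(string $json, string $seek): array
{
    if ($seek === '') {
        return [];
    }
    // preg_quote() escapes regex metacharacters so $seek is matched literally.
    $pattern = '/^.*' . preg_quote($seek, '/') . '.*$/mi';
    preg_match_all($pattern, $json, $matches);
    return $matches[0];
}

// Made-up sample data, one JSON document per line.
$sample = '{"name":"Ada","email":"ada@example.com"}' . "\n"
        . '{"name":"Bob","email":"bob@example.org"}' . "\n";

print_r(seek_lines($sample, 'example.com'));
```

Because of the m modifier, ^ and $ anchor to each line, so every match is one complete document that can be passed straight to json_decode.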

The HTML will include the input form and output the results, along the lines of:

<form method="post" enctype="multipart/form-data">

<div class="input-item">
<label for="seek">Seek String</label>
<input type="text" name="seek" id="seek" size="100" value="<?php print !empty($seek) ? htmlspecialchars($seek) : ''; ?>" /><br>
BSON to decode:
<input type="file" name="file" id="file" size="100" value="" />
</div>

<div class="input-item">
<input class="submit" type="submit" name="submit" id="submit" value="Seek" width="180" />
</div>

<!-- Results -->
<div class="output">
<?php

if (!empty($results)) :
  print '<p>Found: ' . count($results[0]) . '</p>' . "\n";
  print '<textarea rows="20">';
  foreach ($results[0] as $i=>$r) print $i . ':' . "\n" . htmlspecialchars($r) . "\n\n";
  print '</textarea>' . "\n";

  print '<p>Decoded</p>' . "\n";
  print '<textarea rows="20">';
  foreach ($results[0] as $i=>$r) {
    print $i . ':' . "\n";
    $d = json_decode($r, true);
    if (!empty($d)) {
      foreach ($d as $key=>$val) {
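        // Customise output based on $key and $val, or just continue if $key is not wanted.
        // Nested values (e.g. {"$oid": ...}) decode to arrays, so re-encode them for printing.
        print htmlspecialchars($key . ':' . (is_array($val) ? json_encode($val) : $val) . ', ');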
      }
    }
    print "\n\n";
  }
  print '</textarea>' . "\n";
endif;

?>
</div>
</form>
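
One thing to watch in the 'Decoded' loop: bsondump writes Extended JSON, so values such as _id come back from json_decode as nested arrays rather than plain strings. A minimal sketch of a formatter that copes with this is below. The format_value name and the sample line are made up for illustration.

```php
<?php
// Hypothetical helper: turn one decoded value into a printable string.
function format_value($val): string
{
    if (is_array($val)) {
        // Nested documents, lists and Extended JSON wrappers: re-encode as JSON.
        return json_encode($val);
    }
    if (is_bool($val)) {
        return $val ? 'true' : 'false';
    }
    if ($val === null) {
        return 'null';
    }
    return (string) $val;
}

// Made-up sample line; single quotes stop PHP from interpolating $oid.
$line = '{"_id":{"$oid":"5e4d3c2b1a0f9e8d7c6b5a49"},"name":"Ada","active":true}';
foreach (json_decode($line, true) as $key => $val) {
    print $key . ': ' . format_value($val) . "\n";
}
```

When the result goes into a web page, wrap it in htmlspecialchars before printing.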