Thursday 7 May 2015

MongoDB M101N Final Exam Question 7

Question 7:

You have been tasked to cleanup a photo-sharing database. The database consists of two collections, albums, and images. Every image is supposed to be in an album, but there are orphan images that appear in no album. Here are some example documents (not from the collections you will be downloading). 

> db.albums.findOne()
{
    "_id" : 67
    "images" : [
        4745,
        7651,
        15247,
        17517,
        17853,
        20529,
        22640,
        27299,
        27997,
        32930,
        35591,
        48969,
        52901,
        57320,
        96342,
        99705
    ]
}

> db.images.findOne()
{ "_id" : 99705, "height" : 480, "width" : 640, "tags" : [ "dogs", "kittens", "work" ] }




From the above, you can conclude that the image with _id = 99705 is in album 67. It is not an orphan.

Your task is to write a program to remove every image from the images collection that appears in no album. Or put another way, if an image does not appear in at least one album, it's an orphan and should be removed from the images collection.

Download final7.zip from Download Handout link and use mongoimport to import the collections in albums.json and images.json.

When you are done removing the orphan images from the collection, there should be 89,737 documents in the images collection. 


Hint: you might consider creating an index or two or your program will take a long time to run. As as a sanity check, there are 49,887 images that are tagged 'sunrises' before you remove the images.
What are the total number of images with the tag "sunrises" after the removal of orphans?


Options:

  • 47,678 
  • 38,934  
  • 49,932 
  • 45,911 
  • 44,787     
Solution: 

 Answer is 44,787

 Using .net driver for mongodb, you can remove the orphans as below


                var client = new MongoClient();
            var db = client.GetDatabase("exam");

            var images = db.GetCollection<Image>("images");

            List<Album> Albums = await db.GetCollection<Album>("albums").Find("{}").ToListAsync();
            List<Image> Images = await images.Find("{}").ToListAsync();
            int count = 0;
            List<Image> lstOrphanImages = new List<Image>();
            foreach (Image img in Images)
            {
                bool isFound = false;
                foreach (Album album in Albums)
                {
                    if (album.images.Any(x => x == img.Id))
                    {
                        isFound = true;
                        break;
                    }
                }

                if (!isFound)
                {
                    count++;
                    lstOrphanImages.Add(img);
                    await images.DeleteOneAsync(x => x.Id == img.Id);
                }
            }

And then find total number of images with tag "sunrises" after removal of orphans

> db.images.find({tags:'sunrises'}).count();

3 comments:

  1. how about a simpler solution?

    // Collection
    MongoCollection albums = database.getCollection("albums");
    // Images
    MongoCollection imgColl = database.getCollection("images");
    FindIterable images = imgColl.find();
    for (Document image : images) {
    Document album = albums.find(new Document("images", image.get("_id"))).first();
    if (album == null) {
    imgColl.deleteOne(image);
    }
    }

    ReplyDelete
  2. static void RemoveOrphans()
    {
    //Mongo client
    var client = new MongoClient("mongodb://localhost:27017/local");
    var db = client.GetDatabase("local");

    //Collections
    var albumsCol = db.GetCollection("albums");
    var imagesCol = db.GetCollection("images");

    //Distinct image_ids from the array element in albums collection
    var idsFromAlbums = albumsCol
    .Find(d => true)
    .Project(d => d.ElementAt(1))
    .ToList()
    .Select(d => d.Value.AsInt32)
    .Distinct();

    //all ids from the images collection
    var ids = imagesCol
    .Find(d => true)
    .Project(d => d.ElementAt(0))
    .ToList()
    .Select(d => d.Value.AsInt32);

    //orphan ids
    var idsToDelete = ids.Except(idsFromAlbums);

    //delete orphans
    var filter = Builders.Filter.In("_id", idsToDelete);
    imagesCol.DeleteMany(filter);

    }

    ReplyDelete
  3. Hello.

    You also can do it with aggregation:

    db.albums.aggregate([
    { $unwind: "$images" } ,
    { $lookup: { from: "images", localField: "images", foreignField: "_id", as: "images_new"}},
    { $unwind: "$images_new" } ,
    { $group: { _id: "$images_new._id", tags: { $first: "$images_new.tags" } }},
    { $match: { tags: { $in : [ "sunrises" ] } } },
    { $count : "sum"} ])

    ReplyDelete