Question 7:
You have been tasked to cleanup a photo-sharing database. The database consists of two collections, albums, and images. Every image is supposed to be in an album, but there are orphan images that appear in no album. Here are some example documents (not from the collections you will be downloading).
> db.albums.findOne()
{
"_id" : 67
"images" : [
4745,
7651,
15247,
17517,
17853,
20529,
22640,
27299,
27997,
32930,
35591,
48969,
52901,
57320,
96342,
99705
]
}
> db.images.findOne()
{ "_id" : 99705, "height" : 480, "width" : 640, "tags" : [ "dogs", "kittens", "work" ] }
From the above, you can conclude that the image with _id = 99705 is in album 67. It is not an orphan.
Your task is to write a program to remove every image from the images collection that appears in no album. Or put another way, if an image does not appear in at least one album, it's an orphan and should be removed from the images collection.
Download final7.zip from Download Handout link and use mongoimport to import the collections in albums.json and images.json.
When you are done removing the orphan images from the collection, there should be 89,737 documents in the images collection.
Hint: you might consider creating an index or two or your program will take a long time to run. As as a sanity check, there are 49,887 images that are tagged 'sunrises' before you remove the images.
What are the total number of images with the tag "sunrises" after the removal of orphans?
Options:
Answer is 44,787
Using .net driver for mongodb, you can remove the orphans as below
And then find total number of images with tag "sunrises" after removal of orphans
You have been tasked to cleanup a photo-sharing database. The database consists of two collections, albums, and images. Every image is supposed to be in an album, but there are orphan images that appear in no album. Here are some example documents (not from the collections you will be downloading).
> db.albums.findOne()
{
"_id" : 67
"images" : [
4745,
7651,
15247,
17517,
17853,
20529,
22640,
27299,
27997,
32930,
35591,
48969,
52901,
57320,
96342,
99705
]
}
> db.images.findOne()
{ "_id" : 99705, "height" : 480, "width" : 640, "tags" : [ "dogs", "kittens", "work" ] }
From the above, you can conclude that the image with _id = 99705 is in album 67. It is not an orphan.
Your task is to write a program to remove every image from the images collection that appears in no album. Or put another way, if an image does not appear in at least one album, it's an orphan and should be removed from the images collection.
Download final7.zip from Download Handout link and use mongoimport to import the collections in albums.json and images.json.
When you are done removing the orphan images from the collection, there should be 89,737 documents in the images collection.
Hint: you might consider creating an index or two or your program will take a long time to run. As as a sanity check, there are 49,887 images that are tagged 'sunrises' before you remove the images.
What are the total number of images with the tag "sunrises" after the removal of orphans?
Options:
- 47,678
- 38,934
- 49,932
- 45,911
- 44,787
Answer is 44,787
Using .net driver for mongodb, you can remove the orphans as below
var client = new MongoClient();
var db = client.GetDatabase("exam");
var images = db.GetCollection<Image>("images");
List<Album> Albums = await db.GetCollection<Album>("albums").Find("{}").ToListAsync();
List<Image> Images = await images.Find("{}").ToListAsync();
int count = 0;
List<Image> lstOrphanImages = new List<Image>();
foreach (Image img in Images)
{
bool isFound = false;
foreach (Album album in Albums)
{
if (album.images.Any(x => x == img.Id))
{
isFound = true;
break;
}
}
if (!isFound)
{
count++;
lstOrphanImages.Add(img);
await images.DeleteOneAsync(x => x.Id == img.Id);
}
}
var db = client.GetDatabase("exam");
var images = db.GetCollection<Image>("images");
List<Album> Albums = await db.GetCollection<Album>("albums").Find("{}").ToListAsync();
List<Image> Images = await images.Find("{}").ToListAsync();
int count = 0;
List<Image> lstOrphanImages = new List<Image>();
foreach (Image img in Images)
{
bool isFound = false;
foreach (Album album in Albums)
{
if (album.images.Any(x => x == img.Id))
{
isFound = true;
break;
}
}
if (!isFound)
{
count++;
lstOrphanImages.Add(img);
await images.DeleteOneAsync(x => x.Id == img.Id);
}
}
And then find total number of images with tag "sunrises" after removal of orphans
> db.images.find({tags:'sunrises'}).count();
how about a simpler solution?
ReplyDelete// Collection
MongoCollection albums = database.getCollection("albums");
// Images
MongoCollection imgColl = database.getCollection("images");
FindIterable images = imgColl.find();
for (Document image : images) {
Document album = albums.find(new Document("images", image.get("_id"))).first();
if (album == null) {
imgColl.deleteOne(image);
}
}
static void RemoveOrphans()
ReplyDelete{
//Mongo client
var client = new MongoClient("mongodb://localhost:27017/local");
var db = client.GetDatabase("local");
//Collections
var albumsCol = db.GetCollection("albums");
var imagesCol = db.GetCollection("images");
//Distinct image_ids from the array element in albums collection
var idsFromAlbums = albumsCol
.Find(d => true)
.Project(d => d.ElementAt(1))
.ToList()
.Select(d => d.Value.AsInt32)
.Distinct();
//all ids from the images collection
var ids = imagesCol
.Find(d => true)
.Project(d => d.ElementAt(0))
.ToList()
.Select(d => d.Value.AsInt32);
//orphan ids
var idsToDelete = ids.Except(idsFromAlbums);
//delete orphans
var filter = Builders.Filter.In("_id", idsToDelete);
imagesCol.DeleteMany(filter);
}
Hello.
ReplyDeleteYou also can do it with aggregation:
db.albums.aggregate([
{ $unwind: "$images" } ,
{ $lookup: { from: "images", localField: "images", foreignField: "_id", as: "images_new"}},
{ $unwind: "$images_new" } ,
{ $group: { _id: "$images_new._id", tags: { $first: "$images_new.tags" } }},
{ $match: { tags: { $in : [ "sunrises" ] } } },
{ $count : "sum"} ])