MongoDB/Mongoose schema - to embed or not?

I am working on two different projects right now and I encountered the same dilemma in both. I have some documents that probably should be embedded in another document, but in both cases that is causing me a headache.

Project 1 - some sort of discussion forum. I have posts and comments on those posts. I thought it would make sense to embed the comments in the post document, but I ran into problems when I tried getting a list of all the comments made by a user. It probably wouldn’t be a very frequent request (not as frequent as getting all the comments for a post), but it seemed nearly impossible. I ended up de-embedding the comments and doing a .populate('comments') when loading a post. It works, but I am worried that it might eventually become too slow.

Project 2 - The voting app. I embedded the options inside the survey. I really don’t see a case where I would need the options outside of the context of the survey. So embedding would make sense, right? Now I am trying to make a post request to update an option when someone votes. Getting to the embedded option seems overly complicated. Even a request to find the option will return the survey, and then I need to loop through that survey’s options array to get the option. Is that right? Or is there a better way to do this?

Is embedding ever a good idea??

Ok, I actually created my request for the voting app and it wasn’t that bad. I still wish there was a better way to find an embedded document by its id, but I guess that’s not how MongoDB works.
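For what it's worth, once the survey is loaded, picking out one embedded option by id is a single array lookup rather than a hand-written loop. The sketch below uses plain objects and string ids as stand-ins (the `findOption` helper and the sample data are made up for illustration). Mongoose document arrays also offer an `id()` helper for this kind of lookup, e.g. `survey.options.id(optionId)`.

```javascript
// Hypothetical survey shaped like a document with embedded options.
// String ids stand in for the ObjectIds MongoDB would generate.
const survey = {
  _id: 's1',
  title: 'Who is your favourite superhero?',
  options: [
    { _id: 'o1', text: 'Spiderman', count: 0 },
    { _id: 'o2', text: 'Batman', count: 1 }
  ]
};

// Return the embedded option with the given id, or null if absent.
// Comparing as strings also works when the ids are ObjectId instances.
function findOption(surveyDoc, optionId) {
  const match = surveyDoc.options.find(
    (option) => String(option._id) === String(optionId)
  );
  return match || null;
}

console.log(findOption(survey, 'o2').text); // Batman
```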

I am still wondering about the other project. Should I embed comments or not?

I’m having a bit of trouble picturing what you mean :slight_smile: Could you post a code example?

My voting app db entries look like this:

{
    "_id": {
        "$oid": "57beee75e69a4e06686fb7de"
    },
    "poll": "Who is your favourite superhero?",
    "creator": "email@example.com",
    "createdTime": "2016-08-25T13:10:33.113Z",
    "options": [
        "Spiderman",
        "Batman",
        "Black Widow"
    ],
    "voteCount": [
        0,
        1,
        2
    ],
    "voters": [
        "email@example.com",
        "email2@example.com",
        "email3@example.com"
    ]
}

So it’s pretty flat… most of the internal logic relies on array indices and array lengths :slight_smile:

This is a pretty hacky approach, I suspect, and probably doesn’t scale that well…
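With this flat layout, one vote can still be recorded in a single update, because MongoDB's dot notation can address an array element by its index. A sketch, assuming a hypothetical `buildFlatVote` helper; the filter on `voters` is just one possible way to reject a second vote from the same email:

```javascript
// Build the filter and update for one vote against the flat poll document.
function buildFlatVote(pollId, optionIndex, voterEmail) {
  return {
    // Match the poll only if this email has not voted yet.
    filter: { _id: pollId, voters: { $ne: voterEmail } },
    update: {
      // 'voteCount.<index>' targets one element of the voteCount array.
      $inc: { ['voteCount.' + optionIndex]: 1 },
      // Record the voter so the filter above blocks a repeat vote.
      $addToSet: { voters: voterEmail }
    }
  };
}

const flatVote = buildFlatVote('57beee75e69a4e06686fb7de', 1, 'email4@example.com');
console.log(JSON.stringify(flatVote.update));
```

The two objects would then be passed to something like `collection.updateOne(flatVote.filter, flatVote.update)`. If nothing matches, either the poll does not exist or that email has already voted.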

Here is the schema for my voting app. Not as flat as yours… I still don’t have a good grasp of how to flatten things.

var mongoose = require('mongoose');

var SurveySchema = mongoose.Schema({
  title: {
    type: String,
    required: true,
    trim: true,
    minlength: 1
  },
  // Id of the user who created the survey
  _creator: {
    required: true,
    type: mongoose.Schema.Types.ObjectId
  },
  // Embedded options, each carrying its own vote count
  options: [
    {
      text: {
        type: String,
        required: true,
        trim: true,
        minlength: 1
      },
      count: {
        type: Number,
        required: true,
        default: 0
      }
    }
  ],
  contributors: [
    {
      type: mongoose.Schema.Types.ObjectId
    }
  ],
  endDate: {
    type: Date
  }
});

I was actually able to make this work and update the embedded document.
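For anyone landing here with the same question: with the schema above, a vote does not need to load the survey and loop over `options`. A filter that matches the embedded option's `_id`, combined with the positional `$` operator, lets MongoDB increment the right array element in one atomic update. A minimal sketch, with a hypothetical `buildVoteUpdate` helper and made-up ids:

```javascript
// Build the filter and update that add one vote to an embedded option.
function buildVoteUpdate(surveyId, optionId) {
  return {
    // Match the survey and, inside it, the option with this _id.
    filter: { _id: surveyId, 'options._id': optionId },
    // 'options.$' refers to the array element matched by the filter.
    update: { $inc: { 'options.$.count': 1 } }
  };
}

const optionVote = buildVoteUpdate('surveyId123', 'optionId456');
console.log(optionVote.filter, optionVote.update);
```

With Mongoose this would look roughly like `Survey.updateOne(optionVote.filter, optionVote.update)`, assuming a `Survey` model built from the schema above.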

The main problem I have now is with my forum application (actually a code review application). It has code documents, and those documents have comments.

Here is what my embedded schema looks like:

Code = {
  _id: ID object,
  author: ID object,
  tags:[Strings],
  date_submitted: Number,
  date_edited: Number,
  date_commented: Number,
  open_for_review: Boolean,
  text:{ 'The content of this object will depend on the code editor tool we use' },
  comments: [
    {
      _id: ID object,
      author: ID object,
      date_submitted: Number,
      date_edited: Number,
      text:String (or object?),
      is_general: Boolean,
      position:{'Object: required for inline comments. For general comments, should be null'},
    }
  ]
}

The problem I have with this one is when I try to generate a list of all the comments created by a user.

I actually removed the embedding and just have a list of ids in the comments array. It makes my requests simpler, but I am worried that if I end up with a lot of comments, loading a code document with all its comments will be really slow.

Is that clearer?

If I’m getting your question right, you’re looking for a better way to access comments for a particular user.

As you scale your app, this would become tricky with your current model because you would have to search through every “code review” document to find the comments by a given user and then display the results. If you have N documents and each document has M comments, a single lookup would have to examine on the order of N * M comments.
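To make the N * M point concrete, here is a small in-memory sketch of what finding one user's comments involves when comments are embedded. The `commentsByUser` helper and the sample data are made up. A real query would run inside MongoDB, but without an index on `comments.author` it has to look through every document's comments in much the same way.

```javascript
// N = 2 code documents, each with M = 2 embedded comments (sample data).
const codeDocs = [
  { _id: 'c1', comments: [
    { _id: 'm1', author: 'alice', text: 'Nice' },
    { _id: 'm2', author: 'bob', text: 'Rename this' }
  ] },
  { _id: 'c2', comments: [
    { _id: 'm3', author: 'bob', text: 'Off by one' },
    { _id: 'm4', author: 'alice', text: 'Looks good' }
  ] }
];

// Collect every comment by one author, counting how many were inspected.
function commentsByUser(docs, author) {
  let inspected = 0;
  const found = [];
  for (const doc of docs) {
    for (const comment of doc.comments) {
      inspected += 1;
      if (comment.author === author) {
        found.push({ codeId: doc._id, commentId: comment._id });
      }
    }
  }
  return { found, inspected };
}

const result = commentsByUser(codeDocs, 'alice');
console.log(result.found.length, result.inspected); // 2 4
```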

What if instead you saved comments on your user model? This would duplicate the data on the user object, but would save processing time. Say you had an array of comments attached to each user instance and pushed to that array with each new comment. You can store a unique ID or link on each comment so users can click on a comment and go straight to the appropriate thread. So in the POST request to your server, you would have an additional step where the data gets saved to the user. The result is that it would be easier and faster to pull a user’s comments.
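Roughly, the extra step in the POST handler could look like the following. It is only a sketch: `buildCommentWrites`, the `comments` field on the user, and the shape of the summary entry are my assumptions, not something taken from your code.

```javascript
// Build the two writes needed when a comment is posted:
// one on the code document, one duplicating a summary onto the user.
function buildCommentWrites(codeId, comment) {
  return {
    codeWrite: {
      filter: { _id: codeId },
      update: { $push: { comments: comment } }
    },
    userWrite: {
      filter: { _id: comment.author },
      // Store just enough to list the comment and link back to its thread.
      update: {
        $push: {
          comments: { commentId: comment._id, codeId: codeId, text: comment.text }
        }
      }
    }
  };
}

const writes = buildCommentWrites('code1', {
  _id: 'comment1',
  author: 'user1',
  text: 'Consider a guard clause here.'
});
console.log(JSON.stringify(writes.userWrite.update));
```

The trade-off is that edits and deletions now have to touch both copies of the comment.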

One other thing you might look into is Mongoose population, which may be helpful here given the relationships between comments, projects, and users. With “ref” you can, for example, reference the user model from within a “code review” model. Then when you query your DB, you can use the populate method to pull in documents from another collection alongside the ones you queried. It takes a while to get your head wrapped around it (I’m still learning as well), but it’s a pretty cool feature. http://mongoosejs.com/docs/populate.html
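Conceptually, populate takes the ids stored in a field marked with `ref`, fetches the matching documents from the other collection, and swaps them in. The sketch below imitates that in memory with a `Map`; `populateComments` and the sample data are made up. Real Mongoose does this with an extra query, for example `Code.findById(id).populate('comments')`, assuming a `Code` model whose `comments` field holds ObjectIds with `ref: 'Comment'`.

```javascript
// Stand-in for the comments collection, keyed by _id.
const commentsById = new Map([
  ['m1', { _id: 'm1', author: 'alice', text: 'Nice' }],
  ['m2', { _id: 'm2', author: 'bob', text: 'Rename this' }]
]);

// A code document that stores only the ids of its comments.
const codeDoc = { _id: 'c1', comments: ['m1', 'm2'] };

// Return a copy of the document with comment ids replaced by documents.
// In this sketch, ids with no matching comment are dropped.
function populateComments(doc, lookup) {
  const comments = doc.comments
    .map((id) => lookup.get(id))
    .filter((comment) => comment !== undefined);
  return Object.assign({}, doc, { comments: comments });
}

const populated = populateComments(codeDoc, commentsById);
console.log(populated.comments.map((c) => c.text)); // [ 'Nice', 'Rename this' ]
```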

Hope this helps, let me know if you have questions!