I use mongoDB, but I’m open to suggestions if other DB will be favored for this kind of project.
I build an app that has to track prices of many products. There is an obvious ProductSchema that has id, name, image(path) and perhaps a price field. However, how to define the prices field? use an array of objects with key-value with (date: dd/mm/yy, price: xx.xx) ?
What would be the best way to implement this?
The end-goal is to be able to track price changes, compare, analyse and whatever…
Sorry I don’t know mongo. My answer is for relational darabase.
You need a many to many relationship. Product table, date-price table and price-table
Your primary key for product table will be a foreign key in the price-date table. Your price table primary key will be a foreign key in price-date table. The primary key will make date-price table unique. Prices will be unique, and product will be unique, price, product, date will be unique.
You don’t want date and price in the product table. It will create redundant products.
The reason for relational databases is to not have redundancy, eliminating errors.
I probably settle for the next solution, haven’t tested it yet though.
Essentially is to use 2 Schema basically, a child and a parent schema like follows:
const priceSchema = new Schema({
price: { type: Number, required: true },
date: new Date
})
const productSchema = new Schema({
name: { type: String, required: true },
prices: [priceSchema],
url: { type: String, required: true },
dateAdded: new Date
})
Well, something like this, sorry for typos if there are any.
I know that as long as single document doesn’t exceed 16MB it’s fine, and really I don’t see this document growing even close to 16MB. I think this isn’t the best solution and there should be a better way to handle this from within a single defined Schema.
When using mongodb, your welcome to use a more “relational” approach as mentioned above. But unlike a relational database you could use a middle-ground, where you de-normalize the data and save the changes for a given product within the product. (what you suggested initially)
So if your use-case is for say a grocery store, odd’s are most products prices won’t change often, but if the prices of apples changes daily, and you need to keep every single change since the end of time then you can’t go this route and should go the “relational” route. (what @helen1 suggested)
This is definantly where the noSql opens the door for differnet implementations based upon your use-case, just make sure you ask your-self what will happen in the worst possible use-case to make sure you don’t go down the wrong path.
PS. I don’t use mongoose, but I have experience with mongoDB, and relational database design.
In regard to how often do I need to update the price, I’d say once a week, BUT - as you mentioned worst use-case would be update daily. Is that significant enough to use a Relational Database?
If the answer is most likely, then why? Because of the danger that document will grow beyond 16MB (which is not recommended)?
I thought a little more about the use-case and the idea that creating an array of prices within an object is a bad idea, and I’ve come to the conclusion it probably would be fine for the foreseeable future.
But I think we need to expand the scope of the requirements to further evaluate which direction we should take.
For example, if you wanted to know the average price of all items in a shop, it would be easy if it were in its own collection, but you’d need todo a less efficient, and more complex query if you had prices nested within each item. But I don’t know if you even thought about this requirement (it might be one, or never could be one), but its worth considering what and why you need to save all the previous prices, and model the database accordingly to optimally fit most of the use-cases, instead of following some abstract idea of what is “right”.
As such without knowing more of the requirements the “right” choices is more or less up in the air. Unlike SQL we can model the database more or less based upon the use-case, instead of forcing the user-case “into” a format that makes sense for the database
If you had an SQL database you have more or less no choice of how todo this (you have a separate table of all the prices) but since were in NOSQL land, we have a choice.
Perhaps I will put my intentions & goals for the application.
The app tracks prices of user chosen products, and notifies the user if the price has decreased.
The tracking is done via web scraping. So user provides the product’s URL, the app parses the URL and saves the product to the Database.
Next, what has to happen is that the App checks every … say week for the price, and saves it to the Database and if the price is lower than last week - it notifies the user by email.
While it is possible to ONLY save the price to the database when the price is actually lower, I think a future need might arise like for example analyzing product’s pricing over 1 year (idk)…
So with these new requirements a few things stand out to me
Weekly updates
Comparison with the previous week check
Possible future data analyzation.
So the first part is what we knew before, as product prices would be changed somewhat often, but not crazy often, and technically could stay under the 16mb limit for the foreseeable future if you nested it.
But the other two parts are what I’d consider more in deciding factors on how to setup the DB. Namely to do the comparison you only need the previous week value and the current week value (which you get thru web scraping) as such you will probably want to get the last value for a given item as efficiently as possible (IE you dont load every single item into nodejs/memory and run find on it)
I feel like there are more capabilities to have the prices separated into its own schema to get some optimizations to get the last value than if you nested the values.
On-top of this, the third potential requirement (doing some data analyzation) over all the product prices probably wont be performed often, so the performance impact of finding the prices for a given product against all products shouldn’t be an issue, and there are a number of ways to optimize this query, and be used perform simpler queries against all prices for any number of products/product types. (lost of potential here :D)
So to sum it up i’d say just create a different schema and leverage more of mongodb utilities and use a good old fashion lookup to get the prices for your production. The advantages with two schemas out weights the advantages of nesting the data I believe
PS. There is a reason why SQL is still as useful as NOSQL, and remains popular. It’s because it’s structured data still works for most use-cases, and it’s optimized for said use-cases. And so you could of done this in SQL with the same setup and reaped the same benefits. Not saying you should switch over right now, just wanted to give some “food for thought” for the future