Defining Scope, Sources and Structure
Defining Scope
To start almost any project, especially one we are going to oversee and build from beginning to end, we need to define the initial scope. More than that, we need to take the questions we’ve settled on answering and use them to narrow that scope.
The physical world is one of limits and bounds. I, as a physiologically normal human being, have only so much time in a day to work and only so much in the way of personal resources. My computer system is likewise bound by physical reality, time and its own electronic resources. All this is to say that I cannot expect to do something as expansive as “let’s examine the entire industry on a daily level for the last 10 years.”
So even before we write any code (pseudo or otherwise) we need to hammer out what we need and how we are going to go about getting it.
We have three questions to serve as guides to our scope:
- Which competitors sell what feeder sizes?
  - We need competitors (though not all of them, as the feeder industry is somewhat decentralized and heavily predicated on local providers).
  - We need size info, and likely a way of standardizing or normalizing the various sizes, weights and ranges that feeder companies use. What company A calls “small” may not overlap with what company B calls “small.”
- How do prices vary by size, species and packing format?
  - We, of course, need prices (and some feature that combines a normalized size with a price, like a normalized unit price).
  - Given that QCFS is only looking at rats right now, we can narrow in on rats alone; this general framework can work for any species once we have a schema set up.
- Which competitors look premium, budget or bulk-oriented?
  - We’ll need an average cost per unit as a baseline for what counts as “premium” (perhaps 20% above average) or budget/bulk (perhaps 20% below average, or more units per package).
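To make the normalization and tiering ideas above concrete, here is a minimal sketch. It assumes each listing advertises a weight range in grams, treats the midpoint of that range as the normalized size, and uses the tentative 20% band from the last bullet. The `Listing` class, its field names, and the one-listing-per-company simplification are all hypothetical placeholders, not a settled design.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Listing:
    company: str       # hypothetical competitor name
    pack_price: float  # price of the whole package, in dollars
    quantity: int      # rats per package
    min_grams: float   # low end of the advertised weight range
    max_grams: float   # high end of the advertised weight range

    def price_per_gram(self) -> float:
        """Normalized unit price: dollars per gram at the range midpoint."""
        midpoint = (self.min_grams + self.max_grams) / 2
        return self.pack_price / (self.quantity * midpoint)


def classify(listings, band=0.20):
    """Label each company relative to the average price per gram.

    Assumes one listing per company; `band` is the tentative 20% cutoff.
    """
    average = mean(item.price_per_gram() for item in listings)
    tiers = {}
    for item in listings:
        ratio = item.price_per_gram() / average
        if ratio >= 1 + band:
            tiers[item.company] = "premium"
        elif ratio <= 1 - band:
            tiers[item.company] = "budget"
        else:
            tiers[item.company] = "mid"
    return tiers
```

Comparing price per gram rather than price per package means a ten-pack and a fifty-pack land on the same scale, which is what lets a single average serve as the baseline. Whether the midpoint is the right stand-in for a weight range is an open question we can revisit once real data is in hand.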
So with this information in hand, we have a good idea of what we need:
- Between 5 and 10 companies to examine
- Frozen rats as a product
- Weekly or monthly snapshots
- Price and possibly availability tracking
And with this in mind, we have the beginning of a schema forming:
- Company Name
- Website
- Contact Information
- Product Name
- Species
- Size
- Quantity (package size)
- Unit Price
- Some sort of date/timestamp
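One way to pin the draft schema down is as a single SQLite table, sketched below with Python's built-in `sqlite3` module. The table name `price_snapshots`, the column names and types, and the choice of one flat table are assumptions made for illustration; a later pass might well split companies and products into their own tables.

```python
import sqlite3

# Column names mirror the draft field list; names and types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS price_snapshots (
    id            INTEGER PRIMARY KEY,
    company_name  TEXT NOT NULL,
    website       TEXT,
    contact_info  TEXT,
    product_name  TEXT NOT NULL,
    species       TEXT NOT NULL,
    size          TEXT NOT NULL,
    quantity      INTEGER NOT NULL CHECK (quantity > 0),
    unit_price    REAL NOT NULL CHECK (unit_price >= 0),
    captured_at   TEXT NOT NULL  -- ISO 8601 date or timestamp of the snapshot
);
"""

INSERT = """
INSERT INTO price_snapshots
    (company_name, website, contact_info, product_name,
     species, size, quantity, unit_price, captured_at)
VALUES
    (:company_name, :website, :contact_info, :product_name,
     :species, :size, :quantity, :unit_price, :captured_at)
"""


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_snapshot(conn, row):
    """Store one observed listing; `row` is a dict keyed by column name."""
    conn.execute(INSERT, row)
    conn.commit()
```

Keeping `captured_at` on every row is what makes the weekly or monthly snapshots comparable later: the same product simply appears once per capture date.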
Defining Sources
Given how the larger players in the feeder industry operate, there is a small mix of data sources for us to use. We can use competitor product pages (typically direct sales from their own websites), or we could use third-party listings (like Amazon or eBay) in the same way. We could probably also use internet archives to track how prices have changed in the past.
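If “internet archives” here ends up meaning the Internet Archive’s Wayback Machine (an assumption on my part), its public availability endpoint can report the archived snapshot closest to a given date. The sketch below only builds the query and reads the response; the function names are hypothetical, and parsing prices out of the archived page is a separate problem.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, yyyymmdd):
    """Build the availability query for a page near a date (YYYYMMDD)."""
    params = urlencode({"url": page_url, "timestamp": yyyymmdd})
    return f"{WAYBACK_ENDPOINT}?{params}"


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, yyyymmdd):
    """Ask the Wayback Machine for the snapshot closest to the given date."""
    with urlopen(build_query(page_url, yyyymmdd), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup` once per product page and per target month would give a rough list of historical pages to examine, though coverage depends entirely on what happened to be archived.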
It’s easy to focus on publicly accessible sources when they are the main way this industry interacts with its clientele.
So with all this in place, we have enough to start working on things that actually touch code!