Dear all!

I wanted to write a few comments regarding the OpenStack analytics that we published on stackalytics.com and that got quite a bit of attention from the community. 

First and foremost, I want to underscore that the intent of this initiative is to give the community full transparency into who is doing what in OpenStack. We believe that contribution data should be tracked by the community exactly the same way OpenStack itself is developed: in a completely open and transparent way, with clearly defined ground rules for how contributions are measured.

When we announced Stackalytics about 10 days ago (http://www.openstack.org/blog/category/measurement/), we were looking to receive community feedback we could incorporate into the initial code that we are releasing to StackForge later this week. Based on this thread, so far so good! 

So to reiterate, we developed Stackalytics with the following guiding principles in mind:

1. be completely transparent to the community, developed in open source, just like OpenStack itself.

2. be extremely user friendly and able to provide valuable information both to insiders tracking their own statistics and to outsiders looking to understand who the real authors of OpenStack are.

3. focus on a well-defined set of metrics (right now LOC and Commits, with more measurements such as number of reviews still to come), on where the data comes from, and on well-defined ground rules for how these measurements are calculated (e.g. we already seem to have a consensus that auto-generated code should not count toward LOC).

4. set a well-defined process for challenging any metric, one the community can embrace, modeled on the general process used in Gerrit code review (+1, -2, etc.).

5. provide a discussion forum around Stackalytics allowing anyone to raise their issues or concerns.

I believe that with these guiding principles we will very soon develop a uniform statistics engine that truly represents what OpenStack is all about -- an open community effort. The fact that there is already some controversy about it (thank you Josh McKenty for pointing out that the DreamHost contribution of renaming the Quantum project caused their LOC count to soar) means that

a) people have already started to follow Stackalytics
b) we have a way to improve how we measure the data to make it meaningful.
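
To make point b) concrete, here is a minimal sketch in Python of how such ground rules might be applied when tallying LOC. It is not the Stackalytics code: the record types, the is_rename flag, and the list of auto-generated file patterns are all assumptions made for illustration, and the real rules would be whatever the community agrees on.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical patterns for auto-generated files (an assumption, not an agreed list).
GENERATED_PATTERNS = ("*/locale/*.po", "*.min.js")


@dataclass
class FileChange:
    path: str
    added: int
    removed: int
    is_rename: bool = False  # True when the file was only moved or renamed


@dataclass
class Commit:
    company: str
    changes: list = field(default_factory=list)


def is_generated(path):
    """Return True if the path matches any auto-generated pattern."""
    return any(fnmatchcase(path, pattern) for pattern in GENERATED_PATTERNS)


def loc_by_company(commits):
    """Sum added plus removed lines per company, skipping renames and generated files."""
    totals = defaultdict(int)
    for commit in commits:
        for change in commit.changes:
            if change.is_rename or is_generated(change.path):
                continue
            totals[commit.company] += change.added + change.removed
    return dict(totals)


# Made-up sample data: company "A" renames a large file, company "B" updates a translation.
sample_commits = [
    Commit("A", [FileChange("quantum/api.py", 10, 2),
                 FileChange("neutron/api.py", 5000, 5000, is_rename=True)]),
    Commit("B", [FileChange("nova/locale/de/nova.po", 300, 0),
                 FileChange("nova/compute.py", 7, 3)]),
]
sample_totals = loc_by_company(sample_commits)
```

With rules like these, a project-wide rename would no longer inflate one contributor's numbers, and each rule is a single line that anyone can challenge in review.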

As one of the marketing gurus once told me, "no data is reliable until you rely on it."

Thanks for the feedback; I look forward to more.

Alex Freedland
Co-Founder and Chairman
Mirantis, Inc.


On Mon, Jul 8, 2013 at 3:47 PM, Stefano Maffulli <stefano@openstack.org> wrote:
On 07/08/2013 08:24 PM, Joshua McKenty wrote:
> I believe that the OpenStack marketing community sees comparisons to
> other open source cloud frameworks as significant competitive
> positioning. Accuracy in that data would be valuable to the whole community.

If you're arguing that Activity Board should include some data from
other cloud frameworks let's discuss what questions you'd like to see
answered/what raw data. Keep in mind that comparing different projects
is like comparing oranges and apples: cloudstack and openstack are not
comparable. The work that Qingye Jiang does is IMHO valuable when it
highlights trends across different metrics for separate projects but it
opens itself to all sorts of criticism when it creates indexes like the
Activeness Index and when it compares absolute numbers across projects
(for example, the way openstack uses its -dev mailing list is different
from cloudstack's, making the comparison irrelevant; nor can you
compare discussions on gerrit with mlist traffic).

I wouldn't want the Foundation to produce anything like a comparative
analysis for public consumption. IMHO public comparative reports would
create way too much noise and risk distracting our marketing
resources.

For internal reports I'd be open to start tracking some significant
metrics from other projects: let me know which ones you care about and
I'll be happy to work on producing a periodic report for staff and board.

> I *know* that a number of OpenStack member companies use their
> "position" in terms of ATC contributions as a marketing point, and
> having an accurate baseline for those numbers might also be valuable.

All that data is public on the OpenStack Activity Board: data may be
wrong though and if you spot mistakes please let me know so I can
correct them. How companies decide to use public data gathered from
gerrit, git/github etc is their decision to make.

> For example, DreamHost has suddenly become the most substantial
> contributor to Quantum *ever*. :)

I see the smile ... but for the record, your link refers to a report
limited to havana only and counts 'Lines of code' (added/removed? not
clear), which is a very poor metric when quoted out of context. I'm sure
you know that, and I'd expect people who know it not to quote such a
data point out of context.

> As for myself, I often use the count of individual members, corporate
> members, and total committers in sales and marketing materials - and
> I've found a number of discrepancies in the user database that I find
> concerning (duplicate names, etc.).

BI is hard :) At the moment the database of people+affiliation as
cleaned up by Bitergia is what I consider the most reliable produced by
the Foundation. It's built by merging the Foundation db and the lists
included in the git-dm tables and some extra manual cleanup.  I can have
that one published if you think it's needed. You can also look at the
JSON files and the database dumps linked from
http://activity.openstack.org/dash/browser/index.html which are the
results of that processing. We can discuss the technical details on -dev
under the [metrics] topic.

> Solid, official data is valuable for
> everyone - and I think inviting these other projects to join the
> activity board effort, by making it an openstack project itself, could
> be a great way to get there.

Definitely, I have already invited Mirantis to join the current efforts.
I'm waiting to see their code in order to judge if and how it can be
merged with the Activity Board. I definitely like their UI, although it
has fewer dimensions than I need to see.

I always loved the idea of having *one* place for all OpenStack-related
data and I've learned that no matter what I wish, there will always be
somebody with his/her own itch to scratch who decides to create a new
source of data and reports.

/stef
