Leading by Game-Changing Cloud, Big Data and IoT Innovations

Tony Shan


Big Data Characterized

A recent survey by Evans Data Corp. of Santa Cruz, California found that more than one-third of the people surveyed said the overall size of their organization’s data stores would grow by more than 75% over the next year. Nearly 15% indicated that storage requirements would more than double, and 71% of respondents specified that they require advanced processing more than half the time. The findings are in line with the new trend toward real-time event processing for Big Data. Hence I compiled the secondary Vs to further characterize Big Data:

  • Viscosity: Viscosity measures the resistance encountered in the flow of mass data. The resistance comes from friction in integration flow rates, different sources of data origination, and the transformation required to convert the data into information. Efficient messaging systems like Kafka provide strong ordering guarantees in a persistent, high-throughput message queue (see the producer sketch after this list). Streaming technologies like Storm can enable distributed, continuous processing of incoming data in real time. Sophisticated complex event processing (CEP) engines further strengthen the rule-based, event-driven processing of Big Data, with support for standards like PMML.
  • Virality: Virality is the ability of data to be distributed over networks, measured as the speed of dispersion across peer-to-peer networks. Time and the number of crosslinks are vital factors that determine the spreading rate. A content delivery network (CDN) is a large distributed system that serves content to end users with high performance and availability. P2P-assisted streaming technologies are leveraged for online video by vendors like Netflix.
  • Vigilance: Project teams need to be watchful for the traps and pitfalls in Big Data implementations. More than a handful of organizations have deployed Hadoop extensively in an attempt to process data in real time, not realizing that Hadoop was built for batch processing by design. Users must be careful in dealing with both data in motion and data at rest. For example, one can leverage the Lambda architecture to make full use of both batch- and stream-processing methods for massive quantities of data (a minimal serving-layer sketch follows below). Hybrid use of SQL and NoSQL is also advantageous, but be alert to the operational difficulties it brings.
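To make the viscosity point concrete, here is a minimal Kafka producer sketch in Java. The broker address and the "events" topic are hypothetical placeholders, and the configuration is illustrative rather than production-grade.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all waits for the full in-sync replica set, trading some
        // latency for durability of the persisted log.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing a key land in the same partition, which is
            // how Kafka provides its per-partition ordering guarantee.
            producer.send(new ProducerRecord<>("events", "sensor-42", "{\"temp\": 21.5}"));
        }
    }
}
```

Keying related records consistently is what preserves their relative order through the queue, which directly addresses the integration friction described above.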
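And here is a minimal sketch of the Lambda-architecture idea from the vigilance item: a precomputed batch view merged with a real-time speed-layer view at query time. The class and its in-memory maps are hypothetical stand-ins for real batch (e.g., Hadoop) and stream (e.g., Storm) layers.

```java
import java.util.HashMap;
import java.util.Map;

public class LambdaServingLayer {
    // Batch layer output: counts recomputed periodically over the master dataset.
    private final Map<String, Long> batchView = new HashMap<>();
    // Speed layer output: incremental counts for data arriving since the last batch run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    public void onBatchRecompute(Map<String, Long> freshBatchView) {
        batchView.clear();
        batchView.putAll(freshBatchView);
        realtimeView.clear(); // the speed layer only covers the post-batch gap
    }

    public void onStreamEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // Query-time merge: the stable batch result plus the recent real-time delta.
    public long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaServingLayer layer = new LambdaServingLayer();
        layer.onBatchRecompute(Map.of("page-view", 1000L));
        layer.onStreamEvent("page-view");
        System.out.println(layer.query("page-view")); // 1001
    }
}
```

The design choice here is that neither layer alone suffices: the batch layer handles data at rest accurately but slowly, while the speed layer handles data in motion quickly but only for the recent window.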

For more information, please contact Tony Shan (blog@tonyshan.com). ©Tony Shan. All rights reserved.


More Stories By Tony Shan

Tony Shan works as a senior consultant and advisor at a global applications and infrastructure solutions firm, helping clients realize the greatest value from their IT. Shan is a renowned thought leader and technology visionary with many years of field experience and guru-level expertise in cloud computing, Big Data, Hadoop, NoSQL, social, mobile, SOA, BI, technology strategy, IT roadmapping, systems design, architecture engineering, portfolio rationalization, product development, asset management, strategic planning, process standardization, and Web 2.0. He has directed the lifecycle R&D and buildout of large-scale, award-winning distributed systems on diverse platforms at Fortune 100 companies and public-sector organizations, including IBM, Bank of America, Wells Fargo, Cisco, Honeywell, and Abbott.

Shan is an inventive expert with a proven track record of influential innovations such as Cloud Engineering. He has authored dozens of top-notch technical papers on next-generation technologies and over ten books that have won multiple awards. He is a frequent keynote speaker who serves as chair, panelist, advisor, judge, and organizing committee member at prominent conferences and workshops; an editor and editorial advisory board member for IT research journals and books; and a founder of several user groups, forums, and centers of excellence (CoE).