"The evolution of the digital and mobile domains has brought about a new variety of technology companies whose core business is handling large volumes of data - on the scale of Big Data. Such a company may employ only a dozen people, yet serve tens or hundreds of millions of users each day, which poses a genuine Big Data challenge," said Golan Nachum, owner and CEO of Twingo, in a joint interview with Ilya Golman, the company's CTO.
"Talk of Big Data has accelerated greatly in recent years - so much so that today there is hardly a technology provider or integrator that does not benefit from the market buzz around Big Data technologies. For us, however, Big Data is not a new discovery. Over the past four years, we at Twingo have not merely been talking about Big Data – we have been delivering it. This is not a theoretical issue for us - it is day-in, day-out delivery of technology consulting and deployment with companies in the fields of internet, telecom, video, digital advertising, mobile, cyber, gaming, chips and more," Nachum said.
Twingo is a leading consulting and integration provider in the field of Big Data and Business Intelligence (BI). To date, the Company has delivered dozens of complex Big Data projects, primarily for technology companies and start-ups in the digital and mobile domains. Twingo is a distributor of the Vertica database designed for Big Data – providing sales, consulting, integration, administration and training for HP Vertica infrastructure in Israel. The Company also partners in Israel with MicroStrategy, developer of a BI and analytics platform tailored to the Big Data world, and with MapR, developer and distributor of Big Data and analytics solutions based on Apache Hadoop.
Golan, how does the Big Data world look from the viewpoint of new technology companies?
The traditional distinction between enterprise and SMB organizations no longer holds in the new world of data. A company with only a dozen employees can today generate billions of records every day, which poses Big Data challenges – primarily in data collection, retrieval and analysis. The evolution of the digital and mobile domains has brought about a new type of technology company – virtual shops, mobile applications, digital advertising platforms, location services, recommendation engines, customer-focused advertising, social networks and more. These companies are characterized by core operations that both generate data and require its analysis. Data is generated automatically - whether directly by users or by automated monitoring and collection systems. These companies also typically address the global market - serving major global customers, or operating digital platforms used by people around the world. New technology companies may have very few employees, yet they generate enormous volumes of data - from hundreds of millions to tens of billions of records per day - all in very short time frames. They can reach hundreds of millions of users worldwide, and their data collection and analysis challenges often exceed those of large enterprises in the traditional economy.
What are some of the hot Big Data challenges you see today?
"The volume of information generated by new technology companies, and the need to handle it efficiently in order to derive value for the business, is the major challenge companies face today. Beyond sheer volume there is the issue of time: once a record is created, it should reach the data warehouse within a short time frame, from a few seconds to one minute, because it is needed for real-time (or near-real-time) analytics that account not only for the most recent data but for all the historical data, which may run to many petabytes. These systems must be able to load large volumes of data and answer queries at the same time, because in this world of data there is no longer an option to carry out operations during a night-time window, with fewer users on the system. The volume of information, and the need to handle it immediately, denies the organization the night-time processing window we are familiar with from traditional organizations.
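The no-maintenance-window requirement described here - loading data and answering analytical queries at the same time - can be illustrated with a minimal, purely hypothetical sketch. `EventStore`, `load_batch` and `count_since` are invented names standing in for a real column store such as HP Vertica; the only point is that micro-batch loads and reads proceed concurrently.

```python
import threading
import time

# Hypothetical in-memory store standing in for a real analytical database;
# class and method names here are illustrative, not any vendor's API.
class EventStore:
    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def load_batch(self, batch):
        # Micro-batch load: new records become queryable within moments.
        with self._lock:
            self._rows.extend(batch)

    def count_since(self, ts):
        # Analytical query that runs concurrently with loading --
        # there is no idle night-time window in which to do this work.
        with self._lock:
            return sum(1 for r in self._rows if r["ts"] >= ts)

store = EventStore()

def producer():
    # Continuous ingestion of one-record micro-batches.
    for i in range(100):
        store.load_batch([{"ts": i, "user": i % 7}])

def consumer(results):
    # Queries issued while loading is still in progress.
    for _ in range(10):
        results.append(store.count_since(0))
        time.sleep(0.001)

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(store.count_since(0))  # all 100 records loaded while queries ran
```

A real system replaces the lock with concurrency machinery that lets many loaders and many readers proceed at once, but the contract - reads never wait for a batch window - is the same.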
"Another challenge lies in system scalability. Systems must be dynamic, able to grow simply, without the production systems at the organization's core being taken down by growth and data processing. We also see these new organizations needing to query enormous volumes of data and get an answer within a fraction of a second, or at most a few seconds - even for complex analytical queries. A further major challenge is the survivability of Big Data systems: when you serve tens or hundreds of millions of users, a new technology company cannot afford any shut-down or failure of its production systems. The organization must continue business as usual - loading data, running queries and so on. Big Data systems must therefore replicate information across multiple servers and perform automated backups in order to survive any malfunction or shut-down."
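The replication mentioned above can be sketched as a toy model. Nothing here reflects Vertica's or MapR's actual implementation (Vertica's k-safety and MapR's volume replication are far more involved); `Cluster`, `write` and `read` are illustrative names showing why writing each record to more than one node lets reads survive a node failure.

```python
# Toy sketch of record replication across nodes, in the spirit of -- but
# not modeled on -- what Vertica or MapR actually do. All names invented.
class Cluster:
    def __init__(self, n_nodes, replication=2):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.alive = [True] * n_nodes
        self.replication = replication

    def write(self, key, value):
        # Place each record on `replication` distinct, consecutive nodes.
        for i in range(self.replication):
            node = (hash(key) + i) % len(self.nodes)
            self.nodes[node][key] = value

    def read(self, key):
        # Serve the read from any surviving replica.
        for i in range(self.replication):
            node = (hash(key) + i) % len(self.nodes)
            if self.alive[node] and key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)

cluster = Cluster(n_nodes=4, replication=2)
for k in range(20):
    cluster.write(f"rec{k}", k)

cluster.alive[0] = False  # simulate one node failing
values = [cluster.read(f"rec{k}") for k in range(20)]  # still all served
```

With a replication factor of two, any single node can fail and every record remains readable from its surviving copy - which is exactly the "business as usual" property the interview describes.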
Ilya, you are often in the field, deploying Big Data projects for organizations; what do you see as the major challenges for such projects?
It is important to understand that handling information at Big Data volumes is unlike the traditional data handling we have been used to. An organization dealing in Big Data receives, every second, more information than an entire major library holds - and collects such data for days, weeks, months, even years. The approach to information in the Big Data age therefore differs from that of traditional organizations. The key to approaching information in a Big Data project is planning ahead: given the large volume of information, even the simplest action must be planned in advance and carried out according to the plan. The technology selection process accounts for 80% of the project's success.
In addition to planning, you must test every step of the project. In traditional projects, the new system is typically tested at the end of deployment, after development and deployment have been carried out on a small volume of data. In the world of Big Data, this is no longer good enough. From the very outset you need a large data set for testing each stage of development and deployment: whether the data processing system is performing optimally, whether the algorithm developed for processing the information is efficient enough, and whether the data is being handled correctly at every stage.
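The point about testing every stage against a large data set, rather than a toy sample at project end, can be sketched as follows. The synthetic records and the `aggregate_by_user` stage are hypothetical examples, not any specific customer pipeline; the pattern is to generate realistic volume up front and check both correctness and speed of each stage as it is built.

```python
import random
import time

# Generate a large synthetic data set up front, before any stage is
# declared done. Field names and sizes here are invented for illustration.
random.seed(42)
records = [{"user": random.randrange(1000), "bytes": random.randrange(1500)}
           for _ in range(200_000)]

def aggregate_by_user(rows):
    # The pipeline stage under test: per-user traffic totals.
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0) + r["bytes"]
    return totals

start = time.perf_counter()
totals = aggregate_by_user(records)
elapsed = time.perf_counter() - start

# Checks run at this stage, not at project end: the stage must conserve
# the data (correctness) and finish within budget (efficiency).
assert sum(totals.values()) == sum(r["bytes"] for r in records)
print(f"aggregated {len(records)} records in {elapsed:.3f}s")
```

Running a harness like this per stage catches an inefficient algorithm or a lossy transform when it is cheap to fix, rather than after everything has been wired together.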
One of the things we have seen in the field is that organizations that failed to plan the project ahead of time and did not test during the project ended up at the final stage very far from where they were aiming. Another recommendation concerns the organization's need to use experts who are intimately familiar with Big Data technologies and have a track record in the field. Big Data technology is unlike anything we have seen before - in its business requirements as well as its technology requirements - and it demands a response tailored to each challenge and requirement.
Where is Big Data headed going forward?
We see this technology constantly evolving. Today we can collect the information; technology evolution in the near future will allow us to get more out of it. Machine analysis will draw on improved machine learning capabilities, yielding better analysis than is available today. Optimization will improve as well; take the internet, for instance – users will benefit from services better tailored to their needs, and organizations will be able to offer a higher level of service.