Early this April, SenseTime (商汤科技) announced a fresh funding round of $600 million, just nine months after its then record-breaking (in the AI scene) Series B round of $410 million. The latest financing brought in new shareholders including Alibaba, Suning (苏宁) and Singapore’s sovereign fund Temasek.
Recently, XU Li, SenseTime’s CEO, sat down for an exclusive interview with 36Kr, a Chinese biztech media outlet and the parent company of KrASIA. He talked about the company’s future strategies, the competitive landscape in computer vision this year, and what he thinks a world increasingly driven by data will look like.
The following is an English translation of the transcript of the interview. Some parts have been edited by 36Kr and KrASIA for brevity and clarity.
$600 million on two things
Q: The rumour of Alibaba’s investment began last year. When did you actually start discussing the possibility of an investment deal?
A: We had business contact with Alibaba Cloud late last year over its smart city project, and the investment is actually based on the fact that the two of us are strategically complementary. Alibaba Cloud needs our expertise in visual analysis to push ahead with its City Brain project.
Q: Last year saw a couple of computer vision unicorns raise hefty sums. Will there be a shift in investors’ attitude this year?
A: At this stage, working capital has become a crucial factor for AI companies. I think capital will eventually flow to the few companies at the top. This is in fact the case in many industries: the top player rakes in high profits, the second earns much less, and the third is hardly profitable.
Q: How do you plan on spending the money raised?
A: I think you can liken what we are doing now to putting money in the bank. What matters at this stage is setting up barriers to ensure that we can collect higher-than-average interest in the future. We will spend the money on two things: developing our product mix and pooling resources along the value chain.
It’s impossible to cover the entire market with a single product, so what we are offering is a product mix. For example, we may customize different facial recognition gates for office buildings, airports and hotels.
Pooling resources along the value chain is made possible when you have enough capital. And the rest is building the underlying infrastructure. This is what constitutes the core competency of our company. It affects our efficiency in developing and upgrading algorithms and allows us to finish in half an hour what others may take days or weeks to do.
Q: SenseTime has been investing in other companies that it thinks may help promote the application of its technology in vertical areas. What has it invested in so far?
A: We have invested in about six or seven companies, all of which operate in areas our technology can be applied in. Some are our clients. By buying their shares, we are able to collaborate with them more efficiently, and they all have their own products, which can become stronger when combined with our technology. For example, in the field of AR and VR, we invested in 51VR, and in the area of security, we joined hands with Terminus Technologies.
Q: What targets does the company have for this year?
A: We need to transform our operating model. SenseTime started out selling algorithms and later shifted to selling SDKs because our algorithms were not mature enough then and SDKs can be sold at scale.
Then we ran into a new problem: While it’s a viable business model to sell software licenses in countries like Japan, Chinese enterprises are not used to spending too much on software.
They don’t think it’s worth paying tens of millions of yuan for a software package. More often than not, they bargain with suppliers and ask them to cut the price, say, from 60 million yuan to 50 million yuan. They may demand heavier discounts in the second year on the grounds that the suppliers seem to have offered little service during the first year. Chinese clients feel more comfortable spending money on physical equipment.
Therefore, faced with the prospect of a profit squeeze, this year we are focusing on bringing our upstream and downstream technology to the next level so as to strengthen our competitiveness before it’s too late. We are looking for some investment or merger opportunities in the downstream segments of the value chain to boost our strength. We are not content with providing services in certain areas alone.
Q: What will be different after this transformation? Can you give us an example?
A: For example, in the field of security, we may sell just SDKs. We could also work together with system integrators and downstream partners to provide comprehensive industry solutions. For clients, such products are more acceptable. We can build a business ecosystem of our own if we pick the right partners and expand our presence along the value chain. Putting the focus on building a business ecosystem at the early stage pays off. It’s the best decision we’ve made since we went commercial.
Reason for starting with the underlying technology
Q: You once said that SenseTime is one of the few companies controlling original AI technology, and that in the long run, data and algorithms follow Moore’s law. Can you expand on that?
A: Back when I was a college student, I wrote on my resume “proficient with operating systems” and was asked to explain during an interview. I said that I could use every command with ease. The interviewer burst out laughing and told me it should be “proficient with using operating systems.”
From PC (Linux) to mobile phones (Android) to AI, the source code for operating systems has long been controlled by companies like Google and Microsoft, which are building their own ecosystems. It’s true that using their trained and proven systems can save you investment in hardware and infrastructure, but there is little you can do if their algorithms turn out not to be good enough or deviate from where you are going. This is the problem with everyone using the same system. That’s why people say data is the most important thing.
The core competency of AI companies should be a better “brain” (operating system). That’s why SenseTime has kept investing to build training platforms. It’s a process of setting up barriers to prepare itself for when the market turns from a blue ocean to a red ocean. There is only one way to stay original: make sure that your pace of innovation is faster than others’. You’ll miss the opportunity if you wait.
Q: What advantage does having your own underlying technology bring to your company with respect to user experience?
A: Take phone cameras for example. Regular dual-camera algorithms are particularly demanding on image registration. Many manufacturers have therefore designed a holder for the two cameras, which costs a few dollars. But if you develop your own underlying technology, you can conduct end-to-end training from the beginning, which removes the need for holders. It’s like how human eyes work. They are actually not that demanding. When the cost of your products drops, users’ experience improves and your competitiveness increases.
Q: How much has SenseTime invested in its supercomputer center?
A: Our supercomputer center has over 8,000 GPUs now. We invest hundreds of millions of yuan in it every year.
Q: Why do you think it’s necessary to invest so heavily in the supercomputer center?
A: If we used others’ “brains,” for example Google’s TensorFlow, and trained algorithms with our own data, we wouldn’t need to invest so much in the supercomputer center.
But to develop our own training platform and “brain,” we must conduct numerous tests and use massive amounts of data to validate our models. That’s what makes our investment in computing power necessary. Outsiders may be impressed by the size of our early-stage investment, but that’s exactly what has enabled us to stay ahead of others.