We Asked Google's Mr. Tokuo and Developer Steve Chen

"Tokyo Tower height", "Roppongi to Shibuya", "352 x 218": Google has long offered not only keyword search but also a variety of functions through the search box. It has also adopted machine learning, adding new features and improving accuracy in areas such as photos and translation.

The Japanese version of "Google Assistant", which has been rolling out gradually since the 29th, brings together all the technologies used across Google's services. We interviewed Google Assistant developer Steve Chen and Mr. Yuto Tokuo, head of Google's Product Development Division, who appeared on stage in Japan.

――The Japanese version has finally been announced, but first, how is Google Assistant being used in the United States?

Mr. Chen: People send messages, make phone calls, look up information, and so on. Because it can be used immediately just by speaking, it serves many purposes and is used often.


――What were the most difficult issues during development?

Mr. Chen: There were a lot of difficulties (thinks for a moment). Google Assistant makes it easy to use Google's functions, such as voice recognition, speech synthesis, route guidance, and message sending. That meant we needed to work with other teams.

Technically, it was difficult to accurately understand users' questions. A simple one such as "tomorrow's weather" is easy to handle. But with more detailed questions, how accurately can we understand what the user really intends? This was a big challenge.

And even once you understand the user's intent well, how do you respond? Take the Assistant's personality, for example. If it takes a "machine-like" attitude, won't it seem haughty or cold? If it goes the opposite way, will people think, "Is this really reliable?" Creating a personality that is dependable yet fun to talk to was one of the challenges.

――Has a personality been created for other products as well?

Mr. Chen: In fact, when we created Google Assistant's personality this time, we had it supervised by Ryan Germick, who works on Google Doodles (the playful logo illustrations, also seen in Japan). Doodles introduce famous people and events through illustrations, and for users I think they serve as "Google's face" and "protect Google's personality." Ryan is a really interesting person, and I think Google Assistant's personality is a very good character because someone like him supervised it.

――So that means Google itself is speaking through Google Assistant.

Mr. Chen: Yes, that's right. It is the personality Google would have if Google itself were a person.

――Apple gave its assistant the name Siri, and Microsoft Office had a dolphin, but here it is Google itself.

Mr. Chen: That's right. Some other companies' services have nicknames, but the Google Assistant character is Google itself, which has evolved through search up to now.

――Why did you make it voice-controlled? What is the importance of voice?

Mr. Chen: That's a difficult question... I'll share my own thoughts.

For example, as a way of operating a computer, the graphical user interface (GUI), which has been the mainstream so far, lets you access a wide range of information very quickly.

Voice interfaces, on the other hand, are bound by the speed of speech, and there is a limit to how much can be conveyed one item at a time.

But the voice interface can also do something truly magical. It is very easy to change the subject in a single sentence; you get random access. For example, while talking about voice input, you can suddenly switch to talking about a vacation trip.

In a GUI, the number of buttons that can be displayed at one time is limited. In other words, the GUI is designed for people to reach information step by step. An easy-to-understand example is the directory-type (hierarchical) services of the past: to get to the information you really want, you have to tap many times.

――I see.

Mr. Chen: With search, you can get there in one shot by entering the words. This comes back to the challenge of "accurately understanding the user's intent," but it is also what makes searching by voice worthwhile.

If you are looking for a hotel, you can just say "Grand Hyatt Tokyo on TripAdvisor." With a GUI, you look for a lodging search app in the app list, tap its icon, search for the hotel name, and so on. I think this is the power of the voice interface.

――It was said that you would cooperate with KDDI at the au summer model presentation on the 30th. How does that differ from other third parties creating Google Assistant-compatible apps?

Mr. Chen: In the United States, we are working on Actions on Google, which allows third parties to develop apps. The details of the announcement with KDDI on the 30th are for the two companies to work out, but KDDI may get access to Actions on Google before its release and give us feedback. I think the IoT service au HOME is very unique, so we may gain insights that can be reflected in our platform.

――I see. Then please tell us about the future of Google Assistant itself. In the short term, say over the next six months or so, is there anything you want to achieve?

Mr. Chen: Six months...? First of all, offering it in more languages. Our CEO, Sundar Pichai, comes from outside the United States and has a global perspective; he believes Google needs to provide services that everyone around the world can use. The Japanese version has arrived, but Google Assistant's quality is still higher in English. I want to bring other languages up to the same quality.

――Thank you.

――How have you been involved with Google Assistant this time?

Mr. Tokuo: I usually work on improving everything related to search in Japan, so I have been quite involved with Google Assistant.

――What is the essence of Google Assistant from your perspective, Mr. Tokuo?

Mr. Tokuo: I have worked on search all along, and I think of Google Assistant as an extension of search. To fully understand the user's intent and return proper results, simply showing website search results is not enough. Besides looking up information, there are times when you want to get something done, such as making a call or a reservation. I think Google Assistant is one form of natural evolution from search.

To realize such functions, it is necessary to use not only search but all Google products, such as Maps and YouTube. Even that is not enough, so it will also be combined with other companies' services. I think the value of Google Assistant is being an "easy-to-talk-to entrance" through which you can do everything.

――When providing the Japanese version, how did you build the mechanism for interpreting the intent of Japanese sentences? Was it built from scratch?

Mr. Tokuo: It is based on search, so it was not built from scratch, but some things do not work well depending on the language, and we had to deal with those. In English, the breaks between words and sentences are easy to identify, but Japanese, as in search, has no spaces between words, so natural language processing faces problems that English does not.

There is also the omission of subjects. For example, the expression "was tired today" could mean "(I) was tired today" or "(that person) was tired today." We haven't solved this yet, but I think it can be solved someday; it is a challenge that English does not have.

――So the natural language processing was not developed from scratch.

Mr. Tokuo: There are things we can reuse from search, so it was not entirely from scratch. That said, Google Assistant is built on top of search, which made things complicated, so we had to take care not to break it while looking for room to extend it and improve quality.

It is inefficient for a company like Google to develop separately for each language, so in that sense there is little that starts from zero.


――At the press conference on the 29th, you said that Google has accumulated experience in search and voice recognition. Please tell us a little more about search.

Mr. Tokuo: It comes down to understanding intent. In search, for example, even for the single concept of height, users search for "former President Obama's height" or "height of Tokyo Tower." Sometimes they search their photos for a picture of their family with a statue. Even if the setting is different, I think what we have cultivated in search is useful for understanding intent.

――Do you change how intent is interpreted to match the user's personal attributes or the device being used? Is there a mechanism to identify them?

Mr. Tokuo: It would be quite difficult to identify a single person from the content of their various searches. However, Google Home has a mechanism that can tell family members apart by voice. Search also has a mechanism that shows pages you visit often at the top.

By the way, 15% of daily search queries have never been searched before, so we are taking on the challenge of showing proper results for such queries too.

――At the presentation you said, "This is only the first step; it is not complete." Does that include responding to such new queries?

Mr. Tokuo: Yes, that's right. With voice input, the way people ask questions changes a little. In the search box it was "Obama" and "height," but with Google Assistant it becomes a more natural sentence. The know-how cultivated in search will not carry over 100%, but we will learn what users ask of an assistant.

Behind our results in voice is machine learning, such as neural networks and deep learning. We have made remarkable progress in machine learning and voice recognition.

In the past, voice recognition accuracy was poor in noisy environments. Yet humans can hear well even at a party. The idea was that what humans can do might also work with machine learning, and over the past year or two it has. It works even when used at a train station or in a car.

For Google Home, the team said, "At first we were thinking of fitting eight microphones, because you don't know where a person's voice will come from. But humans hear with two ears. By using machine learning well, recognition got better and better." The accuracy of voice recognition has improved significantly.

――Besides smartphones, Google Assistant is coming to various devices. But some would say, isn't a smartphone enough?

Mr. Tokuo: It has not launched in Japan yet, but consider how it would be used at home. There are several people at home, such as family members. And even though each has their own smartphone, they don't always keep it at hand. If Google Home is placed in the living room, you can operate it just by speaking, so it will be convenient.

――How does the experience differ between putting an Android tablet in the living room and using Google Assistant on it, and using Google Home? Of course, the hardware configuration, such as microphones and speakers, is different, so that alone may provide a different experience...

Mr. Tokuo: We want Google Assistant to run on everything, so if we reach the ultimate goal, the hardware may end up being the main difference. And that difference in hardware cannot be ignored. I think Google Home is overwhelmingly better at picking up sound.

――How much does it cost (laughs)?

PR representative: It's $129 in the United States (laughs).

――What are Google Assistant's challenges and weaknesses at this stage?

Mr. Tokuo: The company's working language is English, so the quality of the English version tends to be higher. That said, I am not an engineer who only fixes Japanese; even working in Japan, I work on a global product. We don't have staff dedicated to localization, so I want us to do this well.

――Speaking of which, in what way is the quality of the English version higher...?

Mr. Tokuo: In terms of the questions it anticipates and the results it returns. The English version can return more optimal results.

Not only in the Japanese version but in Google Assistant as a whole, there is a lot left to do. Fundamentally, Google itself is trying to take a step from search to an assistant, so the question is how to make users feel the value and how to push development forward.

――When it comes to using it, there is the question, "Will people really use it by voice?"

Mr. Tokuo: Use of voice input itself is increasing. Of course, I don't think typing or text input will go away. But there are situations where voice input is necessary when using Google Assistant, such as in a car, so there is no hesitation about providing voice input. By the way, it was just announced at Google I/O that Google Assistant will support text input, and the Japanese version supports text input from the start.

――I see.

Mr. Tokuo: Speaking of which, children really do use voice input.

――What about seniors? I often hear from seniors that voice is easier than text input.

――Do you think it will be used more in Japan than in other countries in the future? The United States being an exception...

Mr. Tokuo: We have only just released the Japanese version, so honestly I don't know, but Japan is the most mobile-oriented country, so I think something will emerge.

――In that sense, expectations for Actions on Google are high.

Mr. Tokuo: Yes, it is very important. Without third-party services, it won't be an ideal assistant.

――What genres do you have expectations for?

Mr. Tokuo: I don't know which categories are popular in the United States, but looking back at Google's services, the areas where Japan has been ahead are mobile, route search, and so on. There may be needs in those areas, but I don't know yet. The senior perspective may be interesting, since Japan is an aging country.

――By the way, if someone develops a Google Assistant-compatible service, will it automatically be usable in other languages?

Mr. Tokuo: That may depend on how much effort the developer puts in.

――So it will not be automatically translated without being asked.

Mr. Tokuo: We might respond if there is strong demand, but automatic translation is not perfect...

――Among other companies' AI-based assistant services, LINE explains that its strength in Japan comes from being a company rooted in Japan. With the know-how you have cultivated in search, I get the impression you won't lose out.

Mr. Tokuo: I hope we don't lose. I think it's good that the field is getting lively, and I think LINE is a great company. I'm looking forward to it as a user, too.

――Are speaker-type devices really convenient?

Mr. Tokuo: It is convenient to have one. If it gets better, it will be really convenient.

――I don't know when it will be “if it gets better” ...

Mr. Tokuo: Looking at the evolution of image recognition and voice recognition, machine learning has not yet plateaued. Computers have also changed dramatically, from desktops to smartphones. I don't know how far it will evolve in six months, but I think it will be tremendously interesting in a few years.

――Thank you very much.

