<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Cloud Blog</title><link>https://cloud.google.com/blog/</link><description>Cloud Blog</description><atom:link href="https://cloudblog.withgoogle.com/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 03 Sep 2019 18:00:00 -0000</lastBuildDate><image><url>https://gweb-cloudblog-publish.appspot.com/static/blog/images/google.a51985becaa6.png</url><title>Cloud Blog</title><link>https://cloud.google.com/blog/</link></image><item><title>Want to keep your employees productive? Pay attention to shadow IT clues</title><link>https://cloud.google.com/blog/products/productivity-collaboration/want-to-keep-your-employees-productive-pay-attention-to-shadow-it-clues/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Employees use tools at their disposal to get work done, but if these tools (often legacy) hamper collaboration or are inflexible, they’ll turn to less secure options for the sake of convenience. According to <a href="https://www.gartner.com/smarterwithgartner/top-10-security-predictions-2016/">Gartner</a>, a third of successful attacks experienced by enterprises will come from Shadow IT usage by 2020. </p><p>And this problem is not unknown. Eighty-three percent of IT professionals <a href="https://www.prnewswire.com/news-releases/shadow-it---cloud-usage-a-growing-challenge-for-cios-575359961.html">reported</a> that employees stored company data in unsanctioned cloud services, a challenge especially apparent with file sync and share tools. When people work around their legacy systems to use tools like Google Drive, it’s often because they find their current systems to be clunky or that they can’t collaborate with others as easily. 
They’re unable to do three key things in legacy file sync and share systems (like Microsoft SharePoint):</p><ol><li><b>Unable to work on their phones.</b> By now, people expect to be able to work on the go—and this means not just opening an attachment, but actually making edits to and comments on work. It gives them freedom to work when it’s convenient for them and to help teammates anytime. </li><li><b>Unable to create workspaces independently and easily.</b> This might sound counterintuitive, but if an employee needs to contact IT to have a new project folder made on a drive, the bar is too high. Employees need to be able to quickly, and independently, create documents that can be shared simply because of the changing nature of collaboration. Work happens ad-hoc, on the go (like we mentioned above), and with people inside and outside of your organization. If someone has to contact IT to create a new folder, they’re more likely to neglect the request or use a different tool altogether to get started. </li><li><b>Unable to make the data work for them.</b> Traditional file storage is just that, storage. Like an attic, we store things in these systems, but at some point stuff gets stale and it’s hard to tell what we should keep or pitch. People need their storage systems to not only house their data, but to help them categorize and find information quicker so that they can make this data work better for them.</li></ol><p>The way I see it, you have two choices when it comes to making a decision on file sync and share systems:</p><p><i>Option 1:</i> Continue to let your employees work on unsanctioned products, some of which may open your business up to unintended security issues (and, in some instances, scary terms of service).</p><p><i>Option 2:</i> Buy the tools that your users want to use because these tools are making them more productive.</p><p>If you want to create a more productive workforce, take cues from your employees. 
Your tools should not only meet the highest security standards for IT, but let people work the way they want to (and be intelligent enough to guide them along the way). Imagine if your technology could flag that a file contains confidential information before an employee accidentally shares it. Or surface files as they’re needed to help people work faster. <a href="https://inthecloud.withgoogle.com/drive/replace-sharepoint.html?utm_source=cloudblog&amp;utm_medium=drive&amp;utm_campaign=replacingsharepoint">Google Drive does this</a>.</p><p>Remember, if the technology doesn’t suit your employees, they’re just going to work around it anyway. Instead of investing time and resources on routine maintenance, shift this energy toward helping your employees stay productive in ways that work for both you and them.</p></div></div></body></html></description><pubDate>Tue, 03 Sep 2019 18:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/productivity-collaboration/want-to-keep-your-employees-productive-pay-attention-to-shadow-it-clues/</guid><category>Perspectives</category><category>Drive</category><category>Inside Google Cloud</category><category>Productivity & Collaboration</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/shadow_IT.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Want to keep your employees productive? 
Pay attention to shadow IT clues</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/shadow_IT.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/productivity-collaboration/want-to-keep-your-employees-productive-pay-attention-to-shadow-it-clues/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Diane Chaleff</name><title>Product Manager, Office of the CTO</title><department></department><company></company></author></item><item><title>Last month today: August on GCP</title><link>https://cloud.google.com/blog/products/gcp/last-month-today-august-2019-on-gcp/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Last month on the <a href="https://cloud.google.com/">Google Cloud Platform</a> (GCP) blog, we dove into hardware, software, and the humans who make technology work. Here’s what topped our charts in August.</p><p><b>Exploring the nuts and bolts of cloud</b></p><ul><li><p>Google already uses AMD’s EPYC processors for internal workloads, and last month we announced <a href="https://cloud.google.com/blog/products/compute/amd-epyc-processors-come-to-google-and-to-google-cloud">that they’re coming to the data centers that power Google Cloud</a> products. Second-gen AMD EPYC processors will soon power our new virtual machines—the largest general-purpose VMs we’ve ever offered. There will be a range of sizes for these AMD VMs so you can choose accordingly, and can also configure them as custom machine types. Improvements like these can help you get more performance for the price for your workloads. </p></li></ul><ul><li><p>One small button can make it easy for other developers to deploy your app to GCP using Cloud Run, our managed compute platform that lets you deploy containerized serverless apps. 
You can add the <a href="https://cloud.google.com/blog/products/serverless/introducing-cloud-run-button-click-to-deploy-your-git-repos-to-google-cloud">new Cloud Run Button</a> to any source code repository that has a dockerfile or that can be built with Cloud Native Buildpacks. One click will package the app source code as a container image, push it to Google Container Registry, then deploy it on Cloud Run. </p></li></ul><p><b>Looking at the human side of technology</b></p><ul><li><p>This <a href="https://cloud.google.com/blog/topics/hybrid-cloud/a-cios-guide-to-the-cloud-hybrid-and-human-solutions-to-avoid-trade-offs">blog post offered a look at the tradeoffs that CIOs and CTOs</a> have to make in their pursuit of business acceleration in a hybrid world, based on recent McKinsey research. While digital transformation and new tech capabilities are in high demand, leaders can avoid making tradeoffs by choosing technology wisely and making necessary operational changes too, including fostering a change mindset. There are tips here on embracing a DevOps model, using a flexible hybrid cloud model, and adopting open-source architectures to avoid common pitfalls.</p></li><li><p>This year’s <a href="https://cloud.google.com/blog/products/devops-sre/the-2019-accelerate-state-of-devops-elite-performance-productivity-and-scaling">Accelerate State of DevOps Report is available now</a>, and offers a look at the latest in DevOps, with tips for organizations at all stages of DevOps maturity. This year, data shows that the percentage of elite performers is at its highest ever, and that these elite performers are more likely to use cloud. The report found that most cloud users still aren’t getting all of its benefits, though. 
DevOps should be a team effort, too, with both organizational and team-level efforts important for success.</p></li></ul><p><b>How customers are developing with cloud</b></p><ul><li><p>Google Cloud customers are pushing innovation further to serve customers in lots of interesting ways. First up this month is <a href="https://cloud.google.com/blog/topics/customers/macys-uses-google-cloud-to-streamline-retail-operations">Macy’s, which uses Google Cloud</a> to help provide customers with great online and in-person experiences. The company is streamlining retail operations across its network with cloud, and uses GCP’s data warehousing and analytics to optimize all kinds of merchandise tasks at its new distribution center.</p></li><li><p>We also heard this month from <a href="https://cloud.google.com/blog/products/ai-machine-learning/itau-unibanco-how-we-built-a-cicd-pipeline-for-machine-learning-with-online-training-in-kubeflow">Itau Unibanco of Brazil, which developed a digital customer service tool</a> to offer instant help to bank users. They use Google Cloud to build a Kubeflow-based CI/CD pipeline to deploy machine learning models and serve customers quickly and accurately. The post offers a look at their architecture and offers tips for replicating the pipeline.</p></li></ul><ul><li><p>Last but not least, check out this story on <a href="https://cloud.google.com/blog/products/maps-platform/how-two-developers-reached-new-heights-with-google-maps-platform">how web developers are using Google Maps Platform and custom Street View imagery</a> to offer virtual tours to the top of Zugspitze, the tallest mountain in Germany. Along with exploring APIs and deciding how to use the technology, the developers took a ton of 360° photos while hiking up and down parts of the 10,000-foot mountain. Take the tour yourself <a href="https://zugspitze360.com/">on their site</a>.</p></li></ul><p>That’s a wrap for August! 
Stay tuned <a href="https://cloud.google.com/blog/">on the blog</a> for all the latest.</p></div></div></body></html></description><pubDate>Tue, 03 Sep 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/gcp/last-month-today-august-2019-on-gcp/</guid><category>Google Cloud Platform</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Last_Month_Today_Aug19.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Last month today: August on GCP</title><description>Here are some of the top GCP stories that appeared on the Cloud blog in August.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Last_Month_Today_Aug19.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/gcp/last-month-today-august-2019-on-gcp/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>The Google Cloud blog team </name><title></title><department></department><company></company></author></item><item><title>Build a dev workflow with Cloud Code on a Pixelbook</title><link>https://cloud.google.com/blog/products/application-development/build-a-dev-workflow-with-cloud-code-on-a-pixelbook/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Can you use a Pixelbook for serious software development? Do you want a workflow that is simple, doesn’t slow you down, and is portable to other platforms? And do you need support for Google Cloud Platform SDK, Kubernetes and Docker? 
I switched to a Pixelbook for development, and I love it!</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Cloud Code.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_Code_tOGSx5R.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Pixelbooks are slim, light, ergonomic, and perform well. Chrome OS is simple to use and brings many advantages over traditional operating systems: </p><ul><li>frictionless updates</li><li>enhanced security</li><li>extended battery life</li></ul><p>And the most compelling feature for me: almost <b>instant wake from sleep</b>. This is great when hopping between meetings and working on the road. </p><p>A little about me: I’m a <a href="https://medium.com/google-cloud/developer-programs-engineer-say-what-b12829729693">Developer Programs Engineer</a>. I work on Google Cloud and contribute to many open source projects. My day-to-day tasks are repeatable: working with GitHub, building, debugging, deploying, and observing. Running and testing code on multiple platforms also matters to me. The workflow below, built on a Pixelbook, satisfies all of the following:</p><ul><li>A simple, repeatable development workflow with an emphasis on developer productivity</li><li>Portability to other platforms (Linux, MacOS, Windows)—“create once, use everywhere”</li><li>Support for the Google Cloud Platform SDK, GitHub, Kubernetes and Docker.</li></ul><p>Let’s dive into how you can set up a development environment on a Pixelbook that meets all those requirements using <a href="https://cloud.google.com/code/docs/vscode/quickstart">Cloud Code for Visual Studio Code</a>, remote extensions, and several other handy tools. 
If you are new to the world of Chromebooks and switching from a PC, check out <a href="https://cloud.google.com/blog/products/chrome-enterprise/how-to-use-a-chromebook-if-youve-switched-from-a-pc">this post</a> to get started.</p><h2>Step 1: Enable Linux apps on Pixelbook</h2><p>Linux for Chromebooks (aka <a href="https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#Crostini">Crostini</a>) is a project that lets developers do everything they need locally on a Chromebook, with an emphasis on web and Android app development. In short, it adds Linux support to Chrome OS. </p><p>On your Pixelbook:</p>1. Go to Settings (chrome://settings) in the built-in Chrome browser.<br/>2. Scroll down to the “Linux (Beta)” section (see screenshot below).<br/></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Enable Linux apps on Pixelbook.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Enable_Linux_apps_on_Pixelbook1.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text">3. Click “<b>Turn on</b>” and follow the prompts. Setup may take up to 10 minutes depending on your Wi-Fi connection.<br/>4. At the end, a new Terminal window should automatically open to a shell within the container. We’re all set to continue to the next step: installing developer tools!<p><br/></p><p>Pin the terminal window to your program bar for convenience.</p><p><b>Configure the Pixelbook keyboard to respect Function keys<br/></b>Folks coming from Windows or MacOS backgrounds are used to Function keys for development productivity. On Chrome OS, they are replaced by default with a set of shortcuts. </p><p>However, we can bring them back:</p><p>Navigate to chrome://settings. Pick “Device” in the left menu, then pick “Keyboard”. 
Toggle “treat top-row keys as function keys”:</p><p></p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Configure Pixelbook keyboard to respect Function key.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/pixelbook1.1000064220000303.max-1000x1000.jpg"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><h2>Step 2: Install development tools</h2><p>For Kubernetes development on GCP, we need to install tools like Docker, Google Cloud SDK and kubectl. Pixelbook Linux is Debian Stretch, so we will install prerequisites for docker and gcloud using instructions for Debian Stretch distribution.</p><p><b>Install and configure Google Cloud SDK (gcloud):<br/></b>Run these commands from <a href="https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu">gcloud Debian quickstart</a> to install gcloud sdk:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Troubleshooting<br/></b>You might run into this error:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Your keyrings are out of date. 
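The install and keyring-refresh commands for this section can be sketched as follows. This is a hedged reconstruction based on the 2019-era Debian Stretch quickstart, not the post's exact commands; verify against the current gcloud documentation before running. The privileged steps are left commented so the sketch is safe to dry-run.

```shell
# Assumption: Debian Stretch release alias for the Cloud SDK apt repo.
CLOUD_SDK_REPO="cloud-sdk-stretch"
REPO_LINE="deb http://packages.cloud.google.com/apt ${CLOUD_SDK_REPO} main"
echo "${REPO_LINE}"
# Privileged steps (run manually after reviewing the current docs):
# echo "${REPO_LINE}" | sudo tee /etc/apt/sources.list.d/google-cloud-sdk.list
# curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# sudo apt-get update && sudo apt-get install google-cloud-sdk
```

Re-importing the public package key (the `apt-key add` step) is also the usual fix when apt reports an out-of-date keyring.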
Run the following commands and try the Cloud SDK commands again:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Add gcloud to PATH</b></p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Installing Docker CE for Linux:<br/></b>Follow these <a href="https://docs.docker.com/install/linux/docker-ce/debian">instructions</a>.</p><p>And then add your user to the docker group:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><i><b>NOTE:</b> This allows running docker commands without sudo.</i></p><p><b>Install kubectl</b></p></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Installing Visual Studio Code</b></p><p>Go to <a href="https://code.visualstudio.com/docs/setup/linux">VSCode linux install instructions page</a>.</p><ol><li><p>Download the<a href="https://go.microsoft.com/fwlink/?LinkID=760868">.deb package (64bit)</a> from the link on the page.</p></li><li><p>After the download is complete, install the deb file using “Install app 
with Linux (beta)”:</p></li></ol></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Installing Visual Studio Code.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Installing_Visual_Studio_Code.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Troubleshooting<br/></b>If you don’t see “Install with Linux” as an option for the deb file, double check that you switched to the beta channel.</p><p>Now let’s install a few extensions that I find helpful when working on a remote container using VS Code:</p><ul><li><p><a href="https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-docker">Docker</a> - managing docker images, autocompletion for docker files, and more.</p></li></ul><ul><li><p><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers">Remote Containers</a> - use a docker container as a full-featured development environment. 
</p></li></ul><p>These two, along with Cloud Code, are key extensions in our solution.</p><h2>Step 3: Configuring GitHub access</h2><p><b>Configure GitHub with an SSH key</b></p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Now copy and paste the key into <a href="https://github.com/settings/keys">GitHub</a>.</p><p><i><b>NOTE:</b> If you hit a permissions error when running ssh-add, run <b>sudo chown $USER .ssh</b> and repeat the GitHub setup steps.</i></p><p>Set your GitHub username and email:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><h2>Step 4: Remote development</h2><p>Now that we have the tools installed and GitHub access configured, let’s configure our development workflow. To create a solution that is portable to other platforms, we will use the Remote Containers extension. We will create a container used to build, deploy, and debug the applications we write. This is how it will work:</p></div></div><div class="block-paragraph"><div class="rich-text"><p>We will open our codebase in a remote container. 
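As a quick recap, the Step 3 GitHub setup above can be sketched like this. The key path, name, and email are hypothetical placeholders, and these are common GitHub-guidance commands rather than the post's exact ones; substitute your own values.

```shell
# Hypothetical placeholders throughout; substitute your own values.
KEYDIR="$(mktemp -d)"            # in practice you would use ~/.ssh
ssh-keygen -t rsa -b 4096 -C "jane@example.com" -f "${KEYDIR}/id_rsa" -N ""
cat "${KEYDIR}/id_rsa.pub"       # paste this key into github.com/settings/keys
# eval "$(ssh-agent -s)" && ssh-add "${KEYDIR}/id_rsa"   # load the key
git config --global user.name  "Jane Dev"
git config --global user.email "jane@example.com"
```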
VS Code then treats the workspace as an isolated Linux environment, so everything we do (build, deploy, debug, file operations) behaves as if we were working on a dedicated Linux VM with its own file system: every command we execute in VS Code is sent for execution in the remote container. This is how we achieve portability—a remote Linux container runs on MacOS and Windows just as it does on a Pixelbook with Linux-enabled Chrome OS.</p><h2>Dev Container settings for each repo</h2><p>Here’s how to set up a dev container for an existing project. You can find the full source code in the <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples">Cloud Code templates repo</a>. This GitHub repo includes templates for getting started with repeatable Kubernetes development in five programming languages—Node.js, Go, Java, Python and .NET. Each template includes configuration for debugging and deploying it to a Kubernetes cluster using <a href="https://marketplace.visualstudio.com/items?itemName=GoogleCloudTools.cloudcode">Cloud Code for VS Code</a> and <a href="https://cloud.google.com/intellij/">IntelliJ</a>. For simplicity, we work with a HelloWorld template that just serves a “Hello World” message from a simple web server in a single container.</p><p>To enable remote container development, we need to add a <b>.devcontainer</b> folder with two files:</p><ul><li><p><b>Dockerfile</b> — defines the container image holding all the developer tools we need in the remote development container</p></li><li><p><b>devcontainer.json</b> — instructs the VS Code Remote Tools extension how to run the remote development container.</p></li></ul><p><b>Creating a container image for remote development<br/></b>Our remote container needs the SDK for the programming language we develop in. In addition, it needs the tools that enable Cloud Code and Kubernetes workflows on Google Cloud. 
Therefore in the <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples/blob/master/nodejs/nodejs-hello-world/.devcontainer/Dockerfile">Dockerfile</a> we install:</p><ul><li><p><a href="https://cloud.google.com/sdk/">Google Cloud SDK</a></p></li><li><p><a href="https://skaffold.dev">Skaffold</a> — tool Cloud Code uses for handling the workflow for building, pushing and deploying apps in containers</p></li><li><p><a href="https://docs.docker.com/engine/reference/commandline/cli/">Docker CLI</a></p></li></ul><p>In addition, container images are immutable. Every time we open the code in a remote container, we’ll get a clean state—no extra settings will be persisted between remote container reloads by default (kubernetes clusters to work with, gcloud project configuration, github ssh keys). To address that, we mount our host folders as drives in the container (see this part later in <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples/blob/master/nodejs/nodejs-hello-world/.devcontainer/devcontainer.json">devcontainer.json</a>) and copy its content to the folder in the container file system where dev tools expect to find these files. 
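A sketch of what that credential sync could look like inside the container. The mount point and paths here are assumptions for illustration; the actual Dockerfile in the samples repo decides the real locations.

```shell
# Assumed mount point; devcontainer.json controls where host folders land.
SRC="${HOST_MOUNT:-/mnt/host}"
mkdir -p "${HOME}/.kube" "${HOME}/.config/gcloud" "${HOME}/.ssh"
cp -r "${SRC}/.kube/."          "${HOME}/.kube/"          2>/dev/null || true  # kubectl cluster credentials
cp -r "${SRC}/.config/gcloud/." "${HOME}/.config/gcloud/" 2>/dev/null || true  # gcloud project/auth settings
cp -r "${SRC}/.ssh/."           "${HOME}/.ssh/"           2>/dev/null || true  # GitHub SSH keys
chmod 700 "${HOME}/.ssh"
```

Because the image itself stays immutable, copying on startup like this is what lets cluster, gcloud, and SSH state survive container reloads.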
</p><p>Example from Dockerfile of kubeconfig, gcloud and ssh keys sync between host and remote container:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>devcontainer.json<br/></b>This file tells Remote Container extension which ports to expose in the container, how to mount drives, which extensions to install in the remote container, and more.</p><p>A few notable configurations:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><i>runArgs</i> contains command line arguments remote extension passes to docker when remote container is launched. This is where we set environment variables and mount external drives in a container. This helps to eliminate authorizations and specifies the kubernetes clusters we want to work with in Cloud Code.</p><p>In the <i>extensions</i> section, we add a few VS Code extensions for enhanced productivity in the development container. These will be installed on a dev container but not on the host, so you can tailor this choice to the codebase you plan to work on in the dev container. 
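A minimal devcontainer.json along those lines might look like this. All field values are illustrative assumptions; see the samples repo for the real file.

```json
{
  "name": "nodejs-hello-world (sketch)",
  "dockerFile": "Dockerfile",
  "appPort": 8080,
  "runArgs": [
    "-v", "${env:HOME}/.kube:/mnt/host/.kube",
    "-v", "${env:HOME}/.config/gcloud:/mnt/host/.config/gcloud",
    "-v", "${env:HOME}/.ssh:/mnt/host/.ssh"
  ],
  "extensions": [
    "googlecloudtools.cloudcode",
    "ms-azuretools.vscode-docker"
  ]
}
```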
In this case I am setting up for nodejs development.</p><ul><li><p><a href="https://marketplace.visualstudio.com/items?itemName=GoogleCloudTools.cloudcode"><b>Cloud Code for VS Code</b></a> — Google’s extension that helps to write, deploy and debug cloud-native applications quickly and easily. It allows deploying code to kubernetes and <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples">supports 5 programming languages</a>.</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=eg2.vscode-npm-script">Npm support</a> for VS Code</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker">Code Spell Checker</a></p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=DavidAnson.vscode-markdownlint">Markdownlint</a> — Improves the quality of markdown files. </p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens">Gitlens</a> — Shows the history of code commits along with other relevant useful information.</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=IBM.output-colorizer">Output colorizer</a> — Colors the output of various commands. 
Helpful when observing application logs and other info in the IDE.</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=vscode-icons-team.vscode-icons">Vscode-icons</a> — Changes icons to known file extensions for better visibility and discoverability of the files.</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-docker">Docker</a> — Manages docker images, autocompletion for docker files and more</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-typescript-tslint-plugin">TSLint</a> — Linting for typescript (optional)</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=CoenraadS.bracket-pair-colorizer">Bracket pair colorizer</a> (optional)</p></li><li><p><a href="https://marketplace.visualstudio.com/items?itemName=christian-kohler.npm-intellisense">Npm intellisense</a> (optional)</p></li><li><p><a href="http://dbaeumer.vscode-eslint">ESLint Javascript</a> (optional)</p></li></ul><h2>Hello World in Dev Container on Pixelbook</h2><p>Let’s try to build, debug and deploy the <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples/tree/master/nodejs/nodejs-hello-world">sample Hello World nodejs</a> app on Pixelbook using the remote dev container setup we just created:</p><ul><li><p><a href="https://cloud.google.com/sdk/docs/initializing">Initialize gcloud</a> by running <b>gcloud init</b> in a command line of your Pixelbook and following the steps. As part of our earlier setup, when we open the code in a remote container, Gcloud settings will be sync’ed into a dev container, so you won’t need to re-initialize every time.</p></li><li><p><a href="https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl">Connect to a GKE cluster</a> using the command below. We will use it to deploy our app. 
This can also be done outside of the dev container; the cluster credentials will be sync’ed in through our earlier .devcontainer setup.</p></li></ul></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><ul><li><p><b>Open the code in the dev container</b>: In the VS Code Command Palette, type Remote-Containers: Open Folder in Container… and select your code location. The code will open in the dev container, pre-configured with the full toolset and ready to go!</p></li></ul><ul><li><p><b>Build and deploy the code to GKE using Cloud Code</b>: In the VS Code Command Palette, type <b>Cloud Code: Deploy</b> and <a href="https://cloud.google.com/code/docs/vscode/deploying-an-application">follow the instructions</a>. Cloud Code will build the code, package it into a container image, push it to Container Registry, then deploy it to the GKE cluster we connected to earlier—all from the dev container on a Pixelbook!</p></li></ul><p>Though slick and small, the Pixelbook might just fit your developer needs. With VS Code, the Remote Development extension, Docker, Kubernetes and Cloud Code, you can lift your development setup to the next level, with no need to worry about machine-specific or platform-specific differences affecting your productivity. 
By sharing dev container setup on Github, developers that clone your code will be able to reopen it in a container (assuming they have the Remote - Containers extension installed).</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Cloud Code Deploy.gif" src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Cloud_Code_Deploy.gif"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Once done, developers will get an isolated environment with all dependencies baked in — just start coding!</p><p>If you have a Pixelbook — or if you don’t, and just want to try out Cloud Code — the Hello World app and all config files are available on <a href="https://github.com/GoogleCloudPlatform/cloud-code-samples">GitHub</a>. <a href="https://twitter.com/simon_zeltser">Let me know</a> how it went and what your favorite setup for developer productivity is.</p><h2>Further reading</h2><ul><li><p><a href="https://support.google.com/chromebook/answer/9145439?hl=en">Set up Linux (Beta) on your Chromebook</a></p></li><li><p><a href="https://chromeos-cookbooks.firebaseapp.com/setup.html">Chromebook Developer Toolbox</a></p></li><li><p><a href="https://cloud.google.com/code/docs/vscode/quickstart">Getting Started with Cloud Code for VS Code</a></p></li><li><p><a href="https://github.com/GoogleCloudPlatform/cloud-code-samples">Cloud Code Templates Repo</a></p></li><li><p><a href="https://code.visualstudio.com/docs/remote/containers#_getting-started">Developing inside a Container</a></p></li></ul></div></div></body></html></description><pubDate>Tue, 03 Sep 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/application-development/build-a-dev-workflow-with-cloud-code-on-a-pixelbook/</guid><category>Google Cloud Platform</category><category>Chrome 
Enterprise</category><category>Application Development</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_Code.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Build a dev workflow with Cloud Code on a Pixelbook</title><description>Can you use a Pixelbook for serious software development? Developer Programs Engineer Simon Zeltser shows you how.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_Code.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/application-development/build-a-dev-workflow-with-cloud-code-on-a-pixelbook/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Simon Zeltser</name><title>Developer Programs Engineer</title><department></department><company></company></author></item><item><title>Beyond the Map: Solving problems and powering location-based services with imagery</title><link>https://cloud.google.com/blog/products/maps-platform/beyond-map-solving-problems-and-powering-location-based-services-imagery/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p><i>Editor’s Note: Product director Ethan Russell brings us the second installment of our Beyond the Map series. In today’s post, he’ll explain how we use imagery to overcome different mapping challenges around the world to help power businesses with location-based data and insights. For a look at how we use imagery to build our consumer Maps products, tune into the <a href="https://www.blog.google/products/maps/">Google Keyword blog</a> soon. 
<br/></i><i><br/></i>So far in this series <a href="https://cloud.google.com/blog/products/maps-platform/beyond-the-map-how-we-build-the-maps-that-power-your-apps-and-business">we’ve explained</a>, at a high level, how we combine imagery, third-party authoritative data, machine learning, and community contributions to continuously map the changing world. But what do we do when one of these key elements is missing, like authoritative data sources? Or when a city is growing so fast that traditional map making isn’t an option? Or when streets are so narrow, we can’t drive a Street View car through to map them? We run into endless mapping challenges in our tireless pursuit to map the world, but the one constant is that imagery is almost always the foundation of the solution. <br/><br/><b>Mapping growing cities from imagery <br/></b>Some areas of the world simply don't have basic roads and buildings mapped yet, which means we can’t reference basic mapping information from authoritative data sources like local governments and organizations. In these cases we build the map literally from the ground up, starting with imagery from which we can extract mapping data. There are broadly two kinds of imagery that we use. Overhead imagery from satellites and airplanes shows roads and buildings, while street-level imagery lets us see road names, road signs, building numbers and business names. In last month’s post, we touched on how we use machine learning to automatically extract information from imagery and keep maps data up to date for our customers. Let’s take a look at how this served as the foundation for significant improvements of our maps of Lagos, Nigeria and what that means for a local business using Google Maps Platform. 
<br/><br/>Once we had the necessary imagery of the area, we were able to use a number of our machine learning-based pipelines to quickly update the major components of the map within just a few months (traditional mapping processes can often take far longer). We focused on three deep-learning based approaches: drawing the outlines of buildings, identifying house numbers, and recognizing businesses. We created detailed outlines of buildings using a model trained not only on the per-pixel details of what constitutes a building, but also on the high-level traits of building shapes seen in the overhead imagery. To identify house numbers and businesses, we used three-part detection, classification, and extraction approaches based on the continuation of work discussed in <a href="https://arxiv.org/abs/1704.03549">this paper</a>. These two algorithms were fed high-resolution Street View imagery as input. The accurate positioning of these images in six degrees of freedom was critical to getting the position of the house or business exactly right. As a result, we were able to improve the quality of our map data in Lagos in about one year (from 2017 to 2018) to levels equivalent to countries where we've spent many years building the maps. </p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Visualization of improved maps data in Lagos, Nigeria" src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/lagos.gif"/><figcaption class="article-image__caption "><div class="rich-text">Improved coverage of buildings (pink) and points of interest (green) in Lagos, Nigeria from 2012 to 2018</div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>For many people, an incorrect address when trying to find a business or other location is just a small nuisance. 
But for businesses, it could mean lost revenue. And for <a href="http://lifebank.ng/">LifeBank</a>, a company that connects blood suppliers to hospital patients in Lagos, it could be a matter of life and death. In 2016, founder Temie Giwa-Tubosun used Google Maps Platform to create and map an online blood repository in partnership with 52 blood banks across Lagos, allowing doctors to request a blood type and immediately access a map that tracks the journey of the delivery. </p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="LifeBank's life-saving app" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/54a218c9-1593-448d-ad9f-e6b507ee1da1_1.max-1000x1000.JPG"/><figcaption class="article-image__caption "><div class="rich-text">The LifeBank app helps connect blood banks, doctors, and drivers across Lagos, Nigeria</div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Before LifeBank, finding and delivering blood to a patient in Lagos could take several hours and, in some cases, several days. But LifeBank changed that by transporting blood in an average of 45 minutes from initial request to final delivery. The team has registered over 5,800 blood donors, moved over 15,000 units, served 300-plus hospitals, and saved more than 4,000 lives. For Temie, access to mapping information was an important part of solving the blood crisis problem in her native Nigeria.<br/><br/><b>Mapping narrow roads with Street View 3-wheelers<br/></b>Places like Indonesia have some roads that are too narrow for cars, but just right for the 2-wheelers that are commonly used in the country. 
We needed to map these roads in order to introduce 2-wheeler navigation in Google Maps and provide 2-wheeler navigation solutions to our ridesharing customers, but our Street View cars were too big. Instead, we mounted a Trekker to a 3-wheeler, taking into account both operator safety and local regulations in our vehicle choice, and started mapping the narrow streets. </p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "><img alt="Street View 3-wheeler" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/IMG_20180325_162930.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><p>A “Street View 3-wheeler” used to map narrow roads in Indonesia</p></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>The solution makes mapping projects possible and scalable in places off the beaten path or inaccessible to cars. It enabled us to capture the street-level imagery of narrow roads needed to launch 2-wheeler navigation in Indonesia and improve our maps of the area. Since launching in Indonesia, we’ve brought 2-wheeler navigation to 21 other countries. </p><p></p><p>As you can see, imagery really is the foundation for our maps and for solving map-making problems worldwide. But this is just a look at a couple of the challenges we’ve solved with imagery. It’s an incredible resource for learning about the world, and we have lots of creative ways of collecting and using imagery to help people explore and to help businesses build and expand their services, even in hard-to-map areas. Come back to the Google Maps Platform blog next time for another installment of Beyond the Map. 
Until then, to learn more about Google Maps Platform, <a href="https://cloud.google.com/maps-platform/">visit our website</a>.</p></div></div></body></html></description><pubDate>Fri, 30 Aug 2019 16:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/maps-platform/beyond-map-solving-problems-and-powering-location-based-services-imagery/</guid><category>Google Maps Platform</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/AerialBuildings.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Beyond the Map: Solving problems and powering location-based services with imagery</title><description>The second installment of our Beyond the Map series explains how we use imagery to overcome different mapping challenges around the world to help power businesses with location-based data and insights.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/AerialBuildings.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/maps-platform/beyond-map-solving-problems-and-powering-location-based-services-imagery/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ethan Russell</name><title>Product Director</title><department></department><company></company></author></item><item><title>Kubernetes security audit: What GKE and Anthos users need to know</title><link>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-security-audit-what-gke-and-anthos-users-need-to-know/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Kubernetes reached an important milestone recently: the publication of its first-ever security audit! 
Sponsored by the Cloud Native Computing Foundation (CNCF), this security audit reinforces what has been apparent to us for some time now: Kubernetes is a mature open-source project for organizations to use as their infrastructure foundation.</p><p>While every audit will uncover something, this report only found a relatively small number of significant vulnerabilities that need to be addressed. “Despite many important findings, we did not see fundamental architectural design flaws, or critical vulnerabilities that should cause pause when adopting Kubernetes for high-security workloads or critical business functions,” <a href="https://www.helpnetsecurity.com/2019/08/12/kubernetes-security-matures/">said</a> Aaron Small, Product Manager, Google Cloud and member of the Security Audit Working Group. Further, Kubernetes has an <a href="https://kubernetes.io/docs/reference/issues-security/security/">established vulnerability reporting, response, and disclosure process</a>, which is staffed with senior developers who can triage and take action on issues.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="GCP_k8_securityaudit.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_k8_securityaudit.0480025209600258.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Performing this security audit was a big effort on behalf of the CNCF, which has a mandate to improve the security of its projects via its <a href="https://bestpractices.coreinfrastructure.org/en">Best Practices Badge Program</a>. To take Kubernetes through this first security audit, the Kubernetes Steering Committee formed a working group, developed an RFP, worked with vendors, reviewed and then finally published the report. 
You can get your hands on the <a href="https://github.com/kubernetes/community/blob/master/wg-security-audit/findings/Kubernetes%20Final%20Report.pdf">full report</a> on the Working Group’s GitHub page, or read the <a href="https://www.cncf.io/blog/2019/08/06/open-sourcing-the-kubernetes-security-audit/">highlights in the CNCF blog post</a>.</p><h2>Kubernetes security for GKE and Anthos users</h2><p>Clocking in at 241 pages, the final report is very thorough and interesting and we encourage you to read it. But what if you’re just interested in what this report means for Google Cloud’s managed platforms, <a href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a> (GKE) and <a href="https://cloud.google.com/anthos/">Anthos</a>? If you’re not going to read the whole thing, here’s the gist of the report and takeaways for Google Cloud customers.</p><p><b>GKE makes it easy for you to follow recommended configurations<br/></b>The report lays out a <a href="https://github.com/kubernetes/community/blob/master/wg-security-audit/findings/Kubernetes%20White%20Paper.pdf">list of recommended actions for cluster administrators</a>, including using RBAC, applying a Network Policy, and limiting access to logs which may contain sensitive information. The report also calls out Kubernetes’ default settings. In GKE, we’ve been actively changing these over time, including turning off ABAC and basic authentication by default, to make sure new clusters you create are more secure. 
To apply the recommended configurations in GKE, and see which have already been applied for you, check out the <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster">GKE hardening guide</a>.</p><p><b>It’s not all up to you <br/></b>The <a href="https://github.com/kubernetes/community/blob/master/wg-security-audit/findings/Kubernetes%20Threat%20Model.pdf">threat model</a> assessed the security posture of eight major components, but because of the GKE <a href="https://cloud.google.com/blog/products/containers-kubernetes/exploring-container-security-the-shared-responsibility-model-in-gke-container-security-shared-responsibility-model-gke">shared responsibility model</a>, you don’t have to worry about all of them. GKE is responsible for providing updates to vulnerabilities for the eight components listed in the report, while you as the user are responsible for upgrading nodes and configuration related to workloads. You don’t even need to upgrade nodes if you leave node auto-upgrade enabled. </p><p><b>Kubernetes and GKE security are only going to keep getting better<br/></b>With more eyes on this shared, open source technology, more well-hidden bugs are likely to be found and remediated. The Kubernetes community dedicated significant time and resources to this audit, emphasizing that security is truly a top priority. With open audits like the one performed by the CNCF, it’s easier for researchers—or your team—to understand the real threats, and spend their time further researching or remediating the most complex issues. </p><p>And when issues do arise, as we’ve seen multiple times with recent vulnerabilities, the upstream <a href="https://github.com/kubernetes/security/blob/master/security-release-process.md#product-security-committee-psc">Kubernetes Product Security Committee</a> is on top of it, quickly responding and providing fixes to the community. 
</p><p>Finally, since GKE is an official distribution, we pick up patches as they become available in Kubernetes and make them available automatically for the control plane, master, and node. Masters are automatically upgraded and patched, and if you have node auto-upgrade enabled, your node patches will be automatically applied too. You can track the progress to address the vulnerabilities surfaced by this report in the <a href="https://github.com/kubernetes/kubernetes/issues/81146">issue dashboard</a>.</p><p>If you want to dig in deeper, check out the full <a href="https://github.com/kubernetes/community/blob/master/wg-security-audit/findings/Kubernetes%20Final%20Report.pdf">report</a>, available on GitHub. Thanks again to the Kubernetes Security Audit Working Group, the CNCF, Trail of Bits and Atredis Partners for the amazing work they did to complete this in-depth assessment! To learn more about trends in container security here at Google Cloud, be sure to follow our <a href="https://cloud.google.com/blog/topics/exploring-container-security">Exploring container security</a> blog series.</p></div></div></body></html></description><pubDate>Fri, 30 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-security-audit-what-gke-and-anthos-users-need-to-know/</guid><category>Identity & Security</category><category>Google Cloud Platform</category><category>GKE</category><category>Anthos</category><category>Hybrid Cloud</category><category>Containers & Kubernetes</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Exploring_container_security.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Kubernetes security audit: What GKE and Anthos users need to know</title><description>Read about the implications of the first Kubernetes security audit on GKE and 
Anthos.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Exploring_container_security.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-security-audit-what-gke-and-anthos-users-need-to-know/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Maya Kaczorowski</name><title>Product Manager, Container security</title><department></department><company></company></author></item><item><title>How to quickly solve machine learning forecasting problems using Pandas and BigQuery</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-to-quickly-solve-machine-learning-forecasting-problems-using-pandas-and-bigquery/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Time-series forecasting problems are ubiquitous throughout the business world. For example, you may want to predict the probability that some event will happen in the future or forecast how many units of a product you’ll sell over the next six months. Forecasting like this can be posed as a supervised machine learning problem. </p><p>Like many machine learning problems, the most time-consuming part of forecasting can be setting up the problem, constructing the input, and feature engineering. Once you have created the features and labels that come out of this process, you are ready to train your model.</p><p>A common approach to creating features and labels is to use a sliding window where the features are historical entries and the label(s) represent entries in the future. 
As any data scientist who works with time series knows, this sliding window approach can be tricky to get right.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="1_sliding window on an example dataset.gif" src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_sliding_window_on_an_example_dataset.gif"/><figcaption class="article-image__caption "><div class="rich-text"><i>A sliding window on an example dataset. Each window represents a feature vector for the dataset and the label(s) is one or more points in the future.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Below is a good workflow for tackling forecasting problems:</p><p>1. Create features and labels on a subsample of data using Pandas and train an initial model locally<br/>2. Create features and labels on the full dataset using BigQuery<br/>3. Utilize BigQuery ML to build a scalable machine learning model<br/>4. (Advanced) Build a forecasting model using Recurrent Neural Networks in Keras and TensorFlow</p><p>In the rest of this blog, we’ll use an example to provide more detail on how to build a forecasting model using the above workflow. (The code is available on <a href="https://aihub.cloud.google.com/u/0/p/products%2F167a3129-a605-49eb-9f51-c9b32984c0b6">AI Hub</a>.)</p><h2>First, train locally</h2><p>Machine learning is all about running experiments. The faster you can run experiments, the more quickly you can get feedback, and thus the faster you can get to a Minimum Viable Model (MVM). It’s beneficial, then, to first work on a subsample of your dataset and train locally before scaling out your model using the entire dataset.</p><p>Let’s build a model to forecast the median housing price week-by-week for New York City. 
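To make the windowing idea concrete before diving in, here is a minimal sketch on a synthetic weekly series. This is not the post's `create_rolling_features_label` helper (that lives in the linked AI Hub notebook); the column names, window size, and horizon below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a weekly median-price series.
dates = pd.date_range("2011-01-02", periods=60, freq="W")
prices = pd.Series(np.linspace(500_000, 560_000, 60), index=dates)

def make_windows(series, window_size, horizon):
    """One row per window: `window_size` past values as features,
    plus the value `horizon` steps after the window as the label."""
    rows = []
    for start in range(len(series) - window_size - horizon + 1):
        feats = series.iloc[start:start + window_size].tolist()
        label = series.iloc[start + window_size + horizon - 1]
        rows.append(feats + [label])
    cols = [f"price_back_{window_size - i}" for i in range(window_size)] + ["label"]
    return pd.DataFrame(rows, columns=cols)

windows = make_windows(prices, window_size=8, horizon=4)
print(windows.shape)  # (49, 9): 49 windows, 8 features + 1 label
```

Each step of the loop slides the window forward by one period, which is exactly the animation shown above.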
We spun up a <a href="http://console.cloud.google.com/mlengine/notebooks">Deep Learning VM</a> on Cloud AI Platform and loaded our data from <a href="https://www1.nyc.gov/site/finance/taxes/property-annualized-sales-update.page">nyc.gov</a> into BigQuery. Our dataset goes back to 2003, but for now let’s just use prices beginning 2011.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="2_median housing price.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_median_housing_price.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Since our goal is to forecast future prices, let's create sliding windows that accumulate historical prices (features) and a future price (label). Our source table contains date and median price:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--small h-c-grid__col h-c-grid__col--2 h-c-grid__col--offset-5 "><img alt="3_forecast future prices.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_forecast_future_prices.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Here is the entire dataset plotted over time:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="4_entire dataset plotted.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_entire_dataset_plotted.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>To create our features, we’ll pick a historical window size—e.g., one year—that will be used to forecast the median 
home price in six months. To do this, we have implemented a reusable function based on Pandas that allows you to easily generate time-series features and labels. Feel free to use this function on your own dataset.</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>After running <code>create_rolling_features_label</code>, a feature vector of length 52 (plus the date features) is created for each example, representing the features before the prediction date.</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="5_create_rolling_features_label.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_create_rolling_features_label.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>This can be shown with a rolling window:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="6_rolling window.gif" src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/6_rolling_window.gif"/><figcaption class="article-image__caption "><div 
class="rich-text"><i>The create_rolling_features_label function creates windows for the feature and label. In this case, the features consist of 52 weeks and the label consists of a week 6 months into the future.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Once we have the features and labels, the next step is to create a training and test set. In time-series problems, it’s important to split them temporally so that you are not leaking future information that would not be available at test time into the trained model.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="7_training and test set.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7_training_and_test_set.max-1000x1000.png"/><figcaption class="article-image__caption "><div class="rich-text"><i>The training set (blue) will consist of data where the label occurs before the split date (2015-12-30), while the test set (green) consists of rows where the label is after this date.</i></div></figcaption></figure></div></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>In practice, you may want to scale your data using z-normalization or detrend your data to reduce seasonality effects. It may help to utilize differencing as well, to remove trend information. Now that we have features and labels, this simply becomes a traditional supervised learning problem, and you can use your favorite ML library to train a model. 
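A minimal sketch of such a temporal split, assuming each windowed row records the date its label falls on (the toy frame and column names are illustrative, not the notebook's):

```python
import pandas as pd

# Toy windowed dataset: each row carries the date of its label.
df = pd.DataFrame({
    "label_date": pd.date_range("2015-11-01", periods=10, freq="W"),
    "feature_1": range(10),
    "label": range(10),
})

split_date = pd.Timestamp("2015-12-30")
train = df[df["label_date"] <= split_date]  # labels occur on or before the split
test = df[df["label_date"] > split_date]    # labels occur strictly after it

print(len(train), len(test))
```

Because the cut is made on the label's date, no training row contains information from the test period.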
Here is a simple example using sklearn:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><h2>Scale our model</h2><p>Let’s imagine we want to put our model into production and automatically run it every week, using batch jobs, to get a better idea of future sales. Let’s also imagine we may want to produce forecasts day by day.</p><p>Our data is stored in BigQuery, so let’s use the same logic that we used in Pandas to create features and labels, but instead run it at scale using BigQuery. We have developed a generalized Python function that creates a SQL string that lets you do this with BigQuery:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>We pass the table name that contains our data, the value name that we are interested in, the window size (which is the input sequence length), the horizon of how far ahead in time we skip between our features and our labels, and the <code>labels_size</code> (which is the output sequence length). The labels size is equal to 1 here because, for now, we are only modeling sequence-to-one—even though this data pipeline can handle sequence-to-sequence. Feel free to write your own sequence-to-sequence model to take full advantage of the data pipeline!</p>We can then execute the SQL string <code>scalable_time_series</code> in BigQuery. 
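As an illustration, a generator of that shape can be sketched with SQL window functions. This is not the post's actual implementation (which is in the AI Hub notebook); the parameter names follow the description above, while the column aliases and the `date` ordering column are assumptions.

```python
def rolling_window_sql(table_name, value_name, window_size, horizon, labels_size):
    """Build a SQL string producing one row per window: `window_size` lagged
    feature columns plus `labels_size` future label columns."""
    features = ",\n  ".join(
        f"LAG({value_name}, {k}) OVER (ORDER BY date) AS {value_name}_back_{k}"
        for k in range(window_size, 0, -1)
    )
    labels = ",\n  ".join(
        f"LEAD({value_name}, {horizon + j}) OVER (ORDER BY date) AS {value_name}_ahead_{j + 1}"
        for j in range(labels_size)
    )
    # Edge rows whose window or label falls outside the data come back as
    # NULLs and should be filtered out downstream.
    return f"SELECT\n  date,\n  {features},\n  {labels}\nFROM `{table_name}`"

sql = rolling_window_sql("dataset.prices", "price", window_size=52, horizon=26, labels_size=1)
print(sql)
```

Generating the SQL from Python keeps the windowing logic in one place while letting BigQuery do the heavy lifting over the full dataset.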
A sample of the output shows that each row is a different sequence. For each sequence, we can see the time ranges of the features and the labels. For the features, the timespan is 52 weeks, which is the <code>window_size</code>, and for labels it is one day, which is the <code>labels_size</code>.</div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="8_scalable_time_series.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/8_scalable_time_series.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Looking at the same sampled rows, we can see how the training data is laid out. We have a column for each timestep of the previous price, starting with the farthest back in time on the left and moving forward. The last column is the label, the price one week ahead.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="9_price one week ahead.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/9_price_one_week_ahead.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Now we have our data, ready for training, in a BigQuery table. 
Let’s take advantage of <a href="https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro">BigQuery ML</a> and build a forecasting model using SQL.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="10_forecasting model using SQL.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/10_forecasting_model_using_SQL.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Above we are creating a linear regression model using our 52 past price features and predicting our label <code>price_ahead_1</code>. This will create a BQML MODEL in our <code>bqml_forecasting</code> dataset.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="11_52 past price features.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/11_52_past_price_features.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>We can check how our model performed by calling <code>TRAINING_INFO</code>. This shows the training run index, iteration index, the training and eval loss at each iteration, the duration of the iteration, and the iteration's learning rate. 
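In SQL form, the two statements pictured above might look roughly like the sketch below; the dataset, model, table, and column names here are placeholders of our own, not the exact ones from the post.

```python
# Placeholder names throughout -- adjust to your own dataset/table.
n_features = 52
feature_cols = ", ".join(f"price_ago_{i}" for i in range(n_features, 0, -1))

# Train a linear regression on the 52 past-price columns.
create_model_sql = f"""
CREATE OR REPLACE MODEL `bqml_forecasting.price_model`
OPTIONS (model_type='linear_reg', input_label_cols=['price_ahead_1']) AS
SELECT {feature_cols}, price_ahead_1
FROM `bqml_forecasting.training_data`
"""

# Inspect per-iteration training and eval loss.
training_info_sql = """
SELECT * FROM ML.TRAINING_INFO(MODEL `bqml_forecasting.price_model`)
"""

# Both would be submitted with the google-cloud-bigquery client, e.g.:
# from google.cloud import bigquery
# bigquery.Client().query(create_model_sql).result()
```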
Our model is training well since the eval loss is continually getting smaller for each iteration.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="12_TRAINING_INFO.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/12_TRAINING_INFO.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>We can also do an evaluation of our trained model by calling <code>EVALUATE</code>. This will show common evaluation metrics that we can use to compare our model with other models to find the best choice among all of our options.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="13_EVALUATE.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/13_EVALUATE.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Lastly, machine learning is all about prediction. The training is just a means to an end. We can get our predictions by using the above query, where we have prepended predicted_ to the name of our label.</p><p>Now, let’s imagine that we want to run this model every week. 
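The evaluation and prediction calls described above can be sketched the same way (again with placeholder model and table names; <code>ML.PREDICT</code> returns the label column with <code>predicted_</code> prepended):

```python
# Placeholder model/table names; substitute your own project's.
evaluate_sql = """
SELECT * FROM ML.EVALUATE(MODEL `bqml_forecasting.price_model`)
"""

predict_sql = """
SELECT predicted_price_ahead_1
FROM ML.PREDICT(MODEL `bqml_forecasting.price_model`,
                (SELECT * FROM `bqml_forecasting.latest_features`))
"""
```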
We can easily create a batch job that is automatically executed using a <a href="https://cloud.google.com/bigquery/docs/scheduling-queries">scheduled query</a>.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="14_scheduled query.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/14_scheduled_query.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Of course, if we want to build a more custom model, we can use TensorFlow or another machine learning library, while using this same data engineering approach to create our features and labels to be read into our custom machine learning model. This technique could possibly improve performance.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="15_custom machine learning model.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/15_custom_machine_learning_model.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>To use an ML framework like TensorFlow, we'll need to write the model code and also get our data in the right format to be read into our model. We can make a slight modification to the previous query we used for BigQuery ML so that the data will be amenable to the CSV file format. </p><p>For this example, imagine you wanted to build a sequence-to-sequence model in TensorFlow that can handle variable-length features. One approach to achieve this would be to aggregate all the features into a single column named <code>med_sales_price_agg</code>, separated by semicolons.
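In the post this parsing happens inside a tf.data input pipeline; as a pure-Python sketch of the same delimiter logic, with semicolons inside the aggregated column and a comma before the label (the sample values are invented):

```python
# One CSV line: sequence values joined by ";" in med_sales_price_agg,
# then a "," separating that column from the label.
def parse_line(line):
    med_sales_price_agg, label = line.rsplit(",", 1)
    features = [float(v) for v in med_sales_price_agg.split(";")]
    return features, float(label)

features, label = parse_line("210.5;212.0;215.25,220.0")
```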
The features (if we have more than just this feature in the future) and the label are all separated by a comma.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="16_med_sales_price_agg.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/16_med_sales_price_agg.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>We'll execute the query in BigQuery and make tables for train and eval. These will then get exported to CSV files in Cloud Storage. The diagram above is what one of the exported CSV files looks like—at least the header and the first line—with some comments added. Then, when reading the data into our model using <a href="https://www.tensorflow.org/api_docs/python/tf/data">tf.data</a>, we will specify the delimiter pattern shown above to correctly parse the data.</p><p>Please check out our <a href="https://aihub.cloud.google.com/u/0/p/products%2F167a3129-a605-49eb-9f51-c9b32984c0b6">notebook</a> on AI Hub for an end-to-end example showing how this would work in practice and how to submit a training job on Google Cloud AI Platform. For model serving, the model can be deployed on AI Platform or it can <a href="https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models">be deployed directly in BigQuery</a>. </p><h2>Conclusion</h2><p>That's it! The workflow we shared will allow you to automatically and quickly set up any time-series forecasting problem. Of course, this framework can also be adapted for a classification problem, like using a customer’s historical behavior to predict the probability of churn or to identify anomalous behavior over time.
Regardless of the model you build, these approaches let you quickly build an initial model locally, then scale to the cloud using BigQuery.</p><p><i>Learn more about <a href="https://cloud.google.com/bigquery/">BigQuery</a> and <a href="https://cloud.google.com/ai-platform/">AI Platform</a>.</i></p></div></div></body></html></description><pubDate>Fri, 30 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-to-quickly-solve-machine-learning-forecasting-problems-using-pandas-and-bigquery/</guid><category>BigQuery</category><category>Google Cloud Platform</category><category>AI & Machine Learning</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/DataAnalytics.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How to quickly solve machine learning forecasting problems using Pandas and BigQuery</title><description>Learn how to quickly solve machine learning forecasting problems using Pandas, BigQuery, and Google Cloud AI Platform</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/DataAnalytics.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-to-quickly-solve-machine-learning-forecasting-problems-using-pandas-and-bigquery/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Chris Rawles</name><title>ML Solutions Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ryan Gillard</name><title>Machine Learning Solutions Engineer</title><department></department><company></company></author></item><item><title>Expanding your patent set with ML and BigQuery</title><link>https://cloud.google.com/blog/products/data-analytics/expanding-your-patent-set-with-ml-and-bigquery/</link><description><html><head></head><body><div 
class="block-paragraph"><div class="rich-text"><p>Patents protect unique ideas and intellectual property. Patent landscaping is an analytical approach commonly used by corporations, patent offices, and academics to better understand the potential technical coverage of a large number of patents where manual review (i.e., actually reading the patents) is not feasible due to time or cost constraints. Luckily, patents contain rich information, including metadata (examiner-supplied classification codes, citations, dates, and information about the patent applicant), images, and thousands of words of descriptive text, which enable the use of more advanced methodological techniques to augment manual review.</p><p>Patent landscaping techniques have improved as machine learning models have increased practitioners’ ability to analyze all this data. Here on Google’s Global Patents Team, we’ve developed a new patent landscaping methodology that uses Python and BigQuery on Google Cloud to allow you to easily access patent data and generate automated landscapes.</p><p>There are some important concepts to know as you’re getting started with patent landscaping. Machine learning (ML) landscaping methods that use these sources of information generally fall into one of two categories: </p><ul><li><b>Unsupervised</b>: Given a portfolio of patents about which the user knows no prior information, then utilize an unsupervised algorithm to generate topic clusters to provide users a better high-level overview of what that portfolio contains.</li><li><b>Supervised</b>: Given a seed set of patents about which the user is confident covers a specific technology, then identify other patents among a given set that are likely to relate to the same technology. 
</li></ul><p>The focus of this post is on supervised patent landscaping, which tends to have more impact and be commonly used across industries, such as:</p><ul><li><p><b>Corporations</b> that have highly curated seed sets of patents that they own and wish to identify patents with similar technical coverage owned by other entities. That may aid various strategic initiatives, including targeted acquisitions and cross-licensing discussions. </p></li><li><p><b>Patent offices</b> that regularly perform statistical analyses of filing trends in emerging technologies (like AI) for which the existing classification codes are not sufficiently nuanced. </p></li><li><p><b>Academics</b> who are interested in understanding how economic policy impacts patent filing trends in specific technology areas across industries. </p></li></ul><p>Whereas landscaping methods have historically relied on keyword searching and Boolean logic applied to the metadata, supervised landscaping methodologies are increasingly using advanced ML techniques to extract meaning from the actual full text of the patent, which contains far richer descriptive information than the metadata. Despite this recent progress, most supervised patent landscaping methodologies face at least one of these challenges:</p><ul><li><p>Lack of confidence scoring: Many approaches simply return a list of patents without indication of which are the most likely to actually be relevant to a specific technology space covered in the seed set. This means that a manual reviewer can’t prioritize the results for manual review, which is a common use of supervised landscapes. </p></li><li><p>Speed: Many approaches that use more advanced machine learning techniques are extremely slow, making them difficult to use on-demand. </p></li><li><p>Cost: Most existing tools are provided by for-profit companies that charge per analysis or as a recurring SaaS model, which is cost-prohibitive for many users. 
</p></li><li><p>Transparency: Most available approaches are proprietary, so the user cannot actually review the code or have full visibility into the methodologies and data inputs. </p></li><li><p>Lack of clustering: Many technology areas comprise multiple sub-categories that require a clustering routine to identify. Clustering the input set could formally group the sub-categories in a formulaic way that any downstream tasks could then make use of to more effectively rank and return results. Few (if any) existing approaches attempt to discern sub-categories within the seed set. </p></li></ul><p>The new patent landscaping methodology we’ve developed addresses all of the common shortcomings listed above. This methodology uses Colab (Python) and GCP (BigQuery) to provide the following benefits:</p><ul><li><p>Fully transparent, with all code and data publicly available, and provides confidence scoring of all results</p></li><li><p>Clusters patent data to capture variance within the seed set</p></li><li><p>Inexpensive, with the only costs coming from GCP compute fees</p></li><li><p>Fast: hundreds or thousands of patents can be used as input, with results returned in a few minutes</p></li></ul><p>Read on for a high-level overview of the methodology with code snippets. The complete code is found <a href="https://github.com/google/patents-public-data/blob/master/examples/patent_set_expansion.ipynb">here</a>, and can be reused and modified for your own ML and BigQuery projects. Finally, if you need an introduction to the <a href="https://console.cloud.google.com/marketplace/details/google_patents_public_datasets/google-patents-public-data">Google Public Patents Datasets</a>, a great overview is found <a href="https://cloud.google.com/blog/big-data/2017/10/google-patents-public-datasets-connecting-public-paid-and-private-patent-data">here</a>.</p><h2>Getting started with the patent landscaping methodology </h2><p><b>1.
Select a seed set and a patent representation<br/></b>Generating a landscape first requires a seed set to be used as a starting point for the search. In order to produce a high-quality search, the input patents should themselves be closely related. More closely related seed sets tend to generate landscapes more tightly clustered around the same technical coverage, while a set of completely random patents will likely yield noisy and more uncertain results.</p><p>The input set could span a <a href="https://www.uspto.gov/web/patents/classification/cpc.html">Cooperative Patent Code (CPC)</a>, a technology, an assignee, an inventor, etc., or a specific list of patents covering some known technological area. In this walkthrough a term (word) is used to find a seed set. In the <a href="https://console.cloud.google.com/marketplace/details/google_patents_public_datasets/google-patents-public-data">Google Patents Public Datasets</a>, there is a “top terms” field available for all patents in the “google_patents_research.publications” table. The field contains 10 of the most important terms used in a patent. The terms can be unigrams (such as “aeroelastic,” “genotyping,” or “engine”) or bi-grams (such as “electrical circuit,” “background noise,” or “thermal conductivity”).</p><p>With a seed set selected, you’ll next need a representation of a patent suitable to be passed through an algorithm. Rather than using the entire text of a patent or discrete features of a patent, it’s more consumable to use an embedding for each patent. <a href="https://en.wikipedia.org/wiki/Word_embedding">Embeddings</a> are a learned representation of a data input through some type of model, often with a neural network architecture. They reduce the dimensionality of an input set by mapping the most important features of the inputs to a vector of continuous numbers. 
A benefit of using embeddings is the ability to calculate distances between them, since several distance measures between vectors exist.</p><p>You can find a set of patent embeddings in BigQuery. The patent embeddings were built using a machine learning model that predicted a patent's CPC code from its text. Therefore, the learned embeddings are a vector of 64 continuous numbers intended to encode the information in a patent's text. Distances between the embeddings can then be calculated and used as a measure of similarity between two patents. </p><p>In the following example query (performed in BigQuery), we’ve selected a random set of U.S. patents (and collected their embeddings) granted after Jan. 1, 2005, with a top term of "neural network."</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>2. Organize the seed set<br/></b>With the input set determined and the embedding representations retrieved, you have a few options for determining similarity to the seed set of patents.</p><p>Let’s go through each of the options in more detail.</p><p>1. Calculating an overall embedding point—centroid, medoid, etc.— for the entire input set and performing similarity to that value. Under this method, one metric is calculated to represent the entire input set. That means that the input set of embeddings, which could contain information on hundreds or thousands of patents, ends up pared down to a single point. </p><p>There are drawbacks to any methodology that is dependent on one point. If the value itself is not well-selected, all results from the search will be poor. 
Furthermore, even if the point is well-selected, the search depends on only that one embedding point, meaning all search results may represent the same area of a topic, technology, etc. By reducing the entire set of inputs to one point, you’ll lose significant information about the input set.</p><p>2. Seed set x N similarity, e.g., calculating similarity to all patents in the input set to all other patents. Doing it this way means you apply the vector distance metric used between each patent in the input set and all other patents in existence. This method presents a few issues: </p><ul><li><p>Lack of tractability. Calculating similarity for (seed_set_size x all_patents) is an expensive solution in terms of time and compute. </p></li><li><p>Outliers in the input set are treated as equals to highly representative patents.</p></li><li><p>Dense areas around a single point could be overrepresented in the results.</p></li><li><p>Reusing the input points for similarity may fail to expand the input space.</p></li></ul><p>3. Clustering the input set and performing similarity to a cluster. We recommend clustering as the preferred approach to this problem, as it will overcome many of the issues presented by the other two methods. Using clustering, information about the seed set will be condensed into multiple representative points, with no point being an exact replica of its input. With multiple representative points, you can capture various parts of the input technology, features, etc. </p><p><b>3. Cluster the seed set<br/></b>A couple of notes about the embeddings on BigQuery:</p><ul><li><p>The embeddings are a vector of 64 numbers, meaning that data is high-dimensional.</p></li><li><p>As noted earlier, the embeddings were trained in a prediction task, not explicitly trained to capture the "distance" (difference) between patents.</p></li></ul><p>Based on the embedding training, the clustering algorithm needs to be able to effectively handle clusters of varying density. 
Since the embeddings were not trained to separate patents evenly, there will be areas of the embedding space that are more or less dense than others, yet represent similar information between documents.</p><p>Furthermore, with high-dimensional data, distance measures can degrade rapidly. One possible approach to overcoming the dimensionality is to use a secondary metric to represent the notion of distance. Rather than using absolute distance values, it’s been shown that a ranking of data points from their distances (and removing the importance of the distance magnitudes) will produce more stable results with higher dimensional data. So our clustering algorithm should remove sole dependence on absolute distance.</p><p>It’s also important that a clustering method be able to detect outliers. When providing a large set of input patents, you can expect that not all documents in the set will be reduced to a clear sub-grouping. When the clustering algorithm is unable to group data in a space, it should be capable of ignoring those documents and spaces. </p><p>Several clustering algorithms exist (<a href="https://en.wikipedia.org/wiki/Hierarchical_clustering">hierarchical</a>, <a href="https://en.wikipedia.org/wiki/Clique_percolation_method">clique-based</a>, <a href="https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html">hdbscan</a>, etc.) that have the properties we require, any of which can be applied to this problem in place of the algorithm used here. In this application, we used the <a href="http://mlwiki.org/index.php/SNN_Clustering">shared nearest neighbor</a> (SNN) clustering method to determine the patent grouping. </p><p>SNN is a clustering method that evaluates the neighbors for each point in a dataset and compares the neighbors shared between points to find clusters. SNN is a useful clustering algorithm for determining clusters of varying density. 
It is good for high-dimensional data, since the explicit distance value is not used in its calculation; rather, it uses a ranking of neighborhood density. The complete clustering code is available in the <a href="https://github.com/google/patents-public-data/blob/master/examples/patent_set_expansion.ipynb">GitHub repo</a>.</p><p>For each cluster found, the SNN method determines a representative point in order to perform a search against it. Two common approaches for representing geometric centers are centroids and medoids. The centroid simply takes the mean value from each of the 64 embedding dimensions. A medoid is the point in a cluster whose average dissimilarity to all objects in the cluster is minimized. In this walkthrough, we’re using the centroid method.</p><p>Below you’ll see a Python code snippet of the clustering application and calculations of some cluster characteristics, along with a visualization of the clustering results. The dimensions in the visualization were reduced using <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding">TSNE</a>, and outliers in the input set have been grayed out.
The results of the clustering can be seen by the like colors forming a cluster of patents:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "><img alt="Cluster the seed set.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Cluster_the_seed_set.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>4. Perform a similarity search<br/></b>Once the cluster groups and their centers have been determined, you’ll need a measure of similarity between vectors. Several measures exist, and you can implement any preferred measure. In this example, we used cosine distances to find the similarity between two vectors.</p><p>Using the <a href="https://en.wikipedia.org/wiki/Cosine_similarity">cosine distance</a>, the similarity between a cluster center is compared to all other patents using each of their embeddings. Distance values close to zero mean that the patent is very similar to the cluster point, whereas distances close to one are very far from the cluster point. You’ll see the resulting similarity calculations ordered for each cluster and get an upper bound number of assets.</p><p>Below you’ll see a Python code snippet that iterates through each cluster. 
For each cluster, a query is performed in BigQuery that calculates the cosine distance between the cluster center and all other patents, and returns the most similar results to that cluster, like this:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>5. Apply confidence scoring<br/></b>The previous step returns the most similar results to each cluster along with its cosine distance values. From here, the final step takes properties of the cluster and the distance measure from the similarity results to create a confidence level for each result. There are multiple ways to construct a confidence function, and each method may have benefits to certain datasets. </p><p>In this walkthrough, we do the confidence scoring using a half squash function. The half squash function is formulated as follows:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "><img alt="confidence scoring.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/confidence_scoring.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>The function takes as input the cosine distance value found between a patent and a cluster center (x). Furthermore, the function requires two parameters that affect how the distances of the results are fit onto the confidence scale:</p><ol><li><p>A power variable, which defines the properties of the distribution showing the distance results—effectively the slope of the curve. 
In this version, a power of two is used.</p></li><li><p>A half value, which represents the midpoint of the curve returned and defines the saturation on either side of the curve. In this implementation, each cluster uses its own half value. The half value for each cluster is formulated as follows:<br/>(mean distance of input patents in cluster + 2 * standard deviation of input cluster distances)</p></li></ol><p>The confidence scoring function effectively re-saturates the returned distance values to a scale between [0,1], with an exponentially decreasing value as the distance between a patent and the cluster center grows:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>Results from this patent landscaping methodology<br/></b>Applying the confidence function for all of the similarity search results yields a distribution of patents by confidence score. At the highest levels of confidence, fewer results will appear. As you move down the confidence distribution, the number of results increases exponentially.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="patent landscaping methodology.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/patent_landscaping_methodology.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Not all results returned are guaranteed to be high-quality; however, the higher the confidence level, the more likely a result is positive. 
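Steps 3 through 5 can be condensed into a small NumPy sketch. The half-squash form below is one common formulation with the stated properties (it returns exactly 0.5 at the half value and decays toward zero as the distance grows); it is our illustration, not necessarily the exact function in the repo, and the data here is random stand-in for real patent embeddings.

```python
import numpy as np

def centroid(embeddings):
    # Mean across each of the 64 embedding dimensions.
    return embeddings.mean(axis=0)

def cosine_distance(u, v):
    # 0 means identical direction; values near 1 are far apart.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def half_squash(x, half, power=2.0):
    # Equals 0.5 at x == half and decreases as x grows.
    return 1.0 / (1.0 + (x / half) ** power)

# Random 64-d vectors standing in for one cluster of patent embeddings.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(20, 64))
center = centroid(cluster)

dists = np.array([cosine_distance(center, e) for e in cluster])
half = dists.mean() + 2 * dists.std()  # per-cluster half value from above
confidence = half_squash(dists, half)
```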
Depending on the input set, the confidence levels will not necessarily begin at 99%. From the results above, using our “neural network” random patent set, the highest confidence results sit in the 60% to 70% range. From our own experimentation, the more tightly related the input set, the higher the confidence level in the results will be, since the clusters will be more compact.</p><p>This walkthrough provides one method for expanding a set of patents to generate a landscape. Several changes or improvements can be made to the cluster algorithm, distance calculations and confidence functions to suit any dataset. Explore the <a href="https://cloud.google.com/blog/products/gcp/google-patents-public-datasets-connecting-public-paid-and-private-patent-data">patents dataset for yourself</a>, and try out GitHub for the <a href="https://github.com/google/patents-public-data/blob/master/examples/patent_set_expansion.ipynb">patent set expansion code</a> too.</p></div></div></body></html></description><pubDate>Fri, 30 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/expanding-your-patent-set-with-ml-and-bigquery/</guid><category>Google Cloud Platform</category><category>BigQuery</category><category>Data Analytics</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Citrix-BlogHeader-r1_gSJYlNx.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Expanding your patent set with ML and BigQuery</title><description>You can use BigQuery and Python to perform faster patent landscaping. 
Try it out with newly available code and the Google Patents Public Dataset.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Citrix-BlogHeader-r1_gSJYlNx.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/expanding-your-patent-set-with-ml-and-bigquery/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Rob Srebrovic</name><title>Data Scientist, Global Patents</title><department></department><company></company></author></item><item><title>New release of Cloud Storage Connector for Hadoop: Improving performance, throughput and more</title><link>https://cloud.google.com/blog/products/data-analytics/new-release-of-cloud-storage-connector-for-hadoop-improving-performance-throughput-and-more/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>We're pleased to announce a new version of the <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v2.0.0">Cloud Storage Connector for Hadoop</a> (also known as GCS Connector), which makes it even easier to substitute your Hadoop Distributed File System (HDFS) with Cloud Storage. 
This new release can give you increased throughput efficiency for columnar file formats such as Parquet and ORC, isolation for Cloud Storage directory modifications, and overall big data workload performance improvements, like lower latency, increased parallelization, and intelligent defaults.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Diagram 1.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Diagram_1.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>The Cloud Storage Connector is an open source <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/tree/v2.0.0/gcs">Java client library</a> that runs in Hadoop JVMs (like data nodes, mappers, reducers, Spark executors, and more) and allows your workloads to access Cloud Storage. The connector lets your big data open source software [such as Hadoop and Spark jobs, or the Hadoop Compatible File System (HCFS) CLI] read/write data directly to Cloud Storage, instead of to HDFS. 
Storing data in Cloud Storage has <a href="https://cloud.google.com/blog/products/storage-data-transfer/hdfs-vs-cloud-storage-pros-cons-and-migration-tips">several benefits</a> over HDFS: </p><ul><li><p>Significant cost reduction as compared to a long-running HDFS cluster with three replicas on persistent disks;</p></li><li><p>Separation of storage from compute, allowing you to grow each layer independently;</p></li><li><p>Persisting the storage even after Hadoop clusters are terminated;</p></li><li><p>Sharing Cloud Storage buckets between ephemeral Hadoop clusters;</p></li><li><p>No storage administration overhead, like managing upgrades and high availability for HDFS.</p></li></ul><p>The Cloud Storage Connector’s source code is completely open source and is supported by <a href="https://cloud.google.com/">Google Cloud Platform</a> (GCP). The connector comes pre-configured in <a href="https://cloud.google.com/dataproc/">Cloud Dataproc</a>, GCP’s managed Hadoop and Spark offering. However, it is also easily installed and fully supported for use in other Hadoop distributions such as <a href="https://mapr.com/support/s/article/Connecting-Google-Storage-bucket-from-MapR-host?language=en_US">MapR</a>, <a href="https://cloud.google.com/blog/products/storage-data-transfer/how-to-connect-clouderas-cdh-to-cloud-storage">Cloudera</a>, and <a href="https://community.hortonworks.com/articles/211804/accessing-google-cloud-storage-via-hdp.html">Hortonworks</a>. This makes it easy to migrate on-prem HDFS data to the cloud or burst workloads to GCP. </p><p>The open source aspect of the Cloud Storage Connector allowed <a href="https://twitter.com/twittereng">Twitter’s engineering team</a> to closely collaborate with us on the design, implementation, and productionizing of the fadvise and cooperative locking features at petabyte scale. 
</p><p><b>Cloud Storage Connector architecture<br/></b>Here’s a look at what the Cloud Storage Connector architecture looks like:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Diagram 2.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Diagram_2.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Cloud Storage Connector is an <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/LICENSE">open source Apache 2.0</a> implementation of an <a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Core_Expectations_of_a_Hadoop_Compatible_FileSystem">HCFS</a> interface for Cloud Storage. Architecturally, it is composed of four major components:</p><ul><li><p><a href="https://github.com/GoogleCloudPlatform/bigdata-interop/tree/v2.0.0/gcs">gcs</a>—implementation of the <a href="https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html">Hadoop Distributed File System</a> and input/output channels</p></li><li><p><a href="https://github.com/GoogleCloudPlatform/bigdata-interop/tree/v2.0.0/util-hadoop">util-hadoop</a>—common (authentication, authorization) Hadoop-related functionality shared with other Hadoop connectors</p></li><li><p><a href="https://github.com/GoogleCloudPlatform/bigdata-interop/tree/v2.0.0/gcsio">gcsio</a>—high-level abstraction of <a href="https://cloud.google.com/storage/docs/json_api/">Cloud Storage JSON API</a></p></li><li><p><a href="https://github.com/GoogleCloudPlatform/bigdata-interop/tree/v2.0.0/util">util</a>—utility functions (error handling, HTTP transport configuration, etc.) 
used by gcs and gcsio components</p></li></ul><p>In the following sections, we highlight a few of the major features in this new release of Cloud Storage Connector. For a full list of settings and how to use them, check out the newly published <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/CONFIGURATION.md">Configuration Properties</a> and <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/conf/gcs-core-default.xml">gcs-core-default.xml</a> settings pages.</p><p>Here are the key new features of the Cloud Storage Connector:</p><p><b>Improved performance for Parquet and ORC columnar formats<br/></b>As part of Twitter’s <a href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/the-start-of-a-journey-into-the-cloud.html">migration of Hadoop to Google Cloud</a>, in mid-2018 Twitter started testing big data SQL queries against columnar files in Cloud Storage at massive scale, against a 20+ PB dataset. Since the Cloud Storage Connector is open source, Twitter prototyped the use of range requests to read only the columns required by the query engine, which increased read efficiency. We incorporated that work into a more generalized fadvise feature. </p><p>In previous versions of the Cloud Storage Connector, reads were optimized for MapReduce-style workloads, where all data in a file was processed sequentially. However, modern columnar file formats such as Parquet or ORC are designed to support predicate pushdown, allowing the big data engine to intelligently read only the chunks of the file (columns) that are needed to process the query. The Cloud Storage Connector now fully supports predicate pushdown, and only reads the bytes requested by the compute layer. This is done by introducing a technique known as fadvise. </p><p>You may already be familiar with the <a href="http://man7.org/linux/man-pages/man2/posix_fadvise.2.html">fadvise feature in Linux</a>. 
Fadvise allows applications to provide a hint to the Linux kernel with the intended I/O access pattern, indicating how it intends to read a file, whether for sequential scans or random seeks. This lets the kernel choose appropriate read-ahead and caching techniques to increase throughput or reduce latency.</p><p>The new fadvise feature in Cloud Storage Connector implements a similar functionality and automatically detects (in default auto mode) whether the current big data application’s I/O access pattern is sequential or random.</p><p>In the default auto mode, fadvise starts by assuming a sequential read pattern, but then switches to random mode upon detection of a backward seek or long forward seek. These seeks are performed by the <a href="https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SeekableByteChannel.html#position-long-"><code>position()</code></a> method call and can change the current channel position backward or forward. Any backward seek triggers the mode change to random; however, a forward seek needs to be greater than 8 MB (configurable via <code>fs.gs.inputstream.inplace.seek.limit</code>). 
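The switching behavior just described can be sketched in a few lines of Python (an illustrative model only, not the connector's actual Java implementation; the 8 MiB threshold stands in for the default value of `fs.gs.inputstream.inplace.seek.limit`):

```python
# Sketch of fadvise AUTO mode: a read channel starts SEQUENTIAL and
# flips to RANDOM on any backward seek, or on a forward seek longer
# than the configured limit (8 MiB by default).
SEEK_LIMIT = 8 * 1024 * 1024

class AdaptiveReadChannel:
    def __init__(self, seek_limit=SEEK_LIMIT):
        self.mode = "SEQUENTIAL"
        self.current_position = 0
        self.seek_limit = seek_limit

    def position(self, new_position):
        """Mimics SeekableByteChannel.position(long): moving the read
        position may trigger the SEQUENTIAL -> RANDOM transition."""
        if self.mode == "SEQUENTIAL":
            delta = new_position - self.current_position
            if delta < 0 or delta > self.seek_limit:
                self.mode = "RANDOM"
        self.current_position = new_position
```

A short forward skip keeps the channel in SEQUENTIAL mode, while jumping backward or far ahead (as when reading scattered Parquet column chunks) flips it to RANDOM; the state is per-channel, matching the per-file-read-session reset described here.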
The read pattern transition (from sequential to random) in fadvise’s auto mode is stateless and gets reset for each new file read session.</p><p>Fadvise can be configured via the <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/conf/gcs-core-default.xml">gcs-core-default.xml</a> file with the <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/CONFIGURATION.md#fadvise-feature-configuration"><code>fs.gs.inputstream.fadvise</code></a> parameter:</p><ul><li><p>AUTO (default), also called adaptive range reads—In this mode, the connector starts in SEQUENTIAL mode, but switches to RANDOM upon the first backward seek, or the first forward seek longer than <code>fs.gs.inputstream.inplace.seek.limit</code> bytes (8 MiB by default).</p></li><li><p>RANDOM—The connector will send bounded range requests to Cloud Storage; Cloud Storage read-ahead will be disabled.</p></li><li><p>SEQUENTIAL—The connector will send a single, unbounded streaming request to Cloud Storage to read an object from a specified position sequentially.</p></li></ul><p>In most use cases, the default setting of AUTO should be sufficient. It dynamically adjusts the mode for each file read. However, you can also hard-set the mode.</p><p>Ideal use cases for fadvise in RANDOM mode include:</p><ul><li><p>SQL (Spark SQL, Presto, Hive, etc.) queries into columnar file formats (Parquet, ORC, etc.) in Cloud Storage</p></li><li><p>Random lookups by a database system (HBase, Cassandra, etc.)
to storage files (HFile, SSTables) in Cloud Storage</p></li></ul><p>Ideal use cases for fadvise in SEQUENTIAL mode include:</p><ul><li><p>Traditional MapReduce jobs that scan entire files sequentially</p></li><li><p>DistCp file transfers</p></li></ul><p><b>Cooperative locking: Isolation for Cloud Storage directory modifications<br/></b>Another major addition to Cloud Storage Connector is cooperative locking, which isolates directory modification operations performed through the <a href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html">Hadoop file system shell</a> (hadoop fs command) and other HCFS API interfaces to Cloud Storage.</p><p>Although Cloud Storage is <a href="https://cloud.google.com/storage/docs/consistency">strongly consistent</a> at the object level, it does not natively support directory semantics. For example, what should happen if two users issue conflicting commands (delete vs. rename) to the same directory? In HDFS, such directory operations are atomic and consistent. So <a href="https://twitter.com/Joep">Joep Rottinghuis</a>, leading the <a href="https://twitter.com/twitterhadoop">@TwitterHadoop</a> team, worked with us to implement cooperative locking in Cloud Storage Connector. This feature prevents data inconsistencies during conflicting directory operations to Cloud Storage, facilitates recovery of any failed directory operations, and simplifies operational migration from HDFS to Cloud Storage.</p><p>With cooperative locking, concurrent directory modifications that could interfere with each other, like a user deleting a directory while another user is trying to rename it, are safeguarded. 
Cooperative locking also supports recovery of failed directory modifications (where a JVM might have crashed mid-operation) via the FSCK command, which can resume or roll back the incomplete operation.</p><p>With this cooperative locking feature, you can now perform isolated directory modification operations, using the <code>hadoop fs</code> commands as you normally would to move or delete a folder:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code>hadoop fs -mv gs://bucket/dir gs://bucket/new-dir
hadoop fs -rm -r gs://bucket/dir</code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>To recover failed directory modification operations performed with cooperative locking enabled, use the included FSCK tool:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>This command will recover (roll back or roll forward) all failed directory modification operations, based on the operation log.</p><p>The cooperative locking feature is intended to be used by human operators when modifying Cloud Storage directories through the <code>hadoop fs</code> interface. Since the underlying Cloud Storage system does not support locking, this feature should be used cautiously for use cases beyond directory modifications (such as when a MapReduce or Spark job modifies a directory).</p><p>Cooperative locking is disabled by default.
To enable it, either set the <code>fs.gs.cooperative.locking.enable</code> Hadoop property to true in core-site.xml:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code>&lt;property&gt;
  &lt;name&gt;fs.gs.cooperative.locking.enable&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;</code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>or specify it directly in your <code>hadoop fs</code> command:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code>hadoop fs -Dfs.gs.cooperative.locking.enable=true -mv gs://bucket/dir gs://bucket/new-dir</code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>How cooperative locking works<br/></b>Here’s what a directory move with cooperative locking looks like:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Diagram 3B.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Diagram_3B.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Cooperative locking is implemented via atomic lock acquisition in the lock file (<code>_lock/all.lock</code>) using <a href="https://cloud.google.com/storage/docs/generations-preconditions#_Preconditions">Cloud Storage preconditions</a>.
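This precondition-based acquisition can be modeled as a compare-and-set on the lock file's generation number (a simplified Python sketch of the idea, not the connector's implementation; in Cloud Storage, a generation-match precondition provides the conditional-write behavior shown here):

```python
# Toy model of acquiring the bucket-wide lock file with a conditional
# write. Each successful write bumps the file's generation; a writer
# holding a stale generation fails and must re-read and retry, so only
# one directory operation wins the race.
class LockFile:
    def __init__(self):
        self.generation = 0
        self.locked_paths = set()

    def conditional_write(self, expected_generation, paths):
        # Stands in for a Cloud Storage write guarded by a
        # generation-match precondition: it succeeds only if the
        # lock file is unchanged since the caller last read it.
        if expected_generation != self.generation:
            return False
        self.locked_paths |= paths
        self.generation += 1
        return True

lock = LockFile()
# Two operations read the lock file, then race to record their paths.
gen_seen_by_a = lock.generation
gen_seen_by_b = lock.generation
assert lock.conditional_write(gen_seen_by_a, {"gs://bucket/dir1"})
assert not lock.conditional_write(gen_seen_by_b, {"gs://bucket/dir2"})
```

The losing operation re-reads the lock file at its new generation, sees the other operation's entries, and retries, which is what makes concurrent directory modifications safe to serialize.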
Before each directory modification operation, the Cloud Storage Connector atomically acquires a lock in this bucket-wide lock file.</p><p>Additional operational metadata is stored in <code>*.lock</code> and <code>*.log</code> files in the <code>_lock</code> directory at the root of the Cloud Storage bucket. Operational files (a list of files to modify) are stored in a per-operation <code>*.log</code> file and additional lock metadata in per-operation <code>*.lock</code> file. This per-operation lock file is used for lock renewal and checkpointing operation progress.</p><p>The acquired lock will automatically expire if it is not periodically renewed by the client. The timeout interval can be modified via the <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/CONFIGURATION.md#cooperative-locking-feature-configuration"><code>fs.gs.cooperative.locking.expiration.timeout.ms</code></a> setting.</p><p>Cooperative locking supports isolation of directory modification operations only in the same Cloud Storage bucket, and does not support directory moves across buckets.</p><p><b>Note</b>: Cooperative locking is a Cloud Storage Connector feature, and it is not implemented by gsutil, Object Lifecycle Management or applications directly using the Cloud Storage API.</p><p><b>General performance improvements to Cloud Storage Connector<br/></b>In addition to the above features, there are many other performance improvements and optimizations in this Cloud Storage Connector release. 
For example:</p><ul><li><p><b>Directory modification parallelization</b>: in addition to using batch requests, the Cloud Storage Connector executes Cloud Storage batches in parallel, reducing the rename time for a directory with 32,000 files from 15 minutes to 1 minute, 30 seconds.</p></li><li><p><b>Latency optimizations</b> that decrease the number of Cloud Storage requests needed for high-level Hadoop file system operations.</p></li><li><p><b>Concurrent execution of glob algorithms</b> (regular and flat glob) to yield the best performance for all use cases (deep and broad file trees).</p></li><li><p><b>Repairing <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/CONFIGURATION.md#general-configuration">implicit directories</a> during delete and rename operations</b> instead of during list and glob operations, reducing the latency of expensive list and glob operations and eliminating the need for write permissions on read requests.</p></li><li><p><b>Cloud Storage <a href="https://github.com/GoogleCloudPlatform/bigdata-interop/blob/v2.0.0/gcs/CONFIGURATION.md#io-configuration">read consistency</a></b> to allow requests for the same Cloud Storage object version, preventing reads of different object versions and improving performance.</p></li></ul><p>You can upgrade to the new version of Cloud Storage Connector using the <a href="https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/connectors">connectors initialization action</a> for existing Cloud Dataproc versions.
It will become standard starting in Cloud Dataproc version 2.0.</p><p><i>Thanks to contributors to the design and development of the new release of Cloud Storage Connector, in no particular order: Joep Rottinghuis, Lohit Vijayarenu, Hao Luo and Yaliang Wang from the Twitter engineering team.</i></p></div></div></body></html></description><pubDate>Fri, 30 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/new-release-of-cloud-storage-connector-for-hadoop-improving-performance-throughput-and-more/</guid><category>Storage & Data Transfer</category><category>Google Cloud Platform</category><category>Data Analytics</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/containers.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>New release of Cloud Storage Connector for Hadoop: Improving performance, throughput and more</title><description>The latest release of the Google Cloud Storage Connector for Hadoop makes it even easier to substitute your HDFS with Cloud Storage for high performance.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/containers.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/new-release-of-cloud-storage-connector-for-hadoop-improving-performance-throughput-and-more/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Igor Dvorzhak</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sameer Farooqui</name><title>Cloud Data Engineer</title><department></department><company></company></author></item><item><title>Now in beta: Managed Service for Microsoft Active Directory 
(AD)</title><link>https://cloud.google.com/blog/products/identity-security/now-in-beta-managed-service-for-microsoft-active-directory-ad/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>In April at Google Cloud Next ’19, we <a href="https://cloud.google.com/blog/products/identity-security/simplifying-identity-and-access-management-of-your-employees-partners-and-customers">announced</a> Managed Service for Microsoft Active Directory (AD) to help you manage AD-dependent workloads that run in the cloud, automate AD server maintenance and security configuration, and connect your on-premises AD domain to the cloud. Managed Service for Microsoft AD is now available in public beta. </p><h2>Simplifying Active Directory management</h2><p>As more AD-dependent apps and servers move to the cloud, IT and security teams face heightened challenges to meet latency and security goals, on top of the typical maintenance challenges of configuring and securing AD Domain Controllers. While you can deploy a fault-tolerant AD environment in GCP <a href="https://cloud.google.com/solutions/deploy-fault-tolerant-active-directory-environment">on your own</a>, we believe there’s an easier way that gives you time to focus on more impactful projects.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "><img alt="GCP Active Directory management.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_Active_Directory_management.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p><a href="https://cloud.google.com/managed-microsoft-ad/">Managed Service for Microsoft AD</a> is a highly available, hardened Google Cloud service that delivers the following benefits:</p><ul><li><p>Actual Microsoft AD. 
The service runs real Microsoft AD Domain Controllers, so you don’t have to worry about application compatibility. You can use standard Active Directory features such as Group Policy, and familiar administration tools such as Remote Server Administration Tools (RSAT), to manage the domain. </p></li><li><p>Virtually maintenance-free. The service is highly available, automatically patched, configured with secure defaults, and protected by appropriate network firewall rules.</p></li><li><p>Seamless multi-region deployment. You can deploy the service in a specific region to allow your apps and VMs in the same or other regions to access the domain over a low-latency <a href="https://cloud.google.com/vpc/">Virtual Private Cloud (VPC)</a>. As your infrastructure needs grow, you can simply expand the service to additional regions while continuing to use the same managed AD domain.</p></li><li><p>Hybrid identity. You can <a href="https://cloud.google.com/solutions/patterns-for-using-active-directory-in-a-hybrid-environment">connect</a> your on-premises AD domain to Google Cloud or deploy a standalone domain for your cloud-based workloads.</p></li></ul></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Managed Service for Microsoft AD admin experience.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Managed_Service_for_Microsoft_AD_admin_exp.max-1000x1000_mDS1sXa.png"/><figcaption class="article-image__caption "><div class="rich-text"><i>Managed Service for Microsoft AD admin experience</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Customers and partners have already been using Managed Service for Microsoft AD for their AD-dependent applications and VMs.
Use cases include automatically “domain joining” new Windows VMs by integrating the service with <a href="https://cloud.google.com/dns/">Cloud DNS</a>, hardening Windows VMs by applying Group Policy Objects (GPOs), and controlling Remote Desktop Protocol (RDP) access through GPOs. </p><p><a href="https://www.dunnhumby.com/">dunnhumby</a>, a customer data science platform, has been evaluating the service over the last few months. "We have been helping customers to better understand their customers for over 30 years," said Andrew Baird, Infrastructure Engineer, dunnhumby. "With Managed Service for Microsoft AD, we can now offload some of the AD management and security tasks, so we can focus on our main job—our customers."</p><p><a href="https://www.citrix.com/">Citrix</a> has also been evaluating the service to reduce the management overhead for their services that run on GCP. "Citrix Virtual Apps and Desktops service orchestrates customer workloads which run on a managed fleet of “VDA” instances on GCP. For the AD-related operations of these Citrix products, we found infrastructure deployment was significantly simplified with Google Cloud's managed services, especially Managed Service for Microsoft Active Directory," said Harsh Gupta, Director Product Management, Citrix.</p><h2>Getting started</h2><p>Managed Service for Microsoft AD is available in public beta. 
To get started, check out the <a href="https://cloud.google.com/managed-microsoft-ad/">product page</a> to sign up for beta, read the <a href="https://cloud.google.com/managed-microsoft-ad/docs/">documentation</a>, and watch the latest <a href="https://cloudonair.withgoogle.com/events/security-talks-august/watch?talk=microsoft-ad-mangement">webinar</a>.</p></div></div></body></html></description><pubDate>Thu, 29 Aug 2019 16:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/identity-security/now-in-beta-managed-service-for-microsoft-active-directory-ad/</guid><category>Google Cloud Platform</category><category>Cloud Migration</category><category>Identity & Security</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_Identity_Security.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Now in beta: Managed Service for Microsoft Active Directory (AD)</title><description>Managed Service for Microsoft Active Directory (AD) is now available in public beta.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_Identity_Security.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/identity-security/now-in-beta-managed-service-for-microsoft-active-directory-ad/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Siddharth Bhai</name><title>Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>The Speed Read with Quentin Hardy: Keep it simple</title><link>https://cloud.google.com/blog/topics/speed-read/the-speed-read-with-quentin-hardy-keep-it-simple/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p><i><b>Editor’s note:</b>The Speed Read is a column authored by Google Cloud’s Quentin Hardy, examining important themes and hot topics in cloud computing. 
It previously existed as an email newsletter. Today, we’re thrilled to welcome it to its new home on the Cloud blog.</i></p><p>Some things in modern enterprise technology are a good deal harder to understand than they need to be. It is a great moment when we’re able to change that. </p><p>Take cloud services, for example. Microservices and service meshes are cloud technologies that will be important in your business life, and they are not all that strange. In fact, the mere concept of them should be familiar. They are really, really powerful as simplifiers that make innovation at scale possible. </p><p>Welcome to The Speed Read, “positive simplifier” edition. </p><p>As with many things in business, the secret to understanding these cloud computing technologies and techniques lies in establishing how their rise relates to supply and demand, the most fundamental elements of any market. With business technology, it’s also good to search for ways that an expensive and cumbersome process is being automated to hasten the delivery of value.</p><p>But what does this have to do with cloud services? At the first technology level, microservices are parts of a larger software application that can be decoupled from the whole and updated without having to break out and then redeploy the whole thing. Service meshes control how these parts interact, both with each other and other services. These complex tools exist with a single great business purpose in mind: to create reusable efficiency.</p><p>Think of each microservice as a tool from a toolbox. At one time, tools were custom made, and were used to custom make machines. For the most part, these machines were relatively simple, because they were single devices, no two alike, and that limited the building and the fixing of them. </p><p>Then with standardized measurement and industrial expansion, we got precision-made machine tools, capable of much more re-use and wider deployment. 
Those standardized machine tools were more complex than their predecessors. And they enabled a boom in standardized re-use, a simpler model overall.</p><p>The same goes with microservices—the piece parts are often more complex, but overall the process allows for standardized reuse through the management of service meshes. The “tool” in this case is software that carries out a function—doing online payments, say, or creating security verifications. </p><p>Extrapolating from this analogy, does the boom in microservices tell us that the computational equivalent of the Industrial Revolution is underway? Is this an indication of standardization that makes it vastly easier to create objects and experiences, revolutionizes cost models, and shifts industries and fortunes?</p><p>Without getting too grandiose about it, yeah.</p><p>You see it around you, in the creation of companies that come out of nowhere to invent and capture big markets, or in workforce transformations that allow work and product creation to be decoupled, much the way microservices are decouplings from larger applications. Since change is easier, you see it in the importance of data to determine how things are consumed, and in rapidly reconfiguring how things are made and what is offered. </p><p>Perhaps most important for readers like you is that you see it in the way businesses are re-evaluating how they apportion and manage work. Nothing weird about that; we do it all the time.</p><p>It is understandable how the complexity of tech generates anxiety among many of its most promising consumers. Typically a feature of business computing evolves from scarce and difficult knowledge. Its strength and utility makes it powerful, often faster than software developers can socialize it, or the general public can learn. Not that long ago, spreadsheets and email were weird too, for these reasons. 
</p><p>To move ahead, though, it’s important to recognize big, meaningful changes, and abstract their meaning into something logical and familiar. At a granular level, microservices may be complex, but their function is very straightforward: standardize in order to clear space for innovation.</p></div></div></body></html></description><pubDate>Thu, 29 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/speed-read/the-speed-read-with-quentin-hardy-keep-it-simple/</guid><category>Google Cloud Platform</category><category>Inside Google Cloud</category><category>The Speed Read</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/SpeedRead_Aug.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The Speed Read with Quentin Hardy: Keep it simple</title><description>Quentin describes how microservices and service meshes are powerful simplifiers that make innovation at scale possible.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/SpeedRead_Aug.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/speed-read/the-speed-read-with-quentin-hardy-keep-it-simple/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Quentin Hardy</name><title>Head of Editorial, Google Cloud</title><department></department><company></company></author></item><item><title>How Worldline puts APIs at the heart of payments services</title><link>https://cloud.google.com/blog/products/api-management/how-worldline-puts-apis-at-the-heart-of-payments-services/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p><i><b>Editor’s note:</b>Today we hear from <a href="https://worldline.com/">Worldline</a>, a financial services organization that creates and operates digital platforms handling billions of critical transactions between companies, partners, 
and customers every year. In this post, Worldline head of alliances and partnerships Michaël Petiot and head of API platform support Tanja Foing explain how APIs and API management enable this €2.3 billion enterprise to offer its services to partners in a wide variety of industries.</i></p><p><a href="https://worldline.com/">Worldline</a> is the European leader in the payment and transactional services industry, with activities organized around three axes: merchant services, financial services including <a href="https://equensworldline.com/">equensWorldline</a>, and mobility and e-transactional services. In order to be more agile, we’re undergoing a transformation in how we work internally and with our partners, putting APIs at the heart of how we’re connecting with everyone.</p><p><b>Leveraging APIs for third-party collaboration<br/></b>Like most companies, Worldline collaborates more and more with third parties to deliver the products and services our customers expect. We want to move faster, and open up our platforms to partners who can develop new use cases in payments and customer engagement. To meet evolving technology, business, and regulatory demands for connecting our ecosystem of partners and developers, we needed a robust API platform. It was especially important to us that third parties could connect easily and securely to our platform. </p><p>We chose Google Cloud’s <a href="https://cloud.google.com/apigee/">Apigee</a> API management platform as our company-wide standard. Initially, we leaned toward an open source tool, but Apigee won us over, thanks to its complete feature set, available right out of the box. The Apigee security and analytics features are particularly important to us because of our collaboration with banking and fintech customers and partners. </p><p><b>Developing bespoke customer solutions<br/></b>Our first three API use cases are digital banking, connected cars, and an internal developer platform.
</p><p>Banks need their data to be properly categorized and highly secure, and Apigee gives us the tools to provide the right environment for them. Leveraging Apigee, our <a href="https://worldline.com/en/home/solutions/financial-services-equensworldline/m-digital-banking-platform.html">digital banking solution</a> offers a dedicated developer portal for our customers in a separate environment. It has its own architecture to access back-end services as well. With functionality ranging from trusted authentication to contract completion, payments, and contact management, Worldline digital banking customers can tap into APIs to interact with us at every stage. </p><p>An important trend in transport and logistics is the integration of real-time data with third parties. Our <a href="https://worldline.com/en/home/solutions/mobility-and-e-transactional-services/connected-living-solutions/connected-car.html">Connected Car</a> offering is a white-label solution that provides APIs for a car manufacturer’s fleet of cars. This offering enables fleet owners to exchange data with their entire ecosystem. It also offers a relatively closed environment with a limited number of developers accessing it, and we expose these APIs via the Apigee gateway. We use Apigee analytics features to track how the APIs are used and how they’re performing, and then make changes as needed. </p><p>Our third use case is internal; we’re building a developer portal in order to make APIs easier to access and quicker to deploy.</p><p>Our partner ecosystem includes lessors, insurance companies, repair shops, logistics companies and end-users. 
Everyone benefits from advanced APIs for real-time secure exchanges, combined with open-exchange protocols such as the Remote Fleet Management Systems standard (used by truck manufacturers) in order to provide the best service to customers.</p><p>We recently presented to the Worldline product management community how we can scale up to a large portfolio of API solutions using Apigee as an accelerator. The presentation was a success, and it illustrates how we can leverage the platform as a tool for driving innovation throughout Worldline—and throughout our growing ecosystem of automotive and financial services customers.</p></div></div></body></html></description><pubDate>Thu, 29 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/api-management/how-worldline-puts-apis-at-the-heart-of-payments-services/</guid><category>Google Cloud Platform</category><category>Apigee</category><category>Financial Services</category><category>Customers</category><category>API Management</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_Financial_Services.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How Worldline puts APIs at the heart of payments services</title><description>Financial services organization Worldline shares how API management enables it to offer its services to partners in a wide variety of industries.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCP_Financial_Services.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/api-management/how-worldline-puts-apis-at-the-heart-of-payments-services/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Michael Petiot</name><title>Worldline</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tanja
Foing</name><title>Worldline</title><department></department><company></company></author></item><item><title>Using Google Cloud Speech-to-Text to transcribe your Twilio calls in real-time</title><link>https://cloud.google.com/blog/topics/partners/using-google-cloud-speech-to-text-to-transcribe-your-twilio-calls-in-real-time/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Developers have asked us how they can use Google Cloud’s Speech-to-Text to transcribe speech (especially phone audio) coming from <a href="https://www.twilio.com/">Twilio</a>, a leading cloud communications PaaS. We’re pleased to announce that it’s now easier than ever to integrate live call data with Google Cloud’s Speech-to-Text using Twilio’s Media Streams.</p><p>The new TwiML <i>&lt;stream&gt;</i> command streams call audio to a websocket server. This makes it simple to move your call audio from your business phone system into an AI platform that can transcribe that data in real time and use it for use cases like helping contact center agents and admins, as well as store it for later analysis. </p><p>When you combine this new functionality with Google Cloud’s Speech-to-Text abilities and other infrastructure and analytics tools like BigQuery, you can create an extremely scalable, reliable and accurate way of getting more value from your audio.</p><h2>Architecture</h2><p>The overall architecture for creating this flow looks something like what you see below. Twilio creates and manages the inbound phone number. Their new Stream command takes the audio from an incoming phone call and sends it to a configured websocket which runs on a simple App Engine flexible environment. From there, sending the audio along as it comes to Cloud Speech-to-Text is not very challenging. 
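As a rough sketch of that hand-off (assuming the JSON message shape that Twilio's Media Streams documentation describes, with base64-encoded 8 kHz mulaw audio in the media.payload field; the function name here is hypothetical):

```javascript
// Sketch: decode an incoming Twilio Media Streams websocket message and
// hand the raw audio bytes to a consumer, e.g. a Cloud Speech-to-Text
// streaming request configured with encoding MULAW at 8000 Hz.
function handleTwilioMessage(raw, pushAudio) {
  const msg = JSON.parse(raw);
  if (msg.event === 'media') {
    // media.payload is base64-encoded mulaw audio from the call.
    pushAudio(Buffer.from(msg.media.payload, 'base64'));
  }
  return msg.event; // 'connected', 'start', 'media', or 'stop'
}
```

Each decoded chunk would then be written to the Speech-to-Text streaming request, and a 'stop' event signals the end of the call.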
Once a transcript is created, it’s stored in BigQuery where real-time analysis can be performed.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="twilio overall architecture.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/twilio_overall_architecture.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><h2>Configuring your phone number</h2><p>Once you’ve <a href="https://www.twilio.com/login?g=/console/phone-numbers/search?&amp;t=a3134facff1edad5ee8c40d35c3a85606f3a8f8a2dfeb64c6f4aedcf3f06da20">bought a number</a> in Twilio, you’ll need to configure your phone number to respond with <a href="https://www.twilio.com/docs/voice/twiml">TwiML</a>, which stands for Twilio Markup Language. It’s a tag-based language much like HTML; when a call comes in, Twilio hands control to a handler you configure, which responds with the TwiML you provide.</p><p>Next, navigate to your list of <a href="https://www.twilio.com/console/phone-numbers/incoming">phone numbers</a> and choose your new number. On the number settings screen, scroll down to the <b>Voice</b> section. There is a field labelled “A Call Comes In”. Here, choose <b>TwiML Bin</b> from the drop-down and press the plus button next to the field to create a new TwiML Bin.</p><h2>Creating a TwiML Bin</h2><p><a href="https://www.twilio.com/docs/runtime/tutorials/twiml-bins">TwiML Bins</a> are a serverless solution that can seamlessly host TwiML instructions. Using a TwiML Bin prevents you from needing to set up a webhook handler in your own web-hosted environment.</p><p>Give your TwiML Bin a Friendly Name that you can remember later.
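For reference, a minimal TwiML Bin body for this flow might look like the sketch below (this assumes the asynchronous &lt;Start&gt;-wrapped form of &lt;Stream&gt;; the wss:// URL and the phone number are placeholders):

```xml
<Response>
  <Start>
    <Stream url="wss://YOUR-PROJECT.appspot.com/" />
  </Start>
  <Dial>+15551234567</Dial>
</Response>
```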
In the <b>Body</b> field, enter the following code, replacing the url attribute of the &lt;Stream&gt; tag and the phone number contained in the body of the &lt;Dial&gt; tag.</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>The <a href="https://www.twilio.com/docs/voice/twiml/stream">&lt;Stream&gt;</a> tag starts the audio stream asynchronously and then control moves onto the &lt;Dial&gt; verb. &lt;Dial&gt; will call that number. The audio stream will end when the call is completed.</p><p>Save your TwiML Bin and make sure that you see your Friendly Name in the “A Call Comes In“ drop down next to TwiML Bin. Make sure to <b>Save</b> your phone number.</p><h2>Setup in Google Cloud</h2><p>This setup can either be done in an existing Google Cloud project or a new project. To set up a new project, follow the instructions <a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects">here</a>. Once you have the project selected that you want to work in, you’ll need to set up a few key things before getting started:</p><ul><li><p>Enable APIs for Google Speech-to-Text. You can do that by following the instructions <a href="https://cloud.google.com/endpoints/docs/openapi/enable-api">here</a> and searching for “Cloud Speech-to-Text API”.</p></li><li><p><a href="https://cloud.google.com/iam/docs/creating-managing-service-accounts">Create</a> a service account for your App Engine flexible environment to utilize when accessing other Google Cloud services. 
You’ll need to download the private key as a JSON file as well.</p></li><li><p>Add firewall rules to allow your App Engine flexible environment to accept incoming connections for the websocket. A command like the following should work from a gcloud-enabled terminal:</p><pre><code>gcloud compute firewall-rules create default-allow-websockets-8080 --allow tcp:8080 --target-tags websocket --description "Allow websocket traffic on port 8080"</code></pre></li></ul><h2>App Engine flexible environment setup</h2><p>For the App Engine application, we will be taking the sample code from Twilio’s repository to create a simple node.js websocket server. You can find the GitHub page <a href="https://github.com/twilio/programmable-media-streams/tree/master/node/realtime-transcriptions">here</a> with instructions on environment setup. Once the code is in your project folder, you’ll need to do a few more things to deploy your application:</p><ul><li><p>Take the service account JSON key you downloaded earlier, rename it to “google_creds.json”, and place it in the same directory as the node.js code.</p></li><li><p>Create an app.yaml file that looks like the following:</p><pre><code>runtime: nodejs
env: flex
manual_scaling:
  instances: 1
network:
  instance_tag: websocket</code></pre></li></ul></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="App Engine flexible environment setup.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/App_Engine_flexible_environment_setup.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Once these two items are in order, you will be able to deploy your application with the command:</p><p><b><i>gcloud app deploy</i></b></p><p>Once deployed, you can tail
the console logs with the command:</p><p><b><i>gcloud app logs tail -s default</i></b></p><h2>Verifying your stream is working</h2><p>Call your Twilio number, and you should immediately be connected with the number specified in your TwiML. You should see a websocket connection request made to the url specified in the &lt;Stream&gt;. Your websocket should immediately start receiving messages. If you are tailing the logs in the console, the application will log the intermediate messages as well as any final utterances detected by Google Cloud’s Speech-to-Text API.</p><h2>Writing transcriptions to BigQuery</h2><p>In order to analyze the transcripts later, we can create a BigQuery table and modify the sample code from Twilio to write to that table. Instructions for creating a new BigQuery table can be found <a href="https://cloud.google.com/bigquery/docs/tables">here</a>. Given the way Google Speech-to-Text creates transcription results, a potential schema for the table might look like the following.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Writing transcriptions to BigQuery.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Writing_transcriptions_to_BigQuery.max-1000x1000.jpg"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Once a table like this exists, you can modify the Twilio sample code to also stream data to the BigQuery table using sample code found <a href="https://github.com/googleapis/nodejs-bigquery/blob/master/samples/insertRowsAsStream.js">here</a>.</p><h2>Conclusion</h2><p>Twilio’s new <i>Stream</i> function allows users to quickly make use of the real time audio that is moving through their phone systems. Paired with Google Cloud, that data can be transcribed in real time and passed on to numerous other applications. 
This ability to get high quality transcription in real time can benefit businesses—from helping contact center agents document and understand phone calls, to analyzing data from the transcripts of those calls. </p><p>To learn more about Cloud Speech-to-Text, <a href="https://cloud.google.com/speech-to-text/">visit our website</a>.</p></div></div></body></html></description><pubDate>Wed, 28 Aug 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/partners/using-google-cloud-speech-to-text-to-transcribe-your-twilio-calls-in-real-time/</guid><category>AI & Machine Learning</category><category>Google Cloud Platform</category><category>Partners</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Using Google Cloud Speech-to-Text to transcribe your Twilio calls in real-time</title><description>It’s now easier than ever to integrate live call data with Google Cloud’s Speech-to-Text using Twilio’s Media Streams.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/partners/using-google-cloud-speech-to-text-to-transcribe-your-twilio-calls-in-real-time/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mark Shalda</name><title>Technical Program Manager & ML Partner Engineering Lead</title><department></department><company></company></author></item><item><title>Spot slow MySQL queries fast with Stackdriver Monitoring</title><link>https://cloud.google.com/blog/products/management-tools/spot-slow-mysql-queries-fast-with-stackdriver-monitoring/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>When you’re serving customers online, speed is essential for a good experience. As the amount of data in a database grows, queries that used to be fast can slow down. 
For example, if a query has to scan every row because a table is missing an index, response times that were acceptable with a thousand rows can turn into multiple seconds of waiting once you have a million rows. If this query is executed every time a user loads your web page, their browsing experience will slow to a crawl, causing user frustration. Slow queries can also impact automated jobs, causing them to time out before completion. If there are too many of these slow queries executing at once, the database can even run out of connections, causing all new queries, slow or fast, to fail. </p><p>The popular open-source databases MySQL and <a href="https://cloud.google.com/">Google Cloud Platform</a>'s fully managed version, <a href="http://cloud.google.com/sql">Cloud SQL for MySQL</a>, include a feature to log slow queries, letting you find the cause, then optimize for better performance. However, developers and database administrators typically only access this slow query log reactively, after users have seen the effects and escalated the performance degradation.</p>With <a href="https://cloud.google.com/logging/">Stackdriver Logging</a> and <a href="https://cloud.google.com/monitoring/">Monitoring</a>, you can stay ahead of the curve for database performance with automatic alerts when query latency goes over the threshold, and a monitoring dashboard that lets you quickly pinpoint the specific queries causing the slowdown.</div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Architecture for monitoring MySQ.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Architecture_for_monitoring_MySQ.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Architecture for monitoring MySQL slow query logs with 
Stackdriver</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>To get started, import MySQL's slow query log into Stackdriver Logging. Once the logs are in Stackdriver, it's straightforward to set up <a href="https://cloud.google.com/logging/docs/logs-based-metrics/">logs-based metrics</a> that can both count the number of slow queries over time, which is useful for setting up appropriate alerts, and also provide breakdowns by slow SQL statement, allowing speedy troubleshooting. What's more, this approach works equally well for managed databases in Cloud SQL for MySQL and for self-managed MySQL databases hosted on Compute Engine. </p>For a step-by-step tutorial to set up slow query monitoring, check out <a href="https://cloud.google.com/community/tutorials/stackdriver-monitor-slow-query-mysql">Monitoring slow queries in MySQL with Stackdriver</a>. For more ideas about what else you can accomplish with Stackdriver Logging, check out <a href="https://cloud.google.com/solutions/design-patterns-for-exporting-stackdriver-logging">Design patterns for exporting Stackdriver Logging</a>.</div></div></body></html></description><pubDate>Wed, 28 Aug 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/management-tools/spot-slow-mysql-queries-fast-with-stackdriver-monitoring/</guid><category>Google Cloud Platform</category><category>Management Tools</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_Management-Tools.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Spot slow MySQL queries fast with Stackdriver Monitoring</title><description>Use Stackdriver Monitoring and Logging to quickly see why your MySQL or CloudSQL for MySQL queries are running 
slowly.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_Management-Tools.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/management-tools/spot-slow-mysql-queries-fast-with-stackdriver-monitoring/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jani Patokallio</name><title>Solutions Architect</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jungwoon Lee</name><title>Customer Engineer</title><department></department><company></company></author></item><item><title>What’s happening in BigQuery: Adding speed and flexibility with 10x streaming quota, Cloud SQL federation and more</title><link>https://cloud.google.com/blog/products/data-analytics/whats-happening-bigquery-adding-speed-and-flexibility-10x-streaming-quota-cloud-sql-federation-and-more/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>We’ve been busy this summer releasing new features for BigQuery, Google Cloud’s petabyte-scale data warehouse. BigQuery lets you ingest and analyze data quickly and with high availability, so you can find new insights, trends, and predictions to efficiently run your business. Our Google Cloud engineering team is continually making improvements to BigQuery to accelerate your time to value. </p><p>Recently added BigQuery features include a newly built back end with 10x the streaming quota, the ability to query live from Cloud SQL datasets, and the ability to run your existing TensorFlow models in BigQuery. 
These new features are designed to help you stream, analyze, and model more data faster, with more flexibility.</p><p>Read on to learn more about these new capabilities and get quick demos and tutorial links so you can try these features yourself.</p><h2>10x BigQuery streaming quota, now in beta</h2><p>We know your data needs to move faster than your business, so we’re always working on adding efficiency and speed. The BigQuery team has completely redesigned the streaming back end to increase the <a href="https://cloud.google.com/bigquery/quotas#streaming_inserts">default Streaming API quota</a> by a factor of 10, from 100,000 to 1,000,000 rows per second per project. The default quota for maximum bytes per second has also increased, from 100MB per table to 1GB per project and there are now no table-level limitations. This means you get greater capacity and better performance for your streaming workloads like IoT and more. </p><p>There’s no change to the current streaming API. You can choose whether you’d like to use this new streaming back end by filling out this <a href="https://docs.google.com/forms/d/1BpoUfWkHXxgl2m41PnSuufgiN2qvyhBfqDZRRZ9EX5E/">form</a>. If you use the new back end, you won’t have to change your BigQuery API code, since the new back end uses the same <a href="https://cloud.google.com/bigquery/streaming-data-into-bigquery">BigQuery Streaming API</a>. </p><p>Note that this quota increase is only applicable if you don’t need the <a href="https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency">best effort deduplication</a> that’s offered by the current streaming back end. 
You opt out of deduplication by not populating the insertId field for each row inserted when calling the streaming API.</p><p>Check out this demo from Google Cloud Next ‘19 to see data stream 20 GB per second from simulated IoT sensors into BigQuery.</p></div></div><div class="block-video"><div class="article-module article-video "><figure><a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-eOQ3YJKgvHE-" href="https://youtube.com/watch?v=eOQ3YJKgvHE"><img alt="BigQuery co-founder, Jordan Tigani, describes how today’s enterprise demands from data go far beyond the capabilities of traditional data warehousing. Leaders want to make real-time decisions from fresh information even while that data is growing rapidly. Companies can no longer analyze only what happened yesterday, they need to be able to make future predictions. Cruise Automation will share how they are using BigQuery to get answers to problems that could not be solved in traditional data warehouses. Jordan will also demonstrate some of the latest BigQuery features that will make you rethink what a data warehouse can be and how it can help you focus on the analytics instead of worrying about the infrastructure."
src="//img.youtube.com/vi/eOQ3YJKgvHE/maxresdefault.jpg"/><svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"><use xlink:href="#mi-youtube-icon"></use></svg></a></figure></div><div class="h-c-modal--video" data-glue-modal="uni-modal-eOQ3YJKgvHE-" data-glue-modal-close-label="Close Dialog"><a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="eOQ3YJKgvHE" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=eOQ3YJKgvHE" ng-cloak=""></a></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Check out the documentation for more on <a href="https://cloud.google.com/bigquery/streaming-data-into-bigquery">Streaming data into BigQuery</a>.</p><h2>Query Cloud SQL from BigQuery</h2><p>Data can only create value for your business when you put it to work, and businesses need secure and easy-to-use methods to explore and manage data that is stored in multiple locations. Within Google Cloud, we use our database tools and services to power what we do, including offering new <a href="https://google.qwiklabs.com/">Qwiklabs</a> and <a href="https://cloud.google.com/training/">courses</a> each month. Internally, we manage the roadmap of new releases with a <a href="https://cloud.google.com/sql/docs/">Cloud SQL</a> back end. We then have an hourly Cloud Composer job that pipes our Cloud SQL transactional data from Cloud SQL into BigQuery for reporting. Such periodic export carries considerable overhead and the drawback that reports reflect data that is an hour old. 
This is a common challenge for enterprise business intelligence teams who want quicker insights from their transactional systems. </p><p>To avoid the overhead of periodic exports and increase the timeliness of your reports, we have expanded support for <a href="https://cloud.google.com/bigquery/external-data-sources">federated queries</a> to include Cloud SQL. You can now query your Cloud SQL tables and views directly from BigQuery through a <a href="https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries">federated Cloud SQL connection</a> (no more moving or copying data). Our curriculum dashboards now run on live data with one simple <a href="http://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#federated_query_syntax">EXTERNAL_QUERY()</a> instead of a complex hourly pipeline. This new connection feature supports both MySQL (second generation) and PostgreSQL instances in Cloud SQL. </p><p>After the initial one-time setup, you can write a query with the new SQL function <a href="http://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#federated_query_syntax">EXTERNAL_QUERY()</a>. Here’s an example where we join existing customer data from BigQuery against the latest orders from our transactional system in Cloud SQL in one query:<br/></p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Note the cross database JOIN on rq.customer_id = c.customer_id. BigQuery actively connects to Cloud SQL to get the latest order data. </p><p>Getting live data from Cloud SQL federated in BigQuery means you will always have the latest data for reporting. 
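A sketch of what such a federated join might look like (the dataset, connection ID, and non-key column names here are hypothetical; the join key rq.customer_id = c.customer_id is the one described above):

```sql
-- Hypothetical dataset and connection names; the join key matches
-- the example discussed in the text.
SELECT
  c.customer_id,
  c.name,
  rq.order_id,
  rq.order_date
FROM my_dataset.customers AS c
JOIN EXTERNAL_QUERY(
  'my-project.us.my_cloudsql_connection',
  'SELECT customer_id, order_id, order_date FROM orders'
) AS rq
  ON rq.customer_id = c.customer_id
```

The second argument is executed as-is by the Cloud SQL instance, and only its result set is pulled into BigQuery for the join.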
This can save teams time, bring the latest data faster, and open up analytics possibilities. We hear from customers that they are seeing the benefits of immediate querying, too.</p><p>"Our data is spread across Cloud SQL and BigQuery. We had to maintain and monitor extract jobs to copy Cloud SQL data into BigQuery for analysis, and data was only as fresh as the last run,” says Zahi Karam, director of data science at Bluecore. “With Cloud SQL Federation, we can use BigQuery to run analysis across live data in both systems, ensuring that we're always getting the freshest view of our data. Additionally, we can securely enable less technical analysts to query Cloud SQL via BigQuery without having to set up additional connections."</p><p>Take a look at the demo for more:<br/></p></div></div><div class="block-video"><div class="article-module article-video "><figure><a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-K8A6_G3DTTs-" href="https://youtube.com/watch?v=K8A6_G3DTTs"><img alt="This demo shows how to run a federated query from BigQuery against Cloud SQL. This feature uses the new EXTERNAL_QUERY function to pass a SQL query to the underlying MySQL or Postgres database in Cloud SQL." 
src="//img.youtube.com/vi/K8A6_G3DTTs/maxresdefault.jpg"/><svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"><use xlink:href="#mi-youtube-icon"></use></svg></a></figure></div><div class="h-c-modal--video" data-glue-modal="uni-modal-K8A6_G3DTTs-" data-glue-modal-close-label="Close Dialog"><a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="K8A6_G3DTTs" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=K8A6_G3DTTs" ng-cloak=""></a></div></div><div class="block-paragraph"><div class="rich-text"><p>Check out the documentation to learn more about <a href="https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries">Cloud SQL federated queries from BigQuery</a>.</p><h2>BigQuery ML: Import TensorFlow models </h2><p>Machine learning can do lots of cool things for your business, but it needs to be easy and fast for users. For example, say your data science teams have created a couple of models and they need your help to make quick batch predictions on new data arriving in BigQuery. With new BigQuery ML <a href="https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models">Tensorflow prediction support</a>, you can import and make batch predictions using your existing TensorFlow models on your BigQuery tables, using familiar BQML syntax. 
Here’s an example.</p><p>First, we’ll import the model from our project bucket:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Then we can quickly run batch predictions with the familiar BigQuery ML syntax:</p></div></div><div class="block-code"><div class="article-module h-c-page"><div class="h-c-grid uni-paragraph-wrap"><div class="uni-paragraph h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"><pre><code></code></pre></div></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Want to run batch predictions at regular intervals as new data comes in? Simply set up a <a href="https://cloud.google.com/bigquery/docs/scheduling-queries">scheduled query</a> to pull the latest data and also make the prediction. And as we highlighted in a previous post, scheduled queries can run as frequently as every <a href="https://cloud.google.com/blog/products/data-analytics/new-persistent-user-defined-functions-increased-concurrency-limits-gis-and-encryption-functions-and-more">15 minutes</a>.</p><p>Check out the <a href="https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models">BigQuery ML TensorFlow User Guide</a> for more.</p><h2>Automatic re-clustering now available </h2><p>Efficiency is essential when you’re crunching through huge datasets. One key best practice for cost and performance optimization in BigQuery is table <a href="https://cloud.google.com/bigquery/docs/partitioned-tables">partitioning</a> and <a href="https://cloud.google.com/bigquery/docs/clustered-tables">clustering</a>.
As new data is added to your partitioned tables, it may get written into an active partition and need to be periodically re-clustered for better performance. Traditionally, other data warehouse processes like “<a href="https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html">VACUUM</a>” and “<a href="https://docs.snowflake.net/manuals/user-guide/tables-auto-reclustering.html">automatic clustering</a>” require setup and incur costs for the user. BigQuery now <a href="https://cloud.google.com/bigquery/docs/clustered-tables#automatic_re-clustering">automatically re-clusters</a> your data for you at no additional cost and with no action needed on your part.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="Automatic re-clustering now available.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Automatic_re-clustering_now_available.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Check out our recent blog post <a href="https://cloud.google.com/blog/products/data-analytics/skip-the-maintenance-speed-up-queries-with-bigquerys-clustering">Skip the maintenance, speed up queries with BigQuery's clustering</a> for a detailed walkthrough. And get more detail in the documentation: <a href="https://cloud.google.com/bigquery/docs/clustered-tables#automatic_re-clustering">automatic re-clustering</a>.</p><h2>UDF performance now faster</h2><p>If you run a query using <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions">JavaScript UDFs</a>, it’ll now take around a second less to execute, on average, due to speedier logic for initializing the JavaScript V8 Engine that BigQuery uses to compute UDFs.
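As a refresher, a JavaScript UDF is defined inline with LANGUAGE js; a minimal sketch (the function name and values are made up for illustration):

```sql
-- Minimal JavaScript UDF sketch.
CREATE TEMP FUNCTION greet(name STRING)
RETURNS STRING
LANGUAGE js AS """
  return 'Hello, ' + name + '!';
""";

SELECT greet('BigQuery') AS greeting;  -- returns "Hello, BigQuery!"
```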
Don’t forget you can <a href="https://cloud.google.com/blog/products/data-analytics/new-persistent-user-defined-functions-increased-concurrency-limits-gis-and-encryption-functions-and-more">persist and share your custom UDFs</a> with your team, as we highlighted in our last post. </p><h2>In case you missed it</h2><p>For more on all things BigQuery, check out these recent posts, videos and how-tos:</p><ul><li><p><a href="https://cloud.google.com/blog/products/data-analytics/skip-the-heavy-lifting-moving-redshift-to-bigquery-easily">Skip the heavy lifting: Moving Redshift to BigQuery easily</a></p></li><li><p><a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-bigquery-terraform-module">Introducing the BigQuery Terraform module</a></p></li><li><p><a href="https://towardsdatascience.com/clustering-4-000-stack-overflow-tags-with-bigquery-k-means-ef88f902574a">Clustering 4,000 Stack Overflow tags with BigQuery k-means</a></p></li><li><p><a href="https://medium.com/google-cloud/efficient-spatial-matching-in-bigquery-c4ddc6fb9f69">Efficient spatial matching in BigQuery</a></p></li><li><p>Lab series: <a href="https://www.qwiklabs.com/quests/55">BigQuery for data analysts </a></p></li><li><p><a href="https://cloud.google.com/blog/products/data-analytics/glidefinder-how-we-built-a-platform-on-google-cloud-that-can-monitor-wildfires">GlideFinder: How we built a platform on Google Cloud that can monitor wildfires</a></p></li><li><p><a href="https://cloud.google.com/blog/products/data-analytics/migrating-teradata-and-other-data-warehouses-to-bigquery">Migrating Teradata and other data warehouses to BigQuery</a></p></li><li><p><a href="https://cloud.google.com/blog/products/data-analytics/how-to-use-bigquery-ml-for-anomaly-detection">How to use BigQuery ML for anomaly detection</a></p></li><li><p><a href="https://github.com/GoogleCloudPlatform/bigquery-utils">BigQuery shared utilities GitHub library (scripts, UDFs)</a></p></li></ul><p>To keep up on 
what’s new with BigQuery, subscribe to our <a href="https://cloud.google.com/bigquery/docs/release-notes">release notes</a> and stay tuned to the blog for news and announcements. And <a href="https://twitter.com/gcpcloud?lang=en">let us know</a> how else we can help.</p></div></div></body></html></description><pubDate>Wed, 28 Aug 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/whats-happening-bigquery-adding-speed-and-flexibility-10x-streaming-quota-cloud-sql-federation-and-more/</guid><category>Google Cloud Platform</category><category>BigQuery</category><category>Data Analytics</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_BigQuery.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s happening in BigQuery: Adding speed and flexibility with 10x streaming quota, Cloud SQL federation and more</title><description>The latest updates for Google Cloud’s BigQuery data warehouse include a streaming quota increase, automatic re-clustering, and lots more features.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_BigQuery.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/whats-happening-bigquery-adding-speed-and-flexibility-10x-streaming-quota-cloud-sql-federation-and-more/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Evan Jones</name><title>Technical Curriculum Developer, Google Cloud</title><department></department><company></company></author></item><item><title>Music to their ears: microservices on GKE, Preemptible VMs improved Musiio’s efficiency by 7000%</title><link>https://cloud.google.com/blog/products/containers-kubernetes/microservices-on-gke-preemptible-vms-improved-musiios-efficiency-by-7000/</link><description><html><head></head><body><div class="block-paragraph"><div 
class="rich-text"><p><i><b>Editor’s note:</b> Advanced AI startup Musiio, the first ever VC-funded music tech company in Singapore, needed more robust infrastructure for the data pipeline it uses to ingest and analyze new music. Moving to Google Kubernetes Engine gave them the reliability they needed; rearchitecting their application as a series of microservices running on Preemptible VMs gave them new levels of efficiency and helped to control their costs. Read on to hear how they did it.</i></p><p>At <a href="https://www.musiio.com/home">Musiio</a> we’ve built an AI that ‘listens’ to music tracks to recognize thousands of characteristics and features from them. This allows us to create highly accurate tags, allow users to search based on musical features, and automatically create personalized playlists. We do this by indexing, classifying and ultimately making searchable new music as it gets created—to the tune of about 40,000 tracks each day for one major streaming provider.</p><p>But for this technology to work at scale, we first need to efficiently scan tens of millions of digital audio files, which represent terabytes upon terabytes of data. </p><p>In Musiio’s early days, we built a container-based pipeline in the cloud orchestrated by Kubernetes, organized around a few relatively heavy services. This approach had multiple issues, including low throughput, poor reliability and high costs. Nor could we run our containers with a high node-CPU utilization for an extended period of time; the nodes would fail or time out and become unresponsive. 
That made it almost impossible to diagnose the problem or resume the task, so we’d have to restart the scans.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="musiio initial platform architecture.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/musiio_initial_platform_architecture.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Figure 1: Our initial platform architecture.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>As a part of reengineering our architecture, we decided to experiment with <a href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a> (GKE) on <a href="https://cloud.google.com/">Google Cloud Platform</a> (GCP). We quickly discovered some important advantages that allowed us to improve performance and better manage our costs: </p><ul><li><b>GKE reliability</b>: We were very impressed by GKE’s reliability, as we were able to run the nodes at &gt;90% CPU load for hours without any issues. On our previous provider, the nodes could not take a high CPU load and would often become unreachable.</li><li><b>Preemptible VMs and GPUs</b>: GKE supports both <a href="https://cloud.google.com/preemptible-vms/">Preemptible VMs</a> and <a href="https://cloud.google.com/compute/docs/gpus/#preemptible_with_gpu">GPUs on preemptible instances</a>. Preemptible VMs only last up to 24 hours but in exchange are up to 80% cheaper than regular compute instances; attached GPUs are also discounted. They can be reclaimed by GCP at any time during these 24 hours (along with any attached GPUs). However, reclaimed VMs do not disappear without warning. GCP sends a signal 30 seconds in advance, so your code has time to react. 
</li></ul><p>We wanted to take advantage of GKE’s improved performance and reliability, plus lower costs with preemptible resources. To do so, though, we needed to implement some simple changes to our architecture. </p><h2>Building a microservices-based pipeline</h2><p>To start, we redesigned our architecture to use lightweight microservices, and to follow one of the most important principles of software engineering: keep it simple. Our goal was that no single step in our pipeline would take more than 15 seconds, and that we could automatically resume any job wherever it left off. To achieve this we mainly relied on three GCP services:</p><ol><li><p><a href="https://cloud.google.com/pubsub/docs/overview">Google Cloud Pub/Sub</a> to manage the task queue,</p></li><li><p><a href="https://cloud.google.com/storage/">Google Cloud Storage</a> to store the temporary intermediate results, taking advantage of its <a href="https://cloud.google.com/storage/docs/managing-lifecycles">object lifecycle management</a> to do automatic cleanup, and</p></li><li><p><a href="https://cloud.google.com/kubernetes-engine/">GKE</a> with preemptible nodes to run the code.</p></li></ol><p>Specifically, the new processing pipeline now consists of the following steps:</p><ol><li><p>Clients add new tasks through an exposed API endpoint.</p></li><li><p>The task is published to Cloud Pub/Sub and attached data is passed to a Cloud Storage bucket.</p></li><li><p>Each service pulls new tasks from the queue and reports its success status.</p></li><li><p>The final output is stored in a database and all intermediate data is discarded.</p></li></ol></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="musiio new improved architecture.jpg" 
src="https://storage.googleapis.com/gweb-cloudblog-publish/images/musiio_new_improved_architecture.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Figure 2: Our new improved architecture.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>While there are more components in our new architecture, they are all much less complex. Communication is done through a queue where each step of the pipeline reports its success status. Each sub-step takes less than 10 seconds and can easily and quickly resume from the previous state and with no data loss. </p><h2>How do Preemptible VMs fit in this picture?</h2><p>Using preemptible resources might seem like an odd choice for a mission-critical service, but because of our microservices design, we were able to use Preemptible VMs and GPUs without losing data or having to write elaborate retry code. Using Cloud Pub/Sub (see 2. above) allows us to store the state of the job in the queue itself. If a service is notified that a node has been preempted, it finishes the current task (which, by design, is always shorter than the 30-second notification time), and simply stops pulling new tasks. Individual services don't have to do anything else to manage potential interruptions. When the node is available again, services begin pulling tasks from the queue again, starting where they left off.</p><p>This new design means that preemptible nodes can be added, taken away, or exchanged for regular nodes without causing any noticeable interruption.</p><p>GKE’s <a href="https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler">Cluster Autoscaler</a> also works very well with preemptible instances. By combining the auto scaling features (which automatically replaces nodes that have been reclaimed) with node labels, we were able to achieve an architecture with &gt;99.9% availability that runs primarily on preemptible nodes. 
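A minimal sketch of this drain-on-preemption pattern, using an in-memory queue and a plain event flag in place of Cloud Pub/Sub and the GCE metadata server's preemption notice (all names here are illustrative):

```python
import queue
import threading


def run_worker(tasks: queue.Queue, preempted: threading.Event, handle):
    """Pull tasks until the queue drains or a preemption notice arrives.

    On GCE, `preempted` would be set by a watcher thread polling the
    instance metadata server; by design, each task finishes well inside
    the 30-second preemption warning.
    """
    results = []
    while not preempted.is_set():
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break  # queue drained; nothing left to do
        results.append(handle(task))  # finish the in-flight task
        tasks.task_done()
    return results
```

When the flag is set, the worker finishes its current task and simply stops pulling; a replacement node later resumes from the same queue, so no retry logic is needed in the services themselves.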
</p><h2>Finally... </h2><p>We did all this over the course of a month—one week for design, and three weeks for the implementation. Was it worth all this effort? Yes! </p><p>With these changes, we increased our throughput from 100,000 to 7 million tracks per week—and <b>at the same cost as before!</b> This is a <b>7000% increase</b> (!) in efficiency, and was a crucial step in making our business profitable. </p><p>Our goal as a company is to transform the way the music industry handles data and volume, and to make it efficient. With nearly 15 million songs being added to the global pool each year, access and accessibility are the new trend. Thanks to our new microservices architecture and the speed and reliability of Google Cloud, we are on our way to making this a reality. </p><p>Learn more about GKE on the <a href="https://cloud.google.com/kubernetes-engine">Google Cloud Platform website.</a></p></div></div></body></html></description><pubDate>Wed, 28 Aug 2019 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/microservices-on-gke-preemptible-vms-improved-musiios-efficiency-by-7000/</guid><category>Google Cloud Platform</category><category>Customers</category><category>Cloud Native</category><category>Containers & Kubernetes</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Containers.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Music to their ears: microservices on GKE, Preemptible VMs improved Musiio’s efficiency by 7000%</title><description>By using GKE and preemptible VMs on Google Cloud, Musiio was able to dramatically improve the efficiency of its microservices-based 
environment.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Containers.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/microservices-on-gke-preemptible-vms-improved-musiios-efficiency-by-7000/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aron Pettersson</name><title>CTO, Musiio</title><department></department><company></company></author></item><item><title>With great compute power: Rendering ‘Spider-Man: Far From Home’ on Google Cloud</title><link>https://cloud.google.com/blog/products/compute/luma-pictures-render-spider-man-far-from-home-on-google-cloud/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>In <i>Spider-Man: Far From Home</i>, Spidey leaves the friendly confines of New York City and goes on a school trip to Venice, Prague, Berlin and London (but not Paris). While working on the visual effects (VFX) for the film, Luma Pictures also left the comfort of its on-premises Los Angeles data center, moving its render pipeline to Google Cloud, where the movie’s Air and Fire Elemental characters (a.k.a., Cyclone and Molten Man) were generated.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="LumaPictures_SpiderMan_GCP_2.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/LumaPictures_SpiderMan_GCP_2.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Images provided by Luma Pictures.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>“This was remarkable,” said Michael Perdew, a VFX producer at Luma Pictures. Initially, Luma didn’t think the cloud would be a good fit for the latest Spider-Man. 
“The big technical challenge here was that both of these characters were simulations,” he said. Historically, simulations took too much CPU, bandwidth, and disk space to be rendered in a time- or cost-effective manner outside of a local compute farm. Syncing terabytes of cache data from on-premises to the cloud can take several hours if you have limited bandwidth. In addition, Luma hadn’t yet found a cloud-based file system that could support the massive compute clusters you need to render simulations.<br/></p><p>But this was a big job, and “we had to find a way to render more than our local farms could handle,” Perdew said. So they put their heads together and developed a workflow to make it work in the cloud. </p><p>As it turned out, the cloud was the perfect place for this project—specifically for Cyclone. In Google Cloud, Luma leveraged Compute Engine custom machine types with 96 cores and 128 GB of RAM, and paired them with a high-performance ZFS file system. Using up to 15,000 vCPUs, Luma could render shots of the cloud monster in as little as 90 minutes—compared with the 7 or 8 hours it would take on their local render farm. Time saved rendering in the cloud more than made up for time spent syncing data to Google Cloud. “We came out way ahead, actually,” Perdew said.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="LumaPictures_SpiderMan_GCP_3.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/LumaPictures_SpiderMan_GCP_3.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Images provided by Luma Pictures.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Leveraging the cloud also pushed Luma to get savvy with their workflow. 
By breaking up the Cyclone simulations into pieces, they could work around the clock—and around the world—tapping into the speed of our global fiber network that moves data around the planet. When the L.A. team slept, VFX artists in Luma’s Melbourne, Australia office tweaked animations and simulation settings, and triggered syncs to the cloud, getting the updated scenes ready for the L.A.-based FX and lighting teams. When L.A. artists arrived in the office the next morning, they could start the simulation jobs in Google Cloud, receiving data to review by lunchtime. <br/></p><p>In the end, Luma completed about 330 shots for <i>Spider-Man: Far From Home</i>—with about a third created in the cloud. In addition to creating Cyclone and Molten Man, Luma designed Spider-Man’s Night Monkey suit, created an elaborate CG environment for the Liberec Square in the Molten Man Battle scene, and collaborated on destruction FX in Mysterio’s lair sequence.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="LumaPictures_SpiderMan_GCP_0.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/LumaPictures_SpiderMan_GCP_0.max-1000x1000.jpg"/><figcaption class="article-image__caption "><div class="rich-text"><i>Images provided by Luma Pictures.</i></div></figcaption></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Now that Luma’s work on Spider-Man is done, the studio is ramping up to take advantage of other GCP features. For example, its artists use an in-house proprietary tool called Rill that automates the process of seeing updated character animations through full simulations and render. This tool is currently deployed on an on-prem Kubernetes cluster, which they are exploring migrating—as well as other tools—to Google Kubernetes Engine (GKE) in the cloud. 
“Having more day-to-day services in the cloud will have all kinds of reliability benefits,” Perdew said, for example, protecting them against the power outages that occasionally happen in Luma’s Santa Monica office.</p><p>Additionally, Luma will install a direct connection to the Google Cloud Los Angeles cloud region (which celebrated its one-year anniversary this summer) for future productions, more bandwidth, and reduced latency to Google Cloud. The team hopes this will open the door to all kinds of possibilities; for example, Perdew is excited to try out remote workstations. “The industry keeps on changing the type of computer you need per discipline to do good work,” he said. “Having the flexibility to upgrade and downgrade an individual artist on the fly…as a producer, that makes me giddy.” </p><p>Here at Google Cloud, we’re also giddy to have helped bring Spider Man’s latest adventure to the big screen. But with great (compute) power comes great responsibility—we’re working diligently to make Google Cloud a great place to render your upcoming production. 
To learn more about Google Cloud in the media and entertainment industry, swing on over to our <a href="https://cloud.google.com/solutions/media-entertainment/use-cases/rendering/">Rendering Solutions page</a>.</p></div></div></body></html></description><pubDate>Wed, 28 Aug 2019 11:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/compute/luma-pictures-render-spider-man-far-from-home-on-google-cloud/</guid><category>Customers</category><category>Media & Entertainment</category><category>Google Cloud Platform</category><category>Compute</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/luma_spider-man.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>With great compute power: Rendering ‘Spider-Man: Far From Home’ on Google Cloud</title><description>Luma Pictures relied on high-performance compute from Google Cloud to render scenes in Spider-Man: Far From Home.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/luma_spider-man.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/luma-pictures-render-spider-man-far-from-home-on-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Todd Prives</name><title>Product Manager, Cloud Rendering</title><department></department><company></company></author></item><item><title>Ruby support comes to App Engine standard environment</title><link>https://cloud.google.com/blog/products/application-development/ruby-support-comes-to-app-engine-standard-environment/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>We have some exciting news for <a href="https://cloud.google.com/appengine/">App Engine</a> customers. Ruby is now Beta on App Engine standard environment, in addition to being available on the App Engine flexible environment. 
Let's dive into what that means if you’re a technical practitioner running your apps on Google Cloud. </p><p>There are lots of technical reasons to choose App Engine standard vs. flexible environment (<a href="https://cloud.google.com/appengine/docs/the-appengine-environments">this link explains it if you are curious</a>), but at a high level, App Engine standard environment brings a number of benefits to developers. For many users the most noticeable change is a decrease in deployment time from 4-7 minutes on App Engine flexible environment down to 1-3 minutes on App Engine standard. App Engine standard environment also supports scale-to-zero so you don't have to pay for your website when no one is using it. Finally, start-up time for new instances is measured in seconds rather than minutes—App Engine standard environment is simply more responsive to changes in load. </p><p>Scale-to-zero has its advantages in terms of cost, but it also means that you’ll want a truly serverless background processing architecture. For that, Cloud Pub/Sub and Cloud Tasks are great solutions for handling background tasks, and they also operate on a pay-per-use model. </p>We expect most Ruby developers to choose App Engine standard environment over App Engine flexible environment. The faster deployment time and scale-to-zero features are a huge benefit to most development processes. And deploying an existing Rails app to App Engine standard environment is pretty straightforward. But as they say, <a href="http://www.thagomizer.com/blog/2019/08/20/app-engine-updates-for-rubyists.html">your mileage may vary</a>. 
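To sketch what that deployment involves, the app.yaml for a Rails app can be as small as this (the runtime name reflects the current Ruby beta, and the entrypoint is simply whatever command starts your server):

```yaml
runtime: ruby25
entrypoint: bundle exec rails server -p $PORT
```

From there, `gcloud app deploy` handles the rest.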
Look at the pros and cons in our <a href="https://cloud.google.com/appengine/docs/the-appengine-environments">documentation</a> to choose the right App Engine for your Ruby applications.<p></p></div></div></body></html></description><pubDate>Tue, 27 Aug 2019 17:30:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/application-development/ruby-support-comes-to-app-engine-standard-environment/</guid><category>Application Development</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Ruby support comes to App Engine standard environment</title><description>Support for Ruby is now in beta on the App Engine standard environment.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/application-development/ruby-support-comes-to-app-engine-standard-environment/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Morgan Hallmon</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Beyond the Map: A Q&A with engineering director Andrew Lookingbill</title><link>https://cloud.google.com/blog/products/maps-platform/beyond-map-q-engineering-director-andrew-lookingbill/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p>Last month we kicked off “Beyond the Map”, a series of blog posts giving you a closer look at how we build maps that keep up with the changing world and that power apps, experiences, and businesses around the globe. In <a href="https://cloud.google.com/blog/products/maps-platform/beyond-the-map-how-we-build-the-maps-that-power-your-apps-and-business">our first post</a>, you heard about the key areas of our mapmaking processes–imagery, authoritative third-party data, community contributions, and machine learning.</p><p>In our next installment we’ll dive deeper into how we use imagery to map the world and what that means for our customers. 
But before we do that, we sat down with the co-author of the first post, engineering director Andrew Lookingbill, to learn more about his passion for mapmaking, biggest technical challenge at Google, and advice he has for developers working on all kinds of problems. </p><p><b>Of all the companies to work for, why did you choose Google and why have you stayed at Google for more than a decade? <br/></b>My coworkers and I who joined the Google Street View team came to Google because of the scope of the team’s ambitions. It’s easy, now that Street View has been around for a while, to forget how cheerfully mind-blowing the charter was. Sure, let’s take pictures–of everything–and make it possible to visit anywhere on the planet. There was something very attractive about that mindset.</p><p><b>What exactly do you and your team do at Google? <br/></b>Today my team and I focus on the algorithms, infrastructure, and tools we use to create and maintain our model of the real world. This includes all the imagery and 3D models, as well as all of the semantic data like addresses, business information, roads, natural features, buildings, etc. It’s an awesome job both because of the breadth of technical work–everything from building hardware for cars, to working on ML algorithms that can help make our maps better just by looking at pictures–and the breadth of use cases of the data.</p><p><b>Not only have you been at Google for more than a decade, but you’ve been on the Geo team for all that time. Haven’t you gotten bored of mapping the world yet?<br/></b>Google has a wonderful culture of internal mobility, and the fact that I’ve stayed very close to the same team I joined on my first day makes me a bit unusual. Two things have kept me here. The first, unsurprisingly, is the group of people I work with. I’ve never met a more impressive and humble group. The second is the size of the challenge we work on and the impact we can have. 
The world’s a big place, and it’s changing constantly. Mapping it is a task that’s never “done” and as new use cases for the data keep being imagined by developers inside and outside of Google, it just keeps getting more interesting.</p><p><b>What’s the biggest technical challenge you’ve faced at Google? <br/></b>When we first launched a country’s worth of Google-created and curated map data, the set of technical challenges involved in swapping out map data across all of our systems Google-wide was probably the hardest, most ill-specified problem I’d ever worked on in my career up until that point, though it’s a class of problem I’ve gotten to work on several times since. When you swap out the set of data that systems were built on and optimized for, you find all sorts of situations where the code was overfit for the existing data, and subtle differences crop up in downstream systems. For example, if you launch much more detailed geometry for water bodies, various assumptions about the memory required will break, etc. Similarly, swapping all the data out at once, in our live services, so users aren’t impacted by strangeness caused by one service (say routing) using different data than another (say search) without anyone noticing was so closely akin to pulling the tablecloth off a fully set table that we had to stop using that analogy.</p><p><b>How about the most unusual, unexpected, or funny challenge? <br/></b>One of the things I love about my career is that when you do new things, you get new challenges. Early in the Street View project, we were covering the cameras at night to protect them from dew, etc. Turns out a low-tech solution worked wonderfully–socks! The only problem was that every once in a while, someone would forget to take the sock off before they started driving. 
In the end, the team implemented a “sock detector” image processing algorithm that would quickly give the driver a warning if it thought the driver was driving with the sock still in place. Street View cars today are far more sophisticated, and no socks are required, so the sock detector is no more. </p><p><b>What do you think the role of machine learning is in mapping the world? <br/></b>The role of machine learning in mapping is one of scale. Street View, processed and aligned aerial imagery, and satellite imagery are incredible because they allow a type of telepresence. You can glean information about a place in the world without actually physically being there, often enough to build a useful map. Machine learning has started to allow us to generate these insights without needing to, for instance, examine each Street View panorama for new business addresses. This in turn allows us to make useful maps for a much larger portion of the world’s population than would have been possible otherwise.</p><p><b>Have you ever driven a Street View car? What was it like?</b><br/>I did get a few opportunities to drive cars in the first fleet as we were building them. Even if we were just driving between buildings, it always attracted some attention, since cars with cameras strapped to the roof were a lot less common than they are today, even in Mountain View. I’ve definitely had a soft spot for Chevy Cobalts ever since. Funnily enough, part of our process for building out the cars involved removing the passenger seat to accommodate some hardware, so the extra seats tended to become de facto furniture in the building. Quite comfortable.</p><p><b>Back when Google launched Maps and Street View, it seemed like an audacious task. What advice do you have for engineers working on big ideas like these? <br/></b>Keep your eye on the forest and the trees. 
Breaking down an audacious goal into the component pieces that have to be built, and identifying metrics and tests to make sure you’re headed in the right direction are important. But periodically you need to reexamine the big picture, make sure you’re still on-track to hit your big goal, and that there aren’t other ways to get where you need to go.</p><p><b>Google Maps Platform has a wide spectrum of customers–from hobbyists to nonprofits to start-ups to Fortune 500 companies. And they’re all using our products in very different ways. What’s one tip you think can help any type of developer, working on any type of business or project? <br/></b>Talk to everyone. The teams I get to work with are inventive and happy to brainstorm about possible approaches. Especially early in your career, it can be daunting to come up against a problem it may take you days or weeks to even understand. Utilizing conversations with others to help make sense of it all and pressure-test ideas is one of the best things you can do to move past seemingly insurmountable obstacles. </p><p><b>What's the one thing about our maps data that you don’t think people know or think about?<br/></b>That the map is, in many ways, a living thing–not a static description of the world. Things change all the time. Neighborhoods are built, businesses change, and so on. That vibrancy means that our users are a huge part of keeping the map fresh and useful for themselves. Local Guides and any user who knows something about the world that we’re missing or have wrong, can report the problem and help themselves and others have a better experience using the product. These community contributions are reflected in our consumer product and also shared with Google Maps Platform customers. So both consumers and customers are getting the most up to date information about the world that we can offer. </p><p><b>What do you hope to accomplish next at Google?<br/></b>Keep mapping the world. 
As it moves faster, so will we.</p><p><i>For more information on Google Maps Platform, <a href="https://cloud.google.com/maps-platform/">visit our website</a>. </i></p></div></div></body></html></description><pubDate>Tue, 27 Aug 2019 16:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/maps-platform/beyond-map-q-engineering-director-andrew-lookingbill/</guid><category>Google Maps Platform</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/large-015-MAP-GOO1045-QandA-AndrewLookingbil.max-600x600.png" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Beyond the Map: A Q&A with engineering director Andrew Lookingbill</title><description>We sat down with engineering director, Andrew Lookingbill, to learn more about his passion for mapmaking, biggest technical challenge at Google, and advice he has for developers working on all kinds of problems.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/large-015-MAP-GOO1045-QandA-AndrewLookingbil.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/maps-platform/beyond-map-q-engineering-director-andrew-lookingbill/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Andrew Lookingbill</name><title>Engineering Director</title><department></department><company></company></author></item><item><title>Cloud Text-to-Speech expands its number of voices by nearly 70%, now covering 32 languages and variants</title><link>https://cloud.google.com/blog/products/ai-machine-learning/cloud-text-to-speech-expands-its-number-of-voices-now-covering-33-languages-and-variants/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p><i><b>Editor's Note:</b> We have updated this blog to accurately reflect supported languages and variants; Norwegian (Nynorsk) voices are not currently available.</i><br/></p><p>In 
February, we provided an <a href="https://cloud.google.com/blog/products/ai-machine-learning/making-ai-powered-speech-more-accessible-now-with-more-options-lower-prices-and-new-languages-and-voices">update</a> on how we’re expanding our support for new languages/variants and voices in <a href="https://cloud.google.com/text-to-speech/">Cloud Text-to-Speech</a>. Today, we’re adding to that progress by announcing:</p><ul><li>Voices in 11 new languages or variants, including Czech, English (India), Filipino, Finnish, Greek, Hindi, Hungarian, Indonesian, Mandarin Chinese (China), Modern Standard Arabic, and Vietnamese—bringing the list of total languages/variants available to 32. <p></p></li><li>76 new voices (now 187 in total) overall across all languages/variants, including 38 new <a href="https://deepmind.com/blog/wavenet-generative-model-raw-audio/">WaveNet</a> neural net-powered voices (now 95 in total). See the complete list <a href="https://cloud.google.com/text-to-speech/docs/voices">here</a>.<br/></li><li>Availability of at least one WaveNet voice in all 32 languages/variants.<br/></li></ul><p>With these updates, Cloud Text-to-Speech developers can now reach millions more people across numerous countries with their applications—with many more languages to come. 
This enables a broad range of use cases, including Contact Center AI virtual agents, interacting with IoT devices in cars and the home, and audio-enablement of books and other text-based content.<br/></p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="cloud text-to-speech languages.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/text-to-speech-regions.0873101016261842.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>Google Cloud Text-to-Speech runs on Google’s <a href="https://cloud.google.com/tpu/">Tensor Processing Units (TPUs)</a>—custom silicon chips that we designed from the ground up to accelerate machine learning and AI workloads. Our unique compute infrastructure, together with cutting-edge research, has allowed us to develop and deploy WaveNet voices much faster than is typical in the industry. 
Cloud Text-to-Speech launched a year and a half ago with 6 WaveNet voices in 1 language, and we now have 95 WaveNet voices in 32 languages and variants.</p><p>Among the major public cloud platforms, Cloud Text-to-Speech now offers the most languages/variants with “natural” (neural net-powered) voices, and the most voices overall:</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="cloud text-to-speech voices.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud_text-to-speech_voices_graph_29eoq0Y.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p><b>The WaveNet advantage<br/></b>When customers call into contact centers, use verbal commands with connected devices in cars or in their homes, or listen to audio conversions of text-based media, they increasingly expect a voice that sounds natural and human. Businesses that offer human-sounding voices deliver the best experiences for their customers, and if that experience can also be provided in numerous languages and countries, that advantage becomes global. </p><p>WaveNet in Cloud Text-to-Speech makes that advantage possible without the need for vast investments in developing your own AI-powered speech synthesis. Based on neural-net technology, WaveNet creates natural-sounding voices, closing the perceived quality gap between speech synthesis and human speech in US English by 70% per Mean Opinion Score. 
The practical impact is that for most listeners, a WaveNet voice makes human/computer interaction a smooth and familiar experience.</p></div></div><div class="block-image_full_width"><div class="article-module h-c-page"><div class="h-c-grid"><figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "><img alt="WaveNet cloud text-to-speech.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/WaveNet_cloud_text-to-speech.max-1000x1000.png"/></figure></div></div></div><div class="block-paragraph"><div class="rich-text"><p>The difference between a standard synthetic voice and a WaveNet one is pretty clear; just listen to some of the new voices for yourself:</p></div></div><div class="block-paragraph"><div class="rich-text"><p><b>English (India):</b><a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/En-in/en-in-Std.wav">Standard Voice</a> vs <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/En-in/en-in-WaveNet.wav">WaveNet Voice</a><br/><b>Hungarian</b>: <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/hu-hu/hu-hu-Std.wav">Standard Voice</a> vs <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/hu-hu/hu-hu-WaveNet.wav">WaveNet Voice</a><br/><b>Vietnamese</b>: <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/vietnamese/vi-vn-Std.wav">Standard Voice</a> vs <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/vietnamese/vi-vn-WaveNet.wav">WaveNet Voice</a><br/><b>Mandarin Chinese</b>: <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/cmn-cn/cmn-cn-Std.wav">Standard Voice</a> vs <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/cmn-cn/cmn-cn-WaveNet.wav">WaveNet Voice</a><br/><b>Japanese</b>: <a href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/ja-jp/ja-jp-Std.wav">Standard Voice</a> vs <a 
href="https://storage.googleapis.com/speech-docs/tts/Audio%20samples/ja-jp/ja-jp-WaveNet.wav">WaveNet Voice</a><br/></p></div></div><div class="block-paragraph"><div class="rich-text"><p>For a demo using text of your choosing, test-drive the <a href="https://cloud.google.com/text-to-speech/">example UI</a> we built using the Cloud Text-to-Speech API.</p><p><b>Next steps<br/></b>Cloud Text-to-Speech is free to use up to the first million characters processed by the API, so it’s easy to get started by building a simple test/demo app using your own data. We look forward to seeing what you build!</p></div></div></body></html></description><pubDate>Tue, 27 Aug 2019 15:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/cloud-text-to-speech-expands-its-number-of-voices-now-covering-33-languages-and-variants/</guid><category>Google Cloud Platform</category><category>AI & Machine Learning</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_Text-to-Speech.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cloud Text-to-Speech expands its number of voices by nearly 70%, now covering 32 languages and variants</title><description>With today’s updates, Cloud Text-to-Speech developers can now reach millions more people across numerous countries with their applications—with many more languages to come.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Cloud_Text-to-Speech.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/cloud-text-to-speech-expands-its-number-of-voices-now-covering-33-languages-and-variants/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dan Aharon</name><title>Product Manager, Speech</title><department></department><company></company></author></item><item><title>New report analyzes the future of 
workplace productivity</title><link>https://cloud.google.com/blog/products/productivity-collaboration/new-report-analyzes-the-future-of-workplace-productivity/</link><description><html><head></head><body><div class="block-paragraph"><div class="rich-text"><p><i>TL;DR: we examined the future of work in a recent report. <a href="https://cloud.google.com/make-it-work">Download and read the findings</a>. </i></p><p>Look at the contemporary business landscape, and it seems like everything has changed in just a short amount of time. </p><p>Today’s mid-career professional may have been in high school when the World Wide Web made the Internet a big commercial proposition. She likely started her career just before the dotcom bust, and, for nearly two decades, has witnessed the advent of big data, mobile, artificial intelligence, cloud computing, robotics, ecommerce, social media and more. Alongside the advent of these shifts in tech, the “office” has also transformed. From closed doors to cubicles to open plan, from typewriters to email to instant messaging, each transformation occurred in search of better information sharing and problem solving. </p><p>Yet while it’s true that the world has changed, our ambitions as workers have not. The same things we’ve always wanted to get out of work remain: </p><ul><li>To be able to work fast, with fewer mind-numbing hassles in our day.</li><li>To be able to work smart, with quick access to the best possible information and the sharpest expertise.</li><li>To be able to chase the best ideas, and get our work recognized and improved for maximum impact.</li></ul><p>While technology has increased the number of people we can connect with and how readily we can access new information, these opportunities can at times look like new challenges, especially if you rely on dated tools in the workplace. 
<a href="https://www.insight.com/content/dam/insight-web/en_US/pdfs/hbr/hbr-the-connected-workforce-report.pdf?utm_campaign=WREC_180601_Q2_ac1147_The%20Connected%20Workforce:%20Maximizing%20Productivity,%20Creativity%20and%20Profitability.02.Converted&amp;utm_source=marketo&amp;utm_medium=email&amp;utm_content=main-cta-button&amp;refcode&amp;mkt_tok=eyJpIjoiWW1NeU1tVm1ZVEE1TkRJeiIsInQiOiJ1NUg3b3ZcL3RsVVBkMitGY1BCUGkyYzBWSWVhcmQzZGMrMUhQN3N5Y2xncExCNFwvSHhtN1ZNN3o3TnlMbGZTWW53VVJyYVBLd1V2WTgzQ1VzR0FcL2RCc2FtaDNNMnRQTUZKazl2dVJNYmI5aGZqejNyOVhiVGZ2UFdhTFlcLzdGcjAifQ%3D%3D">Nearly four in 10</a> U.S.-based business and IT leaders say their current systems make it harder, not easier, for their employees to work quickly. It’s like being asked to make carbon paper copies, when the rest of the world was first on email. </p><p>Google’s <a href="https://cloud.google.com/make-it-work">latest report</a> on the future of work examines challenges such as this, and how businesses can change their tools, workflows, and cultures to improve productivity and encourage innovation in the modern workplace. </p><p>One of the interesting things about Google is that it was one of the first great companies to grow up assuming the internet as part of life. Consequently, this paved the way for the arrival of web-based email systems like Gmail, and productivity software to drive location-agnostic collaboration, like Google Drive or Docs. If you look at how these tools now incorporate advanced security and artificial intelligence for faster task execution, you’ll see a deep reflection of how work—and the world—has changed. 
People use these tools, however, because they meet human needs that have not changed.</p><p><a href="https://cloud.google.com/make-it-work">Click here</a> to download Google’s full report on the future of work, collaboration and productivity.</p></div></div></body></html></description><pubDate>Tue, 27 Aug 2019 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/productivity-collaboration/new-report-analyzes-the-future-of-workplace-productivity/</guid><category>G Suite</category><category>Chrome Enterprise</category><category>Drive</category><category>Gmail</category><category>Docs</category><category>Research</category><category>Productivity & Collaboration</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Beyond_Custom_Ink.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>New report analyzes the future of workplace productivity</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Beyond_Custom_Ink.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/productivity-collaboration/new-report-analyzes-the-future-of-workplace-productivity/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Quentin Hardy</name><title>Head of Editorial, Google Cloud</title><department></department><company></company></author></item></channel></rss>