Enhance! Super Resolution From Google | Two Minute Papers #124


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. What is super resolution? Super resolution is a process where our input
is a coarse, low resolution image, and the output is the same image, but now with more
details and in high resolution. We’ll also refer to this process as image
upscaling. And in this piece of work, we are interested
in performing single image super resolution, which means that no additional data is presented
to the algorithm that could help the process. Despite the incredible results seen in practically
any of the crime solving television shows out there, our intuition would perhaps say
that this problem, at first sight, sounds impossible. How could one mathematically fill in the details
when these details are completely unknown? Well, that’s only kind of true. Let’s not confuse super resolution with image
inpainting, where we essentially cut an entire part out of an image and try to replace it
leaning on our knowledge of the surroundings of the missing part. That’s a different problem. Here, the entirety of the image is known,
and the details require some enhancing. This particular method is not based on neural
networks, but is still a learning-based technique. The cool thing here is that we can use a
training dataset, that is, for all intents and purposes, arbitrarily large. We can just grab a high resolution image,
convert it to a lower resolution and we immediately have our hands on a training example for the
learning algorithm. These would be the before and after images,
if you will. And here, during learning, the image is subdivided
into small image patches, and buckets are created to aggregate the information between
patches that share similar features. These features include brightness, textures,
and the orientation of the edges. The technique looks at how the small and large
resolution images relate to each other when viewed through the lens of these features. Two remarkably interesting things arose from
this experiment: – one, it outperforms existing neural network-based
techniques, – two, it only uses 10 thousand images, and
one hour of training time, which, in the world of deep neural networks, is so little,
it’s completely unheard of. Insanity. Really, really well done. Some tricks are involved to keep the memory consumption low; the paper discusses how this is done, and there are also plenty of other details within, so make sure to have a look. As always, the paper is linked in the video description. It can either be run directly on the low resolution
image, or alternatively we can first run a cheap and naive decade-old upscaling algorithm,
and run this technique on this upscaled output to improve it. Note that super resolution is a remarkably competitive field of research: hundreds and hundreds of papers appear on this topic every year, and almost every single one of them seems to be miles ahead of the previous ones. In reality, most of
these methods have different weaknesses and strengths, and so far I haven’t seen any technique
that would be viable for universal use. To make sure that a large number of cases
is covered, the authors posted a sizeable supplementary document with comparisons. This gives so much more credence to the results. I am hoping to see a more widespread adoption
of this in future papers in this area. For now, when viewing websites, I feel that
we are close to the point where we could choose to transmit only the lower resolution images
through the network and perform super resolution on them locally on our phones and computers. This will lead to significant savings on network
bandwidth. We are living in amazing times indeed. If you are enjoying the series, make sure
to subscribe to the channel, or you can also pick up really cool perks on our Patreon page
through this icon here with the letter P. Thanks for watching and for your generous
support, and I’ll see you next time!
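
As a small aside for scholars who want to experiment: the "free training data" trick described in the video — take any high-resolution image and downscale it to manufacture a (low-resolution, high-resolution) training pair — can be sketched in a few lines of NumPy. This is a generic illustration using simple box-filter downsampling, not the paper's actual degradation pipeline:

```python
import numpy as np

def make_training_pair(hr_image: np.ndarray, factor: int = 2):
    """Turn one high-resolution grayscale image into a (low-res, high-res)
    training pair by box-filter downsampling (averaging factor x factor blocks)."""
    h, w = hr_image.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    hr = hr_image[:h, :w].astype(np.float64)
    lr = hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return lr, hr

# Any high-resolution image yields a training example "for free":
hr = np.arange(16.0).reshape(4, 4)
lr, hr_crop = make_training_pair(hr, factor=2)
print(lr.shape)  # (2, 2)
```

Since every high-resolution image you can get your hands on becomes a training example this way, the dataset really is, for all intents and purposes, arbitrarily large.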

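The patch-bucketing idea — grouping small patches by features such as brightness, texture, and edge orientation — can likewise be sketched with a gradient structure-tensor hash. All function names, thresholds, and bucket counts below are my own illustrative choices, loosely inspired by this family of gradient-based hashing techniques, not the paper's actual parameters:

```python
import numpy as np

def bucket_index(patch: np.ndarray,
                 n_angles: int = 8,
                 strength_thresholds=(8.0, 32.0),
                 coherence_thresholds=(0.25, 0.5)) -> int:
    """Hash a grayscale patch into one of n_angles * 3 * 3 buckets using the
    dominant edge orientation, edge strength, and coherence derived from the
    2x2 gradient structure tensor. Thresholds here are illustrative only."""
    gy, gx = np.gradient(patch.astype(np.float64))
    # Structure tensor entries, averaged over the patch.
    gxx, gyy, gxy = (gx * gx).mean(), (gy * gy).mean(), (gx * gy).mean()
    # Eigenvalues of [[gxx, gxy], [gxy, gyy]] give edge strength and anisotropy.
    tr, det = gxx + gyy, gxx * gyy - gxy * gxy
    disc = np.sqrt(max(tr * tr / 4.0 - det, 0.0))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc
    angle = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)   # dominant edge orientation
    strength = np.sqrt(max(l1, 0.0))
    s1, s2 = np.sqrt(max(l1, 0.0)), np.sqrt(max(l2, 0.0))
    coherence = (s1 - s2) / (s1 + s2 + 1e-12)        # 0 = isotropic, 1 = clean edge
    a = int((angle + np.pi / 2) / np.pi * n_angles) % n_angles
    s = int(np.digitize(strength, strength_thresholds))    # 0, 1, or 2
    c = int(np.digitize(coherence, coherence_thresholds))  # 0, 1, or 2
    return (a * 3 + s) * 3 + c

# Patches sharing similar edge features land in the same bucket:
patch = np.tile(np.arange(8.0), (8, 1))   # a strong vertical edge pattern
print(0 <= bucket_index(patch) < 8 * 3 * 3)  # prints True
```

During training, one learned filter per bucket can then map low-resolution patches to their high-resolution counterparts, which is why inference is so much cheaper than running a deep network.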
57 thoughts on “Enhance! Super Resolution From Google | Two Minute Papers #124”

  1. Another related blog post that might be worth including in the video description: https://www.blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/

  2. This is our very first episode made with Final Cut Pro. We hope there were no bigger glitches and that you've enjoyed it! 🙂

  3. The whole point of image compression is that you can choose what pieces of data you want to record. Scaling down, transmitting and using super resolution is less than ideal because you want more data describing the details than the plainer sections. JPEG already uses more bytes to describe the details, so it has an advantage here. Also, any predictable pattern in the compressed data means that bandwidth is being wasted; perfectly compressed data is indistinguishable from random noise if you don't know how to decompress it. Add to that the fact that JPEG is fast, so phone processors can use it.

    A file format that automatically fills gaps with plausible details would be a nightmare in court when people argue parts of an image were "refilled" to show something that never happened.

  4. Would it potentially work for 4 or 8k video or would that add too much potential for artifacts?

    About how long does it take to enhance a typical picture? Fractions of a second, seconds, minutes? If it takes minutes, it might take some time before it gets to video; I don't know how much specialized microarchitecture can make up for the ever slower increase in CPU cycles. Unless it can be done on the GPU.

  5. I wonder if a similar method could be used for image denoising. Phone camera sensor algorithms are very simplistic and there is surely room for improvement, given the ISP/GPU/CPU power already there.

  6. This was my final year graduation project back in 2011.
    And now I am preparing to write my master's thesis in this field.

    Thanks for this video

  7. Well, saving on bandwidth is, of course, tempting and the sensible thing to do, but it also harbors the greatest danger of this technique: distorting the (details of) data while at the same time presenting them as "real". Post-factualism has just gained an entirely new perspective.

  8. Can such an algorithm (learning AI) be run in multiple passes to further improve the result? E.g. at 2:20 the best result still has heavy aliasing, e.g. on the capital 'A', which is large enough, yet it just looks like a standard sharpening filter (increasing contrast). I never understood how an algorithm can decide to anti-alias a line at an angle rather than leave it pixelated and just sharpen it, though – that is, how it understands that it's a line and not a pattern.

  9. In practical use, low resolution images are often also strongly compressed JPEG files. From my experience, the compression artifacts must be taken into account, because most of these algorithms tend to exaggerate them.
    One paper I've found describes an algorithm that not only takes care of exactly this problem but also can be used for superresolution itself: https://arxiv.org/abs/1611.04994

  10. But there is no open-source code out there yet – or have you seen/tested a working GitHub project? I only found one for greyscale images: https://github.com/HerrHao/RAISR

  11. Incredible top quality videos on this channel. Really appreciate your work. Although I miss the nice music that was on most previous vids.

    Keep up the good work

  12. Fun fact for normal people (without glasses):
    anyone who is short-sighted views the world like it has very low resolution. When objects come closer to the eyes, the resolution increases. Light sources may have halos around them.
    Glasses or contact lenses correct the "resolution" and also get rid of halos most of the time.

  13. I'm having serious trouble seeing the difference between nearest-neighbour and this new algorithm, even inside the papers.

  14. Is it available for public use?

    There are some really cool images I want to turn into t-shirts, but their resolution kinda sucks.

  15. The non-linear, complex degradation model used in the paper appears to occur in the neighborhood of manifolds, and likewise, so do neural nets.

  16. And not so long ago I watched a video that stated this is impossible. Lesson learnt: don't trust people who seem like they don't know shit

  17. It's important to remember all of these algorithms are making a guess. Once you lose the high frequency information, it's gone forever. You can't out-Nyquist Nyquist…

  18. I am just wondering how these results are obtained?
    Is there freely available software to evaluate these techniques, especially RAISR?

  19. this means robots will have ultra long-range vision and super-zoom abilities for seeing little particles and reading tiny anything from 100ft away, I can't wait to see how badass the robolords are

  20. Better yet, just generate a caption for the image, send that caption, then generate the image from the caption on the other end!

    It might be a little lossy, but you save so much bandwidth!

  21. I first ran across super resolution while surfing the website at the Technion school in Israel in 2002. They provided free Matlab scripts for the process. I was studying high dynamic range imaging at the time.

  22. Amazing work with so little training time and data!!! However, when you mentioned in the closing section of the video that we could save bandwidth by sending low res images and then reconstructing high res on end devices – how actually would that be done? Locally or server-side?

  23. In 1985, while an Israeli engineer was installing a state of the art image manipulation tool worth $5 million at this company, I was inspired by previews that had been shown of what it could do. (Years later Photoshop could do something similar.) After it was set up, I was curious what other capabilities it had. The engineer took me aside and showed me a rasterised, pixelated image, and by adjusting it he completely re-resed the file, putting all the detail back in; you could now make out the whole image clearly. It became clear to me then that the Israelis were using this equipment for satellite imaging. I did not say anything, but we just looked at each other and smiled. That was 1985.

  24. It seems to me that this could be the basis for a "super resolution" or a "super compression" method far superior to Google's, previously mentioned in another video. What you could do is take a high resolution image and turn it into a 1-bit pixel map, use this as your input, then take the full color version and scale it down 100x to a thumbnail, then use the input to generate a set of outputs, and finally use the thumbnail to isolate the one that is most like the original.

  25. This reminded me of Content Aware Resizing. Another way of handling small screens: https://www.youtube.com/watch?v=qadw0BRKeMk Nothing ever came of it, but both techniques combined could be interesting!

  26. Rather than scaling down, transmitting and then upscaling to save on bandwidth, I wonder whether research into "super resolution" techniques will result in a new kind of image compression to rival JPEG/MPEG, etc.?

  27. Are there any websites/programs on the internet that can do that for consumer use? The only good one I found was the letsenhance website but I was wondering if there are better options out there.

  28. I'm interested in super-resolution built from several superimposed images, where the cameras are in an array. Can you do a video about this subject? I'm sure you are really busy, but if you bump into some open source implementations, maybe send me the links.

  29. It can only enhance what's already decipherable. Anything that a human eye cannot guess is lost to these techniques as well.
