[visionlist] Call for Papers: ICCV 2017 Workshop on Audio-Visual Media


Call for Papers

ICCV 2017 Workshop on Computer Vision for Audio-Visual Media (CVAVM)

Venice, Italy – October 23, 2017


Website: http://ift.tt/2sjMpJD







Important dates:

Paper registration (title, abstract and authors): July 19, 2017

Full paper submission: July 21, 2017

Acceptance notification: August 11, 2017

Camera-ready paper due: August 25, 2017

Workshop date: October 23, 2017 (morning)

ICCV main conference date: October 24-27, 2017






The ICCV 2017 workshop entitled Computer Vision for Audio-Visual Media (CVAVM) is dedicated to the role of computer vision in audio-visual media. Audio-visual data is readily available: it is simple to acquire, and the great majority of videos today contain an audio track. Audio-visual media are ubiquitous in our daily life, from movies to TV programs to music videos to YouTube clips, to cite just a few. Moreover, audio-visual media exist on various platforms: TVs, movie theaters, tablets and smartphones. Audio-visual media are also used in many casual and professional contexts and applications, such as entertainment, machine learning, biomedicine, games, education and movie special effects, among many others.


Our workshop invites paper submissions on any applications and algorithms that combine visual and audio information. The first major thrust is how the combination of audio and visual information can simplify or improve “traditional” computer vision applications, in particular (but not limited to) action recognition, video segmentation and 3D reconstruction. The second major thrust is the exploration of emerging, novel and unconventional applications of audio-visual media, for example movie trailer generation, video editing and video-to-music alignment. See the list of topics below.


Researchers from the computer vision, machine learning, audio processing and multimedia communities are welcome to submit papers and attend the workshop. It is an exciting opportunity to foster collaboration between these research communities.


Topics include (but are not limited to):

– multi-modal learning and deep learning

– automatic video captioning

– joint audio-visual processing

– 3D reconstruction and tracking

– scene/action recognition and video classification

– video segmentation and saliency

– speaker identification

– speech recognition in videos

– automatic captioning

– virtual/augmented reality (VR/AR) and tele-presence

– human-computer interaction (HCI)

– automatic generation of videos

– trailer generation

– video and movie manipulation

– video synchronization

– image sonification

– video-to-music alignment

– joint audio-video retargeting





Invited speakers:

Josh McDermott, Assistant Professor at MIT, USA, http://ift.tt/1khyKFU

Rémi Ronfard, Research Director at INRIA, France, http://ift.tt/2sjOjKg





Paper submission:

The paper submission process follows that of the ICCV main conference. Papers are limited to 8 pages (excluding references), including figures and tables. Reviewing will be double-blind, and each submission will be reviewed by at least two reviewers. All accepted papers will be published in the ICCV workshop proceedings.






Organizers:

Jean-Charles Bazin, Assistant Professor at KAIST, http://ift.tt/2th2Tjd

William T. Freeman, Professor at MIT, https://billf.mit.edu/

Zhengyou Zhang, Principal Researcher and Research Manager at Microsoft Research, http://ift.tt/2fuNaqv


