ITU-T J.902 (2012): Multilayered data structure for scalable view-range representation (Study Group 9)

International Telecommunication Union

ITU-T J.902
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
(01/2012)

SERIES J: CABLE NETWORKS AND TRANSMISSION OF TELEVISION, SOUND PROGRAMME AND OTHER MULTIMEDIA SIGNALS
Transmission of 3-D TV services

Multilayered data structure for scalable view-range representation

Recommendation ITU-T J.902

Rec. ITU-T J.902 (01/2012)

Summary

Recommendation ITU-T J.902 specifies a data structure within the scope of Recommendation ITU-T J.901, in order to provide a scalable view-range representation. This Recommendation does not specify view generation schemes, but defines the generic data structure to be used to achieve an efficient view generation.

History

Edition  Recommendation  Approval    Study Group
1.0      ITU-T J.902     2012-01-13  9

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications, information and communication technologies (ICTs). The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE

In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure, e.g., interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements. The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementers are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database at http://www.itu.int/ITU-T/ipr/.

© ITU 2012

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

Table of Contents

1 Scope
2 References
3 Definitions
  3.1 Terms defined in this Recommendation
4 Abbreviations and acronyms
5 Conventions
6 A hypothetical system configuration based on Recommendation ITU-T J.901
7 Scalable view-range representation
  7.1 Limitation of view generation by a single data set
  7.2 Multilayered data structure
Appendix I Walk-through view illustration
  I.1 View generation set-up
  I.2 Simple inter-view generation
  I.3 Walkthrough view generation
Appendix II Examples of data format
  II.1 MPEG-3DV
  II.2 Multilayer multi-view video plus depth
Bibliography

Recommendation ITU-T J.902

Multilayered data structure for scalable view-range representation

1 Scope

Free viewpoint television (FTV) is an innovative technology that allows one to view a three-dimensional (3D) world by freely changing the viewpoint. The transmission aspect of FTV is published as Recommendation ITU-T J.901. ITU-T J.901 defines the reference system configuration and shows the allocation of depth estimation and the interpolation module in the configuration. Then, ITU-T J.901 specifies the requirements for the protocols and data format that are needed, in accordance with the configuration.

The most favourable feature of FTV is its ability to offer an audience a selection of viewpoints. However, as a reproducible range of viewpoints can give rise to a trade-off with the amount of data to be transmitted, ITU-T J.901 demands data scalability as an optional requirement.

This Recommendation specifies the data structure within the scope of ITU-T J.901, where the data structure enables scalability in the sense of a reproducible view-range and the amount of data. This Recommendation utilizes existing data representation and view generation schemes specified by ITU and other related standardization organizations, as well as future view generation schemes.

2 References

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

[ITU-T J.901]  Recommendation ITU-T J.901 (2008), Requirements for the free viewpoint television (FTV) video transmission system.

3 Definitions

3.1 Terms defined in this Recommendation

This Recommendation defines the following terms:

3.1.1 depth: Distance from the capturing camera to a surface of an object in the scene.

3.1.2 view range: The position and direction of a viewpoint in a three-dimensional (3D) scene, where a virtual view can be generated.

4 Abbreviations and acronyms

This Recommendation uses the following abbreviations and acronyms:

3D   Three-Dimensional
FTV  Free Viewpoint Television

5 Conventions

None.

6 A hypothetical system configuration based on Recommendation ITU-T J.901

ITU-T J.901 bases view generation on multi-view images with their depth information. This also applies to this Recommendation. There are several possible configurations for placing the depth search functionality. This Recommendation uses the configuration in Figure 1, which originates from Figure 2 of ITU-T J.901.

[Figure 1 block labels: Capture, Correction, Depth search, Encoder, Transmission/storage, Decoder, Interpolation, Control, Display]

Figure 1: System configuration based on Recommendation ITU-T J.901

In this configuration, a depth search is performed at the sender side, while interpolation is performed at the receiver side. The sender transmits multi-view images as well as depth information with some additional parameters (e.g., camera parameters). The computational load at the receiver side is reduced because the depth information significantly helps the interpolation process.

7 Scalable view-range representation

7.1 Limitation of view generation by a single data set

In this Recommendation the data set comprises images, depth information, and additional parameters. In general, it is possible to generate virtual view images at a viewpoint between video cameras using the data set. The virtual viewpoint can be slightly distant from the line connecting the cameras. However, there are limitations to moving the viewpoint freely due to occlusion and a limited resolution of images. The former arises with the existence of objects in front of the cameras. The scene behind the object is not captured by the cameras, and cannot be generated. The latter affects the generated picture quality. When the viewpoint moves forward and gets close to the objects, the generated images are degraded due to excessive image enlargement.

7.2 Multilayered data structure

In the above case, the substantial problem is that the data set does not have sufficient information to generate the scene. One solution is to provide the data set for any position of the scene; however, this causes a data size problem.

Figure 2 shows the concept of a multilayered data structure. Here the data is classified into layers. Occlusion caused by the object at Layer 1 can be resolved by Layer 2. Hence, additional layers increase the ability of the data set to generate view images; that is, the layer structure provides scalability to the data set.

Figure 3 shows another configuration for spatial segmentation. This configuration can describe a wide space in an efficient manner. It allows a video server to transmit only the necessary parts of the whole data when the required view range is limited to a part of the whole range.

A multilayered data structure is specified as follows:

- Each layer has a set of images, depth information, and associated parameters.
- Each layer has an indicator to distinguish it and the reproducible view range information.

It is noted that the image data is not limited to a real camera image, but can include a virtually generated image; depth information describes a kind of 3D object model; associated parameters include optical and geometrical parameters such as the projection matrix of each image.

A reproducible view range tells a view rendering system the area of virtual viewpoints that can be displayed, so that the system is able to know which layer data is necessary for rendering the requested viewpoint. It improves the system performance regarding response time and transmission bandwidth, since the system can avoid the excessive and/or deficient transmission of data.

[Figure 2 labels: Layer 3, Layer 2, Layer 1, Object]

Figure 2: Conceptual figure of multilayered data structure (an arrow denotes a camera position)

[Figure 3 labels: Layer 3, Layer 2, Layer 1]

Figure 3: Another configuration of multilayered data structure

Examples of data formats are presented in the appendices according to each depth-based image generation scheme.

Appendix I

Walk-through view illustration

(This appendix does not form an integral part of this Recommendation.)

This
appendix illustrates an example of view generation that achieves a walk-through experience and solves the occlusion problem by utilizing a layered structure.

I.1 View generation set-up

In the simulation, multi-view videos are virtually generated using the multiple local ray-space method in [b-Tehrani]. Depth information is also estimated using the 3-D model information that is available in the multiple local ray-space method.

- Multi-view video plus depth: 1 layer, 180 views per layer (maximum disparity = 50 pixels)
- Multilayer multi-view video plus depth: 3 layers, 180 views per layer (maximum disparity = 50 pixels)

I.2 Simple inter-view generation

Three examples of free viewpoint generation using single multi-view video plus depth are provided below in Figures I.1, I.2 and I.3.

Figure I.1: Example 1 of free viewpoint generation using single multi-view video plus depth

Figure I.2: Example 2 of free viewpoint generation using single multi-view video plus depth

Figure I.3: Example 3 of free viewpoint generation using single multi-view video plus depth

I.3 Walkthrough view generation

Figure I.4 represents the free viewpoint within the first layer, in the multilayer multi-view video plus depth representation. The result in this layer can also be achieved with single multi-view video plus depth.

Figure I.4: Example of free viewpoint within the first layer in the multilayer multi-view video plus depth

Figure I.5 represents an example of walk-through view generation in the second layer of multilayer multi-view video plus depth.

Figure I.5: Example of walk-through view generation in the second layer of multilayer multi-view video plus depth

Figure I.6 represents an example of walk-through view generation in the third layer of multilayer multi-view video plus depth.

Figure I.6: Example of walk-through view generation in the third layer of multilayer multi-view video plus depth

Appendix II

Examples of data format

(This appendix does not form an integral part of this Recommendation.)

This appendix gives examples of multilayer representation based on existing data formats. Each base data format uses a view image and depth for view generation.

II.1 MPEG-3DV

MPEG-3DV is developing a 3D video format that enables both advanced stereoscopic display processing and improved support for auto-stereoscopic N-view displays. The format takes multiple video data as input and generates an arbitrary number of views with the appropriate disparity for 3D presentation. In this clause, the video format in the MPEG-3DV test sequence is used with spatial segmentation (Figure II.1). Therefore, the basic nature of each data item is the same as in MPEG-3DV¹.

[Figure II.1 labels: limited camera inputs; data format; constrained rate (based on distribution); left and right views; stereoscopic displays (variable stereo baseline, adjust depth perception); auto-stereoscopic N-view displays (wide viewing angle, large number of output views)]

Figure II.1: Target of the 3D video format [b-ISPA]

II.1.1 Data representation

a) Video, depth and associated information
   - View location: all view images are on a straight line with the same interval.
   - View direction: the direction of all view images is the same and perpendicular to the line on which the view images are located.
   - Focal length: common to all view images.
   - Image resolution: common to all view images, and rectangular in shape.

b) Reproducible view range
   - View location: on the same line and in between input view images.
   - View direction: same as input view images.
   - Focal length: same as input view images.
   - Image resolution: same as input view images.

¹ It should be noted that the MPEG-3DV discussion is on-going, and the information included in the data format is tentative.

II.2 Multilayer multi-view video plus depth

As described in Appendix I, the multilayer multi-view video plus depth representation is capable of walk-through view generation (clause I.3). The key issue of the walk-through view generation is its resolution of the occlusion problem that prevents the
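
As a rough illustration of the structure specified in clause 7.2 (each layer carries images, depth information, associated parameters, an indicator and a reproducible view range), the following Python sketch shows how a receiver might decide which layers to request for a given viewpoint. All class, field and function names are hypothetical, and modelling the reproducible view range as an axis-aligned rectangle on the ground plane is an assumption made for brevity; the Recommendation does not prescribe any particular encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption: a 3x4 projection matrix stored row-major as nested tuples.
Matrix3x4 = Tuple[Tuple[float, ...], ...]


@dataclass
class View:
    image_ref: str         # real or virtually generated image (reference only)
    depth_ref: str         # depth information associated with the image
    projection: Matrix3x4  # optical/geometrical parameters of the image


@dataclass
class ViewRange:
    # Assumption: viewpoints are (x, z) positions inside a rectangle.
    x_min: float
    x_max: float
    z_min: float
    z_max: float

    def contains(self, x: float, z: float) -> bool:
        return self.x_min <= x <= self.x_max and self.z_min <= z <= self.z_max


@dataclass
class Layer:
    layer_id: int          # indicator that distinguishes the layer
    view_range: ViewRange  # reproducible view range information
    views: List[View] = field(default_factory=list)


def layers_for_viewpoint(layers: List[Layer], x: float, z: float) -> List[int]:
    """Return the indicators of all layers whose view range covers (x, z)."""
    return [layer.layer_id for layer in layers if layer.view_range.contains(x, z)]


def example_layers() -> List[Layer]:
    """Three hypothetical layers stacked in depth, each 2 units deep."""
    return [
        Layer(1, ViewRange(-5.0, 5.0, 0.0, 2.0)),
        Layer(2, ViewRange(-5.0, 5.0, 2.0, 4.0)),
        Layer(3, ViewRange(-5.0, 5.0, 4.0, 6.0)),
    ]


if __name__ == "__main__":
    layers = example_layers()
    print(layers_for_viewpoint(layers, 0.0, 3.0))  # [2]
    print(layers_for_viewpoint(layers, 0.0, 2.0))  # [1, 2]
    print(layers_for_viewpoint(layers, 0.0, 7.0))  # []
```

With a lookup of this kind, a server would only need to transmit the layers returned for the requested viewpoint, which is the bandwidth and response-time benefit that clause 7.2 attributes to the reproducible view range. An empty result indicates a viewpoint that no layer can reproduce.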
